Dictionaries are lists of words and phrases of particular interest to you that you want to identify in documents, such as types of weapons, names of illicit drugs, names of specific organisations and war crime indicators. Each word or phrase is referred to as an entry. As ETA processes text, it looks for entries and creates a link on the text graph whenever it finds one.
For example, this word list ...
... creates these text references in a document.
Dictionaries provide a simple way of creating links on the text graph, including text references which appear in ETA documents.
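The general idea of dictionary matching can be sketched in a few lines of Python. This is an illustration only, not ETA's implementation: the function name find_entries, the case-insensitive whole-word matching, and the sample sentence are all assumptions made for the example.

```python
import re


def find_entries(text, entries):
    """Return (start, end, matched_text, entry) for each entry found in text.

    Hypothetical helper for illustration; ETA's own matching rules may differ.
    """
    if not entries:
        return []
    # Try longer entries first so a multiword phrase wins over a shorter
    # entry that it contains (e.g. "assault rifle" before "rifle").
    ordered = sorted(entries, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(e) for e in ordered) + r")\b",
        re.IGNORECASE,
    )
    # Map the lower-cased match back to the entry as written in the word list.
    lookup = {e.lower(): e for e in entries}
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


sample = "Police seized an Assault Rifle and a rifle scope."
for start, end, matched, entry in find_entries(sample, ["rifle", "assault rifle"]):
    print(start, end, matched, "->", entry)
```

Each tuple returned plays the role of a text reference: a span of the document plus the entry that produced it.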
Entity Extraction Scripts (EESs) run much faster when Dictionaries are used to create the initial text graph literals. This avoids generic matching pattern elements such as Token, as in Token<string=XXX>, which are common in the text graph and therefore make EES rules slow to run.
Dictionaries offer the following capabilities:
- multiword phrases
- features in extracted text references
- typographic filtering
- automatic pluralization
- escaping - to allow complex characters within word list items
- integration with EESs, making them briefer and faster
- direct generation of text references (without any need for EESs)
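To make the multiword phrase and automatic pluralization capabilities concrete, here is a minimal Python sketch of how one entry could be expanded into its plural variant. The function name plural_forms and the naive English suffix rules are assumptions for illustration; the rules ETA actually applies are not described here and may differ.

```python
def plural_forms(entry):
    """Return the entry plus a naive English plural of its last word.

    Hypothetical helper: only the final word of a multiword phrase is
    pluralized, and irregular plurals are not handled.
    """
    head, _, last = entry.rpartition(" ")
    lower = last.lower()
    if lower.endswith(("s", "x", "z", "ch", "sh")):
        plural = last + "es"
    elif lower.endswith("y") and len(lower) > 1 and lower[-2] not in "aeiou":
        plural = last[:-1] + "ies"
    else:
        plural = last + "s"
    prefix = head + " " if head else ""
    return {entry, prefix + plural}


for entry in ["assault rifle", "militia", "war party"]:
    print(entry, "->", sorted(plural_forms(entry)))
```

With an expansion step like this, a single word list entry can match both its singular and plural forms in a document.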
Tools for working with dictionaries are:
- Screen editor - with code highlighting
- Import capability
- Text Graph Analyzer for testing and review
To configure Dictionaries:
- On the Main Navigation Bar click Configurations.
- Click Dictionaries.
- Do one of the following:
  - Open a configuration by clicking its name.
  - Copy a configuration in the list by clicking the ‘Create a copy’ icon beside it, entering a name for the new configuration, then clicking Create & Open.
  - Create a new configuration by entering a name on the Create tab at the bottom of the pane, then clicking Create.
  - Copy a configuration from another project by clicking the Copy From tab at the bottom of the pane, selecting the project that contains the configuration, selecting the configuration you want to copy, then clicking Copy.
  - Import a configuration by clicking the Import tab at the bottom of the pane, clicking Choose file, navigating to the file, then clicking Open. Rename the file if necessary, then click Import.
- Create or modify the word list.
- Click Save or Save & Test.