Getting started with dictionaries

Dictionaries provide a simple, highly productive method for creating text references in documents, and more generally creating links on text graphs.

Entity Extraction Scripts (EESs) run much faster when Dictionaries are used to create the initial text graph literals. This avoids generic matching pattern elements such as Token, as in Token<string=XXX>, which match very frequently and so make EES rules slow to run.

Dictionaries contain word lists

At their most basic, Dictionaries comprise Word Lists, which are lists of words and phrases that ETA can then recognise in documents. Each word or phrase in a word list is called an entry.

For example:

#wordlist MoneyWords
money
cash
lucre
dosh
readies

White space

All white space is considered equal except for line feeds, which separate command phrases and word list entries. Blank lines can be added for readability without changing the meaning of a Dictionary.
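
The white space rule can be illustrated with a short Python sketch. This is not ETA's own parser: the function name split_entries is hypothetical, and the sketch assumes that "considered equal" means any run of spaces or tabs counts as a single space. Line feeds separate entries and blank lines are dropped.

```python
def split_entries(text: str) -> list[str]:
    """Split word-list text into entries, one per non-blank line."""
    entries = []
    for line in text.split("\n"):
        # Assumption: any run of spaces or tabs is equivalent to one space.
        normalised = " ".join(line.split())
        if normalised:  # blank lines carry no meaning
            entries.append(normalised)
    return entries


# The same three entries, written compactly and with extra white space.
compact = "money\ncash\nhard cash\n"
spaced = "money\n\n  cash\nhard \t cash\n\n"
print(split_entries(compact) == split_entries(spaced))
```

Under these assumptions both layouts yield the same list of entries, which is the sense in which blank lines and extra spacing do not change the meaning of a Dictionary.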

Comments

Use C++ / Java comment style:

// this is a comment until end of line
/* this is a comment 
which spans multiple lines
*/
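
As a rough illustration of the two comment forms, the Python sketch below strips them from Dictionary text before the entries are read. The helper strip_comments is hypothetical and not part of ETA. It assumes that comments do not nest, that the character sequences // and /* never occur inside an entry, and that comments may appear between the entries of a word list.

```python
import re

# Matches /* ... */ across lines (non-greedy), or // up to the end of the line.
COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def strip_comments(text: str) -> str:
    """Remove C++/Java-style comments, keeping line feeds that lie outside them."""
    return COMMENT.sub("", text)


# Hypothetical Dictionary text with both comment forms on their own lines.
source = """#wordlist MoneyWords
// informal terms
money
/* slang
terms */
dosh
"""
print(strip_comments(source))
```

Each removed comment leaves at most a blank line behind, which is harmless because blank lines do not change the meaning of a Dictionary.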

 
