The Text Graph is a convenient model for extracting information from continuous text. The graph for basic plain text is a one-dimensional graph, comprising a single chain of alternating nodes and links. The links represent tokens and the nodes represent the transition between tokens.
Consider the text: hello, world!
During Named Entity Extraction this text is split into four tokens:
"hello", comma, "world", exclamation mark
This creates a graph of five nodes joined by four links, one link per token, each named Token.
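The chain for this example can be sketched in a few lines of Python. This is only an illustration of the model, not the product's implementation: the `Link` class, the `build_chain` function, and the regular expression used to split the text are all assumptions made for the sketch.

```python
import re
from dataclasses import dataclass


@dataclass
class Link:
    name: str   # link name, "Token" for the backbone links
    start: int  # index of the node where the link begins
    end: int    # index of the node where the link ends
    text: str   # the text covered by the link


def build_chain(text):
    # Assumed tokenizer: runs of word characters, or single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # One Token link per token, joining node i to node i + 1.
    links = [Link("Token", i, i + 1, tok) for i, tok in enumerate(tokens)]
    # A chain of n links always has n + 1 nodes.
    node_count = len(tokens) + 1
    return node_count, links


node_count, links = build_chain("hello, world!")
print(node_count)                # 5
print([l.text for l in links])   # ['hello', ',', 'world', '!']
```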
Information extraction on plain text can then be expressed as a process of adding, deleting, and modifying links on this graph.
Nodes and Tokens are the backbone of the graph. Each node is linked to its adjacent nodes by Tokens, which are the most basic links in the graph. Nodes and Tokens are created automatically during document processing and cannot be added or deleted afterwards.
Nodes hold any white space characters (spaces, carriage returns, etc.) that fall between the Tokens. Each conventional word becomes a token. Alphanumeric sequences are split into separate tokens wherever a run of letters meets a run of digits, so a string such as "abc123" yields the two tokens "abc" and "123".
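The following Python sketch mimics these rules so the behaviour is easier to picture. It is an approximation under stated assumptions: the `TOKEN_RE` pattern and the `tokenize` function are hypothetical, and the real tokenizer may treat punctuation and other character classes differently.

```python
import re

# Assumed token classes: a run of letters, a run of digits,
# or any other single non-space character.
TOKEN_RE = re.compile(r"[^\W\d_]+|\d+|\S")


def tokenize(text):
    tokens = []
    nodes = []  # nodes[i] holds the white space before token i
    pos = 0
    for match in TOKEN_RE.finditer(text):
        nodes.append(text[pos:match.start()])  # white space since the last token
        tokens.append(match.group())
        pos = match.end()
    nodes.append(text[pos:])  # final node: white space after the last token
    return tokens, nodes


tokens, nodes = tokenize("flight BA2490 departs\n")
print(tokens)  # ['flight', 'BA', '2490', 'departs']
print(nodes)   # ['', ' ', '', ' ', '\n']
```

Note that the node between "BA" and "2490" is empty: the two tokens are adjacent in the text, but a node still marks the transition between them.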
Like other graph elements, Nodes and Tokens also have features (see Features). These are key-value pairs that contain more information about the element's state and position.
A good way to find out the kind and subkind descriptors available is to put some text relevant to your project into the graph analyzer UI, and see what is listed. In the example below, the cursor is hovering over "Term.symbolic" and the instances are shown highlighted on the text graph above.
Links can be made between any pair of nodes in the graph.
Each link contains:
Links represent spans of text.
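As an illustration, a link that spans several tokens can be modelled as a named edge between two node indices. The sketch below is hypothetical: the `TextGraph` class, its method names, the "Greeting" link name, and the `confidence` feature are invented for the example, and it assumes links run forward through the text.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare links by identity, not by field values
class Link:
    name: str
    start: int  # node index where the span begins
    end: int    # node index where the span ends
    features: dict = field(default_factory=dict)  # key-value pairs


class TextGraph:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        # Backbone: one Token link between each pair of adjacent nodes.
        self.links = [Link("Token", i, i + 1) for i in range(len(self.tokens))]

    def add_link(self, name, start, end, **features):
        # Assumption: a link runs forward between two existing nodes.
        if not 0 <= start < end <= len(self.tokens):
            raise ValueError("link must run forward between existing nodes")
        link = Link(name, start, end, dict(features))
        self.links.append(link)
        return link

    def remove_link(self, link):
        # Backbone Token links are fixed once the document is processed.
        if link.name == "Token":
            raise ValueError("Token links cannot be deleted")
        self.links.remove(link)

    def span_tokens(self, link):
        # The tokens covered by a link from node `start` to node `end`.
        return self.tokens[link.start:link.end]


graph = TextGraph(["hello", ",", "world", "!"])
greeting = graph.add_link("Greeting", 0, 3, confidence=0.9)
print(graph.span_tokens(greeting))  # ['hello', ',', 'world']
```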
In EES, rules that affect links make changes to the graph immediately after the rule has fired.
Each link name is based on a hierarchical structure that includes an optional namespace.
For example, a processing module might create a link called transportation:vehicle.car.small
That link would have the namespace "transportation" and be of the type "vehicle.car.small".
The link can then be referred to by the following names:
transportation:* // this will match any link using the "transportation" namespace
transportation:vehicle // any link in the "transportation" namespace that falls within the "vehicle" hierarchy
transportation:vehicle.car.small // the exact name of the link
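The matching behaviour described above can be sketched as a small Python function. This is a reading of the three examples, not the product's actual matching logic: `split_name` and `matches` are hypothetical helpers, and the handling of names without a namespace is an assumption.

```python
def split_name(name):
    # "transportation:vehicle.car" -> ("transportation", "vehicle.car")
    # A name without a namespace gets an empty namespace (assumption).
    namespace, _, type_path = name.rpartition(":")
    return namespace, type_path


def matches(pattern, link_name):
    p_ns, p_type = split_name(pattern)
    l_ns, l_type = split_name(link_name)
    if p_ns != l_ns:
        return False
    if p_type == "*":
        return True  # wildcard: any link in the namespace
    # Exact match, or the link sits below the pattern in the hierarchy.
    return l_type == p_type or l_type.startswith(p_type + ".")


name = "transportation:vehicle.car.small"
print(matches("transportation:*", name))                  # True
print(matches("transportation:vehicle", name))            # True
print(matches("transportation:vehicle.car.small", name))  # True
print(matches("transportation:vehicle.truck", name))      # False
```

Comparing against `p_type + "."` rather than the bare prefix keeps a pattern such as transportation:veh from matching a link in the "vehicle" hierarchy.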
ETA uses several built-in modules to create or use links under special names for specific purposes:
Entity extraction scripts are written in a scripting language based on an extended form of context-sensitive grammar (CSG). Context-sensitive grammars are Type 1 in the Chomsky hierarchy (the third level when counting up from regular grammars), pictured below: