The Text Graph

The Text Graph is a convenient model for extracting information from continuous text. The graph for basic plain text is one-dimensional: a single chain of alternating nodes and links. The links represent tokens, and the nodes represent the transitions between tokens, together with the start and end of the text.
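As a minimal sketch of this chain structure, the code below assumes nodes are plain integer indices and each link records its start node, end node, a type label and the token text. The names `TextGraph`, `Link` and `from_tokens` are hypothetical and do not come from any particular library.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """A directed link between two nodes; here it carries one token."""
    start: int   # index of the node before the token
    end: int     # index of the node after the token
    label: str   # link type, e.g. "Token"
    text: str    # the token text the link represents


@dataclass
class TextGraph:
    """Hypothetical container: a node count plus a list of links."""
    num_nodes: int = 0
    links: list[Link] = field(default_factory=list)

    @classmethod
    def from_tokens(cls, tokens: list[str]) -> "TextGraph":
        # n tokens need n + 1 nodes: one before each token plus one at the end.
        graph = cls(num_nodes=len(tokens) + 1)
        for i, tok in enumerate(tokens):
            # Link i joins node i to node i + 1, forming a single chain.
            graph.links.append(Link(start=i, end=i + 1, label="Token", text=tok))
        return graph


g = TextGraph.from_tokens(["a", "b", "c"])
print(g.num_nodes, len(g.links))  # 4 3
```

For any list of n tokens this builder yields n + 1 nodes and n links, which is the one-dimensional chain described above.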

Example:

Consider the text: hello, world!

During Named Entity Extraction, this text is split into four tokens:

"hello", comma, "world", exclamation mark
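The text does not say which tokenizer Named Entity Extraction uses. As an assumption for illustration only, a simple regular expression that keeps runs of word characters and single punctuation marks, and discards whitespace, reproduces the four tokens listed above. The names `TOKEN_PATTERN` and `tokenize` are hypothetical.

```python
import re

# Hypothetical tokenizer: a run of word characters, or any single
# character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens, discarding whitespace."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("hello, world!"))  # ['hello', ',', 'world', '!']
```

A production tokenizer would need to handle cases such as abbreviations, numbers and contractions, which this pattern treats naively.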

These four tokens form a graph of five nodes and four links, one link per token, each of type Token.

Information extraction on plain text can then be expressed as a process of adding, deleting and modifying links on the graph.
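The following sketch shows the three kinds of edit on the "hello, world!" graph. The extraction steps themselves are invented for illustration: the labels "Punctuation" and "Greeting" and the methods `add_link`, `delete_links` and `relabel` are hypothetical, not part of any documented API. The example modifies the punctuation Token links, deletes them, and then adds a new link spanning the whole text from node 0 to node 4.

```python
from dataclasses import dataclass


@dataclass
class Link:
    start: int  # node where the link begins
    end: int    # node where the link ends
    label: str  # link type, e.g. "Token"
    text: str   # text covered by the link


class TextGraph:
    """A chain of nodes joined by Token links, plus any links added later."""

    def __init__(self, tokens: list[str]) -> None:
        self.num_nodes = len(tokens) + 1
        self.links = [Link(i, i + 1, "Token", t) for i, t in enumerate(tokens)]

    def add_link(self, start: int, end: int, label: str, text: str) -> Link:
        # Add a new link between two existing nodes; it may span several tokens.
        link = Link(start, end, label, text)
        self.links.append(link)
        return link

    def delete_links(self, label: str) -> None:
        # Remove every link carrying the given label; the nodes stay in place.
        self.links = [l for l in self.links if l.label != label]

    def relabel(self, link: Link, new_label: str) -> None:
        # Modify an existing link in place by changing its type.
        link.label = new_label


g = TextGraph(["hello", ",", "world", "!"])
# Modify: retype the punctuation tokens (hypothetical label "Punctuation").
for link in g.links:
    if link.text in {",", "!"}:
        g.relabel(link, "Punctuation")
# Delete: drop the punctuation links.
g.delete_links("Punctuation")
# Add: a hypothetical "Greeting" link spanning nodes 0 to 4.
g.add_link(0, 4, "Greeting", "hello, world!")

print([(l.start, l.end, l.label, l.text) for l in g.links])
# [(0, 1, 'Token', 'hello'), (2, 3, 'Token', 'world'), (0, 4, 'Greeting', 'hello, world!')]
```

Because the nodes are never removed, a link added later can still refer to any position in the original text, even after the links around it have been deleted or retyped.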
