Using machine learning
The Document Processing configuration contains an option to enable machine learning. When this option is ticked, ETA uses manually marked up text references and connections to create a machine learning dictionary and machine learning entity extraction script, then uses them in document processing. This enables you to correct entity extraction errors and add new classes of extracted text references and connections.
The dictionary and entity extraction script can be copied into a custom configuration and used as a base for custom rules.
To use machine learning:
- Open the Document Processing configuration you want to use. Tick ‘Enable Machine Learning’, then click Save.
- Open a document with markup mistakes or omissions then click Mark Up.
A message at the top of the Document pane confirms that Machine Learning is active.
- Add text references and connections as required.
ETA does the following:
- When you create a new text reference, the exact text will be added to the ‘learned’ namespace in the Machine Learning dictionary (unless an identical entry already exists).
- When you delete a text reference:
  - if it was in the ‘learned’ namespace of the Machine Learning dictionary, it is deleted from the dictionary
  - if it was not in the ‘learned’ namespace, its text is added to the ‘removed’ namespace
- When ETA applies the dictionary during document processing, any text references that match the ‘removed’ namespace are removed first, then any matching ‘learned’ text references are added (if they don't overlap existing markup). This removes any bad text references before new ones are added and classes are changed.
- The dictionary will be applied immediately before Late Stage entity extraction scripts, so a custom script can be used to interact with and refine the result.
- When you create a new connection between text references, Machine Learning will update the Machine Learning script (which is an entity extraction script). For each connection that is created this way, a very simple rule is added that matches the exact text of the connection.
- If you remove a connection, Machine Learning will remove the matching rule from the script (if it was created by Machine Learning).
- When ETA applies the script during document processing, the rules will add the Machine Learning connections but not remove any that you have deleted.
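The dictionary behaviour described in the list above can be modelled with a minimal Python sketch. This is not ETA code or an ETA API: the names `TextRef`, `MLDictionary`, `on_create`, `on_delete` and `apply` are hypothetical, classes are ignored, and plain substring matching is assumed where ETA may match on tokens.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TextRef:
    """A marked-up span: half-open character offsets [start, end) plus its text."""
    start: int
    end: int
    text: str


def overlaps(a, b):
    """True if two half-open spans share at least one character."""
    return a.start < b.end and b.start < a.end


@dataclass
class MLDictionary:
    """Hypothetical stand-in for the two namespaces of the dictionary."""
    learned: set = field(default_factory=set)
    removed: set = field(default_factory=set)

    def on_create(self, text):
        # A new text reference adds its exact text to 'learned';
        # using a set makes an identical existing entry a no-op.
        self.learned.add(text)

    def on_delete(self, text):
        # A learned entry is simply dropped; anything else is recorded
        # in 'removed' so it is stripped on later processing runs.
        if text in self.learned:
            self.learned.remove(text)
        else:
            self.removed.add(text)

    def apply(self, document, markup):
        # Step 1: remove existing references whose text is in 'removed'.
        kept = [ref for ref in markup if ref.text not in self.removed]
        # Step 2: add 'learned' matches that don't overlap existing markup.
        for text in sorted(self.learned):
            pos = document.find(text)
            while pos != -1:
                candidate = TextRef(pos, pos + len(text), text)
                if not any(overlaps(candidate, ref) for ref in kept):
                    kept.append(candidate)
                pos = document.find(text, pos + 1)
        return sorted(kept, key=lambda ref: ref.start)


if __name__ == "__main__":
    dictionary = MLDictionary()
    dictionary.on_create("Jane Doe")   # user marks up a missed reference
    dictionary.on_delete("Acme")       # user deletes a wrong reference
    doc = "Acme hired Jane Doe. Jane Doe left Acme."
    for ref in dictionary.apply(doc, [TextRef(0, 4, "Acme")]):
        print(ref)
```

Running removals before additions in `apply` mirrors the order described above: a reference in the ‘removed’ namespace is cleared first, so it cannot block a ‘learned’ reference from being added over the same text.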
Note: Script rules created by Machine Learning are very basic. They consist only of existing text references and the tokens between them. They work as they are, but it is recommended that you use them as a template for developing custom entity extraction scripts.
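To illustrate why such rules are best treated as a starting point, here is a minimal Python sketch of an exact-match connection rule. It does not use ETA's entity extraction script syntax: `ConnectionRule`, `learn_rule` and `find_connections` are hypothetical names, documents are assumed to be whitespace-tokenized, the source reference is assumed to precede the target, and the sketch matches token text only rather than checking that the endpoints are existing text references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionRule:
    """Hypothetical rule: two text references and the exact tokens between them."""
    source: tuple
    between: tuple
    target: tuple

    def pattern(self):
        # The full token sequence the rule has to match, in order.
        return self.source + self.between + self.target


def learn_rule(tokens, source_span, target_span):
    """Build a rule from half-open token-index spans of two text references."""
    (s0, s1), (t0, t1) = source_span, target_span
    return ConnectionRule(
        source=tuple(tokens[s0:s1]),
        between=tuple(tokens[s1:t0]),
        target=tuple(tokens[t0:t1]),
    )


def find_connections(rule, tokens):
    """Return (source_span, target_span) for every exact match of the rule."""
    pattern = rule.pattern()
    width = len(pattern)
    hits = []
    for i in range(len(tokens) - width + 1):
        if tuple(tokens[i:i + width]) == pattern:
            source_end = i + len(rule.source)
            target_start = source_end + len(rule.between)
            hits.append(((i, source_end), (target_start, i + width)))
    return hits


if __name__ == "__main__":
    marked_up = "Jane Doe works for Acme Ltd".split()
    rule = learn_rule(marked_up, (0, 2), (4, 6))

    same_wording = "Yesterday Jane Doe works for Acme Ltd again".split()
    other_wording = "Jane Doe works at Acme Ltd".split()
    print(find_connections(rule, same_wording))   # exact wording: one match
    print(find_connections(rule, other_wording))  # "at" instead of "for": no match
```

Because the rule matches only the exact token sequence, a one-word change between the two references is enough to stop it firing, which is the kind of gap a refined custom script would close.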