Document Processing configuration

Document processing is part of the Ingestion configuration. It defines what information is extracted from your documents and how that information is marked up in the document output. It covers loading and normalizing documents, extracting information, and storing document information.

By default, document processing uses a built-in Learned Entity Extractor that extracts common entities such as people, organisations and locations from your documents.

What is ‘early stage’ and ‘late stage’?

When no dictionaries or entity extraction scripts have been applied, ETA’s Learned Entity Extractor will extract common entities such as people, organisations and locations by default.

Dictionaries and entity extraction scripts added to the ‘early stage’ run before ETA’s Learned Entity Extractor. Anything added to the ‘late stage’ runs after it.
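The ordering of the two stages can be illustrated with a short Python sketch. This is not ETA code: the function names (dictionary_extractor, learned_extractor, run_pipeline) and the sample terms are hypothetical, and the stand-in for the Learned Entity Extractor is a simple lookup rather than a trained model. The sketch only shows the order in which extractors run.

```python
from typing import Callable, Dict, List, Tuple

# An extractor takes text and returns (entity_text, label) pairs.
Entity = Tuple[str, str]
Extractor = Callable[[str], List[Entity]]


def dictionary_extractor(terms: Dict[str, str]) -> Extractor:
    """Build a hypothetical dictionary extractor from a term -> label map."""
    def extract(text: str) -> List[Entity]:
        return [(term, label) for term, label in terms.items() if term in text]
    return extract


def learned_extractor(text: str) -> List[Entity]:
    """Stand-in for the Learned Entity Extractor (assumption: a plain lookup)."""
    known = {
        "Alice Smith": "person",
        "Acme Ltd": "organisation",
        "Paris": "location",
    }
    return [(name, label) for name, label in known.items() if name in text]


def run_pipeline(text: str,
                 early: List[Extractor],
                 late: List[Extractor]) -> List[Entity]:
    """Run early-stage extractors, then the learned extractor, then late-stage."""
    stages = list(early) + [learned_extractor] + list(late)
    results: List[Entity] = []
    for stage in stages:
        results.extend(stage(text))
    return results


if __name__ == "__main__":
    text = "Alice Smith of Acme Ltd shipped a WidgetX to Paris."
    early = [dictionary_extractor({"WidgetX": "product"})]
    late = [dictionary_extractor({"shipped": "event"})]
    for entity in run_pipeline(text, early, late):
        print(entity)
```

In this sketch, the early-stage dictionary match appears first in the results, followed by the default entities, and then the late-stage match.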

Document Processing workflow

The overall document processing workflow has its own configuration.

Most modules in the workflow are configurable. They are shown in Figure 1.

Figure 1: ETA document processing workflow

To configure document processing:

  1. On the Main Navigation Bar, click Configurations.
  2. Click Document Processing.
  3. Do one of the following:
  4. Complete or modify the fields on the screen. Each section is described below.
  5. Click Save.

 
