Getting started

The key components of ETA are shown in Figure 1.

Figure 1: Key components of ETA

The first step in extracting information from unstructured data is to create a project. You can do this from the ETA Home screen, which is displayed after you log in.

You then need to add a collection to the project and ingest or upload files to it. ETA can extract text from more than 1,400 file types. As files are ingested into ETA, their text is extracted, and entities (such as people, organisations, locations and email addresses) and non-entities (such as dates and times, addresses and money) are identified and marked with coloured labels. Each unbroken annotation of marked-up text is referred to as a text reference (text in a document that has been marked up, that is, highlighted with a coloured label). When the files have been ingested and marked up, you can explore and analyse the extracted information using networks and a variety of tools.
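To illustrate what a text reference is, the following Python sketch marks up a short document with two toy labels and records each unbroken annotation as a separate text reference. The TextReference class, the mark_up function, the EMAIL and MONEY labels and the regular expressions are all hypothetical; they are not ETA's extraction rules or API, only a minimal picture of the concept.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TextReference:
    """One unbroken span of marked-up text and the label applied to it (hypothetical model)."""
    start: int   # offset of the first marked-up character in the document
    end: int     # offset one past the last marked-up character
    label: str   # hypothetical label name, e.g. "EMAIL" (entity) or "MONEY" (non-entity)
    text: str    # the marked-up text itself


# Toy patterns for illustration only; these are NOT ETA's extraction rules.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "MONEY": re.compile(r"[£$€]\d+(?:,\d{3})*(?:\.\d{2})?"),
}


def mark_up(document: str) -> list[TextReference]:
    """Return every text reference found in the document, ordered by position.

    Each match becomes its own text reference, so repeated mentions of the
    same value are recorded separately.
    """
    refs = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(document):
            refs.append(TextReference(match.start(), match.end(), label, match.group()))
    return sorted(refs, key=lambda ref: ref.start)
```

Because each occurrence is recorded separately, marking up "Invoice for £1,250.00 sent to jo.smith@example.org; copy to jo.smith@example.org." gives one MONEY text reference and two EMAIL text references, one for each mention of the address.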

ETA contains a full set of default configurations that enable you to extract information from collections. You can modify them to suit your requirements, and you can create your own. The Ingestion configuration, for example, contains settings for document pre-processing, deduplication (the removal of duplicate documents from the ingestion queue, so that identical documents are processed only once), tagging and language detection, and enables you to specify whether or not you want to store source documents in ETA. The Document Processing configuration contains settings for processing documents and enables you to specify any dictionaries you want to use for identifying specific words and phrases, and any entity extraction scripts you want to run. For more about configurations, see the ETA Configurations Guide.
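Deduplication can be pictured as a filter on the ingestion queue. The Python sketch below assumes that "identical" means byte-for-byte identical content and detects copies by comparing SHA-256 digests. The deduplicate function is hypothetical and does not describe how ETA actually implements its deduplication setting.

```python
import hashlib
from typing import Iterable, Iterator


def deduplicate(queue: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each document in the queue once, skipping byte-identical copies.

    Hypothetical sketch: "identical" is assumed to mean byte-for-byte equal
    content, detected by comparing SHA-256 digests of each document.
    """
    seen: set[str] = set()
    for content in queue:
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            continue  # duplicate of a document already passed on for processing
        seen.add(digest)
        yield content
```

Only the first copy of each document is passed on, so a queue containing b"report v1", b"memo", b"report v1" and b"report v2" yields three documents. Keeping digests rather than whole documents keeps the memory needed to remember what has already been seen small.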

The table below shows the tasks in a basic ETA workflow.

Task                                                                   For more information see
1. Log in to ETA                                                       Logging in to ETA
2. Create a project                                                    Managing projects
3. Create a collection                                                 Managing collections
4. Add documents to the collection                                     Adding documents to a collection
5. Explore and analyse the information extracted from the documents    Exploring and analysing information

Getting the most from ETA

If you want to practise using ETA with training data, you can download a range of projects from ETA Center (the ETA customer portal) and then import them into ETA. Each project demonstrates one or more specific aspects of ETA and can help you improve your skills before you begin working with live data.

For tips on getting the most from ETA see Planning your information extraction project and Improving your results.