The Watchlist Wizard enables you to import a single watchlist from a CSV file (a comma-separated values file, in which values are separated by a comma, a tab, a semicolon or another character) and create a dictionary from the entities in it. It then prompts you to:
As you process documents in the collection, watchlist entities in the dictionary will be identified.
If a watchlist CSV file is updated and you want to update the corresponding dictionary, you can do so easily without needing to use the wizard again.
To import a watchlist using the Watchlist Wizard:
The wizard is displayed.
The first few rows of the file are displayed.
Note: It is best to select a column with unique information (for example, names of people or organisations) so that the same text is not matched by multiple entities.
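If you want to check a column for repeated values before you select it, a short script outside ETA can help. The following is a minimal sketch, not part of the wizard: the function name `duplicate_values`, the sample data and the column names are all hypothetical, and it assumes the file has a header row.

```python
import csv
import io
from collections import Counter


def duplicate_values(csv_text, column, delimiter=","):
    """Return {value: count} for values that occur more than once in `column`."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    # Normalise whitespace and case so 'John Davis' and 'john davis ' count as the same text.
    # Empty or missing cells are skipped.
    counts = Counter(row[column].strip().lower() for row in reader if row.get(column))
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical sample data, for illustration only.
SAMPLE = """name,country
John Davis,UK
Acme Holdings,US
john davis,AU
"""

print(duplicate_values(SAMPLE, "name"))
```

An empty result suggests that the column holds unique values. Any value that is reported would be matched by more than one watchlist entity.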
Dictionary features are additional details about words and phrases that you can view alongside marked-up text, such as a person’s date of birth and phone number or an organisation’s address. You can view these details in several places in ETA, for example, when you select a text reference in a document or when you view the details of a node in a network.
Note: You will be able to see an example of the feature titles on the next screen in the wizard.
The Document pane shows an example of the way a document would look after your watchlist dictionary has been used to mark up text.
If you are not satisfied with the results, click the back arrow at the bottom of the screen and change your settings. For example, if you entered the name of a watchlist entity but it was not marked up, return to step 3 in the wizard and check that you have selected the correct column as the ‘dictionary words’.
If you are not satisfied with the features and/or feature titles that are shown, return to the previous screen in the wizard then select different columns and/or edit the feature titles.
You now need to attach the dictionary to a document processing and ingestion configuration and choose the document processing stage in which the dictionary will be applied.
Note: ‘Early Stage’ and ‘Late Stage’ refer to the points in document processing at which ETA searches for text that matches words and phrases in your dictionary. If you choose early stage, text that matches will be marked up only once. If you choose late stage, the same text may be marked up a second time (or more) as a result of other dictionaries or entity extraction scripts being deployed later in document processing. For example, the text ‘John Davis’ is in your watchlist dictionary and you choose ‘Early Stage’. During early stage processing, ‘John Davis’ is marked up as a person. In late stage processing, when another dictionary is applied to the text, ‘John Davis’ is also identified as an organisation. However, because you chose to apply your watchlist dictionary in early stage processing, the text is not marked up a second time. If you had chosen late stage, ‘John Davis’ would have been marked up as both a person and an organisation.
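The difference between the two stages can be illustrated with a toy model. This is only a sketch of the behaviour described in the note above, not a description of how ETA is implemented: the function `mark_up` and its data structures are hypothetical, and matching is reduced to a simple substring test.

```python
def mark_up(text, early, late):
    """Toy model: return {phrase: [entity types]} for phrases found in `text`.

    `early` and `late` are lists of (phrase, entity_type) pairs taken from the
    dictionaries applied at each stage.
    """
    markup = {}
    # Early stage: the first match claims the phrase.
    for phrase, entity_type in early:
        if phrase in text and phrase not in markup:
            markup[phrase] = [entity_type]
    # Phrases marked up in the early stage are not marked up again later.
    locked = set(markup)
    # Late stage: every matching dictionary adds its own markup.
    for phrase, entity_type in late:
        if phrase in text and phrase not in locked:
            markup.setdefault(phrase, []).append(entity_type)
    return markup


text = "John Davis attended the meeting."
watchlist = ("John Davis", "person")        # your watchlist dictionary
other = ("John Davis", "organisation")      # another dictionary applied in the late stage

print(mark_up(text, early=[watchlist], late=[other]))
print(mark_up(text, early=[], late=[watchlist, other]))
```

With the watchlist in the early stage, the model marks up ‘John Davis’ once, as a person. With the watchlist in the late stage, it marks up the same text as both a person and an organisation.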
In this step you enter details about the network that will be generated from your watchlist.
Note: In most cases you would select the name field as this will enable you to quickly identify nodes in Network Graph View.
The wizard checks the column for data that matches the field type you selected. If none of the data matches, the column is coloured red.
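The kind of check described above can be sketched as follows. The field types (‘date’, ‘number’, ‘text’), the date format and the function `column_matches` are assumptions made for illustration; the field types that ETA offers and the rules it applies may differ.

```python
from datetime import datetime


def _is_date(value):
    # Assumed date format (YYYY-MM-DD), for illustration only.
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


# Hypothetical field types and their checks.
CHECKS = {
    "date": _is_date,
    "number": _is_number,
    "text": lambda value: bool(value),
}


def column_matches(values, field_type):
    """Return True if at least one value in the column matches the field type.

    False corresponds to the case in which the column would be coloured red.
    """
    check = CHECKS[field_type]
    return any(check(value.strip()) for value in values)


print(column_matches(["1980-04-12", "unknown"], "date"))
print(column_matches(["John Davis", "Acme Holdings"], "date"))
```

In this sketch a single matching value is enough for the column to pass, which mirrors the statement that the column is coloured red only if none of the data matches.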
You can now use the dictionary to identify watchlist entities in document collections.