Text reference evaluation

ETA provides a mechanism to create an evaluation report by comparing text references. This is called Text Reference Evaluation.

About Text Reference Evaluation

Only document collections containing the same documents (that is, identical source_hash) are eligible for evaluation. A collection is a container for storing and organising ingested files and documents; only the textual content is stored in a collection, not the original files and documents.
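The eligibility rule can be pictured as a set comparison. The sketch below is illustrative only: it assumes each document is represented as a dictionary with a source_hash field, and the function name same_documents is hypothetical, not part of ETA.

```python
def same_documents(gold_docs, test_docs):
    """Return True when both collections hold the same set of source documents.

    Assumption: each document is a dict with a 'source_hash' key. This is a
    hypothetical representation, not ETA's internal data model.
    """
    gold_hashes = {doc["source_hash"] for doc in gold_docs}
    test_hashes = {doc["source_hash"] for doc in test_docs}
    return gold_hashes == test_hashes


# Made-up hashes for illustration; document order does not matter.
gold = [{"source_hash": "a1"}, {"source_hash": "b2"}]
test = [{"source_hash": "b2"}, {"source_hash": "a1"}]
other = [{"source_hash": "a1"}, {"source_hash": "c3"}]

print(same_documents(gold, test))   # True
print(same_documents(gold, other))  # False
```

Two collections that differ in even one source_hash would not qualify under this reading of the rule.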

One document collection is elected as the Gold Standard, while another is used as the Test collection.

The Gold Standard collection acts as the base for comparison, which determines the terminology used in the evaluation report.

For example, if a text reference is found in the Gold document but not in the Test document, the evaluation reports it as a MISS.
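The Gold-based terminology can be sketched as set operations. This is a simplified illustration that assumes a text reference is a (start, end, category) tuple and that references match only when identical; ETA's actual matching rules are not described here. The label SPURIOUS for references found only in the Test document is an assumption, since this page names only MISS and Correct.

```python
def classify_references(gold_refs, test_refs):
    """Label references relative to the Gold document.

    Assumptions: references are hashable (start, end, category) tuples and
    match only when identical. 'SPURIOUS' is a hypothetical label.
    """
    gold, test = set(gold_refs), set(test_refs)
    return {
        "CORRECT": sorted(gold & test),   # present in both documents
        "MISS": sorted(gold - test),      # in Gold but not in Test
        "SPURIOUS": sorted(test - gold),  # in Test but not in Gold (assumed label)
    }


gold_refs = [(0, 5, "Person"), (10, 16, "Location")]
test_refs = [(0, 5, "Person"), (20, 24, "Date")]

result = classify_references(gold_refs, test_refs)
print(result["MISS"])  # [(10, 16, 'Location')]
```

Swapping the Gold and Test roles would turn each MISS into a reference found only in the Test document, which is why the choice of Gold Standard determines the wording of the report.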

The evaluation result is summarized by an F1 score. If the Gold and Test documents are identical, the maximum F1 score of 100% is achieved.

The F1 score decreases as the dissimilarity between the Gold and Test documents increases.
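For readers unfamiliar with the measure, the standard F1 definition is the harmonic mean of precision and recall. The sketch below uses that textbook formula with hypothetical count names (correct, miss, spurious); how ETA derives its counts internally is not stated on this page.

```python
def f1_score(correct, miss, spurious):
    """Standard F1 computed from match counts.

    precision = correct / (correct + spurious)
    recall    = correct / (correct + miss)
    F1        = 2 * precision * recall / (precision + recall)
    which simplifies to 2 * correct / (2 * correct + miss + spurious).
    """
    denominator = 2 * correct + miss + spurious
    if denominator == 0:
        # F1 is undefined with no references at all; returning 0.0 is a
        # convention chosen for this sketch only.
        return 0.0
    return 2 * correct / denominator


print(f1_score(10, 0, 0))  # 1.0  (identical documents: 100%)
print(f1_score(8, 2, 2))   # 0.8  (score drops as differences appear)
```

Every additional miss or unmatched Test reference enlarges the denominator, which is why the score falls as the two collections diverge.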

Note: You can compare a collection with itself to extract the Tag Counts.


Navigating to Text Reference Evaluation

1. Click Documents on the ETA Menu Bar.

2. Click the Text Reference Evaluation button below the collection list.

This opens the Evaluation Page as shown in the next section.

The Evaluation Page

The left side of the Evaluation Page shows the Existing Evaluations List Pane at the top and the Create Evaluation Pane at the bottom.

Existing Evaluations

You can click on any existing Evaluation in the list to see the Summary Report associated with that evaluation.

Note: As a matter of good housekeeping, delete evaluations as soon as you are finished with them.

Warning: Renaming either of the Collections involved in the Evaluation will orphan the evaluation, making it unusable. Undoing the rename will restore the Evaluation.

Create New Evaluation

3. On the Evaluation Page, enter a Name for the evaluation report to be created and click the Create button.

4. Choose the Gold Standard and Test document collections. The Text Reference Category is automatically set to the collection's default; you may specify your own if required.

5. Review your settings and click the Save & Evaluate button to begin the evaluation process.

6. The evaluation process may take a while, and a spinner is displayed while it runs. You may continue using ETA while you wait; the evaluation continues in the background.

Evaluation Report

1. After the evaluation process is done, the name of your evaluation appears in the list. Click it to see the evaluation report.

1b. Alternatively, as a shortcut, find the document collection whose name starts with [Evaluation] in the collection list and click its Report icon to show it as an evaluation report.

2. A detailed summary table shows the overall F1 score and its components. You may click any of the blue numbers to drill down to the documents.

3. To delete the Evaluation Report when you have finished, click the Trash button. The document collection created for the evaluation is removed as well.

Rest assured, both your original Gold and Test documents are safe and are not modified.

Compare a Collection with Itself

This can be a relatively fast way of extracting Tag Counts from a Collection. Start by entering a unique name for the evaluation and clicking Create.

This opens the Evaluation Settings Pane. For this use case, select the same Collection for the Gold Standard Documents and the Test Documents. Click Save & Evaluate to get the evaluation running.

A spinner is then displayed while the evaluation runs; how long this takes depends on the size of the Document Collection.

At the end of the process, the Summary Table is presented as shown below.

The two columns of interest in the Summary Table outlined in red above are the list of Tag Categories on the left, and the "Correct" counts on the right.
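The reason the "Correct" column doubles as a tag count is that, when a collection is compared with itself, every text reference matches its own copy. The sketch below illustrates the idea with hypothetical (start, end, category) tuples and a hypothetical helper named tag_counts; it is not how ETA computes the table.

```python
from collections import Counter


def tag_counts(references):
    """Count references per Tag Category.

    Assumption: each reference is a (start, end, category) tuple. In a
    self-comparison every reference matches itself, so these per-category
    totals are what the "Correct" column would show under that reading.
    """
    return Counter(category for _start, _end, category in references)


# Made-up references for illustration only.
refs = [
    (0, 5, "Ethnicity"),
    (10, 16, "Location"),
    (30, 38, "Ethnicity"),
]

counts = tag_counts(refs)
print(counts["Ethnicity"])  # 2
print(counts["Location"])   # 1
```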

The Summary Table can be copied and pasted neatly into a spreadsheet.

Clicking on any number in the Correct column displays an Evaluation Tags Pane with details of the tags in that Tag Category. In the example below, the number outlined in red was clicked; it represents the count for the Category "Ethnicity".