Troubleshooting rule sets



The F1 score is lower than you expect

When an F1 score is calculated, each element in the gold standard document is counted as one element. The size and word count of the elements are immaterial. While it may seem that single-word elements that are spurious or missing have a more negative effect on the F1 score than multi-word elements, the effect on the score is the same, regardless of the word count.

Resolving spurious or missing elements will result in a higher F1 score.
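
The point that word count does not matter can be illustrated with a minimal sketch. It assumes the standard definition of F1 (the harmonic mean of precision and recall) and counts each element once. The function name `f1_score` and the sample element texts are hypothetical and are not part of the product.

```python
def f1_score(gold: set, selected: set) -> float:
    """Compute F1 from whole elements; each element counts once, whatever its length."""
    correct = len(gold & selected)      # elements in both the gold standard and the rule output
    if correct == 0:
        return 0.0
    spurious = len(selected - gold)     # selected by the rules but not in the gold standard
    missing = len(gold - selected)      # in the gold standard but not selected by the rules
    precision = correct / (correct + spurious)
    recall = correct / (correct + missing)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold standard: one single-word element and three multi-word elements.
gold = {"Price", "In stock", "Acme Kettle 1.7 L", "Free delivery on all orders over 50"}

# Case 1: the rules miss the single-word element.
score_missing_short = f1_score(gold, gold - {"Price"})

# Case 2: the rules miss the longest element instead.
score_missing_long = f1_score(gold, gold - {"Free delivery on all orders over 50"})

# Case 3: nothing is spurious or missing.
score_resolved = f1_score(gold, gold)

print(score_missing_short, score_missing_long, score_resolved)
```

Under these assumptions the first two scores are identical, because each case has exactly one missing element, and the third score is higher because no element is spurious or missing.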

An element is shown as spurious for no obvious reason

The parent element may have been added to some of the gold standard documents but, in one of them, the child element was selected instead.

The gold standard is a set of model data that you can learn from and test on. For example, in ETA, this would be a collection of documents that have been created with specific, preferred properties such as correct document tags and text references. In ETA Harvester, this would be a collection of documents harvested from web pages where only the correct elements have been selected (that is, only the content you want).

This can happen when an element takes up exactly the same screen space as its parent.

Decide if you want the rule to be on the parent or the child element. If you want it to be on the parent, go to this element then add it to the gold standard. If you want it to be on the child, delete the rule then create a new one based on the child element.
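
The parent and child mismatch can be pictured as a set comparison per document. The following is a sketch only: the element identifiers, document names, and the way elements are compared are assumptions made for illustration, not a description of how the product works internally.

```python
# Hypothetical element identifiers; the rule is assumed to select the parent element.
PARENT = "div.article"
CHILD = "div.article > div.text"   # occupies the same screen space as its parent

# Gold standard per document: the child was selected in doc3 instead of the parent.
gold_by_document = {
    "doc1": {PARENT},
    "doc2": {PARENT},
    "doc3": {CHILD},
}

# What the rule selects in each document.
selected_by_document = {name: {PARENT} for name in gold_by_document}

report = {}
for name, gold in gold_by_document.items():
    selected = selected_by_document[name]
    report[name] = {
        "spurious": selected - gold,   # selected by the rule but not in the gold standard
        "missing": gold - selected,    # in the gold standard but not selected by the rule
    }

for name, result in report.items():
    print(name, result)
```

In this sketch only doc3 reports a problem: the parent appears as spurious and the child as missing, even though both look identical on screen.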

You have added all the elements you need to the gold standard but the document table still indicates that some elements are spurious or missing

The spurious and/or missing element may not be visible.

You can view a list of these elements on the Errors tab. From there, add the spurious elements to the gold standard and remove the missing elements from it.