Edit document view

Overview

You can mark up a document manually, or manually correct mistakes made by automated named entity extraction.

To open the document editor, press "Edit" in the document view.

Document Model

The Sintelix document model has two layers of data:

Text references are marked-up spans of text, easily visible in the document view. They are not copied to the Network by default.

Entities are groups of text references within a document. Each entity corresponds to all the mentions of the same object (a distinct Person, Organisation, etc.) and groups all the information about that object.

Entities and their features are copied to the network by default.

Document editing is performed mostly by adding, removing and changing the type of text references.

However, because entities, not text references, form the network, the end goal of document editing is to produce correct entity information.

Not all text references need to be grouped into an entity. The Document Processing Configuration output table contains a column "Document Entity Resolution" which decides, for each type, whether that type forms an entity.

Document editing enforces that table by ensuring that all text references that need to form an entity do so.
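As an illustration only, the two layers and the enforcement rule can be pictured with the following minimal Python sketch. The class names, the `ENTITY_RESOLUTION` mapping and the `enforce_entity_resolution` function are hypothetical and are not part of Sintelix; they simply model the behaviour described above.

```python
from dataclasses import dataclass, field


@dataclass
class TextReference:
    start: int  # character offset where the span begins
    end: int    # character offset where the span ends (exclusive)
    type: str   # e.g. "Person", "Organisation"


@dataclass
class Entity:
    type: str
    label: str
    references: list = field(default_factory=list)


# Hypothetical stand-in for the "Document Entity Resolution" column:
# maps each type to whether text references of that type form an entity.
ENTITY_RESOLUTION = {"Person": True, "Organisation": True, "Date": False}


def enforce_entity_resolution(text, references, entities):
    """Create a one-reference entity for every text reference whose type
    must form an entity but which is not yet grouped into one."""
    grouped = {id(r) for e in entities for r in e.references}
    for ref in references:
        if ENTITY_RESOLUTION.get(ref.type, False) and id(ref) not in grouped:
            entities.append(Entity(ref.type, text[ref.start:ref.end], [ref]))
    return entities


text = "Alice joined Acme on Monday."
refs = [
    TextReference(0, 5, "Person"),
    TextReference(13, 17, "Organisation"),
    TextReference(21, 27, "Date"),
]
entities = enforce_entity_resolution(text, refs, [])
```

In this sketch the "Date" text reference stays a plain marked-up span, while the "Person" and "Organisation" references each receive an entity.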

Using the Document Editor

It is recommended to use the editor with a mouse and keyboard. The interface is not optimised for touch screens, touch pads, or trackballs.

Creating a new text reference

To add a new text reference to a document, follow these steps:

The text reference is created. If the current Document Processing Configuration enables "Document Entity Resolution" for the selected type, a new entity for that text reference is created as well.

Deleting an existing text reference

To delete a text reference, right-click on it. From the menu, select "Delete".

Changing the type of a text reference

To change the type of a text reference, follow these steps:

Similarly to deletion, if the text reference belongs to an entity that has other text references, a popup appears. The popup gives an option to change the type of all the text references of the entity.

Merging entities

All text references that mention the same object should be grouped into one entity. If they are not, follow these steps:

Tip: make sure the features and label of the merged entity are correct (see "Editing features" below).

Splitting entities

To split an entity into two entities, follow these steps:

Tip: make sure the features and label of both entities are correct (see "Editing features" below).
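Conceptually, merging and splitting move text references between entity groups. The Python sketch below illustrates this under simplifying assumptions: entities are plain dictionaries, text references are represented only by their text, and the function names `merge_entities` and `split_entity` are hypothetical rather than Sintelix functions.

```python
def merge_entities(entities, keep, absorb):
    """Move every text reference of `absorb` into `keep`, then drop `absorb`.
    The label of `keep` is left unchanged, so it may need a manual check."""
    entities[keep]["references"].extend(entities[absorb]["references"])
    del entities[absorb]


def split_entity(entities, source, new_id, refs_to_move):
    """Move the chosen text references out of `source` into a new entity."""
    src = entities[source]
    moved = [r for r in src["references"] if r in refs_to_move]
    if not moved:
        raise ValueError("no matching text references to move")
    src["references"] = [r for r in src["references"] if r not in refs_to_move]
    # Assumption: the new entity takes its label from the first moved reference.
    entities[new_id] = {"label": moved[0], "type": src["type"], "references": moved}


entities = {
    "e1": {"label": "J. Smith", "type": "Person", "references": ["J. Smith", "Smith"]},
    "e2": {"label": "John Smith", "type": "Person", "references": ["John Smith"]},
}

merge_entities(entities, "e1", "e2")
merged_refs = list(entities["e1"]["references"])

split_entity(entities, "e1", "e3", {"Smith"})
```

After the merge in this sketch, the surviving entity still carries the label "J. Smith", which is why the tips recommend reviewing labels and features after each operation.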

Editing features

To edit features, click on a text reference. The features and the entity label are shown on the left-hand side.

Press the Save button to save any changes to features or label.

Creating a text reference and immediately adding it to an existing entity

It is possible to create a text reference and immediately add it to an existing entity:

Creating multiple text references with the same text

It is possible to quickly mark up all instances of identical text:

Deleting connections

Connections are a layer of information that links entities together and eventually produces network links.

When a text reference is involved in some connection, a number appears showing how many connections use this text reference:

To delete a connection, click on any text reference involved in the connection and press the Delete button in the "Connections" panel.

Creating new connections

To create a new connection, follow these steps:

Note: the Ontology link schema has a list of valid link types. If the link type you'd like to create doesn't exist in the Ontology, modify the Ontology to add it.
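The following Python sketch illustrates how connections relate to text references and to the link schema. It is a simplified model, not Sintelix code: the link type names, the assumption that each connection joins exactly two text references, and the functions `add_connection`, `connection_counts` and `delete_connection` are all hypothetical.

```python
from collections import Counter

# Hypothetical link schema: the set of link types the Ontology allows.
VALID_LINK_TYPES = {"employed_by", "located_in"}

# Each connection is modelled as (link_type, source_ref, target_ref).
connections = []


def add_connection(link_type, source_ref, target_ref):
    """Add a connection only if its link type exists in the schema."""
    if link_type not in VALID_LINK_TYPES:
        raise ValueError(f"link type {link_type!r} is not in the schema")
    connections.append((link_type, source_ref, target_ref))


def connection_counts():
    """Return how many connections use each text reference."""
    counts = Counter()
    for _, source_ref, target_ref in connections:
        counts[source_ref] += 1
        counts[target_ref] += 1
    return counts


def delete_connection(index):
    """Remove one connection by its position in the list."""
    del connections[index]


add_connection("employed_by", "Alice", "Acme")
add_connection("located_in", "Acme", "Sydney")
counts_before = connection_counts()

delete_connection(0)
counts_after = connection_counts()
```

Here "Acme" takes part in two connections before the deletion and one afterwards, which mirrors the number displayed on a text reference. Attempting to add a link type that is missing from the schema raises an error until the schema is extended.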

Adding Document Tags

To add a new tag category, click on the + button next to Document Tags.

Enter the name of the new category in the Tag Category field and then click the Add button:

To add a new tag to an existing category, enter the tag name in the Add Tag... field and then click the Add button:

Deleting Document Tags

To delete an existing tag, click the Delete button next to the tag you want to delete:

Note: If a Document Tag category has no tags, the category is automatically removed by the system.
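The automatic removal of empty categories can be pictured with this small Python sketch. It assumes, for illustration only, that tags are stored as a mapping from category name to a set of tag names; the functions `add_tag` and `delete_tag` are hypothetical.

```python
def add_tag(tags, category, tag):
    """Add a tag, creating its category if it does not exist yet."""
    tags.setdefault(category, set()).add(tag)


def delete_tag(tags, category, tag):
    """Delete a tag and drop the category once it has no tags left."""
    remaining = tags.get(category)
    if remaining is None:
        return
    remaining.discard(tag)
    if not remaining:
        del tags[category]


tags = {}
add_tag(tags, "Status", "Reviewed")
add_tag(tags, "Status", "Urgent")
add_tag(tags, "Source", "Email")

delete_tag(tags, "Source", "Email")    # "Source" is now empty and is removed
delete_tag(tags, "Status", "Urgent")   # "Status" still has one tag and stays
```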

 
