Finding similar documents

Sintelix can help you find documents which are similar to a specific document of interest. The easiest way to do this is through the Document View of the specific document.

From the Document View, go to Tools and select Find Similar Documents.

Selecting this automatically takes you to the Search Page, where the Document Similarity search facet is open and your document ID is pre-filled.

Your active collection is also selected, and your search results appear in the Search Results pane on the right of the screen. (A collection is a container for storing and organising ingested files and documents. Only the textual content is stored in collections, not the original files and documents.)

The Tolerance Slider is set to 10% by default, which is the recommended maximum tolerance to ensure higher quality matching.

If you want to increase the matching precision, you can lower the tolerance and run the search again.

Using the Tolerance Slider

The Tolerance Slider is used to adjust the match precision between two different documents. It can be adjusted to a value between 0 and 50 percent. A lower tolerance value will require higher similarity between documents for a match to be detected. For example, at 1%, two documents would need to be practically identical in content to be considered a match.

At 10%, the matching tolerance is relaxed enough that documents such as emails from different people containing mostly the same information are grouped together. It is recommended that the Tolerance Slider be set no higher than 10% if you want to detect documents that are very similar.

A tolerance above 10% can be used for broader filtering in order to identify documents which may be related in terms of subject matter.
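The effect of the tolerance value can be illustrated with a small sketch. This is not how Sintelix computes similarity, which is not described here. It assumes, purely for illustration, that tolerance is the maximum allowed dissimilarity between two documents, measured as one minus the Jaccard similarity of their word sets. The function names (word_set, dissimilarity, is_match) and the sample texts are hypothetical.

```python
def word_set(text: str) -> set[str]:
    """Lower-case the text and split it into a set of words."""
    return set(text.lower().split())


def dissimilarity(a: str, b: str) -> float:
    """Return 1 - Jaccard similarity of the two word sets (0.0 = identical)."""
    words_a, words_b = word_set(a), word_set(b)
    if not words_a and not words_b:
        return 0.0  # two empty documents are treated as identical
    return 1.0 - len(words_a & words_b) / len(words_a | words_b)


def is_match(a: str, b: str, tolerance_percent: float) -> bool:
    """Assumed rule: match when the dissimilarity is within the tolerance."""
    if not 0 <= tolerance_percent <= 50:
        raise ValueError("tolerance must be between 0 and 50 percent")
    return dissimilarity(a, b) * 100 <= tolerance_percent


# Hypothetical sample documents that differ by a single word.
doc_a = "the quarterly report is attached please review it before friday"
doc_b = "the quarterly report is attached please review it before monday"

print(is_match(doc_a, doc_a, 1))   # identical text matches even at a low tolerance
print(is_match(doc_a, doc_b, 10))  # a one-word change exceeds 10% under this toy measure
print(is_match(doc_a, doc_b, 20))  # a broader tolerance accepts the pair
```

Under this toy measure, lowering the tolerance narrows the set of matches and raising it widens the set, which mirrors the behaviour of the slider described above. The specific percentages at which real documents match in Sintelix will differ.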

 
