Scoring

The Scoring section of a Discrimination configuration takes the values of Data Transformation channels and produces a similarity metric.

A Scoring configuration declares a series of scorers. A typical scorer is bound to the values of a channel and has a weight associated with it.

The final score is the sum of the scores from all scorers. If it is greater than zero, the two objects are considered equal.
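The summation rule can be sketched in Python as follows. This is a minimal illustration, not the engine's actual implementation: the `final_score` and `are_equal` helpers are hypothetical, and representing each scorer as a (weight, raw score) pair whose product is its contribution is an assumption made for the example.

```python
from typing import Iterable, Tuple

# Hypothetical representation: each scorer contributes weight * raw_score.
# How a weight is really applied by each scorer is an assumption here.
Contribution = Tuple[float, float]


def final_score(contributions: Iterable[Contribution]) -> float:
    """Sum the weighted contributions of all scorers."""
    return sum(weight * raw for weight, raw in contributions)


def are_equal(contributions: Iterable[Contribution]) -> bool:
    """Two objects are considered equal when the final score is above zero."""
    return final_score(contributions) > 0.0


# One scorer contributes 1.3 * 0.5, another contributes 1.0 * -0.5.
print(are_equal([(1.3, 0.5), (1.0, -0.5)]))
```

A final score of exactly zero does not count as a match in this sketch, because the rule requires the score to be greater than zero.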

Available scorers

Cosine scorer

The Cosine scorer takes two channels and calculates the cosine similarity between them. The value ranges from 0.0 (the vectors don't share any common values) to 1.0 (one vector is a positive multiple of the other).

<scorer class="scorer:Cosine">
  <input channel="weightedContext" />
  <adjustedWeight>1.3</adjustedWeight>
  <rawWeight>0.0</rawWeight>
</scorer>
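The cosine similarity described above can be sketched in Python. This is only an illustration of the standard formula, not the scorer's actual code: the `cosine_similarity` name, the use of key-to-weight maps for channel values, and the handling of empty vectors are all assumptions.

```python
import math
from typing import Mapping


def cosine_similarity(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as key -> weight maps."""
    # Only keys present in both vectors contribute to the dot product.
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an empty vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity({"a": 1.0, "b": 2.0}, {"a": 2.0, "b": 4.0})
no_overlap = cosine_similarity({"a": 1.0}, {"b": 1.0})
```

With non-negative weights, `same_direction` is 1.0 up to floating-point rounding and `no_overlap` is 0.0, matching the two ends of the range.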

Configuration is:

Text scorer

The Text scorer is designed to measure the similarity between the texts of two entities.

<scorer class="scorer:Text">
  <weight>1.0</weight>
  <uniquenessScored>true</uniquenessScored>
  <minOverlap>1.0</minOverlap>
  <simplifiedText>true</simplifiedText>
  <input name="text" channel="text" />
  <input name="fame" channel="fame" />
</scorer>

Configuration is:

Constant scorer

<scorer class="scorer:Constant">
  <weight>-0.5</weight>
</scorer>

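The Constant scorer adds a constant offset to the final score. Since the other scorers provide positive scores, and a final score greater than 0 implies that two entities are equal, it typically encodes a similarity threshold. For example, with the weight of -0.5 shown above, the other scorers must contribute more than 0.5 in total for two entities to be considered equal.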
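The threshold reading of the constant offset can be checked with a short Python sketch. The `passes_threshold` helper is hypothetical; it only restates the rule that the final score must be greater than zero.

```python
def passes_threshold(other_scores_sum: float, constant_weight: float = -0.5) -> bool:
    """Return True when the other scorers outweigh the constant offset."""
    # final score = sum of the other scorers + constant offset
    return other_scores_sum + constant_weight > 0.0


print(passes_threshold(0.6))  # other scorers exceed the 0.5 threshold
print(passes_threshold(0.4))  # other scorers fall short of it
```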

Blockade scorer

The Blockade scorer prevents a positive similarity score if two values are not equal. It has no effect if they are equal or if either value is missing.

<scorer class="scorer:Blockade">
  <input channel="gender" />
</scorer>
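One way to picture the Blockade behaviour is as a contribution that is either neutral or overwhelmingly negative. The Python sketch below assumes this; the `blockade_score` function and the use of negative infinity are illustrative choices, not the scorer's documented implementation.

```python
import math
from typing import Optional


def blockade_score(left: Optional[str], right: Optional[str]) -> float:
    """Contribution of a blockade on one channel, e.g. gender."""
    if left is None or right is None:
        return 0.0  # a missing value has no effect
    if left == right:
        return 0.0  # equal values have no effect
    # Assumption: differing values are modelled as negative infinity,
    # so no finite sum of other scorers can end up above zero.
    return -math.inf


# A strong positive score from other scorers is still blocked.
blocked_total = 1.3 + blockade_score("f", "m")
```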

Configuration is:

Identifier scorer

The Identifier scorer forces the entities being matched to be equal if a specified channel has the same value in both, and prevents them from being equal if the values differ. If either channel value is missing, it has no effect.

<scorer class="scorer:Identifier">
  <input channel="location-id" />
</scorer>
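The three outcomes of the Identifier scorer can be sketched in Python as an override on the usual score-based decision. Both `identifier_decision` and `decide` are hypothetical helpers, and treating the identifier as an override rather than as a numeric score is an assumption made for clarity.

```python
from typing import Optional


def identifier_decision(left: Optional[str], right: Optional[str]) -> Optional[bool]:
    """True forces a match, False forbids it, None means no effect."""
    if left is None or right is None:
        return None  # a missing value leaves the decision to other scorers
    return left == right


def decide(identifier: Optional[bool], final_score: float) -> bool:
    """Apply the identifier override, else fall back to the score rule."""
    if identifier is not None:
        return identifier
    return final_score > 0.0


# Same location-id: equal even though the other scorers disagree.
forced_match = decide(identifier_decision("loc-1", "loc-1"), -2.0)
# Different location-id: not equal even with a high score.
forced_mismatch = decide(identifier_decision("loc-1", "loc-2"), 3.0)
```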

Configuration is:

 
