class pclTextBox::Distance

sys::Obj
  pclTextBox::Distance

A collection of algorithms for measuring the distance between two strings.

These distance measures are based on the number of changes needed to make the sequence of characters in one string match that in the other. Different algorithms use different operations.

hamming

static Int hamming(Str word1, Str word2)

Returns the number of positions at which the two words have different characters. The two words must be the same size.

An Err is raised if the two words are not of the same size.
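As an illustration of the position-by-position count described above, here is a minimal Python sketch. It is not the library's implementation: the function name mirrors the method only for readability, and Python's `ValueError` stands in for the `Err` mentioned above.

```python
def hamming(word1: str, word2: str) -> int:
    """Count the positions at which two equal-length words differ."""
    if len(word1) != len(word2):
        # Stand-in for the Err raised by the documented method.
        raise ValueError("words must be the same size")
    # Compare the words position by position and count the mismatches.
    return sum(c1 != c2 for c1, c2 in zip(word1, word2))
```

Under this sketch, `hamming("karolin", "kathrin")` gives 3, because the words differ in their third, fourth and fifth characters.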

jaccardSimilarity

static Float jaccardSimilarity(Str word1, Str word2)

Returns the Jaccard similarity measure based on 2-grams of letters in the two words.

jaccardSimilarityN

static Float jaccardSimilarityN(Str word1, Str word2, Int n)

Returns the Jaccard similarity measure based on n-grams of letters in the two words.
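The following Python sketch shows one common way to compute a Jaccard similarity over character n-grams: the number of shared n-grams divided by the number of distinct n-grams in either word. It rests on assumptions not confirmed by this page: n-grams are treated as a set (duplicates ignored), and two words that yield no n-grams at all score 1.0. The helper name `char_ngrams` is hypothetical.

```python
def char_ngrams(word: str, n: int) -> set:
    """Return the set of contiguous character n-grams in a word."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard_similarity_n(word1: str, word2: str, n: int = 2) -> float:
    """Jaccard similarity: |A intersect B| / |A union B| over n-gram sets."""
    a, b = char_ngrams(word1, n), char_ngrams(word2, n)
    union = a | b
    if not union:
        # Assumption: words too short to produce any n-gram count as identical.
        return 1.0
    return len(a & b) / len(union)
```

With the default `n = 2`, "night" and "nacht" share one 2-gram ("ht") out of seven distinct ones, so this sketch returns 1/7.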

levenshtein

static Int levenshtein(Str word1, Str word2)

Returns the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
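The edit count described above is usually computed with dynamic programming. The Python sketch below uses the standard row-by-row formulation; it is an illustration of the technique, not necessarily how this library implements it.

```python
def levenshtein(word1: str, word2: str) -> int:
    """Minimum number of insertions, deletions or substitutions."""
    # prev[j] holds the distance between the processed prefix of word1
    # and the first j characters of word2.
    prev = list(range(len(word2) + 1))
    for i, c1 in enumerate(word1, start=1):
        curr = [i]  # turning i characters into an empty prefix takes i deletions
        for j, c2 in enumerate(word2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]
```

For example, `levenshtein("kitten", "sitting")` gives 3 under this sketch: two substitutions and one insertion.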

optimalStringAlignment

static Int optimalStringAlignment(Str word1, Str word2)

Returns the minimum number of single-character edits (insertions, deletions, substitutions or transpositions of two adjacent characters) required to change one word into the other.
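A sketch of the usual optimal string alignment recurrence in Python follows. It extends the Levenshtein table with one extra case for swapping two adjacent characters. This is the textbook formulation, offered as an illustration rather than as the library's actual code.

```python
def optimal_string_alignment(word1: str, word2: str) -> int:
    """Edit distance allowing adjacent transpositions."""
    rows, cols = len(word1) + 1, len(word2) + 1
    # d[i][j] is the distance between word1[:i] and word2[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if word1[i - 1] == word2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Transposition of two adjacent characters.
            if (i > 1 and j > 1
                    and word1[i - 1] == word2[j - 2]
                    and word1[i - 2] == word2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

Under this sketch, "ab" and "ba" are at distance 1, whereas plain Levenshtein distance gives 2. In this formulation a swapped pair is not edited again afterwards, so "ca" and "abc" are at distance 3.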

sorensonDiceSimilarity

static Float sorensonDiceSimilarity(Str word1, Str word2)

Returns the Sørensen-Dice index based on 2-grams of letters in the two words.

sorensonDiceSimilarityN

static Float sorensonDiceSimilarityN(Str word1, Str word2, Int n)

Returns the Sørensen-Dice index based on n-grams of letters in the two words.
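For comparison with the Jaccard measure, here is a minimal Python sketch of a Sørensen-Dice index over character n-grams: twice the number of shared n-grams divided by the combined number of n-grams in both words. As assumptions not confirmed by this page, n-grams are treated as a set, and two words that yield no n-grams score 1.0. The helper name `char_ngrams` is hypothetical.

```python
def char_ngrams(word: str, n: int) -> set:
    """Return the set of contiguous character n-grams in a word."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def sorensen_dice_similarity_n(word1: str, word2: str, n: int = 2) -> float:
    """Sorensen-Dice index: 2 * |A intersect B| / (|A| + |B|)."""
    a, b = char_ngrams(word1, n), char_ngrams(word2, n)
    total = len(a) + len(b)
    if total == 0:
        # Assumption: words too short to produce any n-gram count as identical.
        return 1.0
    return 2 * len(a & b) / total
```

With `n = 2`, "night" and "nacht" each have four 2-grams and share one, so this sketch returns 2 × 1 / 8 = 0.25.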