A Simple Algorithm for Fuzzy Text Matching
by lastlegume
Exactly as the title states: a simple, javascript based code segment for matching two strings while allowing for some inaccuracy. Because of the method of matching, this segment is O(N^3), so it should not be used if time is an issue. The code is shown below and an interactive example are below:
Code (Javascript)
function fuzzy(guess, answer) {
// normalizes the format of the strings (optional)
guess = guess.toLowerCase().trim();
answer = answer.toLowerCase().trim();
// checks if the strings are equals beforehand to skip the iteration if unnecessary
if (guess === answer)
return true;
// variable that holds the number of correct characters
let score = 0;
// number of characters back or forwards a substring is allowed to be before being counted as nonexistent.
var leniencyFactor = 0.5;
var leniency = Math.ceil(answer.length ** leniencyFactor);
// factor that controls the amount that can be wrong
var fuzzyFactor = -0.45;
var fuzziness = 1 - answer.length ** fuzzyFactor;
// maximum possible score (subtracts the length of answer because one character substrings are not being checked)
var neededFuzzyAmount = ((answer.length) * (answer.length + 1) * (answer.length + 2)) / 6 - answer.length;
//checks substrings of lengths started at the length of the guess to a length of 2
for (let i = guess.length; i >= 2; i--) {
//moves through the guess string to find all substrings of length i
for (let s = 0; s <= (guess.length - i); s++) {
//checks the substrings of guess with equal length substrings in answer from the same index - leniency to the same index + leniency
for (let a = Math.max(0, s - leniency); a <= Math.min(s + leniency, answer.length - i); a++) {
// if the substrings are equal, adds the number of correct characters (the length of the substring) to the score
if (guess.substring(s, s + i) === answer.substring(a, a + i)) {
score += i;
}
}
// if a high enough score is reached, it skips the rest of the program
if (score > neededFuzzyAmount ** fuzziness)
return true;
}
}
return score > neededFuzzyAmount ** fuzziness;
}
Try It Yourself
Guess: Answer:
Use default exponential models (uncheck if you want to manually set leniency and fuzziness)
Fuzzy Factor:
Factor controlling the amount that can be wrong (almost anything matches when at 0, gets stricter as it decreases)Value:
Leniency Factor:
Factor that affects the leniency (most lenient at 1, least lenient at 0)Value: