gov.nih.nlm.nls.gspell
Class LevenshteinDistance
java.lang.Object
gov.nih.nlm.nls.gspell.LevenshteinDistance
- public final class LevenshteinDistance
- extends java.lang.Object
LevenshteinDistance
This class holds the methods to compute a modified Levenshtein distance.
Method Summary
int LD(char[] s, char[] t, char[] lowercaseS, char[] lowercaseT, int pThreshold)
Method LD.
int LD(java.lang.String s, java.lang.String t, int pThreshold)
Method LD.
static void main(java.lang.String[] argv)
This is a test main, whose purpose is to test the functionality of each method developed for this class.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LevenshteinDistance
public LevenshteinDistance()
- This is a constructor for LevenshteinDistance.
LD
public int LD(java.lang.String s,
java.lang.String t,
int pThreshold)
- Method LD. This method returns a modified Levenshtein distance for
any two given strings.
If there is any case difference between the strings, an extra .5 is
added to the distance. This penalizes case differences slightly, rather
than severely, as would happen if each character case difference were
counted as a full transformation.
- Parameters:
s - t - pThreshold -
- Returns:
- int
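The case penalty described above can be illustrated with a minimal sketch. This is not the gspell implementation: the class name `CaseAwareDistanceSketch` and its methods are hypothetical, the threshold parameter is left out, and the sketch returns a double for readability rather than the int of the documented signature. It also assumes that "any case difference" means the case-sensitive distance exceeds the case-insensitive one, which the documentation does not confirm.

```java
import java.util.Locale;

// Hypothetical illustration only; not the gspell LevenshteinDistance class.
final class CaseAwareDistanceSketch {

    // Standard Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows so prev holds the latest row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Case-insensitive distance plus a single 0.5 penalty when case matters.
    // Assumption: case "matters" when the case-sensitive distance is larger.
    static double modified(String s, String t) {
        int folded = levenshtein(s.toLowerCase(Locale.ROOT),
                                 t.toLowerCase(Locale.ROOT));
        int raw = levenshtein(s, t);
        return folded + (raw > folded ? 0.5 : 0.0);
    }
}
```

Under these assumptions, `CaseAwareDistanceSketch.modified("Hello", "hello")` evaluates to 0.5 rather than 1.0, so a pure case mismatch ranks closer than a real one-character edit.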
LD
public int LD(char[] s,
char[] t,
char[] lowercaseS,
char[] lowercaseT,
int pThreshold)
- Method LD. This method returns a modified Levenshtein distance for
any two given strings.
If there is any case difference between the strings, an extra .5 is
added to the distance. This penalizes case differences slightly, rather
than severely, as would happen if each character case difference were
counted as a full transformation.
The char arrays, including the lowercase char arrays, are passed in
to speed things up.
In a typical use, this method gets called over and over again, with
one of the strings held constant and the other changing. Passing in
the char arrays lets the caller convert the constant string to a char
array only once, and lowercase it only once. Array indexing also
replaces method invocation, which should speed things up a bit as well.
We are concerned with speed in this method because it is the
most expensive method of the algorithm. Speed this method up,
and the application is sped up.
To further the efficiency, this method returns the distance as an int,
multiplied by 1000.
Thus, an edit distance of 2 is returned as 2000.
- Parameters:
s - t - lowercaseS - lowercaseT - pThreshold - (ala agrep)
- Returns:
- int of the distance multiplied by 1000.
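The calling pattern and the scaled return value described above can be sketched as follows. This is a hedged illustration, not the gspell code: `ScaledDistanceSketch`, `scaled`, and `closest` are hypothetical names, the pThreshold parameter is omitted because its exact semantics are not documented here, and the case penalty rule (500 added once when the case-sensitive distance exceeds the case-insensitive one) is an assumption.

```java
import java.util.Locale;

// Hypothetical illustration only; not the gspell LevenshteinDistance class.
final class ScaledDistanceSketch {
    static final int SCALE = 1000;       // distance 2 is reported as 2000
    static final int CASE_PENALTY = 500; // the extra .5, scaled

    // Plain Levenshtein distance over char arrays (array indexing only).
    static int plain(char[] a, char[] b) {
        int[] prev = new int[b.length + 1];
        int[] curr = new int[b.length + 1];
        for (int j = 0; j <= b.length; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length; i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length; j++) {
                int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows so prev holds the latest row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length];
    }

    // Scaled int result, so the half-step penalty needs no floating point.
    static int scaled(char[] s, char[] t, char[] lowerS, char[] lowerT) {
        int folded = plain(lowerS, lowerT);
        int raw = plain(s, t); // assumption: used only to detect case differences
        return folded * SCALE + (raw > folded ? CASE_PENALTY : 0);
    }

    // The query is converted and lowercased once, then reused per candidate.
    static String closest(String query, String[] candidates) {
        char[] q = query.toCharArray();
        char[] lowerQ = query.toLowerCase(Locale.ROOT).toCharArray();
        String best = null;
        int bestScore = Integer.MAX_VALUE;
        for (String c : candidates) {
            int score = scaled(q, c.toCharArray(), lowerQ,
                               c.toLowerCase(Locale.ROOT).toCharArray());
            if (score < bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```

Keeping the result as a scaled int lets callers compare and sort candidates with integer arithmetic while still distinguishing a case-only mismatch (500) from a full edit (1000).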
main
public static final void main(java.lang.String[] argv)
- This is a test main, whose purpose is to test the functionality
of each method developed for this class.
This main strives to test the boundary conditions as well as
some sample common ways each method is intended to be
used.
- Returns:
- int 0|-1 0 is returned if no problems, -1 is
returned if there is a problem.
The use and distribution of this material is subject to the terms and conditions included in the file SPECIALIST_NLP_TOOLS_TERMS_AND_CONDITIONS.TXT, located in the root directory of the distribution.