gov.nih.nlm.nls.gspell
Class LevenshteinDistance

java.lang.Object
  extended by gov.nih.nlm.nls.gspell.LevenshteinDistance

public final class LevenshteinDistance
extends java.lang.Object

LevenshteinDistance. This class holds the methods that compute a modified Levenshtein distance.


Constructor Summary
LevenshteinDistance()
          This is a constructor for LevenshteinDistance.
 
Method Summary
 int LD(char[] s, char[] t, char[] lowercaseS, char[] lowercaseT, int pThreshold)
          Returns a modified Levenshtein distance, multiplied by 1000, for two strings passed as char arrays.
 int LD(java.lang.String s, java.lang.String t, int pThreshold)
          Returns a modified Levenshtein distance for two strings.
static void main(java.lang.String[] argv)
          This is a test main, whose purpose is to test the functionality of each method developed for this class.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LevenshteinDistance

public LevenshteinDistance()
This is a constructor for LevenshteinDistance.

Method Detail

LD

public int LD(java.lang.String s,
              java.lang.String t,
              int pThreshold)
Returns a modified Levenshtein distance for the two given strings. If the strings differ in case anywhere, an extra .5 is added to the distance. This penalizes case differences slightly, rather than severely, as would happen if every character-case difference were counted as a full transformation.

Parameters:
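s - the first string
t - the second string
pThreshold - (ala agrep)
Returns:
int, the modified distance (the signature declares int; the char[] overload returns the distance multiplied by 1000)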
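
The flat case penalty described above can be illustrated with a minimal sketch. This is not the gspell implementation: the class name CasePenaltySketch, the method modifiedDistance, and the rule used to detect a case difference (the case-sensitive distance exceeds the case-insensitive one) are all assumptions made for illustration, and pThreshold is left out because its semantics are not documented here.

```java
import java.util.Locale;

// Hypothetical sketch, not the gspell code: shows a flat .5 case penalty.
final class CasePenaltySketch {

    // Classic Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumption: case "differs" when ignoring case lowers the distance.
    // The penalty is added once, however many characters differ in case.
    static double modifiedDistance(String s, String t) {
        int folded = levenshtein(s.toLowerCase(Locale.ROOT), t.toLowerCase(Locale.ROOT));
        int exact = levenshtein(s, t);
        return exact > folded ? folded + 0.5 : folded;
    }

    public static void main(String[] args) {
        System.out.println(modifiedDistance("Aspirin", "aspirin")); // 0.5
        System.out.println(modifiedDistance("ASPIRIN", "aspirin")); // 0.5
        System.out.println(modifiedDistance("Aspirin", "asprin"));  // 1.5
    }
}
```

Under these assumptions, "ASPIRIN" and "aspirin" score .5 rather than 7, which is the slight penalty the description refers to.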

LD

public int LD(char[] s,
              char[] t,
              char[] lowercaseS,
              char[] lowercaseT,
              int pThreshold)
Returns a modified Levenshtein distance for the two given strings. If the strings differ in case anywhere, an extra .5 is added to the distance. This penalizes case differences slightly, rather than severely, as would happen if every character-case difference were counted as a full transformation.

The char arrays, and their lowercase counterparts, are passed in for speed. In a typical run this method is called over and over with one string held constant and the other changing. Passing char arrays lets the caller convert the constant string to a char array once and lowercase it once. Array indexing also replaces method invocation, which should speed things up a bit further. Speed matters here because this is the most expensive method of the algorithm: speed this method up, and the application speeds up.

For further efficiency, the distance is returned as an int equal to the distance multiplied by 1000. Thus an edit distance of 2 is returned as 2000.

Parameters:
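s - the first string, as a char array
t - the second string, as a char array
lowercaseS - the lowercased form of s
lowercaseT - the lowercased form of t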
pThreshold - (ala agrep)
Returns:
int of the distance multiplied by 1000.
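
The calling pattern this overload is designed for might look like the sketch below: the constant string is converted and lowercased once, outside the loop, and the result is an int scaled by 1000. The class ScaledDistanceSketch and its scaledDistance method are hypothetical stand-ins, not the gspell code. The stand-in omits pThreshold, whose semantics are not documented here, and assumes a case difference is detected by comparing case-sensitive and case-insensitive distances.

```java
import java.util.Locale;

// Hypothetical stand-in for the char[] overload; not the gspell code.
final class ScaledDistanceSketch {
    static final int SCALE = 1000;       // an edit distance of 2 becomes 2000
    static final int CASE_PENALTY = 500; // the extra .5, scaled by 1000

    // Plain Levenshtein distance over char arrays, using array indexing only.
    static int editDistance(char[] a, char[] b) {
        int[] prev = new int[b.length + 1];
        int[] curr = new int[b.length + 1];
        for (int j = 0; j <= b.length; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length; i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length; j++) {
                int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length];
    }

    // Assumption: the penalty applies once when ignoring case lowers the distance.
    static int scaledDistance(char[] s, char[] t, char[] lowerS, char[] lowerT) {
        int folded = editDistance(lowerS, lowerT);
        int exact = editDistance(s, t);
        return folded * SCALE + (exact > folded ? CASE_PENALTY : 0);
    }

    public static void main(String[] args) {
        String query = "Aspirin";
        // Convert and lowercase the constant string once, outside the loop.
        char[] q = query.toCharArray();
        char[] lowerQ = query.toLowerCase(Locale.ROOT).toCharArray();

        String[] candidates = {"Aspirin", "aspirin", "asprin"};
        for (String c : candidates) {
            char[] ca = c.toCharArray();
            char[] lc = c.toLowerCase(Locale.ROOT).toCharArray();
            System.out.println(c + " -> " + scaledDistance(q, ca, lowerQ, lc));
        }
        // Prints: Aspirin -> 0, aspirin -> 500, asprin -> 1500
    }
}
```

Keeping the result as a scaled int avoids floating-point arithmetic in the inner loop while still representing the half-step case penalty.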

main

public static final void main(java.lang.String[] argv)
This is a test main, whose purpose is to test the functionality of each method developed for this class. This main strives to test the boundary conditions as well as some sample common ways each method is intended to be used.

Returns:
int 0|-1: 0 is returned if there are no problems, -1 is returned if there is a problem.


The use and distribution of this material is subject to the terms and conditions included in the file SPECIALIST_NLP_TOOLS_TERMS_AND_CONDITIONS.TXT, located in the root directory of the distribution.