gov.nih.nlm.nls.gspell
Class Candidate

java.lang.Object
  extended by gov.nih.nlm.nls.gspell.Candidate

public class Candidate
extends java.lang.Object

Candidate is an association between a document and its rank relative to some other document.

Initial version: divita, Mon Jul 23 11:24:40 EDT 2001.


Constructor Summary
Candidate()
          This is a constructor for Candidate
Candidate(java.lang.String pDocument, int pEditDistance, java.lang.String pMethod)
          This is a constructor for Candidate
 
Method Summary
 long getCorpusFrequency()
          Method getCorpusFrequency returns the number of times this term has appeared in a corpus (such as MedLINE).
 double getEditDistance()
          Method getEditDistance returns the edit distance between the query and the candidate.
 int getEditDistance1000()
          Method getEditDistance1000 returns the edit distance between the query and the candidate, multiplied by 1000.
 java.lang.String getFromMethod()
          Method getFromMethod
 java.lang.String getMessage()
          Method getMessage returns either nothing, "Correct" or "Case difference"
 java.lang.String getName()
          Method getName
 double getRank()
          Method getRank returns the normalized ranking, a value between 0 and 1, where 1 is an exact match and 0 means there is no match at all.
 int getSortValue()
          Method getSortValue returns the value that candidates should be sorted against.
 boolean isCorrect()
          Method isCorrect returns true if the candidate is the same as the query or if there is a case difference.
 void set(java.lang.String pDocument, int pEditDistance, java.lang.String pMethod)
          set sets the value of this instance
 void setCorpusFrequency(long pFrequency)
          Method setCorpusFrequency sets the number of times this term has appeared in a corpus (such as MedLINE).
 void setMaxFreq(long pMaxFreq)
          Method setMaxFreq sets the frequency of the most frequent word in the corpus.
 void setMethod(java.lang.String pMethod)
          Method setMethod
 java.lang.String toString()
          Method toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Candidate

public Candidate(java.lang.String pDocument,
                 int pEditDistance,
                 java.lang.String pMethod)
This is a constructor for Candidate

Parameters:
pDocument -
pEditDistance -
pMethod -

Candidate

public Candidate()
This is a constructor for Candidate

Method Detail

set

public void set(java.lang.String pDocument,
                int pEditDistance,
                java.lang.String pMethod)
set sets the value of this instance

Parameters:
pDocument -
pEditDistance -
pMethod -

getEditDistance

public double getEditDistance()
Method getEditDistance returns the edit distance between the query and the candidate.

Returns:
double

getEditDistance1000

public int getEditDistance1000()
Method getEditDistance1000 returns the edit distance between the query and the candidate, multiplied by 1000. This makes sorting faster by allowing integer math rather than floating-point math.

Returns:
int
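
As a rough illustration of the scaling described above, the sketch below converts a fractional edit distance into an integer. The class name EditDistanceScaling, the helper toEditDistance1000, and the use of Math.round are assumptions made for this example; the documentation does not say how gspell rounds or truncates the value.

```java
public class EditDistanceScaling {

    // Hypothetical helper (not part of gspell): scales a fractional edit
    // distance by 1000 so that later comparisons can use integer math.
    // Rounding to the nearest integer is an assumption.
    public static int toEditDistance1000(double editDistance) {
        return (int) Math.round(editDistance * 1000.0);
    }

    public static void main(String[] args) {
        System.out.println(toEditDistance1000(0.5)); // prints 500
        System.out.println(toEditDistance1000(2.0)); // prints 2000
    }
}
```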

getSortValue

public int getSortValue()
Method getSortValue returns the value that candidates should be sorted against. It starts from the edit distance between the query and the candidate, multiplied by 1,000. This value is then multiplied by a constant that depends on the method type (.5 for homonyms and common misspellings, 1 for everything else). The result is added to a huge number minus the frequency with which this candidate is found in a corpus. If corpus frequencies are not used, the frequency is 0. The huge number is meant to be larger than the number of times the most frequent word of the corpus is found. For instance, in the '99 Medline corpus, "the" is the most frequent term and appears 3,895,319 times, so the huge number should be 4,000,000. In practice, however, the huge number is currently set to 1,000,000. It can be set via the --maxCorpusFreq setting in the config file.

Returns:
int
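
The arithmetic described for getSortValue can be sketched as follows. This is a reading of the prose above, not the gspell source: the class SortValueSketch, the boolean flag standing in for the method type, and the cast used to produce the int are all assumptions.

```java
public class SortValueSketch {

    // Default "huge number" quoted in the documentation.
    public static final long DEFAULT_MAX_FREQ = 1_000_000L;

    // Hypothetical stand-in for the method-type check: 0.5 for homonyms and
    // common misspellings, 1.0 for everything else.
    public static double methodWeight(boolean homonymOrCommonMisspelling) {
        return homonymOrCommonMisspelling ? 0.5 : 1.0;
    }

    // Weighted edit distance (already multiplied by 1000) plus the huge
    // number minus the corpus frequency. Truncating to int is an assumption.
    public static int sortValue(int editDistance1000,
                                boolean homonymOrCommonMisspelling,
                                long corpusFrequency,
                                long maxFreq) {
        double weighted = editDistance1000 * methodWeight(homonymOrCommonMisspelling);
        return (int) (weighted + (maxFreq - corpusFrequency));
    }

    public static void main(String[] args) {
        // No corpus frequencies in use, ordinary method type.
        System.out.println(sortValue(1000, false, 0L, DEFAULT_MAX_FREQ));       // prints 1001000
        // Same edit distance, but from a homonym or common-misspelling method.
        System.out.println(sortValue(1000, true, 0L, DEFAULT_MAX_FREQ));        // prints 1000500
        // A candidate seen 250,000 times in the corpus.
        System.out.println(sortValue(1000, false, 250_000L, DEFAULT_MAX_FREQ)); // prints 751000
    }
}
```

Under this reading, a smaller value corresponds to a candidate that is closer to the query, found more often in the corpus, or both.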

getName

public java.lang.String getName()
Method getName

Returns:
String

getFromMethod

public java.lang.String getFromMethod()
Method getFromMethod

Returns:
String

getMessage

public java.lang.String getMessage()
Method getMessage returns either nothing, "Correct", or "Case difference".

Returns:
String

isCorrect

public boolean isCorrect()
Method isCorrect returns true if the candidate is the same as the query or if there is a case difference.

Returns:
boolean
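
The relationship between getMessage and isCorrect described above might look like the sketch below. The class MessageSketch and its static methods are hypothetical, and representing "nothing" as an empty string is an assumption; the real method may return null or something else.

```java
public class MessageSketch {

    // Hypothetical helper: classifies a candidate against the query using
    // the two message strings named in the documentation.
    public static String message(String query, String candidate) {
        if (candidate.equals(query)) {
            return "Correct";
        }
        if (candidate.equalsIgnoreCase(query)) {
            return "Case difference";
        }
        return ""; // "nothing": the empty string is an assumption
    }

    // True when the candidate matches the query exactly or differs only by case.
    public static boolean isCorrect(String query, String candidate) {
        return candidate.equalsIgnoreCase(query);
    }

    public static void main(String[] args) {
        System.out.println(message("heart", "heart"));   // prints Correct
        System.out.println(message("Heart", "heart"));   // prints Case difference
        System.out.println(isCorrect("heart", "haert")); // prints false
    }
}
```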

setMethod

public void setMethod(java.lang.String pMethod)
Method setMethod


setCorpusFrequency

public void setCorpusFrequency(long pFrequency)
Method setCorpusFrequency sets the number of times this term has appeared in a corpus (such as MedLINE).


getCorpusFrequency

public long getCorpusFrequency()
Method getCorpusFrequency returns the number of times this term has appeared in a corpus (such as MedLINE).

Returns:
long

toString

public java.lang.String toString()
Method toString

Returns:
String

getRank

public double getRank()
Method getRank returns the normalized ranking, a value between 0 and 1, where 1 is an exact match and 0 means there is no match at all. The distribution for this normalized score is intended to make edit distances of .5 rank above .90.

[Figure: distribution of the normalized rank; not reproduced here.]

This distribution is likely to change as we become more sophisticated at calculating the differences between two strings.

Returns:
double

setMaxFreq

public void setMaxFreq(long pMaxFreq)
Method setMaxFreq sets the frequency of the most frequent word in the corpus. This number should be larger than the number of times the most frequent word of the corpus is found. For instance, in the '99 Medline corpus, "the" is the most frequent term and appears 3,895,319 times. The default value for this number is 1,000,000.

Parameters:
pMaxFreq -


The use and distribution of this material is subject to the terms and conditions included in the file SPECIALIST_NLP_TOOLS_TERMS_AND_CONDITIONS.TXT, located in the root directory of the distribution.