gov.nih.nlm.nls.nlp.textfeatures
Class Collection

java.lang.Object
  extended by gov.nih.nlm.nls.nlp.textfeatures.MmObject
      extended by gov.nih.nlm.nls.nlp.textfeatures.Collection
All Implemented Interfaces:
java.io.Serializable

public final class Collection
extends MmObject

Collection is a collection of documents. Initial version: Tue Mar 06 12:53:20 EST 2001, divita.

Version:
$Id: Collection.java,v 1.9 2005/12/20 20:13:00 divita Exp $
See Also:
Serialized Form

Field Summary
 
Fields inherited from class gov.nih.nlm.nls.nlp.textfeatures.MmObject
serialVersionUID
 
Constructor Summary
Collection()
          This is a constructor for Collection.
Collection(gov.nih.nlm.nls.utils.GlobalBehavior pSettings)
          This is a constructor for Collection.
Collection(java.lang.String pFileName)
          Deprecated.  
Collection(java.lang.StringBuffer pCollectionText)
          This is a constructor for Collection.
Collection(java.lang.String pFileName, gov.nih.nlm.nls.utils.GlobalBehavior pSettings)
          This is a constructor for Collection.
 
Method Summary
 void addDocument(Document pDocument)
          Method addDocument
 void countWordFrequencies(Document pDocument)
          Method countWordFrequencies aggregates document frequencies for words.
 java.lang.String displayContent(gov.nih.nlm.nls.utils.GlobalBehavior pSettings)
          Method displayContent displays the relevant content for this object.
 void displayContentToOut(gov.nih.nlm.nls.utils.GlobalBehavior pSettings)
          Method displayContentToOut displays the relevant content for this object.
 java.io.BufferedReader getBufferedReader()
          Method getBufferedReader returns the buffered reader. The reader is not closed automatically when it is no longer used.
 java.io.File[] getCollectionFiles()
          Method getCollectionFiles
 void getCollectionFrequencyList()
          Method getCollectionFrequencyList prints out to the settings stream the corpus frequencies for each word in the corpus.
 java.util.Vector getDocuments()
          Method getDocuments retrieves the set of Documents associated with this collection
 gov.nih.nlm.nls.utils.GlobalBehavior getSettings()
          Method getSettings
 void init(java.lang.String pCollection)
          Method init
 boolean isInteractive()
          Method isInteractive returns true if the input is coming from standard input.
static void main(java.lang.String[] argv)
          This is a test main, whose purpose is to test the functionality of each method developed for this class.
 java.lang.String peek()
          Method peek returns the first X characters of the collection.
(package private)  java.lang.String readCollectionContent(java.lang.String pFileName)
          Method readCollectionContent reads in the contents of a collection from a file or from standard input.
 java.io.File[] readDirectoryContent(java.io.File pFileName)
          Method readDirectoryContent returns a File[] of (abstract) files
 void setBufferedReader(java.lang.String pFileName)
          Method setBufferedReader opens the fileStream of the file for this collection.
 void setDocuments(java.util.Vector pDocuments)
          Method setDocuments
 void setGlobalBehaviors(gov.nih.nlm.nls.utils.GlobalBehavior pSettings)
          Method setGlobalBehaviors
 
Methods inherited from class gov.nih.nlm.nls.nlp.textfeatures.MmObject
appendOriginalString, getCharOffset, getId, getLabel, getOriginalString, getSpan, getStrippedString, getTrimmedString, setId, setLabel, setOriginalString, setSpan, setStrippedString, setTrimmedString, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Collection

public Collection()
           throws java.lang.Exception
This is a constructor for Collection. This constructor assumes that the document is read in from the standard input.

Throws:
java.lang.Exception

Collection

public Collection(java.lang.StringBuffer pCollectionText)
This is a constructor for Collection. This constructor takes as a parameter a string buffer to be used as the original Collection. It is useful for interactive processing, where the entire content coming from standard input is treated as one document and one collection, and for term processing, where the collection to be analyzed does not come from a file but from a command line, a query, or text typed into a web page.

Parameters:
pCollectionText - The text to be used as the original collection.

Collection

public Collection(java.lang.String pFileName)
           throws java.lang.Exception
Deprecated. 

This is a constructor for Collection. This constructor takes as a parameter a fileName to be read and analyzed.

Parameters:
pFileName - The full path to a file to be read in.
Throws:
java.lang.Exception

Collection

public Collection(java.lang.String pFileName,
                  gov.nih.nlm.nls.utils.GlobalBehavior pSettings)
           throws java.lang.Exception
This is a constructor for Collection. This constructor takes as a parameter a fileName to be read and analyzed.

Parameters:
pFileName - The full path to a file to be read in.
pSettings -
Throws:
java.lang.Exception

Collection

public Collection(gov.nih.nlm.nls.utils.GlobalBehavior pSettings)
           throws java.lang.Exception
This is a constructor for Collection. This constructor infers from the global behaviors which collection to read in and analyze, and then reads in that collection. If the input is coming from standard input, this constructor assumes that the tokenizer will read from standard input when it needs to.

Parameters:
pSettings -
Throws:
java.lang.Exception
Method Detail

init

public void init(java.lang.String pCollection)
Method init

Parameters:
pCollection -

setDocuments

public void setDocuments(java.util.Vector pDocuments)
Method setDocuments

Parameters:
pDocuments - a Vector of Document

addDocument

public void addDocument(Document pDocument)
Method addDocument

Parameters:
pDocument -

getDocuments

public java.util.Vector getDocuments()
Method getDocuments retrieves the set of Documents associated with this collection

Returns:
Vector of Document

readCollectionContent

final java.lang.String readCollectionContent(java.lang.String pFileName)
                                      throws java.lang.Exception
Method readCollectionContent reads in the contents of a collection from a file or from standard input.

Parameters:
pFileName - The full pathname of the file, or null (for standard input)
Returns:
String
Throws:
java.lang.Exception
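
The "file name or null for standard input" convention described above can be illustrated with a minimal sketch. This is not the library's implementation: the class ReadContentSketch, its method names, and the choice of UTF-8 are assumptions made for illustration, and only java.io and java.nio calls are used.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper; not part of gov.nih.nlm.nls.nlp.textfeatures.
final class ReadContentSketch {

    // Reads the whole file, or all of standard input when fileName is null.
    // Note: closing the reader in the null case also closes System.in.
    static String readContent(String fileName) {
        try (Reader in = open(fileName)) {
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Chooses the input source; UTF-8 is an assumption of this sketch.
    private static Reader open(String fileName) throws IOException {
        if (fileName == null) {
            return new InputStreamReader(System.in, StandardCharsets.UTF_8);
        }
        return Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8);
    }

    // Drains a reader into a String without closing it.
    static String readAll(Reader in) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(readAll(new StringReader("one short document")));
    }
}
```

Wrapping IOException in UncheckedIOException is a choice made for the sketch only; the documented method declares java.lang.Exception.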

displayContentToOut

public void displayContentToOut(gov.nih.nlm.nls.utils.GlobalBehavior pSettings)
Method displayContentToOut displays the relevant content for this object.

Parameters:
pSettings -

displayContent

public java.lang.String displayContent(gov.nih.nlm.nls.utils.GlobalBehavior pSettings)
Method displayContent displays the relevant content for this object. This method checks whether the --collections flag has been set. If it has, the method prints out the name of the collection.

Parameters:
pSettings -
Returns:
String

isInteractive

public boolean isInteractive()
Method isInteractive returns true if the input is coming from standard input.

Returns:
boolean

setBufferedReader

public void setBufferedReader(java.lang.String pFileName)
                       throws java.lang.Exception
Method setBufferedReader opens the file stream of the file for this collection. One can get the BufferedReader with the getBufferedReader() method.

Parameters:
pFileName - The file to open for this collection.
Throws:
java.lang.Exception

getBufferedReader

public java.io.BufferedReader getBufferedReader()
Method getBufferedReader returns the buffered reader. The reader is not closed automatically when it is no longer used. Please close it when you are done with it.

Returns:
BufferedReader
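
Because the caller is responsible for closing the reader, one common pattern is try-with-resources. The sketch below uses a StringReader as a stand-in for the reader a Collection would return; the class CloseReaderSketch and its countLines method are hypothetical and exist only to show the closing pattern.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper; not part of gov.nih.nlm.nls.nlp.textfeatures.
final class CloseReaderSketch {

    // Counts the lines available from the reader, then closes it.
    // try-with-resources closes the reader even if readLine() throws.
    static int countLines(BufferedReader reader) {
        try (BufferedReader r = reader) {
            int lines = 0;
            while (r.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a reader obtained from getBufferedReader().
        BufferedReader reader = new BufferedReader(new StringReader("first\nsecond\n"));
        System.out.println(countLines(reader)); // prints 2
    }
}
```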

peek

public java.lang.String peek()
Method peek returns the first X characters of the collection. This is useful for determining what kind of tokenizer to build and use.

Returns:
String
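
How the library implements peek is not stated here. One way to look at the start of a stream without consuming it is BufferedReader's mark() and reset(), sketched below. The class PeekSketch and the maxChars parameter are assumptions; the documented peek() takes no arguments.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper; not part of gov.nih.nlm.nls.nlp.textfeatures.
final class PeekSketch {

    // Returns up to maxChars characters from the current position
    // and rewinds, so later reads see the same characters again.
    static String peek(BufferedReader reader, int maxChars) {
        try {
            reader.mark(maxChars); // remember the current position
            char[] buffer = new char[maxChars];
            int total = 0;
            while (total < maxChars) {
                int n = reader.read(buffer, total, maxChars - total);
                if (n == -1) {
                    break; // input is shorter than maxChars
                }
                total += n;
            }
            reader.reset(); // rewind to the marked position
            return new String(buffer, 0, total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedReader reader = new BufferedReader(new StringReader("<xml>rest of the text"));
        System.out.println(peek(reader, 5)); // prints <xml>
    }
}
```

A caller could inspect the returned prefix, for example for a leading markup tag, before deciding which tokenizer to build.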

readDirectoryContent

public java.io.File[] readDirectoryContent(java.io.File pFileName)
Method readDirectoryContent returns a File[] of (abstract) files

Parameters:
pFileName -
Returns:
File[]
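
For readers unfamiliar with File[] listings, the following sketch shows one way to obtain the regular files of a directory with java.io.File. It is an illustration under assumptions, not the library's code: the class DirectoryListingSketch is hypothetical, and whether readDirectoryContent filters out subdirectories or sorts its result is not stated in this documentation.

```java
import java.io.File;
import java.util.Arrays;

// Hypothetical helper; not part of gov.nih.nlm.nls.nlp.textfeatures.
final class DirectoryListingSketch {

    // Returns the regular files directly inside dir, sorted by path.
    // Returns an empty array if dir does not exist or is not a directory.
    static File[] listPlainFiles(File dir) {
        File[] entries = dir.listFiles(File::isFile); // null when dir cannot be listed
        if (entries == null) {
            return new File[0];
        }
        Arrays.sort(entries);
        return entries;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File file : listPlainFiles(dir)) {
            System.out.println(file.getPath());
        }
    }
}
```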

getCollectionFiles

public java.io.File[] getCollectionFiles()
Method getCollectionFiles

Returns:
File[]

countWordFrequencies

public void countWordFrequencies(Document pDocument)
Method countWordFrequencies aggregates document frequencies for words.

Parameters:
pDocument -
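
To make "document frequency" concrete, here is a minimal sketch that counts, for each word, the number of documents in which it appears at least once. It rests on several assumptions: plain strings stand in for Document objects, tokenization is a simple split on non-word characters with lowercasing, and the class DocumentFrequencySketch is hypothetical. The documentation does not say how the library tokenizes or whether it counts once per document.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper; not part of gov.nih.nlm.nls.nlp.textfeatures.
final class DocumentFrequencySketch {

    // word -> number of documents that contain the word
    private final Map<String, Integer> documentFrequency = new TreeMap<>();

    // Adds one document's words to the running totals.
    // Each distinct word is counted once per document.
    void countWordFrequencies(String documentText) {
        Set<String> seenInThisDocument = new HashSet<>();
        for (String token : documentText.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty() && seenInThisDocument.add(token)) {
                documentFrequency.merge(token, 1, Integer::sum);
            }
        }
    }

    // Returns the number of documents seen so far that contain the word.
    int frequencyOf(String word) {
        return documentFrequency.getOrDefault(word, 0);
    }

    public static void main(String[] args) {
        DocumentFrequencySketch counts = new DocumentFrequencySketch();
        counts.countWordFrequencies("the cat sat");
        counts.countWordFrequencies("the dog");
        System.out.println(counts.frequencyOf("the")); // prints 2
    }
}
```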

getCollectionFrequencyList

public void getCollectionFrequencyList()
Method getCollectionFrequencyList prints out to the settings stream the corpus frequencies for each word in the corpus.


setGlobalBehaviors

public void setGlobalBehaviors(gov.nih.nlm.nls.utils.GlobalBehavior pSettings)
Method setGlobalBehaviors

Parameters:
pSettings -

getSettings

public gov.nih.nlm.nls.utils.GlobalBehavior getSettings()
Method getSettings

Returns:
GlobalBehavior

main

public static final void main(java.lang.String[] argv)
This is a test main, whose purpose is to test the functionality of each method developed for this class. This main strives to test the boundary conditions as well as some sample common ways each public method is intended to be used.

Parameters:
argv - The command line input, tokenized


The use and distribution of this material is subject to the terms and conditions included in the file SPECIALIST_NLP_TOOLS_TERMS_AND_CONDITIONS.TXT, located in the root directory of the distribution.