Derivations Procedures - nomD

This step includes:

  • Generate nomD (derivation from nominalization in Lexicon)
  • Automatically tag zeroD and suffixD to add to derivation table

I. Directory:

  • ${DERIVATION}/1.nomD

II. Input Files (./data/${YEAR}/dataOrg/):
shell> ${NOM_D}/bin/GetNomD ${YEAR}
0

  • These procedures are performed automatically in Step-0
    • shell> cd ${NOM_D}/data
    • shell> mkdir -p ${YEAR}/dataOrg

    • LRNOM:
      => link LRNOM to LRNOM.${YEAR} from new release (${LEXICON}/data/tables)
    • prepositions.data:
      Get the latest prepositions once LEXICON.release.${YEAR} is ready
      shell> cd ${LC}/Proc/bin
      shell> GetPrePosition ${LC_YEAR} ${LEXICON_YEAR}
      => copy prepositions.data.${YEAR} from the generated file ${LEX_CHECK}/data/Files/prepositions.data.${YEAR}
      => This file should include all prepositions from the previous year plus any new prepositions
    • nomD.tagNo.txt
      => copy nomD.tagNo.txt.${PREV_YEAR} to nomD.tagNo.txt.${YEAR}
      => might need to be updated in step-3
    • nomD.tagYes.txt
      => copy nomD.tagYes.txt.${PREV_YEAR} to nomD.tagYes.txt.${YEAR}
      => might need to be updated in step-5

    • LRSPL:
      => link LRSPL to ./5.allD/data/${YEAR}/dataOrg/LRSPL.${YEAR} from new release (${LEXICON}/data/tables)
    • dTypeStr.data:
      => copy dTypeStr.data.${PREV_YEAR} to dTypeStr.data.${YEAR}
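
The year-to-year copies listed above (nomD.tagNo.txt, nomD.tagYes.txt, dTypeStr.data) are done by Step-0. As a rough sketch of what such a carry-forward amounts to when done by hand, the helper below copies NAME.PREV_YEAR to NAME.YEAR. The function name, the argument order, the directory arguments, and the choice to leave an existing target untouched are all assumptions of this sketch, not documented behavior of GetNomD.

```shell
# carry_forward NAME PREV_YEAR YEAR SRC_DIR DST_DIR   (hypothetical helper)
# Copies SRC_DIR/NAME.PREV_YEAR to DST_DIR/NAME.YEAR.
# An existing target is left untouched so manual edits are not overwritten
# (an assumption of this sketch, not documented for Step-0).
carry_forward() {
  cf_src="$4/$1.$2"
  cf_dst="$5/$1.$3"
  if [ ! -f "$cf_src" ]; then
    echo "missing: $cf_src" >&2
    return 1
  fi
  if [ -e "$cf_dst" ]; then
    echo "kept existing: $cf_dst"
    return 0
  fi
  mkdir -p "$5" && cp -p "$cf_src" "$cf_dst" && echo "copied: $cf_src -> $cf_dst"
}

# Example call (the directory layout shown here is hypothetical):
#   carry_forward nomD.tagNo.txt "$PREV_YEAR" "$YEAR" \
#     "./data/$PREV_YEAR/dataOrg" "./data/$YEAR/dataOrg"
```

Keeping an existing target means the helper can be rerun safely after the tag files have been edited in later steps.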

III. Result files used in allD

  • ${TAR_DIR}/nomD.yes.Z.data.${YEAR}
  • ${TAR_DIR}/nomD.yes.S.data.${YEAR}

IV. Summary of GetNomD

Each step below lists: Description, Program, Input, Output, Notes.

Step 0
  • Description: Prepare directories and files
  • Input: See section II.
  • Output: See section II.
  • Notes:
    • 1.nomD/data/${YEAR}/dataOrg
      • LRNOM
      • prepositions.data (see description above for update)
      • nomD.tagNo.txt
      • nomD.tagYes.txt
    • 5.allD/data/${YEAR}/dataOrg
      • LRSPL
      • dTypeStr.data

Step 1
  • Description: Retrieve raw nomD pairs
  • Program: GetNomDRawFromNomFile.java
  • Input:
    • ${SRC_DIR}: LRNOM
  • Output:
    • nomD.raw.data

Step 2
  • Description: Add tag (yes|no) to nomD
  • Program: GetNomDMetaFile.java
  • Input:
    • ${SRC_DIR}: nomD.tagNo.txt, prepositions.data
    • ${TAR_DIR}: nomD.raw.data
  • Output:
    • nomD.meta.data
    • nomD.yes.data
    • nomD.no.data
  • Notes:
    • Might need to rerun if Step-3 finds some invalid dPairs from nomD

Step 3
  • Description: Add dType (P|Z|S|PS|ZS|SS|U)
  • Program: DType.java
  • Input:
    • ${ALL_SRC_DIR}: LRSPL, dTypeStr.data
    • ${TAR_DIR}: nomD.yes.data
  • Output:
    • nomD.yes.data.type
    • nomD.yes.data.type.Z
    • nomD.yes.data.type.S
    • nomD.yes.data.type.P
    • nomD.yes.data.type.ZS
    • nomD.yes.data.type.SS
    • nomD.yes.data.type.PS
    • nomD.yes.data.type.U
    • nomD.yes.data.type.ZandS
  • Notes:
    • Follow the message from the program
    • Make sure nomD.yes.data.type.U is empty. If not, send to linguist to tag [S|Z|No]:
      • invalid dPair (No): add to 1.nomD/data/${YEAR}/dataOrg/nomD.tagNo.txt, then rerun Steps 2 ~ 3
      • valid dPair [S|Z]: add to 5.allD/data/${YEAR}/dataOrg/dTypeStr.data.${YEAR}, then rerun Step-3
    • Make sure the word counts of valid nomD are the same (in the message)

Step 4
  • Description: Add negation tag: [O|N], sort
  • Program: AddNegationTagToFile.java
  • Input:
    • ${TAR_DIR}: nomD.yes.data.type.ZandS, nomD.yes.data.type.Z, nomD.yes.data.type.S
  • Output:
    • nomD.yes.data.${YEAR}
    • nomD.yes.Z.data.${YEAR}
    • nomD.yes.S.data.${YEAR}
  • Notes:
    • Total number of S and Z should = ZandS

Step 5
  • Description: Check affix on nomD.yes.data.${YEAR}
  • Program: CheckDerivationByAffix6.java
  • Input:
    • ${ALL_SRC_DIR}: LRSPL
    • ${SRC_DIR}: nomD.tagYes.txt
    • ${TAR_DIR}: nomD.yes.data.${YEAR}
  • Output:
    • nomD.pattern3.rpt
  • Notes:
    • Make sure nomD.pattern3.rpt is empty. If not, send to linguist to tag (Yes|No):
      • invalid dPair (No): add to nomD.tagNo.txt, then rerun Steps 2 ~ 5
      • valid dPair (Yes): add to nomD.tagYes.txt, then rerun Steps 2 ~ 5

Step 6
  • Description: Steps 1 ~ 5
  • Input: See above
  • Output: See above
  • Notes: Not recommended!

V. Processes details:
Save messages to log.${STEP} in ./Logs/${YEAR}/
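
The document does not say whether GetNomD writes these logs itself. If the messages have to be captured by hand, one possible approach is to pipe the run through `tee`, as in the sketch below. The wrapper name `run_logged` and its argument order are hypothetical; only the ./Logs/${YEAR}/log.${STEP} location comes from the line above.

```shell
# run_logged STEP YEAR COMMAND [ARGS...]   (hypothetical wrapper)
# Runs COMMAND and copies everything it prints (stdout and stderr)
# to ./Logs/YEAR/log.STEP while still showing it on the terminal.
run_logged() {
  rl_dir="./Logs/$2"
  rl_log="$rl_dir/log.$1"
  mkdir -p "$rl_dir" || return 1
  shift 2
  "$@" 2>&1 | tee "$rl_log"
}

# Example call:
#   run_logged 2 "$YEAR" GetNomD "$YEAR"
```

Note that the exit status of the wrapper is that of `tee`, not of the wrapped command, so it should not be used to detect a failed step.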

  • shell>cd ${DERIVATION}/1.nomD/bin
  • shell>GetNomD ${YEAR}

    0: Prepare directories and files
    => generates: 1.nomD/data/dataOrg/*
    => generates: 5.allD/data/dataOrg/*

    1: Retrieve std-raw nomD pairs
    => generates: ./data/nomD.raw.data

    2: Add tag (yes|no) to nomD: meta, yes, and no files
    => requires:

    • ../dataOrg/prepositions.data
      Used by the program to identify invalid dPairs from nomD
      • xxxparticle|noun|eui1|xxx|verb|eui2
        lookup|noun|E0222422|look|verb|E003804
      • xxx-particle|noun|eui1|xxx|verb|eui2
        grown-up|noun|E0030484|grow|verb|E0030480
    • ../dataOrg/nomD.tagNo.txt
      Used to tag invalid dPairs from nomD that cannot be identified by the above algorithm

    => generates:
    • ./data/nomD.meta.data
    • ./data/nomD.yes.data
    • ./data/nomD.no.data
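
The two particle patterns listed under step 2 can be approximated with a small shell check. This is only a loose reading of those patterns, not the algorithm in GetNomDMetaFile.java: a pair is flagged when the noun starts with the verb and ends with a listed particle, which covers both examples (lookup/look and grown-up/grow). The function name is hypothetical, and the sketch assumes the prepositions file holds one preposition per line, which the document does not specify.

```shell
# is_particle_pair PAIR PREP_FILE   (hypothetical helper)
# PAIR is one nomD line laid out as in the examples above:
#   noun|noun|eui1|verb|verb|eui2
# PREP_FILE is assumed to hold one preposition/particle per line.
# Returns 0 when the noun starts with the verb and ends with a particle,
# 1 otherwise.
is_particle_pair() {
  pp_noun=$(printf '%s\n' "$1" | cut -d'|' -f1)
  pp_verb=$(printf '%s\n' "$1" | cut -d'|' -f4)
  [ -n "$pp_noun" ] && [ -n "$pp_verb" ] || return 1
  while IFS= read -r pp_prep; do
    [ -n "$pp_prep" ] || continue          # skip blank lines
    case $pp_noun in
      "$pp_verb"*"$pp_prep") return 0 ;;   # e.g. look + up, grow + n-up
    esac
  done < "$2"
  return 1
}
```

Because the middle of the noun is unconstrained, this check can over-match; pairs it flags are candidates for review rather than confirmed invalid dPairs.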

    3: Add dType (P|Z|S|PS|ZS|SS|U): split nomD.yes into (Z) and (S)
    => generates:

    • ./data/nomD.yes.data.type

    • ./data/nomD.yes.data.type.Z
    • ./data/nomD.yes.data.type.S
    • ./data/nomD.yes.data.type.P
      => must be empty
    • ./data/nomD.yes.data.type.ZS (Z by SpVars)
    • ./data/nomD.yes.data.type.SS (S by SpVars)
    • ./data/nomD.yes.data.type.PS (P by SpVars)
      => must be empty
    • ./data/nomD.yes.data.type.U (Unknown)
      => must be empty
      => if not empty, send to linguist to tag (S|Z|No):
      • if invalid (No), add to 1.nomD/../nomD.tagNo.txt
      • if valid (S|Z), add to 5.allD/../dTypeStr.data
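
The "must be empty" conditions in step 3 (the P, PS, and U files) can be checked quickly from the shell. The sketch below is a convenience wrapper around `test -s`; the function name is hypothetical, and treating a missing file as a problem is an assumption of this sketch.

```shell
# check_must_be_empty DATA_DIR   (hypothetical helper)
# Reports every nomD.yes.data.type.{P,PS,U} file in DATA_DIR that is
# missing or not empty. Returns 0 only if all three exist and are empty.
check_must_be_empty() {
  cme_status=0
  for cme_type in P PS U; do
    cme_file="$1/nomD.yes.data.type.$cme_type"
    if [ ! -e "$cme_file" ]; then
      # Assumption: a missing file means step 3 has not been run here
      echo "MISSING: $cme_file"
      cme_status=1
    elif [ -s "$cme_file" ]; then
      echo "NOT EMPTY: $cme_file ($(wc -l < "$cme_file" | tr -d ' ') lines)"
      cme_status=1
    fi
  done
  return $cme_status
}
```

A non-empty U file is the case that goes to the linguist for (S|Z|No) tagging, as described above.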

    4: Add negation tag: (O|N), sort uniquely
    => generates:

    • ./data/nomD.yes.data.${YEAR} (only includes S and Z)
    • ./data/nomD.yes.S.data.${YEAR}
    • ./data/nomD.yes.Z.data.${YEAR}
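
The summary in section IV notes that the total number of S and Z entries should equal ZandS. A line-count comparison along the following lines can confirm this after step 4. The function name is hypothetical, the sketch assumes one dPair per line, and which three files to pass (the .type.* files or the ${YEAR} files) is left to the caller.

```shell
# check_split_counts Z_FILE S_FILE COMBINED_FILE   (hypothetical helper)
# Prints the three line counts and returns 0 when Z + S == combined.
check_split_counts() {
  for csc_f in "$1" "$2" "$3"; do
    if [ ! -r "$csc_f" ]; then
      echo "cannot read: $csc_f" >&2
      return 2
    fi
  done
  # tr strips the padding some wc implementations add
  csc_z=$(wc -l < "$1" | tr -d ' ')
  csc_s=$(wc -l < "$2" | tr -d ' ')
  csc_all=$(wc -l < "$3" | tr -d ' ')
  csc_sum=$((csc_z + csc_s))
  echo "Z=$csc_z S=$csc_s Z+S=$csc_sum combined=$csc_all"
  [ "$csc_sum" -eq "$csc_all" ]
}
```

Since `wc -l` counts newline characters, a file whose last line has no trailing newline will be reported one short.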

    5: Check affix on nomD.yes.data.${YEAR}
    => generates:

    • ./data/nomD.pattern3.rpt
      Should be empty, i.e. the number of possible invalid nomD in the program message should be 0
      => if not empty, send to linguist to tag (Yes|No):
      • if invalid (No): add to ${1.nomD}/data/${YEAR}/dataOrg/nomD.tagNo.txt, then repeat steps 2 ~ 5
      • if valid (Yes): add to ${1.nomD}/data/${YEAR}/dataOrg/nomD.tagYes.txt, then repeat steps 2 ~ 5
    or

    6: Run steps 1 ~ 5 above (not recommended)
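
When a linguist's decision is added to nomD.tagNo.txt or nomD.tagYes.txt, repeated reruns make it easy to add the same entry twice. A guarded append such as the one below avoids that. The function name is hypothetical, and the sketch assumes the entry is written in whatever line format the tag file already uses, which this document does not spell out.

```shell
# add_tag_entry ENTRY TAG_FILE   (hypothetical helper)
# Appends ENTRY to TAG_FILE unless an identical line is already present.
add_tag_entry() {
  # -F: fixed string, -x: match the whole line, -q: quiet
  if [ -f "$2" ] && grep -Fxq -- "$1" "$2"; then
    echo "already present: $1"
    return 0
  fi
  printf '%s\n' "$1" >> "$2" && echo "added: $1"
}
```

After adding entries, the steps named above still have to be rerun for the change to take effect.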

  • Compare nomD.no.data with the previous year's file and validate the differences
  • The final nomD files below are used in zeroD and suffixD for auto-tagging:
    • ${1.nomD}/data/${YEAR}/data/nomD.yes.Z.data.${YEAR}
    • ${1.nomD}/data/${YEAR}/data/nomD.yes.S.data.${YEAR}
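
The comparison of nomD.no.data against the previous year can be done with `sort` and `comm`. The sketch below prints entries that are new this year with a "+" prefix and entries that disappeared with a "-" prefix. The function name is hypothetical, and the location of the previous year's file is left to the caller because it is not stated here.

```shell
# compare_no_data PREV_FILE CUR_FILE   (hypothetical helper)
# Prints "+ line" for lines only in CUR_FILE and "- line" for lines
# only in PREV_FILE. Prints nothing when the two files hold the same lines.
compare_no_data() {
  cnd_a=$(mktemp) || return 2
  cnd_b=$(mktemp) || return 2
  # comm needs both inputs sorted in the same collation order
  LC_ALL=C sort -u "$1" > "$cnd_a"
  LC_ALL=C sort -u "$2" > "$cnd_b"
  LC_ALL=C comm -13 "$cnd_a" "$cnd_b" | sed 's/^/+ /'   # only in current
  LC_ALL=C comm -23 "$cnd_a" "$cnd_b" | sed 's/^/- /'   # only in previous
  rm -f "$cnd_a" "$cnd_b"
}
```

The output is a list of candidates to validate; whether each difference is expected still has to be judged against the changes in the new Lexicon release.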

Please refer to the derivation design documents in Lexical Tools for details.