Derivations Procedures - prefixD

Generate prefixD pairs in derivation table:

I. Directory: ${DERIVATION}/2.prefixD

II. Input Files (./data/${YEAR}/dataOrg/):
shell> ${PREFIX_D}/bin/GetPrefixD ${YEAR}
0    (option 0: prepare directories and files; see step 0 in section IV)

  • inflVars.data:
    => link inflVars.data to ./inflVars.data.${YEAR} (from ${LEXICON_DIR})
  • LEXICON:
    => link LEXICON to ./LEXICON.${YEAR} (from ${LEXICON_DIR}/LEXICON.release); this link is not needed
  • link prefixD.tag.txt to ./prefixD.tag.txt.${YEAR} (from ./dataOrg/prefixD.tag.txt.${PREV_YEAR})
  • link prefixList.data to ./dataOrg/prefixList.data.${YEAR} (copy/update from previous year)

  • link prefixD.meta.data.conflict.tag.data to ./dataOrg/prefixD.meta.data.conflict.tag.data.${YEAR}
  • At the initial phase, just create (touch) an empty ./dataOrg/prefixD.meta.data.conflict.tag.data.${YEAR}
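
The links above could be set up along the following lines. This is only a sketch: the function name link_data_org and its two arguments are hypothetical, it assumes it is run from ./data/${YEAR}/dataOrg, and it assumes the year-stamped inflVars.data and prefixList.data files have already been put in place. Option 0 of GetPrefixD may already do some or all of this.

```shell
#!/bin/sh
# Hypothetical helper (not part of GetPrefixD): create the dataOrg links
# described in section II. Run it from ./data/${YEAR}/dataOrg.
link_data_org() {
    year="$1"
    prev_year="$2"

    # Start this year's tag file from the previous year's, if not present yet.
    if [ ! -e "prefixD.tag.txt.${year}" ] && [ -e "prefixD.tag.txt.${prev_year}" ]; then
        cp -p "prefixD.tag.txt.${prev_year}" "prefixD.tag.txt.${year}"
    fi

    # The conflict tag file starts out empty at the initial phase.
    touch "prefixD.meta.data.conflict.tag.data.${year}"

    # Point the generic names at the year-stamped files.
    for name in inflVars.data prefixD.tag.txt prefixList.data \
                prefixD.meta.data.conflict.tag.data; do
        ln -sf "./${name}.${year}" "${name}"
    done
}
```

For example, link_data_org 2018 2017 would leave prefixD.tag.txt pointing at ./prefixD.tag.txt.2018.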

III. Final files for allD (release)

  • ${TAR_DIR}/prefixD.yes.data.${YEAR}

IV. Summary of GetPrefixD

Each step below lists, in order: Step, Description and Program, Input, Output, Notes
0
  • Prepare directories and files
  • Input and output: see section II.
  • 2.prefixD/data/${YEAR}/dataOrg
    • inflVars.data
    • prefixD.tag.txt
      => Make sure prefixD.tag.txt.${YEAR} exists (from the previous dataOrg)
    • prefixList.data
    • prefixD.meta.data.conflict.tag.data
1
  • Get valid prefix base forms from LEXICON
  • GetBaseForms.java
  • ${SRC_DIR}:
    • inflVars.data
  • bases.data
 
2
  • Retrieve all raw prefixD pairs
  • GetPrefixFromBaseFile.java
  • ${SRC_DIR}:
    • prefixList.data

  • ${TAR_DIR}:
    • bases.data
  • prefixD.raw.data.all
  • prefixD.rawNo.rpt.all
  • This step retrieves all prefixD pairs (including DONE-${YEAR} and TBD).
  • For a release or a new prefix, use the results from step 8 (next)
8
  • Retrieve raw prefixD pairs for this release
  • GetPrefixFromBaseFile.java
  • Menu input: 8, then DONE
  • ${SRC_DIR}:
    • prefixList.data

  • ${TAR_DIR}:
    • bases.data
  • prefixD.raw.data.DONE
  • prefixD.rawNo.rpt.DONE
  • This step offers options for which prefixes to retrieve:
    • TBD: all prefixD pairs that are marked as TBD
    • DONE: all prefixD pairs excluding TBD (used for release)
      => The result is linked to prefixD.raw.data and used for release
    • prefix: all prefixD pairs for the specified prefixes
3
  • Add tags to prefixD meta file (meta)
  • GetPrefixMetaFile.java
  • DPairTagList
  • ${SRC_DIR}:
    • prefixD.tag.txt

  • ${TAR_DIR}:
    • prefixD.raw.data
      Link to ./prefixD.raw.data.DONE
  • prefixD.meta.data
  • prefixD.meta.data.conflict
    The conflict file should be empty
  • The conflict file (prefixD.meta.data.conflict) lists all inconsistent prefixD tags between SpVars in two records
  • Ideally, all prefixD tags should be consistent among SpVars between records
  • In the initial (1st) run, before tags are added for the annual updates, no conflict should exist.
  • If not empty, send conflicts to linguist to tag [yes|no|both] on EUI lines
    • [yes]: all prefixD tags among SpVars between records are valid
    • [no]: all prefixD tags among SpVars between records are invalid
    • [both]: prefixD tags among SpVars between records include both valid and invalid (exception)
  • Update the tag result to ./dataOrg/prefixD.meta.data.conflict.tag.data
  • In the past, there were no [both] cases (exceptions) in prefixD
  • If spVar conflicts exist, run the next step (14) to update the results to prefixD.tag.txt automatically, then re-run this step (3). Otherwise, go to step 4.
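
The branch between step 14 and step 4 depends only on whether the conflict file has content. A minimal sketch of that decision follows; the function name next_step_after_meta is hypothetical, and treating a missing file the same as an empty one is an assumption, not documented behavior.

```shell
#!/bin/sh
# Hypothetical helper: print which step to run after step 3,
# based on whether the conflict file has any content.
next_step_after_meta() {
    conflict_file="$1"
    if [ -s "${conflict_file}" ]; then
        # Non-empty: conflicts need linguist tags, then the auto-fix (step 14).
        echo "14"
    else
        # Empty (or missing, by assumption): continue with the split (step 4).
        echo "4"
    fi
}
```

For example: next_step_after_meta ./data/prefixD.meta.data.conflict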
14
  • Auto-fix prefixD.tag.txt for conflicts of dPair tags by SpVars
  • FixConflictDPairTags.java
    => Used to automatically fix inconsistent tags between spVars
  • ${SRC_DIR}:
    • prefixD.tag.txt.${YEAR}
    • prefixD.meta.data.conflict.tag.data
  • ${SRC_DIR}:
    • prefixD.tag.txt.${YEAR}.fixDPair
  • Make sure to update the linguist tagging result to ./dataOrg/prefixD.meta.data.conflict.tag.data before running this step
  • Manually examine ./dataOrg/prefixD.tag.txt.${YEAR}.fixDPair
    => it should not have any more conflicting tags between spVars
  • Update prefixD.tag.txt.${YEAR} with the latest tags from Step 4
  • If prefixD.tag.txt.${YEAR}.fixDPair passes the examination, move it to prefixD.tag.txt.${NEXT_YEAR}
    => then, re-run Step 3.
4
  • Split tags on prefixD meta file [yes|no|tbd|tbt]
  • SplitPrefixDMetaFile.java
  • ${SRC_DIR}:
    • prefixList.data

  • ${TAR_DIR}:
    • prefixD.meta.data
  • prefixD.yes.data
  • prefixD.no.data
  • prefixD.tbd.data
  • prefixD.tbt.data
  • prefixD.yesNo.data
  • Make sure prefixD.tbt.data is empty. If not, send it to linguists to tag:
    • Tag prefixD: [yes|no]
      • valid prefixD: [yes]

        Tag negation: [O|N] if prefix is: a-, an-, de-, dys-, in-, under-

        • Negative: [N]
        • Otherwise: [O]
      • invalid prefixD: [no]
  • Manually update the results to ./dataOrg/prefixD.tag.txt
    • Copy prefixD.tbt.data.tagged.txt to ./dataOrg/Tags/
    • Add the tagged results to ./dataOrg/prefixD.tag.txt.${YEAR}
    • ln -sf ./prefixD.tag.txt.${YEAR} prefixD.tag.txt
  • Rerun steps: 3, 14, 4 until prefixD.tbt.data is empty
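
The manual update described above (copy to Tags, append, relink) could be scripted roughly as follows. The function name add_tagged_results and its arguments are hypothetical, and the sketch assumes the tagged file returned by the linguists is already in the same line format as prefixD.tag.txt, so that a plain append is enough.

```shell
#!/bin/sh
# Hypothetical helper: fold linguist-tagged tbt entries into this year's tag file.
# $1 = dataOrg directory, $2 = year, $3 = tagged file returned by the linguists
add_tagged_results() {
    data_org="$1"
    year="$2"
    tagged="$3"

    # Keep a copy of the tagged file under dataOrg/Tags/.
    mkdir -p "${data_org}/Tags"
    cp -p "${tagged}" "${data_org}/Tags/"

    # Append the tagged results to this year's tag file.
    cat "${tagged}" >> "${data_org}/prefixD.tag.txt.${year}"

    # Refresh the generic link (relative to dataOrg).
    ( cd "${data_org}" && ln -sf "./prefixD.tag.txt.${year}" prefixD.tag.txt )
}
```

After running it, steps 3, 14 and 4 still have to be rerun by hand until prefixD.tbt.data is empty.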
5
  • Verify dType on prefixD.yes.data
  • DType.java
  • ${ALL_SRC_DIR}:
    • LRSPL
    • dTypeStr.data

  • ${TAR_DIR}:
    • prefixD.yes.data
  • prefixD.yes.data.type
  • prefixD.yes.data.type.Z
  • prefixD.yes.data.type.S
  • prefixD.yes.data.type.P
  • prefixD.yes.data.type.ZS
  • prefixD.yes.data.type.SS
  • prefixD.yes.data.type.PS
  • prefixD.yes.data.type.U
  • Make sure the unknown dType (|U|) file from prefixD is empty.
6
  • Add negation tag (N|O), sort uniquely
  • AddNegationTagToFile.java
  • DPairTagList.java
  • ${SRC_DIR}:
    • prefixList.data
    • prefixD.tag.txt

  • ${TAR_DIR}:
    • prefixD.yes.data
  • prefixD.yes.data.${YEAR}
  • prefixD.yes.data.${YEAR}.conflict
  • Check if there are any missing tags from the output log.
    • ** NegTagErr (43305): a|avascularize|verb|E0566090|vascularize|verb|E0064067|yes ...
    • -- Error negTag no (must be 0): x
  • The conflict file lists all inconsistent negation tags between SpVars in two records
    • Send conflicts to linguist to tag (N|O) on EUI lines
    • No prefixD pair should be B in negation (even if the prefix is of class B)
    • In the past, some (5) conflicts needed to be corrected, as shown below:
      • anti- is O (not N) when it has spVar of ante-
        antebrachium|noun|E0072172|brachium|noun|E0013901|O|
        antibrachium|noun|E0072172|brachium|noun|E0013901|N|

        antebrachial|adj|E0203565|brachial|adj|E0013883|O|
        antibrachial|adj|E0203565|brachial|adj|E0013883|N|

      • im- is O (not N) when it has spVar of em-
        empanel|verb|E0024983|panel|noun|E0045258|O|
        impanel|verb|E0024983|panel|noun|E0045258|N|

        embower|verb|E0580659|bower|noun|E0434097|O|
        imbower|verb|E0580659|bower|noun|E0434097|N|

      • dis- is O (not N) when it has spVar of di-
        disyllable|noun|E0523982|syllable|noun|E0059482|O|
        dissyllable|noun|E0523982|syllable|noun|E0059482|N|

      => These 5 cases (seen in 2016, 2017, 2018) recur yearly because their prefixes are of class O or N and have SpVars; the negation is assigned by computer (not manually tagged).
    • Copy ./data/prefixD.yes.data.${YEAR}.conflict to ./dataOrg/prefixD.yes.data.${YEAR}.conflict.tag.data
    • Update the tags in ./dataOrg/prefixD.yes.data.${YEAR}.conflict.tag.data from the previous year (to fix the recurring cases). Note that the line number (1st field) might differ!
    • Also, update the linguist's tags from ./data/prefixD.yes.data.${YEAR}.conflict to ./dataOrg/prefixD.yes.data.${YEAR}.conflict.tag.data (if any) before running Step 15
    • Update the prefixD tag file: cp ./dataOrg/prefixD.tag.txt.${YEAR}.fixDPair ./dataOrg/prefixD.tag.txt.${YEAR}
    • Use Steps 15-16 to update the results to prefixD.yes.data.${YEAR}
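
To illustrate what a negation conflict looks like in the data, the sketch below groups records by their EUI pair and lists every group that carries more than one negation tag. It is not the project's own program: the function name find_negation_conflicts is hypothetical, and the field positions (EUIs in fields 3 and 6, negation tag in field 7) are inferred from the sample lines above.

```shell
#!/bin/sh
# Hypothetical check: list records whose EUI pair (fields 3 and 6)
# appears with more than one negation tag (field 7).
find_negation_conflicts() {
    awk -F'|' '
        {
            key = $3 "|" $6
            # Remember every line of the group, in input order.
            lines[key] = lines[key] $0 "\n"
            # Count distinct negation tags per EUI pair.
            if (!((key, $7) in seen)) {
                seen[key, $7] = 1
                tags[key]++
            }
        }
        END {
            for (key in tags)
                if (tags[key] > 1)
                    printf "%s", lines[key]
        }
    ' "$1"
}
```

Run against a file holding the antebrachium/antibrachium pair, it would print both lines, since they share E0072172 and E0013901 but are tagged O and N. The order in which groups are printed is not defined.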
15
  • Auto-fix prefixD.tag.txt for conflicts of negation tags by SpVars for class of B
    • fix negation conflict for B class
    • list possible negation conflict for O|N classes
  • FixConflictNegationTags.java
  • ${SRC_DIR}:
    • prefixD.tag.txt.${YEAR}
    • prefixList.data
    • prefixD.yes.data.${YEAR}.conflict.tag.data
  • ${SRC_DIR}:
    • prefixD.tag.txt.${YEAR}.fixNegation
  • ${TAR_DIR}:
    • prefixD.negation.fix.data

  • Manually examine ./dataOrg/prefixD.tag.txt.${YEAR}.fixNegation
  • If it is OK, move it to ./dataOrg/prefixD.tag.txt.${YEAR}
16
  • Auto-fix prefixD.yes.data.${YEAR} for conflicts of negation tags by SpVars for classes of N and O from step-15
  • FixConflictNegationForClassNandO.java
  • ${TAR_DIR}:
    • prefixD.yes.data.${YEAR}
    • prefixD.negation.fix.data
  • prefixD.yes.data.${YEAR}.fixNegation
    => move to prefixD.yes.data.${YEAR}
  • Manually examine ./data/prefixD.yes.data.${YEAR}.fixNegation (for the negation changes made by computer, N|O)
    => Check that it fixes the negation of class O|N for the 5 conflicting negation cases above
  • If it is OK,
    • mv ./data/prefixD.yes.data.${YEAR} ./data/prefixD.yes.data.${YEAR}.beforeFixNegation
    • cp -rp ./data/prefixD.yes.data.${YEAR}.fixNegation ./data/prefixD.yes.data.${YEAR}
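
The mv/cp pair above can be wrapped so that it refuses to run when the fixed file is missing. This is a sketch only; the function name promote_fix_negation and its arguments are hypothetical, and the emptiness check is an added safeguard rather than part of the documented procedure.

```shell
#!/bin/sh
# Hypothetical wrapper around the mv/cp pair of step 16.
# $1 = data directory, $2 = year
promote_fix_negation() {
    data_dir="$1"
    year="$2"
    yes_file="${data_dir}/prefixD.yes.data.${year}"

    # Refuse to proceed unless the fixed file exists and is non-empty.
    if [ ! -s "${yes_file}.fixNegation" ]; then
        echo "missing or empty: ${yes_file}.fixNegation" >&2
        return 1
    fi

    # Keep the original, then install the fixed version under the release name.
    mv "${yes_file}" "${yes_file}.beforeFixNegation" &&
        cp -rp "${yes_file}.fixNegation" "${yes_file}"
}
```

Comparing the .beforeFixNegation and current files with diff afterwards shows exactly which negation tags the computer changed.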
7
  • Check affix on prefixD.yes.data.${YEAR}
  • CheckDerivationByAffix6.java
  • ${ALL_SRC_DIR}:
    • LRSPL

  • ${SRC_DIR}:
    • prefixD.tagYes.txt
      => copy from the previous year

  • ${TAR_DIR}:
    • prefixD.yes.data.${YEAR}
  • prefixD.pattern3.rpt
  • Make sure prefixD.pattern3.rpt is empty. If not, send to linguists to tag (Yes|No):
    • invalid dPair (No): add to prefixD.tagNo.txt, then rerun Steps: 3~7
    • valid dPair (Yes): add to prefixD.tagYes.txt, then rerun Step: 7
11
  • Steps 1 ~ 7
  • Input and output: see above
  • Not recommended!

V. Processes Details:

  • shell>cd ${DERIVATION}/prefixD/bin
  • shell>GetPrefixD ${YEAR}

    1. Routine process (no new PD-Rules, no new Tag)

    1: Get valid prefix base forms from LEXICON
    => generates ./data/bases.data

    2: Retrieve raw prefixD pairs
    or use
    8: Retrieve possible raw prefixD pairs with options
    DONE: for all prefixes whose tagging is done
    => generates:

    • ./data/prefixD.raw.data
    • ./data/prefixD.rawNo.rpt

    3: Add tags to prefixD meta file
    => generates ./data/prefixD.meta.data
    => every entry must be tagged [yes|no]; all errors must be fixed
    => use the tbd tag to bypass entries with tagging errors

    3.1: Check conflicts by SpVars (different dPair tags between 2 records).
    => generates ./data/prefixD.meta.data.conflict
    Send to linguist to double check "[yes|no|both]"
    => Ideally, the tag of prefixD between two records should be the same
    => This file lists all inconsistent prefixD tags between two records (caused by SpVars).
    => If not empty, send to linguist to tag [yes|no|both] on the EUI line.

    • yes: all tags between these two records should be yes
    • no: all tags between these two records should be no
    • both: tags between these two records could be yes or no. In the past, no tag of both for prefixD

    => manually update this result to prefixD.tag.txt and rerun step 3 ~ 6.

    14: Auto-fix prefixD.tag.txt for conflicts by SpVars
    => Put the revised tagged file to: ./dataOrg/prefixD.meta.data.conflict.tag.data
    => copy ./dataOrg/prefixD.tag.txt.${YEAR}.fixDPair to ./dataOrg/prefixD.tag.txt.${YEAR} and rerun this step.

    4: Split prefixD meta file
    => generates

    • ./data/prefixD.yesNo.data
    • ./data/prefixD.yes.data
    • ./data/prefixD.no.data
    • ./data/prefixD.tbd.data (tbt + tbd prefixes)
    • ./data/prefixD.tbt.data (to be tagged => annual update dPairs)

    Make sure prefixD.tbt.data is empty. If not, send it to linguists to tag:

    • Tag prefixD: [yes|no]
      • valid prefixD: yes

        Tag negation: (O|N) if prefix is: a-, an-, de-, dys-, in-, under-

        • Negative: N
        • Otherwise: O
      • invalid prefixD: no

    5: Verify dType on prefixD.yes.data
    => generates ./data/prefixD.yes.data.type

    • ./data/prefixD.yes.data.type.Z (must be 0)
    • ./data/prefixD.yes.data.type.P (should = ./data/prefixD.yes.data)
    • ./data/prefixD.yes.data.type.ZS (must be 0)
    • ./data/prefixD.yes.data.type.PS (should = 0)
    • ./data/prefixD.yes.data.type.SS (must be 0)
    • ./data/prefixD.yes.data.type.U (must be 0)
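
The expectations listed above can be checked mechanically. The sketch below is hypothetical (check_dtype_files and count_lines are not part of the tool set) and rests on two assumptions: "must be 0" means the file has zero lines, and "should = prefixD.yes.data" means the two files have the same number of lines. A missing file is counted as zero lines.

```shell
#!/bin/sh
# Hypothetical helpers for verifying the step-5 outputs.

# Print the number of lines in a file (0 if the file does not exist).
count_lines() {
    if [ -f "$1" ]; then
        wc -l < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# $1 = data directory; prints one line per problem, returns non-zero on any.
check_dtype_files() {
    base="$1/prefixD.yes.data"
    status=0

    # These type files are expected to be empty.
    for t in Z ZS PS SS U; do
        n=$(count_lines "${base}.type.${t}")
        if [ "${n}" -ne 0 ]; then
            echo "type.${t}: ${n} lines (expected 0)"
            status=1
        fi
    done

    # All valid pairs are expected to end up in the P file.
    if [ "$(count_lines "${base}.type.P")" -ne "$(count_lines "${base}")" ]; then
        echo "type.P line count differs from prefixD.yes.data"
        status=1
    fi
    return "${status}"
}
```

For example: check_dtype_files ./data || echo "fix dType problems before step 6"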

    6: Add negation tag (N|O); the result is uniquely sorted in the program (not by sort -u)
    => generates ./data/prefixD.yes.data.${YEAR}
    Negation tagging errors must be fixed
    => send to linguist to tag the negation (N|O)

    6.1: Check conflict (inconsistent) tags between SpVars
    => generates ./data/prefixD.yes.data.${YEAR}.conflict
    => Ideally, the tag of prefixD between two records should be the same
    => SpVars might also cause inconsistent negation tags on prefixD.
    => Ideally, the tag of negation between two records should be the same
    => If not empty, send to linguist to tag (N|O|B) on the EUI line.
    => The negation could have exceptions:

    • anti- is O (not N) when it has spVar of ante-
    • im- is O (not N) when it has spVar of em-
    • dis- is O (not N) when it has spVar of di-


    => manually update this result to prefixD.yes.data.${YEAR}
    => The final prefix is in ${DERIVATION}/prefixD/data/${YEAR}/data/prefixD.yes.data.${YEAR}

    15: Auto-fix prefixD.tag.txt for negation conflicts by SpVars
    => Put the revised tagged file to: ./dataOrg/prefixD.yes.data.${YEAR}.conflict.tag.data
    Known cases in 2015 are:

    1|E0013901|E0072172|
    # 556|antebrachium|noun|E0072172|brachium|noun|E0013901|O|
    # 1431|antibrachium|noun|E0072172|brachium|noun|E0013901|N|
    2|E0013883|E0203565|
    # 557|antebrachial|adj|E0203565|brachial|adj|E0013883|O|
    # 1432|antibrachial|adj|E0203565|brachial|adj|E0013883|N|
    3|E0024983|E0045258|
    # 11245|empanel|verb|E0024983|panel|noun|E0045258|O|
    # 15077|impanel|verb|E0024983|panel|noun|E0045258|N|
    4|E0434097|E0580659|
    # 11243|embower|verb|E0580659|bower|noun|E0434097|O|
    # 15072|imbower|verb|E0580659|bower|noun|E0434097|N|
    5|E0059482|E0523982|
    # 9310|disyllable|noun|E0523982|syllable|noun|E0059482|O|
    # 10500|dissyllable|noun|E0523982|syllable|noun|E0059482|N|
    	

    => copy ./dataOrg/prefixD.tag.txt.${YEAR}.fixNegation to ./dataOrg/prefixD.tag.txt.${YEAR} and rerun this step.
    => Check the log; run step 16 if possible negation fixes exist

    16: Auto-fix prefixD.yes.data.${YEAR} for negation conflicts by SpVars for classes N and O
    => Check that the fix file exists: ./data/prefixD.negation.fix.data
    => copy ./data/prefixD.yes.data.${YEAR}.fixNegation to ./data/prefixD.yes.data.${YEAR}

    7: Check affix on prefixD.yes.data.${YEAR}
    => generates ./data/prefixD.pattern3.rpt (should be empty)

    11: Run above 1-7 steps (default)
    => above steps from 1 ~ 7

    2. Add new PD-Rules process
    8: Retrieve possible raw prefixD pairs with options

    • use the ${PREFIX} option to generate all prefixD pairs for a specified prefix (check prefixD.rawNo.rpt.${PREFIX})
    • send to linguists for tagging (see below)
    • add the newly tagged dPair results to ./dataOrg/prefixD.tag.txt
    • update ./dataOrg/prefixList.data (so the prefix will be tagged)
    • use DONE to retrieve all prefixes that are not TBD

    Same procedures as above (regular)
    3: Add tags to prefixD meta file
    4: Split prefixD meta file
    5: Verify dType on prefixD.yes.data
    6: Add negation tag (N|O)
    7: Compare original tag and result tag files

    3. Add tag for new prefix dPairs (annual updates)

    • send ./data/prefixD.tbt.data to linguists for tagging:
      • derivation: [yes|no]
      • negation: O|N (if tagged yes and the prefix is: a-, an-, de-, dys-, in-, under-)
    • Append new tagging results to ./dataOrg/prefixD.tag.txt
    • re-run this process until all prefixD are tagged (0 in prefixD.tbt.data)

Please refer to derivation design documents in Lexical Tools for details.