VTT File Format

Format Information

  • Format version: 2010.0
  • Build dates: 02.01.2010 ~ current
  • What is new:
    • Meta data section:
      • Add default tags file for quick loading
      • Add file (save) history: VTT format version, user, time stamp

VTT opens and saves tagged text files in the VTT file format. VTT can open (display) two types of file:

  • Pure text
  • Correct VTT format
Please pay special attention to the VTT file format: VTT does not display a file with an incorrect format and shows an error message dialog instead. A VTT file consists of four parts, each introduced by a preset "header", as described below:

I. Meta Data
Meta data are included in the VTT file as of this version. They are stored in the VTT file using preserved tags and a fixed format:

  • The default tags file for quick loading
  • The file (saving) history: VTT file format version, user name, time stamp
  • Separator header for Meta Data:
    #<---------------------------------------------------------------------->
    #<Meta Data>
    #<Tags file|confirmation|path of default tags file>
    #<File History|VTT file format version|User name|Time stamp>
    #<---------------------------------------------------------------------->
    ...
    
  • Preserved tags and format:
    • TAGS_FILE:
                   Field 1        Field 2            Field 3
      Description  Preserved Tag  Confirmation flag  Path of default tags file
      Java Type    String         boolean            String
      Example      TAGS_FILE      true or false      /usr/vtt/data/tags.data

    • FILE_SAVE:
                   Field 1        Field 2                  Field 3    Field 4
      Description  Preserved Tag  VTT file format version  User name  Time stamp
      Java Type    String         String                   String     String
      Example      FILE_SAVE      VTT.2010.0               VTT Guest  2/4/10 11:57:43 AM (mm/dd/yy hh:mm:ss)
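
The two preserved tags can be read with a small parser. The following is a minimal sketch, not the VTT implementation. It assumes that each meta data record is a single pipe-delimited line, possibly wrapped in `#<` and `>` like the header lines (the source does not state this), and that the time stamp uses a 12-hour clock with an AM/PM marker as in the example. The names `TagsFile`, `FileSave` and `parse_meta_line` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TagsFile:
    confirm: bool   # field 2: confirmation flag
    path: str       # field 3: path of the default tags file


@dataclass
class FileSave:
    version: str        # field 2: VTT file format version
    user: str           # field 3: user name
    saved_at: datetime  # field 4: time stamp


def parse_meta_line(line: str):
    """Parse one meta data record; return None for any other line."""
    body = line.strip()
    # Assumption: a record may be wrapped as "#<...>" like the header lines.
    if body.startswith("#<") and body.endswith(">"):
        body = body[2:-1]
    fields = [f.strip() for f in body.split("|")]
    if fields[0] == "TAGS_FILE" and len(fields) == 3:
        if fields[1] not in ("true", "false"):
            raise ValueError(f"confirmation flag must be true or false: {fields[1]!r}")
        return TagsFile(fields[1] == "true", fields[2])
    if fields[0] == "FILE_SAVE" and len(fields) == 4:
        # Matches the documented example "2/4/10 11:57:43 AM".
        stamp = datetime.strptime(fields[3], "%m/%d/%y %I:%M:%S %p")
        return FileSave(fields[1], fields[2], stamp)
    return None  # header line, comment, or unknown record
```

Header lines such as `#<Meta Data>` do not start with a preserved tag, so this sketch returns `None` for them rather than raising an error.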

II. Text Content

  • The original untagged text, in UTF-8
  • A line that starts with # is ignored and treated as a comment
  • Separator header for Text Content:
    #<---------------------------------------------------------------------->
    #<Text Content>
    #<---------------------------------------------------------------------->
    ...
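
Extracting the text content means locating the section title lines and dropping comment lines. The sketch below illustrates one way to do that, under the assumption that the title lines (`#<Meta Data>`, `#<Text Content>`, `#<Tags Configuration>`, `#<Markups Information>`) appear exactly as written in this document. The function names are hypothetical, and joining the remaining lines with `\n` is also an assumption: the source does not say how line breaks are counted.

```python
SECTION_TITLES = {
    "#<Meta Data>": "meta",
    "#<Text Content>": "text",
    "#<Tags Configuration>": "tags",
    "#<Markups Information>": "markups",
}


def split_sections(lines):
    """Group lines under the most recent section title line."""
    sections = {name: [] for name in SECTION_TITLES.values()}
    current = None
    for line in lines:
        title = SECTION_TITLES.get(line.strip())
        if title is not None:
            current = title                  # a title line opens a new section
        elif current is not None:
            sections[current].append(line)   # keep every other line, comments included
    return sections


def text_content(lines):
    """Return the text section with '#' comment lines removed."""
    body = split_sections(lines)["text"]
    return "\n".join(line for line in body if not line.startswith("#"))
```

A pure text file has no title lines, so every section comes back empty here; a caller would need to handle that case separately.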
    

III. Tags Configuration

  • Each line represents one tag configuration used in VTT
  • Each line must contain all 14 fields, in the correct format and with legal values
  • A line that starts with # is ignored and treated as a comment
  • The first tag is reserved by VTT (the reserved tag)
    • Its name is predefined as Text/Clear
    • Its Bold|Italic|Underline properties are used for clear markup
    • Its Display property is not used
    • Its foreground and background colors are used for the highlight color
  • A tag is uniquely identified by its name and category in VTT
  • Separator header and reserved (first) tag for Tags Configuration:
    #<---------------------------------------------------------------------->
    #<Tags Configuration>
    #<Name|Category|Bold|Italic|Underline|Display|FR|FG|FB|BR|BG|BB|FontFamily|FontSize>
    #<---------------------------------------------------------------------->
    Text/Clear||false|false|false|true|255|255|255|0|51|153|Monospaced|12
    #<---------------------------------------------------------------------->
    ...
    
  • Tag Fields Format:
    Field  Description       Java Type  Example / legal values
    1      Name              String     Text/Clear
    2      Category          String
    3      Bold              boolean    true, false
    4      Italic            boolean    true, false
    5      Underline         boolean    true, false
    6      Display           boolean    true, false
    7      Foreground-Red    int        0 ~ 255
    8      Foreground-Green  int        0 ~ 255
    9      Foreground-Blue   int        0 ~ 255
    10     Background-Red    int        0 ~ 255
    11     Background-Green  int        0 ~ 255
    12     Background-Blue   int        0 ~ 255
    13     Font Family       String     Dialog, DialogInput, Monospaced, SansSerif, Serif
    14     Font size         String     8, 10, 12, 14, ..., +0, +2, -2
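
As an illustration of the 14-field rule, the sketch below parses one tag configuration line and rejects lines with the wrong field count, non-boolean flags, or color components outside 0 ~ 255. It is only a sketch under the rules stated in this section, not VTT's own parser. `TagConfig` and `parse_tag_line` are hypothetical names. The font size is kept as text because the listed values include relative sizes such as +2 and -2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagConfig:
    name: str
    category: str
    bold: bool
    italic: bool
    underline: bool
    display: bool
    foreground: tuple  # (red, green, blue), each 0 ~ 255
    background: tuple  # (red, green, blue), each 0 ~ 255
    font_family: str
    font_size: str     # kept as text: "12", "+2", "-2", ...


def _to_bool(text: str) -> bool:
    if text not in ("true", "false"):
        raise ValueError(f"expected true or false, got {text!r}")
    return text == "true"


def _to_color(text: str) -> int:
    value = int(text)
    if not 0 <= value <= 255:
        raise ValueError(f"color component out of range: {value}")
    return value


def parse_tag_line(line: str) -> TagConfig:
    """Parse one 14-field, pipe-delimited tag configuration line."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 14:
        raise ValueError(f"expected 14 fields, got {len(fields)}")
    bold, italic, underline, display = (_to_bool(f) for f in fields[2:6])
    colors = [_to_color(f) for f in fields[6:12]]
    return TagConfig(fields[0], fields[1], bold, italic, underline, display,
                     tuple(colors[:3]), tuple(colors[3:]), fields[12], fields[13])
```

Applied to the reserved tag line shown in this section, the sketch yields the name `Text/Clear`, an empty category, a white foreground (255, 255, 255) and a background of (0, 51, 153). Because a tag is identified by name and category together, the pair `(name, category)` is a natural dictionary key.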

IV. Markups Information

  • Each line represents one markup used in VTT
  • A line that starts with # is ignored and treated as a comment
  • Each line must contain correctly formatted data for Fields 1 ~ 5
    • Field 1 (Offset) and Field 2 (Length): must be integers (< 2147483647)
    • Field 3 (tag name) and Field 4 (tag category): the combination must exist in the tags list
    • Field 5 (Annotation): can be empty
    • Leading and trailing spaces are trimmed in Fields 1 ~ 5
  • Field 6 is the tagged text, which is used for NLP purposes and is ignored by VTT
  • Fields 6 and beyond are ignored by VTT and can be used for other NLP purposes
  • The character "|" is not allowed in the first 5 fields
  • No two lines may have the same offset and length (a word can be marked up with only one tag)
  • All lines are sorted by offset (smaller first) and then by length (larger first)
  • Separator header for Markups Information:
    #<---------------------------------------------------------------------->
    #<Markups Information>
    #<Offset|Length|TagName|TagCategory|Annotation|TagText>
    #<---------------------------------------------------------------------->
    ...
    

                 Field 1  Field 2  Field 3   Field 4       Field 5     Field 6      More Fields
    Description  Offset   Length   Tag name  Tag category  Annotation  Tagged text  Other NLP fields
    Java Type    int      int      String    String        String      Not used     Not used
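
The markup rules above can be sketched as a parser plus an ordering check. This is a hedged illustration, not VTT's code: `Markup`, `parse_markup_line` and `check_order` are hypothetical names, `known_tags` is assumed to be a set of `(name, category)` pairs built from the Tags Configuration section, and the tag name `Drug` in the usage example is invented.

```python
from dataclasses import dataclass

MAX_INT = 2147483647  # documented bound for offset and length


@dataclass(frozen=True)
class Markup:
    offset: int
    length: int
    tag_name: str
    tag_category: str
    annotation: str


def parse_markup_line(line, known_tags):
    """Parse Fields 1 ~ 5 of a markup line; Fields 6 and beyond are ignored."""
    fields = [f.strip() for f in line.split("|")]  # spaces are trimmed
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}")
    offset, length = int(fields[0]), int(fields[1])
    if offset >= MAX_INT or length >= MAX_INT:
        raise ValueError("offset and length must be < 2147483647")
    if (fields[2], fields[3]) not in known_tags:
        raise ValueError(f"unknown tag: {fields[2]!r} / {fields[3]!r}")
    return Markup(offset, length, fields[2], fields[3], fields[4])


def check_order(markups):
    """True if sorted by offset (smaller first), then length (larger first),
    and no two markups share the same offset and length."""
    keys = [(m.offset, -m.length) for m in markups]
    return keys == sorted(keys) and len(set(keys)) == len(keys)
```

For example, `parse_markup_line(" 10 | 4 |Drug|| dosage unclear |pill", {("Drug", "")})` keeps offset 10, length 4 and the trimmed annotation, and discards the sixth field. Splitting on "|" only works because that character is not allowed in the first five fields.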