Machine Output (2008) Explained

We made major changes to the MetaMap Machine Output (MMO) with the 2008 release; please see the Changes in MMO (HTML: 44 kb) document for a description of the changes. In summary, we added three new objects (args, aas, and neg_list), positional information, and sources information to the MetaMap Machine Output.

The machine output is the result of the MetaMap/SKR processing executed on a citation or item of free text, and it provides a large amount of raw information about the input data. Each data item is first broken down into its respective sentences, or utterances. Each utterance is then broken down into its respective noun phrases. Each noun phrase is then tagged using the MedPost/SKR tagger to identify nouns, verbs, prepositions, adjectives, punctuation, etc., and to locate the head, or main idea, of the noun phrase. Candidates are identified from the list of variants for each concept. With all of this information in hand, MetaMap maps UMLS concepts to the noun phrase, identifying the best possible coverage. All of this processing information is then printed in the machine output format described below.


The machine output consists of eight (8) main objects (always in this order
and always one line per element):
   args        -- Actual arguments used to create the current MMO file.
   aas         -- List of acronyms/abbreviations found in the input text.
   neg_list    -- List of any negations found in the input text - NOTE that
                  this is currently not being populated, but the empty list
                  is still in the file for use when it becomes available.
   utterance   -- One per sentence
      phrase   -- As many as are needed to complete the sentence/utterance. Always matched up with a "candidates" object and a "mappings" object.
      candidates -- Always matched up with a phrase object and may be empty "candidates([])." depending on the phrase.
      mappings -- Always matched up with a phrase object and may be empty "mappings([])." depending on the phrase.
   'EOU'.      -- Marks the end of the utterance.
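
The ordering above can be walked line by line. The following is a minimal Python sketch, not part of MetaMap: the function name group_mmo_lines and the returned dictionary layout are hypothetical, and it assumes exactly one term per line and a file holding a single input item (a later args/aas/neg_list line would overwrite the earlier one). Lines it does not recognize are skipped.

```python
from typing import Dict, Iterable, List, Optional


def group_mmo_lines(lines: Iterable[str]) -> Dict[str, object]:
    """Group one-term-per-line MMO text into header terms and utterance blocks."""
    header: Dict[str, str] = {}
    utterances: List[dict] = []
    current: Optional[dict] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "'EOU'.":
            # End of utterance: store the block collected so far.
            if current is not None:
                utterances.append(current)
            current = None
            continue
        functor = line.split("(", 1)[0]
        if functor in ("args", "aas", "neg_list"):
            header[functor] = line
        elif functor == "utterance":
            current = {"utterance": line, "phrases": []}
        elif functor == "phrase" and current is not None:
            current["phrases"].append(
                {"phrase": line, "candidates": None, "mappings": None}
            )
        elif functor in ("candidates", "mappings") and current and current["phrases"]:
            # candidates/mappings always belong to the most recent phrase.
            current["phrases"][-1][functor] = line
    return {"header": header, "utterances": utterances}
```

Each phrase entry keeps its raw "candidates" and "mappings" lines next to it, which mirrors the pairing described in the list above.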


Format:

args:

   args('Command-Line call',MetaMapOptions).
      where Command-Line call is simply the command-line call, including the
      (site-specific) absolute pathname of the binary runtime executable.

      MetaMapOptions is a comma-separated list of terms of the form
      OptionName-OptionValue, or simply [] if no options are specified.
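
As an illustration of the OptionName-OptionValue form, here is a small Python sketch that splits an args line into the command-line call and a dictionary of options. The names parse_args_line, ARGS_RE, and OPTION_RE are hypothetical, and the sketch assumes each option value is a quoted atom, a bare atom, or the empty list []; other value shapes would need a fuller term parser.

```python
import re
from typing import Dict, Tuple

# args('<command>',[<options>]).
ARGS_RE = re.compile(r"^args\('((?:[^']|'')*)',\[(.*)\]\)\.$")
# OptionName-OptionValue, where the value is 'quoted', [] or a bare atom.
OPTION_RE = re.compile(r"(\w+)-('(?:[^']|'')*'|\[\]|[^,\[\]]+)")


def parse_args_line(line: str) -> Tuple[str, Dict[str, str]]:
    """Return (command-line call, {option name: option value})."""
    match = ARGS_RE.match(line.strip())
    if match is None:
        raise ValueError("not an args term")
    command, option_text = match.group(1), match.group(2)
    options: Dict[str, str] = {}
    for name, value in OPTION_RE.findall(option_text):
        if value.startswith("'"):
            # Drop the surrounding quotes and undo doubled quotes.
            value = value[1:-1].replace("''", "'")
        options[name] = value
    return command, options
```

An option with no value, such as machine_output-[], is returned with the literal string "[]" as its value.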


aas:

   aas([AcronymsAndAbbreviations]).
      where AcronymsAndAbbreviations is a comma-separated list of 4-tuples of
      the form:   Acronym * Expansion * CountList * CUIList
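
The 4-tuple form can be read with a single regular expression. The sketch below is illustrative only: AcronymEntry, AA_RE, and parse_aas_line are hypothetical names, and it assumes the acronym and expansion are double-quoted strings without embedded double quotes and that CountList holds plain integers.

```python
import re
from typing import List, NamedTuple


class AcronymEntry(NamedTuple):
    acronym: str
    expansion: str
    counts: List[int]
    cuis: List[str]


# "Acronym" * "Expansion" * [CountList] * [CUIList], with optional spaces around *
AA_RE = re.compile(
    r'"([^"]*)"\s*\*\s*"([^"]*)"\s*\*\s*\[([^\]]*)\]\s*\*\s*\[([^\]]*)\]'
)


def parse_aas_line(line: str) -> List[AcronymEntry]:
    """Extract every Acronym * Expansion * CountList * CUIList tuple from an aas line."""
    entries = []
    for acronym, expansion, counts, cuis in AA_RE.findall(line):
        entries.append(
            AcronymEntry(
                acronym,
                expansion,
                [int(n) for n in counts.split(",") if n.strip()],
                [c.strip().strip("'") for c in cuis.split(",") if c.strip()],
            )
        )
    return entries
```

An empty term, aas([])., yields an empty list.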


neg_list:

   neg_list(ListOfNegations).
      As of this release, the neg_list term is simply a placeholder of the form
      neg_list([]). Once Negex is fully incorporated into MetaMap, however,
      ListOfNegations will be a comma-separated list of terms of the form

      negation(<type of negation>, <negation trigger>, <trigger positional info>,
               <negated concept>, <concept positional info>)


utterance:

   utterance('sentence/utterance identifier',"sentence/utterance",Positional Information).
      where sentence/utterance identifier consists of three period-separated parts:
           MEDLINE ID.Location marker.Sequence number
      The Location marker is ti (title) or ab (abstract), and the sequence
      number is a one-up number within each section.
      The Positional Information consists of StartPos/SpanLength
      where StartPos is a zero-based offset into the original text.
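
To make the identifier and StartPos/SpanLength notation concrete, here is a short Python sketch. The names UtteranceId, parse_utterance_id, parse_position, and extract_span are hypothetical. It assumes the identifier always ends in ".section.number" and that offsets count characters of the original text; whether offsets are characters or bytes is not stated here, so treat that as an assumption.

```python
from typing import NamedTuple, Tuple


class UtteranceId(NamedTuple):
    item_id: str   # e.g. the MEDLINE ID
    section: str   # ti (title) or ab (abstract)
    number: int    # one-up number within the section


def parse_utterance_id(identifier: str) -> UtteranceId:
    """Split an identifier such as '9496399.ab.1' into its three parts."""
    item_id, section, number = identifier.rsplit(".", 2)
    return UtteranceId(item_id, section, int(number))


def parse_position(position: str) -> Tuple[int, int]:
    """Turn 'StartPos/SpanLength' into a (start, length) pair of integers."""
    start, length = position.split("/")
    return int(start), int(length)


def extract_span(original_text: str, position: str) -> str:
    """Return the slice of the original text that the position refers to."""
    start, length = parse_position(position)
    return original_text[start:start + length]
```

Splitting from the right keeps the sketch usable if the first part of the identifier itself contains a period.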


phrase:

    phrase('identified noun phrase',[tagging information],Positional Information).
      The Positional Information consists of StartPos/SpanLength
      where StartPos is a zero-based offset into the original text.


candidates:

    candidates(
       [ev(negated candidate score,'UMLS concept ID','UMLS concept','preferred name for concept - may or may not be different',
           [matched word or words lowercased that this candidate matches in the phrase - comma separated list],
           [semantic type(s) - comma separated list],
           [match map list - see below],
           candidate involved with head of phrase - yes or no,
           is this an overmatch - yes or no,
           [Source List - comma separated list],
           Positional Information
          )
       ]
    ).
    The Positional Information consists of StartPos/SpanLength
    where StartPos is a zero-based offset into the original text.
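
The ev(...) layout above can be turned into named fields with a small term reader. What follows is a rough Python sketch under stated assumptions, not a full Prolog parser: it handles only integers, StartPos/SpanLength pairs, quoted and bare atoms, lists, and compound terms, which is enough for a candidates line of the shape described here. All names (tokenize, parse, candidates_from_line, EV_FIELDS, and the field names themselves) are hypothetical. The "score" field is simply the negated candidate score with its sign flipped.

```python
import re

TOKEN_RE = re.compile(
    r"\s*(?:(?P<pos>\d+/\d+)|(?P<int>-?\d+)|'(?P<quoted>(?:[^']|'')*)'"
    r"|(?P<atom>[A-Za-z_][A-Za-z0-9_]*)|(?P<punct>[\[\](),]))"
)
EV_FIELDS = (
    "negated_score", "cui", "concept", "preferred_name", "matched_words",
    "semantic_types", "match_map", "involves_head", "is_overmatch",
    "sources", "positions",
)


def tokenize(text):
    """Split one term into (kind, value) tokens, dropping the final period."""
    text = text.strip()
    if text.endswith("."):
        text = text[:-1]
    tokens, i = [], 0
    while i < len(text):
        m = TOKEN_RE.match(text, i)
        if m is None:
            raise ValueError(f"unexpected text at offset {i}: {text[i:i + 20]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        i = m.end()
    return tokens


def parse(tokens, i=0):
    """Parse one term starting at tokens[i]; return (value, next index)."""
    kind, value = tokens[i]
    if kind == "pos":
        start, length = value.split("/")
        return (int(start), int(length)), i + 1
    if kind == "int":
        return int(value), i + 1
    if (kind, value) == ("punct", "["):
        return parse_sequence(tokens, i + 1, "]")
    if kind in ("atom", "quoted"):
        name = value.replace("''", "'") if kind == "quoted" else value
        if i + 1 < len(tokens) and tokens[i + 1] == ("punct", "("):
            args, i = parse_sequence(tokens, i + 2, ")")
            return (name, args), i
        return name, i + 1
    raise ValueError(f"unexpected token {value!r}")


def parse_sequence(tokens, i, closer):
    """Parse comma-separated terms up to the closing bracket or parenthesis."""
    items = []
    if tokens[i] == ("punct", closer):
        return items, i + 1
    while True:
        item, i = parse(tokens, i)
        items.append(item)
        token = tokens[i]
        i += 1
        if token == ("punct", closer):
            return items, i
        if token != ("punct", ","):
            raise ValueError(f"expected ',' or {closer!r}, got {token[1]!r}")


def candidates_from_line(line):
    """Return one dict per ev(...) term in a candidates line."""
    (functor, args), _ = parse(tokenize(line))
    if functor != "candidates":
        raise ValueError("not a candidates term")
    records = []
    for _name, fields in args[0]:
        record = dict(zip(EV_FIELDS, fields))
        record["score"] = -record["negated_score"]
        records.append(record)
    return records
```

Double-quoted strings and unquoted punctuation are deliberately not handled, so utterance lines and punctuation phrases need separate treatment.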


mappings:

    mappings(
      [map(negated overall score for this mapping, 
            [ev(negated candidate score,'UMLS concept ID','UMLS concept','preferred name for concept - may or may not be different',
                 [matched word or words lowercased that this candidate matches in the phrase - comma separated list],
                 [semantic type(s) - comma separated list],
                 [match map list - see below],
                 candidate involved with head of phrase - yes or no,
                 is this an overmatch - yes or no,
                 [Source List - comma separated list],
                 Positional Information
               )
            ]
          )
      ]
    ).
    The Positional Information consists of StartPos/SpanLength
    where StartPos is a zero-based offset into the original text.


Match Map List:  The match map list describes how the words of the candidate concept line up with the words
                 in the original phrase and whether the match involves any lexical variation. NOTE: The span
                 word counts don't include the following syntactic elements: aux, compl, conj, det, modal, prep,
                 pron, and punc, which are ignored by MetaMap.  For example, in the phrase "of the drug therapy",
                 the word "drug" would be counted as word #1 and the word "therapy" would be word #2.

    [[[phrase word span begin,phrase word span end],[concept word span begin,concept word span end],variation]]

    Example: This mapping shows word 1 of the phrase maps to word 1 of the concept with 0 lexical variation

      [[[1,1],[1,1],0]]
         ^^^ Word span in the phrase (TEXT)
               ^^^ Word span in the concept (STRING)
                    ^ Variation

    Example: This shows word 2 of the phrase maps to word 1 of the concept with 0 lexical variation and word 3
             of the phrase maps to word 2 of the concept with 0 lexical variation.

       [[[2,2],[1,1],0],[[3,3],[2,2],0]]
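
The word counting and span lookup described in this section can be sketched in Python as follows. The names IGNORED_TAGS, counted_words, and align are hypothetical, the (word, tag) input format is an assumption made for the sketch, and so are the "mod" and "head" tags used for the two counted words. Spans are 1-based and inclusive, as in the examples above.

```python
from typing import List, Sequence, Tuple

# Syntactic elements that are not counted when numbering phrase words.
IGNORED_TAGS = {"aux", "compl", "conj", "det", "modal", "prep", "pron", "punc"}


def counted_words(tagged_words: Sequence[Tuple[str, str]]) -> List[str]:
    """Keep only the words that take part in match map word numbering."""
    return [word for word, tag in tagged_words if tag not in IGNORED_TAGS]


def align(match_map, phrase_words: List[str], concept_words: List[str]):
    """Resolve each match map entry to the phrase and concept words it covers."""
    aligned = []
    for (p_begin, p_end), (c_begin, c_end), variation in match_map:
        aligned.append(
            (
                phrase_words[p_begin - 1:p_end],    # 1-based, inclusive span
                concept_words[c_begin - 1:c_end],
                variation,
            )
        )
    return aligned


# "of the drug therapy": "of" (prep) and "the" (det) are not counted.
words = counted_words(
    [("of", "prep"), ("the", "det"), ("drug", "mod"), ("therapy", "head")]
)
pairs = align([[[1, 1], [1, 1], 0], [[2, 2], [2, 2], 0]], words, ["Drug", "Therapy"])
```

Here words holds "drug" as word 1 and "therapy" as word 2, and each entry in pairs couples a phrase word span with the concept word span it matches and the variation value.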

Sample Output:

args('/nfsvol/nls/bin/metamap08 -Z 08 -q MMO_Help.txt MMO_Help.mmo_08',[mm_data_year-'08',machine_output-[],infile-'MMO_Help.txt',outfile-'MMO_Help.mmo_08']).
aas(["TDM"*"therapeutic drug monitoring" *[1,3,5,27]*['C1720825']]).
neg_list([]).
utterance('9496399.ab.1',"OBJECTIVE: To define the total allowable variability that is clinically tolerated for certain drug assays performed by the therapeutic drug monitoring (TDM) laboratory at our institution.",148/188).
phrase('OBJECTIVE',[head([lexmatch([objective]),inputmatch(['OBJECTIVE']),tag(noun),tokens([objective])])],148/9).
candidates([ev(-1000,'C0018017','Objective','Goals',[objective],[inpr],[[[1,1],[1,1],0]],yes,no,['AOD','MTH','MSH','PSY','SNOMEDCT','NCI','RCD'],[148/9])]).
mappings([map(-1000,[ev(-1000,'C0018017','Objective','Goals',[objective],[inpr],[[[1,1],[1,1],0]],yes,no,['AOD','MTH','MSH','PSY','SNOMEDCT','NCI','RCD'],[148/9])]) ...]).
phrase(:,[punc([inputmatch([:]),tokens([])])]).
candidates([]).
mappings([]).
...

'EOU'.

Generated via "metamap08 -q" using 2008AA UMLS Knowledge Source data