Machine Output (prior to MetaMap 2008) Explained

The machine output is the result of MetaMap/SKR processing of a citation or an item of free text, and it provides a large amount of raw information about the input data. Each data item is first broken down into its sentences, or utterances. Each utterance is then broken down into its noun phrases. Each noun phrase is then tagged by the MedPost/SKR tagger to identify nouns, verbs, prepositions, adjectives, punctuation, etc., and to locate the head (main idea) of the noun phrase. Candidates are identified from the list of variants for each concept. With all of this information, MetaMap maps UMLS concepts to the noun phrase, identifying the best possible coverage. All of this processing information is then printed in the machine output format shown below.


The machine output consists of five (5) main objects (always in this order):
   utterance   -- One per sentence
      phrase   -- As many as are needed to complete the sentence/utterance. Always matched up with a "candidates" object and a "mappings" object.
      candidates -- Always matched up with a phrase object. May be empty ("candidates([]).") depending on the phrase.
      mappings -- Always matched up with a phrase object. May be empty ("mappings([]).") depending on the phrase.
   'EOU'.      -- Marks the end of the utterance.
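
The ordering above can be sketched in Python as a small grouping routine. This is only an illustration, not part of MetaMap: the function name and the record layout are hypothetical, and it assumes each object sits on a single line, as in the sample output at the end of this document.

```python
def group_machine_output(lines):
    """Group machine-output lines into one record per utterance.

    Assumes one object per line; each phrase is followed by its
    candidates and mappings objects, and 'EOU'. closes the utterance.
    """
    utterances = []
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("utterance("):
            current = {"utterance": line, "phrases": []}
        elif current is None:
            continue  # ignore anything outside an utterance
        elif line.startswith("phrase("):
            current["phrases"].append(
                {"phrase": line, "candidates": None, "mappings": None}
            )
        elif line.startswith("candidates(") and current["phrases"]:
            current["phrases"][-1]["candidates"] = line
        elif line.startswith("mappings(") and current["phrases"]:
            current["phrases"][-1]["mappings"] = line
        elif line == "'EOU'.":
            utterances.append(current)
            current = None
    return utterances


sample_lines = [
    "utterance('9496399.ab.1',\"OBJECTIVE: To define ...\").",
    "phrase(:,[punc([inputmatch([:]),tokens([])])]).",
    "candidates([]).",
    "mappings([]).",
    "'EOU'.",
]
records = group_machine_output(sample_lines)
```

Each record keeps the raw text of its objects, so a phrase can always be read together with the candidates and mappings that belong to it.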


Format:

utterance:

   utterance('sentence/utterance identifier',"sentence/utterance").
      where sentence/utterance identifier consists of:
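           MEDLINE ID.location marker (ti for title or ab for abstract).sequential (one-up) number within that section
           For example, '9496399.ab.1' is utterance 1 of the abstract of citation 9496399.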
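
As a rough illustration, the identifier can be split into its three parts in Python. The function name and the dictionary keys are hypothetical; splitting from the right is a precaution in case the ID part itself contains a period.

```python
def parse_utterance_id(identifier):
    """Split 'ID.section.number' into its three parts."""
    # Split from the right so a period inside the ID does not break parsing.
    item_id, section, number = identifier.rsplit(".", 2)
    section_names = {"ti": "title", "ab": "abstract"}
    return {
        "id": item_id,
        "section": section_names.get(section, section),
        "number": int(number),
    }


parsed = parse_utterance_id("9496399.ab.1")
```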


phrase:

    phrase('identified noun phrase',[tagging information]).


candidates:

    candidates(
       [ev(negated candidate score,'UMLS concept ID','UMLS concept','preferred name for concept - may or may not be different',
           [matched word or words lowercased that this candidate matches in the phrase - comma separated list],
           [semantic type(s) - comma separated list],
           [match map list - see below],candidate involved with head of phrase - yes or no,
           is this an overmatch - yes or no
          )
       ]
    ).


mappings:

    mappings(
      [map(negated overall score for this mapping, 
            [ev(negated candidate score,'UMLS concept ID','UMLS concept','preferred name for concept - may or may not be different',
                 [matched word or words lowercased that this candidate matches in the phrase - comma separated list],
                 [semantic type(s) - comma separated list],
                 [match map list - see below],candidate involved with head of phrase - yes or no,
                 is this an overmatch - yes or no
               )
            ]
          )
      ]
    ).
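
The candidates and mappings formats above are nested terms, so a small recursive parser is enough to read them. The Python sketch below is a simplified illustration, not MetaMap's own code: the function names and the field names in EV_FIELDS are hypothetical labels for the nine ev() arguments, and the parser assumes a single-line term with no whitespace between arguments and no escaped quotes inside quoted atoms.

```python
import re

# Illustrative labels for the nine ev() arguments, in the order shown above.
EV_FIELDS = ["score", "cui", "concept", "preferred_name", "matched_words",
             "semantic_types", "match_map", "involves_head", "overmatch"]


def parse_term(text, pos=0):
    """Parse one term starting at text[pos]; return (value, next position)."""
    if text[pos] == "[":                         # list
        return parse_items(text, pos + 1, "]")
    if text[pos] == "'":                         # quoted atom
        end = text.index("'", pos + 1)
        return text[pos + 1:end], end + 1
    end = pos                                    # bare atom, number, or functor
    while end < len(text) and text[end] not in ",()[]":
        end += 1
    name = text[pos:end]
    if end < len(text) and text[end] == "(":     # compound term such as ev(...)
        args, end = parse_items(text, end + 1, ")")
        return (name, args), end
    if re.fullmatch(r"-?\d+", name):
        return int(name), end
    return name, end


def parse_items(text, pos, closer):
    """Parse comma-separated terms up to the closing bracket or parenthesis."""
    items = []
    if text[pos] == closer:
        return items, pos + 1
    while True:
        value, pos = parse_term(text, pos)
        items.append(value)
        if text[pos] == closer:
            return items, pos + 1
        pos += 1                                 # skip the comma


def ev_to_dict(ev):
    """Turn a parsed ('ev', [...]) term into a dictionary."""
    _, args = ev
    record = dict(zip(EV_FIELDS, args))
    record["score"] = -record["score"]           # scores are printed negated
    record["involves_head"] = record["involves_head"] == "yes"
    record["overmatch"] = record["overmatch"] == "yes"
    return record


line = ("candidates([ev(-1000,'C0018017','Objective','Goals',[objective],"
        "[inpr],[[[1,1],[1,1],0]],yes,no)]).")
term, _ = parse_term(line)
candidates = [ev_to_dict(ev) for ev in term[1][0]]
```

A mappings line parses the same way: each element of the outer list is a ('map', [negated score, list of ev terms]) pair, and the inner ev terms can be passed to ev_to_dict unchanged.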


Match Map List:  The match map list describes how the candidate concept matches up to words
                 in the original phrase and whether there is any lexical variation in the matching.
                 NOTE: The span word counts do not include the following syntactic elements, which
                 MetaMap ignores: aux, compl, conj, det, modal, prep, pron, and punc. For example,
                 in the phrase "of the drug therapy", the word "drug" is counted as word #1 and the
                 word "therapy" as word #2.
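
The counting rule in the note can be illustrated with a short Python sketch. The function name is hypothetical, and the input is assumed to be a list of (word, syntactic element) pairs; the labels "mod" and "head" for the two counted words are assumptions made for this example.

```python
# Syntactic elements that are skipped when phrase words are counted.
IGNORED_ELEMENTS = {"aux", "compl", "conj", "det", "modal", "prep", "pron", "punc"}


def number_counted_words(tagged_words):
    """Return {word number: word}, skipping the ignored syntactic elements."""
    numbered = {}
    position = 0
    for word, element in tagged_words:
        if element in IGNORED_ELEMENTS:
            continue
        position += 1
        numbered[position] = word
    return numbered


# "of the drug therapy": "of" (prep) and "the" (det) are not counted.
counted = number_counted_words(
    [("of", "prep"), ("the", "det"), ("drug", "mod"), ("therapy", "head")]
)
```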

    [[[phrase word span begin,phrase word span end],[concept word span begin,concept word span end],variation]]

    Example: This mapping shows word 1 of the phrase maps to word 1 of the concept with 0 lexical variation

      [[[1,1],[1,1],0]]
         ^^^ Match up of words in TEXT (the phrase)
               ^^^ Match up of words in STRING (the concept)
                    ^ Variation

    Example: This shows word 2 of the phrase maps to word 1 of the concept with 0 lexical variation and word 3
             of the phrase maps to word 2 of the concept with 0 lexical variation.

       [[[2,2],[1,1],0],[[3,3],[2,2],0]]
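
Because a match map list contains only nested brackets and integers, it can be read as JSON. The Python sketch below assumes the list is written with its outer brackets, as in the format line above; the function name and dictionary keys are hypothetical.

```python
import json


def parse_match_map(text):
    """Convert a match map list into a list of labeled entries."""
    entries = json.loads(text)
    return [
        {
            "phrase_span": tuple(phrase_span),    # [begin, end] in the phrase
            "concept_span": tuple(concept_span),  # [begin, end] in the concept
            "variation": variation,               # lexical variation
        }
        for phrase_span, concept_span, variation in entries
    ]


match_map = parse_match_map("[[[2,2],[1,1],0],[[3,3],[2,2],0]]")
```

Here the first entry says that phrase word 2 matches concept word 1, and the second that phrase word 3 matches concept word 2, both with 0 variation.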

Example:

utterance('9496399.ab.1',"OBJECTIVE: To define the total allowable variability that is clinically tolerated for certain drug assays performed by the therapeutic drug monitoring (TDM) laboratory at our institution.").
phrase('OBJECTIVE',[head([lexmatch([objective]),inputmatch(['OBJECTIVE']),tag(adj),tokens([objective])])]).
candidates([ev(-1000,'C0018017','Objective','Goals',[objective],[inpr],[[[1,1],[1,1],0]],yes,no)]).
mappings([map(-1000,[ev(-1000,'C0018017','Objective','Goals',[objective],[inpr],[[[1,1],[1,1],0]],yes,no)])]).
phrase(:,[punc([inputmatch([:]),tokens([])])]).
candidates([]).
mappings([]).
...

'EOU'.