Three basic approaches can be used to map a UMLS term to MeSH: through synonyms, through associated expressions, and through interconcept relationships. These approaches can be combined into a strategy that maximizes both specificity (selected MeSH terms are relevant) and sensitivity (the number of concepts that fail to be mapped to MeSH is small).
Strategy
The overall strategy can be understood as involving four steps. For a given UMLS concept, referred to as the source concept (SC), the path to the most closely related MeSH terms utilizes the following steps in this order:
Mapping algorithm
The mapping algorithm can be visualized as building a graph of ancestors with the SC as the starting point, or seed; the closest MeSH terms are then selected from this graph. Concepts other than the SC itself can also be used to seed the graph. When no MeSH term can be found from the graph seeded by the SC, the children and narrower concepts of the SC are used together as the seed. If that also fails to yield a MeSH term, a new graph is generated using the siblings of the SC as the seed.
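The order in which seeds are tried can be sketched as follows. This is a minimal illustration only: `map_with_fallback`, `toy_select`, and the concept names are hypothetical, and the function that actually builds the graph and selects MeSH terms is passed in as a parameter rather than implemented here.

```python
from typing import Callable, Iterable, List, Set


def map_with_fallback(
    seed_builders: Iterable[Callable[[], Set[str]]],
    select_mesh: Callable[[Set[str]], List[str]],
) -> List[str]:
    """Try each seed in order and return the first non-empty MeSH selection."""
    for build_seed in seed_builders:
        seed = build_seed()
        if not seed:
            continue  # e.g. the SC has no children or no siblings
        mesh_terms = select_mesh(seed)
        if mesh_terms:
            return mesh_terms
    return []


def toy_select(seed: Set[str]) -> List[str]:
    # Stand-in for "build the ancestor graph, then select MeSH terms":
    # here only the sibling-seeded graph is assumed to reach a MeSH term.
    return ["MeSH:X"] if "sibling_1" in seed else []


result = map_with_fallback(
    [
        lambda: {"SC"},                     # 1. the source concept itself
        lambda: {"child_1", "narrower_1"},  # 2. children and narrower concepts
        lambda: {"sibling_1"},              # 3. siblings
    ],
    toy_select,
)
```

Passing the seeds as callables means the children or siblings are looked up only if the earlier seeds have failed.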
When concepts other than the SC itself are used as the seed for the graph, they must be compatible with the SC in semantic type assignment. A concept is compatible if at least one of its semantic types (STs) is identical to, or has an "inverse_isa" relationship in the Semantic Network to, at least one of the STs of the SC. Siblings of the SC must have at least one ST in common with the SC to be used as the seed of the graph.
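A compatibility test of this kind might look like the sketch below. It rests on two assumptions that the text does not state: that "inverse_isa" means the candidate's ST is a more general type than the SC's ST in the isa hierarchy, and that the relationship is followed transitively. The `ISA_PARENT` table is a small illustrative fragment, not the full Semantic Network, and the function names are hypothetical.

```python
from typing import Dict, Set

# Illustrative fragment of an isa hierarchy: child ST -> parent ST.
ISA_PARENT: Dict[str, str] = {
    "Body Part, Organ, or Organ Component": "Fully Formed Anatomical Structure",
    "Fully Formed Anatomical Structure": "Anatomical Structure",
    "Body Location or Region": "Spatial Concept",
}


def st_ancestors(st: str, isa_parent: Dict[str, str]) -> Set[str]:
    """Return every ST reachable from `st` by following isa links upward."""
    found: Set[str] = set()
    while st in isa_parent:
        st = isa_parent[st]
        if st in found:  # guard against a malformed, cyclic table
            break
        found.add(st)
    return found


def is_compatible(
    candidate_sts: Set[str],
    sc_sts: Set[str],
    isa_parent: Dict[str, str] = ISA_PARENT,
) -> bool:
    """True if some candidate ST equals, or is an isa-ancestor of, some SC ST."""
    for sc_st in sc_sts:
        allowed = {sc_st} | st_ancestors(sc_st, isa_parent)
        if candidate_sts & allowed:
            return True
    return False


sc_types = {"Body Part, Organ, or Organ Component"}
same_type_ok = is_compatible({"Body Part, Organ, or Organ Component"}, sc_types)
broader_type_ok = is_compatible({"Anatomical Structure"}, sc_types)
unrelated_ok = is_compatible({"Spatial Concept"}, sc_types)
```

Under this reading the test is directional: a candidate typed "Anatomical Structure" is compatible with an SC typed "Body Part, Organ, or Organ Component", but not the other way around.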
Step 1: Building the graph of the ancestors of the SC
The ancestors of a given concept can be represented as a directed graph, ideally acyclic. Starting from the seed, its parents and broader concepts are added to the graph. Then, recursively, parents and broader concepts of all newly added concepts are added, until no new concept can be found.
To prevent irrelevant concepts from being added to the graph, the semantic types of any concept added to the graph must be compatible with those of its direct descendant in the graph.
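The recursive expansion can be sketched as a breadth-first traversal. This is an assumption-laden illustration rather than the original implementation: `parents_of` is a hypothetical lookup that returns both parents and broader concepts, `compatible` stands in for the semantic type check, and the concept names are placeholders.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set


def build_ancestor_graph(
    seed: Iterable[str],
    parents_of: Dict[str, List[str]],
    compatible: Callable[[str, str], bool] = lambda child, parent: True,
) -> Dict[str, Set[str]]:
    """Map each reachable concept to the parents/broader concepts accepted for it."""
    graph: Dict[str, Set[str]] = {concept: set() for concept in seed}
    queue = deque(graph)
    while queue:
        child = queue.popleft()
        for parent in parents_of.get(child, ()):
            # Skip concepts whose semantic types do not fit their direct descendant.
            if not compatible(child, parent):
                continue
            graph[child].add(parent)
            if parent not in graph:  # expand each concept once, so cycles terminate
                graph[parent] = set()
                queue.append(parent)
    return graph


# Toy hierarchy with a deliberate cycle (G1 -> P1) to show that expansion stops.
toy_parents = {"SC": ["P1", "B1"], "P1": ["G1"], "B1": ["G1"], "G1": ["P1"]}

full_graph = build_ancestor_graph({"SC"}, toy_parents)
filtered_graph = build_ancestor_graph(
    {"SC"}, toy_parents, compatible=lambda child, parent: parent != "B1"
)
```

Because each concept is queued only the first time it is seen, the traversal ends "when no new concept can be found" even if the source hierarchy is not strictly acyclic.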
Step 2: Selecting MeSH terms from the ancestors
The graph of the ancestors is first restricted to MeSH terms (identified through synonyms or associated expressions). Then, to prevent the MeSH terms from coming from only one part of the seed, that is, one particular child or sibling, the selected MeSH terms must be common to a certain percentage of the seed concepts that have MeSH ancestors. Finally, MeSH candidates that are ancestors of other candidates are removed. The selected MeSH terms are thus ensured to be semantically as close as possible to the SC. This measure of closeness relies on semantics rather than on the number of nodes between the two concepts, which is biased by the difference in granularity between components of the UMLS.
Figure 2 shows how "Neck" and "Veins" are selected from the ancestors of "Vein of neck, NOS". Although the MeSH term "Head" is at the same distance as "Veins", it is not selected because it is an ancestor of another selected term ("Neck").
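The three selection rules can be sketched as follows. Several details are assumptions: the graph is a plain dictionary from each concept to its parents, only strict ancestors of the seed are considered, `min_share` is a hypothetical name for the unspecified percentage, and the graph is taken to be acyclic. The edges in `toy_graph` are invented to mirror the outcome described for "Vein of neck, NOS"; they are not taken from Figure 2.

```python
from typing import Callable, Dict, Iterable, Set


def ancestors_of(concept: str, graph: Dict[str, Set[str]]) -> Set[str]:
    """Return all strict ancestors of `concept` in a child -> parents graph."""
    seen: Set[str] = set()
    stack = list(graph.get(concept, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def select_mesh_terms(
    seed: Iterable[str],
    graph: Dict[str, Set[str]],
    is_mesh: Callable[[str], bool],
    min_share: float = 0.5,
) -> Set[str]:
    # Rule 1: restrict the ancestors of each seed concept to MeSH terms.
    per_seed = [{a for a in ancestors_of(s, graph) if is_mesh(a)} for s in seed]
    with_mesh = [terms for terms in per_seed if terms]
    if not with_mesh:
        return set()
    # Rule 2: keep terms shared by enough of the seed concepts that have MeSH ancestors.
    candidates = set().union(*with_mesh)
    common = {
        t for t in candidates
        if sum(t in terms for terms in with_mesh) / len(with_mesh) >= min_share
    }
    # Rule 3: drop any candidate that is an ancestor of another candidate.
    return {
        t for t in common
        if not any(t in ancestors_of(other, graph) for other in common if other != t)
    }


toy_graph = {
    "Vein of neck": {"Neck", "Veins", "Head"},
    "Neck": {"Head"},
    "Veins": {"Blood Vessels"},
    "Head": set(),
    "Blood Vessels": set(),
}
mesh_terms = {"Neck", "Veins", "Head", "Blood Vessels"}

selected = select_mesh_terms({"Vein of neck"}, toy_graph, mesh_terms.__contains__)
```

In this toy graph "Head" is as close to the seed as "Veins", yet it is discarded because it is an ancestor of "Neck", which is the behaviour the text attributes to the semantic notion of closeness.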