
Restrict to MeSH Algorithm

Three basic approaches can be used to map a UMLS term to MeSH: through synonyms, through associated expressions, and through interconcept relationships. These approaches can be combined into a strategy that maximizes both specificity (selected MeSH terms are relevant) and sensitivity (the number of concepts that fail to be mapped to MeSH is small).

Strategy

The overall strategy involves four steps. For a given UMLS concept, referred to as the source concept (SC), the most closely related MeSH terms are sought through the following steps, applied in this order:

  1. A MeSH term is a synonym of the SC. The two terms share the same identifier in the Metathesaurus (CUI). This MeSH term is selected and no further search is performed.

  2. An associated expression (ATX) provides a translation of the SC. The ATX can be understood as an expression tree in which the leaves are elementary concepts and the nodes are either logical operators or indicators of a main heading to subheading relationship (Figure 1). For mapping to MeSH headings, all MeSH leaves are selected, except those under a negative (NOT) operator. For example, the concept "Mumps pancreatitis" is mapped to the following MeSH terms: "Mumps" and "Pancreatitis" (main headings), "complication" and "etiology" (subheadings).

  3. The SC has hierarchically related concepts from which MeSH terms can be selected. This method is detailed under mapping algorithm.

  4. Finally, if no MeSH term can be found from the ancestors, the non-hierarchically related concepts (RO concepts) are explored. These concepts are related to the SC, but the exact nature of this relationship has not been explicitly given. Steps 1 to 3 are then applied to each RO concept linked to the SC. For example, "Choroidal detachment, NOS" is related to the MeSH term "Retinal Detachment".
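
The ordering of the four steps can be sketched as a simple cascade. This is a minimal illustration, not the actual implementation: the function names, the `lookups` list, and the toy dictionaries are all hypothetical, and steps 2 and 3 are stubbed out. It also assumes that the RO concept reaches "Retinal Detachment" through a synonym, which the text does not specify.

```python
from typing import Callable, Iterable, List, Set

# A lookup takes a concept identifier and returns the MeSH terms it finds.
Lookup = Callable[[str], Set[str]]


def map_direct(concept: str, lookups: List[Lookup]) -> Set[str]:
    """Try steps 1 to 3 in order; stop at the first step that yields MeSH terms."""
    for lookup in lookups:
        terms = lookup(concept)
        if terms:
            return terms
    return set()


def restrict_to_mesh(
    concept: str,
    lookups: List[Lookup],
    ro_concepts: Callable[[str], Iterable[str]],
) -> Set[str]:
    """Steps 1 to 3 on the source concept, then step 4 on its RO concepts."""
    terms = map_direct(concept, lookups)
    if terms:
        return terms
    found: Set[str] = set()
    for other in ro_concepts(concept):
        found |= map_direct(other, lookups)
    return found


# Toy data (hypothetical identifiers, for illustration only).
SYNONYMS = {"retinal-detachment-concept": {"Retinal Detachment"}}
RO = {"Choroidal detachment, NOS": ["retinal-detachment-concept"]}

lookups = [
    lambda c: SYNONYMS.get(c, set()),  # step 1: MeSH synonym
    lambda c: set(),                   # step 2: associated expression (stub)
    lambda c: set(),                   # step 3: hierarchical search (stub)
]
result = restrict_to_mesh(
    "Choroidal detachment, NOS", lookups, lambda c: RO.get(c, [])
)
print(result)
```

Keeping each step behind the same call signature makes it easy to reuse steps 1 to 3 unchanged when step 4 applies them to each RO concept.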

Figure 1 - Expression tree for the associated expression describing the concept: "Mumps pancreatitis". The main heading (MH) is qualified by (QB) a subheading (SH). The 2 MH/SH expressions are combined with a logical operator (AND).
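
The leaf selection described in step 2 can be sketched as a recursive walk over the expression tree. The `Leaf` and `Node` classes and the `mesh_leaves` function are hypothetical names introduced for this illustration, and the pairing of each subheading with a main heading in the example tree is an assumption, since the text lists the four terms without stating which subheading qualifies which heading.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    term: str
    is_mesh: bool = True


@dataclass
class Node:
    op: str  # "AND", "OR", "NOT", or "QB" (main heading qualified by subheading)
    children: List[Union["Node", Leaf]]


def mesh_leaves(tree: Union[Node, Leaf], negated: bool = False) -> List[str]:
    """Collect MeSH leaves, skipping every leaf that sits under a NOT operator."""
    if isinstance(tree, Leaf):
        return [tree.term] if tree.is_mesh and not negated else []
    negated = negated or tree.op == "NOT"
    selected: List[str] = []
    for child in tree.children:
        selected.extend(mesh_leaves(child, negated))
    return selected


# Assumed shape of the "Mumps pancreatitis" expression: two MH/SH pairs under AND.
mumps_pancreatitis = Node("AND", [
    Node("QB", [Leaf("Mumps"), Leaf("complication")]),
    Node("QB", [Leaf("Pancreatitis"), Leaf("etiology")]),
])
print(mesh_leaves(mumps_pancreatitis))
```

Passing the `negated` flag down the recursion means a leaf is excluded whenever any operator above it is a NOT, however deep the leaf sits.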


Mapping algorithm

The mapping algorithm can be visualized as building a graph of ancestors, with the SC as the initial point, or seed, of the graph. The closest MeSH terms are then selected from this graph. Concepts other than the SC itself can also be used to seed the graph of ancestors. When no MeSH term can be found from the graph seeded by the SC, the children and narrower concepts of the SC are used together as the seed. If that method also fails to find a MeSH term, a new graph is generated using the siblings of the SC as the seed.
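
The order in which seeds are tried can be sketched as follows. The function `map_with_fallback_seeds` and its `select_from_seed` parameter are hypothetical; `select_from_seed` stands in for the graph building and MeSH selection described in the two steps that follow.

```python
from typing import Callable, List, Set


def map_with_fallback_seeds(
    sc: str,
    children_and_narrower: List[str],
    siblings: List[str],
    select_from_seed: Callable[[List[str]], Set[str]],
) -> Set[str]:
    """Try the SC, then its children and narrower concepts, then its siblings."""
    for seed in ([sc], children_and_narrower, siblings):
        if not seed:
            continue  # nothing to seed the graph with at this stage
        terms = select_from_seed(seed)
        if terms:
            return terms
    return set()


# Toy stand-in: only a graph seeded with "sibling-1" reaches a MeSH term.
def toy_select(seed: List[str]) -> Set[str]:
    return {"Some MeSH Term"} if "sibling-1" in seed else set()


print(map_with_fallback_seeds("sc", ["child-1"], ["sibling-1"], toy_select))
```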

When concepts other than the SC itself are used as the seed for the graph, they must be compatible with the SC in semantic type assignment. A concept is compatible when at least one of its semantic types (STs) is identical to, or has an "inverse_isa" relationship in the Semantic Network to, at least one of the STs of the SC. Siblings of the SC are held to a stricter rule: they must have at least one ST in common with the SC to be used as the seed of the graph.
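
The two compatibility rules can be sketched as set tests. This is one reading of the definition, with hypothetical function names and placeholder semantic type names; the `inverse_isa` argument is assumed to be a set of (candidate ST, SC ST) pairs supplied by the caller, and whether it holds only direct links or their transitive closure is left to whoever builds it.

```python
from typing import Iterable, Set, Tuple


def is_compatible(
    candidate_sts: Iterable[str],
    sc_sts: Iterable[str],
    inverse_isa: Set[Tuple[str, str]],
) -> bool:
    """True if some candidate ST equals, or is inverse_isa-related to, some SC ST."""
    sc_list = list(sc_sts)
    return any(
        c == s or (c, s) in inverse_isa
        for c in candidate_sts
        for s in sc_list
    )


def sibling_is_compatible(sibling_sts: Iterable[str], sc_sts: Iterable[str]) -> bool:
    """Siblings need at least one semantic type in common with the SC."""
    return bool(set(sibling_sts) & set(sc_sts))


# Placeholder semantic types: "ST-general" inverse_isa "ST-specific".
INVERSE_ISA = {("ST-general", "ST-specific")}

print(is_compatible({"ST-general"}, {"ST-specific"}, INVERSE_ISA))
print(sibling_is_compatible({"ST-general"}, {"ST-specific"}))
```

In this toy case the first check passes through the "inverse_isa" pair, while the sibling check fails because the two concepts share no semantic type.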

Step 1: Building the graph of the ancestors of the SC

The ancestors of a given concept can be represented as a directed graph, ideally acyclic. Starting from the seed, its parents and broader concepts are added to the graph. Then, recursively, parents and broader concepts of all newly added concepts are added, until no new concept can be found.

To prevent irrelevant concepts from being added to the graph, the semantic types of any concept added to the graph must be compatible with those of its direct descendant in the graph.
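
A minimal sketch of this graph construction, assuming a breadth-first traversal (the text only requires that expansion continue until no new concept is found). The function name and its callable parameters are hypothetical; `parents_of` is assumed to return both parents and broader concepts, and `compatible` receives the STs of the candidate ancestor and of its direct descendant.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Set


def build_ancestor_graph(
    seed: Iterable[str],
    parents_of: Callable[[str], Iterable[str]],
    sts_of: Callable[[str], Set[str]],
    compatible: Callable[[Set[str], Set[str]], bool],
) -> Dict[str, Set[str]]:
    """Return a mapping from each concept to its accepted parents/broader concepts."""
    graph: Dict[str, Set[str]] = {concept: set() for concept in seed}
    queue = deque(graph)
    while queue:
        concept = queue.popleft()
        for parent in parents_of(concept):
            # Skip ancestors whose STs are not compatible with the descendant's.
            if not compatible(sts_of(parent), sts_of(concept)):
                continue
            graph[concept].add(parent)
            if parent not in graph:  # enqueue each concept once, so cycles terminate
                graph[parent] = set()
                queue.append(parent)
    return graph


# Toy data: "b" and "c" form a cycle, and "x" has an incompatible semantic type.
PARENTS = {"a": ["b", "x"], "b": ["c"], "c": ["b"]}
STS = {"a": {"T1"}, "b": {"T1"}, "c": {"T1"}, "x": {"T9"}}

graph = build_ancestor_graph(
    ["a"],
    lambda c: PARENTS.get(c, []),
    lambda c: STS[c],
    lambda parent_sts, child_sts: bool(parent_sts & child_sts),
)
print(graph)
```

Because the source data is only "ideally" acyclic, the sketch records each concept once and never revisits it, which guarantees termination even when a cycle is present.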

Step 2: Selecting MeSH terms from the ancestors

The graph of the ancestors is first restricted to MeSH terms (synonyms or terms from associated expressions). Then, to prevent the MeSH terms from coming from only one part of the seed, that is, from one particular child or sibling, the selected MeSH terms must be common to a certain percentage of the seed concepts that have MeSH ancestors. Finally, MeSH candidates that are ancestors of other candidates are removed. The selected MeSH terms are thus ensured to be semantically as close as possible to the SC. This measure of closeness relies on semantics rather than on the number of nodes between the two concepts, which is biased by the difference in granularity between components of the UMLS.

Figure 2 shows how "Neck" and "Veins" are selected from the ancestors of "Vein of neck, NOS". Although the MeSH term "Head" is at the same distance as "Veins", it is not selected because it is an ancestor of another selected term ("Neck").
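
The three selection filters can be sketched as below. The names `ancestors_of`, `select_mesh`, and `min_share` are hypothetical, and the default threshold of 0.5 is an arbitrary placeholder because the text does not state the percentage used. The toy graph is invented for illustration and is not the graph of Figure 2; it only reproduces the property that "Head" is as far from the seed as "Veins" while being an ancestor of "Neck".

```python
from typing import Callable, Dict, Iterable, Set


def ancestors_of(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """All concepts reachable from start by following parent/broader links."""
    seen: Set[str] = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def select_mesh(
    graph: Dict[str, Set[str]],
    seed: Iterable[str],
    is_mesh: Callable[[str], bool],
    min_share: float = 0.5,  # assumed value; the text says only "a certain percentage"
) -> Set[str]:
    # 1. Restrict each seed concept's ancestors to MeSH terms.
    per_seed = {}
    for concept in seed:
        found = {a for a in ancestors_of(graph, concept) if is_mesh(a)}
        if found:  # only seed concepts that have MeSH ancestors are counted
            per_seed[concept] = found
    if not per_seed:
        return set()
    # 2. Keep terms shared by a sufficient share of those seed concepts.
    candidates = set().union(*per_seed.values())
    common = {
        m for m in candidates
        if sum(m in found for found in per_seed.values()) / len(per_seed) >= min_share
    }
    # 3. Drop candidates that are ancestors of another candidate.
    return {
        m for m in common
        if not any(m in ancestors_of(graph, other) for other in common if other != m)
    }


# Invented toy graph (concept -> parents/broader concepts).
TOY_GRAPH = {
    "Vein of neck, NOS": {"Neck", "Vein structure"},
    "Neck": {"Head"},
    "Vein structure": {"Veins"},
}
MESH = {"Neck", "Head", "Veins"}

selected = select_mesh(TOY_GRAPH, ["Vein of neck, NOS"], MESH.__contains__)
print(sorted(selected))
```

Note that nothing in the third filter counts edges: "Head" is dropped because it lies above "Neck", not because of its distance from the seed, which matches the point that closeness is judged semantically rather than by path length.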

Figure 2 - Graph of the ancestors of "Vein of neck, NOS". MeSH terms are double framed. The selected MeSH terms are "Neck" and "Veins". Arrows point to parents or broader concepts.