TOOLS

NLM Medical Text Indexer (MTI)

The NLM Medical Text Indexer (MTI) combines human NLM Index Section expertise and Natural Language Processing technology to curate the biomedical literature more efficiently and consistently.


MTI is the main product of the Indexing Initiative project and has been providing indexing recommendations based on the Medical Subject Headings (MeSH®) vocabulary since 2002. In 2011, NLM expanded MTI's role by designating it as the first-line indexer (MTIFL) for a few journals; today the MTIFL workflow includes over 350 journals and continues to increase. The close collaboration of the NLM Index Section, Lister Hill National Center for Biomedical Communications, and Office of Computer & Communications Systems continues to expand and refine the ability of MTI to provide assistance to the indexers.

MTI provides recommendations to the NLM Indexers. MTIFL, or MTI First Line indexing, partially automates the standard indexing process at the US National Library of Medicine: MTIFL provides the initial indexing for a citation, and a human indexer then reviews this indexing and modifies it as required by adding any missed terms, removing any incorrect terms, and supplying Publication Types. This human curation of MTIFL results is called MTIFL Completion. A more detailed description of MTIFL and a current list of the journals in the MTIFL program are available on the MTIFL Webpage.

  (324 kb) The NLM Medical Text Indexer System for Indexing Biomedical Literature. J.G. Mork, A. Jimeno Yepes, A.R. Aronson. 2013
This expanded version of the paper incorporates all of the material from our shorter 2013 BioASQ Workshop paper and adds previously unpublished material, providing a more comprehensive description of the MTI system.

  (503 kb) MTI Processing Flow White Paper

There are several different ways to use MTI depending on your needs and how much data you have to process:

  • Interactively from our web pages (Now Publicly Available)
    Note: Interactive MTI no longer requires a UTS Account. Interactive MTI is intended only for testing MTI with your data and seeing how the various options work. The input text MUST contain only a single block of text to be processed and MUST NOT exceed 10,000 characters in length. A program verifies these limits, and your request will be denied if either limit is exceeded.

    The MTI Interactive web page can be accessed using the following URL: MTI Interactive

  • MeSH on Demand
    MeSH on Demand is a new use of MTI added in 2014 in collaboration with the NLM MeSH Section. It is a greatly simplified interface to the MTI system. Users provide any text (e.g., a MEDLINE citation or free text) as input, and the interface returns a list of relevant MeSH Descriptors and MeSH Supplementary Concepts that summarizes the input text, along with a list of the top ten citations in PubMed related to the text. These results are heavily filtered in favor of terms with high confidence.

    The MeSH on Demand web page can be accessed using the following URL: MeSH on Demand

  • Batch-Mode (Requires UTS Account)
    The MTI Batch-mode requires a UTS Account, which allows us to keep data private to each user. The batch-mode facilitates the processing of large sets of data with MTI. It is generally better to submit a single large file instead of several smaller files. You will be notified via email when the batch finishes and will have a reasonable amount of time to download the results file after the batch completes.

    The MTI Batch-Mode web page can be accessed using the following URL: MTI Batch

  • Download our Web API
    We also have a Java-based API (Application Programming Interface) that you can download and either use via our example Java programs or incorporate into your existing program. Through the API you can submit requests to either the MTI Interactive or the MTI Batch-mode facilities instead of going through the web interfaces. The results are returned to the Java program once the request has finished.

    The Web API web page can be accessed using the following URL: Web API
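The Interactive MTI input limits listed above (a single block of text, at most 10,000 characters) can be checked locally before submitting a request. The following is a minimal sketch, not part of MTI or its Web API: the function name `check_interactive_input` is hypothetical, and it assumes that "a single block of text" means text with no blank line separating paragraphs, which the page does not define precisely.

```python
import re

MAX_CHARS = 10_000  # character limit stated for Interactive MTI input


def check_interactive_input(text: str) -> list[str]:
    """Return a list of problems; an empty list means the text passes both checks."""
    problems = []
    if len(text) > MAX_CHARS:
        problems.append(f"text is {len(text)} characters; the limit is {MAX_CHARS}")
    # Assumption: blocks are separated by one or more blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    if len(blocks) != 1:
        problems.append(f"expected 1 block of text, found {len(blocks)}")
    return problems
```

Calling `check_interactive_input(abstract)` and submitting only when the returned list is empty avoids a denied request; the server-side check remains authoritative and may apply rules this sketch does not model.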

The image says "MTI NLM Medical Text Indexer Providing Indexing Assistance Since 2002" and then has three arrows along the bottom signifying data flow with the titles from left to right being "Biomedical Literature", "MTI/MTIFL", and "MeSH Suggestions."

Our general publications webpage includes a section containing all of our MTI-related publications:
MTI Publications

image is Indexing Life Cycle starting with Biomedical Text > MTI/MTIFL > MeSH Suggestions > MEDLINE Indexing > back to Biomedical Text.  In the center is a circle with Enhancing MEDLINE Access

The Indexing Life Cycle diagram illustrates how MTI/MTIFL fits into the MEDLINE indexing process and assists in enhancing access to the Biomedical Literature via MEDLINE.

The Biomedical Literature is first processed by MTI/MTIFL, which provides a set of MeSH Suggestions to the MEDLINE Indexer, who then indexes the journal literature, providing a detailed summary of the topics in the document. The topics are described using some (or all) of the following:

  • MeSH Descriptors (including Check Tags)
  • MeSH Qualifiers
  • MeSH Supplementary Concept Records
  • MeSH Publication Types
  • Grant Support
  • DataBank Repositories

These components are then added to the MEDLINE citation, completing the cycle, to aid user searches and ultimately enhance access to the document itself.

Current MTI Processing Flow

Current System (2013):

The NLM Medical Text Indexer (MTI) system is the primary product and focus of the Indexing Initiative. MTI produces both semi- and fully-automated indexing recommendations based on the Medical Subject Headings (MeSH®) controlled vocabulary and has been in use at NLM since 2002. MTI is in daily use to assist Indexers, Catalogers, and NLM's History of Medicine Division (HMD) in their indexing efforts.

Every weeknight MTI provides recommendations for approximately 4,000 new citations for Indexing and processes a mixed file of approximately 7,000 old and new records for both Cataloging and HMD. MTI was also used on a regular basis between 2002 and 2012 to provide fully-automated keyword indexing for NLM's Gateway meeting abstract collection, which was not manually indexed. In 2011, MTI was designated as the First-Line Indexer (MTIFL) for 14 journals (89 in 2013) because of its success with those publications. For MTIFL journals, MTI indexing is treated like human indexing and, of course, subject to the normal manual review process. MEDLINE® Indexers and Revisers consult MTI recommendations for approximately 58% of the articles they index, and the MTI recommendations are tightly integrated into the Cataloging and HMD system. Although mainly used in indexing efforts for processing MEDLINE citations consisting of identifier, title, and abstract, MTI is also capable of processing arbitrary biomedical text.

MTI provides an ordered list of MeSH Main Headings (MH), Subheadings (SH), and CheckTags (CT) as a final result. MHs are the main descriptors or headings from the MeSH Vocabulary (e.g., Lung). SHs are used to qualify the MHs (e.g., Lung/abnormalities means that the article is about the abnormalities associated with the Lung more than the Lung itself). CTs are a special type of MH that must be included for each article; they cover species, sex, human age groups, historical periods, pregnancy, and various types of research support (e.g., Male).
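To make the relationship between Main Headings, Subheadings, and CheckTags concrete, here is a small sketch of one way to represent an ordered recommendation list. It is illustrative only and does not reflect MTI's actual output format: the `Recommendation` class, its `score` field, and the `ordered` helper are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    heading: str                                          # MeSH Main Heading, e.g. "Lung"
    subheadings: list[str] = field(default_factory=list)  # qualifiers, e.g. "abnormalities"
    is_check_tag: bool = False                            # True for CheckTags such as "Male"
    score: float = 0.0                                    # hypothetical ranking score

    def display(self) -> list[str]:
        """Render as 'Heading' or one 'Heading/subheading' string per qualifier."""
        if not self.subheadings:
            return [self.heading]
        return [f"{self.heading}/{sh}" for sh in self.subheadings]


def ordered(recs: list[Recommendation]) -> list[Recommendation]:
    """Return recommendations from highest to lowest score."""
    return sorted(recs, key=lambda r: r.score, reverse=True)
```

With this representation, the example from the text is `Recommendation("Lung", ["abnormalities"])`, which renders as `Lung/abnormalities`, and a CheckTag is simply a Main Heading flagged with `is_check_tag=True`.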


Individual Component Descriptions:


  • Phrasex
  • MetaMap Indexing algorithm
  • Trigram algorithm
  • PubMed Related Citations algorithm
  • Restrict to MeSH process
  • Extract MeSH Descriptors process
  • Clustering and Ranking process

image is the processing flow for the original production MTI system which includes the Trigram method in addition to the current system diagram
Initial MTI Production System

Initial Production System (2002):

The MTI system consists of software for applying alternative methods of discovering MeSH headings for citation titles and abstracts and then combining them into an ordered list of recommended indexing terms. The top portion of the diagram consists of three paths, or methods, for creating a list of recommended indexing terms: MetaMap Indexing, Trigrams and PubMed Related Citations. The first two paths actually compute UMLS Metathesaurus® concepts which are passed to the Restrict to MeSH process. The results from each path are weighted and combined using the Clustering process. The system is highly parameterized not only by path weights but also by several parameters specific to the Restrict to MeSH and Clustering processes.
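The idea of weighting each path's results and merging them into one ordered list can be sketched as follows. This is a deliberately simplified stand-in: a plain weighted sum replaces MTI's Clustering process, whose details are not given here, and the path names, weights, and per-heading scores are hypothetical values chosen for illustration.

```python
from collections import defaultdict

# Hypothetical path weights; the real values are system parameters not stated here.
PATH_WEIGHTS = {"metamap": 0.5, "trigram": 0.2, "related_citations": 0.3}


def combine_paths(path_results, weights=PATH_WEIGHTS):
    """Merge per-path {heading: score} maps into one list ranked by weighted score."""
    combined = defaultdict(float)
    for path, scores in path_results.items():
        w = weights.get(path, 0.0)  # paths without a configured weight contribute nothing
        for heading, score in scores.items():
            combined[heading] += w * score
    # Highest combined score first; ties are broken alphabetically for stable output.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


results = {
    "metamap": {"Lung": 1.0, "Mice": 0.5},
    "trigram": {"Lung": 0.5},
    "related_citations": {"Mice": 1.0},
}
ranked = combine_paths(results)
```

Changing `PATH_WEIGHTS` changes the final ordering, which is the sense in which a system like this is parameterized by path weights; the parameters specific to the Restrict to MeSH and Clustering processes are not modeled in this sketch.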

A prototype MTI system described below had two additional indexing methods which were removed because their results were subsumed by the three remaining methods.


image is the processing flow for the original prototype MTI system which includes INQUERY, Approximate Matching, and Trigram methods in addition to the current system diagram
Original MTI Prototype System

Original Indexing Initiative Prototype System (~1996):

The Indexing Initiative Prototype System consists of software for applying alternative methods of discovering MeSH headings for citation titles and abstracts and then combining them into an ordered list of recommended indexing terms. The top portion of the diagram consists of five paths, or methods, for creating a list of recommended indexing terms: the INQUERY method, MetaMap Indexing, Barrier Words with Approximate Matching, Trigrams and PubMed Related Citations. The middle three paths actually compute UMLS Metathesaurus® concepts which are passed to the Restrict to MeSH process, and the outer two paths compute MeSH headings directly. The results from each path are weighted and combined using the Clustering process. The system is highly parameterized not only by path weights but also by several parameters specific to the Restrict to MeSH and Clustering processes.