INQUERY Algorithm

Indexing Initiative
A. Noun Phrase Extraction:

Text is parsed into noun phrases using Phrasex. The design is modular, so Phrasex can be replaced with other noun phrase extractors in future development.
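The modular design can be pictured as a small interface that any extractor implements. This is a minimal sketch, assuming a hypothetical `NounPhraseExtractor` protocol with a single `extract(text)` method. `NaiveChunker` is a toy stand-in that splits text at punctuation and stopwords. It is NOT Phrasex, and its behavior is only illustrative.

```python
import re
from typing import Protocol


class NounPhraseExtractor(Protocol):
    """Hypothetical interface: any extractor (Phrasex or a replacement)
    that turns free text into a list of noun-phrase strings."""

    def extract(self, text: str) -> list[str]: ...


class NaiveChunker:
    """Toy stand-in, NOT Phrasex: splits text at punctuation and a small
    stopword list, keeping the remaining word runs as 'phrases'."""

    STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "with", "and",
                 "or", "is", "are", "was", "were", "to", "by", "from"}

    def extract(self, text: str) -> list[str]:
        phrases: list[str] = []
        for segment in re.split(r"[.,;:()]", text.lower()):
            current: list[str] = []
            for word in segment.split():
                if word in self.STOPWORDS:
                    # A stopword ends the current run of content words.
                    if current:
                        phrases.append(" ".join(current))
                    current = []
                else:
                    current.append(word)
            if current:
                phrases.append(" ".join(current))
        return phrases


def phrases_for(text: str, extractor: NounPhraseExtractor) -> list[str]:
    # The pipeline depends only on the interface, so extractors are swappable.
    return extractor.extract(text)


print(phrases_for("Treatment of acute myocardial infarction in elderly patients.", NaiveChunker()))
# → ['treatment', 'acute myocardial infarction', 'elderly patients']
```

Swapping in a different extractor only requires another class with the same `extract` method. The calling code does not change.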

B. Layered Searching Strategy:

1. Each noun phrase is used as a query string to the Inquery database (described in C below). The output of a query is a ranked list of MeSH headings.

2. The layered search method searches the first field; if no record is retrieved, it searches the next field, and so on.

3. The fields for searching are listed in sequence as follows:
    a. TITLE
    b. Synonyms
    c. UMLS Related Concepts
    d. UMLS Co-occurring Concepts
    e. PubMed Citations

4. The ranking scores returned by the Inquery database are used as part of the computation of the Mapping Score. (Unlike in our earlier standalone experiment, we currently do not perform any semantic aggregation of the ranked MeSH headings. We believe that such aggregation would improve performance.)
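The layered strategy in B.2 and B.3 can be sketched as a loop over the fields that stops at the first field returning any record. This is a minimal sketch under several assumptions. The search order is mapped onto the field codes from C.2 (TITLE, SYN, REL, COT, PMCIT). Records are represented as plain dictionaries. `search_field` is a hypothetical term-overlap scorer standing in for Inquery's own ranking, which this sketch does not reproduce.

```python
from typing import Optional

# Search order from B.3, mapped (an assumption) onto the field codes in C.2.
FIELD_ORDER = ["TITLE", "SYN", "REL", "COT", "PMCIT"]

Records = dict[str, dict[str, str]]  # MeSH heading -> {field code: field text}


def search_field(records: Records, field: str, query: str) -> list[tuple[str, float]]:
    """Toy scorer, NOT Inquery's ranking: the fraction of query terms that
    appear in the field. Returns (heading, score) pairs, best first."""
    terms = set(query.lower().split())
    hits = []
    for heading, fields in records.items():
        words = set(fields.get(field, "").lower().split())
        overlap = len(terms & words)
        if overlap:
            hits.append((heading, overlap / len(terms)))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))  # highest score first, ties by name
    return hits


def layered_search(records: Records, query: str) -> tuple[Optional[str], list[tuple[str, float]]]:
    """Try each field in order; stop at the first field that retrieves a record."""
    for field in FIELD_ORDER:
        hits = search_field(records, field, query)
        if hits:
            return field, hits
    return None, []


# Hypothetical miniature database for illustration only.
records: Records = {
    "Myocardial Infarction": {"TITLE": "myocardial infarction",
                              "SYN": "heart attack",
                              "REL": "coronary disease"},
    "Aspirin": {"TITLE": "aspirin",
                "SYN": "acetylsalicylic acid",
                "REL": "anti-inflammatory agents"},
}

print(layered_search(records, "heart attack"))
# → ('SYN', [('Myocardial Infarction', 1.0)])
```

The query "heart attack" matches no TITLE, so the search falls through to SYN and stops there. In the description above, the returned scores would then feed into the Mapping Score, whose full formula is not given here.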

C. Inquery Database:

1. Each record is titled with a MeSH Main Heading.

2. Each record includes the following fields:
    TITLE: MeSH Main Heading
    CUI: Concept Unique Identifier
    SYN: Entry terms from MeSH, plus synonyms from UMLS
    STY: UMLS Semantic Type
    MN: MeSH Tree Number
    REL: UMLS Related Concepts, including broader, narrower, and other related concepts.
    COT: UMLS Co-occurring Concepts (top 50 terms)
    PMCIT: Top 10 PubMed citations (title, abstract, and MeSH headings) retrieved by a query that uses the MeSH Heading as a Major Topic.
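The record layout in C.2 maps naturally onto a small data structure. This is a minimal sketch, assuming that the co-occurring concepts and citations arrive already ranked best-first, so the 50-term and 10-citation limits are applied by truncation. The class names, the `to_fields` flattening, and the "; " separator are hypothetical choices for illustration, not the actual Inquery input format. The example values are placeholders, not real UMLS or MeSH identifiers.

```python
from dataclasses import dataclass, field
from typing import ClassVar


@dataclass
class Citation:
    title: str
    abstract: str
    mesh_headings: list[str]


@dataclass
class MeshRecord:
    title: str      # TITLE: MeSH Main Heading
    cui: str        # CUI: Concept Unique Identifier
    syn: list[str]  # SYN: MeSH entry terms plus UMLS synonyms
    sty: list[str]  # STY: UMLS Semantic Type(s)
    mn: list[str]   # MN: MeSH Tree Number(s)
    rel: list[str] = field(default_factory=list)         # REL: broader, narrower, other related
    cot: list[str] = field(default_factory=list)         # COT: co-occurring concepts, ranked
    pmcit: list[Citation] = field(default_factory=list)  # PMCIT: citations, ranked

    MAX_COT: ClassVar[int] = 50
    MAX_PMCIT: ClassVar[int] = 10

    def __post_init__(self) -> None:
        # Assumes cot and pmcit are ranked best-first; keep only the top entries.
        self.cot = self.cot[: self.MAX_COT]
        self.pmcit = self.pmcit[: self.MAX_PMCIT]

    def to_fields(self) -> dict[str, str]:
        """Flatten the record into one text string per field for indexing."""
        citations = " ".join(
            f"{c.title} {c.abstract} {' '.join(c.mesh_headings)}" for c in self.pmcit
        )
        return {
            "TITLE": self.title,
            "CUI": self.cui,
            "SYN": "; ".join(self.syn),
            "STY": "; ".join(self.sty),
            "MN": "; ".join(self.mn),
            "REL": "; ".join(self.rel),
            "COT": "; ".join(self.cot),
            "PMCIT": citations,
        }


# Placeholder values for illustration only.
rec = MeshRecord(
    title="Example Heading",
    cui="C0000000",
    syn=["example synonym"],
    sty=["Example Semantic Type"],
    mn=["X00.000"],
    cot=[f"term {i}" for i in range(60)],
)
print(len(rec.cot))  # → 50
```

Keeping each field as separate text lets a field-restricted search, such as the layered strategy in B, query one field at a time.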

Last Modified: May 30, 2019