INFORMATION & RESOURCES

Publications

The following publications are under the Indexing Initiative project umbrella. Where applicable, we have indicated that the Dataset or Test Collection associated with a paper is also available. In this first table, we have included our picks for the preferred first read for several of the Research Areas. Each Research Area heading itself is a link to the papers associated with it. Some papers will also appear in multiple areas; for example, an MTI Machine Learning paper will also appear in the Machine Learning area.

Some papers may not be completely accessible while we transition the documents. If you have an accessibility issue with any of the papers or associated documents, please Contact Us and we will assist you as best we can.



Areas of Research

General Indexing Initiative
   
  (201 kb) Mining MEDLINE for problems associated with vitamin D. Dina Demner-Fushman, MD, PhD, James G. Mork, MSc, Alan R. Aronson, PhD. AMIA 2013
  2013 Vitamin D Dataset Available
  (1.9 mb) 2012 Report to the Board of Scientific Counselors
  (7.9 mb) 2012 PowerPoint Presentation to the Board of Scientific Counselors
  (12.8 mb) 2012 PowerPoint Presentation to the Board of Scientific Counselors (PDF version, no slide animations)
  (1 mb) Presentation for NIH-sponsored Workshop on Natural Language Processing, April 2012
  (2.1 mb) Presentation for NIH-sponsored Workshop on Natural Language Processing, April 2012 (PDF version, no slide animations)
  (395 kb) Automatic inference of indexing rules for MEDLINE, BMC Bioinformatics, 2008
  (100 kb) Automatic inference of indexing rules for MEDLINE, BioNLP 2008
  (1.5 mb) 2008 Report to the Board of Scientific Counselors
  (139 kb) The NLM Indexing Initiative, 2000
  (203 kb) 1999 Report to the Board of Scientific Counselors
MetaMap
   
MetaMap Fundamentals
 (201 kb) An overview of MetaMap: Historical perspective and recent advances, JAMIA 2010
 (12 kb) The Evolution of MetaMap, a Concept Search Program for Biomedical Text, AMIA 2009
  (723 kb)
  (1.1 mb) Presentation: The Evolution of MetaMap, a Concept Search Program for Biomedical Text, Presented November 16, 2009 at AMIA 2009
  (716 kb)
  (1.1 mb) Presentation: The Current State of MetaMap and MMTx Webcast, August 20, 2009
  (280 kb) MetaMap: Mapping Text to the UMLS Metathesaurus, July 2006
  (50 kb) MetaMap Options and Examples, September 2006
  (58 kb) Effective Mapping of Biomedical Text to the UMLS Metathesaurus: The MetaMap Program, 2001
2014 BioNLP MetaMap Tutorial
  (2.8 mb) MetaMap 2014 BioNLP Tutorial Slides (PowerPoint)
2010 AMIA MetaMap Tutorial
  (38 kb) MetaMap 2010 AMIA Tutorial Description (PDF)
  (3 kb) MetaMap 2010 AMIA Tutorial Description (Text)
  (3.2 mb) MetaMap 2010 AMIA Tutorial Slides (PowerPoint)
  (1.6 mb) MetaMap 2010 AMIA Tutorial Slides (PDF)
Technical Documents
  (231 kb) Linking UNASSIGNED and Unmapped Structured Abstract Labels to the Five Canonical NLM Categories
  (187 kb) Increasing UMLS Coverage and Reducing Ambiguity via Automated Creation of Synonymous Terms: First Steps toward Filling UMLS Synonymy Gaps, 2017 White Paper
Ambiguity in the UMLS Metathesaurus
  •   (17 kb) 2013 Higher Degree Metathesaurus Ambiguity Report - Excel
  •   (201 kb) 2013 Summary Report 
  •   (14 kb) 2012 Higher Degree Metathesaurus Ambiguity Report - Excel
  •   (29 kb) 2012 Summary Report
  •   (13 kb) 2011 Higher Degree Metathesaurus Ambiguity Report - Excel
  •   (28 kb) 2011 Summary Report
  •   (12 kb) 2010 Higher Degree Metathesaurus Ambiguity Report - Excel
  •   (182 kb) 2010 Summary Report
  •   (384 kb) 2009 Summary Report
  •   (144 kb) 2008 Summary Report
  •   (100 kb) 2007 Summary Report
  •   (201 kb) 2006 Summary Report
  •   (124 kb) 2005 Summary Report
  •   (72 kb) 2004 Summary Report
  •   (92 kb) 2003 Summary Report
  •   (289 kb) 2002 Summary Report
  •   (182 kb) 2001 Summary Report
  •   (112 kb) 2000 Summary Report
  •   (59 kb) 1999 Summary Report
Filtering the UMLS Metathesaurus for MetaMap
  •   (22 kb) 2012 Report
  •   (22 kb) 2011 Report
  •   (153 kb) 2010 Report
  •   (22 kb) 2009 Report
  •   (210 kb) 2008 Report
  •   (72 kb) 2007 Report
  •   (68 kb) 2006 Report
  •   (59 kb) 2005 Report
  •   (145 kb) 2004 Report
  •   (51 kb) 2003 Report
  •   (48 kb) 2002 Report
  •   (48 kb) 2001 Report
  •   (48 kb) 2000 Report
  •   (18 kb) 1999 Report
  (99 kb) MetaMap Technical Notes, 1996
  (27 kb) MetaMap Variant Generation, 2001
  (25 kb) MetaMap Candidate Retrieval, 2001
  (63 kb) MetaMap Evaluation, 2001
  (19 kb) MetaMap Mapping Algorithm, 2001
  (36 kb) MetaMap Update Procedures, 2000
  (41 kb) Comparison of LVG and MetaMap Functionality, 1994
MetaMap Indexing (MMI)
 (5 kb) MMI project description, 1997
  (32 kb) MMI ranking function, 1997
  (39 kb) A MEDLINE indexing experiment, 1997
PhraseX
  (163 kb) Finding UMLS Metathesaurus concepts in MEDLINE. Srinivasan, Suresh; Thomas C. Rindflesch; William T. Hole; and Alan R. Aronson. 2002. Isaac Kohane (ed.) Proceedings of the AMIA Annual Symposium, 727-31.
General Papers
  (14 kb) MetaMap in the CALBC Workshop II. J. G. Mork, L. Peters, A. Jimeno Yepes, A. R. Aronson, O. Bodenreider, CALBC Workshop II, 2011.
  (229 kb) Semantic processing for enhanced access to biomedical knowledge, Rindflesch, Thomas C., and Alan R. Aronson. 2002. Vipul Kashyap and Leon Shklar (eds.) Real World Semantic Web Applications, 157-72. IOS Press.
  (50 kb) Analysis of biomedical text for chemical names: A comparison of three methods, 1999
  (92 kb) Hierarchical concept indexing of full-text documents in the UMLS information sources map. Journal of the American Society for Information Science (1998). 50(6):514-23.
  (41 kb) Query expansion using the UMLS Metathesaurus. Proceedings of the 1997 AMIA Annual Fall Symposium, 485-89.
  (51 kb) Finding the findings: Identification of findings in medical literature using restricted natural language processing. Proceedings of the 1996 AMIA Annual Fall Symposium, 239-43.
  (46 kb) The effect of textual variation on concept based information retrieval, 1996
  (70 kb) Exploiting a large thesaurus for information retrieval. (1994) Proceedings of RIAO, 197-216.
 
Medical Text Indexer (MTI)
   
General Papers
  (2 mb) 12 Years On -- Is the NLM Medical Text Indexer Still Useful and Relevant? James Mork, Alan Aronson, and Dina Demner-Fushman. Journal of Biomedical Semantics 2017 8:8. DOI: 10.1186/s13326-017-0113-5.
External Link to BMC Journal of Biomedical Semantics 2017 8:8.
  NLM Medical Text Indexer Technical Report to the LHNCBC Board of Scientific Counselors April 2016.
  (164 kb) Using Learning-to-Rank to Enhance NLM Medical Text Indexer Results. Ilya Zavorin, James G. Mork, Dina Demner-Fushman. BioASQ 2016.
  (17 kb) Poster: Resolving Hierarchical Ambiguity in Indexing Recommendations. James G. Mork, Dina Demner-Fushman. AMIA 2016.
  (73 kb) Extracting Characteristics of the Study Subjects from Full-Text Articles. Dina Demner-Fushman, James G. Mork. AMIA 2015.
  2015 Subject Extraction Test Collection Available
  (708 kb) Feature engineering for MEDLINE citation categorization with MeSH. Antonio Jose Jimeno Yepes, Laura Plaza, Jorge Carrillo-de-Albornoz, James G Mork and Alan R Aronson. BMC Bioinformatics 2015, 16:113.
External Link to BMC Bioinformatics 2015, 16:113.
  (32 kb) Vocabulary Density Method for Customized Indexing of MEDLINE Journals (Poster). James G. Mork, Dina Demner-Fushman, Susan C. Schmidt, Alan R. Aronson. AMIA 2014.
  Actual Poster (111 kb)
  2014 Vocabulary Density Study Datasets Available
  (95 kb) Recent Enhancements to the NLM Medical Text Indexer. James G. Mork, Dina Demner-Fushman, Susan C. Schmidt, Alan R. Aronson. BioASQ 2014.
  2014 Vocabulary Density Study Datasets Available
  (96 kb) 2014 BioASQ Poster on Recent Enhancements to the NLM Medical Text Indexer. James G. Mork, Dina Demner-Fushman, Susan C. Schmidt, Alan R. Aronson. BioASQ 2014.
  (324 kb) An expanded version of our paper "The NLM Medical Text Indexer System for Indexing Biomedical Literature."
This expanded version incorporates all of the material from our shorter 2013 BioASQ Workshop paper and also contains unpublished material providing a more comprehensive description of the MTI system.
  (324 kb) The NLM Medical Text Indexer System for Indexing Biomedical Literature. J.G. Mork, A. Jimeno Yepes, A.R. Aronson. BioASQ 2013.
  (158 kb) From Indexing the Biomedical Literature to Coding Clinical Text: Experience with MTI and Machine Learning Approaches, BioNLP 2007
  (44 kb) Automatic Indexing of Specialized Documents: Using Generic vs. Domain-Specific Document Representations, BioNLP 2007
  (100 kb) Semi-Automatic Indexing of Full Text Biomedical Articles, AMIA 2005
  (50 kb) Evaluation of French and English MeSH Indexing Systems with a Parallel Corpus, AMIA 2005
  (54 kb) The NLM Indexing Initiative's Medical Text Indexer, MedInfo 2004
  (319 kb) Application of a Medical Text Indexer to an Online Dermatology Atlas, MedInfo 2004
  (2.1 mb) Automated and Semi-automated Indexing, Report to the Board of Regents 2002
  (130 kb) Automatic MeSH Term Assignment and Quality Assessment, AMIA 2001
Machine Learning
  (102 kb) Identifying Publication Types Using Machine Learning. Antonio J. Jimeno Yepes, James G. Mork, Alan R. Aronson. BioASQ Workshop 2013.
  2013 BioASQ Publication Types Dataset Available
  (201 kb) Comparison and combination of several MeSH indexing approaches. Jimeno-Yepes, Antonio, Mork JG, Demner-Fushman D, Aronson AR. AMIA Annual Symposium Proceedings. Vol. 2013. American Medical Informatics Association, 2013.
  2013 MTI ML Dataset Available
  (1.2 mb) MeSH indexing based on automatically generated summaries. Antonio J Jimeno-Yepes, Laura Plaza, James G Mork, Alan R Aronson, Alberto Diaz. BMC Bioinformatics 2013
  (228 kb) MEDLINE MeSH Indexing: Lessons Learned from Machine Learning and Future Directions. A. Jimeno Yepes, B. Wilkowski, J.G. Mork, D. Demner Fushman, and A.R. Aronson, ACM SIGHIT International Health Informatics Symposium, Miami, FL, USA, 2012.
  2012 MTI ML Dataset Available
  (988 kb) A One-Size-Fits-All Indexing Method Does Not Exist: Automatic Selection Based on Meta-Learning. Antonio Jimeno-Yepes, James G. Mork, Dina Demner-Fushman, Alan R. Aronson, JCSE, vol. 6, no. 2, pp. 151-160, 2012.
  2011 MTI ML Dataset Available
  (78 kb) Automatic algorithm selection for MeSH Heading indexing based on meta-learning. A. Jimeno Yepes, J.G. Mork, D. Demner Fushman, and A.R. Aronson, International Symposium on Languages in Biology and Medicine, Singapore, December, 2011.
  2011 MTI ML Dataset Available
  (262 kb) A bottom-up approach to MEDLINE indexing recommendations. A. Jimeno Yepes, B. Wilkowski, J.G. Mork, E. van Lenten, D. Demner Fushman, A. R. Aronson, AMIA, Washington DC, 2011.
Subheading Attachment
  (39 kb) Fine-Grained Indexing of the Biomedical Literature: MeSH Subheading Attachment for a MEDLINE Indexing Tool, AMIA 2007
  (103 kb) Multiple Approaches to Fine-Grained Indexing of the Biomedical Literature, Proc Pacific Symposium on Biocomputing 2007
User-Centered Evaluation
  (1.1 mb) User-centered Evaluation of the MTI System, 2007
  (510 kb) A MEDLINE Indexing Experiment Using Terms Suggested by MTI, June 2002
 
Word Sense Disambiguation (WSD)
   
  (150 ) Knowledge-based and knowledge-lean methods combined in unsupervised word sense disambiguation. A. Jimeno Yepes, A.R. Aronson, ACM SIGHIT International Health Informatics Symposium, 2012.
  (490 kb) Studying the correlation between different word sense disambiguation methods and summarization effectiveness in biomedical texts, Laura Plaza, Antonio J Jimeno-Yepes, Alberto Diaz, Alan R Aronson, BMC Bioinformatics, August 2011
  (538 kb) Exploiting MeSH indexing in MEDLINE to generate a data set for word sense disambiguation, Antonio J Jimeno-Yepes, Bridget T McInnes, Alan R Aronson, BMC Bioinformatics, June 2011
  (316 kb) Collocation analysis for UMLS knowledge-based word sense disambiguation, Antonio J Jimeno-Yepes, Bridget T McInnes, Alan R Aronson, BMC Bioinformatics, June 2011
  (202 kb) Preliminary results for Biomedical Word Sense Disambiguation based on Semantic Clustering. Martin-Wanton, R. Berlanga-Llavori, A. Jimeno Yepes, Proceedings of DEXA, Toulouse, France, 2011.
  (18 kb) Self-training and co-training in biomedical word sense disambiguation. A. Jimeno Yepes, A. R. Aronson, Proceedings of ACL BioNLP, Portland, USA, 2011.
  (18 kb) Knowledge-based biomedical word sense disambiguation: comparison of approaches, Antonio Jimeno-Yepes, Alan R. Aronson, BMC Bioinformatics, November 2010
  (51 kb) Query expansion for UMLS Metathesaurus disambiguation based on automatic corpus extraction. A. Jimeno Yepes, A. R. Aronson, Proceedings of ICMLA, Bethesda, USA, 2010.
  (87 kb) Improving an automatically extracted corpus for UMLS Metathesaurus word sense disambiguation, Antonio Jimeno-Yepes, Alan R. Aronson, BioSEPLN, 2010.
  (386 kb) Word sense disambiguation by selecting the best semantic type based on Journal Descriptor Indexing: preliminary experiment. Humphrey, SM; Rogers, WJ; Kilicoglu H; Demner-Fushman, D; Rindflesch, TC. J Am Soc Inf Sci Technol 2006 Jan;57(1):96-113.
  Erratum in: J Am Soc Inf Sci, Mar. 2006, 57(4):726. (20.6 kb)
  (93 kb) Developing a Test Collection for Biomedical Word Sense Disambiguation, Marc Weeber, James G. Mork, Alan R. Aronson, AMIA 2001
  WSD Test Collection Available
  (76 kb) Automatic indexing by discipline and high-level categories: methodology and potential applications. Humphrey, SM; Rindflesch, TC; Aronson, AR. In: Soergel D, Srinivasan P, Kwasnik B, editors. Proceedings of the 11th ASIST SIG/CR Classification Research Workshop; 2000 Nov 12; Chicago. Silver Spring (MD): American Society for Information Science and Technology; 2000. p. 103-16.
  (551 kb) Automatic indexing of documents from journal descriptors: a preliminary investigation. Humphrey, SM. J Am Soc Inf Sci. 1999 Jun;50(8):661-74.
  (38 kb) Ambiguity resolution while mapping free text to the UMLS Metathesaurus. (1994) Proceedings of the 18th Annual Symposium on Computer Applications in Medical Care, 240-4.
 
Machine Learning
   
MTI ML Package
  (102 kb) Identifying Publication Types Using Machine Learning. Antonio J. Jimeno Yepes, James G. Mork, Alan R. Aronson. BioASQ Workshop 2013.
  2013 BioASQ Publication Types Dataset Available
  (201 kb) Comparison and combination of several MeSH indexing approaches. Jimeno-Yepes, Antonio, Mork JG, Demner-Fushman D, Aronson AR. AMIA Annual Symposium Proceedings. Vol. 2013. American Medical Informatics Association, 2013.
  2013 MTI ML Dataset Available
  (228 kb) MEDLINE MeSH Indexing: Lessons Learned from Machine Learning and Future Directions. A. Jimeno Yepes, B. Wilkowski, J.G. Mork, D. Demner Fushman, and A.R. Aronson, ACM SIGHIT International Health Informatics Symposium, Miami, FL, USA, 2012.
  2012 MTI ML Dataset Available
  (988 kb) A One-Size-Fits-All Indexing Method Does Not Exist: Automatic Selection Based on Meta-Learning. Antonio Jimeno-Yepes, James G. Mork, Dina Demner-Fushman, Alan R. Aronson, JCSE, vol. 6, no. 2, pp. 151-160, 2012.
  2011 MTI ML Dataset Available
General Papers
  (1.2 mb) MeSH indexing based on automatically generated summaries. Antonio J Jimeno-Yepes, Laura Plaza, James G Mork, Alan R Aronson, Alberto Diaz. BMC Bioinformatics 2013
  (352 kb) GeneRIF indexing: sentence selection based on machine learning. Antonio J Jimeno-Yepes, J Caitlin Sticco, James G Mork, Alan R Aronson. BMC Bioinformatics 2013
  (116 kb) Using the argumentative structure of scientific literature to improve information access. Antonio Jimeno Yepes, James G Mork, Alan R Aronson. BioNLP 2013
  (78 kb) Automatic algorithm selection for MeSH Heading indexing based on meta-learning. A. Jimeno Yepes, J.G. Mork, D. Demner Fushman, and A.R. Aronson, International Symposium on Languages in Biology and Medicine, Singapore, December, 2011.
  2011 MTI ML Dataset Available
  (262 kb) A bottom-up approach to MEDLINE indexing recommendations. A. Jimeno Yepes, B. Wilkowski, J.G. Mork, E. van Lenten, D. Demner Fushman, A. R. Aronson, AMIA, Washington DC, 2011.
 
Clinical Related
   
  (146 kb) A knowledge-based approach to medical records retrieval. D. Demner-Fushman, S. Abhyankar, A. Jimeno-Yepes, R. Loane, B. Rance, F. Lang, N. Ide, E. Apostolova, and A.R. Aronson, In Text Retrieval Conference (TREC 2011) Proceedings, pages 163-172.
  (148 kb) UMLS Content Views Appropriate for NLP Processing of the Biomedical Literature vs. Clinical Text (Full Paper), JBI 2010
  (25 kb) UMLS Content Views Appropriate for NLP Processing of the Biomedical Literature vs. Clinical Text (400 Word Abstract), AMIA 2009
  (508 kb) Extracting Rx Information from Clinical Narrative (Full Paper), 2010
  (53 kb) Extracting Rx Information from Clinical Narrative (Short Paper), JAMIA 2010
  (50 kb) Methodology for Creating UMLS Content Views Appropriate for Biomedical Natural Language Processing, AMIA 2008
  (158 kb) From Indexing the Biomedical Literature to Coding Clinical Text: Experience with MTI and Machine Learning Approaches, BioNLP 2007
 
Challenges Participation
   
BioASQ Challenge - 2013
  (324 kb) An expanded version of our paper "The NLM Medical Text Indexer System for Indexing Biomedical Literature."
This expanded version incorporates all of the material from our shorter 2013 BioASQ Workshop paper and also contains unpublished material providing a more comprehensive description of the MTI system.
  (324 kb) The NLM Medical Text Indexer System for Indexing Biomedical Literature. J.G. Mork, A. Jimeno Yepes, A.R. Aronson. 2013.
TREC Medical Records Track Participation - 2011
  (146 kb) A knowledge-based approach to medical records retrieval. D. Demner-Fushman, S. Abhyankar, A. Jimeno-Yepes, R. Loane, B. Rance, F. Lang, N. Ide, E. Apostolova, and A.R. Aronson, In Text Retrieval Conference (TREC 2011) Proceedings, pages 163-172.
I2B2 - 2009
  (508 kb) Extracting Rx Information from Clinical Narrative (Full Paper), 2010
  (53 kb) Extracting Rx Information from Clinical Narrative (Short Paper), JAMIA 2010
TREC Genomics Track - 2003, 2004, 2005, 2006, 2007
  (112 kb) Combining Resources to Find Answers to Biomedical Questions, Demner-Fushman D, Humphrey SM, Ide NC, Loane RF, et al. Proc TREC 2007, 2005-14.
  (156 kb) Finding Relevant Passages in Scientific Articles: Fusion of Automatic Approaches vs. an Interactive Team Effort, Demner-Fushman D, Humphrey SM, Ide NC, Loane RF, Ruch P, Ruiz ME, Smith LH, Tanabe LK, Wilbur WJ, Aronson AR. Proc TREC 2006, 569-76.
  (296 kb) Fusion of knowledge-intensive and statistical approaches for retrieving and annotating textual genomics documents, Aronson AR, Demner-Fushman D, Humphrey SM, Lin J, Liu H, Ruch P, Ruiz ME, Smith LH, Tanabe LK, Wilbur WJ. Proc TREC 2005, 36-45.
  (275 kb) Knowledge-intensive and statistical approaches to the retrieval and annotation of genomics MEDLINE citations, Aronson AR, Demner D, Humphrey SM, Ide NC, Kim W, Liu H, Loane RR, Mork JG, Smith LH, Tanabe LK, Wilbur WJ, Xie N. Proc TREC 2004, 503-11.
  (357 kb) Methods for Accurate Retrieval of MEDLINE Citations in Functional Genomics, Kayaalp, Mehmet, Aronson, Alan R, Humphrey, Susanne M, Ide, Nicholas C, Tanabe, Lorraine K. Proc TREC 2003, 175-84.
 
Lister Hill Content View (LNCV)
   
  (148 kb) UMLS Content Views Appropriate for NLP Processing of the Biomedical Literature vs. Clinical Text (Full Paper), JBI 2010
  (25 kb) UMLS Content Views Appropriate for NLP Processing of the Biomedical Literature vs. Clinical Text (400 Word Abstract), AMIA 2009
  (50 kb) Methodology for Creating UMLS Content Views Appropriate for Biomedical Natural Language Processing, AMIA 2008