SemMedDB Database Details - Version 2.0
This page describes the SemMedDB schema in detail: the database tables, their fields, and the relationships between the tables. Example records are provided for the tables.
Tables:
Name: CITATIONS table
This table contains relevant metadata for each PubMed citation and has the following data fields:
- PMID: PubMed identifier of the citation
- ISSN: ISSN identifier of the journal or the proceedings where the article was published
- DA: Creation date for the citation
- DCOM: Completion date for the citation
- DP: Publication date for the citation
PMID | ISSN | DA | DCOM | DP |
19851774 | 1432-203X | 2010 01 21 | 2010 03 18 | 2009 Dec |
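The date fields can be turned into date objects for filtering or sorting. The following is a minimal sketch that assumes DA and DCOM always use the "YYYY MM DD" layout seen in the example row above; the helper name parse_citation_date is hypothetical. DP is left as a string, because values such as "2009 Dec" do not carry a full date.

```python
from datetime import date, datetime


def parse_citation_date(value: str) -> date:
    """Parse a 'YYYY MM DD' string (layout assumed from the example row)."""
    return datetime.strptime(value, "%Y %m %d").date()


# Values taken from the CITATIONS example row.
da = parse_citation_date("2010 01 21")    # creation date
dcom = parse_citation_date("2010 03 18")  # completion date

# Number of days between creation and completion of the citation.
days_to_completion = (dcom - da).days
print(days_to_completion)  # 56
```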
Name: CONCEPT table
This table contains information about the UMLS Metathesaurus concepts as well as EntrezGene terms used by SemRep. In the current version, UMLS Metathesaurus concepts are from the UMLS 2006AA release.
Data fields in this table are as follows:
- CONCEPT_ID: Auto generated primary key for each concept
- CUI: Concept identifier of the concept. This is the UMLS CUI if the concept is from UMLS, and the EntrezGene gene identifier if it is from EntrezGene
- TYPE: "META" if it is a UMLS Metathesaurus concept, "ENTREZ" if it is an EntrezGene symbol
- PREFERRED_NAME: UMLS Metathesaurus preferred name for the concept, or the official gene name from EntrezGene
- GHR: Corresponding Genetics Home Reference (GHR) identifier, if the concept is a gene or a disorder
- OMIM: Corresponding Online Mendelian Inheritance in Man (OMIM) identifier, if the concept is a gene or a disorder
CONCEPT_ID | CUI | TYPE | PREFERRED_NAME | GHR | OMIM |
1844 | C0003873 | META | Rheumatoid Arthritis | NULL | 180300:604302 |
1276072 | 215 | ENTREZ | ABCD1 | NULL | NULL |
Name: CONCEPT_SEMTYPE table
This table links concepts in the CONCEPT table with their semantic types. A concept may have multiple semantic types. There is a 1-to-many relation between the CONCEPT and CONCEPT_SEMTYPE tables. The data fields are as follows:
- CONCEPT_SEMTYPE_ID: Auto-generated primary key for each concept-semantic type pair
- CONCEPT_ID: Foreign key to the CONCEPT table
- SEMTYPE: UMLS semantic type abbreviation, such as aapp (Amino Acid, Protein, or Peptide) or gngm (Gene or Genome). For the list of all abbreviations, see SRDEF.
- NOVEL: Identifies whether the concept is novel or not. Novelty of a concept-semantic type pair is computed based on its distance from the root of the UMLS Metathesaurus hierarchy and has been used in automatic summarization approaches based on SemRep [1].
CONCEPT_SEMTYPE_ID | CONCEPT_ID | SEMTYPE | NOVEL |
2628 | 1844 | dsyn | Y |
1481123 | 1276072 | gngm | Y |
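The 1-to-many relation between CONCEPT and CONCEPT_SEMTYPE can be followed with a join on CONCEPT_ID. The sketch below loads the example rows into an in-memory SQLite database purely for illustration; the column types are assumptions, and the actual SemMedDB distribution may use a different database engine and type definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column names follow the field lists above; the SQL types are assumed.
conn.executescript("""
CREATE TABLE CONCEPT (
    CONCEPT_ID INTEGER PRIMARY KEY,
    CUI TEXT, "TYPE" TEXT, PREFERRED_NAME TEXT, GHR TEXT, OMIM TEXT
);
CREATE TABLE CONCEPT_SEMTYPE (
    CONCEPT_SEMTYPE_ID INTEGER PRIMARY KEY,
    CONCEPT_ID INTEGER REFERENCES CONCEPT(CONCEPT_ID),
    SEMTYPE TEXT, NOVEL TEXT
);
""")

# Example rows from the two tables above.
conn.executemany("INSERT INTO CONCEPT VALUES (?, ?, ?, ?, ?, ?)", [
    (1844, "C0003873", "META", "Rheumatoid Arthritis", None, "180300:604302"),
    (1276072, "215", "ENTREZ", "ABCD1", None, None),
])
conn.executemany("INSERT INTO CONCEPT_SEMTYPE VALUES (?, ?, ?, ?)", [
    (2628, 1844, "dsyn", "Y"),
    (1481123, 1276072, "gngm", "Y"),
])

# One output row per concept-semantic type pair.
rows = conn.execute("""
    SELECT c.PREFERRED_NAME, cs.SEMTYPE, cs.NOVEL
    FROM CONCEPT AS c
    JOIN CONCEPT_SEMTYPE AS cs ON cs.CONCEPT_ID = c.CONCEPT_ID
    ORDER BY c.CONCEPT_ID
""").fetchall()

for row in rows:
    print(row)
```

A concept with several semantic types would appear once per semantic type in this result.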
Name: PREDICATION table
Each record in this table identifies a unique predication. The data fields are as follows:
- PREDICATION_ID: Auto-generated primary key for each unique predication
- PREDICATE: The string representation of each predicate (for example TREATS, PROCESS_OF)
- TYPE: Can be ignored
PREDICATION_ID | PREDICATE | TYPE |
87120 | PROCESS_OF | semrep |
Name: PREDICATION_ARGUMENT table
Each record in this table links a unique predication with one of its arguments. There is a 1-to-many relation between the PREDICATION and PREDICATION_ARGUMENT tables.
The data fields are as follows:
- PREDICATION_ARGUMENT_ID: Auto-generated primary key for each predication argument
- PREDICATION_ID: Foreign key to the PREDICATION table
- CONCEPT_SEMTYPE_ID: Foreign key to the CONCEPT_SEMTYPE table
- TYPE: 'S' for subject argument and 'O' for object argument
PREDICATION_ARGUMENT_ID | PREDICATION_ID | CONCEPT_SEMTYPE_ID | TYPE |
176604 | 87120 | 2628 | S |
176605 | 87120 | 21437 | O |
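A predication's subject and object can be resolved by walking from PREDICATION through PREDICATION_ARGUMENT and CONCEPT_SEMTYPE to CONCEPT. The sketch below uses an in-memory SQLite database with assumed column types and a CONCEPT table trimmed to the columns needed here. Only rows shown in the examples on this page are loaded, so the object argument (CONCEPT_SEMTYPE_ID 21437, which has no example row) deliberately resolves to None through the LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed, assumed table definitions; only the columns used below.
conn.executescript("""
CREATE TABLE CONCEPT (CONCEPT_ID INTEGER PRIMARY KEY, PREFERRED_NAME TEXT);
CREATE TABLE CONCEPT_SEMTYPE (
    CONCEPT_SEMTYPE_ID INTEGER PRIMARY KEY, CONCEPT_ID INTEGER, SEMTYPE TEXT
);
CREATE TABLE PREDICATION (
    PREDICATION_ID INTEGER PRIMARY KEY, PREDICATE TEXT, "TYPE" TEXT
);
CREATE TABLE PREDICATION_ARGUMENT (
    PREDICATION_ARGUMENT_ID INTEGER PRIMARY KEY,
    PREDICATION_ID INTEGER, CONCEPT_SEMTYPE_ID INTEGER, "TYPE" TEXT
);
""")

# Example rows from this page; no row is invented for the object concept.
conn.execute("INSERT INTO CONCEPT VALUES (1844, 'Rheumatoid Arthritis')")
conn.execute("INSERT INTO CONCEPT_SEMTYPE VALUES (2628, 1844, 'dsyn')")
conn.execute("INSERT INTO PREDICATION VALUES (87120, 'PROCESS_OF', 'semrep')")
conn.executemany("INSERT INTO PREDICATION_ARGUMENT VALUES (?, ?, ?, ?)", [
    (176604, 87120, 2628, "S"),
    (176605, 87120, 21437, "O"),
])

# LEFT JOINs keep arguments whose concept rows are not loaded.
triples = conn.execute("""
    SELECT p.PREDICATE, pa."TYPE", c.PREFERRED_NAME, cs.SEMTYPE
    FROM PREDICATION AS p
    JOIN PREDICATION_ARGUMENT AS pa ON pa.PREDICATION_ID = p.PREDICATION_ID
    LEFT JOIN CONCEPT_SEMTYPE AS cs
           ON cs.CONCEPT_SEMTYPE_ID = pa.CONCEPT_SEMTYPE_ID
    LEFT JOIN CONCEPT AS c ON c.CONCEPT_ID = cs.CONCEPT_ID
    WHERE p.PREDICATION_ID = ?
    ORDER BY pa.PREDICATION_ARGUMENT_ID
""", (87120,)).fetchall()

for predicate, arg_type, name, semtype in triples:
    print(predicate, arg_type, name, semtype)
```

Against the full database, both arguments would resolve, giving the subject and object concepts of the predication.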
Name: SENTENCE table
This table contains information about individual sentences from PubMed citations and includes the following data fields:
- SENTENCE_ID: Auto-generated primary key for each sentence
- PMID: The PubMed identifier of the citation that the sentence belongs to
- TYPE: 'ti' for the title of the citation and 'ab' for the abstract
- NUMBER: The location of the sentence within the title or the abstract
- SENTENCE: The actual string of this sentence
SENTENCE_ID | PMID | TYPE | NUMBER | SENTENCE |
113049226 | 19855969 | ti | 1 | Rheumatoid arthritis in patient with homozygous haemoglobin C disease. |
Name: SENTENCE_PREDICATION table
This table links a sentence with the predications extracted from it. There is a 1-to-many relation between the SENTENCE and SENTENCE_PREDICATION tables.
It includes the following data fields:
- SENTENCE_PREDICATION_ID: Auto-generated primary key for each sentence-predication pair
- SENTENCE_ID: Foreign key to the SENTENCE table
- PREDICATION_ID: Foreign key to the PREDICATION table
- PREDICATION_NUMBER: The number of times the predication is extracted from the sentence. If there are two instances of the same unique predication in a sentence, the value is 2.
- CURR_TIMESTAMP: The timestamp for the record
The rest of the fields in SENTENCE_PREDICATION table provide mention-level information for the elements of the predication (predicate, subject, and object).
- INDICATOR_TYPE: The type of the predicate, such as VERB for verbal predicates, and NOM for nominalizations and other argument-taking nouns. For a full list of indicator types, see the Appendix in [2].
- PREDICATE_START_INDEX: The first character position of the predicate mention
- PREDICATE_END_INDEX: The last character position of the predicate mention
- SUBJECT_TEXT: The subject mention in the sentence
- SUBJECT_DIST: The distance of the subject mention (counted in noun phrases) from the predicate mention (0 for certain indicator types, such as NOM)
- SUBJECT_MAXDIST: The number of potential arguments (in noun phrases) from the predicate mention in the direction of the subject mention (0 for certain indicator types, such as NOM)
- SUBJECT_START_INDEX: First character position of the subject mention in the sentence
- SUBJECT_END_INDEX: Last character position of the subject mention in the sentence
- SUBJECT_SCORE: The confidence score of the mapping between the subject mention and the subject concept
- OBJECT_*: The fields representing information about the object, in the same way the SUBJECT_* fields do for the subject
SENTENCE_PREDICATION_ID | SENTENCE_ID | PREDICATION_ID | PREDICATION_NUMBER | ... | CURR_TIMESTAMP |
57109318 | 113049226 | 87120 | 1 | ... | 2011-11-17 19:58:38.0 |
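To list the predications extracted from the sentences of a given citation, SENTENCE can be joined to PREDICATION through SENTENCE_PREDICATION. The sketch below is illustrative only: it uses an in-memory SQLite database, assumed column types, and a SENTENCE_PREDICATION table reduced to its first four fields (the mention-level fields are omitted).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed, reduced table definitions for illustration.
conn.executescript("""
CREATE TABLE SENTENCE (
    SENTENCE_ID INTEGER PRIMARY KEY, PMID TEXT,
    "TYPE" TEXT, "NUMBER" INTEGER, SENTENCE TEXT
);
CREATE TABLE PREDICATION (
    PREDICATION_ID INTEGER PRIMARY KEY, PREDICATE TEXT, "TYPE" TEXT
);
CREATE TABLE SENTENCE_PREDICATION (
    SENTENCE_PREDICATION_ID INTEGER PRIMARY KEY,
    SENTENCE_ID INTEGER, PREDICATION_ID INTEGER, PREDICATION_NUMBER INTEGER
);
""")

# Example rows from this page.
conn.execute(
    "INSERT INTO SENTENCE VALUES (?, ?, ?, ?, ?)",
    (113049226, "19855969", "ti", 1,
     "Rheumatoid arthritis in patient with homozygous haemoglobin C disease."),
)
conn.execute("INSERT INTO PREDICATION VALUES (87120, 'PROCESS_OF', 'semrep')")
conn.execute(
    "INSERT INTO SENTENCE_PREDICATION VALUES (57109318, 113049226, 87120, 1)"
)

# Sentence location, predicate, and instance number for one citation.
extracted = conn.execute("""
    SELECT s."TYPE", s."NUMBER", p.PREDICATE, sp.PREDICATION_NUMBER
    FROM SENTENCE AS s
    JOIN SENTENCE_PREDICATION AS sp ON sp.SENTENCE_ID = s.SENTENCE_ID
    JOIN PREDICATION AS p ON p.PREDICATION_ID = sp.PREDICATION_ID
    WHERE s.PMID = ?
    ORDER BY s."TYPE", s."NUMBER", sp.PREDICATION_NUMBER
""", ("19855969",)).fetchall()

print(extracted)
```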
Name: PREDICATION_AGGREGATE table
This table is a convenience table that joins the salient information from all the above tables for efficient access.
The entity-relationship diagram of SemMedDB is shown below:
[Figure: SemMedDB entity-relationship diagram]
References:
[1] Fiszman, M., et al. (2004). Abstraction summarization for managing the biomedical research literature. Proceedings of the HLT-NAACL Workshop on Computational Lexical Semantics, 76-83.
[2] Kilicoglu, H., et al. (2011). Constructing a semantic predication gold standard from the biomedical literature. BMC Bioinformatics, 12(486).