Indexing Initiative

SemMedDB Database Download

The databases offered for download on this page (version 3.0 and later) were created using the new database schema (schema and database information is available here).

Note that the database file named "WHOLEDB" contains the entire database except the ENTITY table, which is available as a separate download. The individual tables are also provided as separate files, so users can download either the entire database at once or only the tables they need. Each file name consists of four parts separated by underscores: the database name, R (or A), the table name, and the date through which PubMed was processed. R indicates that the database was generated with the standard SemRep options; A indicates that it was generated with the anaphora resolution option.
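As an illustration of the four-part naming convention, the following Python sketch splits a file name into its parts. The example file name and its date format are hypothetical, and the sketch assumes the name carries no file extension. Because table names such as GENERIC_CONCEPT and PREDICATION_AUX contain underscores themselves, only the first two parts and the last part are taken by position; everything in between is treated as the table name.

```python
from dataclasses import dataclass


@dataclass
class SemMedFileName:
    database: str     # e.g. "semmedVER43"
    variant: str      # "R" = standard SemRep options, "A" = anaphora resolution
    table: str        # e.g. "PREDICATION_AUX" or "WHOLEDB"
    pubmed_date: str  # date through which PubMed was processed (format assumed)


def parse_file_name(name: str) -> SemMedFileName:
    """Split a name of the form <database>_<R or A>_<table>_<date>.

    Hypothetical helper: assumes no file extension and no underscores
    in the database name or the date.
    """
    parts = name.split("_")
    if len(parts) < 4:
        raise ValueError(f"expected at least four '_'-separated parts: {name!r}")
    database, variant, *table_parts, pubmed_date = parts
    if variant not in ("R", "A"):
        raise ValueError(f"second part must be 'R' or 'A', got {variant!r}")
    # Rejoin the middle parts, since table names may contain underscores.
    return SemMedFileName(database, variant, "_".join(table_parts), pubmed_date)


# Hypothetical file name, for illustration only.
info = parse_file_name("semmedVER43_R_PREDICATION_AUX_08272020")
print(info.table, info.variant)
```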

The new database schema differs from the previous one (versions 2.x) in the following ways:

  1. We simplified the schema significantly by removing the CONCEPT, CONCEPT_SEMTYPE, PREDICATION_ARGUMENT, and SENTENCE_PREDICATION tables. The relevant contents of these tables can still be derived from PREDICATION if needed.
  2. A GENERIC_CONCEPT table has been added to the schema. This table contains generic concepts, as indicated by SemRep. The concepts that are not in this table are considered novel.
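For example, a listing of concepts with their names and semantic types (an approximation of what the removed CONCEPT_SEMTYPE table suggests, not a reproduction of its exact columns) might be rebuilt from PREDICATION with a UNION over the subject and object columns. The sketch below uses Python's built-in sqlite3 module on a tiny in-memory table as a stand-in for the real database. The column names SUBJECT_CUI, SUBJECT_NAME, SUBJECT_SEMTYPE, PREDICATE, OBJECT_CUI, OBJECT_NAME, and OBJECT_SEMTYPE are assumptions, and the rows are made up.

```python
import sqlite3

# Toy stand-in for PREDICATION: a subset of (assumed) columns, made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE PREDICATION (
    PREDICATION_ID INTEGER PRIMARY KEY,
    SUBJECT_CUI TEXT, SUBJECT_NAME TEXT, SUBJECT_SEMTYPE TEXT,
    PREDICATE TEXT,
    OBJECT_CUI TEXT, OBJECT_NAME TEXT, OBJECT_SEMTYPE TEXT)""")
conn.executemany(
    "INSERT INTO PREDICATION VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "C0000001", "drug x", "phsu", "TREATS", "C0000002", "disease y", "dsyn"),
        (2, "C0000003", "gene z", "gngm", "ASSOCIATED_WITH", "C0000002", "disease y", "dsyn"),
        (3, "C0000001", "drug x", "phsu", "INTERACTS_WITH", "C0000003", "gene z", "gngm"),
    ],
)

# Every distinct (CUI, name, semantic type) that appears as a subject or an
# object of some predication. UNION (not UNION ALL) removes duplicates.
concept_semtype = conn.execute("""
    SELECT SUBJECT_CUI AS CUI, SUBJECT_NAME AS NAME, SUBJECT_SEMTYPE AS SEMTYPE
      FROM PREDICATION
    UNION
    SELECT OBJECT_CUI, OBJECT_NAME, OBJECT_SEMTYPE
      FROM PREDICATION
    ORDER BY CUI
""").fetchall()
print(concept_semtype)
```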

We no longer produce an annual release of the database of predications generated by SemRep with sortal anaphora resolution. For sortal anaphora resolution in SemRep, see our BMC Bioinformatics paper.

For semmedVER31_R, we regenerated the entire database to resolve an issue reported in the previous version, in which some predications had the wrong combination of subject name and subject semantic type, or of object name and object semantic type. semmedVER31_R also adds two new columns to the SENTENCE table, SECTION_HEADER and NORMALIZED_SECTION_HEADER, which store the section of a structured abstract in which a sentence appears, when the original citation provides that information.
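A minimal sketch of using the new columns, again with sqlite3 and a toy table: it selects sentences from conclusion sections by NORMALIZED_SECTION_HEADER and counts sentences without section information. The other column names, the normalized label "CONCLUSIONS", and the use of NULL for sentences without a section are assumptions for illustration; the actual dump may use different values (for example, empty strings).

```python
import sqlite3

# Toy SENTENCE table; column names other than SECTION_HEADER and
# NORMALIZED_SECTION_HEADER are assumptions, and the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE SENTENCE (
    SENTENCE_ID INTEGER PRIMARY KEY,
    PMID TEXT,
    SENTENCE TEXT,
    SECTION_HEADER TEXT,
    NORMALIZED_SECTION_HEADER TEXT)""")
conn.executemany(
    "INSERT INTO SENTENCE VALUES (?, ?, ?, ?, ?)",
    [
        (1, "100", "We enrolled 40 patients.", "PATIENTS AND METHODS", "METHODS"),
        (2, "100", "Drug x reduced symptoms.", "CONCLUSION", "CONCLUSIONS"),
        (3, "200", "An unstructured abstract sentence.", None, None),  # assumed NULL
    ],
)

# Filter on the normalized header rather than the raw header text.
rows = conn.execute(
    "SELECT PMID, SENTENCE FROM SENTENCE WHERE NORMALIZED_SECTION_HEADER = ?",
    ("CONCLUSIONS",),
).fetchall()

# Sentences for which no section information was recorded.
unstructured = conn.execute(
    "SELECT COUNT(*) FROM SENTENCE WHERE SECTION_HEADER IS NULL"
).fetchone()[0]
print(rows, unstructured)
```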

The GENERIC_CONCEPT table was updated in the June 30, 2018 release and in all subsequent releases. Consequently, the SUBJECT_NOVELTY and OBJECT_NOVELTY columns of the PREDICATION table have been updated as follows: if the concept is not in the GENERIC_CONCEPT table, the value is 1; otherwise, it is 0.
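The novelty rule can be written as a query. This sketch recomputes both flags from a toy GENERIC_CONCEPT table with sqlite3; the GENERIC_CONCEPT column names (CUI, PREFERRED_NAME) and the PREDICATION column names SUBJECT_CUI and OBJECT_CUI are assumptions, and the rows are made up. Comparing its output with the stored SUBJECT_NOVELTY and OBJECT_NOVELTY values is one possible consistency check for a local copy.

```python
import sqlite3

# Toy tables with made-up rows; column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE GENERIC_CONCEPT (CUI TEXT PRIMARY KEY, PREFERRED_NAME TEXT);
    CREATE TABLE PREDICATION (
        PREDICATION_ID INTEGER PRIMARY KEY,
        SUBJECT_CUI TEXT,
        OBJECT_CUI TEXT);
    INSERT INTO GENERIC_CONCEPT VALUES ('C0000010', 'generic concept a');
    INSERT INTO PREDICATION VALUES (1, 'C0000010', 'C0000020');
    INSERT INTO PREDICATION VALUES (2, 'C0000030', 'C0000010');
""")

# Rule from the text: novelty is 1 if the concept is NOT in GENERIC_CONCEPT, else 0.
novelty = conn.execute("""
    SELECT PREDICATION_ID,
           CASE WHEN SUBJECT_CUI IN (SELECT CUI FROM GENERIC_CONCEPT) THEN 0 ELSE 1 END
               AS SUBJECT_NOVELTY,
           CASE WHEN OBJECT_CUI IN (SELECT CUI FROM GENERIC_CONCEPT) THEN 0 ELSE 1 END
               AS OBJECT_NOVELTY
      FROM PREDICATION
     ORDER BY PREDICATION_ID
""").fetchall()
print(novelty)  # [(1, 0, 1), (2, 1, 0)]
```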

Starting with VER40, the PMID column in the SENTENCE table references the PMID column of the CITATIONS table through a foreign key constraint. Therefore, every PMID in the SENTENCE table has a corresponding row in the CITATIONS table, which holds the citation metadata for that PMID.
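The constraint can be illustrated with sqlite3, which enforces foreign keys only after PRAGMA foreign_keys = ON has been issued. The sketch assumes a minimal CITATIONS(PMID) table (the table is listed as CITATIONS in the downloads on this page) and a reduced SENTENCE table; it shows that a sentence whose PMID has no citation row is rejected. The anti-join at the end is one way to look for such orphan rows in releases that predate the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE CITATIONS (PMID TEXT PRIMARY KEY);
    CREATE TABLE SENTENCE (
        SENTENCE_ID INTEGER PRIMARY KEY,
        PMID TEXT NOT NULL REFERENCES CITATIONS (PMID),
        SENTENCE TEXT);
    INSERT INTO CITATIONS VALUES ('100');
    INSERT INTO SENTENCE VALUES (1, '100', 'A sentence from citation 100.');
""")

try:
    conn.execute("INSERT INTO SENTENCE VALUES (2, '999', 'Orphan sentence.')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # PMID 999 has no CITATIONS row, so the insert fails

# Anti-join: SENTENCE rows whose PMID is missing from CITATIONS.
orphans = conn.execute("""
    SELECT s.SENTENCE_ID
      FROM SENTENCE s
      LEFT JOIN CITATIONS c ON c.PMID = s.PMID
     WHERE c.PMID IS NULL
""").fetchall()
print(rejected, orphans)  # True []
```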


Database name: semmedVER43_R (Processed using MEDLINE BASELINE 2020 + Covid-19 citations + PubMed Update Files through August 27, 2020)

SemRep version: regular SemRep 1.8
Number of citations processed: 31,432,482
Number of predications: 107,645,218
* This database was obtained from SemRep results with the anaphora feature turned off.

TABLE NAME        Size   # Rows          Download link   sha1sum    md5sum
Entire Database   19G    N/A             download        download   download
CITATIONS         147M   31,432,482      download        download   download
ENTITY            35G    1,462,840,846   download        download   download
GENERIC_CONCEPT   4.7K   259             download        download   download
PREDICATION       2.6G   107,645,218     download        download   download
PREDICATION_AUX   3.3G   107,645,218     download        download   download
SENTENCE          13G    209,494,346     download        download   download
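Each table is accompanied by sha1sum and md5sum files. The following sketch checks a download against one of them. It assumes the checksum file follows the usual sha1sum/md5sum output layout (hex digest first, then the file name), and it reads the data in chunks because several tables are many gigabytes in size.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha1", chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-gigabyte dumps need not fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, checksum_file: Path, algorithm: str = "sha1") -> bool:
    """Compare a download with the first whitespace-separated token of its checksum file.

    Assumes the "<hex digest>  <file name>" layout written by sha1sum/md5sum;
    adjust if the published checksum files differ.
    """
    expected = checksum_file.read_text().split()[0].lower()
    return file_digest(path, algorithm) == expected
```

For instance, verify(Path("PREDICATION.sql.gz"), Path("PREDICATION.sql.gz.sha1")) would check a SHA-1 file, and passing "md5" as the third argument would check an MD5 file; both local file names here are hypothetical.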


Database name: semmedVER42_R (Processed using MEDLINE BASELINE 2020 + Covid-19 citations)
semmedVER42_R is a superset of semmedVER41_R that additionally includes data derived from all PubMed citations retrieved with the following query:

( covid OR sars-cov-2 OR wuhan OR coronavirus OR 2019-ncov OR sars ) AND 2019:2020[dp]
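One way to rerun this query is through NCBI's E-utilities ESearch service. The sketch below only builds the request URL and makes no network call; db, term, retmax, and retmode are standard ESearch parameters, and retmax=0 requests only the hit count. Because PubMed changes continually, the count returned today is unlikely to match the set of citations processed for semmedVER42_R.

```python
from urllib.parse import urlencode

# The query quoted above for the additional Covid-19 citations.
COVID_QUERY = "( covid OR sars-cov-2 OR wuhan OR coronavirus OR 2019-ncov OR sars ) AND 2019:2020[dp]"

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, retmax: int = 0) -> str:
    """Build a PubMed ESearch URL; retmax=0 returns the hit count without PMIDs."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-encodes ':' and '[dp]' and turns spaces into '+'.
    return f"{ESEARCH}?{urlencode(params)}"


print(esearch_url(COVID_QUERY))
```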
The additional Covid-19 citations were processed with SemRep and the 2020AA UMLS data, which include Covid-19 terms with CUIs C5203670, C5203671, C5203672, C5203673, C5203674, C5203675, and C5203676.
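To pull out the predications that involve these Covid-19 concepts, one can filter PREDICATION on the subject and object CUIs. This is a sketch with sqlite3 and toy rows; the column names are assumptions, and CUIs are matched by exact equality.

```python
import sqlite3

# Covid-19 CUIs from the 2020AA UMLS, as listed above.
COVID_CUIS = ("C5203670", "C5203671", "C5203672", "C5203673",
              "C5203674", "C5203675", "C5203676")

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE PREDICATION (
    PREDICATION_ID INTEGER PRIMARY KEY, PMID TEXT,
    SUBJECT_CUI TEXT, PREDICATE TEXT, OBJECT_CUI TEXT)""")
# Toy rows for illustration, not real SemMedDB data.
conn.executemany("INSERT INTO PREDICATION VALUES (?, ?, ?, ?, ?)", [
    (1, "100", "C0000001", "TREATS", "C5203670"),
    (2, "101", "C0000002", "TREATS", "C0000003"),
    (3, "102", "C5203676", "CAUSES", "C0000004"),
])

# One "?" per CUI; the tuple is bound twice, once per IN list.
placeholders = ", ".join("?" * len(COVID_CUIS))
rows = conn.execute(
    f"""SELECT PREDICATION_ID, PMID FROM PREDICATION
        WHERE SUBJECT_CUI IN ({placeholders}) OR OBJECT_CUI IN ({placeholders})
        ORDER BY PREDICATION_ID""",
    COVID_CUIS + COVID_CUIS,
).fetchall()
print(rows)  # [(1, '100'), (3, '102')]
```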
SemRep version: regular SemRep 1.8
Number of citations processed: 30,454,210
Number of predications: 103,284,300
* This database was obtained from SemRep results with the anaphora feature turned off.

TABLE NAME        Size   # Rows          Download link   sha1sum    md5sum
Entire Database   18G    N/A             download        download   download
CITATIONS         142M   30,439,312      download        download   download
ENTITY            33G    1,369,836,547   download        download   download
GENERIC_CONCEPT   4.7K   259             download        download   download
PREDICATION       2.5G   103,284,300     download        download   download
PREDICATION_AUX   3.1G   103,208,390     download        download   download
SENTENCE          12G    199,843,876     download        download   download


Database name: semmedVER41_R (Processed using MEDLINE BASELINE 2020)
SemRep version: regular SemRep 1.8
Number of citations processed: 30,426,087
Number of predications: 103,156,767
* This database was obtained from SemRep results with the anaphora feature turned off.

TABLE NAME        Size   # Rows          Download link   sha1sum    md5sum
Entire Database   18G    N/A             download        download   download
CITATIONS         141M   30,426,087      download        download   download
ENTITY            33G    1,369,836,668   download        download   download
GENERIC_CONCEPT   4.7K   259             download        download   download
PREDICATION       2.5G   103,156,767     download        download   download
PREDICATION_AUX   3.1G   103,156,767     download        download   download
SENTENCE          12G    199,751,911     download        download   download



Database name: semmedVER40_R (Processed using MEDLINE BASELINE 2019)
SemRep version: regular SemRep 1.8
Number of citations processed: 29,115,337
Number of predications: 97,972,561
* This database was obtained from SemRep results with the anaphora feature turned off.

TABLE NAME        Size    Download link   sha1sum    md5sum
Entire Database   17.6G   download        download   download
CITATIONS         140M    download        download   download
ENTITY            33G     download        download   download
GENERIC_CONCEPT   4.5K    download        download   download
METAINFO          802     download        download   download
PREDICATION       2.51G   download        download   download
PREDICATION_AUX   3.15G   download        download   download
SENTENCE          11.8G   download        download   download





Database name: semmedVER31_R (Processed up to June 30, 2018)
SemRep version: regular SemRep 1.7
Number of citations processed: 28,429,379
Number of predications: 96,363,098
* This database was obtained from SemRep results with the anaphora feature turned off.

TABLE NAME        START DATE   END DATE       Size    Download link   sha1sum    md5sum
Entire Database   1865         June 30 2018   17.7G   download        download   download
CITATIONS         1865         June 30 2018   140M    download        download   download
ENTITY            1865         June 30 2018   36G     download        download   download
GENERIC_CONCEPT   N/A          N/A            4.5K    download        download   download
METAINFO          N/A          N/A            802     download        download   download
PREDICATION       1865         June 30 2018   2.47G   download        download   download
PREDICATION_AUX   1865         June 30 2018   3.12G   download        download   download
SENTENCE          1865         June 30 2018   11.9G   download        download   download




Database name: semmedVER31_R (Processed up to December 31, 2017)
SemRep version: regular SemRep 1.7
Number of citations processed: 27,851,419
Number of predications: 93,876,632
* This database was obtained from SemRep results with the anaphora feature turned off.

TABLE NAME        START DATE   END DATE      Size    Download link   sha1sum    md5sum
Entire Database   1865         Dec 31 2017   17.1G   download        download   download
CITATIONS         1865         Dec 31 2017   136M    download        download   download
ENTITY            1865         Dec 31 2017   33G     download        download   download
GENERIC_CONCEPT   N/A          N/A           129K    download        download   download
METAINFO          N/A          N/A           778     download        download   download
PREDICATION       1865         Dec 31 2017   2.41G   download        download   download
PREDICATION_AUX   1865         Dec 31 2016   3.05G   download        download   download
SENTENCE          1865         Dec 31 2017   11.5G   download        download   download





 
Last Modified: September 22, 2020