We have deleted all previous versions of the SemMedDB database (versions 31, 40, 41, and 42).
The databases downloadable here (from version 3.0 on) were created using the new database schema (the schema and database information are available here).
Note that the database file named "WHOLEDB" contains the entire database except the ENTITY table, which the new database provides. The individual tables are also provided separately.
Users can download the entire database at once or the individual tables separately, depending on their needs. The file names consist of four parts joined by underscores: database name _ R (or A) _ table name _ PubMed to date. The letter R indicates that the database was generated with standard SemRep options, whereas A indicates that it was generated with the anaphora resolution option.
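The naming convention above can be sketched as a small parser. This is only an illustration: the function name, the `ReleaseFileName` type, and the sample stem `semmedVER43_R_GENERIC_CONCEPT_06232021` are hypothetical, and the sketch assumes that the date part contains no underscores and that any file extension has already been removed.

```python
from typing import NamedTuple


class ReleaseFileName(NamedTuple):
    database: str        # database name, e.g. "semmedVER43"
    option: str          # "R" (standard SemRep) or "A" (anaphora resolution)
    table: str           # table name, e.g. "GENERIC_CONCEPT"
    pubmed_to_date: str  # the "PubMed to date" part (format assumed)


def parse_release_file_name(stem: str) -> ReleaseFileName:
    """Split a file-name stem into its four underscore-separated parts."""
    database, rest = stem.split("_", 1)   # first part: database name
    option, rest = rest.split("_", 1)     # second part: R or A
    if option not in ("R", "A"):
        raise ValueError(f"unexpected option letter: {option!r}")
    # Table names may themselves contain "_", so split the date off the right.
    table, pubmed_to_date = rest.rsplit("_", 1)
    return ReleaseFileName(database, option, table, pubmed_to_date)


# Hypothetical stem, used only to exercise the parser.
parsed = parse_release_file_name("semmedVER43_R_GENERIC_CONCEPT_06232021")
print(parsed.table, parsed.option)
```

Splitting the date from the right is a deliberate choice here, since table names such as GENERIC_CONCEPT and PREDICATION_AUX contain underscores of their own.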
The new database schema differs from the previous one (versions 2X) in the following ways:
We no longer produce an annual release of the database of predications generated by SemRep using the sortal anaphora resolution. For sortal anaphora resolution in SemRep, see our BMC Bioinformatics paper.
The GENERIC_CONCEPT table has been updated in the June 30, 2018 release and all subsequent releases. Consequently, the SUBJECT_NOVELTY and OBJECT_NOVELTY columns of the PREDICATION table have been updated as follows: if the concept is not in the GENERIC_CONCEPT table, the value is set to 1; otherwise, it is set to 0.
Starting with version VER40, the PMID column in the SENTENCE table references the PMID column in the CITATIONS table through a foreign key constraint. Therefore, every PMID in the SENTENCE table has a corresponding row in the CITATIONS table, which holds the metadata for that PMID.
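The novelty rule and the foreign key constraint described above can be illustrated with a toy in-memory database. This is a minimal sketch, not the released schema: the tables are reduced to a few columns, the `CUI` and `SENTENCE_ID` column names and all sample values are assumptions, and SQLite is used only because it ships with Python, not because the release targets it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE CITATIONS (PMID TEXT PRIMARY KEY);
    CREATE TABLE SENTENCE (
        SENTENCE_ID INTEGER PRIMARY KEY,
        PMID TEXT NOT NULL REFERENCES CITATIONS(PMID)
    );
    CREATE TABLE GENERIC_CONCEPT (CUI TEXT PRIMARY KEY);
""")

# Placeholder rows, invented for this sketch.
conn.execute("INSERT INTO CITATIONS (PMID) VALUES ('12345')")
conn.execute("INSERT INTO GENERIC_CONCEPT (CUI) VALUES ('C0000001')")


def novelty(conn: sqlite3.Connection, cui: str) -> int:
    """Return 1 if the concept is absent from GENERIC_CONCEPT, otherwise 0."""
    row = conn.execute(
        "SELECT 1 FROM GENERIC_CONCEPT WHERE CUI = ?", (cui,)
    ).fetchone()
    return 1 if row is None else 0


# A sentence whose PMID exists in CITATIONS is accepted.
conn.execute("INSERT INTO SENTENCE (SENTENCE_ID, PMID) VALUES (1, '12345')")

# A sentence whose PMID has no CITATIONS row violates the foreign key.
try:
    conn.execute("INSERT INTO SENTENCE (SENTENCE_ID, PMID) VALUES (2, '99999')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(novelty(conn, "C0000001"), novelty(conn, "C9999999"), orphan_rejected)
```

With the constraint in place, a join from SENTENCE to CITATIONS on PMID cannot silently drop sentences for lack of citation metadata.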
Please note that all downloads in the tables below require a UMLS Terminology Services (UTS) account; to sign up for a UTS account, please click here.
Database name: semmedVER43_R (Processed using MEDLINE BASELINE 2020 + Covid-19 citations + PubMed Update Files through June 23, 2021)

TABLE NAME | Size | # Rows | Download link | sha1sum | md5sum |
---|---|---|---|---|---|
Entire Database | 20G | N/A | download | download | download |
CITATIONS | 153M | 32,708,196 | download | download | download |
ENTITY | 37G | 1,576,363,221  | download | download | download |
GENERIC_CONCEPT | 4.7K | 259 | download | download | download |
PREDICATION | 2.7G | 112,796,186 | download | download | download |
PREDICATION_AUX | 3.4G | 112,796,186 | download | download | download |
SENTENCE | 13G | 221,256,193 | download | download | download |
TABLE NAME | Size | # Rows | Download link | sha1sum | md5sum |
---|---|---|---|---|---|
CITATIONS | 152M | 32,470,549 | download | download | download |
ENTITY | 39G | 1,555,897,812  | download | download | download |
GENERIC_CONCEPT | 3.9K | 259 | download | download | download |
PREDICATION | 2.7G | 111,846,030 | download | download | download |
PREDICATION_AUX | 3.6G | 111,846,028 | download | download | download |
SENTENCE | 14G | 219,049,752 | download | download | download |
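Because each download is accompanied by sha1sum and md5sum files, a downloaded file can be checked before it is loaded. The sketch below is one way to do that in Python; the function names are hypothetical, and it assumes the published digest has already been copied out of the checksum file as a hexadecimal string.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha1", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte downloads need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, published_hex: str, algorithm: str = "sha1") -> bool:
    """Compare the local digest with the published one, ignoring case and whitespace."""
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

Passing `algorithm="md5"` checks against the md5sum value instead; given the table sizes listed above, reading in chunks keeps memory use flat.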
Last Modified: June 24, 2021