Data Download
We are releasing the following datasets from our big data platform. We are making our best efforts to mine all experimental data from previous coronavirus-related studies. If you have other specific data needs or have datasets to contribute, please contact us @here. We will update our datasets periodically to provide more information to help your research combat the disease.
Broad-spectrum antiviral agents
- Based on in vitro viral infection assay results (EC50 <= 1 uM) and clinical data (in vivo active), 462 molecules were found to be active against at least 2 virus species.
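The in vitro part of the selection rule above (EC50 <= 1 uM against at least 2 virus species) can be sketched roughly as follows. This is only an illustration: the tuple layout `(molecule, virus_species, ec50_um)`, the function name `broad_spectrum`, and the example rows are assumptions and do not reflect the actual format of the released file. The clinical (in vivo active) criterion is not modeled here.

```python
from collections import defaultdict

EC50_CUTOFF_UM = 1.0  # threshold stated in the text: EC50 <= 1 uM
MIN_SPECIES = 2       # "at least 2 virus species"


def broad_spectrum(records):
    """Return molecules with EC50 <= cutoff against >= MIN_SPECIES species.

    `records` is an iterable of (molecule, virus_species, ec50_um) tuples.
    This layout is hypothetical, not the released file format.
    """
    species_hit = defaultdict(set)
    for molecule, species, ec50_um in records:
        # Skip missing values; keep only records at or below the cutoff.
        if ec50_um is not None and ec50_um <= EC50_CUTOFF_UM:
            species_hit[molecule].add(species)
    return sorted(m for m, s in species_hit.items() if len(s) >= MIN_SPECIES)


# Made-up rows for illustration only (not real measurements).
example_records = [
    ("mol_A", "SARS-CoV", 0.5),
    ("mol_A", "MERS-CoV", 0.9),
    ("mol_B", "SARS-CoV", 0.2),
    ("mol_B", "SARS-CoV", 0.4),   # same species twice counts once
    ("mol_C", "MERS-CoV", 5.0),   # above the cutoff, ignored
    ("mol_C", "SARS-CoV", 0.1),
]

print(broad_spectrum(example_records))
```

Counting distinct species per molecule, rather than the number of records, keeps a compound tested many times against a single virus from being labeled broad-spectrum.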
Annotated preclinical studies on coronavirus
- A collection of 1101 in vitro and in vivo records for 256 small molecules and biologics related to SARS/MERS.
Unannotated preclinical studies on coronavirus targets
- A collection of 816 records for 479 molecules from various sources. We currently don't have the capability to annotate the sources or confirm the correctness of all datasets. This dataset may contain missing values and "dirty" data. Please use the data carefully and make your own effort to confirm the data source (journals, patents, websites) and extract useful (signal) information from the set. Some datasets use the standardized value PX=-log[M].
- Some subsets extracted:
- SMILES of 986 molecules tested against coronavirus; we are not sure whether they are active or inactive at the enzymatic/cellular level.
- SMILES
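For the standardized value PX=-log[M] mentioned for the unannotated records, a minimal conversion sketch is given below. It assumes the logarithm is base 10 and that [M] is a molar concentration, as is conventional for pIC50/pEC50-style values; the source does not state this explicitly. The function names are illustrative.

```python
import math


def molar_to_px(conc_molar):
    """Convert a molar concentration to pX = -log10([M]) (base 10 assumed)."""
    if conc_molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(conc_molar)


def px_to_molar(px):
    """Invert pX back to a molar concentration."""
    return 10.0 ** (-px)


def micromolar_to_px(conc_um):
    """Convenience wrapper: micromolar input, since 1 uM = 1e-6 M."""
    return molar_to_px(conc_um * 1e-6)


# Under this convention 1 uM corresponds to a pX of about 6,
# and higher pX values mean more potent compounds.
print(micromolar_to_px(1.0))
print(px_to_molar(7.0))
```

Under this reading, a potency cutoff of 1 uM corresponds to keeping records with pX of about 6 or higher.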
Previous clinical effort for SARS/MERS
- A collection of clinical and preclinical drug pipelines related to SARS/MERS, but without clinical conclusions. Most of these pipelines are no longer active.
- Downloadable datasets:
Literature Mining
- A comprehensive literature mining result kindly provided by Causaly, focusing on chemicals/drugs, genes and molecular mechanisms. This data includes 2090 relationships, 1229 aggregate relationships, and 976 articles based on a search query of "Chemicals&Drugs,Genes,Cellular&Molecular Mechanisms [AFFECTING] [Genus:Coronavirus]", with MEDLINE and PubMedCentral as the data sources. This search resulted in several Target Concepts, including: sars coronavirus, middle east respiratory syndrome coronavirus, human coronavirus, etc.
- Full dataset
The following figure is the keyword relationship network:

- The aggregate relationship data can be found in this dataset.
- This data describes source concept items and classifies them into different categories. Some relevant categories include: Amino Acid Peptide Protein, Biologically Active Substance, Chemical, Nucleic Acid, etc.
- The literature articles and relationships data gives a list of 977 relevant articles and shows the evidence for each relationship from the original article.
- The literature evidence data lists all relevant source concepts (biological substances, chemicals, etc.) and their relation to an article. Overall, there are over 2000 pieces of information relevant to coronavirus.
Compound Libraries for drug repurposing
- Please visit DrugBank to download the most recent data. The latest release of DrugBank (version 5.1.5, released 2020-01-03) contains 13,490 drug entries including 2,636 approved small molecule drugs, 1,365 approved biologics (proteins, peptides, vaccines, and allergenics), 131 nutraceuticals and over 6,350 experimental (discovery-phase) drugs. Additionally, 5,191 non-redundant protein (i.e. drug target/enzyme/transporter/carrier) sequences are linked to these drug entries. Each entry contains more than 200 data fields, with half of the information devoted to drug/chemical data and the other half devoted to drug target or protein data.
- Selleck Libraries: This zip archive contains drug libraries provided by selleck.cn to be used for drug repurposing.
Your feedback is highly appreciated.