Data sources
The CC capitalizes on many data sources. The following is an extensive list of resources that are worth considering in current and future versions of the CC. Inside each CC level, we list the resources in alphabetical order. Please feel free to make requests by contacting or .
To identify new datasets of interest, we recommend the NAR Online Molecular Biology Database Collection and OmicsTools, which help you stay up to date with data releases. The LINCS Data Portal is also of interest.
- Already in the CC.
- Not yet in the CC.
Observational data resources
A
Chemistry
- ChemoPy [paper code]: A small chemoinformatics library focused on physicochemical properties and some fingerprints.
- ClassyFire [paper code]: Automatic classification of chemical compounds. Ontology / taxonomy.
- DeepChem [web code]: A powerful deep learning chemoinformatics library, containing a large number of featurizers. The RDKitDescriptors featurizer is also interesting.
- E3FP [paper code]: Hashed representations of 3D molecular structure.
- molBLOCKS [paper code]. Decomposes small molecules into fragments (scaffolds).
- PyBioMed [paper code]. A number of physicochemical descriptors and the common fingerprints. Very similar to ChemoPy. It can also featurize sequence data (protein and DNA).
- RDKit [paper code]. The standard library for chemoinformatics in Python. It calculates several fingerprints and also does 3D conformational sampling.
- Silicos-IT [web code]. It has implementations for the chemical beauty score (QED).
- Synthetic accessibility score [code]. A simple synthetic accessibility score.
- TF3P fingerprint [paper code]
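As a rough illustration of what the RDKit entry above covers (fingerprints and 3D conformational sampling) and of the QED score mentioned under Silicos-IT, here is a minimal sketch. It assumes RDKit is installed and uses RDKit's own QED module rather than the Silicos-IT code. The aspirin SMILES, the fingerprint radius and size, and the number of conformers are arbitrary choices for illustration, not CC settings.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, QED, rdFingerprintGenerator

# Example molecule (aspirin); any valid SMILES would do.
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")

# Morgan (circular) fingerprint as a fixed-length bit vector.
# radius=2 and fpSize=2048 are illustrative values only.
generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
fp = generator.GetFingerprint(mol)

# Quantitative estimate of drug-likeness ("chemical beauty"), between 0 and 1.
qed_score = QED.qed(mol)

# 3D conformational sampling: add explicit hydrogens, then embed conformers.
mol_h = Chem.AddHs(mol)
conf_ids = list(AllChem.EmbedMultipleConfs(mol_h, numConfs=5, randomSeed=42))

print(f"bits set: {fp.GetNumOnBits()} of {fp.GetNumBits()}")
print(f"QED: {qed_score:.3f}")
print(f"conformers generated: {len(conf_ids)}")
```

The other libraries in this list expose their own interfaces; the sketch only shows the kind of output (bit vectors, scalar scores, conformer ensembles) that a chemistry-level resource typically provides.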
B
Targets
- BindingDB [paper data]. Chemical-protein binding data (patents).
- Cando [paper]. Docking signatures of compounds. 14k protein structures are used.
- ChEMBL [paper data]. Known drug targets. Drug metabolizing enzymes. Chemical-protein binding. Chemical-target-based assay activity.
- ChEMBL drug target predictor [blog code]
- Comparative Toxicogenomics Database (CTD) [paper data]. Chemical-gene interaction data, including regulatory interactions.
- Drug Repurposing Hub [paper data]. Drug targets and mechanisms of action, among others.
- DrugBank [paper data]. Known drug targets with pharmacological action. Drug metabolizing enzymes. Drug (off-)targets.
- Human Metabolome Database [paper data]. Metabolizing enzymes.
- PIDGIN [paper code]. Target prediction trained with negative data.
- PubChem Bioassays [paper data]. Repository of bioassay data.
- STITCH [paper data]. Integrative compound-protein interaction database. It has orthology mapping.
- Therapeutic Target Database [paper data]. Mode of action of drugs.
- Vogt et al. Future Science OA (2018) [paper]. Computationally derived compound profiling matrices.
- PROTAC-DB [link]. An online database of PROTACs.
C
Networks
- Chemical Entities of Biological Interest (ChEBI) [paper data]. Chemical ontology of biological roles.
- InWEB [paper data]. Integrative protein interaction database.
- Kyoto Encyclopedia of Genes and Genomes (KEGG) [paper data]. A standard pathway database.
- MetaPhOrs [paper data]. Orthology and paralogy relationships between genes.
- PathwayCommons [paper data]. Integrative interaction database, using pathway data.
- Reactome [paper data]. A standard pathway database.
- Recon [paper data]. Currently, we are using version 1 of Recon.
- STRING [paper data]. Protein interaction database. Includes physical and regulatory interactions.
- DoRothEA [resource]. TF-drug interactions (only about 200 drugs).
- Small Molecule Pathway Database [paper data]. Well-annotated small molecule pathways.
D
Cells
- Broad Therapeutics morphology data [paper data]. 812-feature cell-painting assays.
- Cancer Therapeutic Response Portal (CTRP) [paper data]. Large cell line panel of drug sensitivity profiles.
- ChEMBL [paper data]. Literature cell-based assays.
- ChEMBL animal models [paper].
- Clue.io morphology [data]. Cell painting assay.
- Genomics of Drug Sensitivity in Cancer (GDSC) [paper data]. Large cell line panel of drug sensitivity profiles.
- MOSAIC [paper data]. Chemical-genetic interaction data in yeast.
- Next-Generation L1000 Connectivity Map [paper data]. Transcriptomics perturbational data.
- NCI-60 [paper data]. Cell-line growth-inhibition data.
- DepMap [data]. Genetic screens in cancer.
- Non-antibiotic drugs on microbiome [paper data]. Over a thousand drugs screened against bacterial commensals. Only a few hundred are active.
- Consensus LINCS L1000 signatures from ThinkLab [data]. Aggregated signatures.
- PharmacoDB [data]: Integrative pharmacogenomics resource across cancer cell line (CCL) panels.
- DeepCodex [data paper]: An autoencoder of LINCS data.
- PaccMann [data paper]: A cell sensitivity predictor based on chemical structure.
- Metabolomics profiling against E. coli [paper].
- Multi-omics drug-RNA database (RNAactDrug) [data]
- PROGENy [github]
E
Clinics
- ADReCS [data]. Side effect data.
- AEOLUS [paper data]. Side effect data.
- ChEMBL [paper data]. Drug indications. Therapeutic areas.
- Comparative Toxicogenomics Database (CTD) [paper data]. Compound-disease associations.
- Drug Repurposing Hub [paper data]. Drug indications and therapeutic areas.
- DrugBank [paper data]. Drug-drug interactions. Therapeutic areas.
- DrugCentral [paper data]. Drug indications. Drug side effects.
- Kyoto Encyclopedia of Genes and Genomes (KEGG) [paper data]. Therapeutic areas.
- Ginas [data]. Software for registering substances.
- OFFSIDES [paper data]. Side effects extracted from hospital medical records.
- Side Effect Resource (SIDER) [paper data]. Drug side effects.
- RepoDB [paper data]. Drug indications.
- TWOSIDES [paper data]. Drug-drug interactions.
- Wikipedia [paper data]. Wikipedia pages.
- Tox21 [data]. A panel of 12 NR/SR (nuclear receptor / stress response) pathways related to toxicity.
- ToxCast [data]. A panel of >600 features related to toxicity.
- eToxLab [paper data]: A framework for predictive toxicology.
- ProTox-II [data]: Toxicity predictions of several kinds. It has an API that allows 250 queries a day.
- Effects of drugs on immune system [paper]
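The limit of 250 queries a day noted above for the toxicity prediction API means that bulk annotation has to be budgeted on the client side. The following is a minimal sketch of such a budget, using only the Python standard library. The class name `DailyQuota` and its methods are hypothetical helpers for illustration; they are not part of any API, and the sketch does not perform any network request.

```python
from datetime import date


class DailyQuota:
    """Hypothetical client-side counter that refuses calls beyond a per-day limit."""

    def __init__(self, limit=250):
        self.limit = limit   # maximum number of queries allowed per calendar day
        self._day = None     # day the current count refers to
        self._used = 0       # queries already spent on that day

    def try_acquire(self, today=None):
        """Return True and spend one query if budget remains, else return False."""
        today = today or date.today()
        if today != self._day:
            # A new day has started: reset the counter.
            self._day = today
            self._used = 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True


# Usage: check the budget before each (hypothetical) query.
quota = DailyQuota(limit=250)
compounds = ["CCO", "c1ccccc1", "CC(=O)O"]
allowed = [smiles for smiles in compounds if quota.try_acquire()]
print(f"{len(allowed)} of {len(compounds)} compounds can be queried today")
```

Compounds that do not fit in the daily budget can be kept in a queue and submitted on the following day.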