# The Chemical Checker repository

The Chemical Checker (CC) is a data-driven resource of small molecule bioactivity data. The main goal of the CC is to express data in a format that can be used off-the-shelf in daily computational drug discovery tasks. The resource is organized in **5 levels** of increasing complexity, ranging from the chemical properties of the compounds to their clinical outcomes. In between, we consider targets, off-targets, perturbed biological networks and several cell-based assays, including gene expression, growth inhibition and morphological profiles.

The CC is maintained by the [Structural Bioinformatics & Network Biology Laboratory](http://sbnb.irbbarcelona.org) at the Institute for Research in Biomedicine ([IRB Barcelona](http://irbbarcelona.org)). Should you have any questions, please send an email to [miquel.duran@irbbarcelona.org](mailto:miquel.duran@irbbarcelona.org) or [patrick.aloy@irbbarcelona.org](mailto:patrick.aloy@irbbarcelona.org).
## Signaturization of the data
The main task of the CC is to convert raw data into formats that are suitable inputs for machine-learning toolkits such as [scikit-learn](https://scikit-learn.org/), [Keras](https://keras.io/) or [TensorFlow](https://www.tensorflow.org/).

Accordingly, the backbone pipeline of the CC is devoted to processing every dataset and converting it into a series of formats that are readily useful for machine learning. The main assets of the CC are the so-called *CC signatures*.
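
To make "off-the-shelf" concrete, here is a minimal sketch, with an entirely made-up matrix, of the kind of input these toolkits expect: one numeric row per molecule, usable by any scikit-learn estimator without further processing.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical 4-molecule x 5-feature signature matrix; real CC signatures
# are dense numeric vectors with one row per molecule.
X = np.array([
    [0.1, 0.9, 0.3, 0.0, 0.5],
    [0.2, 0.8, 0.4, 0.1, 0.6],
    [0.9, 0.1, 0.7, 0.8, 0.2],
    [0.8, 0.2, 0.6, 0.9, 0.1],
])

# Such a matrix is a valid input for any scikit-learn estimator,
# e.g. a nearest-neighbor search over molecules.
nn = NearestNeighbors(n_neighbors=2).fit(X)
distances, indices = nn.kneighbors(X[:1])
print(indices[0])  # → [0 1]: the query itself and its most similar molecule
```

The same matrix could be fed unchanged to a clustering, embedding or supervised model.
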
All data in the CC resource are stored as `HDF5` files.
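
A hedged sketch of this HDF5 round trip (the file name and the `V`/`keys` dataset names are assumptions for illustration, not documented CC conventions): write a toy signature matrix, read it back, and hand it straight to a machine-learning tool.

```python
import h5py
import numpy as np
from sklearn.decomposition import PCA

# Write a toy signature matrix; "V" holding the matrix and "keys" holding
# molecule identifiers are assumed names for this sketch.
X = np.random.default_rng(0).normal(size=(10, 6)).astype(np.float32)
ids = np.array([f"MOL{i}".encode() for i in range(10)])
with h5py.File("toy_signature.h5", "w") as f:
    f.create_dataset("V", data=X)
    f.create_dataset("keys", data=ids)

# Read the matrix back and use it off the shelf.
with h5py.File("toy_signature.h5", "r") as f:
    V = f["V"][:]
embedding = PCA(n_components=2).fit_transform(V)
print(embedding.shape)  # → (10, 2)
```
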
* [Signaturization](signaturization)
* [Dataset correlations](dataset-correlation)

Public code for data signaturization can be found in the [signaturizer](https://github.com/cicciobyte/signaturizer) repository.
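
As a rough, unofficial illustration of what a dataset correlation can look like in practice, one can compare two signature spaces by correlating their pairwise-distance structures (a Mantel-style comparison; all matrices below are synthetic):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
# Two hypothetical signature spaces describing the same 20 molecules:
# B shares structure with A, while C is unrelated noise.
A = rng.normal(size=(20, 8))
B = A + rng.normal(scale=0.1, size=(20, 8))
C = rng.normal(size=(20, 8))

# Correlate the condensed pairwise-distance vectors of each pair of spaces.
r_related, _ = pearsonr(pdist(A), pdist(B))
r_unrelated, _ = pearsonr(pdist(A), pdist(C))
print(r_related > r_unrelated)  # → True: shared structure, higher correlation
```
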
## Code structure, pipeline and access to data
This repository is divided into two parts: