@@ -4,7 +4,7 @@ The Chemical Checker (CC) is a data-driven resource of small molecule bioactivit

The CC resource is ever-growing and maintained by the [Structural Bioinformatics & Network Biology Laboratory](http://sbnb.irbbarcelona.org) at the Institute for Research in Biomedicine ([IRB Barcelona](http://irbbarcelona.org)). Should you have any questions, please send an email to [miquel.duran@irbbarcelona.org](miquel.duran@irbbarcelona.org) or [patrick.aloy@irbbarcelona.org](patrick.aloy@irbbarcelona.org).

This project was first presented to the scientific community in the following paper: [Duran-Frigola et al., *Extending the small molecule similarity principle to all levels of biology* (2019)](https://www.dropbox.com/s/x2rqszfdfpqdqdy/duranfrigola_etal_ms_current.pdf?dl=0), and has since produced a number of [related publications](publications).

## Source data and datasets
@@ -38,8 +38,8 @@ There are other important types of data:

|Name|Abbreviation|Description|
|---|---|---|
|Similarity vectors|`sims*`|Full similarity vectors, molecule-specific. Indexed (binned) by p-value and designed to occupy little disk space.|
|Nearest neighbours|`neig*`|Nearest neighbours using a distance metric of choice, typically the cosine distance.|
|Predicted nearest neighbours|`nprd*`|Nearest neighbours predicted by exploiting correlations between datasets.|
|Clusters|`clus*`|Clusters or partitions of the data. Typically obtained with a simple clustering algorithm such as k-Means.|
|2D projections|`proj*`|2D representations of the data, typically computed with t-SNE.|

`*` denotes the corresponding signature type, `0`-`3`.
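To make the nearest-neighbour entry in the table concrete, here is a minimal sketch of a neighbour search under the cosine distance. It is an illustration only, not the CC implementation: the function names `cosine_distance` and `nearest_neighbours`, the molecule identifiers, and the toy signature values are all hypothetical, and a real resource of this size would presumably rely on an optimised or approximate search rather than this brute-force loop.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def nearest_neighbours(signatures, query_id, k=2):
    """Return the k molecules closest to query_id as (id, distance) pairs."""
    query = signatures[query_id]
    distances = [
        (mol_id, cosine_distance(query, vec))
        for mol_id, vec in signatures.items()
        if mol_id != query_id  # a molecule is not its own neighbour
    ]
    distances.sort(key=lambda pair: pair[1])  # closest first
    return distances[:k]


# Toy signatures: identifiers and values are made up for illustration.
signatures = {
    "mol_a": [1.0, 0.0, 0.0],
    "mol_b": [0.9, 0.1, 0.0],
    "mol_c": [0.0, 1.0, 0.0],
    "mol_d": [-1.0, 0.0, 0.0],
}

neighbours = nearest_neighbours(signatures, "mol_a", k=2)
print(neighbours)
```

In this toy example `mol_b` points in almost the same direction as `mol_a` and is therefore its closest neighbour, while `mol_d` points the opposite way and sits at the maximum cosine distance of 2.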
@@ -60,8 +60,7 @@ In the **production phase**, signatures are generated. The scripts in this part

* The bulk update of data performed every six months.
* Sporadic additions of datasets.
* Any external data that may be mapped (compounds) or [connected](#connectivity) (other biological entities) to the existing resource, which in turn produces new signatures for the queries.

The data generated by the production phase can be easily accessed with a simple Python library.