... | ... | @@ -18,7 +18,7 @@ For further information, please refer to: |
* [Bioactivity data sources](data)
* [Compound libraries](libraries)
* [Resource meta-data](database)
* [Datasets and file-system organization](datasets)
* [Datasets specifications](datasets)

## Signaturization of the data
... | ... | @@ -37,18 +37,24 @@ There are other important types of data: |
|Name|Abbreviation|Description|
|---|---|---|
|Nearest neighbours|`nneigh`|Nearest neighbours using a distance metric of choice, typically the cosine distance.|
|Clusters|`clust`|Clusters or partitions of the data, typically obtained with a simple clustering algorithm such as k-means.|
|2D projections|`proj`|2D representations of the data, typically computed with t-SNE.|

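To make the `nneigh` entry more concrete, here is a minimal sketch of a nearest-neighbour search under the cosine distance. It is only an illustration: the function names, the dictionary layout and the toy vectors are assumptions made for this example and are not part of the CC codebase, which may use a different (and far more scalable) implementation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def nearest_neighbours(query_id, signatures, k=3):
    """Return the k molecules closest to query_id as (id, distance) pairs."""
    query = signatures[query_id]
    distances = [
        (mol_id, cosine_distance(query, sig))
        for mol_id, sig in signatures.items()
        if mol_id != query_id  # a molecule is not its own neighbour
    ]
    distances.sort(key=lambda pair: pair[1])  # smallest distance first
    return distances[:k]


# Toy signatures (hypothetical values, not CC data).
signatures = {
    "mol_a": [1.0, 0.0, 0.0],
    "mol_b": [0.9, 0.1, 0.0],
    "mol_c": [0.0, 1.0, 0.0],
    "mol_d": [-1.0, 0.0, 0.0],
}

neighbours = nearest_neighbours("mol_a", signatures, k=2)
```

With these toy vectors, `mol_b` points in almost the same direction as `mol_a` and is ranked first, while `mol_d` points in the opposite direction and is excluded from the top two. A brute-force scan like this is only practical for small collections.
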
All data in the CC resource are stored as `HDF5` files. Measuring correlations between signatures belonging to different datasets yields a systematic assessment of the **small molecule similarity principle** (similar molecules have similar properties). Please follow the links below for more details:

* [Signaturization](signaturization)
* [Dataset correlations](dataset-correlation)

## Resource organization and structure
## Similarity searches in the web
Signature similarity searches can be performed at a high level using the CC web interface, available at [http://chemicalchecker.org](http://chemicalchecker.org). This resource is limited to the 25 *exemplar* datasets of the CC.

On the *Main* page, the user can query small molecules and obtain an overview of their location inside the CC. The overview shows the CC datasets in which these molecules have data available, with grey 2D density plots indicating whether they are peripheral (low-density regions) or central (high-density regions). To give a better sense of where the query molecules lie, landmark compounds from popular compound collections can be displayed. Deeper insights can be obtained by clicking the *Explore* button for a molecule of choice.

On the *Explore* page, we look for similar molecules in the CC resource and display them in a 25-column table, one column per CC dataset. In CC datasets where the molecule *is* present, we measure similarities to other molecules in the dataset. If the molecule *is not* present, we infer similarities only to the molecules in the dataset.
... | ... | |