There are also other important types of data:

| Name | Abbreviation | Description |
| --- | --- | --- |
| Nearest neighbors | `nneigh` | Nearest neighbors computed with a distance metric of choice, typically the cosine distance. |
| Clusters | `clust` | Clusters (partitions) of the data, typically obtained with a simple clustering algorithm such as k-means. |
| 2D projections | `proj` | 2D representations of the data, typically obtained with t-SNE. |

All data in the CC resource are stored as `HDF5` files. For further information, please refer to:
* [Signaturization](signaturization)
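
To illustrate what the `nneigh` and `clust` entries contain, here is a minimal pure-Python sketch, assuming each signature is a dense, non-zero numeric vector. The function names are hypothetical and this is not the CC's own code: it uses brute-force cosine distances and plain Lloyd's k-means with Euclidean distance, which is only practical for small toy inputs. 2D projections are left out; t-SNE implementations are available elsewhere, for example `sklearn.manifold.TSNE`.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration; not part of the CC codebase.
Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity (0 = same direction, 2 = opposite). Assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbors(query: Vector, signatures: List[Vector], k: int) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k signatures closest to `query` (nneigh-like)."""
    dists = [(i, cosine_distance(query, s)) for i, s in enumerate(signatures)]
    dists.sort(key=lambda pair: pair[1])
    return dists[:k]


def kmeans(signatures: List[Vector], k: int, iterations: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means (Euclidean); returns one cluster label per signature (clust-like)."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct signatures chosen at random.
    centroids = [list(s) for s in rng.sample(signatures, k)]
    labels = [0] * len(signatures)
    for _ in range(iterations):
        # Assignment step: each signature goes to its closest centroid.
        for i, s in enumerate(signatures):
            labels[i] = min(range(k), key=lambda c: math.dist(s, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [signatures[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(nearest_neighbors(toy[0], toy, k=2))
    print(kmeans(toy, k=2))
```

A real resource would use an indexed nearest-neighbor search and an optimized clustering library rather than these quadratic loops; the sketch only shows what each stored entry means.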
## Connectivity
The CC contains both chemical and biological signatures. One of the most interesting features of biological signatures is that they can be *connected* to the signatures of other biological entities. This idea was first popularized by the Connectivity Map in the context of gene expression data.

In the CC, we generalize the notion of connectivity to other types of data and provide functionalities to connect small molecules to other biologically annotated entities, such as diseases, cell lines or genetic perturbation experiments. Some examples are:

* A molecule whose gene expression profile is *opposite* to a disease-characteristic gene expression signature.
* More simply, a molecule whose targets lie *close* to the genes of interest (e.g., disease genes) in a protein-protein interaction network.
* A molecule whose gene-sensitivity profile resembles the basal gene expression of a cell line.

Finding the right connectivity strategy requires a deep understanding of the datasets. The CC simplifies this by manually assigning connectivity functions to each dataset. Some datasets (such as chemical fingerprints) cannot be connected, while others may admit several connectivity functions.

For further information, please refer to:

* [Connectivity](connectivity)
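
To make the connectivity examples above more concrete, here is a minimal sketch of two connectivity functions and a per-dataset registry, assuming plain Python lists for expression profiles and an adjacency-list dictionary for the protein-protein interaction network. Everything here is illustrative: the dataset keys, function names and scores are assumptions, not the CC's implementation. In particular, reversal is scored with a negative cosine similarity, a simplified stand-in for the rank-based enrichment scores used by the Connectivity Map, and network proximity is the mean shortest-path distance from each target to its closest gene of interest.

```python
import math
from collections import deque
from typing import Callable, Dict, Iterable, List, Mapping, Sequence

# Hypothetical connectivity functions for illustration; not part of the CC codebase.


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def opposite_expression_score(molecule_profile: Sequence[float],
                              disease_signature: Sequence[float]) -> float:
    """Higher when the molecule's expression changes reverse the disease signature."""
    return -cosine_similarity(molecule_profile, disease_signature)


def shortest_path_lengths(ppi: Mapping[str, Iterable[str]], source: str) -> Dict[str, int]:
    """Unweighted breadth-first search distances from `source` in an adjacency-list network."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in ppi.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def network_proximity(ppi: Mapping[str, Iterable[str]], targets: Sequence[str],
                      genes_of_interest: Iterable[str]) -> float:
    """Mean distance from each target to its closest gene of interest (lower = closer).

    Returns math.inf if some target cannot reach any gene of interest.
    """
    if not targets:
        raise ValueError("at least one target is required")
    genes = list(genes_of_interest)
    per_target = []
    for target in targets:
        reachable = shortest_path_lengths(ppi, target)
        hits = [reachable[g] for g in genes if g in reachable]
        if not hits:
            return math.inf
        per_target.append(min(hits))
    return sum(per_target) / len(per_target)


# Manual, per-dataset assignment of connectivity functions (hypothetical dataset keys).
CONNECTIVITY_FUNCTIONS: Dict[str, List[Callable[..., float]]] = {
    "chemical_fingerprints": [],  # cannot be connected
    "gene_expression": [opposite_expression_score],
    "targets": [network_proximity],
}
```

Keeping the assignment in an explicit mapping mirrors the idea that connectivity is curated per dataset: a dataset with an empty list simply has no connectivity function.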
## Customary drug discovery tasks