# Similarity and connectivity
:construction: This page is under construction.

The CC is based on the similarity principle, i.e. similar molecules tend to have similar properties. **Similarity** can be defined between **pairs of molecules** in any of the CC [datasets](datasets).

When it comes to **comparing molecules to other biological entities** (antibodies, shRNAs, diseases, etc.), the similarity principle can be generalized to the notion of **connectivity**. A classical example of connectivity is a molecule that *mimics* the transcriptional profile of an shRNA, or one that *reverts* the transcriptional profile of a disease state.

These are some ways similarity and connectivity can be applied in the CC:
## Easy calculation of similarity and connectivity

In the context of the CC, "connectivity" is a generalization of "similarity": it is a more flexible way of comparing entities. Connectivity is of special interest to *unsupervised* drug discovery because it makes it possible to map external biological data onto the chemical space.

In the [CC pipeline](production phase), connectivity is computed at the pre-processing step, which has [two phases](datasets):

1. XX
2. XX

Another important matter here is the distance. The CC works with *common* distan…
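
To make the mimic/revert view of connectivity concrete, here is a minimal sketch that scores a molecule profile against a disease profile and labels the result. Everything in it is an assumption made for illustration: the choice of cosine similarity, the function names `cosine_similarity` and `connectivity_label`, the 0.5 threshold, and the toy numbers are not taken from the CC code base.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def connectivity_label(score, threshold=0.5):
    """Hypothetical rule: strongly positive = mimic, strongly negative = revert."""
    if score >= threshold:
        return "mimic"
    if score <= -threshold:
        return "revert"
    return "unrelated"


# Toy transcriptional profiles over the same four genes (made-up values).
disease = [2.0, -1.0, 0.5, -2.0]
molecule_a = [1.8, -0.9, 0.4, -2.2]   # changes genes in the same direction
molecule_b = [-2.1, 1.2, -0.3, 1.9]   # changes genes in the opposite direction

for name, profile in [("molecule_a", molecule_a), ("molecule_b", molecule_b)]:
    score = cosine_similarity(disease, profile)
    print(name, connectivity_label(score))
```

With these toy values, `molecule_a` is labelled `mimic` and `molecule_b` is labelled `revert`. Comparing two molecule vectors with the same function gives plain similarity, which is the sense in which connectivity generalizes it.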
## Standard input files

|Type|Format|Description|
|---|---|---|
|Feature sets|GMT|XX|
|Key-feature pairs|TSV|XX|
|Key profiles|TSV|XX|
|InChIKeys|TSV|XX|
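
As an illustration of the feature-set input, the sketch below reads GMT-formatted lines. It assumes the usual GMT layout (set name, description, then the member features, all tab-separated); the function name `parse_gmt` and the toy sets are hypothetical, and any additional constraints the CC places on these files are not covered here.

```python
def parse_gmt(lines):
    """Parse GMT lines into {set_name: (description, [features])}."""
    feature_sets = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"malformed GMT line: {line!r}")
        name, description, *members = fields
        # Drop empty members produced by trailing tabs.
        feature_sets[name] = (description, [m for m in members if m])
    return feature_sets


# Toy input (made-up set and feature names).
example = [
    "SET_A\tfirst toy set\tGENE1\tGENE2\tGENE3\n",
    "SET_B\tsecond toy set\tGENE2\tGENE4\n",
]
sets = parse_gmt(example)
print(sets["SET_A"][1])  # ['GENE1', 'GENE2', 'GENE3']
```

The same function works on an open file handle, since iterating over a text file yields one line at a time.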
## Documentation