
Similarity and connectivity

The CC is based upon the similarity principle, i.e. similar molecules have similar properties. Similarity can be defined between pairs of molecules in any of the CC datasets.

When it comes to comparing molecules to other biological entities (antibodies, shRNAs, diseases, etc.), the similarity principle can be generalized to the notion of connectivity. Classical examples of connectivity are molecules that mimic the transcriptional profile of an shRNA, or molecules that revert the transcriptional profile of a disease state.

These are some ways similarity and connectivity can be applied in the CC:

Easy calculation of similarity and connectivity

More precisely, the connectivity procedure starts from standard input files and ends with a signature type 0.

In some datasets, this procedure may be considerably complex, and we need to design workflows that can be wrapped into a predict() method. For other datasets, the procedure will be almost trivial.
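As an illustration only, the idea of wrapping a pre-processing workflow into a predict() method could be sketched as follows. The class name ConnectivityWorkflow, its constructor arguments and the dictionary-based input are hypothetical and are not part of the CC code base; the sketch simply maps key-feature records onto a fixed feature space, which is what a trivial dataset would need.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, float]


def _identity(record: Record) -> Record:
    """Trivial pre-processing step: return the record unchanged."""
    return record


class ConnectivityWorkflow:
    """Hypothetical wrapper turning raw records into a fixed-width matrix."""

    def __init__(self, features: List[str],
                 preprocess: Optional[Callable[[Record], Record]] = None):
        self.features = list(features)
        # Column position of every feature in the output matrix.
        self.index = {name: i for i, name in enumerate(self.features)}
        # Dataset-specific logic is injected here (assumption of this sketch).
        self.preprocess = preprocess or _identity

    def predict(self, records: Dict[str, Record]) -> Tuple[List[str], List[List[float]]]:
        """Return sorted keys and one row of feature values per key."""
        keys = sorted(records)
        matrix = []
        for key in keys:
            row = [0.0] * len(self.features)
            for feature, value in self.preprocess(records[key]).items():
                column = self.index.get(feature)
                if column is not None:  # features outside the space are ignored
                    row[column] = float(value)
            matrix.append(row)
        return keys, matrix


workflow = ConnectivityWorkflow(["GENE_A", "GENE_B", "GENE_C"])
keys, rows = workflow.predict({"shRNA_1": {"GENE_A": 1.0, "GENE_C": -2.0}})
```

For a complex dataset, the same interface would hold and only the injected pre-processing function would grow.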

Another important matter here is the distance metric. The CC works with common distance metrics, such as the cosine or Euclidean distances. Sometimes, connectivity may require other types of metrics (e.g. GSEA-like, overlap, etc.). We might consider learning Siamese networks that map data compared with these original metrics into a space where the more standard distances apply. This is an unexplored avenue, though.
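To make the contrast between metric families concrete, here is a minimal sketch of the two standard distances alongside one overlap-style metric. The function names are illustrative, and the Jaccard distance is used here only as one possible stand-in for an "overlap" metric; the source does not specify which overlap measure is intended.

```python
import math
from typing import Sequence, Set


def _check_lengths(u: Sequence[float], v: Sequence[float]) -> None:
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; undefined for all-zero vectors."""
    _check_lengths(u, v)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Straight-line distance between two vectors."""
    _check_lengths(u, v)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Overlap-style distance between two feature sets (assumed stand-in)."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)
```

The first two functions operate on numeric vectors, while the third operates on feature sets, which is why set-based connectivity data cannot be plugged directly into the standard distances.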

Standard input files

Type	Format	Description
Feature sets	GMT	XX
Key-feature pairs	TSV	XX
Key profiles	TSV	XX
InChIKeys	TSV	XX
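For reference, a feature-set file in GMT format can be read with a few lines of code. The sketch below assumes the usual GMT layout (one set per line, tab-separated: set name, description, then the members); the function name parse_gmt is illustrative and does not refer to an existing CC utility.

```python
from typing import Dict, Iterable, Set


def parse_gmt(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each set name to its members, ignoring the description column."""
    feature_sets: Dict[str, Set[str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and sets without members
        name, _description, *members = fields
        feature_sets[name] = {member for member in members if member}
    return feature_sets
```

Because the function takes any iterable of lines, it can be called on an open file handle as well as on a list of strings.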

Documentation

Every pre-processing script needs to have a README file.

GitLab