The central data type is the signature (one numerical vector per molecule). Signatures come in four types:

* `sign0` [Signatures type 0](#signatures-type-0): A sufficiently processed version of the raw data. They usually encode explicit knowledge, which enables connectivity and interpretation.
* `sign1` [Signatures type 1](#signatures-type-1): A PCA/LSI-projected version of the data, retaining 90% of the variance. They keep most of the complexity of the original data and can be used for *almost-exact* similarity calculations.
* `sign2` [Signatures type 2](#signatures-type-2): Network embedding of the similarity matrix derived from signatures type 1. They have fixed length, which is convenient for machine learning, and capture both explicit *and* implicit similarity relationships in the data.
* `sign3` [Signatures type 3](#signatures-type-3): Network embedding of observed *and* inferred similarity networks. Their added value, compared to signatures type 2, is that they can be derived for virtually *any* molecule in *any* dataset. :warning: These signatures have not been calculated yet, and will not be in the near future.
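To illustrate the idea behind `sign1` (keep only as many projected dimensions as needed to retain 90% of the variance, then compare molecules in that space), here is a minimal sketch. It assumes plain PCA computed with NumPy's SVD on a dense molecules × features matrix, and cosine similarity as the comparison metric; the CC itself may use LSI for sparse data and different preprocessing. `pca_project` and `cosine_similarity` are hypothetical helpers, not part of the CC API.

```python
import numpy as np

def pca_project(X, variance_to_keep=0.90):
    """Hypothetical helper: project rows of X (molecules x features) onto the
    fewest principal components whose cumulative explained variance reaches
    `variance_to_keep`."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    # SVD of the centered matrix; squared singular values are proportional
    # to the variance captured by each principal component.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)              # fraction of variance per component
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, len(S))                           # guard against rounding at 100%
    return Xc @ Vt[:k].T                         # coordinates in the k-dim subspace

def cosine_similarity(a, b):
    """Hypothetical helper: cosine similarity between two projected vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Because the number of retained components depends on the data, vectors produced this way vary in length from dataset to dataset, which is one reason a fixed-length representation such as `sign2` is handier for machine learning.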
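The `sign2` description (a network embedding of the similarity matrix derived from `sign1`, with a fixed output length) can be sketched in two steps: build a k-nearest-neighbour similarity graph, then embed its nodes into a fixed number of dimensions. This is only an illustration under stated assumptions. It swaps in a simple spectral embedding (eigenvectors of the normalized graph Laplacian) for whatever network-embedding method the CC actually uses, and `knn_graph`, `spectral_embedding`, `k` and `dim` are hypothetical names and parameters.

```python
import numpy as np

def knn_graph(X, k=5):
    """Hypothetical helper: symmetric k-nearest-neighbour adjacency matrix
    from cosine similarities between rows of X. Assumes k < len(X) and
    no all-zero rows."""
    X = np.asarray(X, dtype=float)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-length rows
    S = Xn @ Xn.T                                        # cosine similarity matrix
    np.fill_diagonal(S, -np.inf)                         # never pick self as a neighbour
    n = len(X)
    nbrs = np.argsort(-S, axis=1)[:, :k]                 # k most similar molecules per row
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)                            # make the graph undirected

def spectral_embedding(A, dim=2):
    """Hypothetical helper: embed graph nodes into `dim` dimensions using
    eigenvectors of the symmetric normalized Laplacian (a stand-in for the
    CC's network embedding)."""
    d = A.sum(axis=1)                                    # node degrees (> 0 for a kNN graph)
    L = np.diag(d) - A                                   # unnormalized Laplacian
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_norm = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_norm)                     # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]                            # skip the trivial first eigenvector
```

Whatever the input dimensionality, every molecule ends up with a vector of length `dim`, mirroring the fixed-length property described for `sign2`; the neighbourhood graph is also what lets indirect (implicit) similarities influence the embedding.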