... | ... | @@ -9,13 +9,13 @@ The central type of data are the signatures (one numerical vector per molecule), |
|
|
* `sign2` [Signatures type 2](#signatures-type-2): Network embedding of the similarity matrix derived from signatures. They have a fixed length, which is convenient for machine learning, and capture both explicit and implicit similarity relationships in the data.
|
|
|
* `sign3` [Signatures type 3](#signatures-type-3): Network embedding of observed *and* inferred similarity networks. Their added value, compared to signatures type 2, is that they can be derived for virtually *any* molecule in *any* dataset. :warning: These signatures are not calculated yet, and won't be in the near future.
|
|
|
|
|
|
Besides, there are other (auxiliary) types of data that may be of interest. `*` denotes correspondence with signatures type `0`-`3`.
|
|
|
Besides, there are other (auxiliary) types of data that may be of interest. The asterisk `*` denotes correspondence with signatures type `0`-`3`.
|
|
|
|
|
|
* `sims*` [Similarity vectors](#similarity-vectors): Full similarities stored as lightweight `int8` data. Each molecule receives one such similarity vector per dataset. They may be observed (`_obs`) or predicted (`_prd`) similarities. These vectors are [only applicable to exemplary datasets](production-phase). Currently, we only keep `sims1`.
|
|
|
* `neig*` [Nearest-neighbors](#nearest-neighbors): . Currently, we consider the 1000-nearest neighbors, which is more than sufficient in any realistic scenario. For now, we only keep `neig1`.
|
|
|
*
|
|
|
* `neig*` [Nearest neighbors](#nearest-neighbors): Currently, we consider the 1000 nearest neighbors, which is more than sufficient in any realistic scenario. For now, we only keep `neig1`.
|
|
|
* `nprd*` [Predicted nearest neighbors](#predicted-nearest-neighbors)
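
The `int8` storage mentioned for `sims*` can be illustrated with a minimal sketch. The text does not say how similarities are mapped to integers, so the linear scaling of values in `[0, 1]` to `0`-`127` below is only an assumption, and the names `SCALE`, `quantize` and `dequantize` are hypothetical.

```python
from array import array

# Assumption: similarities lie in [0, 1] and are scaled linearly.
# 127 is the largest value a signed 8-bit integer can hold.
SCALE = 127


def quantize(similarities):
    """Map floats in [0, 1] to a signed 8-bit array (typecode 'b')."""
    out = array("b")
    for s in similarities:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"similarity out of range: {s}")
        out.append(round(s * SCALE))
    return out


def dequantize(values):
    """Recover approximate float similarities from the int8 values."""
    return [v / SCALE for v in values]
```

Each stored value takes one byte, and under this assumed scheme the round-trip error is at most half a quantization step (`0.5 / 127`).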
|
|
|
|
|
|
I consider the numbering `0`-`3` to be conceptually closed. However, further auxiliary data types may be introduced in the future. Note that all names have a four-letter code follwed by a digit. Future data should stick to this nomenclature.
|
|
|
I consider the numbering `0`-`3` to be conceptually closed. However, further auxiliary data types may be introduced in the future. Note that all names have a 4-character code followed by a digit. Future data should stick to this nomenclature.
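
The naming rule can be checked mechanically. The sketch below assumes that the four characters are lowercase ASCII letters (as in `sign`, `sims`, `neig`, `nprd`) and that the digit is a single character. The helper names `is_valid_name` and `split_name` are hypothetical and not part of any existing code.

```python
import re
from typing import Tuple

# Assumption: four lowercase letters followed by exactly one digit.
DATA_TYPE_NAME = re.compile(r"[a-z]{4}[0-9]")


def is_valid_name(name: str) -> bool:
    """Return True if `name` follows the code-plus-digit nomenclature."""
    return DATA_TYPE_NAME.fullmatch(name) is not None


def split_name(name: str) -> Tuple[str, int]:
    """Split a valid name into its four-letter code and its digit."""
    if not is_valid_name(name):
        raise ValueError(f"not a valid data type name: {name!r}")
    return name[:4], int(name[4])
```

For instance, `split_name("neig1")` separates the code `neig` from the signature type `1`, while the placeholder `sims*` is rejected because `*` is not a digit.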
|
|
|
|
|
|
## Commonalities
|
|
|
|
... | ... | @@ -23,17 +23,17 @@ I consider the numbering `0`-`3` to be conceptually closed. However, further aux |
|
|
|
|
|
I suggest that
|
|
|
|
|
|
## Signatures type 0
|
|
|
## `sign0` Signatures type 0
|
|
|
|
|
|
## Signatures type 1
|
|
|
## `sign1` Signatures type 1
|
|
|
|
|
|
## Signatures type 2
|
|
|
## `sign2` Signatures type 2
|
|
|
|
|
|
## Signatures type 3
|
|
|
## `sign3` Signatures type 3
|
|
|
|
|
|
## Similarity vectors
|
|
|
## `sims*` Similarity vectors
|
|
|
|
|
|
## Nearest neighbors
|
|
|
## `neig*` Nearest neighbors and `nprd*` predicted nearest neighbors
|
|
|
|
|
|
Below, I list a schematic proposal of the classes:
|
|
|
|
... | ... | |