... | ... | @@ -12,8 +12,11 @@ The central type of data are the signatures (one numerical vector per molecule), |
|
|
Besides the signatures, there are other (auxiliary) types of data that may be of interest. The asterisk `*` stands for the signature type (`0`-`3`) that the data correspond to.
|
|
|
|
|
|
* `sims*` [Similarity vectors](#similarity-vectors): Full similarities stored as lightweight `int8` data. Each molecule receives one such similarity vector per dataset. Similarities may be observed (`_obs`) or predicted (`_prd`). These vectors are [only applicable to exemplary datasets](production-phase). Currently, we only keep `sims1`.
|
|
|
* `neig*` [Nearest neighbors](#nearest-neighbors): . Currently, we consider the 1000-nearest neighbors, which is more than sufficient in any realistic scenario. For now, we only keep `neig1`.
|
|
|
* `nprd*` [Predicted nearest neighbors](#predicted-nearest-neighbors)
|
|
|
* `neig*` [Nearest neighbors](#nearest-neighbors): XXXX. Currently, we consider the 1000-nearest neighbors, which is more than sufficient in any realistic scenario. For now, we only keep `neig1`.
|
|
|
* `nprd*` [Predicted nearest neighbors](#predicted-nearest-neighbors): XXXX
|
|
|
* `clus*` [Clusters](#clusters): XXXX
|
|
|
* `proj*` [2D Projections](#2d-projections): XXX
|
|
|
|
|
|
|
|
|
I consider the numbering `0`-`3` to be conceptually closed. However, further auxiliary data types may be introduced in the future. Note that all names have a 4-character code followed by a digit. Future data should stick to this nomenclature.
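
As a minimal sketch of this nomenclature, a name could be checked and split as follows. It assumes the 4-character code uses lowercase letters and that the digit is restricted to `0`-`3`; the helper name `parse_data_code` is hypothetical and not part of any existing class.

```python
import re

# Assumption: four lowercase letters followed by exactly one digit in 0-3.
CODE_PATTERN = re.compile(r"([a-z]{4})([0-3])")


def parse_data_code(name):
    """Split a name such as 'neig1' into ('neig', 1).

    Raises ValueError if the name does not follow the nomenclature.
    """
    match = CODE_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"'{name}' is not a 4-character code followed by a digit 0-3")
    return match.group(1), int(match.group(2))
```

For example, `parse_data_code("sims1")` gives the pair `("sims", 1)`, whereas a name like `"neighbors1"` is rejected.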
|
|
|
|
... | ... | @@ -23,17 +26,23 @@ I consider the numbering `0`-`3` to be conceptually closed. However, further aux |
|
|
|
|
|
I suggest that
|
|
|
|
|
|
## `sign0` Signatures type 0
|
|
|
## Signatures type 0
|
|
|
|
|
|
## Signatures type 1
|
|
|
|
|
|
## Signatures type 2
|
|
|
|
|
|
## Signatures type 3
|
|
|
|
|
|
## `sign1` Signatures type 1
|
|
|
## Similarity vectors
|
|
|
|
|
|
## `sign2` Signatures type 2
|
|
|
## Nearest neighbors
|
|
|
|
|
|
## `sign3` Signatures type 3
|
|
|
## Predicted nearest neighbors
|
|
|
|
|
|
## `sims*` Similarity vectors `sims*`
|
|
|
## Clusters
|
|
|
|
|
|
## `neig*` Nearest neighbors and `nprd*` predicted nearest neighbors
|
|
|
## 2D projections
|
|
|
|
|
|
Below, I list a schematic proposal of the classes:
|
|
|
|
... | ... | |