... | @@ -17,7 +17,6 @@ Besides, there are other (auxiliary) types of data that may be of interest. The |
... | @@ -17,7 +17,6 @@ Besides, there are other (auxiliary) types of data that may be of interest. The |
|
* `clus*` [Clusters](#clusters): Centroids and labels of a k-means clustering.
|
|
* `clus*` [Clusters](#clusters): Centroids and labels of a k-means clustering.
|
|
* `proj*` [2D Projections](#2d-projections): t-SNE 2D projections of the data.
|
|
* `proj*` [2D Projections](#2d-projections): t-SNE 2D projections of the data.
|
|
|
|
|
|
|
|
|
|
I consider the numbering `0`-`3` to be conceptually closed. However, further auxiliary data types may be introduced in the future. Note that all names have a 4-character code followed by a digit. Future data should stick to this nomenclature.
|
|
I consider the numbering `0`-`3` to be conceptually closed. However, further auxiliary data types may be introduced in the future. Note that all names have a 4-character code followed by a digit. Future data should stick to this nomenclature.
|
|
|
|
|
|
## Commonalities
|
|
## Commonalities
|
... | @@ -30,29 +29,49 @@ Every CC data will have the following methods: |
... | @@ -30,29 +29,49 @@ Every CC data will have the following methods: |
|
* `predict()`: Uses the fitted models to go from input to output.
|
|
* `predict()`: Uses the fitted models to go from input to output.
|
|
* `validate()`: Validates the data against external annotations such as MoA and ATC codes.
|
|
* `validate()`: Validates the data against external annotations such as MoA and ATC codes.
|
|
|
|
|
|
## Signatures type 0
|
|
Also, data are always sorted by a certain key (an InChIKey, typically). These keys should be accessible and iterable without having to load the whole dataset into memory.
|
|
|
|
|
|
|
|
* `keys`
|
|
|
|
* `__iter__`
|
|
|
|
* `__getattr__`
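
A minimal sketch of what lazy key access could look like. Everything here is an assumption made for illustration: the class name `KeyedData`, the `keys_path` argument, and the plain-text storage with one key per line (the actual data are meant to live in files such as `sign0.h5`). `__getattr__` is left out of the sketch.

```python
import os
import tempfile
from typing import Iterator


class KeyedData:
    """Hypothetical stand-in for a CC data object (not the project's class)."""

    def __init__(self, keys_path: str) -> None:
        # Assumption: keys live in a plain-text file, one per line, already sorted.
        self.keys_path = keys_path

    @property
    def keys(self) -> Iterator[str]:
        # Generator: reads one key at a time instead of loading the whole file.
        with open(self.keys_path) as handle:
            for line in handle:
                yield line.rstrip("\n")

    def __iter__(self) -> Iterator[str]:
        # Iterating over the object iterates over its keys.
        return self.keys


# Usage: write two illustrative InChIKeys to a temporary file and stream them back.
example_keys = ["BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "RYYVLZVUVIJVGH-UHFFFAOYSA-N"]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "keys.txt")
    with open(path, "w") as handle:
        handle.write("\n".join(example_keys) + "\n")
    streamed = list(KeyedData(path))
```

Because `keys` is a generator, memory use stays constant regardless of how many keys the file holds; the same idea would apply to reading a keys dataset from HDF5 in chunks.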
|
|
|
|
|
|
|
|
I think it may be useful to keep track of the folder where the persisted models are stored:
|
|
|
|
|
|
|
|
* `PATH`
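
The common interface could be captured in an abstract base class. This is only a sketch under stated assumptions: the names `CCData` and `DummySign` are hypothetical, and only `predict()`, `validate()` and `PATH` come from the list of commonalities; method signatures and return types are guesses.

```python
from abc import ABC, abstractmethod


class CCData(ABC):
    """Hypothetical base class for every CC data type."""

    def __init__(self, path: str) -> None:
        # Folder where the persisted models are stored.
        self.PATH = path

    @abstractmethod
    def predict(self, data):
        """Use the fitted models to go from input to output."""

    @abstractmethod
    def validate(self):
        """Validate against external data such as MoA and ATC codes."""


class DummySign(CCData):
    """Placeholder subclass, present only to show that the interface is usable."""

    def predict(self, data):
        # Placeholder behaviour: return the input unchanged.
        return data

    def validate(self):
        # Placeholder behaviour: no validation results.
        return {}


dummy = DummySign("/tmp/models")
```

Declaring the methods abstract means a data type that forgets to implement one of them fails at instantiation time rather than at call time.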
|
|
|
|
|
|
|
|
### Signature commonalities
|
|
|
|
|
|
|
|
All signatures type 0-3 contain a numerical data matrix:
|
|
|
|
|
|
|
|
* `V`: Typically, a dense matrix (it can be sparse in the case of signatures type 0).
|
|
|
|
|
|
|
|
All
|
|
|
|
|
|
|
|
## Peculiarities
|
|
|
|
|
|
|
|
### Signatures type 0
|
|
|
|
|
|
* From: standard input
|
|
* From: standard input
|
|
* To: `sign0.h5`
|
|
* To: `sign0.h5`
|
|
|
|
|
|
## Signatures type 1
|
|
### Signatures type 1
|
|
|
|
|
|
* From: `sign0.h5`
|
|
* From: `sign0.h5`
|
|
* To: `sign1.h5`
|
|
* To: `sign1.h5`
|
|
|
|
|
|
## Signatures type 2
|
|
### Signatures type 2
|
|
|
|
|
|
## Signatures type 3
|
|
### Signatures type 3
|
|
|
|
|
|
## Similarity vectors
|
|
### Similarity vectors
|
|
|
|
|
|
## Nearest neighbors
|
|
### Nearest neighbors
|
|
|
|
|
|
## Predicted nearest neighbors
|
|
### Predicted nearest neighbors
|
|
|
|
|
|
## Clusters
|
|
### Clusters
|
|
|
|
|
|
## 2D projections
|
|
### 2D projections
|
|
|
|
|
|
Below, I list a schematic proposal for the classes:
|
|
Below, I list a schematic proposal for the classes:
|
|
|
|
|
... | | ... | |