... | ... | @@ -263,9 +263,7 @@ Below I sequentially list the steps of the pipeline. This is a linear and qualit |
|
|
* Here again, t-SNE has no out-of-sample method, so we need to learn a mapping between signatures type 1 and the 2D projections.
|
|
|
* I suggest using AdaNet in this case, too.
|
|
|
* Save the models for persistence.
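The mapping-and-persistence idea above can be sketched as follows. This is a minimal illustration only: it swaps in a plain k-nearest-neighbour average for AdaNet, uses made-up toy data, and every name in it (`fit_projector`, `predict_projection`, `proj_model.pkl`) is hypothetical rather than part of the pipeline.

```python
import math
import os
import pickle
import tempfile


def fit_projector(signatures, coords, k=2):
    """Store reference signatures with their 2D coordinates (a kNN 'model')."""
    return {
        "k": k,
        "signatures": [list(s) for s in signatures],
        "coords": [tuple(c) for c in coords],
    }


def predict_projection(model, signature):
    """Project an unseen signature by averaging its k nearest reference points."""
    ranked = sorted(
        range(len(model["signatures"])),
        key=lambda i: math.dist(signature, model["signatures"][i]),
    )[: model["k"]]
    x = sum(model["coords"][i][0] for i in ranked) / len(ranked)
    y = sum(model["coords"][i][1] for i in ranked) / len(ranked)
    return (x, y)


# Toy reference data, invented for illustration only.
ref_signatures = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
ref_coords = [(-5.0, 0.0), (-5.0, 2.0), (5.0, 0.0), (5.0, 2.0)]
model = fit_projector(ref_signatures, ref_coords, k=2)

# Persist the fitted model to disk, then reload it as the full run would.
with tempfile.TemporaryDirectory() as workdir:
    model_path = os.path.join(workdir, "proj_model.pkl")  # hypothetical file name
    with open(model_path, "wb") as handle:
        pickle.dump(model, handle)
    with open(model_path, "rb") as handle:
        restored = pickle.load(handle)

projected = predict_projection(restored, [0.0, 0.4])
```

The point of the design is that the t-SNE coordinates are computed once on the reference set, and any later molecule is placed through the learned mapping rather than by re-running t-SNE.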
|
|
|
|
|
|
Once reference calculations are done, we can move to the full dataset.
|
|
|
|
|
|
15. Predict signatures type 1.
|
|
|
* If the molecule is in the reference set (or is a near-duplicate of a reference molecule), take its signature.
|
|
|
* Else, use the persistent model to predict.
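The lookup-or-predict rule in this step can be sketched as below. It is an assumption-laden illustration: it matches molecules by exact key only (near-duplicate detection is left out), and `signature_for`, `toy_predict`, and the molecule keys are hypothetical names standing in for the persisted model and the real identifiers.

```python
def signature_for(molecule_id, features, reference_signatures, predict_fn):
    """Return (signature, source) for one molecule.

    Uses the precomputed reference signature when the molecule is known,
    and falls back to the persisted predictor otherwise.
    """
    if molecule_id in reference_signatures:
        return reference_signatures[molecule_id], "reference"
    return predict_fn(features), "predicted"


# Toy stand-ins, invented for illustration only.
reference_signatures = {"MOL-A": [0.1, 0.9]}


def toy_predict(features):
    # Placeholder for the persisted model: repeats the mean feature value.
    mean = sum(features) / len(features)
    return [mean, mean]


known = signature_for("MOL-A", [1.0, 3.0], reference_signatures, toy_predict)
unknown = signature_for("MOL-B", [1.0, 3.0], reference_signatures, toy_predict)
```

Returning the source alongside the signature makes it easy to keep track of which signatures were calculated and which were predicted.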
|
... | ... | @@ -289,9 +287,7 @@ Below I sequentially list the steps of the pipeline. This is a linear and qualit |
|
|
* Do 1-to-many or predict, as necessary.
|
|
|
* Keep the `proj1.h5` file under `./full`.
|
|
|
* Do the validation plots.
|
|
|
|
|
|
Points 1-19 are applicable to any dataset. Comparison of CC datasets is, **for now**, only among *exemplary* ones. From here on, we only perform the calculations on these 25 exemplary datasets.
|
|
|
|
|
|
20. Link exemplary to full datasets.
|
|
|
* In `./exemplary`, keep the corresponding signature files from `./full` available.
|
|
|
* The signature files do not need to be copied; they can simply be *linked* with a pointer.
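One way to link rather than copy is a symbolic link, sketched below. This assumes a filesystem that supports symlinks and reads "pointer" as a symlink, which is an interpretation; the helper `link_signature` and the file name `sig1.h5` are hypothetical.

```python
import os
import tempfile


def link_signature(full_dir, exemplary_dir, filename):
    """Expose a file from full_dir inside exemplary_dir through a symlink."""
    os.makedirs(exemplary_dir, exist_ok=True)
    source = os.path.abspath(os.path.join(full_dir, filename))
    target = os.path.join(exemplary_dir, filename)
    # Replace a stale link or file so the call can be repeated safely.
    if os.path.lexists(target):
        os.remove(target)
    os.symlink(source, target)
    return target


# Demonstration in a throwaway directory with a placeholder file.
with tempfile.TemporaryDirectory() as root:
    full_dir = os.path.join(root, "full")
    exemplary_dir = os.path.join(root, "exemplary")
    os.makedirs(full_dir)
    with open(os.path.join(full_dir, "sig1.h5"), "wb") as handle:
        handle.write(b"placeholder")

    link = link_signature(full_dir, exemplary_dir, "sig1.h5")
    is_link = os.path.islink(link)
    with open(link, "rb") as handle:
        same_content = handle.read() == b"placeholder"
```

Linking avoids duplicating large signature files while keeping the `./exemplary` layout self-describing.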
|
... | ... | @@ -310,7 +306,7 @@ Below I sequentially list the steps of the pipeline. This is a linear and qualit |
|
|
24. Prepare for inference.
|
|
|
* Produce correlation matrices and placeholders necessary for inference.
|
|
|
* Save them under `./exemplary`.
|
|
|
25. Infer similarities
|
|
|
25. Infer similarities.
|
|
|
* Add datasets to `sims.h5 (*_prd)` under `./molecules`.
|
|
|
26. Calculate CC scores.
|
|
|
* Popularity
|
... | ... | |