Include network embeddings
We have to integrate the network embeddings produced by @mbertoni using node2vec. These embeddings should be stored as netemb.h5. The format of netemb.h5 files should be the same as that of sig.h5, i.e. a sorted keys vector, a V matrix (np.float32) whose rows are sorted according to keys, a date, etc.
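As a rough illustration of the layout described above, here is a minimal sketch using h5py. The function names write_netemb and read_netemb are hypothetical, and storing the date as a one-element string dataset is an assumption; the actual sig.h5 layout should be taken as the reference.

```python
import datetime

import h5py
import numpy as np


def write_netemb(path, keys, V):
    """Write embeddings to an HDF5 file with sorted keys and row-aligned V.

    Hypothetical helper: dataset names follow the issue text ("keys", "V",
    "date"); how sig.h5 actually stores the date is an assumption here.
    """
    keys = np.asarray(keys, dtype="S")        # fixed-length byte strings
    V = np.asarray(V, dtype=np.float32)       # embeddings as float32
    if V.ndim != 2 or V.shape[0] != keys.shape[0]:
        raise ValueError("V must be 2-D with one row per key")
    order = np.argsort(keys)                  # sort keys, reorder rows to match
    today = datetime.date.today().isoformat()
    with h5py.File(path, "w") as f:
        f.create_dataset("keys", data=keys[order])
        f.create_dataset("V", data=V[order])
        f.create_dataset("date", data=np.array([today], dtype="S"))


def read_netemb(path):
    """Return (keys, V) from a file written by write_netemb."""
    with h5py.File(path, "r") as f:
        return f["keys"][:], f["V"][:]
```

Sorting the keys once at write time keeps the file directly comparable with other files that follow the same convention, since rows can be matched by position after a key lookup.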
Present
- As of today, _similarity.py produces similarity vectors stored in /aloy/web_checker/molecules/<IN>/<CH>/INCHIKEY/<similars.h5 or sig.h5>.
- These vectors are currently read by @mbertoni and stored as a network.
- Then, node2vec is run to produce embeddings.
Future
- The network should be, in principle, loadable from the output of faiss.
- I would recommend that we directly implement this option.
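To make the faiss option more concrete, here is a minimal sketch of turning nearest-neighbour search results into a weighted edge list. It assumes the (D, I) pair returned by a faiss index.search(x, k) call, where the query vectors are the indexed vectors themselves, in the same order as keys. The function name knn_to_edges is hypothetical, and the input format expected by the node2vec step is not specified in this issue.

```python
def knn_to_edges(keys, distances, neighbors):
    """Convert k-NN search results into (source, target, weight) edges.

    distances and neighbors are row-aligned (n, k) arrays or nested lists,
    as in the (D, I) output of a faiss search. Assumes row i of the query
    set corresponds to keys[i] (hypothetical convention).
    """
    edges = []
    for i, (d_row, n_row) in enumerate(zip(distances, neighbors)):
        for d, j in zip(d_row, n_row):
            j = int(j)
            # faiss pads with -1 when fewer than k neighbours are found;
            # self-hits are dropped because they carry no network information.
            if j < 0 or j == i:
                continue
            edges.append((keys[i], keys[j], float(d)))
    return edges
```

Whether the raw distance is a suitable edge weight, or whether it should first be converted to a similarity, depends on the metric of the index and would need to be decided when the option is implemented.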
As for the MoA / ATC validations, once the code is integrated within the resource, I will add the corresponding functions.