Pipeline web checker
This issue needs to be discussed with @mduran
We need to create a pipeline that generates the data the checker's website needs. Most of this data lives in a database, but there are also other elements such as JSON files and images. Since this pipeline is new, we need to create the database from scratch (define tables, etc.) and also define, for future updates, how we'll fill each new DB release from the previous one.
Let's start with the tables we currently use and see how each can be adapted for the new DB.
Current tables:
- Libraries
- Library_description
I think these two tables are fine and should be kept as they are. The only question is whether they will change in the future and, if so, how.
- Pubchem
This table will stay the same, but it will contain only the 1M molecules that will be part of the website. How do we decide which 1M go in? Could this number change in the future?
- Structure
This table should no longer be needed, since the InChI data is not displayed on the website.
- Physchem
- Scores
These two tables contain per-molecule information. We could create a new table that merges them, keeping only the fields that are needed.
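To make the merge idea concrete, here is a minimal sketch. It uses SQLite only as a stand-in for whatever engine the web DB runs on, and the join key `inchikey`, the target table name `molecule_info`, and the column names (`mw`, `logp`, `popularity`) are all hypothetical placeholders, not the real schema.

```python
import sqlite3


def build_molecule_info(conn, physchem_cols, score_cols, key="inchikey"):
    """Create a merged table holding only the selected columns.

    Assumes (hypothetically) that both source tables share the `key` column.
    Column names are interpolated into the SQL, so they must come from
    trusted configuration, never from user input.
    """
    select_list = [f"p.{key} AS {key}"]
    select_list += [f"p.{c} AS {c}" for c in physchem_cols]
    select_list += [f"s.{c} AS {c}" for c in score_cols]
    conn.execute("DROP TABLE IF EXISTS molecule_info")
    conn.execute(
        "CREATE TABLE molecule_info AS "
        f"SELECT {', '.join(select_list)} "
        f"FROM physchem p JOIN scores s ON p.{key} = s.{key}"
    )
    conn.commit()


# Tiny in-memory demo with made-up columns and values.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE physchem (inchikey TEXT PRIMARY KEY, mw REAL, logp REAL, unused REAL)")
demo.execute("CREATE TABLE scores (inchikey TEXT PRIMARY KEY, popularity REAL, unused2 REAL)")
demo.execute("INSERT INTO physchem VALUES ('KEY1', 180.2, 1.2, 0.0)")
demo.execute("INSERT INTO scores VALUES ('KEY1', 0.8, 0.0)")
build_molecule_info(demo, ["mw", "logp"], ["popularity"])
merged_rows = demo.execute("SELECT * FROM molecule_info").fetchall()
```

The inner join drops molecules that are missing from either table. If that can happen in practice, a left join from the table that defines the molecule set would be the safer choice.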
- Showtargets
- Lookup
- Coordinates
We can leave these as they are.
- Projections
- Coordinate_stats
These two tables contain the information needed to display the points in the CC plots. Where does this information come from? The proj1.h5 files?
Other elements
- Molecule image files
- JSON files with precalculated similarities
Once the new DB is ready, we need to decide what will happen in future updates. The basic question is: can we add new molecules and, if so, how?
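One possible shape for the update step, as a sketch for discussion only: carry over the molecules from the previous release, then insert the new ones while skipping duplicates. SQLite again stands in for the real engine, and the `pubchem` columns (`inchikey`, `name`) and the function name `fill_release` are hypothetical.

```python
import sqlite3

# Hypothetical minimal schema; the real Pubchem table has other columns.
PUBCHEM_SCHEMA = "CREATE TABLE IF NOT EXISTS pubchem (inchikey TEXT PRIMARY KEY, name TEXT)"
INSERT_SQL = "INSERT OR IGNORE INTO pubchem (inchikey, name) VALUES (?, ?)"


def fill_release(old_conn, new_conn, new_molecules):
    """Fill the new release from the old one, then add new molecules.

    Returns the number of molecules that were not in the old release.
    """
    new_conn.execute(PUBCHEM_SCHEMA)
    # Step 1: carry over everything from the previous release.
    old_rows = old_conn.execute("SELECT inchikey, name FROM pubchem").fetchall()
    new_conn.executemany(INSERT_SQL, old_rows)
    before = new_conn.execute("SELECT COUNT(*) FROM pubchem").fetchone()[0]
    # Step 2: add candidates; rows whose key already exists are ignored.
    new_conn.executemany(INSERT_SQL, new_molecules)
    after = new_conn.execute("SELECT COUNT(*) FROM pubchem").fetchone()[0]
    new_conn.commit()
    return after - before
```

This only covers the database rows. Molecule images and the precalculated similarity JSON files would also have to be generated for each added molecule, which is part of what still needs to be decided.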
Explore page
Could we already use the signature type3 or the neigh1 signatures to get the information needed for the explore page?