Preprocess scripts
A directory structure is created for the preprocess/signature0 step: one directory per dataset, each containing a run.py file and possibly other files.
@mbertoni There is a final step in all preprocess scripts. This final step has three tasks, and its inputs are two maps (an inchikey-to-inchi map and an inchikey-to-raw map):
- Find which inchikeys in the map are not in the structure table and add them to the table
- For each of the missing inchikeys, draw its image and store it in IN/CH/INCHIKEY/
- Store the raws in sign0.h5
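A minimal sketch of the first two tasks, assuming that IN/CH/INCHIKEY/ means "first two characters / next two characters / full inchikey" (the note does not spell this out). The function names, the `molecules` root directory and the toy keys are hypothetical; the actual drawing and the write to sign0.h5 are left out.

```python
import os


def missing_inchikeys(inchikey_inchi, known_inchikeys):
    """Return the inchikeys in the map that are absent from the structure table."""
    return sorted(set(inchikey_inchi) - set(known_inchikeys))


def image_dir(root, inchikey):
    """Build the IN/CH/INCHIKEY/ directory for one molecule (assumed layout)."""
    return os.path.join(root, inchikey[:2], inchikey[2:4], inchikey)


# Toy data: made-up keys and placeholder InChI strings, not real molecules.
inchikey_inchi = {
    "AAAAAAAAAAAAAA-BBBBBBBBBB-N": "InChI=placeholder-1",
    "ZZYYXXWWVVUUTT-SSRRQQPPOO-N": "InChI=placeholder-2",
}
# Stand-in for the inchikeys already present in the structure table.
known = {"AAAAAAAAAAAAAA-BBBBBBBBBB-N"}

missing = missing_inchikeys(inchikey_inchi, known)
paths = [image_dir("molecules", key) for key in missing]
```

Keeping these as plain functions would make them easy to move into whatever shared module ends up holding the common methods.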
The first task requires a fairly complex DB query, so we should decide whether to use our ORM framework or a raw psycopg query. We also need to find a place to put common methods such as drawing.
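To make the query discussion concrete, here is one possible shape for the "which inchikeys are missing" lookup, written as raw SQL. This is only a sketch: it uses the standard-library sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere, and the `structure` table layout, the `candidate` temp table and the `find_missing` name are all assumptions. With psycopg the placeholders would be `%s` instead of `?`.

```python
import sqlite3


def find_missing(conn, inchikeys):
    """Return the candidate inchikeys that have no row in the structure table."""
    cur = conn.cursor()
    # Load the candidates into a temp table so the DB can do an anti-join
    # instead of receiving one very long IN (...) list.
    cur.execute("CREATE TEMP TABLE candidate (inchikey TEXT PRIMARY KEY)")
    cur.executemany(
        "INSERT INTO candidate VALUES (?)",
        [(key,) for key in sorted(set(inchikeys))],
    )
    cur.execute(
        "SELECT c.inchikey FROM candidate c "
        "LEFT JOIN structure s ON s.inchikey = c.inchikey "
        "WHERE s.inchikey IS NULL ORDER BY c.inchikey"
    )
    missing = [row[0] for row in cur.fetchall()]
    cur.execute("DROP TABLE candidate")
    return missing


# In-memory database with a hypothetical structure table and one known key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE structure (inchikey TEXT PRIMARY KEY, inchi TEXT)")
conn.execute(
    "INSERT INTO structure VALUES (?, ?)",
    ("AAAAAAAAAAAAAA-BBBBBBBBBB-N", "InChI=placeholder-1"),
)

result = find_missing(
    conn,
    ["AAAAAAAAAAAAAA-BBBBBBBBBB-N", "ZZYYXXWWVVUUTT-SSRRQQPPOO-N"],
)
```

An ORM version would express the same anti-join through model queries; the raw form is shown only because it is the easier one to reason about when comparing the two options.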