Raws for the chemical layer
The `outmolecules/outmolecules.py` script should help you get an idea. The actual "fingerprinters" (i.e. the functions that create the raw data) are placed in `chemutils/raws.py`.
Remember:
- Take the union set of InChIKeys from all the previous steps (tables).
- The `fp2d`, `fp3d`, `scaffolds`, `subskeys` and `physchem` tables are persistent, so we don't have to recalculate them every time. Therefore, only do the calculations for new molecules. `fp3d` and `physchem` are the slowest. Especially in the case of `fp3d`, calculations may fail due to the complexity of some molecules. In such a case, it could be worth storing the InChIKey in the `fp3d` raws table with a `NULL` value, so that we don't attempt the calculation again every time we update the CC.
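The bookkeeping described above (union of InChIKeys, compute only for new molecules, remember failures as `NULL`) could be sketched as follows. This is a minimal illustration, not the CC implementation: it assumes a hypothetical SQLite table per raws type with columns `inchikey` and `raw`, and the helper names `union_inchikeys` and `update_raws` are made up for this example. The real storage backend and the fingerprinters in `chemutils/raws.py` may look quite different.

```python
import sqlite3
from typing import Callable, Iterable, Optional


def union_inchikeys(tables: Iterable[Iterable[str]]) -> set:
    """Union set of InChIKeys over all previous steps (hypothetical helper)."""
    keys = set()
    for table in tables:
        keys.update(table)
    return keys


def update_raws(
    conn: sqlite3.Connection,
    table: str,
    inchikeys: Iterable[str],
    fingerprinter: Callable[[str], Optional[str]],
) -> int:
    """Compute raws only for InChIKeys not yet stored in `table` (hypothetical helper).

    Assumed schema: one row per molecule; raw is NULL if the calculation failed.
    `table` is interpolated into the SQL, so it must be a trusted name such as "fp3d".
    Returns the number of newly processed molecules.
    """
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (inchikey TEXT PRIMARY KEY, raw TEXT)"
    )
    # Molecules already present, including failed ones stored as NULL, are skipped.
    done = {row[0] for row in conn.execute(f"SELECT inchikey FROM {table}")}
    new_keys = sorted(set(inchikeys) - done)
    for key in new_keys:
        try:
            raw = fingerprinter(key)
        except Exception:
            # E.g. a 3D calculation failed: store NULL so it is not retried on every update.
            raw = None
        conn.execute(
            f"INSERT INTO {table} (inchikey, raw) VALUES (?, ?)", (key, raw)
        )
    conn.commit()
    return len(new_keys)
```

Calling `update_raws` a second time with the same InChIKeys processes nothing, because both successful and failed molecules already have a row; only molecules added to the union since the last update reach the (slow) fingerprinter.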