SQL database
Much of the metadata related to the CC is stored in a PostgreSQL database called chemical_checker_<yyyy>_<mm>. Within each group below, tables are listed in alphabetical order.
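As a minimal sketch of the naming convention above, the database name can be built from a year and a month. The helper name cc_database_name is hypothetical, and the zero-padded four-digit year and two-digit month are an assumption read off the <yyyy>_<mm> pattern.

```python
def cc_database_name(year: int, month: int) -> str:
    """Build a database name following the chemical_checker_<yyyy>_<mm> pattern.

    Hypothetical helper: zero-padding of year and month is assumed from the
    <yyyy>_<mm> placeholder, not confirmed by the CC code base.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return f"chemical_checker_{year:04d}_{month:02d}"


# Illustrative year and month only; they do not refer to an actual CC release.
example_name = cc_database_name(2019, 1)
print(example_name)  # chemical_checker_2019_01
```

The resulting string could then be passed as the database name to whichever PostgreSQL client is in use.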
Central tables
The most important tables of the CC are:
datasets
- It contains dataset specifications.
- Every time a dataset is stored in the CC, this table should be updated.
general_properties
- Compound properties such as molecular weight or Lipinski's rule of 5 violations.
- ⚠ This table is currently called physchem.
libraries
- Abbreviation and description of compound libraries.
- ⚠ This table is currently called library_description.
pubchem
- CID, name and synonyms of molecules.
structure
- It contains inchikey and inchi.
- This is an ever-growing table. Every time a molecule is seen by the CC, it should be stored here.
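The "store every molecule the first time it is seen" rule for the structure table can be sketched as an insert that ignores duplicates. This is only an illustration under stated assumptions: it uses an in-memory SQLite database as a stand-in for PostgreSQL, the column types and the primary key on inchikey are assumed, and register_molecule is a hypothetical helper name.

```python
import sqlite3


def register_molecule(conn, inchikey, inchi):
    """Store a molecule unless its InChIKey is already in the structure table.

    ON CONFLICT ... DO NOTHING is accepted by both PostgreSQL and
    SQLite (3.24 or newer). With a PostgreSQL driver such as psycopg2,
    the placeholders would be %s instead of ?.
    """
    conn.execute(
        "INSERT INTO structure (inchikey, inchi) VALUES (?, ?) "
        "ON CONFLICT (inchikey) DO NOTHING",
        (inchikey, inchi),
    )


# Assumed schema: only the two columns named in the text, keyed by inchikey.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE structure (inchikey TEXT PRIMARY KEY, inchi TEXT NOT NULL)"
)

water = ("XLYOFNOQVPJJNP-UHFFFAOYSA-N", "InChI=1S/H2O/h1H2")
ethanol = ("LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")

register_molecule(conn, *water)
register_molecule(conn, *water)  # seen again: no duplicate row is created
register_molecule(conn, *ethanol)

n_rows = conn.execute("SELECT COUNT(*) FROM structure").fetchone()[0]
print(n_rows)  # 2
```

Making the insert idempotent means callers do not need to check whether a molecule is already registered before storing it.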
Analysis tables
These tables store analysis results, typically related to exemplary datasets.
coordinate_clust_paired_conditionals
- Conditional probabilities between clusters in different datasets.
coordinate_conditionals
- Conditional probabilities between similar molecules in different datasets.
coordinate_correlation
- Canonical correlation analysis between datasets.
coordinate_paired_conditionals
- Paired conditional probabilities between similar molecules in different datasets.
coordinate_ranks
- Coincidence of ranks between pairs of similar molecules in different datasets.
scores
- Popularity, singularity and mappability scores.
similarity_inference_performance
- Performance analysis of the similarity inference.
- ⚠ This table is currently called mapper_performance.
Auxiliary tables for the production phase
The pipeline requires some auxiliary tables.
data
- Mainly used to guide the downloads.
- Whether it is necessary to download again, whether it is a frozen dataset, etc.
- 🤔 I am not sure this table is necessary.
sims_versions
- This is especially relevant to the full similarity vectors.
- It tells, for each molecule and coordinate, when the sims1 was calculated and stored in the ./molecules/ directories.
- ⚠ This table is currently called distance_versions.
Auxiliary tables for the CC web app
The CC web app requires some auxiliary tables.
coordinate_proj_lims
- Contains the number of molecules and the (xlim, ylim) of the projections.
library_compounds
- Membership of compounds in exemplary libraries.
- ⚠ This table is currently called libraries.
lookup
- Fast database lookup.
projections
- (x,y) coordinates of the 2D projections.
showtargets
- Targets to show in the molecule cards.
showtargets_description
- Annotations of protein targets.
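Several tables on this page carry a ⚠ note saying they are currently stored under a different name. Assuming the names used on this page are the intended ones, the renames could be generated as sketched below. The order matters: the current libraries table has to become library_compounds before library_description can take over the name libraries. The ordered_renames helper is hypothetical, and the sketch assumes that no other existing table already holds one of the target names.

```python
# Current name -> name used on this page, taken from the ⚠ notes.
RENAMES = {
    "physchem": "general_properties",
    "library_description": "libraries",
    "mapper_performance": "similarity_inference_performance",
    "distance_versions": "sims_versions",
    "libraries": "library_compounds",
}


def ordered_renames(renames):
    """Order renames so that a target name is free before it is reused."""
    pending = dict(renames)
    ordered = []
    while pending:
        # A rename is safe once no pending table still holds its target name.
        ready = [old for old, new in pending.items() if new not in pending]
        if not ready:
            raise ValueError(
                f"cyclic renames need a temporary name: {sorted(pending)}"
            )
        for old in sorted(ready):
            ordered.append((old, pending.pop(old)))
    return ordered


statements = [
    f"ALTER TABLE {old} RENAME TO {new};"
    for old, new in ordered_renames(RENAMES)
]
for statement in statements:
    print(statement)
```

Running the statements inside a single transaction would keep the database consistent if one of the renames fails.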