# SQL database

Much of the metadata related to the Chemical Checker (CC) is stored in a PostgreSQL database called `chemical_checker_<yyyy>_<mm>`. Within each section below, tables are listed in **alphabetical order**.
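The database name encodes the release year and month. Here is a minimal sketch of building it in Python. It assumes a four-digit year and a zero-padded two-digit month. `cc_database_name` and `current_database_name` are hypothetical helpers, not part of the CC code base.

```python
from datetime import date
from typing import Optional


def cc_database_name(year: int, month: int) -> str:
    """Build a name following the `chemical_checker_<yyyy>_<mm>` pattern.

    Hypothetical helper: assumes a four-digit year and a zero-padded month.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return f"chemical_checker_{year:04d}_{month:02d}"


def current_database_name(today: Optional[date] = None) -> str:
    """Database name for the month containing `today` (defaults to the current date)."""
    today = today or date.today()
    return cc_database_name(today.year, today.month)


print(cc_database_name(2019, 1))  # chemical_checker_2019_01
```

You could then pass the resulting string as the `dbname` argument of a PostgreSQL driver such as `psycopg2.connect`.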
## Central tables

The most important tables of the CC are:

* `datasets`
    * It contains the [dataset specifications](datasets).
    * Every time a dataset is stored in the CC, this table should be updated.
* `general_properties`
    * Compound properties such as molecular weight or violations of Lipinski's rule of five.
    * :warning: This table is currently called `physchem`.
* `libraries`
    * Abbreviation and description of compound libraries.
    * :warning: This table is currently called `library_description`.
* `pubchem`
    * PubChem CID, name and synonyms of molecules.
* `structure`
    * It contains `inchikey` and `inchi`.
    * This is an ever-growing table: every time the CC encounters a molecule, it should be stored here.
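Because `structure` only grows, storing a molecule should be idempotent. Inserting an InChIKey that is already present must neither fail nor duplicate the row. Here is a minimal sketch that assumes `inchikey` is the primary key of `structure`; the real schema may differ. It uses Python's built-in `sqlite3` as a stand-in so it runs without a PostgreSQL server. The `INSERT ... ON CONFLICT (...) DO NOTHING` clause means the same thing in PostgreSQL 9.5+, although a driver such as psycopg2 uses `%s` placeholders instead of `?`. `store_molecules` is a hypothetical helper.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical minimal schema: assumes `inchikey` is the primary key of `structure`.
SCHEMA = """
CREATE TABLE IF NOT EXISTS structure (
    inchikey TEXT PRIMARY KEY,
    inchi    TEXT NOT NULL
)
"""

# Insert a molecule only if its InChIKey is not stored yet.
# PostgreSQL (9.5+) accepts the same ON CONFLICT clause; SQLite needs 3.24+.
INSERT_NEW = """
INSERT INTO structure (inchikey, inchi) VALUES (?, ?)
ON CONFLICT (inchikey) DO NOTHING
"""


def store_molecules(conn: sqlite3.Connection,
                    molecules: Iterable[Tuple[str, str]]) -> int:
    """Store (inchikey, inchi) pairs not yet in `structure`; return how many were new."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(INSERT_NEW, molecules)
    # Rows skipped by DO NOTHING are not counted as changes.
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
aspirin = ("BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
           "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)")
print(store_molecules(conn, [aspirin]))  # 1
print(store_molecules(conn, [aspirin]))  # 0: already stored
```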
## Analysis tables

These tables store analysis results, typically related to exemplary datasets.

* `coordinate_clust_paired_conditionals`
    * Conditional probabilities between clusters in different datasets.
* `coordinate_conditionals`
    * Conditional probabilities between similar molecules in different datasets.
* `coordinate_correlation`
    * Canonical correlation analysis between datasets.
* `coordinate_paired_conditionals`
    * Paired conditional probabilities between similar molecules in different datasets.
* `coordinate_ranks`
    * Coincidence of ranks between pairs of similar molecules in different datasets.
* `scores`
    * Popularity, singularity and mappability scores.
* `similarity_inference_performance`
    * Performance analysis of the similarity inference.
    * :warning: This table is currently called `mapper_performance`.
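The description of `coordinate_clust_paired_conditionals` does not spell out the computation. The following sketch illustrates one plausible reading, offered as an assumption rather than the CC's actual method. For molecules present in two datasets, the probability of falling in cluster *b* of dataset B, given cluster *a* of dataset A, is estimated as n(a, b) / n(a). That is the fraction of cluster-*a* molecules that land in cluster *b*. `cluster_conditionals` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Hashable, Tuple


def cluster_conditionals(
    clusters_a: Dict[str, Hashable],
    clusters_b: Dict[str, Hashable],
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Estimate P(cluster b in dataset B | cluster a in dataset A).

    Both arguments map a molecule identifier (e.g. an InChIKey) to its
    cluster label in that dataset. Only molecules present in both are counted.
    """
    shared = clusters_a.keys() & clusters_b.keys()
    pair_counts = Counter((clusters_a[m], clusters_b[m]) for m in shared)
    a_counts = Counter(clusters_a[m] for m in shared)
    return {(a, b): n / a_counts[a] for (a, b), n in pair_counts.items()}


a = {"m1": 0, "m2": 0, "m3": 1, "m4": 1}
b = {"m1": "x", "m2": "y", "m3": "x", "m4": "x", "m5": "y"}  # m5 is ignored
print(sorted(cluster_conditionals(a, b).items()))
# [((0, 'x'), 0.5), ((0, 'y'), 0.5), ((1, 'x'), 1.0)]
```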
## Auxiliary tables for the production phase

The pipeline requires some auxiliary tables.

* `data`
    * Mainly used to guide the downloads.
    * Whether a dataset needs to be downloaded again, whether it is frozen, etc.
    * :thinking: I am not sure this table is necessary.
* `sims_versions`
    * This is especially relevant to the full similarity vectors.
    * It tells, for each molecule and coordinate, when `sims1` was calculated and stored in the `./molecules/` directories.
    * :warning: This table is currently called `distance_versions`.
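The pipeline might use `sims_versions` to decide which `sims1` vectors to recompute. The sketch below assumes the table has been read into a mapping from `(inchikey, coordinate)` to the date `sims1` was computed. It also assumes each coordinate has a known last-update date. Both inputs and the helper `stale_sims` are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical key: (inchikey, coordinate).
Key = Tuple[str, str]


def stale_sims(
    sims_versions: Dict[Key, date],
    coordinate_updated: Dict[str, date],
) -> List[Key]:
    """Return (inchikey, coordinate) pairs whose stored `sims1` predates the
    latest update of that coordinate. Pairs for unknown coordinates are skipped."""
    return sorted(
        key
        for key, computed in sims_versions.items()
        if key[1] in coordinate_updated and computed < coordinate_updated[key[1]]
    )


versions = {
    ("MOL1", "A1"): date(2019, 1, 1),
    ("MOL2", "A1"): date(2020, 1, 1),
    ("MOL1", "B4"): date(2019, 6, 1),
}
updated = {"A1": date(2019, 6, 1), "B4": date(2019, 6, 1)}
print(stale_sims(versions, updated))  # [('MOL1', 'A1')]
```

Pairs that were never computed do not appear in the mapping at all, so they would have to be found separately.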
## Auxiliary tables for the CC web app

The [CC web app](http://chemicalchecker.org) requires some auxiliary tables.

* `coordinate_proj_lims`
    * Number of molecules and `(xlim, ylim)` of the projections.
* `library_compounds`
    * Membership of compounds in exemplary libraries.
    * :warning: This table is currently called `libraries`.
* `lookup`
    * Fast database lookup.
* `projections`
    * `(x, y)` coordinates of the 2D projections.
* `showtargets`
    * Targets to show in the molecule cards.
* `showtargets_description`
    * Annotations of protein targets.
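`coordinate_proj_lims` summarizes the points stored in `projections`. The sketch below computes that summary, assuming the limits are the minimum and maximum of each axis. This section does not say whether the CC pads them, so `pad` is a hypothetical option and `projection_limits` a hypothetical name.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Limits = Tuple[float, float]


def _axis_limits(values: List[float], pad: float) -> Limits:
    """(min, max) of one axis, widened on both sides by `pad` times its range."""
    lo, hi = min(values), max(values)
    extra = (hi - lo) * pad
    return (lo - extra, hi + extra)


def projection_limits(
    points: Iterable[Point], pad: float = 0.0
) -> Tuple[int, Limits, Limits]:
    """Return (n_molecules, xlim, ylim) for the 2D projection points of one coordinate."""
    xs: List[float] = []
    ys: List[float] = []
    for x, y in points:
        xs.append(x)
        ys.append(y)
    if not xs:
        raise ValueError("no projection points given")
    return len(xs), _axis_limits(xs, pad), _axis_limits(ys, pad)


pts = [(0.0, 1.0), (2.0, -1.0), (1.0, 3.0)]
print(projection_limits(pts))  # (3, (0.0, 2.0), (-1.0, 3.0))
```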