Miquel Duran-Frigola · 06e2c872
--- a/database.md
+++ b/database.md
-# Database
+# SQL database

-XXX
\ No newline at end of file
+Much of the metadata related to the CC is stored in a PostgreSQL database called `chemical_checker_<yyyy>_<mm>`. Within each section below, tables are listed in **alphabetical order**.
+
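As an illustration of the naming pattern above, the database name could be derived from a date along these lines. This is only a sketch: the helper `cc_database_name` is hypothetical, and it assumes that `<yyyy>` is a four-digit year and `<mm>` a zero-padded two-digit month, which the text does not state explicitly.

```python
from datetime import date


def cc_database_name(release: date) -> str:
    """Build a name following the chemical_checker_<yyyy>_<mm> pattern.

    Assumption: <mm> is zero-padded to two digits.
    """
    return f"chemical_checker_{release.year:04d}_{release.month:02d}"


if __name__ == "__main__":
    # Hypothetical date, used only to show the resulting format.
    print(cc_database_name(date(2018, 4, 1)))
```

If the month turns out not to be zero-padded in practice, only the format specifier needs to change.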
+## Central tables
+
+The most important tables of the CC are:
+
+* `datasets`
+ * It contains [dataset specifications](datasets).
+ * Every time a dataset is stored in the CC, this table should be updated.
+* `general_properties`
+ * Compound properties such as molecular weight or Lipinski's rule of 5 violations.
+ * :warning: This table is currently called `physchem`.
+* `libraries`
+ * Abbreviation and description of compound libraries.
+ * :warning: This table is currently called `library_description`.
+* `pubchem`
+ * CID, name and synonyms of molecules.
+* `structure`
+ * It contains `inchikey` and `inchi`.
+ * This is an ever-growing table. Every time a molecule is seen by the CC, it should be stored here.
+
+## Analysis tables
+
+These tables hold analysis results, typically related to exemplary datasets.
+
+* `coordinate_clust_paired_conditionals`
+ * Conditional probabilities between clusters in different datasets.
+* `coordinate_conditionals`
+ * Conditional probabilities between similar molecules in different datasets.
+* `coordinate_correlation`
+ * Canonical correlation analysis between datasets.
+* `coordinate_paired_conditionals`
+ * Paired conditional probabilities between similar molecules in different datasets.
+* `coordinate_ranks`
+ * Coincidence of ranks between pairs of similar molecules in different datasets.
+* `scores`
+ * Popularity, singularity and mappability scores.
+* `similarity_inference_performance`
+ * Performance analysis of the similarity inference.
+ * :warning: This table is currently called `mapper_performance`.
+
+## Auxiliary tables for the production phase
+
+The pipeline requires some auxiliary tables.
+
+* `data`
+ * Mainly used to guide the downloads.
+ * It records whether a dataset needs to be downloaded again, whether it is frozen, etc.
+ * :thinking: I am not sure this table is necessary.
+* `sims_versions`
+ * This is especially relevant to the full similarity vectors.
+ * It tells, for each molecule and coordinate, when `sims1` was calculated and stored in the `./molecules/` directories.
+ * :warning: This table is currently called `distance_versions`.
+
+## Auxiliary tables for the CC web app
+
+The [CC web app](http://chemicalchecker.org) requires some auxiliary tables.
+
+* `coordinate_proj_lims`
+ * Contains the number of molecules, and `(xlim, ylim)` of the projections.
+* `library_compounds`
+ * Membership of compounds in exemplary libraries.
+ * :warning: This table is currently called `libraries`.
+* `lookup`
+ * Fast database lookup.
+* `projections`
+ * `(x,y)` coordinates of the 2D projections.
+* `showtargets`
+ * Targets to show in the molecule cards.
+* `showtargets_description`
+ * Annotations of protein targets.
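
The :warning: notes above imply a set of pending renames, and two of them interact: `libraries` is the current name of the future `library_compounds`, while `library_description` is to become `libraries`. The sketch below orders the renames so that no table is renamed to a name that is still in use. It assumes the renames would be applied with PostgreSQL `ALTER TABLE ... RENAME TO` statements, which this document does not specify; the function `rename_statements` is hypothetical.

```python
# Planned name -> current name, taken from the warnings in this document.
PLANNED_TO_CURRENT = {
    "general_properties": "physchem",
    "libraries": "library_description",
    "similarity_inference_performance": "mapper_performance",
    "sims_versions": "distance_versions",
    "library_compounds": "libraries",
}


def rename_statements(planned_to_current):
    """Return ALTER TABLE statements ordered so that no rename
    targets a name still held by a table waiting to be renamed."""
    # Invert the mapping: current name -> planned name.
    pending = {cur: new for new, cur in planned_to_current.items()}
    statements = []
    while pending:
        # A rename is safe once its target is not the current name
        # of another table that is still pending.
        safe = sorted(cur for cur, new in pending.items() if new not in pending)
        if not safe:
            raise ValueError("cyclic renames; a temporary name is needed")
        for cur in safe:
            statements.append(f"ALTER TABLE {cur} RENAME TO {pending.pop(cur)};")
    return statements


if __name__ == "__main__":
    for stmt in rename_statements(PLANNED_TO_CURRENT):
        print(stmt)
```

With the mapping above, `libraries` is renamed to `library_compounds` before `library_description` takes over the name `libraries`.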