Miquel Duran-Frigola · 67fca240
--- a/dataset-correlation.md
+++ b/dataset-correlation.md
 # Correlation between datasets

-With molecules expressed as [signatures](signaturization), it becomes straightforward to assess the similarity principle, that is, it is easy to check whether similar molecules in a particular dataset are still similar in another dataset.
+With molecules expressed as [signatures](signaturization), it is easy to apply the similarity principle.

 As of today, we measure the following types of correlation (only between the 25 exemplary datasets):

@@ -8,9 +8,14 @@ As of today, we measure the following types of correlation (only between the 25
 * Shared pairs of similar molecules.
 * Coincidence of similarity ranks.

-These correlations are used not only for analysis, but more importantly, to *predict* unseen similarities. If two molecules are similar in a certain space, they are likely to be similar in other highly correlated spaces, too.
+These correlations greatly help analysis (here we show a consensus on exemplary datasets):

-This approach yields predictors that are, hopefully, of sufficient quality for the [CC web app](http://chemicalchecker.org). However, I do think we have to be more ambitious and aim at more proficient classifiers, potentially using deep learning (e.g. siamese networks).
+![consensus](/uploads/c1ac016c0c7436755c1709ec2ca3afeb/consensus.png)
+
+Such correlations, together with conditional probabilities, are used internally (in a very simple manner) by the [CC web app](http://chemicalchecker.org).
+
+## Signatures Type 3
+
+Signatures Type 3 are an attempt to predict, for *any* given molecule (with *any* given information available for it), the signature corresponding to a certain data type.

-This will be a crucial achievement for the development of the CC as it will yield signatures of type 3, which are currently lacking.