TODO signature type 3

Not every molecule in our universe has a signature available in every bioactivity space. Type 3 signatures are predictions of type 2 signatures that exploit the information available in other spaces. For example, if we want the B2 signature 2 for MOLECULE_X, we can use the type 2 signatures of MOLECULE_X available in the other spaces (from A1 to E5) to infer its signature in B2. Such a prediction is called signature 3.

Steps for calculating type 3 signatures:

GLOBAL steps (when the calculation involves all spaces):

train 600 (25x24) cross predictors (e.g. input: A1.sign2, output: B4.sign2)
- test a dropout layer to reduce overfitting in spaces with few samples (the intersection between input and output spaces can be small)
compose universal signature 2
- one large matrix covering all spaces and all molecules in the universe (filled with NaN where a molecule is missing from a space)
compute subsampling matrix
- idea: log-odds ratio of the observed vs. expected co-occurrence of each pair of spaces (similar to an amino-acid substitution matrix)
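
The global steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the actual implementation. The names `SPACES`, `PREDICTOR_PAIRS`, `compose_universal` and `log_odds_matrix` are hypothetical, as is the toy layout of `sign2` (a dict mapping each space to a dict of molecule vectors). The expected co-occurrence is assumed to come from independent coverage of the two spaces, and the pseudocount is an added assumption to avoid taking the log of zero.

```python
import math
from itertools import permutations

# The 25 bioactivity spaces, A1 to E5.
SPACES = [f"{level}{idx}" for level in "ABCDE" for idx in range(1, 6)]

# One cross predictor per ordered (input, output) pair: 25 * 24 = 600.
PREDICTOR_PAIRS = list(permutations(SPACES, 2))


def compose_universal(sign2, molecules, spaces, dim):
    """Stack per-space signatures side by side, one row per molecule.

    `sign2` maps space -> {molecule: vector of length `dim`} (toy layout).
    Blocks for molecules missing from a space are filled with NaN.
    """
    matrix = []
    for mol in molecules:
        row = []
        for space in spaces:
            row.extend(sign2.get(space, {}).get(mol, [math.nan] * dim))
        matrix.append(row)
    return matrix


def log_odds_matrix(sign2, molecules, spaces, pseudocount=1.0):
    """Log-odds of observed vs. expected co-occurrence for each pair of spaces.

    Expected co-occurrence assumes the two spaces are covered independently.
    The pseudocount (an assumption) keeps never co-occurring pairs finite.
    """
    n = len(molecules)
    covered = {s: {m for m in molecules if m in sign2.get(s, {})} for s in spaces}
    scores = {}
    for a in spaces:
        for b in spaces:
            if a == b:
                continue
            observed = len(covered[a] & covered[b]) + pseudocount
            expected = len(covered[a]) * len(covered[b]) / n + pseudocount
            scores[(a, b)] = math.log(observed / expected)
    return scores
```

A positive score means two spaces share molecules more often than independent coverage would suggest, and a negative score means they rarely overlap, which mirrors how a substitution matrix is read.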

LOCAL steps (where the calculation is limited to one space):

get universal signature 2 (excluding current space) for molecules available in current space
augment data by subsampling
- the strategy is to enable prediction for unseen combinations of spaces (favor lower probabilities from the subsampling matrix)
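
One possible reading of the local augmentation step is sketched below, assuming each universal signature row is a flat list with one block of `dim` values per space and NaN for missing spaces. The functions `subsample_spaces` and `augment`, the `scores` dict keyed by `(space, target)` pairs, and the `exp(-score)` weighting are all hypothetical choices made for illustration. The source only states that lower probabilities should be favored, so the exact weighting is a guess.

```python
import math
import random


def subsample_spaces(available, target, scores, keep, rng):
    """Pick `keep` spaces from `available`, favoring low co-occurrence scores.

    Weighting by exp(-score) is an assumption: spaces that rarely co-occur
    with `target` get a higher chance of being kept.
    """
    pool = list(available)
    chosen = []
    for _ in range(min(keep, len(pool))):
        weights = [math.exp(-scores.get((s, target), 0.0)) for s in pool]
        pick = rng.choices(pool, weights=weights, k=1)[0]
        chosen.append(pick)
        pool.remove(pick)  # sample without replacement
    return chosen


def augment(row, spaces, target, scores, dim, n_copies, rng):
    """Return masked copies of one universal-signature row.

    The target space is always masked, since it is the prediction output.
    Spaces not selected for a copy are overwritten with NaN.
    """
    available = [
        s for i, s in enumerate(spaces)
        if s != target and not math.isnan(row[i * dim])
    ]
    if not available:
        return []
    copies = []
    for _ in range(n_copies):
        keep = rng.randint(1, len(available))
        kept = set(subsample_spaces(available, target, scores, keep, rng))
        masked = []
        for i, s in enumerate(spaces):
            block = row[i * dim:(i + 1) * dim]
            masked.extend(block if s in kept else [math.nan] * dim)
        copies.append(masked)
    return copies
```

Calling `augment(row, spaces, "B2", scores, dim, 5, random.Random(0))` would yield five masked variants of the same molecule, each exposing a different subset of input spaces, so that a model trained on them sees combinations of spaces that do not occur in the original data.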

Edited Feb 19, 2019 by Martino Bertoni