TODO signature type 3
Not every molecule in our universe has a signature available in every bioactivity space. Type 3 signatures are predictions of type 2 signatures that exploit the information available in other spaces. For example, if we want the B2 signature 2 of MOLECULE_X, we can use the signatures 2 available for MOLECULE_X in the other spaces (A1 to E5, excluding B2) to infer its signature in B2. Such a prediction is called a signature 3.
Steps for calculating type 3 signatures:
GLOBAL steps (when the calculation involves all spaces):
- train 600 (25x24) cross predictors (e.g. input: A1.sign2, output: B4.sign2)
  - test a dropout layer to reduce overfitting in spaces with few samples (the intersection between input and output spaces can be small)
- compose universal signature 2
  - huge matrix for all spaces and for all molecules in the universe (adding NaN where molecules are missing)
- compute subsampling matrix
  - idea: log-odds ratio of observed vs. expected co-occurrence for each pair of spaces (e.g. like an amino-acid substitution matrix)
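The GLOBAL steps above can be sketched on toy data as follows. This is only an illustration under stated assumptions, not the actual implementation: the function names (`compose_universal`, `coverage`, `log_odds_matrix`), the dictionary input layout, the pseudocount and the use of log base 2 are all hypothetical choices made for the example.

```python
from itertools import permutations

import numpy as np

# 25 spaces (A1..E5) give 25 x 24 = 600 ordered (input, output) pairs.
ALL_SPACES = [letter + digit for letter in "ABCDE" for digit in "12345"]
CROSS_PAIRS = list(permutations(ALL_SPACES, 2))


def compose_universal(sign2_by_space, universe, dim):
    """Stack per-space signatures side by side; missing molecules stay NaN."""
    spaces = sorted(sign2_by_space)
    row_of = {mol: i for i, mol in enumerate(universe)}
    universal = np.full((len(universe), len(spaces) * dim), np.nan)
    for s, space in enumerate(spaces):
        for mol, vec in sign2_by_space[space].items():
            universal[row_of[mol], s * dim:(s + 1) * dim] = vec
    return spaces, universal


def coverage(universal, n_spaces, dim):
    """Boolean (molecules x spaces) matrix: True where a signature exists."""
    blocks = universal.reshape(universal.shape[0], n_spaces, dim)
    return ~np.isnan(blocks).any(axis=2)


def log_odds_matrix(present, pseudocount=1.0):
    """log2(observed / expected) co-occurrence for every pair of spaces."""
    n_mols = present.shape[0]
    p = present.astype(float)
    observed = p.T @ p + pseudocount      # molecules present in both spaces
    freq = p.sum(axis=0)                  # molecules present in each space
    # Expected count if presence in the two spaces were independent.
    expected = np.outer(freq, freq) / n_mols + pseudocount
    return np.log2(observed / expected)


# Toy data (assumed): three spaces, four molecules, 2-dimensional signatures.
sign2 = {
    "A1": {"mol1": [0.1, 0.2], "mol2": [0.3, 0.4], "mol3": [0.5, 0.6]},
    "B2": {"mol1": [1.0, 1.1], "mol2": [1.2, 1.3]},
    "C3": {"mol4": [2.0, 2.1]},
}
universe = ["mol1", "mol2", "mol3", "mol4"]
spaces, universal = compose_universal(sign2, universe, dim=2)
present = coverage(universal, len(spaces), dim=2)
log_odds = log_odds_matrix(present)
```

In this toy example A1 and B2 share two molecules, so their entry is positive, while A1 and C3 never co-occur, so their entry is negative. The diagonal compares each space with itself and is not meaningful for subsampling.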
LOCAL steps (where the calculation is limited to one space):
- get universal signature 2 (excluding current space) for molecules available in current space
- augment data by subsampling
  - the strategy is to enable prediction for unseen combinations of spaces (favor lower probabilities from the subsampling matrix)
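One possible reading of the augmentation step is sketched below. It assumes that "favor lower probabilities" means that spaces which rarely co-occur with the current space (low log-odds) are kept more often when masking, so that the predictor also sees unusual combinations. The helper names (`keep_probabilities`, `subsample_spaces`), the softmax weighting and the `temperature` parameter are hypothetical and not taken from the notes.

```python
import numpy as np


def keep_probabilities(log_odds_row, temperature=1.0):
    """Softmax over negated log-odds: rarer co-occurrence, higher weight."""
    w = np.exp(-np.asarray(log_odds_row, dtype=float) / temperature)
    return w / w.sum()


def subsample_spaces(x, dim, probs, rng):
    """Keep a random subset of the available spaces; set the rest to NaN.

    x is one molecule's universal signature 2 with the current space
    already excluded; probs holds one weight per remaining space.
    """
    blocks = x.reshape(-1, dim)
    available = np.flatnonzero(~np.isnan(blocks).any(axis=1))
    if available.size == 0:
        return x.copy()
    # Draw how many spaces to keep, then which ones (weighted, no repeats).
    n_keep = rng.integers(1, available.size + 1)
    p = probs[available] / probs[available].sum()
    kept = rng.choice(available, size=n_keep, replace=False, p=p)
    masked = np.full_like(blocks, np.nan)
    masked[kept] = blocks[kept]
    return masked.reshape(-1)


# Toy input (assumed): three other spaces, 2-dimensional signatures,
# with the second space missing for this molecule.
rng = np.random.default_rng(0)
x = np.array([0.1, 0.2, np.nan, np.nan, 0.5, 0.6])
probs = keep_probabilities([1.0, 0.0, -1.0])
augmented = [subsample_spaces(x, dim=2, probs=probs, rng=rng) for _ in range(5)]
```

Each augmented row keeps at least one available space and never invents values for spaces that were missing, so it can be paired with the molecule's true signature 2 in the current space as an extra training sample.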