... | ... | @@ -23,8 +23,8 @@ Every dataset has one (or more) pre-processing script(s), always consisting of t |
|
|
* The complexity of this step can vary dramatically:
|
|
|
- *Very simple:* The case of 2D fingerprints, where we simply take the molecular properties corresponding to the InChIKey provided. Likewise, the case of indications, where we read drug-disease pairs and map them.
|
|
|
- *Simple:* The case of binding data, where we sometimes map target classes to the binding data.
|
|
|
- *Not so simple:* The case of pathways, where we map targets to human orthologs, and then map these to pathway annotations. Here the [input](#entry-points) may be of two types (i.e. targets or the pathways themselves).
|
|
|
- *Complex:* The case of interactomes, where we map targets to human orthologs and then map these to several networks using HotNet. Here again, the [input](#entry-points) may be of two types (i.e. targets or the neighbors themselves).
|
|
|
- *Very complex:* The case of LINCS transcriptomics data (`D1.001`), where we start from signatures of interest, compare them to the Touchstone signatures using a GSEA-like metric, aggregate them if necessary, and filter the outcome accordingly.
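
As an illustration of the "not so simple" pathways case above, the two-step mapping (targets to human orthologs, then orthologs to pathway annotations) and its two possible input types can be sketched as follows. This is only a minimal sketch: the lookup tables, identifiers and function names are hypothetical placeholders, not the actual pre-processing script; only the `proteins` and `pathways` entry-point names are taken from this page.

```python
from collections import defaultdict

# Hypothetical toy lookup tables; real pre-processing would load curated
# ortholog and pathway annotation resources instead.
ORTHOLOGS = {"target_a": ["human_x"], "target_b": ["human_x", "human_y"]}
PATHWAYS = {"human_x": ["pathway_1"], "human_y": ["pathway_1", "pathway_2"]}


def targets_to_pathways(key_targets):
    """Map (key, target) pairs to {key: sorted list of pathways}."""
    result = defaultdict(set)
    for key, target in key_targets:
        # Step 1: target -> human orthologs; step 2: orthologs -> pathways.
        for ortholog in ORTHOLOGS.get(target, []):
            result[key].update(PATHWAYS.get(ortholog, []))
    return {key: sorted(pathways) for key, pathways in result.items()}


def preprocess_pathways(pairs, entry_point="proteins"):
    """Accept either key-protein pairs or key-pathway pairs as input."""
    if entry_point == "proteins":
        return targets_to_pathways(pairs)
    if entry_point == "pathways":
        # Pathways are given directly, so no mapping is needed.
        result = defaultdict(set)
        for key, pathway in pairs:
            result[key].add(pathway)
        return {key: sorted(pathways) for key, pathways in result.items()}
    raise ValueError(f"Unknown entry point: {entry_point!r}")
```

Calling `preprocess_pathways([("mol1", "target_b")])` goes through both mapping steps, whereas passing `entry_point="pathways"` skips them and only groups the given pathways by key.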
|
|
|
|
|
|
In practice:
|
... | ... | @@ -72,3 +72,131 @@ From Signature Type 0 onwards, the CC only deals with two distance metrics: the |
|
|
It may happen that some datasets require more advanced metrics, though. In this case, we recommend applying any required **transformation** of the data in the pre-processing, so that Signatures Type 0 are natively comparable using cosine/Euclidean distances. This can be achieved by metric learning algorithms. For example, one can incorporate a Siamese network in the pre-processing:
|
|
|
|
|
|
![connectivity_examples-02](/uploads/4205269881f764f5e74af564838ebc10/connectivity_examples-02.png)
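
The idea of moving the metric into the pre-processing can be illustrated with a much simpler stand-in than a Siamese network. The sketch below assumes the dataset needs a weighted Euclidean distance with known, fixed per-feature weights (an assumption made only for illustration; a metric learning algorithm would learn the transformation instead). Rescaling each feature once by the square root of its weight makes the plain Euclidean distance on the transformed vectors equal to the weighted distance on the original ones. All function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def weighted_euclidean(u, v, weights):
    """The 'advanced' metric the raw data would require."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(u, v, weights)))


def transform(x, weights):
    """Rescale each feature by sqrt(weight) once, at pre-processing time."""
    return [math.sqrt(w) * a for a, w in zip(x, weights)]


weights = [4.0, 1.0, 0.25]
u, v = [1.0, 2.0, 3.0], [2.0, 0.0, 1.0]

# After the transformation, the plain Euclidean distance reproduces
# the weighted distance computed on the raw vectors.
d_raw = weighted_euclidean(u, v, weights)
d_transformed = euclidean(transform(u, weights), transform(v, weights))
```

Here `d_raw` and `d_transformed` agree up to floating-point error, so downstream steps only ever need the plain Euclidean distance.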
|
|
|
|
|
|
## Entry points
|
|
|
|
|
|
The *mapping* (prediction) for new molecules/entities can be entered at one or multiple steps of the predict pipeline. The corresponding argument is `entry_point`.
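
One way to picture the `entry_point` argument is as a per-dataset lookup with a default value. The sketch below is a hypothetical illustration, not the CC implementation: the registry structure and the `resolve_entry_point` helper are assumptions, and only the dataset codes, entry-point names and defaults are copied from the lists in this section.

```python
# Hypothetical registry covering a few of the datasets listed in this section.
ENTRY_POINTS = {
    "A1.001": {"default": "inchikey", "allowed": ("smiles", "inchikey")},
    "A5.001": {"default": "smiles", "allowed": ("smiles", "inchikey")},
    "B4.001": {"default": "proteins", "allowed": ("proteins", "classes")},
    "C3.001": {"default": "proteins", "allowed": ("proteins", "pathways")},
}


def resolve_entry_point(dataset, entry_point=None):
    """Return the entry point to use, falling back to the dataset default."""
    try:
        spec = ENTRY_POINTS[dataset]
    except KeyError:
        raise ValueError(f"Unknown dataset: {dataset!r}") from None
    if entry_point is None:
        return spec["default"]
    if entry_point not in spec["allowed"]:
        raise ValueError(
            f"{entry_point!r} is not an entry point of {dataset}; "
            f"expected one of {spec['allowed']}"
        )
    return entry_point
```

With this sketch, `resolve_entry_point("A1.001")` falls back to `inchikey`, while `resolve_entry_point("C3.001", "pathways")` selects the alternative input type explicitly.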
|
|
|
|
|
|
### `A` Chemistry
|
|
|
|
|
|
#### `A1.001` 2D fingerprints
|
|
|
|
|
|
* Key-SMILES pairs: `smiles`
|
|
|
* InChIKeys: `inchikey` [Default]
|
|
|
|
|
|
#### `A2.001` 3D fingerprints
|
|
|
|
|
|
* Key-SMILES pairs: `smiles`
|
|
|
* InChIKeys: `inchikey` [Default]
|
|
|
|
|
|
#### `A3.001` Scaffolds
|
|
|
|
|
|
* Key-SMILES pairs: `smiles`
|
|
|
* InChIKeys: `inchikey` [Default]
|
|
|
|
|
|
#### `A4.001` Structural keys
|
|
|
|
|
|
* Key-SMILES pairs: `smiles`
|
|
|
* InChIKeys: `inchikey` [Default]
|
|
|
|
|
|
#### `A5.001` Physicochemistry
|
|
|
|
|
|
* Key-SMILES pairs: `smiles` [Default]
|
|
|
* InChIKeys: `inchikey`
|
|
|
|
|
|
### `B` Targets
|
|
|
|
|
|
#### `B1.001` Mechanism of action
|
|
|
|
|
|
* Key-Protein pairs (-1/+1; default = -1): `proteins` [Default]
|
|
|
* Key-Class pairs (-1/+1; default = -1): `classes`
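
Reading such key-feature pairs could look like the sketch below. It assumes a tab-separated input with an optional third column and reads "(-1/+1; default = -1)" as "the value must be -1 or +1, and -1 is used when it is missing"; both the file layout and the `parse_pairs` helper are hypothetical, not the format the CC actually expects.

```python
def parse_pairs(lines, allowed=(-1, 1), default=-1):
    """Parse 'key<TAB>feature[<TAB>value]' lines into (key, feature, value)."""
    records = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        fields = stripped.split("\t")
        if len(fields) == 2:
            key, feature = fields
            value = default  # value column omitted: use the default
        elif len(fields) == 3:
            key, feature = fields[:2]
            value = int(fields[2])  # int() accepts both "-1" and "+1"
        else:
            raise ValueError(f"Line {number}: expected 2 or 3 columns")
        if value not in allowed:
            raise ValueError(f"Line {number}: value {value} not in {allowed}")
        records.append((key, feature, value))
    return records
```

The same helper could serve the other pair-based entry points by changing `allowed` and `default`, for example `allowed=(1, 2), default=1` for the "(2/1; default = 1)" cases.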
|
|
|
|
|
|
#### `B2.001` Metabolic genes
|
|
|
|
|
|
* Key-Protein pairs: `proteins` [Default]
|
|
|
* Key-Class pairs: `classes`
|
|
|
|
|
|
#### `B3.001` Crystals
|
|
|
|
|
|
* Key-PDBs pairs: `pdbs` [Default]
|
|
|
* Key-ECOD pairs: `domains`
|
|
|
|
|
|
#### `B4.001` Binding
|
|
|
|
|
|
* Key-Protein pairs (2/1; default = 1): `proteins` [Default]
|
|
|
* Key-Class pairs (2/1; default = 1): `classes`
|
|
|
|
|
|
#### `B5.001` HTS bioassays
|
|
|
|
|
|
* Key-Protein pairs: `proteins` [Default]
|
|
|
* Key-Class pairs: `classes`
|
|
|
|
|
|
### `C` Networks
|
|
|
|
|
|
#### `C1.001` Small molecule roles
|
|
|
|
|
|
* Key-ChEBI pairs: `terms` [Default]
|
|
|
|
|
|
#### `C2.001` Small molecule pathways
|
|
|
|
|
|
* Key-InChIKey pairs (exact nodes) (10-1; default = 5): `inchikeys` [Default]
|
|
|
* Key-InChIKey pairs (neighbors) (10-1; default = 5): `inchikey_neighbors`
|
|
|
|
|
|
#### `C3.001` Signaling pathways
|
|
|
|
|
|
* Key-Protein pairs (2/1; default = 1): `proteins` [Default]
|
|
|
* Key-Pathway pairs (2/1; default = 1): `pathways`
|
|
|
|
|
|
#### `C4.001` Biological processes
|
|
|
|
|
|
* Key-Protein pairs (2/1; default = 1): `proteins` [Default]
|
|
|
* Key-Process pairs (2/1; default = 1): `processes`
|
|
|
|
|
|
#### `C5.001` Interactome
|
|
|
|
|
|
* Key-Protein pairs (exact nodes) (2/1; default = 1): `proteins` [Default]
|
|
|
* Key-Protein pairs (neighbors) (10-1; default = 5): `protein_neighbors`
|
|
|
|
|
|
### `D` Cells
|
|
|
|
|
|
#### `D1.001` Gene expression
|
|
|
|
|
|
* Key-(perturbation)-Up&Down genes: `up_down` [Default]
|
|
|
|
|
|
#### `D2.001` Cancer cell lines
|
|
|
|
|
|
* Key-Profile (score): `profile` [Default]
|
|
|
|
|
|
#### `D3.001` Chemical genetics
|
|
|
|
|
|
* Key-Strain (2/1; default = 1): `strain` [Default]
|
|
|
|
|
|
#### `D4.001` Morphology
|
|
|
|
|
|
* Key-Measure (score): `measure` [Default]
|
|
|
|
|
|
#### `D5.001` Cell bioassays
|
|
|
|
|
|
* Key-Cell: `cell` [Default]
|
|
|
|
|
|
### `E` Clinics
|
|
|
|
|
|
#### `E1.001` Therapeutic areas
|
|
|
|
|
|
* Key-ATC: `atc` [Default]
|
|
|
|
|
|
#### `E2.001` Indications
|
|
|
|
|
|
* Key-Disease (4/1; default = 2): `disease` [Default] (vocabulary: MEDIC/MeSH)
|
|
|
|
|
|
#### `E3.001` Side effects
|
|
|
|
|
|
* Key-Side effect: `side_effect` [Default] (vocabulary: UMLS)
|
|
|
|
|
|
#### `E4.001` Diseases and Toxicology
|
|
|
|
|
|
* Key-Disease (M/T; mandatory): `disease` [Default] (vocabulary: MEDIC/MeSH)
|
|
|
|
|
|
#### `E5.001` Drug-drug interactions
|
|
|
|
|
|
* Key-Drug: `drug` [Default] (vocabulary: DrugBank)
|
|
\ No newline at end of file |