... | ... | @@ -41,4 +41,26 @@ In practice: |
|InChIKeys|`TSV`|A one-column file containing InChIKeys. This will fetch the corresponding molecular properties from the CC database.|
|Key-feature pairs|`TSV`|A two-column file containing keys (first column) and features (second column). Features can be, for example, protein identifiers. Optionally, a third column can be included to specify the *weight* of the key-feature annotation.|
|Key profiles|`TSV`|A multiple-column file containing keys (first column) and features (second column onwards). These can be, for example, NCI-60 profiles, or chemical-genetic interaction profiles. If a header is not included, the order of the columns should match the one used in the CC internally.|
|Feature sets|`GMT`|A [GMT](http://software.broadinstitute.org/cancer/software/genepattern/file-formats-guide#GMT)-like file, typically used for gene sets. **First column:** Sample (signature) identifier. **Second column:** Agent (perturbagen, molecule, etc.) identifier. If empty, it is assumed to be the same as the first column. This identifier is used when signatures need to be aggregated downstream. **Third column:** Up features (genes). Can be `NULL`. **Fourth column:** Down features (genes). If empty, the gene set is assumed to have no direction and only the third column is used. Can be `NULL`.|

When designing datasets, we strongly recommend making features as explicit as possible. A good starting point is the set of metanodes defined in the Bioteque:

|Node|Abbreviation|
|---|---|
|Assay|`ASY`|
|Cell|`CLL`|
|Chemical entity|`CHE`|
|Compartment|`CMP`|
|Domain|`DOM`|
|Compound|`CPD`|
|Gene/Protein|`GEN`|
|Disease|`DIS`|
|Molecular function|`MFN`|
|Pathway/processes|`PWY`|
|Protein class|`PCL`|
|Perturbagen|`PGN`|
|Symptom|`SYM`|
|Tissue|`TIS`|
|Pharmacologic class|`PHC`|

The *vocabularies* used in the production phase and the mapping phase must match.
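
As an illustration of the key-feature pair and feature set formats described in the table of input types, the following is a minimal sketch of how such files could be read. It is not part of the CC codebase: the function names, the default weight of `1.0` when the third column is missing, and the comma used to separate features inside a GMT-like column are all assumptions made for this example, as are the placeholder identifiers in the usage lines.

```python
import csv
import io


def parse_key_feature_pairs(handle, default_weight=1.0):
    """Yield (key, feature, weight) from a two- or three-column TSV.

    The default weight used when the third column is absent is an
    assumption of this sketch, not something the format prescribes.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"Expected at least two columns, got: {row!r}")
        key, feature = row[0].strip(), row[1].strip()
        has_weight = len(row) > 2 and row[2].strip()
        weight = float(row[2]) if has_weight else default_weight
        yield key, feature, weight


def _split_features(cell, sep):
    """Turn one feature column into a list; empty or NULL means no features."""
    cell = cell.strip()
    if not cell or cell == "NULL":
        return []
    return [item.strip() for item in cell.split(sep) if item.strip()]


def parse_feature_sets(handle, sep=","):
    """Yield one dict per line of a GMT-like feature set file.

    The within-column separator (sep) is an assumption of this sketch.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        row = row + [""] * (4 - len(row))  # pad missing trailing columns
        sample = row[0].strip()
        agent = row[1].strip() or sample  # empty agent falls back to sample
        yield {
            "sample": sample,
            "agent": agent,
            "up": _split_features(row[2], sep),
            "down": _split_features(row[3], sep),  # empty: no direction
        }


# Usage with in-memory placeholder data (identifiers are hypothetical).
pairs = list(parse_key_feature_pairs(io.StringIO("KEY1\tFEAT_A\t0.5\nKEY2\tFEAT_B\n")))
sets = list(parse_feature_sets(io.StringIO("sig1\t\tGENE1,GENE2\tNULL\n")))
print(pairs)
print(sets)
```

Reading both formats into plain tuples and dictionaries keeps the sketch independent of any particular downstream library. A real pipeline would also check that the parsed features belong to the vocabulary used in the mapping phase.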