# Datasets

Datasets are the cornerstone of the Chemical Checker (CC).

In the CC nomenclature, a dataset is defined by:

1. One coordinate.
2. One source (typically) or several sources (in some cases) that provide the same type of mergeable data.
3. A processing procedure that yields type 0 signatures.
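To make the three defining elements concrete, here is a minimal Python sketch of a dataset as a data structure. The `Dataset` class, its field names, the `toy_process` function, the source names and the representation of a type 0 signature as a dict of per-molecule feature lists are all hypothetical illustrations, not the CC's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representation: a "type 0 signature" is modelled here as a
# mapping from a molecule identifier to a list of raw feature values.
Signature0 = Dict[str, List[float]]


@dataclass
class Dataset:
    """Hypothetical container for the three elements that define a CC dataset."""

    coordinate: str                             # 1. exactly one coordinate, e.g. "E5"
    sources: List[str]                          # 2. one or more sources of mergeable data
    process: Callable[[List[str]], Signature0]  # 3. procedure yielding type 0 signatures

    def __post_init__(self) -> None:
        # A dataset without any source cannot produce signatures.
        if not self.sources:
            raise ValueError("a dataset needs at least one source")

    def signature0(self) -> Signature0:
        # Sources in one dataset hold the same type of data, so the
        # processing procedure receives all of them together.
        return self.process(self.sources)


def toy_process(sources: List[str]) -> Signature0:
    # Hypothetical procedure: one feature per source for a single molecule.
    return {"MOL1": [1.0 for _ in sources]}


ds = Dataset(coordinate="E5", sources=["source_a", "source_b"], process=toy_process)
print(ds.signature0())  # {'MOL1': [1.0, 1.0]}
```

Keeping the processing step as a plain callable leaves the structure agnostic to how each source is parsed, while still enforcing the "exactly one coordinate, at least one source" rule.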
## Levels, coordinates and datasets
|`E4`|Disease phenotypes|Manually curated relationships between chemicals and diseases. Chemicals include drug molecules and environmental substances, among others.|
|`E5`|Drug-drug interactions|Changes in the effect of a drug when it is taken together with a second drug. Drug-drug interactions may alter pharmacokinetics and/or cause side effects.|
Each coordinate can contain an arbitrary number of **datasets**. All datasets are fully described in the [PostgreSQL database](database) and are searchable in the [CC web app](http://chemicalchecker.org/datasets/) (:warning: not done yet). Each dataset is identified by a numbered code (e.g. `A1.001`).
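The numbered coding can also be handled programmatically. The following Python sketch parses a code such as `A1.001` into its parts. It assumes that a code is a coordinate (a level letter `A`–`E` followed by a sublevel digit `1`–`5`, like `E4` and `E5` above), a dot, and a zero-padded three-digit dataset number. These format rules, `DatasetCode` and `parse_dataset_code` are assumptions for illustration, not part of the CC code base.

```python
import re
from typing import NamedTuple

# Assumed format: level letter (A-E), sublevel digit (1-5), a dot,
# then a zero-padded three-digit dataset number, e.g. "A1.001".
CODE_RE = re.compile(r"([A-E])([1-5])\.(\d{3})")


class DatasetCode(NamedTuple):
    level: str      # e.g. "A"
    sublevel: int   # e.g. 1
    number: int     # e.g. 1 (parsed from "001")

    @property
    def coordinate(self) -> str:
        # The coordinate is the level letter plus the sublevel digit.
        return f"{self.level}{self.sublevel}"

    def __str__(self) -> str:
        # Rebuild the canonical, zero-padded code.
        return f"{self.coordinate}.{self.number:03d}"


def parse_dataset_code(code: str) -> DatasetCode:
    # fullmatch rejects extra characters such as "A1.001x".
    match = CODE_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid dataset code: {code!r}")
    level, sublevel, number = match.groups()
    return DatasetCode(level, int(sublevel), int(number))


code = parse_dataset_code("A1.001")
print(code.coordinate, code.number, str(code))  # A1 1 A1.001
```

Storing the dataset number as an integer and re-padding it in `__str__` lets parsed codes sort by level, sublevel and then dataset number; if real codes can exceed three digits, the pattern would need adjusting.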
## Dataset characteristics