|
|
|
|
|
### Dataset addition
|
|
|
|
|
|
Adding and updating datasets is the most important and most computationally intensive part of the repository. Consequently, I anticipate that this part of the pipeline will be in constant evolution.
|
|
|
|
|
|
#### Six-month pipeline
|
|
|
|
|
|
![6-month-pipeline.svg](/uploads/04a4dcb2ad2956cf4accbb0562b7fd38/6-month-pipeline.svg)
|
|
|
|
|
|
#### Sporadic datasets
|
|
|
|
|
|
Here again, a linear view of the pipeline is given below. The explanation is more succinct, since many details were already given [above](#six-month-pipeline).
|
|
|
Whenever, during a research project, we want to introduce a new dataset to the CC resource, we can follow this reasoning:
|
|
|
|
|
|
0. Write scripts to process the data (this cannot be automated, as it is specific to each dataset)
|
|
|
* If the dataset is of a new type,
|
|
|
*
|
|
|
*
|
|
|
   * Also, fill in the [PostgreSQL table](#database) with the corresponding fields.
|
|
|
1. Download or calculate accordingly
|
|
|
*
|
|
|
2. Process
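The steps above can be sketched as a small registration-and-processing script. This is only an illustration: the function names, the dataset code, the table schema and the example records are all hypothetical (they are not the actual CC schema or data), and SQLite stands in for PostgreSQL so the sketch is self-contained.

```python
import sqlite3

def register_dataset(conn, code, name, description):
    """Step 0: fill in the datasets table with the new entry.

    Table and column names are illustrative, not the real CC schema.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS datasets "
        "(code TEXT PRIMARY KEY, name TEXT, description TEXT)"
    )
    conn.execute(
        "INSERT INTO datasets (code, name, description) VALUES (?, ?, ?)",
        (code, name, description),
    )
    conn.commit()

def download_or_calculate(code):
    """Step 1: obtain the raw data (download or compute; dataset-specific).

    Placeholder records; in practice this calls a dataset-specific script.
    """
    return [("CHEMBL25", 0.7), ("CHEMBL26", 0.4)]

def process(raw_records, threshold=0.5):
    """Step 2: process raw records, keeping entries above a score threshold."""
    return [key for key, score in raw_records if score >= threshold]

# Run the three steps for a hypothetical dataset code.
conn = sqlite3.connect(":memory:")
register_dataset(conn, "B1.001", "Example dataset", "Illustrative entry")
processed = process(download_or_calculate("B1.001"))
print(processed)  # → ['CHEMBL25']
```

In the real pipeline each step would live in its own script, but the shape is the same: register the dataset in the database, fetch or compute the raw data, then run the dataset-specific processing.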
|
|
|
![cc_pipelines-new-dataset.svg](/uploads/4b95d9f973bf32f031dad02f01444b75/cc_pipelines-new-dataset.svg)
|
|
|
|
|
|
Note that adding new data to the CC will necessarily require some scripting. Please refer to [dataset processing](datasets#dataset-processing) for guidelines.
|
|
|
|
|
|
### New data mapping