Accompanying the folder structure, there is a [PostgreSQL database](database).
|
|
|
|
|
## Pipeline

Pipeline scripts are used to produce CC signatures, models and analyses. These scripts are typically run in the `pac-one cluster` at IRB Barcelona. Below, we provide detailed explanations of the different pipeline modalities.
|
|
|
|
|
|
Broadly speaking, the CC pipeline has two modalities:
|
|
|
|
|
|
|
|
1. [Dataset addition](#dataset-addition)
   * [Updates every 6 months](#six-month-pipeline).
   * [Sporadic addition or updates of datasets](#sporadic-datasets).
2. [Mapping of new data](#new-data-mapping)
   * Mapping of data to a dataset.
   * Individual querying.
   * Connectivity.
|
|
|
|
|
|
### Dataset addition

The resource is fully updated **every 6 months**.

#### Six-month pipeline
|
|
|
|
|
|
![6-month-pipeline.svg](/uploads/04a4dcb2ad2956cf4accbb0562b7fd38/6-month-pipeline.svg)
|
|
|
|
|
|
|
|
The main selection parameters of the pipeline are the following (a configuration sketch follows the list):
|
|
|
|
|
|
* `levels`: all datasets in all coordinates of the given level.
* `coordinates`: all datasets in the given coordinate.
* `datasets`: `None`
|
|
|
|
|
|
|
|
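As an illustration only, these selection parameters could be captured in a small configuration object. The class and field defaults below are assumptions, not the actual CC code:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PipelineSelection:
    """Illustrative container for the selection parameters above.

    This class is hypothetical and not part of the CC code base.
    """
    # Levels to compute, e.g. ["A"]; empty means all levels (A-E).
    levels: List[str] = field(default_factory=list)
    # Coordinates to compute, e.g. ["A1", "B4"]; empty means all (A1-E5).
    coordinates: List[str] = field(default_factory=list)
    # Explicit dataset codes, e.g. ["A1.001"]; None means: derive the
    # datasets from the levels/coordinates above.
    datasets: Optional[List[str]] = None

# A full 6-month update computes everything, so the defaults suffice:
full_update = PipelineSelection()
```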
|
|
|
|
|
|
|
|
#### Pipeline execution
|
|
|
|
|
|
|
|
The pipeline will typically be run in the `pac-one cluster`.
|
|
|
|
|
|
|
|
Once the reference is done, with all of the models fitted, one can run the pipeline for any dataset (including the `full` dataset).
|
|
|
|
|
|
|
|
##### Fit and produce the models
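
No CC-specific code is shown on this page, so the following is only a minimal sketch of what this step amounts to, assuming hypothetical `fit_model` and `save_model` helpers: the reference models are fitted once and persisted, so that prediction can later reuse them for any dataset (including `full`).

```python
from pathlib import Path

MODELS_DIR = Path("models")  # assumed location for persisted reference models

def fit_reference_models(reference_datasets, fit_model, save_model):
    """Fit one model per reference dataset and persist it.

    `fit_model` and `save_model` are stand-ins for whatever routines the
    CC pipeline actually uses; they are assumptions, not the real API.
    """
    MODELS_DIR.mkdir(exist_ok=True)
    for code in reference_datasets:      # e.g. "A1.001", "B4.002", ...
        model = fit_model(code)          # learn the model for this dataset
        save_model(model, MODELS_DIR / f"{code}.model")
```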
|
|
|
|
|
|
|
|
##### Predict for any dataset
|
|
|
|
|
|
|
|
The arguments should be, at least, the following (an illustrative parser is sketched after the list):
|
|
|
|
|
|
|
|
* `--datasets`: Datasets `A1.001`-`E5.999` to calculate. One can also specify the level `A`-`E` or the coordinate `A1`-`E5`. All are considered by default.
|
|
|
|
* `--matrices`: What matrices to keep (e.g. `sig0`).
|
|
|
|
* `--only_exemplar`: Calculate only exemplar datasets.
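
As an illustration, the arguments above could be parsed as follows. This is a sketch only; the defaults and the validation pattern are assumptions, and only the flags documented on this page are used:

```python
import argparse
import re

# A dataset code looks like "A1.001"; a coordinate is "A1"-"E5";
# a level is "A"-"E".
DATASET_SPEC = re.compile(r"^[A-E]([1-5](\.\d{3})?)?$")

parser = argparse.ArgumentParser(
    description="Illustrative parser for the prediction arguments.")
parser.add_argument("--datasets", nargs="+",
                    default=["A", "B", "C", "D", "E"],
                    help="Datasets (A1.001-E5.999), coordinates (A1-E5) "
                         "or levels (A-E); all are considered by default.")
parser.add_argument("--matrices", nargs="+", default=None,
                    help="Matrices to keep, e.g. sig0.")
parser.add_argument("--only_exemplar", action="store_true",
                    help="Calculate only exemplar datasets.")

args = parser.parse_args()
for spec in args.datasets:
    if not DATASET_SPEC.match(spec):
        parser.error(f"Unrecognized dataset specification: {spec}")
```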
|
|
|
|
|
|
|
|
### A linear view of the 6-month pipeline
|
|
|
|
|
|
|
|
Below I sequentially list the steps of the pipeline. This is a linear and qualitative view of the pipeline and does not necessarily correspond to the organization of scripts in the repository.
|
|
|
|
|
|
|
|
1. Download data.
   * We need a [SQL table](database#downloads) specifying, for each download file, at least whether it is **completed** or not. Files that are internal to the SB&NB are simply copied.
   * If the data are **completed**, *or* the data have not been updated since the last CC update, don't download; just copy/move from the previous CC version (see the sketch after this list).
   * After this step, all data *and* libraries should be stored on disk.
2. Read small molecule structures.
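
The copy-or-download decision in step 1 could look like the sketch below. The field and helper names are assumptions; the **completed** flag is the one stored in the [downloads table](database#downloads):

```python
import shutil
from pathlib import Path

def fetch_file(entry, previous_cc, current_cc, download):
    """Decide whether a file can be reused from the previous CC version.

    `entry` is assumed to be a row of the downloads table exposing
    `name`, `completed` and `updated_since_last_cc`; `download` is a
    stand-in for the actual download routine. All names are hypothetical.
    """
    src = Path(previous_cc) / entry["name"]
    dst = Path(current_cc) / entry["name"]
    # Completed files, or files unchanged since the last CC update,
    # are copied from the previous version instead of re-downloaded.
    if entry["completed"] or not entry["updated_since_last_cc"]:
        shutil.copy2(src, dst)
    else:
        download(entry, dst)
```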
|
Please beware that, for simplicity, here I have omitted processes that are relevant, for example:

* Updating the PubChem entries (name, synonyms, etc.).
* Filling up the table of targets to show.
|
|
|
|
|
|
|
|
#### Sporadic datasets
|
|
|
|
|
|
|
|
Here again, a linear view of the pipeline is given below. The explanation is more succinct, since many details were already given [above](#six-month-pipeline).
|
|
|
|
|
|
|
|
0. Write scripts to process the data (this cannot be automated, as it is specific to each dataset).
   * If the dataset is of a new type,
   * Also, fill in the [PostgreSQL table](#database) with the required fields.
|
|
|
|
1. Download or calculate accordingly.
2. Process.
|
|
|
|
|
|
|
|
|
|
### New data mapping