The pipeline is the process the Chemical Checker follows to update its internal resources.

The Chemical Checker has two main resources: the **Package** and the **Website**.

The pipeline tries to keep these two entities separate. For the same reason, the database that the Checker provides should be divided into two databases: a permanent database that is updated, and a web-related database that is rebuilt each time a new release is produced.

# 1 Download

* can be parallelized over Datasources (one download job per Datasource)
* code 100% there

```python
from chemicalchecker.database import Datasource

# check if the Datasource table is there
if not Datasource._table_exists():
    # create the Datasource table
    Datasource._create_table()
    # populate it with the Datasources needed for the exemplary Datasets
    Datasource.from_csv('./exemplary_datasources.csv')

# start 45 download jobs (one per Datasource); the call waits until they finish
job = Datasource.download_hpc('/aloy/scratch/sbnb-adm/tmp_job_download')

# check if the downloads are really done
if not Datasource.test_all_downloaded():
    print("Something went WRONG while DOWNLOAD, should retry")
    # print the faulty ones
    for ds in Datasource.get():
        if not ds.available():
            print("Datasource %s not available" % ds)
```
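
The first bullet notes that this step can be parallelized over Datasources. Outside the HPC scheduler, the same one-task-per-Datasource fan-out can be sketched locally with the standard-library `concurrent.futures`, followed by the "print the faulty ones" check. This is a minimal sketch under stated assumptions and not the Chemical Checker implementation. `download_all`, `download` and `is_available` are hypothetical names, and the callables stand in for whatever performs and verifies a single download.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def download_all(
    sources: Iterable[T],
    download: Callable[[T], None],
    is_available: Callable[[T], bool],
    max_workers: int = 8,
) -> List[T]:
    """Download every source in parallel and return the ones still unavailable.

    Hypothetical helper: `download` and `is_available` are stand-ins for the
    per-Datasource download and its availability check.
    """
    sources = list(sources)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # one task per source, mirroring one HPC job per Datasource
        futures = [pool.submit(download, src) for src in sources]
        for future in futures:
            try:
                future.result()
            except Exception as exc:
                # failures are reported here and caught again by the check below
                print("download raised: %r" % exc)
    # keep only the sources whose data is still missing
    return [src for src in sources if not is_available(src)]


if __name__ == "__main__":
    done = set()

    def fake_download(name: str) -> None:
        # hypothetical stand-in: every source except "broken" succeeds
        if name == "broken":
            raise IOError("connection reset")
        done.add(name)

    failed = download_all(["chembl", "drugbank", "broken"], fake_download, done.__contains__)
    for name in failed:
        print("Datasource %s not available" % name)
```

Returning the unavailable sources, rather than only a boolean, lets a caller retry just the failed downloads.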

# 2 Molrepo

* can be parallelized over the Datasources that have Molrepos
* code 30% there, missing:
  * parsers: only 2/14 adapted
  * verify chembldb requirements
  * decide what to do if the table already exists (update it?)
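
One option for the open question about an existing table is an idempotent step. The table is created only if it is missing, and rows are written keyed on a primary key, so re-running the step overwrites entries instead of failing or duplicating them. The sketch below shows that pattern with the standard-library `sqlite3` module as a stand-in for the Checker's real database. The table name `molrepo_sketch`, its columns and the helpers `ensure_table` and `upsert_molecules` are hypothetical and do not reflect the actual Molrepo schema.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical schema: one row per (molrepo, source id) pair.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS molrepo_sketch (
    molrepo_name TEXT NOT NULL,
    src_id       TEXT NOT NULL,
    inchikey     TEXT,
    PRIMARY KEY (molrepo_name, src_id)
)
"""


def ensure_table(conn: sqlite3.Connection) -> None:
    """Create the table only if it is missing; safe to call on every run."""
    conn.execute(CREATE_SQL)


def upsert_molecules(conn: sqlite3.Connection,
                     rows: Iterable[Tuple[str, str, str]]) -> None:
    """Insert new rows and overwrite rows whose primary key already exists."""
    # INSERT OR REPLACE deletes the conflicting row and inserts the new one
    conn.executemany(
        "INSERT OR REPLACE INTO molrepo_sketch (molrepo_name, src_id, inchikey) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    ensure_table(conn)
    upsert_molecules(conn, [("chembl", "1", "KEY_A")])
    ensure_table(conn)  # second run: no error, existing rows untouched
    upsert_molecules(conn, [("chembl", "1", "KEY_B"), ("chembl", "2", "KEY_C")])
    print(conn.execute("SELECT * FROM molrepo_sketch ORDER BY src_id").fetchall())
```

The alternative, dropping and recreating the table on every run, is simpler but discards rows from Datasources that were not reprocessed.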

```python
from chemicalchecker.database import Molrepo

# create the Molrepo table
Molrepo._create_table()

# start 14 molrepo jobs (one per Datasource with a Molrepo); the call waits until they finish
job = Molrepo.molrepo_hpc('/aloy/scratch/sbnb-adm/tmp_job_molrepo')

# check if the molrepos are really done
if not Molrepo.test_all_available():
    print("Something went WRONG while MOLREPO, should retry")
```
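
Both steps end with "should retry" when the final check fails. Below is a minimal sketch of that retry loop, assuming the step can be re-run safely. `run_with_retries`, `run_step` and `is_complete` are hypothetical names and not part of the chemicalchecker package. `run_step` would wrap a call such as `Molrepo.molrepo_hpc(...)`, and `is_complete` would wrap a check such as `Molrepo.test_all_available()`.

```python
import time
from typing import Callable


def run_with_retries(run_step: Callable[[], object],
                     is_complete: Callable[[], bool],
                     max_attempts: int = 3,
                     wait_seconds: float = 0.0) -> int:
    """Run a pipeline step until its completion check passes.

    Returns the number of attempts used; raises RuntimeError if the check
    still fails after `max_attempts` runs.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        run_step()
        if is_complete():
            return attempt
        if attempt < max_attempts:
            print("Attempt %d/%d incomplete, retrying" % (attempt, max_attempts))
            time.sleep(wait_seconds)
    raise RuntimeError("step still incomplete after %d attempts" % max_attempts)


# Hypothetical usage, assuming re-running the HPC step is safe:
# run_with_retries(
#     lambda: Molrepo.molrepo_hpc('/aloy/scratch/sbnb-adm/tmp_job_molrepo'),
#     Molrepo.test_all_available)
```

Raising after the last attempt, instead of only printing, makes an unfinished step stop the pipeline rather than pass silently into the next one.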