"Entry points" in pre-processing scripts
Pre-processing scripts should have a predict() method, too. I think there was a misunderstanding here.
This is what I wrote in the wiki:
" Every dataset has a particular processing protocol, always consisting of two consecutive steps:
- Fetching of data and conversion to a standard input file.
  - It is very important that data are minimally transformed here.
  - Data may be fetched from the downloaded files, from calculated properties, or from a file of interest of the user.
- From standard input to signature type 0
  - When adding/updating a dataset, all procedures here must be encapsulated in a fit() method.
  - Accordingly, a predict() method must be available.
  - Acceptable standard inputs include: .gmt, .h5 and .tsv. It is strongly recommended that input features are recognizable entities, e.g. those defined in the Bioteque.

It is of the utmost importance that step 2 is endowed with a predict() method. Having the ability to convert any standard input to a signature type 0 (in an automated manner) will enable implementation of connectivity methods. This is a critical feature of the CC and I anticipate that most of our efforts will be put in this particular step."
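
To make the fit()/predict() pairing concrete, here is a rough sketch of what I have in mind. It is not the actual CC code. The class name Preprocessor, the helper read_tsv_pairs, the two-column key/feature .tsv layout, and the binary key-by-feature matrix standing in for a signature type 0 are all assumptions made for illustration only.

```python
import csv
import io


def read_tsv_pairs(handle):
    """Read (key, feature) pairs from a two-column, tab-separated input.

    The two-column layout is an assumption of this sketch.
    """
    return [(row[0], row[1])
            for row in csv.reader(handle, delimiter="\t")
            if len(row) >= 2]


class Preprocessor:
    """Hypothetical pre-processing entry points for one dataset."""

    def __init__(self):
        self.features = None  # ordered feature vocabulary learned by fit()

    def _to_sign0(self, pairs):
        # Build a binary key-by-feature matrix over the fitted vocabulary.
        index = {feat: j for j, feat in enumerate(self.features)}
        keys = sorted({key for key, _ in pairs})
        row_of = {key: i for i, key in enumerate(keys)}
        matrix = [[0] * len(self.features) for _ in keys]
        for key, feat in pairs:
            j = index.get(feat)
            if j is not None:  # features unseen at fit time are ignored
                matrix[row_of[key]][j] = 1
        return keys, matrix

    def fit(self, pairs):
        """Adding/updating a dataset: learn the feature space, then convert."""
        self.features = sorted({feat for _, feat in pairs})
        return self._to_sign0(pairs)

    def predict(self, pairs):
        """Convert any new standard input using the space fixed by fit()."""
        if self.features is None:
            raise RuntimeError("call fit() before predict()")
        return self._to_sign0(pairs)


# Usage with in-memory stand-ins for .tsv files (made-up identifiers).
train = read_tsv_pairs(io.StringIO("mol1\tP12345\nmol1\tQ99999\nmol2\tP12345\n"))
new = read_tsv_pairs(io.StringIO("mol3\tQ99999\nmol3\tX00000\n"))

pre = Preprocessor()
keys, sign0 = pre.fit(train)
new_keys, new_sign0 = pre.predict(new)
```

The point of the design is that predict() reuses the feature space fixed by fit(), so an arbitrary standard input ends up in the same columns as the dataset it is compared against. Dropping features unseen at fit time is one possible choice here, not a requirement from the wiki.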