... | ... | @@ -64,3 +64,11 @@ We highly recommend that, when designing the datasets, features are as explicit |
|
|
|Pharmacologic class|`PHC`|
|
|
|
|
|
|
Obviously, it is mandatory that the *vocabularies* used in the production phase and the mapping phase match.
|
|
|
|
|
|
## Distance metrics
|
|
|
|
|
|
From Signature Type 0 onwards, the CC only deals with two distance metrics: the cosine distance and the Euclidean distance. These are well-accepted metrics that capture two different properties: the direction and the absolute distance, respectively.
|
|
|
|
|
|
It may happen that some datasets require more advanced metrics, though. In this case, we recommend applying any required **transformation** of the data in the pre-processing, so as Signatures Type 0 are natively comparable using cosine/Euclidean distances. This can be achieved by metric learning algorithms. For example, one incorporate a Siamese network in the pre-processing:
|
|
|
|
|
|
|