Adanet TODO
Technical
- Training on large spaces (aka out-of-core or online learning) to avoid the following:
  `ValueError: Cannot create a tensor proto whose content is larger than 2GB.`
- Generate train/test splits for all spaces
- Make an `AdanetWrapper` class
  - Serialize generated models (aka `Save` and `Load` functions)
  - Standard predictor interface (aka `Fit` and `Predict` functions)
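The 2GB `ValueError` above typically appears when an entire dataset gets baked into the graph as a single constant (e.g. passing a huge numpy array to `tf.constant`). One framework-agnostic way around it is to stream minibatches from disk instead of materializing everything at once. A minimal sketch using a numpy memory-mapped file (the file path and shapes are placeholders, not from the project):

```python
import numpy as np

def minibatches(path, n_rows, n_cols, batch_size=256, dtype=np.float32):
    """Yield minibatches from a raw binary file without loading it all.

    np.memmap maps the file into virtual memory, so only the pages
    touched by each slice are actually read -- the full array never has
    to fit in RAM or in a serialized graph constant.
    """
    data = np.memmap(path, dtype=dtype, mode="r", shape=(n_rows, n_cols))
    for start in range(0, n_rows, batch_size):
        # Copy just one batch into regular memory.
        yield np.asarray(data[start:start + batch_size])
```

In TensorFlow such a generator could back a `tf.data.Dataset.from_generator` pipeline, so the graph only ever sees one batch at a time.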
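The wrapper's surface could look like the skeleton below. The training and prediction bodies are placeholders (the real `adanet.Estimator` API is not reproduced here -- a trivial mean "model" stands in so the interface is exercisable), but the `fit`/`predict`/`save`/`load` methods are the interface this TODO asks for:

```python
import pickle

class AdanetWrapper:
    """Sketch of a standard predictor interface around a trained model.

    `self.model` stands in for whatever object the AdaNet run produces;
    only the interface shape is the point of this sketch.
    """

    def __init__(self, model=None):
        self.model = model

    def fit(self, X, y):
        # Placeholder: a real implementation would build and train an
        # adanet.Estimator here.  We store the mean of y as the "model".
        self.model = sum(y) / len(y)
        return self

    def predict(self, X):
        # Placeholder: predict the stored mean for every input row.
        return [self.model for _ in X]

    def save(self, path):
        # Serialize the trained model to disk.
        with open(path, "wb") as f:
            pickle.dump(self.model, f)

    @classmethod
    def load(cls, path):
        # Restore a previously saved model.
        with open(path, "rb") as f:
            return cls(model=pickle.load(f))
```

For a real TensorFlow estimator, `save`/`load` would go through checkpoints or `SavedModel` rather than pickle; the sketch only fixes the calling convention.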
Validation
- Plot train-test performance vs. epoch (TensorBoard?)
- Scatterplot of test y_pred vs. y_true for each component; annotate the correlation
- Compare the previous two against a simple `sklearn.linear_model.LinearRegression` baseline
- Are distances conserved? Compare y_true pairwise distances vs. y_pred pairwise distances (cosine)
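The distance-conservation check can be computed directly: correlate the upper-triangular entries of the two pairwise cosine-distance matrices. A numpy-only sketch (the function name is ours, not an existing API):

```python
import numpy as np

def distance_conservation(y_true, y_pred):
    """Pearson correlation between the pairwise cosine distances of
    y_true and those of y_pred; 1.0 means distances are fully conserved."""
    def cosine_dist(X):
        # Normalize rows, then cosine distance = 1 - cosine similarity.
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        return 1.0 - Xn @ Xn.T
    # Take each unordered pair once (strict upper triangle).
    iu = np.triu_indices(len(y_true), k=1)
    return np.corrcoef(cosine_dist(y_true)[iu], cosine_dist(y_pred)[iu])[0, 1]
```

A perfect predictor (or any positive rescaling of y_true, since cosine distance ignores scale) scores 1.0; unrelated predictions score near 0. The same pair of flattened distance vectors could also feed the scatterplot above.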
Gridsearch
Training on a small A5 subset reveals that we can overfit; we must be able to overfit on the large space as well, so:
- Gridsearch on 3 parameters:
  - `learning_rate`
  - adanet `lambda`
  - `layer_size`
- Additional gridsearch as discussed in the comments:
  - `learning_rate`: [1e-2, 1e-3]
  - `train_step`: [1e5, 1e6]
  - `layer_size`: [2**9, 2**8, 2**7]
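The grid above expands to 2 × 2 × 3 = 12 runs. A minimal driver for enumerating it (each resulting dict would be passed to whatever training entry point the project settles on):

```python
from itertools import product

# The grid from the list above.
grid = {
    "learning_rate": [1e-2, 1e-3],
    "train_step": [1e5, 1e6],
    "layer_size": [2**9, 2**8, 2**7],
}

def expand(grid):
    """Yield one parameter dict per point in the Cartesian product."""
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(expand(grid))  # 2 * 2 * 3 = 12 configurations
```

Keeping the grid as a plain dict makes it trivial to add the `adanet` lambda axis later without touching the driver.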