Speed up similarity search
I see three ways of speeding up the similarity search (currently it is prohibitively slow).
- Modify the `similarity` scripts so that they can work with `cdist`.
- Hash/binarize the vectors and use e.g. Hamming distances. For this, we can use data-dependent or data-independent algorithms.
- Use `annoy` or any other approximate nearest-neighbor search library.
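
A rough sketch of the first two options, assuming the vectors fit in memory as NumPy arrays and that `cdist` refers to `scipy.spatial.distance.cdist`. The array names, sizes, and bit count below are made up for illustration, and random-hyperplane hashing is only one possible data-independent scheme, not necessarily the one we would end up using.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
dim = 64
queries = rng.normal(size=(5, dim))     # hypothetical query vectors
corpus = rng.normal(size=(1000, dim))   # hypothetical corpus vectors

# Option 1: compute all pairwise cosine distances in one vectorized call
# instead of looping over pairs in Python.
dists = cdist(queries, corpus, metric="cosine")      # shape (5, 1000)
exact_top10 = np.argsort(dists, axis=1)[:, :10]

# Option 2 (data-independent): random-hyperplane hashing.
# Each bit records on which side of a random hyperplane a vector falls,
# so vectors with a small angle between them agree on most bits.
n_bits = 256                                         # assumed code length
planes = rng.normal(size=(dim, n_bits))
q_bits = queries @ planes > 0                        # boolean codes, (5, 256)
c_bits = corpus @ planes > 0                         # boolean codes, (1000, 256)

# Hamming distance = number of differing bits per (query, corpus) pair.
hamming = (q_bits[:, None, :] != c_bits[None, :, :]).sum(axis=-1)
approx_top10 = np.argsort(hamming, axis=1)[:, :10]
```

The boolean comparison here is written for readability; an actual speed-up would likely pack the bits (e.g. with `np.packbits`) and count differences on the packed codes. Comparing `approx_top10` against `exact_top10` on real data would show how much accuracy the hashing gives up.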