HDF5 string compatibility between py2 and py3
Strings are handled differently by py2, py3, NumPy and HDF5, which leads to unexpected behaviour when a signature generated in py2 is used in a py3 environment. According to the h5py documentation, the most general way to get compatible datasets is to save py2 strings as unicode (which is the default string type in py3).
Example:
import h5py

inks = ['AAKJLRGGTJKAMG-UHFFFAOYSA-N', 'AAOVKJBEBIDNHE-UHFFFAOYSA-N']
try:
    # `unicode` exists only in py2; in py3 this line raises NameError
    str_dtype = h5py.special_dtype(vlen=unicode)
except NameError:
    # str is the new unicode in py3
    str_dtype = h5py.special_dtype(vlen=str)
# the context manager closes the file even if the write fails
with h5py.File('./prova.h5', 'w') as fh:
    fh.create_dataset('compatible_string_dataset', data=[i.encode() for i in inks], dtype=str_dtype)
This logic should be used whenever we write strings to HDF5.
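One way to apply this consistently is to wrap the logic in a pair of small helpers. The following is only a sketch: `text_type`, `write_strings` and `read_strings` are hypothetical names, not part of h5py or of our codebase. It assumes the strings are UTF-8. The reader decodes byte strings because h5py 3 returns variable-length strings as `bytes`, while older versions return text.

```python
import h5py

try:
    text_type = unicode  # py2: the unicode type exists
except NameError:
    text_type = str      # py3: str is already unicode


def write_strings(path, name, values, mode='a'):
    """Store an iterable of strings as a variable-length unicode dataset."""
    str_dtype = h5py.special_dtype(vlen=text_type)
    # Normalise to text so that bytes and text inputs are stored the same way
    # (assumption: any byte strings are UTF-8 encoded).
    data = [v.decode('utf-8') if isinstance(v, bytes) else v for v in values]
    with h5py.File(path, mode) as fh:
        fh.create_dataset(name, data=data, dtype=str_dtype)


def read_strings(path, name):
    """Read a string dataset back as a list of text strings."""
    with h5py.File(path, 'r') as fh:
        raw = fh[name][:]
    # Depending on the h5py version, items come back as bytes or as text.
    return [v.decode('utf-8') if isinstance(v, bytes) else v for v in raw]
```

With these helpers, `write_strings('./prova.h5', 'compatible_string_dataset', inks)` followed by `read_strings('./prova.h5', 'compatible_string_dataset')` should give back the original list of strings under either interpreter.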