Large-scale comparison of machine learning methods for drug target prediction on ChEMBL

Directory structure

genDirStructure.cpp creates the following directory structure:
SampleIdTable.txt: sample names corresponding to the binary data stored in the subdirectories
chemFeatures/cl: the clustering file used is stored here; further results from clusterMinFull.cpp are stored in the subdirectory clusterMinFull
chemFeatures/d: one subdirectory for each dense, real-valued data matrix (csv)
chemFeatures/s: one subdirectory for each sparse matrix (fpf)
train: contains the file that describes the targets (assays) to consider and the file that describes the compound-assay relations (duplicate entries may exist)
run: results from the C/C++ pipeline are stored here in subdirectories
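
Because chemFeatures/d and chemFeatures/s each hold one subdirectory per data matrix, the available matrices can be listed by walking those two directories. The following is a minimal sketch, not part of the original pipeline; the function name list_feature_matrices and the "dense"/"sparse" keys are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def list_feature_matrices(base_dir):
    """List per-matrix subdirectories under chemFeatures/d and chemFeatures/s.

    Assumption: base_dir is the root directory created by genDirStructure.cpp.
    Returns a dict with the (hypothetical) keys "dense" and "sparse".
    """
    base = Path(base_dir)
    result = {}
    for kind, sub in (("dense", "d"), ("sparse", "s")):
        folder = base / "chemFeatures" / sub
        if folder.is_dir():
            # Each subdirectory corresponds to one data matrix.
            result[kind] = sorted(p.name for p in folder.iterdir() if p.is_dir())
        else:
            # A missing directory is reported as "no matrices".
            result[kind] = []
    return result
```

Calling list_feature_matrices on the root directory returns the sorted subdirectory names for the dense and sparse feature sets, or empty lists if the directories do not exist yet.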

In addition, the Python pipeline assumes the following directories:
dataPython: all data stored in Python format
dataPythonReduced: only the compounds considered, in Python format (reduces main memory consumption)
resPython: deep learning results, stored in subdirectories
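
Before running either pipeline, it can help to check that the directories named above are present. The following is a minimal sketch under the assumption that all directories live under a single root; the names EXPECTED_DIRS and missing_directories are hypothetical and not part of the original code.

```python
from pathlib import Path

# Directory names taken from the layout described above.
EXPECTED_DIRS = [
    "chemFeatures/cl",
    "chemFeatures/d",
    "chemFeatures/s",
    "train",
    "run",
    "dataPython",
    "dataPythonReduced",
    "resPython",
]


def missing_directories(base_dir, expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under base_dir."""
    base = Path(base_dir)
    return [name for name in expected if not (base / name).is_dir()]
```

For example, missing_directories(Path.home() / "jkuLSCData") returns an empty list when the full layout is in place, assuming the data was placed in ~/jkuLSCData.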

You can download the data provided below with the following commands:
chmod u+x
./ ~/jkuLSCData

Data SampleIdTable.txt

Data chemFeatures/cl

Data chemFeatures/d

Data chemFeatures/s

Data train

Data dataPython

Data dataPythonReduced

Contact: Andreas Mayr (