Learning deep statistical models from massive data: Application to brain network architecture in humans

The present proposal exploits the recent insight that the relationships between – rather than within – major brain networks are key to functional brain organization. Psychological processes probably go hand-in-hand with characteristic coupling changes between major brain networks. The importance of such distinctive network configurations is likely to extend from health to the most devastating mental disorders. We therefore want to extract a detailed model of human inter-network dynamics. This poses a data-representation problem for which the optimal feature engineering is currently unclear.

State-of-the-art deep neural network algorithms can solve the data representation and classification problems in a single process. Using extensive brain imaging data from the Human Connectome Project (HCP), we will address three objectives: i) exploit both the many unlabeled and the few labeled brain imaging maps, ii) compute generative models as explicit descriptions of brain network mechanisms, and iii) reverse-infer the brain activity underlying specific psychological processes. This will be enabled by recent advances in probabilistic models for inductive and transductive semi-supervised learning (Kingma et al., 2014, NIPS).
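The key idea behind learning from both labeled and unlabeled maps can be sketched in a minimal NumPy example: labeled observations contribute their joint likelihood p(x, y), while for unlabeled observations the missing label is marginalized out. The two-class Gaussian setup below is a hypothetical toy stand-in for brain imaging maps, not the proposal's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data: two Gaussian classes (hypothetical stand-ins for brain maps).
mus, sigma = np.array([-2.0, 2.0]), 1.0
x_lab = np.array([-2.1, -1.9, 2.2])   # labeled observations
y_lab = np.array([0, 0, 1])           # their class labels
x_unl = np.array([-2.3, 1.8, 2.4])    # unlabeled observations
prior = np.array([0.5, 0.5])          # class prior p(y)

def log_gauss(x, mu, sigma):
    """Log-density of a univariate Gaussian."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

# Labeled term: log p(x, y) = log p(y) + log p(x | y)
ll_lab = np.sum(np.log(prior[y_lab]) + log_gauss(x_lab, mus[y_lab], sigma))

# Unlabeled term: treat the label as missing and marginalize it out,
# log p(x) = log sum_y p(y) p(x | y)
joint = np.log(prior)[None, :] + log_gauss(x_unl[:, None], mus[None, :], sigma)
ll_unl = np.sum(np.logaddexp(joint[:, 0], joint[:, 1]))

# Both data sources contribute to one objective.
total_ll = ll_lab + ll_unl
print(total_ll)
```

In the deep generative models of Kingma et al. (2014), the class-conditional densities are parameterized by neural networks rather than fixed Gaussians, but the split of the objective into a labeled and a marginalized unlabeled term is the same.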

A combination of probabilistic modelling and deep neural networks exploits parametric density estimators, treats classification as a specialized missing-data imputation task, and performs approximate Bayesian inference. Such stochastic inference allows jointly optimizing the model and the variational parameters. These models have been shown to be flexible, scalable to massive data, and highly competitive on semi-supervised learning problems. The developed computational strategies will be made openly accessible.
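The variational objective underlying this kind of stochastic inference can be illustrated with a minimal NumPy sketch of the evidence lower bound (ELBO): a one-sample reparameterized estimate of the reconstruction term plus an analytic KL divergence between diagonal Gaussians. The linear "decoder" and the dimensions here are toy assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: one data point x, a diagonal-Gaussian approximate
# posterior q(z | x) = N(mu, diag(sig^2)), and a standard normal prior p(z).
x = rng.normal(size=4)
mu = rng.normal(scale=0.1, size=2)           # variational mean (in practice,
sig = np.exp(rng.normal(scale=0.1, size=2))  # output by an encoder network)
W = rng.normal(scale=0.5, size=(4, 2))       # toy linear "decoder"

# Reparameterization trick: z = mu + sig * eps with eps ~ N(0, I), which
# keeps the sampling step differentiable in (mu, sig) and so allows joint
# optimization of model and variational parameters.
eps = rng.normal(size=2)
z = mu + sig * eps

# One-sample Monte Carlo estimate of E_q[log p(x | z)], unit observation noise.
recon = W @ z
log_lik = -0.5 * np.sum((x - recon)**2 + np.log(2 * np.pi))

# Analytic KL(q(z | x) || p(z)) for diagonal Gaussians; always non-negative.
kl = 0.5 * np.sum(mu**2 + sig**2 - 2 * np.log(sig) - 1.0)

elbo = log_lik - kl
print(elbo)
```

In the actual proposal, gradients of such an objective would be computed automatically (e.g. by Theano) and both the decoder weights and the encoder producing (mu, sig) would be trained together.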

The scientific computation will be implemented in Python. Capitalizing on Python's mature and rich ecosystem enhances replicability, reusability, and provenance tracking. Data preprocessing and feature design will be handled by Nilearn (http://nilearn.github.io/) – currently the most comprehensive machine-learning library for brain imaging data. Large-scale learning of deep neural networks will be realized on GPUs with the Theano package (https://github.com/Theano/Theano). Distributed multi-core computing, in turn, will rely on Amazon Web Services. All required software solutions are free, open-source, and under continuous improvement on the social-coding platform GitHub.
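The core preprocessing step that Nilearn provides for real NIfTI images – masking a 4-D fMRI volume into a (time points x voxels) matrix suitable for machine learning – can be sketched in plain NumPy on a synthetic array. The array shape and the variance-based mask below are illustrative assumptions, not HCP specifics.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for a 4-D fMRI volume: (x, y, z, time).
vol = rng.normal(size=(4, 4, 4, 10))

# Hypothetical boolean 3-D "brain mask" selecting informative voxels.
mask = vol.std(axis=-1) > 0.8

# Core operation behind Nilearn's masker transform: flatten the masked
# voxels into a (time points x voxels) design matrix for learning.
X = vol[mask].T

# Inverse operation: scatter feature values back into the volume grid,
# e.g. to visualize learned weights as a brain map.
recon = np.zeros(vol.shape)
recon[mask] = X.T
```

Nilearn's NiftiMasker wraps exactly this transform/inverse-transform pair for NIfTI files, adding resampling, smoothing, and signal cleaning on top.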