Researchers at CeMM, the Research Center for Molecular Medicine of the Austrian Academy of Sciences, have developed knowledge-primed neural networks (KPNNs), a new method that combines the power of deep learning with the interpretability of biological network models. KPNNs learn multiple layers of protein signaling and gene regulation from single-cell RNA-seq data, thereby providing a much-needed boost in our ability to convert massive single-cell atlas data into biological insights. These findings have now been published in the renowned scientific journal Genome Biology.
Computer systems that emulate key aspects of human problem solving are commonly referred to as artificial intelligence (AI). This field has seen massive progress in recent years. Most notably, deep learning has enabled groundbreaking progress in areas such as self-driving cars, computers beating the best human players in strategy games (Go, chess), computer games, and poker, and initial applications in diagnostic medicine. Deep learning is based on artificial neural networks – networks of mathematical functions that are iteratively reorganized until they accurately map the data describing a given problem to its solution.
In biology, deep learning has established itself as a powerful method to predict phenotypes (i.e., observable characteristics of cells or individuals) from genome data (for example gene expression profiles). Neural networks are very powerful predictors when provided with enough training data. For example, they have been used to predict cell type from gene expression profiles, and protein structures from DNA sequence data. However, deep learning is usually a “black box” method: standard neural networks cannot explain the learnt relationship of inputs to outputs in a human-understandable way. For this reason, deep learning has so far contributed little to advancing our mechanistic understanding of molecular functions within cells.
To address this lack of interpretability, CeMM Postdoctoral Fellow Nikolaus Fortelny and CeMM Principal Investigator Christoph Bock pursued the idea of performing deep learning directly on biological networks, instead of the generic, fully connected artificial neural networks used in conventional deep learning. They established “knowledge-primed neural networks” (KPNNs) that are based on signaling pathways and gene-regulatory networks. In KPNNs, each node corresponds to a protein or a gene, and each edge has a mechanistic biological interpretation (e.g., protein A regulates the expression of gene B).
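The idea of a network whose connections are restricted to known biological relationships can be illustrated with a minimal sketch. It is not the authors' implementation: the node names, the edge list, the placeholder weights, the sigmoid activation, and the assumed direction of information flow (from measured gene expression up through regulators to a predicted phenotype) are all hypothetical choices made for illustration, and only a forward pass is shown, not training.

```python
import math

# Hypothetical toy network. Each edge stands for one known biological
# relationship; information is assumed to flow from measured genes (inputs)
# through regulators (hidden nodes) to the predicted phenotype (output).
EDGES = [
    ("gene_B", "tf_A"),
    ("gene_C", "tf_A"),
    ("gene_C", "tf_D"),
    ("tf_A", "receptor_R"),
    ("tf_D", "receptor_R"),
    ("receptor_R", "phenotype"),
]

# Non-input nodes, listed so that each node comes after all of its inputs.
ORDER = ["tf_A", "tf_D", "receptor_R", "phenotype"]


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(expression, weights):
    """Compute an activity value for every node from the input expression.

    Only pairs of nodes listed in EDGES carry a weight, so there is no
    connection between nodes that lack a biological relationship.
    """
    activity = dict(expression)
    for node in ORDER:
        total = sum(
            weights[(src, dst)] * activity[src]
            for (src, dst) in EDGES
            if dst == node
        )
        activity[node] = sigmoid(total)
    return activity


# Placeholder weights (not trained); one weight per biological edge.
weights = {edge: 0.5 for edge in EDGES}
activity = forward({"gene_B": 1.0, "gene_C": 0.0}, weights)
```

In contrast to a fully connected network, every hidden value here belongs to a named protein or gene, so `activity["tf_A"]` can be read as an inferred activity level for that node rather than as an anonymous hidden unit.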
The CeMM researchers show in their new study published in Genome Biology that deep learning on biological networks is technically feasible and practically useful. By forcing the deep learning algorithm to stay close to gene-regulatory processes that are encoded in the biological network, KPNNs create a bridge between the power of deep learning and our rapidly growing knowledge and understanding of complex biological systems. As a result, the approach provides concrete insights into the investigated biological systems, while maintaining high prediction performance. This powerful new methodology uses an optimized approach for deep learning, which stabilizes node weights in the presence of redundancy, enhances the quantitative interpretability of node weights, and controls for the uneven connectivity inherent to biological networks.
CeMM researchers demonstrated their new KPNN method on large single-cell datasets, including a compendium of 483,084 single-cell transcriptomes for immune cells established by the Human Cell Atlas consortium. In this dataset, the scientists discovered unexpected diversity in the cell-type-defining regulatory networks between immune cells from bone marrow and cord blood.
The KPNN method combines the predictive power of deep learning and its ability to infer activity levels across multiple hidden layers with the functional interpretability of biological networks. KPNNs are particularly useful for single-cell RNA-seq data, which are generated at massive scale using single-cell sequencing assays. Moreover, KPNNs are broadly applicable to other areas of biology and biomedicine where relevant prior knowledge can be represented as networks.
The predictions and biological insights obtained by KPNNs will be useful for dissecting cell signaling and gene regulation in health and disease, for identifying novel drug targets, and for deriving testable biological hypotheses from single-cell sequencing data. More generally, the study illustrates the future impact that artificial intelligence and deep learning will have on mechanistic biology as the scientific community learns how to make AI results biologically interpretable.
The study “Knowledge-primed neural networks enable biologically interpretable deep learning on single-cell sequencing data” was published in Genome Biology on 21 July 2020. DOI: 10.1186/s13059-020-02100-5
Nikolaus Fortelny and Christoph Bock
This study was co-funded by an Austrian Science Fund (FWF) Special Research Programme grant (FWF SFB F 6102-B21), a New Frontiers Group award of the Austrian Academy of Sciences and the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant Agreement No 679146 awarded to Christoph Bock). Nikolaus Fortelny was supported by a fellowship from the European Molecular Biology Organization (EMBO ALTF 241-2017).