Visible neural network that simulates eukaryotic cell growth by embedding the Gene Ontology hierarchy into its architecture, enabling interpretable genotype-phenotype prediction.
DCell is a visible neural network (VNN) developed by the Ideker Lab that simulates the growth of a eukaryotic cell by structuring its computational graph to mirror the biological organization of the cell itself. Published in Nature Methods in 2018, it represented a significant departure from standard deep learning models in biology: rather than learning opaque latent representations, DCell encodes the Gene Ontology (GO) hierarchy directly into the network topology, so that every internal node corresponds to a named biological subsystem — a pathway, protein complex, or cellular process.
The model was trained on yeast (Saccharomyces cerevisiae) genetic interaction data comprising several million genotypes with experimentally measured growth phenotypes. By learning on this large-scale perturbation dataset, DCell achieves near-laboratory accuracy in predicting cell fitness from arbitrary combinations of gene disruptions, while simultaneously exposing the subsystem-level activity patterns that drive each prediction.
DCell's core contribution is demonstrating that interpretability and predictive power need not be in tension. The visible architecture allows researchers not only to obtain accurate growth predictions but also to trace exactly which biological subsystems are most activated or suppressed by a given genotype — a transparency that black-box neural networks cannot provide.
DCell is a deep feedforward neural network in which connectivity is determined by the Gene Ontology graph rather than by learned or random wiring. Genes form the input layer, encoding the disruption state (knockout or wild-type) of each gene in the genome. Each GO term is represented as a subsystem node that receives inputs from its child subsystems and from the genes directly annotated to it. This hierarchical connectivity propagates information from individual gene states upward through biological processes, cellular components, and molecular functions to a final output neuron predicting growth. The resulting network spans 2,526 subsystems organized across multiple hierarchical levels.
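The wiring scheme above can be sketched as follows. This is a minimal illustrative PyTorch model, not the published DCell code: the hierarchy, gene indices, and fixed hidden width are invented for the example (DCell scales the number of neurons with the size of each subsystem), but the key idea is the same: each subsystem module receives the activations of its child subsystems concatenated with the states of its directly annotated genes.

```python
import torch
import torch.nn as nn

# Toy GO-like hierarchy (term names and gene indices are illustrative,
# not real GO annotations). Each term lists its child subsystems and
# the genes annotated directly to it.
HIERARCHY = {
    "dna_repair":  {"children": [], "genes": [0, 1]},
    "replication": {"children": [], "genes": [2, 3]},
    "cell_cycle":  {"children": ["dna_repair", "replication"], "genes": [4]},
    "root":        {"children": ["cell_cycle"], "genes": [5]},
}
N_GENES = 6
HIDDEN = 4  # neurons per subsystem (fixed here; DCell varies this per term)

class VisibleNN(nn.Module):
    def __init__(self, hierarchy, hidden):
        super().__init__()
        self.hierarchy = hierarchy
        self.layers = nn.ModuleDict()
        for term, spec in hierarchy.items():
            # Input = child subsystem activations + directly annotated genes.
            in_dim = hidden * len(spec["children"]) + len(spec["genes"])
            self.layers[term] = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh())
        self.out = nn.Linear(hidden, 1)  # growth predicted from the root term

    def forward(self, x):
        states = {}  # named subsystem activations: the "visible" part

        def eval_term(term):
            if term in states:
                return states[term]
            spec = self.hierarchy[term]
            parts = [eval_term(c) for c in spec["children"]]
            parts.append(x[:, spec["genes"]])
            states[term] = self.layers[term](torch.cat(parts, dim=1))
            return states[term]

        root = eval_term("root")
        return self.out(root).squeeze(-1), states

model = VisibleNN(HIERARCHY, HIDDEN)
genotype = torch.zeros(1, N_GENES)
genotype[0, 2] = 1.0  # encode a knockout of gene 2 (hypothetical index)
growth, states = model(genotype)
```

Because every entry in `states` corresponds to a named subsystem, a prediction can be inspected term by term rather than through anonymous hidden units.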
Training used large-scale genetic perturbation data from Saccharomyces cerevisiae, including double-deletion growth measurements that capture epistatic interactions between gene pairs. The model was optimized using standard backpropagation, with the fixed biological topology acting as a strong structural prior that constrains the solution space. Because the network is fully differentiable, gradients of the output can be computed with respect to each subsystem's activity, allowing downstream attribution analysis to identify which subsystems are most influential for a particular prediction.
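The gradient-based attribution described above can be sketched in a few lines. This is a hedged, self-contained toy (the two `nn.Linear` "subsystems", the gene count, and the knockout index are invented for illustration, and DCell's published analysis also uses other interpretability scores): the point is only that `retain_grad()` on an intermediate activation yields the sensitivity of the predicted growth to each named subsystem neuron.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two illustrative stages standing in for the GO hierarchy:
# gene states -> subsystem activation -> growth prediction.
genes_to_subsys = nn.Linear(6, 4)
subsys_to_growth = nn.Linear(4, 1)

genotype = torch.zeros(1, 6)
genotype[0, 1] = 1.0  # knockout of gene 1 (hypothetical index)

activation = torch.tanh(genes_to_subsys(genotype))
activation.retain_grad()  # keep gradients on this intermediate, named node
growth = subsys_to_growth(activation).sum()
growth.backward()

# activation.grad holds d(growth)/d(activation): how strongly each
# subsystem neuron influences the predicted phenotype for this genotype.
saliency = activation.grad.abs()
```

Ranking subsystems by such saliency values is one simple way to surface the most influential parts of the hierarchy for follow-up analysis.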
DCell is primarily applied in systems biology and functional genomics research. Geneticists use the model to predict and interpret the growth consequences of gene deletions and combinations in yeast, enabling prioritization of candidates for wet-lab validation. Disease researchers apply the VNN framework to model how human genetic variants propagate through cellular pathways to affect phenotype, informing gene-disease association studies. In drug discovery, DCell-style models are used to identify critical subsystems whose disruption confers drug sensitivity or resistance, helping to pinpoint actionable targets. The interpretable subsystem outputs are also used to generate mechanistic hypotheses about genetic interactions, providing a starting point for hypothesis-driven experiments that would otherwise require broad exploratory screens.
DCell established visible neural networks as a viable alternative to black-box deep learning for biological prediction tasks, influencing a line of subsequent work that applies constrained network architectures to cancer genomics, drug response prediction, and multi-omics integration. The paper has accumulated several hundred citations and is widely referenced in discussions of interpretable machine learning in biology. A notable limitation is that the model was trained exclusively on yeast data and does not generalize directly to mammalian systems without retraining on equivalent large-scale perturbation datasets, which remain scarcer for human cells. Additionally, DCell's hierarchy is fixed at training time, meaning it cannot dynamically adapt to newly discovered biological relationships without retraining from scratch.
Ma, J., Yu, M.K., Fong, S. et al. Using deep learning to model the hierarchical structure and function of a cell. Nat Methods 15, 290–298 (2018).
DOI: 10.1038/nmeth.4627