StemSight at a Glance
Introduction
Self-renewal is the ability of a stem cell to undergo numerous cycles of cell division while maintaining an undifferentiated state. The molecular mechanisms that drive self-renewal in different classes of stem cells, at different stages of development and during tissue homeostasis, are only partially understood.
To further characterize the molecular foundations of stem cell self-renewal, we developed a cell-type-specific Bayesian network machine learning approach to integrate and analyze the largest single collection of high-throughput murine embryonic stem cell (ESC) data, comprising more than 1.5 million data points. Computational evaluation shows our results are highly accurate, biologically relevant, and significantly improved over prior efforts, which overlook the functional importance of specific cell types in mammals. The inferred functional relevance network confirms roles of genes and proteins known to be involved in ESC self-renewal and predicts many novel players. In addition, our study provides insights into how to manage overfitting and evaluate the performance of networks generated from mammalian data.
Stem Cell Self-Renewal
Stem cells can divide symmetrically to generate two identical daughter cells or asymmetrically to produce one stem cell and one restricted progenitor cell.
A major challenge of ongoing research is to determine whether core conserved pathways can be distilled from the cacophony of biological interactions that direct embryonic and somatic stem cell fate and self-renewal. More than a dozen signaling pathways are implicated in self-renewal, suggesting regulation by a complex interplay of external signaling cues, transcriptional control, and molecular activities. Despite this inherent complexity, most models of self-renewal oversimplify the intricate dynamics associated with maintaining a cell lineage throughout development and adulthood.
Bayesian Network Machine Learning
A Bayesian network is a machine learning tool for organizing pieces of knowledge and encoding the statistical dependence relationships among them. In such graphical models, each node represents a variable and each directed edge represents a dependence relationship, which provides a flexible framework for combining different types of observed data and prior knowledge.
A naïve Bayes network is a simplified Bayesian network in which all child nodes depend on a single parent node and are conditionally independent of one another given that parent. This type of graphical model may be used to organize statistical information and generate probabilistic models of biological functional relationship (FR) networks, which are typically rendered as dense, complex graphs that represent molecular elements as nodes and predicted functional linkages between nodes as undirected edges.
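As an illustration of the naïve Bayes idea described above, the following minimal Python sketch computes the posterior probability that a gene pair is functionally related, given discretized evidence from several datasets. It is not the StemSight implementation: the function name, the dataset names ("coexpression", "chip"), the evidence bins, and all probabilities are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from math import exp, log


def naive_bayes_posterior(prior, likelihoods, evidence):
    """Return P(FR = 1 | evidence) for one gene pair under a naive Bayes model.

    prior       -- P(FR = 1), the prior probability of a functional relationship
    likelihoods -- dataset name -> (P(value | FR = 1), P(value | FR = 0)),
                   each given as a dict keyed by the discretized evidence value
    evidence    -- dataset name -> observed discretized value for this pair
    """
    # Work in log space so that many small probabilities do not underflow.
    log_related = log(prior)
    log_unrelated = log(1.0 - prior)
    for dataset, value in evidence.items():
        given_related, given_unrelated = likelihoods[dataset]
        # Naive Bayes assumption: datasets are conditionally independent
        # given FR, so their likelihoods simply multiply.
        log_related += log(given_related[value])
        log_unrelated += log(given_unrelated[value])
    # Normalize the two unnormalized posteriors.
    shift = max(log_related, log_unrelated)
    related = exp(log_related - shift)
    unrelated = exp(log_unrelated - shift)
    return related / (related + unrelated)


# Hypothetical conditional probability tables (illustrative numbers only).
toy_likelihoods = {
    "coexpression": ({"high": 0.6, "low": 0.4}, {"high": 0.2, "low": 0.8}),
    "chip": ({"shared": 0.5, "none": 0.5}, {"shared": 0.1, "none": 0.9}),
}
toy_evidence = {"coexpression": "high", "chip": "shared"}

posterior = naive_bayes_posterior(0.1, toy_likelihoods, toy_evidence)
print(round(posterior, 3))  # 0.625
```

With a prior of 0.1, the two supporting observations raise the posterior to 0.03 / (0.03 + 0.018) = 0.625. In an FR network, a score of this kind would be attached to the undirected edge between the two genes.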
Our Approach
Our cell-type-specific approach is designed to control the inherent complexity of mammalian systems biology by focusing on a single cell system during a specific developmental stage, within the context of a clearly defined biological process.
Our methodology integrates high-throughput genomic evidence from disparate datasets, infers functional relationships among genes, and predicts networks of genes likely to be functionally related in a specific biological context for further analysis.
Results Using Mouse Embryonic Stem Cells
We trained a Bayesian classifier using:
- Data from more than 800 different mouse stem cell microarray expression assays, chromatin immunoprecipitation (ChIP) assays, and whole-genome RNA interference (RNAi) screens. This compendium of mouse ESC (mESC) data comprised more than 1.5 million data points and was integrated into a training set of more than 6 billion pairwise interactions between genes.
- A manually curated gold standard derived from ~100 current journal articles on mouse embryonic stem cell self-renewal and cell fate. This gold standard includes more than 2000 experimentally validated pairwise interactions between genes or gene products known to be involved in mESC self-renewal.
- A reference gene list of ~23,000 protein-coding genes, downloaded from Mouse Genome Informatics (MGI).
Performance metrics and cross-validation indicated that this approach performs well on mammalian data.
Receiver operating characteristic (ROC) curves show that the mESC network outperforms other Bayesian classifier-based functional relationship networks of this type for mammalian systems while minimizing overfitting.
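For readers unfamiliar with ROC analysis, the sketch below shows how an ROC curve and its area under the curve (AUC) can be computed from gold-standard labels and predicted scores. It is a generic illustration rather than the evaluation code used for this project; the function names and the six toy labels and scores are made up. In a cross-validation setting, the same calculation would be applied to gene pairs held out from training.

```python
def roc_curve(labels, scores):
    """Return (fpr, tpr) lists, sweeping the score threshold from high to low.

    labels -- 1 for a gold-standard positive pair, 0 for a negative pair
    scores -- predicted probability of a functional relationship per pair
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    true_pos = false_pos = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            true_pos += 1
        else:
            false_pos += 1
        # Emit a point only after the last item of a group of tied scores.
        last_of_group = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_group:
            fpr.append(false_pos / n_neg)
            tpr.append(true_pos / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(fpr)):
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
    return area


# Toy held-out predictions (hypothetical values, not project data).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

fpr, tpr = roc_curve(toy_labels, toy_scores)
toy_auc = auc(fpr, tpr)
print(round(toy_auc, 3))  # 0.889
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking. A large gap between the AUC on training pairs and the AUC on held-out pairs is one common sign of overfitting.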
Project Deliverables and Next Steps
This website is designed to give you access to our underlying data, dynamic network visualization, and functional analyses of genes and proteins likely to be related in the context of self-renewal. This comprehensive online resource can be used as a reference for hypothesis generation and experimental design.
Future studies will examine shared and unique molecular characteristics of mouse stem cell and cell fate pathways in embryonic and adult stem cells as well as induced pluripotent and cancer stem-like cells. Comparative studies will contrast human and mouse stem cell fate pathways.