mESC Functional Relevance Network

StemSight at a Glance

Introduction

Self-renewal is the ability of a stem cell to undergo numerous cycles of cell division while maintaining an undifferentiated state. The molecular mechanisms that drive self-renewal in different classes of stem cells, at different stages of development and tissue homeostasis, are only partially understood.

To further characterize the molecular foundations of stem cell self-renewal, we developed a cell-type-specific Bayesian network machine learning approach to integrate and analyze the largest single collection of high-throughput mouse embryonic stem cell (mESC) data, comprising more than 1.5 million data points. Computational evaluation shows that our results are highly accurate, biologically relevant, and a significant improvement over prior efforts, which overlook the functional importance of specific cell types in mammals. The inferred functional relevance network confirms the roles of genes and proteins known to be involved in ESC self-renewal and predicts many novel players. In addition, our study provides insights into how to manage overfitting and how to evaluate the performance of networks generated from mammalian data.

Stem Cell Self-Renewal

Stem cells can divide symmetrically to generate two identical daughter cells, or asymmetrically to produce one stem cell and one restricted progenitor cell.

Variations on a Stem Cell Self-Renewal Theme

A major challenge of ongoing research is to determine whether core conserved pathways can be distilled from the cacophony of biological interactions that direct embryonic and somatic stem cell fate and self-renewal. More than a dozen signaling pathways are implicated in self-renewal, suggesting regulation by a complex interplay of external signaling cues, transcriptional control, and molecular activities. Despite this inherent complexity, most models of self-renewal oversimplify the intricate dynamics associated with maintaining a cell lineage throughout development and adulthood.

Bayesian Network Machine Learning

A Bayesian network is a machine learning tool for organizing pieces of knowledge and encoding the statistical dependence relationships among them. In these graphical models, each node (drawn as a circle) represents a variable and each directed edge represents a dependence relationship, which provides a flexible framework for combining different types of observed data and prior knowledge.
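Formally, the graph structure lets the joint probability distribution over the network's variables X_1, ..., X_n factor into one local conditional distribution per node, where Pa(X_i) denotes the parents of X_i in the graph. This is the standard Bayesian network factorization, shown here for reference rather than as this project's specific model:

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big)
```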

Inferring Functional Relationships with Bayesian Network Machine Learning

A naïve Bayes network is a simplified Bayesian network in which every child node depends on a single parent node and the child nodes are conditionally independent of one another given that parent. This type of graphical model can be used to organize statistical information and generate probabilistic models of biological functional relationship (FR) networks, which are typically rendered as dense, complex graphs that represent molecular elements as nodes and predicted functional linkages between them as undirected edges.
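To make the naïve Bayes structure concrete: if the parent node is a binary variable FR (whether a gene pair is functionally related) and each child node is one dataset's discretized evidence e_j, then P(FR=1 | e_1, ..., e_m) is proportional to P(FR=1) multiplied by the product of P(e_j | FR=1) over all datasets j. The Python sketch below computes this posterior for a single gene pair. The function name `posterior_fr`, the two datasets, their evidence bins, and every probability in the tables are hypothetical illustrations, not values or code from this project; skipping datasets with no measurement is likewise an assumed convention.

```python
import math
from typing import Dict, List, Optional

# Hypothetical conditional probability table (CPT) for one dataset:
# maps FR value (1 = functionally related, 0 = not) to the probability of
# observing each discretized evidence bin.
CPT = Dict[int, List[float]]


def posterior_fr(prior: float, cpts: List[CPT], evidence: List[Optional[int]]) -> float:
    """Return P(FR=1 | evidence) for one gene pair under naive Bayes.

    Each dataset's evidence bin is treated as conditionally independent of
    the others given FR. None marks a dataset with no measurement for this
    pair; its child node is simply left out of the product.
    """
    # Accumulate log scores to avoid underflow when many datasets are combined.
    log_yes = math.log(prior)
    log_no = math.log(1.0 - prior)
    for cpt, e in zip(cpts, evidence):
        if e is None:
            continue
        log_yes += math.log(cpt[1][e])
        log_no += math.log(cpt[0][e])
    # Normalize the two scores into a probability (log-sum-exp style).
    m = max(log_yes, log_no)
    yes = math.exp(log_yes - m)
    no = math.exp(log_no - m)
    return yes / (yes + no)


# Two hypothetical datasets with three evidence bins each (low, medium, high).
cpts = [
    {1: [0.1, 0.3, 0.6], 0: [0.6, 0.3, 0.1]},  # e.g. co-expression
    {1: [0.2, 0.3, 0.5], 0: [0.5, 0.3, 0.2]},  # e.g. shared ChIP binding
]
print(round(posterior_fr(0.1, cpts, [2, 2]), 3))     # → 0.625
print(round(posterior_fr(0.1, cpts, [0, None]), 3))  # → 0.018
```

Working in log space matters once evidence from hundreds of datasets is multiplied together, since the raw product of many small probabilities can underflow to zero.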

Our Approach

Our cell-type-specific approach is designed to control the inherent complexity of mammalian systems biology by focusing on a single cell system during a specific developmental stage, within the context of a clearly defined biological process.

Project Workflow

Our methodology integrates high-throughput genomic evidence from disparate datasets, infers functional relationships among genes, and predicts networks of genes likely to be functionally related in a specific biological context for further analysis.
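One common way to turn a single expression dataset into pairwise evidence that a Bayesian classifier can consume is to correlate every pair of genes and discretize the result. The sketch below assumes Pearson correlation and fixed bin edges; the function names, the bin edges, and the expression values are hypothetical, and the project's own preprocessing may differ (for example, in normalization or binning).

```python
import math
from bisect import bisect_right
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def pairwise_evidence(
    profiles: Dict[str, List[float]], edges: List[float]
) -> Dict[Tuple[str, str], int]:
    """Map every unordered gene pair in one dataset to a discrete evidence bin.

    bisect_right places a correlation r in bin k, where edges[k-1] <= r < edges[k];
    with edges = [-0.5, 0.5] the bins are 0 (anti-correlated), 1 (weak), 2 (co-expressed).
    """
    evidence = {}
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        evidence[(g1, g2)] = bisect_right(edges, r)
    return evidence


# Hypothetical expression values across four arrays for three mouse genes.
profiles = {
    "Nanog": [1.0, 2.0, 3.0, 4.0],
    "Pou5f1": [2.0, 4.0, 6.1, 8.0],
    "Gata6": [4.0, 3.0, 2.0, 1.0],
}
print(pairwise_evidence(profiles, edges=[-0.5, 0.5]))
# → {('Gata6', 'Nanog'): 0, ('Gata6', 'Pou5f1'): 0, ('Nanog', 'Pou5f1'): 2}
```

Repeating this for each dataset yields one evidence bin per gene pair per dataset, which is the shape of input a naïve Bayes classifier expects.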

Results Using Mouse Embryonic Stem Cells

We trained a Bayesian classifier using:

  • Data from more than 800 different mouse stem cell microarray expression assays, chromatin immunoprecipitation (ChIP) assays, and whole-genome RNA interference (RNAi) screens. This compendium of mESC data comprised more than 1.5 million data points and was integrated into a training set of more than 6 billion pairwise interactions between genes.
  • A manually curated gold standard derived from ~100 recent journal articles on mouse embryonic stem cell self-renewal and cell fate. This gold standard includes more than 2,000 experimentally validated pairwise interactions between genes or gene products known to be involved in mESC self-renewal.
  • A reference gene list of ~23,000 protein coding genes, downloaded from Mouse Genome Informatics (MGI).
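A training set of this kind pairs labeled examples with evidence. The sketch below shows one way a curated gold standard and a reference gene list could be combined into labeled gene pairs. The text above does not say how negative (non-related) examples were chosen, so random sampling of non-gold-standard pairs is an assumption here, as are the helper names and the example gene sets (including the placeholder "GeneNotInList").

```python
import random
from typing import List, Set, Tuple

Pair = Tuple[str, str]


def canonical(a: str, b: str) -> Pair:
    """Order a gene pair so (A, B) and (B, A) count as the same interaction."""
    return (a, b) if a <= b else (b, a)


def build_training_pairs(
    gold: List[Pair], reference_genes: Set[str], n_negatives: int, seed: int = 0
) -> Tuple[Set[Pair], Set[Pair]]:
    """Return (positives, negatives), both restricted to the reference gene list.

    Positives: curated gold-standard pairs whose two genes are both in the
    reference list. Negatives: reference gene pairs sampled uniformly at random
    from those not in the gold standard (an assumed strategy for illustration).
    """
    positives = {
        canonical(a, b)
        for a, b in gold
        if a != b and a in reference_genes and b in reference_genes
    }
    genes = sorted(reference_genes)
    available = len(genes) * (len(genes) - 1) // 2 - len(positives)
    if n_negatives > available:
        raise ValueError("not enough non-gold-standard pairs to sample")
    rng = random.Random(seed)
    negatives: Set[Pair] = set()
    while len(negatives) < n_negatives:
        a, b = rng.sample(genes, 2)  # two distinct genes
        pair = canonical(a, b)
        if pair not in positives:
            negatives.add(pair)
    return positives, negatives


reference = {"Pou5f1", "Sox2", "Nanog", "Esrrb", "Klf4"}
gold = [("Sox2", "Pou5f1"), ("Nanog", "Pou5f1"), ("Sox2", "GeneNotInList")]
pos, neg = build_training_pairs(gold, reference, n_negatives=3)
print(sorted(pos))  # → [('Nanog', 'Pou5f1'), ('Pou5f1', 'Sox2')]
print(len(neg))     # → 3
```

Filtering against the reference list keeps every labeled pair within the same gene universe that the evidence datasets are mapped to.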

Performance metrics and cross-validation confirmed that this approach achieves strong results for mammalian systems.
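Because gene pairs share genes, splitting pairs at random can let information about a gene leak from training into testing, which can inflate performance estimates. One way to guard against this, assumed here for illustration and not necessarily the procedure used in this project, is to hold out genes rather than pairs. The function name and the example pairs below are hypothetical.

```python
import random
from typing import Dict, Iterator, Set, Tuple

Pair = Tuple[str, str]


def gene_holdout_folds(
    pairs: Set[Pair], k: int, seed: int = 0
) -> Iterator[Tuple[Set[Pair], Set[Pair]]]:
    """Yield (train, test) splits in which held-out genes never appear in training.

    Each gene is randomly assigned to one of k folds. For fold i, any pair that
    touches a fold-i gene goes to test; only pairs with no fold-i gene are used
    for training. A pair whose two genes sit in different folds is therefore
    tested in both of those folds.
    """
    genes = sorted({g for pair in pairs for g in pair})
    rng = random.Random(seed)
    rng.shuffle(genes)
    fold_of: Dict[str, int] = {g: i % k for i, g in enumerate(genes)}
    for fold in range(k):
        test = {p for p in pairs if fold_of[p[0]] == fold or fold_of[p[1]] == fold}
        yield pairs - test, test


pairs = {("Nanog", "Pou5f1"), ("Pou5f1", "Sox2"), ("Esrrb", "Nanog"), ("Klf4", "Sox2")}
for train, test in gene_holdout_folds(pairs, k=3):
    print(len(train), len(test))
```

The trade-off is that gene-level holdout discards some training pairs in each fold, but the resulting estimate better reflects performance on genes the classifier has never seen.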

Results

Receiver operating characteristic (ROC) curves illustrate that the mESC network outperforms other Bayesian classifier-based functional relationship networks (FRNs) of this type for mammalian systems while minimizing overfitting.
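An ROC curve plots the true positive rate against the false positive rate as a classifier's score threshold is lowered, and the area under the curve (AUC) summarizes it as a single number (0.5 for a random ranking, 1.0 for a perfect one). The sketch below computes both from held-out scores and gold-standard labels; the function names, scores, and labels are hypothetical, and the curves on this page were not necessarily produced this way.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) points as the threshold is lowered."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Consume every example tied at this score together, so ties form one step.
        score = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical classifier scores for five held-out gene pairs (1 = gold-standard positive).
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 1, 0, 1, 0]
print(round(auc(roc_points(scores, labels)), 4))  # → 0.8333
```

The result equals the fraction of positive-negative pairs that the classifier ranks correctly (here 5 of 6), which is why AUC is a convenient threshold-free way to compare networks.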

Project Deliverables and Next Steps

This website gives you access to our underlying data, dynamic network visualizations, and functional analyses of genes and proteins likely to be related in the context of self-renewal. This comprehensive online resource can be used as a reference for hypothesis generation and experimental design.

Future studies will examine shared and unique molecular characteristics of mouse stem cell self-renewal and cell fate pathways in embryonic and adult stem cells, as well as in induced pluripotent and cancer stem-like cells. Comparative studies will contrast human and mouse stem cell fate pathways.