IDSS Conversations: Caroline Uhler

Date Published: March 24, 2016

Caroline Uhler joined the MIT faculty in October 2015 as an assistant professor in the Department of Electrical Engineering and Computer Science. She was awarded the 2015 Doherty Professorship in Ocean Utilization in November 2015. She joined the Institute for Data, Systems, and Society (IDSS) — which addresses complex societal challenges by advancing education and research at the intersection of statistics, data science, information and decision systems, and social sciences — as a member of the Laboratory for Information and Decision Systems (LIDS).

Uhler’s research focuses on mathematical statistics, in particular on graphical models and the use of algebraic and geometric methods in statistics, and its applications to biology. Her current projects include the development of causal inference algorithms to infer gene regulatory networks, the development of ellipsoid packing algorithms to study the spatial organization of chromosomes, and the study of Brownian motion models for phylogenetic inference using quantitative traits.

Uhler spoke with IDSS about her work and her perspective on being part of both LIDS and IDSS.

Q: How would you explain your work, both the theoretical and the applied, to someone not in the field?

A: Let’s start with graphical models. That’s my main interest. A graph in mathematical language is a network. So you have nodes [the points on the graph] and you have edges [the lines between the nodes].

There are two different kinds of models on a graph: either you have data on the nodes or you have data on the edges. What I work on is gene expression data, so the nodes are the genes and the edges represent interactions between these genes. What we get to measure is the genes, that is, how much protein they produce. Missing edges represent some kind of independence relation. Assuming you have knowledge of all nodes, a missing edge between gene 1 and gene 3 means that gene 1 does not provide any further information on gene 3 beyond what we already know from all the other genes.
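[Editor's illustration] The idea of a missing edge can be sketched numerically. The snippet below is a minimal sketch, not Uhler's method: it assumes jointly Gaussian expression levels (where a zero partial correlation corresponds to conditional independence), and the correlation values and the helper name partial_correlation are hypothetical, chosen so that gene 1 and gene 3 are related only through gene 2.

```python
import math

# Hypothetical pairwise correlations between the expression levels of three genes.
# Assumption: gene 1 and gene 3 are correlated only because both relate to gene 2.
r12 = 0.6
r23 = 0.5
r13 = r12 * r23  # marginal correlation between gene 1 and gene 3 is nonzero


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after accounting for z (first-order formula)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Gene 1 vs. gene 3, given gene 2: zero, so the edge 1-3 is missing.
partial_13_given_2 = partial_correlation(r13, r12, r23)

# Gene 1 vs. gene 2, given gene 3: clearly nonzero, so the edge 1-2 is present.
partial_12_given_3 = partial_correlation(r12, r13, r23)
```

In this toy setting, gene 1 and gene 3 look associated when viewed in isolation, yet once gene 2 is known, gene 1 adds nothing further about gene 3. That is the sense in which the edge between them is absent.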

The models I work on, with data on the nodes, can be undirected or directed. Undirected networks only represent association, meaning gene 1 has some effect on gene 2, but they carry no information about direct effect. In contrast, directed graphs can represent causal relationships, meaning that if this gene changes its expression by some amount, then the edge weights tell us by how much the expression of the other genes changes. Most things in our world are directed: they have a cause-effect relationship. Such directed graphical models are therefore more informative and important for various applications. These are the kinds of models I am mainly working on.
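[Editor's illustration] The role of edge weights in a directed graph can be sketched with a small linear model. This is a simplified illustration under stated assumptions, not the algorithms discussed in the interview: the graph, its weights, the gene names, and the function propagate_change are all hypothetical, and effects are assumed to combine linearly along the edges of an acyclic graph.

```python
# Hypothetical directed graph: (parent, child) -> edge weight.
edge_weights = {
    ("gene1", "gene2"): 0.8,
    ("gene2", "gene3"): -0.5,
    ("gene1", "gene3"): 0.3,
}

# An ordering in which every parent appears before its children.
topological_order = ["gene1", "gene2", "gene3"]


def propagate_change(source, delta, weights, order):
    """Return the change in each gene when `source` is shifted by `delta`.

    Assumes a linear model: a child's change is the weighted sum of the
    changes of its parents. The source itself is held at `delta`.
    """
    change = {gene: 0.0 for gene in order}
    change[source] = delta
    for child in order:
        if child == source:
            continue  # the intervened gene is fixed by us, not by its parents
        change[child] = sum(
            (w * change[parent] for (parent, c), w in weights.items() if c == child),
            0.0,
        )
    return change


# Raising gene1 by 2 units changes gene2 directly, and gene3 both directly
# and indirectly through gene2.
effect_of_gene1 = propagate_change("gene1", 2.0, edge_weights, topological_order)

# Raising gene2 by 1 unit affects gene3 but leaves gene1 (its cause) untouched.
effect_of_gene2 = propagate_change("gene2", 1.0, edge_weights, topological_order)
```

The asymmetry is the point: changing gene1 moves gene2, but changing gene2 does not move gene1. An undirected edge between the two could not express that difference.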

Q: Genes are expressed in many ways: the color of your hair, the length of your bones. Are you interested in any particular trait or set of traits?

A: My interests are more basic. We have brain cells, and we have lung cells, and we have heart cells, and they are very different from each other. Even though gene expression varies a lot in different cell types, the DNA sequence in each one of our cells is approximately the same. So how is that possible? One thing I’m interested in is developing methods to infer gene regulatory networks for different cell types using causal graphs. Which are the key genes that are differentially expressed in the different cell types? What makes these differences?

Once the gene regulatory networks for the different cell types are known, the next challenge is to understand the mechanisms that drive the network structure. One of the hypotheses is that cell-type-specific gene regulatory networks arise from differential packing of our genomes in the cell nucleus. Humans have 46 chromosomes and they’re all nicely packed as little ellipsoids into the nucleus. Interestingly, the nucleus in different cell types comes in different shapes, implying different packings of the chromosomes. Such packings allow the transcription factors, which are the [proteins] that turn genes on or off, to access different genes. These differences in packing could explain the emergence of different gene regulatory networks that lead to different cell types. I’m interested in modeling the spatial organization of the chromosomes using ellipsoid packing models to predict how the gene regulatory networks change in cellular differentiation and reprogramming.