Goncalo Abecasis: Interview with Christiana Fogg

From Genome Analysis Wiki

In March 2013, I was awarded the Overton Prize by the International Society for Computational Biology (ISCB). I am not sure I deserve it and, perhaps to double check if they had the right person, ISCB asked Christiana Fogg to interview me. She kindly allowed me to post a transcript of the interview below. Christiana's story was published in Bioinformatics.

Questions about training and early career

What attracted you to population genetics and biostatistics as a student?

From a young age, I have always been fascinated with understanding how life works. My parents used to take me to a bookstore on Sundays, and I remember building quite a collection of wildlife books. Early on, I probably expected to be a naturalist in the African savanna or in the South American rainforest but, over time, this matured into an interest in genetics.

Although I didn't know it at the time, a key skill that later contributed to my success in genetics was my interest in computer programming. While in high school, I participated in a boys' group organized by a marketer, a civil engineer and a computer programmer. The club was meant to keep us busy and out of trouble, but they did encourage us to try programming and pointed us in the direction of very useful techniques, like object-oriented programming and the like. By the end of high school, I was working almost full time as a computer programmer and taking my classes in the evenings (in fact, that is how I paid for the first year of my studies in the UK; and with a lot of help from my parents, for the later two years also).

Did you know you wanted to pursue genetics early in your studies and how did your interests evolve as your training progressed?

Early in my training, I was quite sure I would end up studying agricultural or cancer genetics (coherence was obviously not my strong suit).

I was first drawn to Human Genetics by working with Mary Anne Shaw at the University of Leeds. We were investigating how genetic variation in the interleukin-1 gene cluster (a set of immune genes where variation was easy to measure with then available techniques) was related to infection by leishmaniasis and other tropical parasites. Through serendipity, this resulted in an offer of funding for a PhD at the University of Oxford working with Bill Cookson, who was quite interested in how interleukin-1 and other genes might contribute to asthma susceptibility.

Working with Bill Cookson at the Wellcome Trust Centre for Human Genetics was a great opportunity. The Centre was a mecca for human geneticists at the time, with great support from the Wellcome Trust, and lots of smart people trying new ways to run genetic studies and looking to make rapid progress in many different traits. Bill always encouraged us to try new things, and as we pushed the limits of the sequencing and genotyping technologies of the time, we were soon generating datasets that were beyond the reach of existing analysis tools and methods.

It was easy to realize that new analysis methods and computer software were needed - and being in Oxford, working at the Wellcome Trust Centre, was just the right place to be. There were a lot of very smart, approachable statistical geneticists there at the time - Eric Sobel, Martin Farrall, Angela Marlow, and (eventually) Lon Cardon. They were all extremely patient and taught me a lot.

Can you describe any significant observations as a trainee that inspired you to continue pursuing research?

My interest in the computational and statistical aspects of human genetics really developed when I told Bill that I could solve our data analysis problems by implementing the analysis tools we were then using. Bill was surprisingly encouraging and put up gracefully with the expected series of bugs and mishaps that must have happened along the way. I can't remember the details of these bugs, but I know I always repeat the mantra that "all software is buggy, and this is no exception!".

After a few months, Bill recruited Lon Cardon (who had just arrived in Oxford) as a co-mentor for me. Lon obviously knew a lot about statistical genetics and was happy to invest lots of time suggesting books and articles I should read and arguing with me about the right way to tackle different problems. A big conundrum of my career is trying to figure out how to make sure I am as generous with my time to my students as Lon and Bill were with me. I always feel like I am extremely busy, but I am sure they were busy too!

How was your career path influenced by the race to sequence the human genome and the emergence of genomics?

The race to sequence the human genome was obviously very exciting and meant that genetics was constantly in the news. I really became deeply involved in research towards the later stages of the human genome project and you could say that my major initial contributions resulted from a natural follow-up to the genome project. Specifically, once one human genome had been sequenced, it was very natural to start thinking about how individual genomes differed from this initial sequence and to understand how these differences contribute to the great diversity we see among people today.

In analyzing the data I had generated working with Bill and his team, I developed a series of methods for describing and interpreting genetic variation that turned out to be extremely useful. As a result, my focus gradually shifted from laboratory methods, technology and data generation to issues related to study design and analysis.

Can you describe how high-throughput sequencing has significantly changed your research, the field of population genetics, and the study of complex traits?

High-throughput sequencing, and high-throughput genetics more generally (also including arrays, for example), have completely changed human population genetics and disease studies. In the study of human population genetics, we now have very clear answers about the degree and structure of genetic variation in the world today, but have also gained a lot of detail on human population history -- including very ancient events, like admixture with Neanderthals. The work of people like Richard Durbin, Heng Li, John Novembre, Andy Clark and many others in this area is amazing and truly an inspiration. Human genetic studies are also completely different - we now have many specific and definite connections between genetic variation and human disease and, although the genetics is more complex and the single variant effect sizes are smaller than some had hoped, we are starting to understand the cards Nature has dealt us.

Questions on Mentorship

Do you have any scientific mentors that greatly impacted your career? If so, describe how their mentorship has shaped you as a scientist and as a mentor yourself.

I would say that I have been very lucky to come across a great series of mentors, from the guys who organized my boys' group in high school, to Mary Anne Shaw in Leeds, to Bill Cookson and Lon Cardon in Oxford. One other person that stands out is Mike Boehnke at the University of Michigan. Mike somehow convinced the Biostatistics Department at the University of Michigan to take a flutter on me, when I had just finished my Ph.D. and had much less formal training in statistics than most of my colleagues. He has always been generous with his time, and I probably can't count the times that I have interrupted him in his office, bounced some ideas off him, and come out energized and thinking about something new to try.

As I knew them, I remember my mentors as demanding, generous with their time, unrelentingly positive and encouraging, and totally transparent. It is obviously a standard I'd like to meet, although I doubt I am there yet.

Do you find that your research attracts trainees with diverse research interests and backgrounds?

Absolutely. There are great unsolved and constantly changing problems in genetics that include biology, computer science and mathematical and statistical modelling. It is hard to be bored and easy to be awed. I have mentored trainees from a wide variety of backgrounds, both social and educational, and it would be hard to say they are very much alike. Maybe I should sequence them, to confirm they have something in common. At first glance, I can't figure out what it would be.

What do you find most rewarding about working with trainees?

It is great to set a student free on an interesting open problem and have them solve it. You can do so much more with a few good trainees than you could ever accomplish on your own.

Questions on Research Impact

I see that your CV highlights an invitation to the White House, during which Vice President Biden highlighted your research in a report. Can you describe how it felt for your research to be highlighted and what it means for the field of biostatistics in general?

I was thrilled. I remember I had very short notice (perhaps a couple of days) and had to rush and find something to wear. Although it is cheesy, it is really amazing to live in a country that functions so much like a meritocracy. I didn't have to write a check, join a committee, vote - anything. I had a good idea about how to sequence a lot of genomes more rapidly, proposed it, and not only did I get funded to try it out (it worked, by the way), but my work was selected as one of the highlights for the Vice President's speech on the importance of technology development and biomedical research.

Your research is not only focused on identifying genetic variants relevant to human disease, but you are also interested in understanding how linkage disequilibrium impacts genetic variation. Have any observations surprised you in your analysis of linkage disequilibrium in the human genome?

Linkage disequilibrium describes how groups of variants are shared among individuals -- sometimes it seems a bit arcane, but it is full of surprises. I remember that, a little over 10 years ago, we had a hard time publishing our first large scale description of linkage disequilibrium because one of the reviewers thought our observations (from data) contradicted prior simulations... I guess it goes to show that one can grow too attached to a model. A few years later, we and others were surprised to show that much of the genetic variation in any individual could be recovered very accurately by comparing each individual to a reference set of individuals and, more recently, we have used the process to make it relatively inexpensive to sequence large numbers of individuals. At our last count, >30,000 human genomes had been sequenced using our "low-coverage" linkage disequilibrium based approach.

Questions about collaboration and dealing with large datasets

I see that your work is intensely collaborative and involves researchers from around the world. How do you manage your collaborations? Has this also driven how you share data and tools between collaborators?

Saying that I manage my collaborations is probably too optimistic a view of things. I like to think about how I "keep up" with my collaborations. I really believe in sharing data and in sharing tools. So many great discoveries and advances come from bringing in insights, ideas and approaches from a different field. Genomic datasets are now so complex that our intuition can have a very hard time distinguishing between potential approaches and selecting the best one. We need to encourage people to try and share new approaches - and being open about data and tools is a great way to do that.

That said, encouraging data sharing is a process. There are legitimate concerns about protecting the identity and privacy of research subjects and, once in a while, people do use data you share pre-publication to gain an advantage. Still, there is no doubt we are moving in the right direction - expectations for data sharing and collaboration are so much more open than when I started.

Can you give an example(s) of a collaboration/collaborator that stands out with respect to discoveries that have stemmed from this work?

This one is a really tough question. I have been lucky to work with so many great people.

One of my most interesting collaborations has been with David Schlessinger and my other colleagues in the Sardinia project, particularly Francesco Cucca and Serena Sanna. When I first met David, and he described the idea of conducting a thorough genetic study in an isolated valley in Sardinia, I never thought it would happen. It seemed so ambitious. But David and our Sardinian colleagues have boundless energy and real dedication, and the study probably accounts for most of my highly cited papers!

Working with Anand Swaroop on the genetics of age-related macular degeneration and James Elder on psoriasis has also been a treat. The genetic effects in those two conditions are so clear that they really have helped us evaluate and test new gene-mapping approaches and ideas, without first collecting the even larger sample sizes that would be required to tackle more typical human complex traits.

In the 1000 Genomes Project, I have had the chance to work with some of the smartest people I know - David Altshuler, Gil McVean, Richard Durbin. A memory that stands out is a conversation with Richard Durbin at Cold Spring Harbor in 2007, which eventually led to the project. I had just given a talk describing the idea of imputing missing genotypes in humans and how it could be extended to enable cost-effective whole genome sequencing approaches. Richard Durbin invited me for lunch and explained that, although he really liked the idea, my model for high throughput sequence data was deeply wrong. Alas, he was right. Fortunately, we were able to modify the approach to work with realistic sequencing technologies and eventually propose a strategy for the 1000 Genomes Project - at a time when only a handful of genomes had been sequenced.

Has the volume of available sequence data driven your design for improved analytical tools?

Yes - the speed at which genomic datasets increase in size makes it hard to keep up. In fact, I think I am supposed to sit down with my Dean and discuss how to reduce the electric bill for our computing. More seriously, as data accumulates, questions that previously required a lot of computation and modeling become easier to answer, and we can move on to tackle progressively more challenging ones.

What other challenges have you confronted in developing analytical tools for studying complex traits?

Most of my work focuses on studies of populations of thousands of individuals. The space of relationships between individuals grows extremely fast - the challenge isn't simply digesting one genome's worth of information, but actually figuring out how to evaluate things across thousands of genomes, while accounting for uncertainty about their relationships (as you know, we are all related to each other, it is just that unfortunately the precise pattern of relatedness is rather distant and uncertain).

Future studies

How do you see GWAS studies advancing our understanding of complex traits?

Anand Swaroop gave me the best analogy for describing GWAS. He describes GWAS as neighborhood crime maps. These don't always tell you the full story, but they do tell you where to look and where to pay attention. For human complex trait genetics, it has been a massive change for the better. From a situation where most published discoveries were false and irreproducible, they moved us to a situation where the clear majority of published findings are correct and can be reproduced. That is a huge thing.

In some cases, they have given us clear key insights, like the important role of the complement pathway and HDL metabolizing genes in macular degeneration, or the fact that LDL-cholesterol is causally implicated in heart disease, but HDL-cholesterol probably is not. For the majority of cases, well -- we have the neighborhood crime map, and a fair bit of detective work ahead to solve the crime! Some are disappointed that genetic effects detected in GWAS are relatively small, but that is a bit similar to being disappointed that most fossils are not a T-rex. Our goal has to be to understand nature, in all its beautiful complexity. We shouldn't be disappointed if the full picture is more complex than we guessed when we were learning to draw.

Have you seen any of your observations of gene variants associated with various diseases inform clinical studies that are focused on improving disease treatment?

The process of designing and implementing new treatments in humans has a long turn-around time, unfortunately. But we can now sometimes make pretty good predictions about age-related macular degeneration decades before disease will manifest. We now know that pursuing HDL-raising therapies for heart disease is not productive. This is all good news and clear progress.

In your opinion, what diseases that you have examined may be most effectively treated or managed in patients for whom a genetic screen reveals an increased likelihood of disease?

For most diseases, our understanding of the biology is not nearly as advanced as we'd like and our ability to design improved treatments and disease management plans lags our ability to understand disease. That said, I think you'll see genetic information used in more and more settings. Just today, one of my former students published a paper showing that genetic information can be used to focus disease prevention trials on at-risk individuals, reducing their cost and duration - and bringing treatments to patients sooner.

Questions about Award

The G. Christian Overton Award honors early and middle career stage scientists who make significant contributions to the field of computational biology.

How do you feel being the 2013 recipient of this award?

It is truly an honor. Looking at the list of past recipients, I am very humbled. Lots of great scientists on whose work I depend regularly. Perhaps someone made a mistake; but I will wait until after Berlin before I check on that.

Do you feel that the recognition of your research helps highlight the importance of biostatistics in the study of genetics?

Statistics and modeling uncertainty are critical to modern genetic studies. Most of the problems we are tackling now don't have clear black-and-white answers. Most people, including me, crave a simple mechanistic answer. But a mathematical view of things, with uncertainty, now describes our understanding of human disease much better.

There is so much work to do in human genetics. If this award encourages members of the ISCB to bring some of their considerable expertise to bear on the big open problems in genetics, that would be an amazing outcome.