«I enjoy the mathematisation of complex problems»
In this interview with Rolf Probala, Peter Bühlmann, Professor of Mathematics at ETH Zurich, explains why statistics fascinates him and how, with the financial support of an ERC Grant, he develops mathematical methods that make the hidden connections in large data sets visible.
Peter Bühlmann, you are a statistician. How come?
Back at my Swiss high school, I had a cool young mathematics teacher whom I also liked very much as a person. But at that time I did not find mathematics very exciting. It was a bit too mechanical for my taste: you would learn certain rules and then derive something with their help. However, my father was a professor of mathematics, so I got a small glimpse of the creativity of mathematics at home. I became aware that you can formalise a real phenomenon with the help of mathematics and use it to help solve societal issues and problems. And so I thought I would give it a go and study mathematics.
And what made you specialise in statistics?
As a student, after my very first lecture in statistics, I was fascinated by the fact that statistics lets you infer something backwards. For example, if you toss a coin many times and note how often it lands heads or tails, you can find out whether the coin was manipulated. I thought it was very interesting that you could draw conclusions from empirical observations about the truth: the real condition of this coin, or a given person's special ability to toss coins. After that, I started attending statistics lectures and specialising in this field. I did enjoy pure mathematics, but to this day I am fascinated and motivated by connecting it with issues from the real world. I enjoy formulating complex problems in mathematical terms, mathematising them.
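The coin example can be made concrete with a short sketch. The code below is an illustration added here, not a method described in the interview: it assumes an exact two-sided binomial test against a fair coin, and the function names `binomial_pmf` and `two_sided_p_value` are hypothetical. A small p-value means the observed number of heads would be unusual for a fair coin.

```python
from math import comb


def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n tosses when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(heads, tosses, p=0.5):
    """Exact two-sided binomial test (illustrative assumption, see lead-in).

    Sums the probabilities of all outcomes that are no more likely
    than the observed one under the hypothesis P(heads) = p.
    """
    observed = binomial_pmf(heads, tosses, p)
    # Small relative tolerance so that ties are not lost to rounding.
    threshold = observed * (1 + 1e-9)
    total = 0.0
    for k in range(tosses + 1):
        prob = binomial_pmf(k, tosses, p)
        if prob <= threshold:
            total += prob
    return min(1.0, total)


if __name__ == "__main__":
    # 9 heads in 10 tosses: outcomes 0, 1, 9 and 10 are at least as extreme,
    # so the p-value is (1 + 10 + 10 + 1) / 1024.
    print(two_sided_p_value(9, 10))
    # 5 heads in 10 tosses is the most likely outcome for a fair coin.
    print(two_sided_p_value(5, 10))
```

Note that the result is a probability statement, not a verdict: even a small p-value never proves with certainty that the coin was manipulated.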
What exactly does a statistician do?
We use mathematical and statistical methods to detect patterns and connections in large amounts of data and to draw conclusions from them in order to reject hypotheses, validate models or make predictions. What makes statistics unique is that it offers the only framework in which uncertainties can be quantified as well. The conclusions we draw always come with probabilities that this or that will happen. We will therefore never say that something will occur with absolute certainty. One part of my research is the mathematical development of novel statistical methods. The other part is my collaboration in interdisciplinary projects with interesting partners, mainly from the life sciences, biology and medicine. They all need statistics in order to solve complex problems, and together we try to work out where and how statistical modelling can contribute.
How do you proceed?
We discuss the issue at hand until we understand it completely. Afterwards, we try to translate the problem into a mathematical form, to formulate a model. Mathematisation always also means simplification. If we succeed in establishing a mathematical formulation, half of the intellectual work is done. The essence of mathematics is that it offers a language with which you can communicate clearly and formulate the assumptions under which the problem can be solved.
This leads us to your ERC Project «Statistics, Prediction and Causality for Large-Scale Data». What is it about?
Very simply put, it is about developing novel statistical methods that can detect causal relations and connections within large data sets that would remain hidden with existing statistical techniques. This is proving very promising, for example in the interdisciplinary collaboration with Ruedi Aebersold in the field of proteomics: it is very difficult to draw conclusions about «real» causality from data, but our methodology takes us very close to this ambitious goal. Such novel statistical methods also help to make complex systems more robust so that they function reliably in new or altered environments. Let me illustrate this with the interdisciplinary project PSSS (Personalized Swiss Sepsis Study), in which I am also involved. The project is coordinated by the Machine Learning & Computational Biology Lab of ETH Zurich and the University Hospital Basel.
Its objective is to detect the risk of sepsis at an early stage, individually for every patient in intensive care. The system monitors the patient's condition by continuously measuring approximately 800 characteristics such as pulse, blood pressure and oxygen level. If sepsis is looming, the system should recognise the resulting danger for the patient as early as possible. To do this, we use machine learning techniques with which we basically «train» the algorithms of the system to detect as many signs and manifestations of sepsis as possible. The great challenge is that the system should work not only for a specific group of patients at a specific hospital, but ideally for all patients everywhere.
And how can statistics help?
Of course, we cannot perform magic, but this is where big data comes into play. There are many published studies and much experience concerning the causes and manifestations of sepsis in intensive care units. The question is: how do I tap into this wealth of information and combine it skilfully so that the findings generalise? Here, we need novel statistical methods.
How are these novel statistical methods presented once the ERC project is concluded?
They are presented as formulas, as mathematical theories and as a novel methodology. This includes the conditions under which we can make a statement and the aspects on which we (like any other method) cannot comment. This is one result. The other result will be software with which these methods and algorithms can be applied. The software will be open source and thus available to everyone.
Interdisciplinary collaboration is important to you. Last year, you launched the «ETH Foundations of Data Science» initiative involving the Departments of Mathematics, Computer Science as well as Information Technology and Electrical Engineering Science. What is your objective?
One objective is to expand our research, to coordinate our work even better and to intensify our collaboration. The other objective is certainly to offer a broader perspective to young scientists. We have many highly talented and motivated postdocs and PhD candidates. I find it important that they have enough people to turn to and an inspiring environment in which to create their own projects. This association will, I hope, generate a critical mass of people who exchange and develop excellent new ideas, also informally.
At present, we are witnessing how big data and artificial intelligence shake the foundations of our economy and society. New ethical questions arise. How do mathematicians deal with these issues?
We as mathematicians are also confronted with the question of what is allowed and what one should do, and things are developing at high speed. Physicians, biologists and geneticists had to deal with these questions very early on. Up to now, basic researchers have remained rather quiet, which may well be due to the attitude: I only deliver the principles, the engineers build the systems. However, this no longer suffices. I believe it is important to incorporate these ethical questions into our mathematical culture and into the curriculum of the corresponding courses at ETH Zurich as well.
How do you see the future of mathematics?
Digitisation has also brought us a mathematisation of many areas concerning life and society. Mathematics today has a much stronger position than it did a couple of decades ago, when it was perceived sometimes as a somewhat aloof «supreme discipline».
Today, the public realises that mathematics can also contribute to solving important societal issues. This is why mathematicians should summon up the courage and say: because we are trained to communicate about problems in a specific, exact way, we should not only calculate and analyse phenomena that have already occurred, but also «actively» join in the discussion and co-create.
Interview with Peter Bühlmann (in German)
Peter Bühlmann
Peter Bühlmann studied Mathematics at ETH Zurich and earned his PhD in 1993 in Mathematics with a thesis on Statistics. From 1994 to 1995, he worked as a postdoc and from 1995 to 1997 as a Neyman Assistant Professor at the Department of Statistics at the University of California at Berkeley. He then returned to ETH Zurich where he was an Assistant Professor at the Department of Mathematics from 1997 to 2001 and an Associate Professor for Mathematics from 2001 to 2004. Since October 2004, he has been Professor for Mathematics with a research emphasis on statistics with connections to machine learning, bioinformatics and computational biology.
Horizon 2020 Project
CausalStats: Statistics, Prediction and Causality for Large-Scale Data
- Programme: ERC Advanced Grant
- Duration: 60 months
- Contribution for ETH Zurich: 2’184’375 €