The importance of being earnest about statistics in life sciences
Any researcher active in biology, agronomy, or another life science has some knowledge of experimental design and statistical analysis from their university days and from experience in their field. Statistical software is readily available and allows them to apply this knowledge. The statistical issues seem surmountable and secondary to the other issues they face. Pondering statistics or seeking help from statisticians is therefore rarely the highest priority.
This is a pity. As a practicing statistician I have encountered many situations where research was poorly designed, or where wrong (or not entirely correct) conclusions were drawn. The proper use of statistics really deserves more attention in the life sciences.
Many statisticians are of that opinion and give courses, provide online training material, and write blogs on applied statistics. This blog is yet another attempt, but with a specific objective. It is not meant as a manual where recipes and formulas can be found. It gives much more attention to the philosophy of statistics than to solutions, and it will demonstrate that often there are no clear-cut solutions. The purpose is to open the eyes of researchers to certain issues in design and analysis, and to help them better understand some underlying principles. This will help them understand why statisticians propose certain solutions, interpret the data analysis, and define the next steps. We hope that, as a consequence, other sources that focus on methods will be used even more efficiently.
The ultimate focus of this text is on good research, with statistics as one of the means to get there. It is not about statistics for its own sake. On the contrary, statistics needs to be demystified. We try to keep the bigger picture in mind and avoid details, jargon, and mathematics. There is a gap between scientists and statisticians that needs to be closed, starting from the practical situation at hand. We strive for pragmatism instead of statistical dogmatism, as long as it brings us closer to a better conclusion and the correct decision.
In the end it is all about decision making, and decision making necessarily requires a willingness to take a risk. This in turn requires trust in the data and in the analysis based on these data, and a better understanding of how large the risk really is. Explaining the technical details or dwelling on mathematical justification does not help to build that trust and understanding. We are convinced it is more useful to help researchers reason in a statistical way and to get them interested in the real aim of statistics: improving our judgement.
Besides philosophy, novel statistical methodology will be presented. The intention is not to explain the technical details but rather to demonstrate that the well-known techniques have shortcomings, and that (some of) those have been addressed in more modern techniques. The foundations of the statistical methods that are commonly used in experimentation date from the first decades of the 20th century. Many of the techniques currently still dominating routine analysis were developed in that era.
This is surprising because some of these early methods were already criticized by fellow statisticians immediately after they were published and by now many are really outdated. The methods dating from that period had to be computationally very simple. Yet, despite the presence of better methods and powerful personal computers, many of these older methods are still very popular.
If a scientist wrote a paper that took no account of any work done in the last 25 years, it would be surprising if an editor could be found to take it seriously. Yet a paper in which the statistical methods are even more out of date is apparently acceptable.
— John Nelder (1999) —
This conservatism has several causes: the convenience and seeming simplicity of these basic methods; statistical education that, in common graduate science curricula, often stops at these basic techniques; the lack of access to statisticians; the availability of popular spreadsheet software that does a little basic statistics on the side; and journal editors who insist on commonplace methodology, applied in the conventional ways, before manuscripts get accepted. The broader availability of more powerful computers, starting in the last decades of the 20th century, revolutionized applied statistics. Superior techniques (likelihood methods, computer-intensive simulations, Bayesian statistics via Markov chains, ...) that were known but remained applicable only in very simple cases are nowadays usable for the vast majority of the issues we are confronted with. They make it possible to drop many of the constraints and assumptions imposed by older methods, to treat data more efficiently, and to use more efficient experimental designs. This is very exciting for statisticians, and researchers should benefit from it as well.
The flip side for the researcher, however, is that statistics has moved away from a cookbook style (for this type of experiment you use this statistical method) to a much more tailor-made approach, where an optimal solution needs to be defined for each situation (the objectives of the research, the layout of the experiment, the observed behavior of the data, the prior knowledge about the subject, the analysis, ...). To take advantage of the new methods, designs are even more closely interlinked with the analysis methodology. It is therefore less and less trivial for researchers who are not familiar with new developments in statistics to make the most of their experiments. Likewise, it is more and more necessary that the statistician understands the objectives and the subject of the research in order to advise on experimentation. In other words, we plead for a closer interaction between researchers and statisticians, which would give statistics its due and result in better research.
People use statistics as a drunk uses a lamppost: for support rather than illumination.
—Andrew Lang (1844 – 1912)—
Our experience teaches us, however, that even the simplest of methods are often not well understood. Scientists have a vague idea of what these methods are about, but forget about the constraints, the critical assumptions, and the applicability of the output. We pointed out that popular techniques are often popular because of their convenience. A notorious example is the use of null-hypothesis tests, which come with their P-values. These are convenient because everybody understands P-values (or thinks they do) and the conclusions seem unambiguous. Many of the modern methods move away from these crisp black-and-white answers, and for good reasons. Unfortunately, this creates reluctance to leave the comfortable (and false) feeling of certainty of the old methods. We argue, however, that methods that are less explicit stimulate much more thinking about the data, the results, and the inference, to the great benefit of the research. Statistics is not about proving something. It is about quantifying uncertainty and calculating with uncertainty, and tools have been developed to that end. We have to learn to deal with uncertainty and realize that evaluating the risk of making a wrong decision about next steps is an intrinsic part of handling the results of an experiment. We therefore hope that researchers are willing to leave their comfort zone, because we know they will get somewhere more rewarding.
The intent of the text is to stimulate you to think about the subjects we present. We will avoid mathematics unless we think it really helps to clarify the subject, and rely much more on common sense, intuition, and logic. The latter is also where we need to act most. Sometimes we get fooled by our intuition and our hopes. Getting the logic straight again, by taking some distance and letting objectivity rule, is far more important than understanding the mathematics. We will use examples as much as possible: realistic examples where we think they make the matter concrete and link it to your day-to-day work, and sometimes mock examples where we think you will benefit from getting away from the thinking and solutions common in your own area of science. In that way we try to help you better understand certain statistical approaches, better appreciate their advantages and limits, and recognize that a less familiar approach may be more appropriate than the old standard solutions.
Everything should be made as simple as possible, but not simpler.
— Albert Einstein (1933) —
We hope that we succeed in this effort by making the text entertaining. This is not a blog that you use as a reference for applying statistics, but rather a text you read out of general interest to understand statistics better. For more detail and a systematic treatment of the subjects you will need to reach for other sources (and there are plenty of those).