When similarity, not difference, is what you are after
The difference test starts from a null hypothesis that claims the absence of a difference; by rejecting it, you infer that a difference exists. But what should you do if you actually want to demonstrate the absence of a difference, that is, equivalence? Can we use the same procedure? No, we cannot...
The requirement of equivalence is not that uncommon. For instance, a new wheat variety has a superior baking quality, but it is lower yielding than a reference wheat variety. Through breeding efforts the yield of the new variety is improved. At a certain point, we compare the two varieties in the hope that they are equivalent in yield. Conversely, a higher-yielding variety should be equivalent to the reference varieties when it comes to baking value. A drought-resistant variety should outyield other varieties in the less frequent drought-struck seasons, but should not be inferior in years with normal rainfall.
It may not be clear why a normal hypothesis test cannot do the job here. The easiest way to see it is through an absurd example. Say we carry out the yield test of a wheat variety with improved baking quality and compare it with a reference wheat variety. We put them in a well-designed and well-maintained field trial. We perform a hypothesis test and obtain a P-value of 0.03. We have to conclude that the two varieties are not equivalent. Our sloppier fellow researcher tried to do the same, but he used fewer replicates, lost some plots, and let some errors slip into his measurements. Because of the higher variability and the smaller number of replicates, his hypothesis test is less powerful, and for the same comparison he obtains a P-value of 0.12. He concludes that the varieties are equivalent and gets promoted by his boss, because equivalence is what the management needed to get the variety on the market. It should be obvious that a test that rewards poor experiments and punishes large, carefully executed experiments cannot be what we need.
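The effect in this example can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the observed difference, the plot-to-plot standard deviations, and the replicate counts are hypothetical values picked so that the P-values come out near 0.03 and 0.12, and a simple z-test with a known standard deviation stands in for whatever test the field trial would really use.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(diff, sd, n_per_variety):
    """Two-sided P-value of a z-test for the difference between two means.

    Assumes a known, common standard deviation `sd` and the same number
    of replicates for both varieties (a simplification for illustration).
    """
    se = sd * sqrt(2.0 / n_per_variety)  # standard error of the difference
    z = abs(diff) / se
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Hypothetical numbers: both researchers observe the same yield difference.
observed_diff = 0.10  # t/ha

# Careful trial: low variability, many replicates.
p_careful = two_sided_p(observed_diff, sd=0.13, n_per_variety=16)

# Sloppy trial: higher variability, fewer replicates.
p_sloppy = two_sided_p(observed_diff, sd=0.15, n_per_variety=11)

print(f"careful trial: P = {p_careful:.3f}")
print(f"sloppy trial:  P = {p_sloppy:.3f}")
```

The observed difference is identical in both calls; only the precision of the experiment changes. The less precise trial produces the larger P-value, which is exactly why a non-significant result cannot be read as evidence of equivalence.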
It is clear that the usual hypothesis test cannot simply be reversed. So how do we go about equivalence testing? The important step is to decide upfront how big a difference needs to be before we can no longer consider two treatments as being equal. For instance, if the new variety yields 6.23 ton/ha and the old variety 6.25 ton/ha, we can easily forgo the 20 kg of extra yield per ha if we get better quality in return, even if this 20 kg difference was found to be significant. But how long can we continue with this reasoning? At what point would we no longer consider the yields to be equal? This is a question the subject matter specialist can answer, based on his knowledge of the normal variability of yield. If the specialist judges that a difference of up to 50 kg/ha is acceptable but anything larger needs further development, the statistician can check whether the estimated difference of 20 kg/ha is 'guaranteed' to lie within the band of 50 kg/ha set by the specialist. For this he will construct a confidence interval, say at 95%, around this estimate and check whether the boundaries of the interval remain within the limit of 50 kg/ha.
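The confidence-interval check can be sketched as follows. The function name, the standard errors, and the use of a normal quantile are assumptions made for illustration; for a small field trial a t quantile with the appropriate degrees of freedom would be the more usual choice. The 20 kg/ha estimate and the 50 kg/ha band are the values from the text.

```python
from statistics import NormalDist


def equivalence_by_ci(diff, se, margin, level=0.95):
    """Check whether the confidence interval of a difference stays in a band.

    diff   : estimated difference (new minus reference), e.g. in kg/ha
    se     : standard error of that difference (hypothetical input here)
    margin : largest difference the specialist still accepts
    level  : confidence level of the two-sided interval

    Returns (lower, upper, equivalent). Uses a normal approximation.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    lower = diff - z * se
    upper = diff + z * se
    # Equivalent only if the whole interval lies inside [-margin, +margin].
    equivalent = (-margin < lower) and (upper < margin)
    return lower, upper, equivalent


# New variety yields an estimated 20 kg/ha less; acceptable band is 50 kg/ha.
precise = equivalence_by_ci(diff=-20.0, se=12.0, margin=50.0)
imprecise = equivalence_by_ci(diff=-20.0, se=20.0, margin=50.0)

for label, (lo, hi, ok) in (("precise", precise), ("imprecise", imprecise)):
    print(f"{label}: CI = ({lo:.1f}, {hi:.1f}) kg/ha, equivalent = {ok}")
```

With the smaller standard error the whole interval stays inside the band and equivalence can be claimed; with the larger one the lower boundary falls below -50 kg/ha and it cannot. Unlike the reversed difference test, this procedure rewards the precise experiment.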