
Multiple Testing Correction

A multiple hypothesis testing correction tool for life science researchers

What is multipletesting.com useful for?

Conducting multiple statistical tests increases the likelihood that a significant proportion of associations will be false positives, clouding real discoveries. Several strategies exist to overcome the problem of multiple hypothesis testing. Our multiple testing correction tool provides the five most frequently used adjustment methods, including the Bonferroni, the Holm (step-down), and the Hochberg (step-up) corrections, and allows you to calculate the False Discovery Rate (FDR) and q-values.
This multiple testing calculator is straightforward and user-friendly. It has never been easier to adjust p-values! Check out the list of possibilities for multiple hypothesis testing!

Multiple Testing

Start: perform multiple hypothesis testing using a list of p-values

Publication: read our guide to multiple hypothesis testing


MultipleTesting.com has been utilized, among others, in published studies that cite the tool.

False Discovery Rate online calculator

The False Discovery Rate (FDR) is a crucial statistical tool that prevents researchers and scientists from mistakenly identifying random noise as meaningful discoveries in their data. It is like a quality control filter that ensures the findings they report are reliable, not false positives. Understanding FDR is essential for anyone conducting data-driven research to maintain the credibility and accuracy of their results.
An example of the False Discovery Rate
Imagine you are searching for treasure buried in a field with a metal detector. Every time your detector beeps, you start digging, hoping to find something valuable. Now, the False Discovery Rate (FDR) is like a way to keep track of how many times your detector beeps when there is actually no treasure. In other words, it helps you determine how many of your "discoveries" are false alarms.
If you want to be careful and not waste too much time digging for nothing, you might set a rule for yourself like, "I'll only dig if my detector beeps at least three times in a row." This rule helps you reduce false alarms because a single random beep is less likely to be a real discovery.
In statistics, the FDR is a similar concept. It helps scientists and researchers avoid claiming too many false discoveries (like finding patterns or effects that are not real) when they analyze data. It is a way to be more cautious and ensure that the "treasures" they find are more likely genuine.
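As a minimal numerical illustration, the false discovery rate is simply the share of false alarms among everything you flagged as a find. The few lines of Python below use invented counts purely for the example:

# Invented counts for illustration: how many flagged results are false alarms?
true_positives = 18    # real effects correctly flagged as significant
false_positives = 2    # random noise incorrectly flagged as significant

fdr = false_positives / (false_positives + true_positives)
print(f"False discovery proportion: {fdr:.0%}")   # 10% of the "discoveries" are false alarms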

Bonferroni correction online calculator

The Bonferroni Correction is a critical statistical technique employed in scientific research to control the probability of making false discoveries when conducting multiple hypothesis tests. Imagine you are a scientist investigating the effects of 10 different treatments on an outcome. Without the Bonferroni Correction, the risk of incorrectly identifying a significant effect increases with each test. This correction method, akin to adjusting your criteria for success to a much stricter standard, ensures that the overall rate of false discoveries remains low, maintaining the reliability of your research findings.
Scenario: Imagine you are a medical researcher investigating the effects of a new drug on patients' health outcomes. You have collected data on various health parameters such as blood pressure, cholesterol levels, heart rate, etc. You want to test whether the drug significantly impacts each of these parameters.
Why the Bonferroni Correction is Good
Control of False Positives: Without any correction, if you perform each test at a significance level of 0.05 (commonly used in research), you have a 5% chance of making a Type I error (false positive) in each test. If you perform 10 tests, the chance of making at least one false discovery becomes unacceptably high (around 40%). The Bonferroni Correction reduces this risk.
Maintains Statistical Rigor: The correction is a conservative approach. It helps maintain the statistical rigor of your analysis by lowering the significance level for each test. In our example, with 10 tests, the Bonferroni Correction would require you to use a significance level of 0.05/10 = 0.005 for each test. This makes it less likely to declare a result as significant when it is actually due to chance.
Why the Bonferroni Correction Can Be Seen as Bad
Too Stringent Criteria: While the Bonferroni Correction guards against false positives, it can be overly conservative. Using a smaller significance level makes it harder to detect actual effects, especially if your sample size is limited. It may lead to a higher chance of false negatives (Type II errors), where you fail to detect a true effect.
Independence Assumption: The Bonferroni Correction is calibrated as if the tests were completely independent of each other. If your tests are correlated, the correction may be harsher than necessary, further increasing the risk of Type II errors.
Loss of Power: By reducing the significance level for each test, you reduce the power of your analysis. This means you might need a larger sample size to detect genuine effects.
In summary, the Bonferroni Correction is a valuable tool for controlling the risk of false discoveries when conducting multiple hypothesis tests. It is good because it helps maintain statistical rigor and controls false positives. However, it can be seen as bad because it is conservative, can lead to missed real effects, and becomes overly harsh when tests are correlated. Researchers often weigh these pros and cons and choose the correction method that aligns with the goals and nature of their study.
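The arithmetic above is easy to reproduce. The short Python sketch below shows the roughly 40% family-wise error rate of ten uncorrected independent tests at alpha = 0.05 and the effect of the Bonferroni-adjusted threshold of 0.005; the ten p-values are hypothetical stand-ins for the health-parameter tests in the scenario.

import numpy as np

alpha, m = 0.05, 10

# Probability of at least one false positive across 10 independent tests at alpha = 0.05
print(1 - (1 - alpha) ** m)                     # ~0.40

# Hypothetical p-values from the 10 health-parameter tests
p_values = np.array([0.001, 0.004, 0.012, 0.03, 0.05,
                     0.11, 0.27, 0.44, 0.62, 0.90])

# Bonferroni: test each p-value against alpha / m (equivalently, multiply each p-value by m)
print(alpha / m)                                # 0.005
print((p_values < alpha / m).sum())             # 2 parameters remain significant
print(np.minimum(p_values * m, 1.0))            # Bonferroni-adjusted p-values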

The Benjamini-Hochberg method online calculator

Scenario: Imagine you are a scientist studying the genetic basis of a disease. You have collected data on the activity of thousands of genes in individuals with and without the disease, and you want to identify which genes are significantly associated with the disease.
How the Benjamini-Hochberg Method Works
Testing Genes: First, for each gene you perform a statistical test to see whether its activity differs significantly between the disease group and the healthy group. This gives you one p-value per gene.
Ranking Genes: Next, you rank all the genes by their p-values, from smallest to largest. The genes at the top of the list are your best candidates for being associated with the disease.
Setting a False Discovery Rate (FDR): Now, you decide on a tolerable rate of false discoveries, let's say 5%. This means you're willing to accept that up to 5% of the genes you identify as significant may be false positives.
Controlling False Discoveries: Walking down the ranked list, you compare each p-value with a threshold that grows with its rank (the rank divided by the total number of genes, times 5%). The last gene whose p-value still falls below its threshold marks the cut-off: it and every gene ranked above it are declared significant. This keeps the expected rate of false discoveries within the predefined limit; a short code sketch after the list below illustrates the procedure.
Why the Benjamini-Hochberg Method is Useful
Controlling False Positives: By setting an FDR threshold, you ensure that the genes you identify as significant are less likely to be false positives. This is crucial in genomics and other fields where multiple hypothesis tests are standard.
Efficiency: It allows you to focus your resources (time, money, and lab work) on the most promising genes while keeping the risk of false discoveries in check.
Flexibility: You can adjust the FDR threshold based on the specific needs of your study. For example, you might want to be more or less conservative depending on the consequences of false discoveries.
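Below is a minimal Python sketch of this step-up procedure. The p-values are hypothetical and stand in for the per-gene test results described above; it is an illustration rather than a reference implementation.

import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of discoveries while controlling the FDR at level q."""
    p = np.asarray(p_values)
    m = p.size
    order = np.argsort(p)                       # rank p-values from smallest to largest
    ranks = np.arange(1, m + 1)
    passed = p[order] <= ranks / m * q          # BH condition: p_(k) <= (k / m) * q
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()         # largest rank satisfying the condition
        reject[order[:k + 1]] = True            # declare all genes up to that rank significant
    return reject

# Hypothetical p-values, one per gene
p_values = [0.001, 0.004, 0.012, 0.03, 0.05, 0.11, 0.27, 0.44, 0.62, 0.90]
print(benjamini_hochberg(p_values, q=0.05))     # the three smallest p-values are declared discoveries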

The Holm method online calculator

The Holm correction, also known as the Holm-Bonferroni method, is a multiple comparison correction technique used to control the Familywise Error Rate (FWER) in statistical hypothesis testing. It is best suited for situations where you are conducting multiple hypothesis tests and want to control the overall Type I error rate, which is the probability of making one or more false discoveries. Here are circumstances in which it is best to use the Holm correction:
Many Hypothesis Tests: When you are performing a large number of hypothesis tests simultaneously, such as in genomics, where you may be comparing the expression levels of thousands of genes or in clinical trials with multiple treatment groups.
Controlling for Overall Error Rate: If you want to maintain stringent control over the probability of making any false discoveries among all the tests you're conducting, the Holm correction is appropriate. It is especially useful when you want this guarantee without giving up as much statistical power as the simpler Bonferroni correction.
Step-Down Ordering: The Holm correction orders the tests by their p-values and adjusts them sequentially: the smallest p-value is held to the strictest threshold, and each subsequent p-value to a slightly more lenient one.
Flexibility in Adjustments: Because the thresholds relax as you move down the ordered list, the Holm correction is uniformly more powerful than the Bonferroni correction while providing the same control of the familywise error rate.
Sample Size Variability: Because the correction operates directly on the p-values, it can be applied even when the sample sizes behind the individual tests differ; whatever precision each test has is already reflected in its p-value.
However, it's essential to consider that the Holm correction might not be the best choice in all circumstances. If you can tolerate a small proportion of false discoveries rather than needing to guard against any single one, methods like the Benjamini-Hochberg (BH) procedure, which controls the FDR instead of the FWER, might be more appropriate and will usually declare more results significant. Additionally, if you have strong prior knowledge about your hypotheses, you can consider Bayesian methods as an alternative.
In summary, use the Holm correction when you have a large number of hypothesis tests and want to control the overall Type I error rate without paying the full power cost of the Bonferroni correction. It offers a balance between control and power in multiple testing situations.
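The sketch below shows one way the Holm step-down procedure can be written in Python, using the same hypothetical p-values as in the earlier examples.

import numpy as np

def holm(p_values, alpha=0.05):
    """Holm step-down: reject while p_(k) <= alpha / (m - k + 1), stop at the first failure."""
    p = np.asarray(p_values)
    m = p.size
    order = np.argsort(p)                        # smallest p-value is tested first
    thresholds = alpha / (m - np.arange(m))      # alpha/m, alpha/(m-1), ..., alpha/1
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    k = m if passed.all() else int(np.argmin(passed))   # index of the first failure
    reject[order[:k]] = True                     # everything before the first failure is rejected
    return reject

p_values = [0.001, 0.004, 0.012, 0.03, 0.05, 0.11, 0.27, 0.44, 0.62, 0.90]
print(holm(p_values, alpha=0.05))   # two rejections here, the same as Bonferroni with these p-values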

The q-value online calculator

The q-value is closely tied to the False Discovery Rate (FDR). The False Discovery Rate is the proportion of false positives (incorrectly identified discoveries) among all the discoveries or significant findings made when testing multiple hypotheses or conducting numerous statistical tests. The q-value of an individual test is the smallest FDR at which that test would still be called significant. In other words, if you only report results with q-values of at most 0.05 (5%), you are willing to accept that up to 5% of the discoveries you claim as significant might be false positives.
Calculating the q-value: To calculate the q-value for each hypothesis or test, you typically order the p-values (the probability of observing the result under the null hypothesis) from smallest to largest. Each p-value is then multiplied by the total number of hypotheses (or tests) being considered and divided by its rank in the ordered list; finally, the resulting values are adjusted so that they never decrease as you move down the list. This gives you the q-value.
Decision Rule: You compare the calculated q-value for each test to your predetermined threshold (e.g., q < 0.05). If the q-value is below the threshold, you consider the test result statistically significant while controlling the FDR at the specified level.
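A small Python sketch of this calculation is shown below. It computes Benjamini-Hochberg-style q-values (adjusted p-values); Storey's q-value additionally estimates the proportion of true null hypotheses, which this simplified version omits, and the p-values are hypothetical.

import numpy as np

def q_values(p_values):
    """Benjamini-Hochberg-style q-values: p * m / rank, made non-decreasing down the ranked list."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranks = np.arange(1, m + 1)
    q_sorted = p[order] * m / ranks                          # multiply by m, divide by rank
    q_sorted = np.minimum.accumulate(q_sorted[::-1])[::-1]   # enforce monotonicity from the largest p down
    q = np.empty(m)
    q[order] = np.clip(q_sorted, 0.0, 1.0)
    return q

p_values = [0.001, 0.004, 0.012, 0.03, 0.05]
print(q_values(p_values))   # [0.005 0.01 0.02 0.0375 0.05]; tests with q-values below the threshold are significant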

What input data are needed?

The input data required for multiple testing typically include:
  1. P-values: The primary input for multiple testing is a list of p-values, one for each individual hypothesis test. A p-value represents the probability of obtaining results as extreme as, or more extreme than, the observed results, assuming that the null hypothesis is true. In the context of multiple testing, these p-values are usually obtained from multiple hypothesis tests, such as t-tests, chi-squared tests, ANOVA, Mann-Whitney tests, or other statistical tests.
  2. Number of Hypotheses (or Tests): One needs to know the total number of hypotheses or tests conducted to adjust the p-values correctly. This number is crucial, as every correction method scales its adjustment with the number of tests performed.
  3. Alpha (Significance Level): One needs to specify the desired significance level or the acceptable rate of false positives. Typical values are 0.05 (5%) or 0.01 (1%), though the choice can vary depending on the field and the context.
The choice of method depends on the specific research question, the characteristics of the data, and the desired balance between controlling type I errors (false positives) and type II errors (false negatives). After applying a correction method, the adjusted p-values reflect the significance of each test while accounting for the increased risk of false positives. Thus, adjusted p-values determine which results are statistically significant, considering the overall context.
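If you prefer to run the same adjustments in your own environment, the widely used Python package statsmodels offers them through its multipletests function; the call below is a sketch with hypothetical p-values.

from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from individual tests; alpha is the chosen significance level
p_values = [0.001, 0.004, 0.012, 0.03, 0.05, 0.11, 0.27, 0.44, 0.62, 0.90]

for method in ("bonferroni", "holm", "fdr_bh"):   # Bonferroni, Holm, Benjamini-Hochberg
    reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method=method)
    print(method, int(reject.sum()), "of", len(p_values), "tests significant")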

Try our other web tools as well:

Validate predictive biomarkers
Mutation vs gene expression
Kaplan-Meier plotter
Compare normal and tumor
Copyright: Research Centre for Natural Sciences (2021-2025)