
Introduction to Hypothesis Testing

 Beginner-friendly introduction

Hey there! If you’re diving into the world of statistics, you’ve probably come across the term “hypothesis testing.”

It’s a fundamental concept that’s super useful in various fields, from science to business. But don’t worry if it sounds a bit technical. I’m here to break it down for you in simple, easy-to-understand language. Let’s jump right in!

1. What is Hypothesis Testing?

Hypothesis testing is like a detective game where you start with an assumption (a hypothesis) and then collect evidence (data) to decide whether that evidence is consistent with your assumption or strong enough to reject it.

It’s a way of making decisions or inferences about a population based on a sample of data.

Let’s take an example:
Imagine you’re a quality control manager at a factory that produces light bulbs.

You claim that, on average, the lifespan of a light bulb produced by the factory is 1,000 hours.

Hypothesis testing will allow you to test this claim.

You’d collect a sample of light bulbs, measure their lifespans, and use statistical methods to determine if your claim holds up.

But the first thing to do is to establish the null and alternative hypotheses.

2. What are the Null and Alternative Hypotheses?

In hypothesis testing, we always start with two hypotheses:

- Null Hypothesis (H₀): This is the hypothesis that there is no effect or no difference. It’s the ‘status quo’ or the assumption we want to test against.

- Alternative Hypothesis (H₁ or Ha): This is the hypothesis that there is an effect or a difference. This is what you are looking for evidence of.

Example: 
Let’s go back to our light bulb example. If you want to test whether the average lifespan of the light bulbs is 1,000 hours:

- Null Hypothesis (H₀): The average lifespan of the light bulbs is 1,000 hours. (H₀: μ = 1000)
- Alternative Hypothesis (H₁): The average lifespan of the light bulbs is not 1,000 hours. (H₁: μ ≠ 1000)

3. Steps to Constructing Null and Alternative Hypotheses

Creating a null and alternative hypothesis involves a few simple steps:

1. Identify the research question or claim: What are you trying to prove or disprove?
2. Set up the null hypothesis (H₀): Assume that there is no effect or difference.
3. Set up the alternative hypothesis (H₁): Propose what you believe to be true.

Let’s understand with examples:

1. Claim: A new drug is effective in reducing blood pressure. (Here μ is the mean blood pressure with the drug and μ₀ is the mean without it.)
 — H₀: The new drug has no effect on blood pressure. (H₀: μ = μ₀)
 — H₁: The new drug reduces blood pressure. (H₁: μ < μ₀)

2. Claim: A coin is fair (50% chance of heads or tails). 
 — H₀: The coin is fair. (H₀: p = 0.5)
 — H₁: The coin is not fair. (H₁: p ≠ 0.5)

3. Claim: The average salary of data scientists is greater than $100,000. 
 — H₀: The average salary is $100,000 or less. (H₀: μ ≤ 100,000)
 — H₁: The average salary is more than $100,000. (H₁: μ > 100,000)

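One commonality in all these cases is that the null hypothesis holds the ‘status-quo’ as-is scenario (and always includes the equality), while the alternative hypothesis holds the statement you would need evidence to accept. Often that is your own claim, as in the drug and salary examples. In the coin example, though, the claim itself (“the coin is fair”) is the status quo, so it sits in the null hypothesis.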

4. What are One-Tail and Two-Tail Tests?

One-Tail Test: This is used when the alternative hypothesis specifies a direction of the effect (greater than or less than). 
Example: Testing whether a new medication reduces blood pressure would be a one-tail test (H₁: μ < μ₀).

Two-Tail Test: This is used when the alternative hypothesis does not specify a direction (it could be either greater than or less than). 
Example: Testing whether the average lifespan of a light bulb is different from 1,000 hours would be a two-tail test (H₁: μ ≠ 1000).
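
To see how the choice of tail changes the result, here is a small sketch using only Python's standard library. It assumes the test statistic is a z value that follows a standard normal distribution under H₀. The helper name `p_value` and the value z = -1.80 are made up purely for illustration.

```python
from statistics import NormalDist


def p_value(z, tail):
    """Return the p-value for a z statistic.

    tail: "left"  (H1: mu < mu0),
          "right" (H1: mu > mu0),
          "two"   (H1: mu != mu0).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "two":
        # Count extreme results in both directions.
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")


z = -1.80  # hypothetical test statistic, chosen only for illustration
print(f"left-tail p = {p_value(z, 'left'):.4f}")
print(f"two-tail p  = {p_value(z, 'two'):.4f}")
```

With this made-up z value, the left-tail p-value comes out at about 0.036 and the two-tail p-value at about 0.072. So the same data would lead you to reject H₀ at α = 0.05 in a one-tail test but not in a two-tail test, which is why the direction should be decided before looking at the data.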

5. What are Type I and Type II Errors in Hypothesis Testing?

When conducting hypothesis tests, we can make two types of errors:

- Type I Error (α): This occurs when we reject the null hypothesis when it is actually true. It’s like convicting an innocent person. The probability of making a Type I error is called the level of significance (α), commonly set at 0.05.

- Type II Error (β): This occurs when we fail to reject the null hypothesis when it is actually false. It’s like letting a guilty person go free. The probability of making a Type II error is denoted by β.

Example:
If we are testing a new drug, a Type I error would mean saying the drug works when it doesn’t, while a Type II error would mean saying the drug doesn’t work when it actually does.
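
If you'd like to see both error types in action, a quick simulation can help. The sketch below reuses the light bulb setting and assumes a two-tail z-test with a known population standard deviation. All the numbers (a standard deviation of 50 hours, samples of 30 bulbs, a true mean of 1,020 hours in the second scenario) and the function names are made up for illustration.

```python
import random
from statistics import NormalDist, mean


def rejects_h0(sample, mu0, sigma, alpha=0.05):
    """Two-tail z-test with a known population standard deviation."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p < alpha


def rejection_rate(true_mean, mu0=1000, sigma=50, n=30, trials=2000, seed=0):
    """Fraction of simulated samples in which H0: mu = mu0 is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        rejections += rejects_h0(sample, mu0, sigma)
    return rejections / trials


# H0 is true here, so every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=1000)

# H0 is false here, so every failure to reject is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=1020)

print(f"Estimated Type I error rate:  {type_1_rate:.3f}")
print(f"Estimated Type II error rate: {type_2_rate:.3f}")
```

The Type I rate should land close to the chosen α of 0.05. The Type II rate depends on how far the true mean is from 1,000 and on the sample size, so it shrinks if you increase `n` or move `true_mean` further away.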

6. What are the Steps of Hypothesis Testing?

1. State the Hypotheses: Define your null and alternative hypotheses.
2. Choose the Significance Level (α): Common choices are 0.05, 0.01, or 0.10.
3. Select the Appropriate Test: Depending on the data and hypotheses, choose the correct statistical test (e.g., t-test, z-test).
4. Calculate the Test Statistic: Using the sample data, calculate the test statistic.
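5. Determine the Critical Value or p-value: Find the critical value for your chosen α, or compute the p-value, which is the probability of seeing a result at least as extreme as yours if the null hypothesis were true.
6. Make a Decision: If the test statistic falls in the rejection region (for a two-tail test, its absolute value exceeds the critical value) or the p-value is less than α, reject the null hypothesis. Otherwise, fail to reject it.
7. Draw a Conclusion: Based on your decision, conclude whether there is enough evidence to support the alternative hypothesis. Failing to reject the null hypothesis does not prove it is true; it only means the sample did not provide enough evidence against it.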
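
Here is one way the seven steps could look in code, using the fair coin example from earlier. This is a minimal sketch, not the only way to run the test: it assumes an exact two-sided binomial test written with the standard library, and the data (61 heads in 100 flips) and the helper name `binomial_two_sided_p` are made up.

```python
from math import comb


def binomial_two_sided_p(heads, flips, p0=0.5):
    """Exact two-sided p-value: the total probability, under H0, of every
    outcome that is no more likely than the one observed."""
    def pmf(k):
        return comb(flips, k) * p0**k * (1 - p0) ** (flips - k)

    observed = pmf(heads)
    # Small tolerance so ties are not lost to floating-point rounding.
    return sum(pmf(k) for k in range(flips + 1)
               if pmf(k) <= observed * (1 + 1e-9))


# Step 1: state the hypotheses. H0: p = 0.5, H1: p != 0.5
# Step 2: choose the significance level.
alpha = 0.05
# Step 3: select the test (an exact binomial test, two-tail).
# Step 4: the test statistic is the number of heads (hypothetical data).
heads, flips = 61, 100
# Step 5: compute the p-value.
p = binomial_two_sided_p(heads, flips)
# Step 6: make a decision.
reject = p < alpha
# Step 7: draw a conclusion.
verdict = "reject H0" if reject else "fail to reject H0"
print(f"p-value = {p:.4f}, so we {verdict} at alpha = {alpha}")
```

For this made-up sample the p-value is roughly 0.035, which is below 0.05, so the sketch rejects the claim that the coin is fair. With 55 heads instead, the same code would fail to reject it.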

7. What is the Level of Significance (Tolerance Limit)?

The level of significance (α) is the probability of rejecting the null hypothesis when it is true. It represents the tolerance for making a Type I error. For example, a 5% level of significance (α = 0.05) means there is a 5% risk of concluding that there is an effect when there is none.
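
The level of significance also fixes the critical value you compare your test statistic against. As a rough illustration, the sketch below assumes a z-test (standard normal distribution) and uses `NormalDist.inv_cdf` from Python's standard library; the helper name `z_critical` is made up.

```python
from statistics import NormalDist


def z_critical(alpha, two_tailed=True):
    """Critical z value that leaves a total tail area of alpha."""
    # A two-tail test splits alpha evenly between both tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


for alpha in (0.10, 0.05, 0.01):
    two = z_critical(alpha)
    one = z_critical(alpha, two_tailed=False)
    print(f"alpha = {alpha:.2f}  two-tail: {two:.3f}  one-tail: {one:.3f}")
```

At α = 0.05 this gives the familiar values of about 1.960 for a two-tail test and about 1.645 for a one-tail test. A smaller α pushes the critical value further out, which makes a Type I error less likely but makes it harder to reject H₀.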

8. What is a Test Statistic?

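A test statistic is a standardized value that is calculated from sample data during a hypothesis test. It measures how far the sample result is from what the null hypothesis predicts, typically in units of standard error, and is used to determine whether to reject the null hypothesis. The formula depends on the type of test being used (e.g., t-test, z-test).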
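
As a concrete case, here is a sketch of the one-sample t statistic for the light bulb example, t = (x̄ − μ₀) / (s / √n), where x̄ is the sample mean, s is the sample standard deviation and n is the sample size. The ten lifespans below are made-up numbers, and the function name `one_sample_t_statistic` is just for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t_statistic(sample, mu0):
    """t = (sample mean - mu0) / (sample standard deviation / sqrt(n))."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 divisor
    return (mean(sample) - mu0) / standard_error


# Hypothetical lifespans (in hours) of ten sampled light bulbs.
lifespans = [980, 1010, 995, 1020, 970, 1005, 990, 1015, 985, 1000]

t_stat = one_sample_t_statistic(lifespans, mu0=1000)
print(f"sample mean = {mean(lifespans)}, t = {t_stat:.3f}")  # t = -0.592
```

A t value this close to zero says the sample mean of 997 hours is well within one standard error of the claimed 1,000 hours. To turn it into a p-value you would compare it against a t distribution with n − 1 = 9 degrees of freedom, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_1samp`) can do that step for you.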

And there you have it — a simple introduction to hypothesis testing!

Remember, it’s all about starting with a claim, gathering evidence, and making a decision based on data. Keep practicing, and you’ll become a hypothesis testing pro in no time!
