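The chi-square test for a variance assumes that the population from which the sample is taken is normally distributed; under that assumption, the test statistic follows a chi-square distribution.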

The chi-square (χ²) distribution is the basis of the standard test of a population variance against a known or assumed value. It is a continuous distribution characterized by a single parameter, its degrees of freedom. It describes the distribution of a sum of squared independent standard normal random variables. It is also used to test the goodness of fit of data to a distribution, to test whether two variables are independent, and to construct confidence intervals for the variance and standard deviation of a normally distributed random variable.
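The sum-of-squares description can be checked with a short simulation. This is a minimal sketch in Python using only the standard library; the function name `simulate_chi_square`, the choice of k = 4 degrees of freedom, the seed and the number of trials are illustrative assumptions. If the description holds, the simulated values are never negative, and their mean and variance land close to k and 2k.

```python
import random
import statistics


def simulate_chi_square(k: int, trials: int, seed: int = 0) -> list[float]:
    """Draw `trials` values of Z1^2 + ... + Zk^2, each Zi an independent N(0, 1)."""
    rng = random.Random(seed)  # private generator so results are reproducible
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(trials)]


samples = simulate_chi_square(k=4, trials=100_000)
print("mean:", round(statistics.fmean(samples), 2))        # close to k = 4
print("variance:", round(statistics.variance(samples), 2))  # close to 2k = 8
```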

History of Chi Square

Karl Pearson (1857-1936), often called the father of modern statistics, established the world's first university statistics department, at University College London. Pearson’s work in statistics began with developing mathematical methods for studying heredity and evolution. He later introduced the chi-square test while looking for a measure of how well theoretical distributions fit the random variables in his heredity and evolution models.

Chi Square Statistic

The chi-square distribution is skewed to the right, with a long tail toward large values. Its overall shape depends on the number of degrees of freedom in a given problem. For a test of a single population variance, the degrees of freedom are one less than the sample size (n − 1).

Chi Square Properties

The mean of the distribution equals the number of degrees of freedom: μ = ϑ.
The variance equals twice the number of degrees of freedom: σ² = 2ϑ.
When the degrees of freedom are greater than or equal to 2, the density Y reaches its maximum at χ² = ϑ − 2.
As the degrees of freedom increase, the chi-square curve approaches a normal distribution and becomes more symmetrical.
The distribution is skewed to the right, and because the random variable on which it is based is squared, it takes no negative values.
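The properties in the list can be collected in a small helper. This sketch (Python, standard library) writes k for the degrees of freedom ϑ; the function name is hypothetical, and the skewness formula √(8/k) is a standard result added here to quantify how the curve becomes more symmetrical as k grows.

```python
import math


def chi_square_properties(k: int) -> dict[str, float]:
    """Summary properties of a chi-square distribution with k degrees of freedom."""
    if k < 1:
        raise ValueError("degrees of freedom must be at least 1")
    return {
        "mean": k,                     # mu = k
        "variance": 2 * k,             # sigma^2 = 2k
        "mode": max(k - 2, 0),         # peak at k - 2 when k >= 2; highest near 0 otherwise
        "skewness": math.sqrt(8 / k),  # shrinks toward 0 (more symmetric) as k grows
    }


for k in (2, 10, 50):
    print(k, chi_square_properties(k))
```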

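The probability density function of the chi-square distribution is

$$ Y = \frac{(\chi^2)^{\vartheta/2 - 1}\, e^{-\chi^2/2}}{2^{\vartheta/2}\,\Gamma(\vartheta/2)}, \qquad \chi^2 > 0 $$

where ϑ, the number of degrees of freedom, is the shape parameter and Γ is the gamma function.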

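The gamma function is defined, for a > 0, as

$$ \Gamma(a) = \int_0^\infty t^{a-1} e^{-t}\, dt $$

For a positive integer n, Γ(n) = (n − 1)!.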
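As a sanity check on the density and the gamma function, the sketch below transcribes the standard chi-square density f(x) = x^(k/2 − 1) e^(−x/2) / (2^(k/2) Γ(k/2)) into Python, using `math.gamma` for Γ and writing k for ϑ. The helper names and the simple trapezoid rule are illustrative choices; the script checks that the density integrates to about 1 and that its peak for k = 6 sits at k − 2 = 4.

```python
import math


def chi_square_pdf(x: float, k: float) -> float:
    """Chi-square density with k degrees of freedom (support x > 0)."""
    if x <= 0:
        return 0.0  # simplification: treat the boundary x = 0 as having zero density
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))


def trapezoid(f, a: float, b: float, steps: int = 6000) -> float:
    """Approximate the integral of f over [a, b] with the trapezoid rule."""
    h = (b - a) / steps
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, steps)) + f(b) / 2)


area = trapezoid(lambda x: chi_square_pdf(x, 5), 0.0, 60.0)  # total probability, k = 5
grid = [i * 0.01 for i in range(1, 2001)]                     # x from 0.01 to 20
mode = max(grid, key=lambda x: chi_square_pdf(x, 6))          # peak location, k = 6
print(round(area, 4), round(mode, 2))
print(math.gamma(5), math.factorial(4))  # Gamma(n) = (n - 1)! for positive integers
```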

Often the objective of a Six Sigma team is to find the level of variation in the output, not just the population mean. Most importantly, the team wants to know how much the production process varies about the target, to see what adjustments are needed to reach a defect-free process.

When comparing sample variances, or comparing frequency proportions, the standard test used is the chi-square (χ²) test. The distribution of the chi-square statistic is called the chi-square distribution.

Types of Chi square Hypothesis Tests

There are two main types of chi-square tests:

Chi-square test of independence: determines whether there is an association between two categorical variables by comparing observed and expected frequencies of test outcomes; no population variance is defined.
Chi-square test of variance: compares a sample variance with a known or hypothesized population variance.

Chi-Square test of Independence

The chi-square test of independence determines whether there is an association between two categorical variables (such as gender and course selection). For example, it can examine the association between one categorical variable, such as gender (male or female), and another, such as the level of absenteeism in a school. The chi-square test of independence is a non-parametric test: the assumption of normality is not required to perform it.

The chi-square test of independence uses a contingency table to analyze the data. Each row shows a category of one variable, and each column shows a category of the other variable. Each variable must have two or more categories. Each cell contains the number of cases for a specific pair of categories.
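Building a contingency table amounts to counting category pairs. The sketch below (Python, standard library) tallies raw observations with `collections.Counter`; the survey records and the function name are hypothetical, chosen only to show the layout of rows (one variable) and columns (the other).

```python
from collections import Counter


def contingency_table(pairs, row_levels, col_levels):
    """Count (row category, column category) pairs into a rows x columns table."""
    counts = Counter(pairs)
    # Counter returns 0 for pairs that never occur, so empty cells are filled in
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


# Hypothetical survey records: (gender, course selection)
records = [("male", "math"), ("female", "art"), ("female", "math"),
           ("male", "math"), ("female", "art"), ("male", "art")]
table = contingency_table(records, ["male", "female"], ["math", "art"])
print(table)  # [[2, 1], [1, 2]]
```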

Assumptions of Chi-square test of independence

Variables must be nominal (categorical).
Categories of each variable are mutually exclusive.
The sampling method is simple random sampling.
The data in the contingency table are frequencies (counts).

Steps to perform Chi Square test of independence

Step 1: Define the null hypothesis and alternative hypothesis.

Null hypothesis (H0): There is no association between the two categorical variables.
Alternative hypothesis (H1): There is a significant association between the two categorical variables.

Step 2: Specify the level of significance (α).

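Step 3: Compute the χ² statistic:

$$ \chi^2 = \sum \frac{(O - E)^2}{E} $$

where the sum runs over all cells of the table, and
O is the observed frequency
E is the expected frequency

The expected frequency for each cell is E = (row total × column total) / n, where n is the total number of observations.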

Step 4: Calculate the degrees of freedom: df = (number of rows − 1) × (number of columns − 1) = (r − 1)(c − 1).

Step 5: Find the critical value from the chi-square table, based on the degrees of freedom and the level of significance.

Step 6: Draw the statistical conclusion. If the test statistic is greater than the critical value, reject the null hypothesis and conclude that there is a significant association between the two categorical variables.
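Steps 3 to 6 translate directly into a few lines of Python. This sketch uses only the standard library and takes the critical value as an input (looked up in a chi-square table, as in Step 5) rather than computing it; the function name and the 2 × 3 table of counts are hypothetical.

```python
def chi_square_independence(observed, critical_value):
    """Return (chi-square statistic, degrees of freedom, reject H0?) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Step 3: expected frequency of each cell = row total * column total / n
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    # Step 4: degrees of freedom = (r - 1)(c - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    # Step 6: reject H0 when the statistic exceeds the critical value
    return stat, df, stat > critical_value


# Hypothetical counts: 2 rows x 3 columns
observed = [[30, 20, 10],
            [20, 30, 40]]
stat, df, reject = chi_square_independence(observed, critical_value=5.991)  # df = 2, alpha = 0.05
print(round(stat, 3), df, reject)  # 16.667 2 True
```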

Chi Square test of independence example

Example: 1000 middle school students are asked which superhero is their favorite: Superman, Ironman or Spiderman. At the 95% confidence level, can we conclude that there is a relationship between gender and favorite superhero?

Null hypothesis (H0): There is no association between gender and favorite superhero.
Alternative hypothesis (H1): There is a significant association between gender and favorite superhero.

Level of significance: α=0.05

First, calculate the expected frequency of each cell.

For the cell (Boys, Superman): E = (200 × 600)/1000 = 120

Calculate the expected frequencies of the remaining cells in the same way.

Degrees of freedom = (r-1)*(c-1) = (2-1)*(3-1) =2

Chi-square critical value for 2 degrees of freedom =5.991
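For 2 degrees of freedom the critical value can be checked without a table: the chi-square distribution with 2 degrees of freedom has CDF F(x) = 1 − e^(−x/2), so the right-tail critical value at significance level α is −2 ln α. A minimal Python check, valid only for df = 2:

```python
import math

alpha = 0.05  # level of significance from the example
# df = 2 only: solve exp(-x / 2) = alpha for x, the point with right-tail area alpha
critical = -2 * math.log(alpha)
print(round(critical, 3))  # 5.991
```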

The test statistic is greater than the critical value, so we reject the null hypothesis.

We conclude that there is a significant association between gender and favorite superhero.


Chi Square test – Comparing Variances

The chi-square test is commonly used for two applications:

Case I: comparing variances when the variance of the population is known.
Case II: comparing observed and expected frequencies of test outcomes when there is no defined population variance.

When the population follows a normal distribution, the hypothesis test for a population variance σ² uses the test statistic

$$ \chi^2 = \frac{(n-1)\,s^2}{\sigma^2} $$

where n is the sample size and s² is the sample variance. Under the null hypothesis, this statistic follows a χ² distribution with n − 1 degrees of freedom. The χ² distribution resembles the normal curve but is not symmetrical, and its shape depends on the degrees of freedom.

Hypothesis testing

A statistical hypothesis is an assumption about a population parameter; the assumption may or may not be true. A one-tailed test places the rejection region in only one direction: in the left tail when the alternative is that the variance is smaller than the hypothesized value, or in the right tail when it is larger. A two-tailed test places a rejection region in each tail and tests against the alternative that the actual variance is either greater or less than the hypothesized value. The choice between a one-tailed and a two-tailed test depends on the problem.
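When no table is at hand, critical values for left-, right- and two-tailed tests can be computed numerically. The sketch below is one way to do it with the Python standard library only: it evaluates the chi-square CDF through the power series of the regularized lower incomplete gamma function and inverts it by bisection. The function names are hypothetical and the iteration limits are illustrative; in practice a statistics library (for example SciPy's `scipy.stats.chi2.ppf`) would normally be used instead. The printed values can be compared with the table values used in the examples that follow.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """P(X <= x) for X ~ chi-square(df), i.e. P(df/2, x/2), via its power series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a  # first term of the series
    n = 0
    while term > total * 1e-15 and n < 10_000:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_ppf(p: float, df: int) -> float:
    """Inverse CDF by bisection: the x with chi2_cdf(x, df) = p."""
    lo, hi = 0.0, max(100.0, 10.0 * df)  # hi lies far into the right tail for moderate df
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def critical_values(alpha: float, df: int, tail: str):
    """(lower, upper) critical values; None where that tail has no rejection region."""
    if tail == "left":
        return chi2_ppf(alpha, df), None
    if tail == "right":
        return None, chi2_ppf(1 - alpha, df)
    if tail == "two":
        return chi2_ppf(alpha / 2, df), chi2_ppf(1 - alpha / 2, df)
    raise ValueError("tail must be 'left', 'right' or 'two'")


def reject_null(stat: float, lower, upper) -> bool:
    """Reject H0 when the statistic falls in a rejection region."""
    return (lower is not None and stat < lower) or (upper is not None and stat > upper)


print(critical_values(0.05, 6, "left"))    # left tail, df = 6
print(critical_values(0.05, 10, "right"))  # right tail, df = 10
print(critical_values(0.05, 29, "two"))    # both tails, df = 29
```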

[Figure: Left-tailed and right-tailed χ² distributions]

The chi square test has the following properties

Evaluates sample variances
Chi square is non-negative.
Chi square is non-symmetric.
The degrees of freedom when working with a single population variance is n-1.
You do not need knowledge of population variation

Left-tailed chi square test example

The standard deviation of passenger waiting time in an airline’s single queue is 16 minutes, so the population variance is 16² = 256. In a pilot project with separate queues, the standard deviation of waiting time for a sample of 7 passengers is 8 minutes, so the sample variance is 8² = 64. At the 95% confidence level, has the variation in waiting time decreased?

The null hypothesis is H0: σ₁² ≥ 16²

The alternative hypothesis is H1: σ₁² < 16²

Because the alternative hypothesis is that the variance has decreased (s is less than σ), this is a left-tailed test with df = 7 − 1 = 6. From the chi-square table, the left-tail critical value at α = 0.05 is 1.635.

The test statistic is χ² = (7 − 1)(64)/256 = 1.5. It is less than the critical value (1.635), so it falls in the rejection region and the null hypothesis is rejected: the variation in waiting time decreased with the separate queues.
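The left-tailed calculation can be reproduced in a few lines of Python. The function name is a hypothetical helper; the critical value is the chi-square table value for df = 6 and a left-tail area of 0.05 (1.635 to three decimals), typed in rather than computed.

```python
def variance_chi2_stat(n: int, s: float, sigma0: float) -> float:
    """Test statistic (n - 1) * s^2 / sigma0^2 for a single population variance."""
    return (n - 1) * s ** 2 / sigma0 ** 2


stat = variance_chi2_stat(n=7, s=8, sigma0=16)  # 6 * 64 / 256
left_critical = 1.635                           # table value, df = 6, left-tail area 0.05
print(f"chi2 = {stat:.3f}, reject H0: {stat < left_critical}")  # chi2 = 1.500, reject H0: True
```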

Example 1:

The Barnes Company manufactures a DVD player and claims that the mean number of hours of use before repairs is 400, with a standard deviation of 10 hours.

Right-tailed chi square test example

A smartwatch manufacturer received customer complaints about the XYZ model, whose battery lasts a shorter time than the previous model's. The variance of the battery life of the previous model is 49 hours². Eleven watches of the new model were tested, and the standard deviation of their battery life was 9 hours. Assuming that the data are normally distributed, can the claim that the battery life of the new model is more variable be validated at the 5% significance level?

Population variance σ₁² = 49 hours², so σ₁ = 7 hours

Sample standard deviation s = 9 hours

The null hypothesis is H0: σ₁² ≤ 7²

The alternative hypothesis is H1: σ₁² > 7²

Because the alternative hypothesis is that the variance has increased (s is greater than σ), this is a right-tailed test with df = 11 − 1 = 10. From the chi-square table, the right-tail critical value at α = 0.05 is 18.307.

The test statistic is χ² = (11 − 1)(9²)/7² = 810/49 ≈ 16.53.

The test statistic is less than the critical value, so it is not in the rejection region, and we fail to reject the null hypothesis. There is not sufficient evidence to conclude that the battery life of the new model is more variable than that of the previous model.
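The right-tailed example follows the same pattern in Python. The helper name is hypothetical, and 18.307 is the chi-square table value for df = 10 and a right-tail area of 0.05, typed in rather than computed.

```python
def variance_chi2_stat(n: int, s: float, sigma0: float) -> float:
    """Test statistic (n - 1) * s^2 / sigma0^2 for a single population variance."""
    return (n - 1) * s ** 2 / sigma0 ** 2


stat = variance_chi2_stat(n=11, s=9, sigma0=7)  # 10 * 81 / 49
right_critical = 18.307                         # table value, df = 10, right-tail area 0.05
print(f"chi2 = {stat:.3f}, reject H0: {stat > right_critical}")  # chi2 = 16.531, reject H0: False
```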

Two-tailed chi square test example

The company's HR department believes that the variation in the salaries of employees working on a new digital technology is not the same as that of employees working on Java technology. From historical data, the standard deviation of Java salaries is $49K. The salaries of 30 digital technology employees were collected, and their standard deviation is $70K. Assuming that the data are normally distributed, can HR's claim be validated at the 95% confidence level?

Population standard deviation σ₁ = $49K

Sample standard deviation s = $70K

The null hypothesis is H0: σ₁² = 49²

The alternative hypothesis is H1: σ₁² ≠ 49²

df =30-1=29.

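Because the alternative hypothesis is "not equal," this is a two-tailed test, so α/2 = 0.05/2 = 0.025 in each tail.

For 29 degrees of freedom, the lower critical value (area 1 − α/2 = 0.975 to its right) is 16.047.

The upper critical value (area α/2 = 0.025 to its right) is 45.722.

The test statistic is χ² = (30 − 1)(70²)/49² = 2900/49 ≈ 59.18. It is greater than 45.722 and falls in the rejection region, so we reject the null hypothesis: the variation in salaries for the new digital technology differs from that for Java.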
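For the two-tailed example, the statistic is checked against both critical values. In this Python sketch the helper name is hypothetical, salaries are expressed in thousands of dollars, and the critical values 16.047 and 45.722 are the table values for df = 29 with α/2 = 0.025 in each tail.

```python
def variance_chi2_stat(n: int, s: float, sigma0: float) -> float:
    """Test statistic (n - 1) * s^2 / sigma0^2 for a single population variance."""
    return (n - 1) * s ** 2 / sigma0 ** 2


stat = variance_chi2_stat(n=30, s=70, sigma0=49)  # salaries in $K; 29 * 4900 / 2401
lower_critical, upper_critical = 16.047, 45.722   # table values, df = 29, 0.025 per tail
reject = stat < lower_critical or stat > upper_critical
print(f"chi2 = {stat:.2f}, reject H0: {reject}")  # chi2 = 59.18, reject H0: True
```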

Chi Square Sample Size

Chi-square tests are sensitive to sample size. As the sample size increases, a given absolute difference between observed and expected frequencies becomes a smaller proportion of the expected value. Conversely, when the sample is small, even a strong association may not produce a statistically significant result.

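For a chi-square test comparing two proportions (a 2 × 2 table), the sample size per group can be computed as

$$ n' = \frac{\left( z_{1-\alpha/2}\sqrt{2\bar{P}\bar{Q}} + z_{1-\beta}\sqrt{P_1 Q_1 + P_2 Q_2} \right)^2}{(P_1 - P_2)^2} $$

$$ n = \frac{n'}{4}\left(1 + \sqrt{1 + \frac{4}{n'\,|P_1 - P_2|}}\right)^2 $$

Where

n is the sample size with continuity correction
n’ is the sample size without continuity correction
P1 and P2 are the proportions in each group
Q1 = 1 − P1 and Q2 = 1 − P2
P̄ = (P1 + P2)/2 and Q̄ = 1 − P̄
z(1 − α/2) and z(1 − β) are standard normal quantiles for the significance level α and the power 1 − β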
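The sample-size calculation can be scripted as below. This is a sketch assuming the widely used Fleiss formula for comparing two proportions at a two-sided significance level α with power 1 − β, using the variable names defined in this section (n, n’, P1, P2, Q1, P̄); `statistics.NormalDist` from the Python standard library supplies the normal quantiles. The function name and the example proportions 0.5 and 0.3 are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1: float, p2: float, alpha: float = 0.05,
                                power: float = 0.80) -> tuple[float, float]:
    """Per-group sample size for comparing two proportions (Fleiss formula).

    Returns (n_prime, n): without and with continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    q1, q2 = 1 - p1, 1 - p2
    p_bar = (p1 + p2) / 2
    q_bar = 1 - p_bar
    diff = abs(p1 - p2)
    n_prime = (z_alpha * math.sqrt(2 * p_bar * q_bar)
               + z_beta * math.sqrt(p1 * q1 + p2 * q2)) ** 2 / diff ** 2
    n = n_prime / 4 * (1 + math.sqrt(1 + 4 / (n_prime * diff))) ** 2
    return n_prime, n


n_prime, n = sample_size_two_proportions(0.5, 0.3)
print(f"without correction: {n_prime:.1f}, with correction: {n:.1f}")
print("round up to whole subjects per group:", math.ceil(n))
```

The continuity-corrected n is always larger than n’, so rounding n up gives the more conservative per-group target.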



Other Uses of Chi Square

Chi-square test for goodness of fit: a statistical hypothesis test of how well sample data fit a hypothesized population distribution.
Chi-square contingency table analysis.
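The goodness-of-fit statistic uses the same Σ(O − E)²/E form as the test of independence, with k − 1 degrees of freedom for k categories when no parameters are estimated from the data. A minimal Python sketch follows; the function name and the counts are hypothetical.

```python
def goodness_of_fit(observed, expected):
    """Chi-square goodness-of-fit statistic and degrees of freedom (k - 1)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1  # assumes no parameters were estimated from the data


# Hypothetical counts for five equally likely outcomes in 100 trials
observed = [18, 22, 20, 25, 15]
expected = [sum(observed) / len(observed)] * len(observed)  # 20 per category
stat, df = goodness_of_fit(observed, expected)
print(round(stat, 2), df)  # 2.9 4
```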

Six Sigma Black Belt Certification Chi Square Questions:

Question: The time for a fail-safe device to trip is thought to be a discrete uniform distribution from 1 to 5 seconds. To test this hypothesis, 100 tests are conducted with results as shown below.

On the basis of these data, what are the chi-square (χ²) value and the number of degrees of freedom (df)?

(A) χ² value = 57.5, degrees of freedom = 4
(B) χ² value = 57.5, degrees of freedom = 5
(C) χ² value = 1,150.0, degrees of freedom = 4
(D) χ² value = 1,150.0, degrees of freedom = 5

Answer:

What follows a chi-square distribution?

Chi-square tests are hypothesis tests with test statistics that follow a chi-square distribution under the null hypothesis.

For which cases is the chi-square test of independence used?

The chi-square test of independence is commonly used to test statistical independence or association between two or more categorical variables.

Does the chi-square test also test homogeneity?

It tests whether two populations come from the same distribution by determining whether the two populations have the same proportions as each other. You can consider it simply a different way of thinking about the chi-square test of independence.

What type of data best fits the chi-square goodness-of-fit test?

To apply the goodness-of-fit test to a data set, we need data values that are a simple random sample from the full population, and the data must be categorical or nominal. The chi-square goodness-of-fit test is not appropriate for continuous data.