The chi-square (χ²) distribution is the basis of the standard test of a population variance against a known or assumed value of that variance. It is a continuous distribution whose only parameter is its degrees of freedom. It also describes the distribution of a sum of squared independent standard normal random variables. In addition, it is used to test the goodness of fit of a distribution to data, to test whether data series are independent, and to estimate confidence intervals for the variance and standard deviation of a normally distributed random variable.
Karl Pearson (1857-1936), often called the father of modern statistics (he established the world's first university statistics department at University College London), is credited with introducing the chi-squared test. Pearson's work in statistics began with developing mathematical methods for studying heredity and evolution. The chi-squared test came about as he was looking for a measure of how well distributions fit the random variables in his heredity and evolutionary models.
The chi-square distribution is skewed to the right, with a long tail toward large values. Its overall shape depends on the number of degrees of freedom in a given problem, and it becomes more symmetric as the degrees of freedom increase. For the single-variance test described below, the degrees of freedom are one less than the sample size (n − 1).
Chi Square Properties
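The "sum of squared variables" description can be checked by simulation. The sketch below assumes the variables are independent standard normals Z1, ..., Zϑ. It relies on the known properties that such a sum has mean ϑ and variance 2ϑ. The function name `sum_of_squared_normals` and the trial count are illustrative choices, not part of any library.

```python
import random
import statistics

def sum_of_squared_normals(dof, trials=20_000, seed=42):
    """Draw `trials` values of Z1^2 + ... + Z_dof^2, with each Zi independent N(0, 1)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof)) for _ in range(trials)]

for dof in (1, 4, 10):
    draws = sum_of_squared_normals(dof)
    # Theory: a chi-square variable with `dof` degrees of freedom has mean dof and variance 2 * dof.
    print(dof, round(statistics.fmean(draws), 2), round(statistics.variance(draws), 2))
```

The simulated means should land near 1, 4 and 10, and the variances near 2, 8 and 20. Small random deviations are expected.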
The probability density function of the chi-square distribution is
$$f(x;\vartheta) = \frac{x^{\vartheta/2-1}\,e^{-x/2}}{2^{\vartheta/2}\,\Gamma(\vartheta/2)}, \qquad x > 0,$$
where ϑ, the shape parameter, is the number of degrees of freedom and Γ is the gamma function:
$$\Gamma(a) = \int_0^{\infty} t^{a-1}\,e^{-t}\,dt.$$
Usually the objective of a Six Sigma team is to find the level of variation of the output, not just the mean of the population. Most importantly, the team wants to know how much variation the production process exhibits about the target, to see what adjustments are needed to reach a defect-free process. When comparing variances, or comparing observed frequencies with expected proportions, the standard test statistic is the chi-square (χ²) statistic. The distribution of this statistic is called the chi-square distribution.
Types of Chi square Hypothesis Tests
There are basically two types of chi-square test covered here: the test of independence and the test for comparing variances.
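The chi-square density can be evaluated directly from its closed form f(x; ϑ) = x^(ϑ/2 − 1) e^(−x/2) / (2^(ϑ/2) Γ(ϑ/2)) using Python's `math.gamma`. The sketch below is a numerical check, not a production routine. It integrates the density with a simple midpoint rule (an illustrative choice) to confirm that the total area is 1 and the mean equals ϑ for ϑ = 4.

```python
import math

def chi2_pdf(x, dof):
    """Standard chi-square density f(x; dof); zero for x <= 0."""
    if x <= 0:
        return 0.0
    k = dof / 2.0
    return x ** (k - 1) * math.exp(-x / 2) / (2 ** k * math.gamma(k))

def midpoint_integral(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

dof = 4
total = midpoint_integral(lambda x: chi2_pdf(x, dof), 0, 100)     # area under the curve, should be ~1
mean = midpoint_integral(lambda x: x * chi2_pdf(x, dof), 0, 100)  # first moment, should be ~dof
print(round(total, 4), round(mean, 4))
```

The upper limit of 100 stands in for infinity. For ϑ = 4 the tail beyond it is negligible.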
Chi-Square test of Independence
The chi-square test of independence determines whether there is an association between two categorical variables (such as gender and course selection). For example, a chi-square test of independence could examine the association between gender (male, female) and the level of absenteeism in a school. The chi-square test of independence is a non-parametric test; in other words, the assumption of normality is not required to perform it.
The test uses a contingency table to analyze the data. Each row shows a category of one variable, and each column shows a category of the other variable. Each variable must have two or more categories, and each cell holds the observed number of cases for that specific pair of categories.
Assumptions of Chi-square test of independence
Steps to perform Chi Square test of independence
Step 1: Define the null hypothesis (H0: the two variables are independent) and the alternative hypothesis (H1: the two variables are associated).
Step 2: Specify the level of significance (α).
Step 3: Compute the χ² statistic:
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where O_ij is the observed frequency and E_ij is the expected frequency of the cell in row i and column j. The expected frequency of each cell is
$$E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n}.$$
Step 4: Calculate the degrees of freedom: df = (number of rows − 1) × (number of columns − 1) = (r − 1)(c − 1).
Step 5: Find the critical value based on the degrees of freedom and α.
Step 6: Draw the statistical conclusion. If the test statistic is greater than the critical value, reject the null hypothesis and conclude that there is a significant association between the two categorical variables.
Chi Square test of independence example
Example: 1000 middle school students are asked which is their favorite superhero: Superman, Ironman or Spiderman. At the 95% confidence level, would you conclude that there is a relationship between gender and favorite superhero?
H0: gender and favorite superhero are independent. H1: gender and favorite superhero are associated.
Level of significance: α = 0.05
First calculate the expected frequencies. For the cell (Boys, Superman), E = (200 × 600)/1000 = 120. Calculate the expected frequency of every other cell in the same way.
Degrees of freedom = (r − 1)(c − 1) = (2 − 1)(3 − 1) = 2
Chi-square critical value for 2 degrees of freedom at α = 0.05 = 5.991
The test statistic value is greater than the critical value, so we reject the null hypothesis. We can conclude that there is a significant association between gender and favorite superhero.
Download Chi Square Test of Independence Excel Exemplar
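Steps 3 to 6 above can be sketched with the standard library. The survey's observed counts are not reproduced in the text, so the 2 × 3 table below is hypothetical. It was chosen only so that the Boys row totals 600 and the Superman column totals 200, which reproduces the one expected count the text shows (120). The helper names are illustrative. The closed-form critical value −2 ln α is valid only for 2 degrees of freedom, because a χ² variable with 2 df has CDF 1 − e^(−x/2). For other df, use a table.

```python
import math

def expected_counts(observed):
    """E_ij = (row total i * column total j) / n for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c contingency table."""
    expected = expected_counts(observed)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof

def critical_value_dof2(alpha):
    """Upper critical value for 2 df only: solve 1 - exp(-x/2) = 1 - alpha, so x = -2 ln(alpha)."""
    return -2 * math.log(alpha)

# Hypothetical counts (NOT the survey data). Rows: Boys, Girls; columns: Superman, Ironman, Spiderman.
observed = [[100, 250, 250], [100, 150, 150]]
stat, dof = chi_square_independence(observed)
crit = critical_value_dof2(0.05)
print(round(stat, 3), dof, round(crit, 3), stat > crit)
```

For this made-up table the statistic is about 10.42 with 2 df, above the 5.991 critical value. A real analysis would use the actual survey counts.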
Chi Square test – Comparing Variances
The chi-square test is commonly used for two applications.
When the population follows a normal distribution, the chi-square test is used to test a population variance σ² against a hypothesized value. The test statistic is
$$\chi^2 = \frac{(n-1)\,s^2}{\sigma^2}$$
where n is the sample size, s² is the sample variance, σ² is the hypothesized population variance, and the degrees of freedom are n − 1. The χ² distribution is not symmetrical, unlike the normal curve, and its shape depends on the degrees of freedom.
Hypothesis testing
A hypothesis is an assumption about a population parameter; the assumption may or may not be true. A one-tailed test places the rejection region in only one tail: the alternative is either that the actual variance is greater than the hypothesized value (right-tailed) or that it is less (left-tailed). A two-tailed test tests against the alternative that the actual variance is either greater than or less than the hypothesized value, so the rejection region is split between both tails. The choice of a one- or two-tailed test depends on the problem.
Left-tailed and right-tailed χ² distributions
The chi square test has the following properties
Left-tailed chi square test example
The standard deviation of an airline's passenger waiting time with a single queue is 16 minutes, so the population variance is 256 (the square of the standard deviation). In a pilot project with separate queues, the waiting times of 7 passengers have a standard deviation of 8 minutes, so the sample variance is 64. At the 95% confidence level, has the variation in waiting time been reduced?
The null hypothesis is H0: σ₁² ≥ (16)²
The alternative hypothesis is H1: σ₁² < (16)²
Because the alternative hypothesis states that the variance is smaller, this is a left-tailed test, with df = 7 − 1 = 6. From the chi-square table, the left-tail critical value for 95% confidence is 1.635.
The test statistic is χ² = (7 − 1)(64)/256 = 1.5.
The test statistic (1.5) is less than the critical value (1.635), so it falls in the rejection region and the null hypothesis is rejected. The variation in waiting time decreased with separate queues.
Example 1: The Barnes Company manufactures a DVD player and claims that the mean number of hours of use before repairs is 400, with a standard deviation of 10 hours.
Right-tailed chi square test example
A smartwatch manufacturer received customer complaints about the XYZ model, whose battery lasts a shorter time than the previous model's. The variance of the battery life of the previous model is 49 hours² (standard deviation 7 hours). Eleven watches were tested, and the standard deviation of their battery life was 9 hours. Assuming that the data are normally distributed, can the claim that battery-life variation has increased in the new model be validated at the 5% significance level?
Population variance σ₁² = 49, so σ₁ = 7 hours
Sample standard deviation s = 9 hours
The null hypothesis is H0: σ₁² ≤ (7)²
The alternative hypothesis is H1: σ₁² > (7)²
Because the alternative hypothesis states that the variance is larger, this is a right-tailed test, with df = 11 − 1 = 10. From the chi-square table, the right-tail critical value for 95% confidence is 18.307.
The test statistic is χ² = (11 − 1)(9²)/7² = 810/49 ≈ 16.53.
The test statistic is less than the critical value, so it is not in the rejection region, and we fail to reject the null hypothesis. There is not sufficient evidence to conclude that the battery life of the new model varies more than that of the previous model.
Two-tailed chi square test example
Company HR believes that the variation in the salaries of employees working in the new digital technology is not the same as for those working in Java technology. From historical data, the standard deviation of Java salaries is $49K. The salaries of 30 new digital technology employees were collected, and their standard deviation is $70K. Assuming that the data are normally distributed, can the HR claim be validated with 95% confidence?
Population standard deviation σ₁ = 49 (thousand dollars)
Sample standard deviation s = 70 (thousand dollars)
The null hypothesis is H0: σ₁² = (49)²
The alternative hypothesis is H1: σ₁² ≠ (49)²
df = 30 − 1 = 29. Because the alternative hypothesis is "not equal", this is a two-tailed test, so α/2 = 0.05/2 = 0.025 goes in each tail.
For 29 degrees of freedom, the left critical value (table column 1 − α/2 = 0.975) is 16.047, and the right critical value (table column α/2 = 0.025) is 45.722.
The test statistic is χ² = (30 − 1)(70²)/49² = 142100/2401 ≈ 59.18.
The test statistic is greater than 45.722, so it is in the rejection region, and we reject the null hypothesis. The variation in salaries differs between the two technologies.
Chi Square Sample Size
Chi-square tests are sensitive to sample size. With a larger sample, a given absolute difference between observed and expected frequencies becomes a smaller proportion of the expected value, while the same proportional difference produces a larger χ² statistic. In other words, a strong association may not show up as significant if the sample is small. With a very large sample, even a weak association can be statistically significant without being practically important.
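The three variance examples above (waiting time, battery life, salaries) can be reproduced with a short standard-library sketch. The critical values are computed instead of read from a table, using the power series for the regularized lower incomplete gamma function, which gives the chi-square CDF, plus bisection for the quantile. The helper names `chi2_cdf`, `chi2_quantile` and `variance_test`, and the `tail` argument, are illustrative, not a library API. In practice a chi-square table, as in the text, or a statistics library (for example `scipy.stats.chi2.ppf`) serves the same purpose. "95% confidence" in the text corresponds to α = 0.05 here.

```python
import math

def chi2_cdf(x, dof):
    """P(X <= x) for X ~ chi-square(dof), via the series for the regularized
    lower incomplete gamma function P(dof/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = dof / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total

def chi2_quantile(prob, dof):
    """Value c with P(X <= c) = prob, found by bisection on chi2_cdf."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, dof) < prob:  # widen the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, dof) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def variance_test(n, s, sigma0, alpha=0.05, tail="two"):
    """Chi-square test of one normal variance against sigma0^2.

    tail: "left" (H1: sigma^2 < sigma0^2), "right" (H1: sigma^2 > sigma0^2) or "two" (H1: !=).
    Returns (statistic, lower critical value or None, upper critical value or None, reject H0)."""
    dof = n - 1
    stat = dof * s ** 2 / sigma0 ** 2
    lower = upper = None
    if tail == "left":
        lower = chi2_quantile(alpha, dof)
    elif tail == "right":
        upper = chi2_quantile(1 - alpha, dof)
    else:
        lower = chi2_quantile(alpha / 2, dof)
        upper = chi2_quantile(1 - alpha / 2, dof)
    reject = (lower is not None and stat < lower) or (upper is not None and stat > upper)
    return stat, lower, upper, reject

# (n, sample sd, hypothesized sd) for the waiting-time, battery-life and salary examples.
for tail, args in [("left", (7, 8, 16)), ("right", (11, 9, 7)), ("two", (30, 70, 49))]:
    print(tail, variance_test(*args, tail=tail))
```

The computed statistics (1.5, about 16.53 and about 59.18) and critical values (about 1.635, 18.307, 16.047 and 45.722) agree with the worked examples. So do the reject or fail-to-reject conclusions.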
Chi Square Videos
https://www.youtube.com/watch?v=53kYOOr5Yhk
Additional Chi Square Examples and Helpful Links:
Chi Square Tables
Chi Square Sample Size Calculation
Other Uses of Chi Square
Six Sigma Black Belt Certification Chi Square Questions:
Question: The time for a fail-safe device to trip is thought to follow a discrete uniform distribution from 1 to 5 seconds. To test this hypothesis, 100 tests are conducted with results as shown below. On the basis of these data, what are the chi-square (χ²) value and the number of degrees of freedom (df)?
(A) χ² value = 57.5, degrees of freedom = 4
Answer:
What follows a chi-square distribution?
Chi-square tests are hypothesis tests whose test statistics follow a chi-square distribution under the null hypothesis.
For which of the following cases is chi…
The chi-square test of independence is commonly used to test statistical independence or association between two or more categorical variables.
Does chi…
It tests whether two populations come from the same distribution by determining whether the two populations have the same proportions as each other. You can consider it simply a different way of thinking about the chi-square test of independence.
What type of data best fits the chi-square goodness of fit test?
To apply the goodness of fit test to a data set, we need data values that are a simple random sample from the full population, and categorical or nominal data. The chi-square goodness of fit test is not appropriate for continuous data.
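The goodness-of-fit calculation behind the fail-safe question can be sketched in a few lines. Under the hypothesized discrete uniform distribution, each of the five trip times (1 to 5 seconds) is expected 100/5 = 20 times. With no parameters estimated from the data, df = 5 − 1 = 4, which matches the df in option (A). The question's observed counts are not reproduced in the text, so the counts below are hypothetical and do not yield the question's χ² value. The helper name `goodness_of_fit` is illustrative.

```python
def goodness_of_fit(observed, expected_probs):
    """Pearson chi-square goodness-of-fit statistic and degrees of freedom.

    df = number of categories - 1 (assumes no parameters were estimated from the data)."""
    n = sum(observed)
    expected = [n * p for p in expected_probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# Hypothetical trip-time counts for 1, 2, 3, 4, 5 seconds over 100 tests (NOT the question's data).
observed = [27, 18, 22, 15, 18]
uniform = [1 / 5] * 5  # discrete uniform: every outcome equally likely
stat, dof = goodness_of_fit(observed, uniform)
print(round(stat, 2), dof)
```

These made-up counts give χ² = 4.3 with 4 df. That is below the 5% critical value of 9.488, so they would not contradict the uniform hypothesis.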