Learning Outcomes
Tests of independence involve using a contingency table of observed (data) values. The test statistic for a test of independence is similar to that of a goodness-of-fit test: [latex]\displaystyle{\sum_{(i\cdot{j})}}\frac{{({O}-{E})}^{{2}}}{{E}}[/latex] where:
O = observed values
E = expected values
i = the number of rows in the table
j = the number of columns in the table
There are [latex]\displaystyle{i}\cdot{j}[/latex] terms of the form [latex]\frac{{({O}-{E})}^{{2}}}{{E}}[/latex], one for each cell of the table. A test of independence determines whether two factors are independent or not. Note: The expected value for each cell needs to be at least five in order for you to use this test.
Example
Suppose A = a speeding violation in the last year and B = a cell phone user while driving. If A and B are independent, then P(A AND B) = P(A)P(B). A AND B is the event that a driver received a speeding violation last year and also used a cell phone while driving. Suppose that 755 drivers were surveyed about speeding violations in the last year and cell phone use while driving. Out of the 755, 70 had a speeding violation and 685 did not; 305 used cell phones while driving and 450 did not. Let y = expected number of drivers who used a cell phone while driving and received speeding violations. If A and B are independent, then P(A AND B) = P(A)P(B). By substitution, [latex]\displaystyle\frac{{y}}{{755}}={(\frac{{70}}{{755}})}{(\frac{{305}}{{755}})}[/latex] Solve for y: [latex]\displaystyle{y}=\frac{{({70})}{({305})}}{{755}}={28.3}[/latex] About 28 people from the sample are expected to use cell phones while driving and to receive speeding violations.
In a test of independence, we state the null and alternative hypotheses in words. Since the contingency table consists of two factors, the null hypothesis states that the factors are independent and the alternative hypothesis states that they are not independent (dependent). If we do a test of independence using the example, then the null hypothesis is: H0: Being a cell phone user while driving and receiving a speeding violation are independent events. If the null hypothesis were true, we would expect about 28 people to use cell phones while driving and to receive a speeding violation.
The test of independence is always right-tailed because of the calculation of the test statistic.
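The expected count in the cell phone example, and the sum of (O − E)²/E terms, can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names are made up for this sketch, and the four cell counts in `hypothetical_observed` are assumed for illustration only (the text above gives only the totals 70, 685, 305, 450, and 755).

```python
def expected_joint_count(count_a, count_b, n):
    """Expected number of subjects in A AND B if A and B are independent."""
    # P(A AND B) = P(A) * P(B); multiplying by n turns the probability into a count.
    return n * (count_a / n) * (count_b / n)


def chi_square_statistic(observed):
    """Sum (O - E)^2 / E over every cell of a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # E = (row total)(column total) / (total number surveyed)
            e = row_totals[i] * col_totals[j] / n
            statistic += (o - e) ** 2 / e
    return statistic


# Totals taken from the example: 70 speeding violations, 305 cell phone users, 755 drivers.
y = expected_joint_count(70, 305, 755)
print(round(y, 1))  # 28.3

# HYPOTHETICAL cell counts, chosen only so the row and column totals match the example.
hypothetical_observed = [
    [25, 45],    # speeding violation: cell phone user, not a user
    [280, 405],  # no speeding violation: cell phone user, not a user
]
stat = chi_square_statistic(hypothetical_observed)
print(stat)
```

Every term in the sum is a square divided by a positive number, so the statistic can never be negative. Large disagreement between observed and expected counts can only push it to the right, which is why the test is right-tailed.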
If the expected and observed values are not close together, then the test statistic is very large and way out in the right tail of the chi-square curve, as it is in a goodness-of-fit test.
The number of degrees of freedom for the test of independence is: df = (number of columns – 1)(number of rows – 1)
The following formula calculates the expected number (E): [latex]\displaystyle{E}=\frac{{{(\text{row total})}{(\text{column total})}}}{\text{total number surveyed}}[/latex]
Try It
A sample of 300 students is taken. Of the students surveyed, 50 were music students, while 250 were not. Ninety-seven were on the honor roll, while 203 were not. If we assume being a music student and being on the honor roll are independent events, what is the expected number of music students who are also on the honor roll? About 16 students are expected to be music students and on the honor roll.
Example
In a volunteer group, adults 21 and older volunteer from one to nine hours each week to spend time with a disabled senior citizen. The program recruits among community college students, four-year college students, and nonstudents. The table below is a sample of the adult volunteers and the number of hours they volunteer per week.
Number of Hours Worked Per Week by Volunteer Type (Observed)
The table contains observed (O) values (data).
Is the number of hours volunteered independent of the type of volunteer?
Solution: The observed table and the question at the end of the problem, “Is the number of hours volunteered independent of the type of volunteer?” tell you this is a test of independence. The two factors are number of hours volunteered and type of volunteer. This test is always right-tailed.
H0: The number of hours volunteered is independent of the type of volunteer.
Ha: The number of hours volunteered is dependent on the type of volunteer.
The expected results are in the table below.
For example, the calculation for the expected frequency for the top left cell is [latex]\displaystyle{E}=\frac{{{(\text{row total})}{(\text{column total})}}}{\text{total number surveyed}}=\frac{{{({255})}{({298})}}}{{839}}={90.57}[/latex]
Calculate the test statistic: χ² = 12.99 (calculator or computer)
Distribution for the test: [latex]\displaystyle\chi^{2}_{4}[/latex]
Probability statement: p-value = P(χ² > 12.99) = 0.0113
Compare α and the p-value: Since no α is given, assume α = 0.05. p-value = 0.0113. α > p-value.
Make a decision: Since α > p-value, reject H0. This means that the factors are not independent.
Conclusion: At a 5% level of significance, from the data, there is sufficient evidence to conclude that the number of hours volunteered and the type of volunteer are dependent on one another.
For the example in the table titled “Number of Hours Worked Per Week by Volunteer Type (Expected),” if there had been another type of volunteer, teenagers, what would the degrees of freedom be?
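The numbers in this solution can be checked with a short Python sketch that uses only the standard library. The function names are illustrative. The 3 × 3 table shape is an assumption inferred from the three volunteer types and the stated distribution with 4 degrees of freedom. The p-value helper uses a closed form of the chi-square right-tail probability that holds only when df is even, so it is not a general-purpose routine.

```python
import math


def expected_count(row_total, col_total, n):
    """E = (row total)(column total) / (total number surveyed)."""
    return row_total * col_total / n


def degrees_of_freedom(n_rows, n_cols):
    """df = (number of columns - 1)(number of rows - 1)."""
    return (n_cols - 1) * (n_rows - 1)


def chi2_right_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2
    term = 1.0
    total = 1.0
    # Accumulate sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


e_top_left = expected_count(255, 298, 839)
print(round(e_top_left, 2))  # 90.57

df = degrees_of_freedom(3, 3)  # assumed 3 volunteer types x 3 hour ranges
print(df)  # 4

p_value = chi2_right_tail_even_df(12.99, df)
print(round(p_value, 4))  # 0.0113

alpha = 0.05
print("reject H0" if p_value < alpha else "do not reject H0")  # reject H0
```

Only the right tail is computed, matching the probability statement P(χ² > 12.99).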
Try It
The Bureau of Labor Statistics gathers data about employment in the United States. A sample is taken to calculate the number of U.S. citizens working in one of several industry sectors over time. The table below shows the results:
We want to know if the change in the number of jobs is independent of the change in years. State the null and alternative hypotheses and the degrees of freedom.
Example
De Anza College is interested in the relationship between anxiety level and the need to succeed in school. A random sample of 400 students took a test that measured anxiety level and need to succeed in school. This table shows the results. De Anza College wants to know if anxiety level and need to succeed in school are independent events.
Solution:
Try It
Refer back to the information in the Try It about the Bureau of Labor Statistics. How many service-providing jobs are there expected to be in 2020? How many nonagriculture wage and salary jobs are there expected to be in 2020?
How do you find the degrees of freedom for an independent test?
Some calculations of degrees of freedom use the formula df = N - P, where N is the number of observations and P is the number of different parameters or relationships estimated. For example, in a two-sample t-test, N - 2 is used because there are two parameters to estimate (the two group means).
When using the independent means t
The mean of this distribution will be zero because, if the null hypothesis is true, the two populations have the same mean. So differences between means would, on average, come out to zero.
In which situation would you use a t
We only use the t-test for independent means when we are studying two groups; a different statistic is used when there are more than two groups. In this test, we compare the observed difference between the two sample means (M1 - M2) to the expectation that there is no difference between the population means (μ1 - μ2 = 0).
What are the degrees of freedom for two independent samples?
The two-sample t-statistic calculation uses df = n1 + n2 – 2 degrees of freedom, where n1 and n2 are the two sample sizes. If the t-statistic for the two independent samples exceeds the critical t value at the chosen alpha level, then you can reject the null hypothesis (H0) that there is no difference between the two data sets.
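As a rough illustration of where df = n1 + n2 – 2 comes from, the sketch below computes a two-sample t-statistic in Python. It assumes the pooled-variance (equal variances) form of the test, which the text does not state explicitly. The function name and both samples are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(sample1, sample2):
    """Return (t, df) for two independent samples, assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2  # one degree of freedom is used up by each estimated mean
    # Pool the two sample variances, weighting each by its own degrees of freedom.
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(sample1) - mean(sample2)) / standard_error
    return t, df


# HYPOTHETICAL data, for illustration only.
group_1 = [1, 2, 3, 4, 5]
group_2 = [2, 3, 4, 5, 6]
t_stat, df_t = pooled_t_statistic(group_1, group_2)
print(t_stat, df_t)
```

The absolute value of the resulting t-statistic is then compared with the critical t value for df_t degrees of freedom at the chosen alpha level.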