In this explainer, we will learn how to take a stratified, or layered, random sample. In general, a data set consists of observations, or measurements, from members of a population, or a sample of the population, in relation to a variable or variables of interest. Our aim in collecting data is to gain information about the population, and we have various statistical methods at our disposal to do this. However, for our results and conclusions to be as accurate and
representative as possible, how we collect the data is itself an important part of statistical methodology. In some instances, it may be possible to collect data on an entire population. For example, if we want to analyze the form of the top 100 tennis players in the world in a particular year, we could collect data on all professional matches played by those 100 players that year. Suppose, however, that we would like to analyze certain characteristics, such as mass, diameter, and bounce height, of the tennis balls used in professional tournaments in a particular year. It would be neither sensible nor feasible to try to collect data on the whole population of tennis balls used that year. Instead, we might take one or more samples and collect measurements on the balls in those samples. From the sample data, using statistical methods, we may draw conclusions about the population of tennis balls.

When sampling data, we always aim to take a representative sample, that is, a sample that accurately reflects the population from which it is taken. Another term for this is an unbiased sample, one in which no part of the population is over- or underrepresented. There are a number of sampling methods we can use to collect data, one of which is called random sampling.

Definition: Random Sampling and Simple Random Sampling
A random sample is a subset of elements selected from a population such that each member of the population has some chance of being selected. A simple random sample is one in which each member of the population has an equal chance of selection.

Now, it is often the case that a population contains natural, nonoverlapping subdivisions, or strata. In such cases, we might use random sampling to collect data within each stratum and combine the data into a sample representing the whole population. For example, the population of professional tennis players consists of both male and female players.
If the proportions of male and female players are not equal, this difference should be reflected in any sample we take. If the sample is instead taken directly from the population as a whole, the two groups, male and female, may not be represented proportionately within it. We can remedy this by taking random samples of proportionate numbers of male and female players, which we then combine to form the overall sample. This process is called stratified, or layered, random sampling and is defined as follows.

Definition: Stratified or Layered Random Sampling
Stratified or layered random sampling is a sampling method used when a population may be naturally subdivided into distinct, nonoverlapping smaller groups, or strata. Random samples are taken from each individual stratum and combined to form an overall sample. The size of the random sample from each stratum reflects the size of that stratum within the population. Hence, the strata are represented in the final sample in the same proportions as they are within the population.

For a population of $N$ elements and an overall sample size of $n$, we use the following formula to calculate the sample size, $n_s$, for a single stratum containing $s$ elements:
$$n_s = \frac{s}{N} \times n.$$
Alternatively, if we know the percentage of the total population, $p\%$, that belongs to a single stratum, the sample size for that stratum is given by $p\% \times n$.

As an example, suppose that 60% of registered professional tennis players are male and 40% are female. If we wish to take a small representative sample of, say, 10 from the population of professional tennis players, our sample should consist of
$$60\% \times 10 = \frac{60}{100} \times 10 = 6 \text{ male players}, \qquad 40\% \times 10 = \frac{40}{100} \times 10 = 4 \text{ female players}.$$
If we are instead given that, out of a population of 80 professional tennis players, 48 are male and 32 are female, then using the formula for the sample size of each of the two strata, we have
$$n_m = \frac{48}{80} \times 10 = 6 \text{ male players}, \qquad n_f = \frac{32}{80} \times 10 = 4 \text{ female players}.$$

Let's look at some examples where we examine our understanding of the definition of stratified random sampling.

Example 1: Determining If a Sampling Scenario Is Stratified Random Sampling
In a certain survey about the colleges that some high school students wish to join, a sample of 2 000 students was randomly selected out of a population of 40 000. Is that considered to be stratified sampling?

Answer
Stratified, or layered, random sampling is used when a population naturally subdivides into groups, or strata. Such a sample reflects the proportions of each stratum within the population. This is achieved by taking random samples from each stratum proportional to the size of that stratum within the population as a whole.
In this example, the population consists of 40 000 students. We have no information on whether the population was subdivided into strata, so we must assume that the random sample of 2 000 students was selected directly from the population. Therefore, this is not considered to be stratified sampling.

The result in the example above is useful in the context of our next question, where we examine the definition of stratified random sampling.

Example 2: Stratified Random Sampling
Which of the following is not true about stratified sampling?
Answer
We recall that stratified, or layered, random sampling is a sampling method used when a population may be naturally subdivided into distinct, nonoverlapping smaller groups, or strata. Random samples are taken from each individual stratum and combined to form an overall sample. The size of the random sample taken from each stratum reflects the size of that stratum within the population. Let's now see whether each of the given options fits with this definition.
Hence, we find that only statement C is not true about stratified sampling.

In our next example, we calculate the sample size for a stratum within a population.

Example 3: Calculating the Sample Size of a Stratum given the Proportion of Sample Needed
In an HR study about the salaries in a certain company with 1 000 employees, the employees were divided into males and females. If the total percentage of females in the company was 60 percent and a sample of 40 people was selected, what was the number of males in the sample?

Answer
Since the population, that is, the employees in the company, naturally subdivides into two strata, male and female, we use stratified, or layered, random sampling as the sampling method. This means that the sample reflects the proportions of male and female employees within the company. Since 60 percent of the employees were female, 60 percent of the sample must also have been female. This means that the remainder, that is, $100 - 60 = 40$ percent, of the sample must have been male.
We are told that the sample consisted of 40 people. Hence, 40 percent of those 40 people must have been male. That is,
$$40\% \times 40 = \frac{40}{100} \times 40 = 16$$
people in the sample were male.

Example 4: Calculating the Sample Size of a Stratum given the Stratum and Population Sizes
Adel needs to conduct a study to determine whether the students in his school like playing football. He decides to divide the students into two groups, boys and girls, knowing that the school has a total of 200 students, 80 of whom are girls. If Adel decides that his sample size will be 50, how many girls should he select for the study?

Answer
Since the population of students is split into 2 distinct strata, that is, boys and girls, the appropriate sampling method is stratified, or layered, random sampling. A stratified random sample is one that combines a number of separate random samples taken from distinct groups within the population.
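The procedure just described, combining separate random samples drawn from each stratum, can be sketched in Python. This is a minimal illustration rather than a prescribed implementation: the roster of student labels is hypothetical, the function names `stratum_sample_size` and `stratified_sample` are our own, and the sketch assumes that each stratum's allocation $\frac{s}{N} \times n$ (a stratum of $s$ elements, a population of $N$, an overall sample size of $n$) comes out as a whole number, as it does in every example in this explainer.

```python
import random
from fractions import Fraction

def stratum_sample_size(s, N, n):
    """Sample size n_s = (s / N) * n for a stratum holding s of the N elements."""
    size = Fraction(s, N) * n
    # Assumption of this sketch: the allocation is a whole number.
    if size.denominator != 1:
        raise ValueError(f"allocation {size} is not a whole number")
    return int(size)

def stratified_sample(strata, n, rng=random):
    """Draw a proportional stratified random sample.

    strata maps each stratum name to the list of its members;
    n is the overall sample size.
    """
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = stratum_sample_size(len(members), N, n)
        # A simple random sample, without replacement, within this stratum.
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical roster for Example 4: 200 students, 80 of whom are girls.
students = {
    "girls": [f"G{i}" for i in range(80)],
    "boys": [f"B{i}" for i in range(120)],
}
sample = stratified_sample(students, 50, random.Random(0))
print({name: len(chosen) for name, chosen in sample.items()})  # {'girls': 20, 'boys': 30}
```

Within each stratum, `sample` draws without replacement, so each stratum's part of the sample is a simple random sample of that stratum. When allocations are not whole numbers, some rounding rule is needed, and the rounded sizes may then not add up exactly to $n$.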
The sample size for each group reflects the proportion of that group, or stratum, within the population. Applying this to our population of students, 80 out of 200 students are girls. Therefore, the proportion of girls is $\frac{80}{200} = 0.4$, which as a percentage is $0.4 \times 100\% = 40\%$. This means that to reflect the proportions of boys and girls in the population, 40% of Adel's sample should be girls. Adel's sample size is 50 students, and 40% of 50 is $\frac{40}{100} \times 50 = 20$. Hence, Adel should select 20 girls for the study.
Note that we could have reached this conclusion in a slightly different way, using the formula for the sample size of a stratum. That is, for a population of $N$ elements and an overall sample size of $n$, the sample size, $n_s$, for a single stratum containing $s$ elements is $n_s = \frac{s}{N} \times n$. In our case, $N = 200$, $s = 80$, and $n = 50$, so $n_s = \frac{80}{200} \times 50 = 20$.

In our next example, we apply stratified, or layered, random sampling to a population that has been divided into 3 groups.

Example 5: Sample Size of a Stratum given the Sizes of Other Strata and the Population Size
A scientist decides to conduct a survey on the effects of a certain medicine in a city of 100 000 people. He divides them into three groups based on their region: city center, outer city, and suburbs. There are 10 000 people in the suburbs and 30 000 people in the outer city. If the scientist decides to take a sample of 1 000 people, how many people from the suburbs should be included?

Answer
Since the city is divided into three distinct groups, or strata, an appropriate sampling method is stratified, or layered, random sampling. We recall that a stratified random sample is one that combines a number of separate random samples taken from distinct groups within the population. The size of the sample from each group reflects the proportion of that group, or stratum, within the population.
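As a rough sketch, the same rule can be applied to all three regions at once. The hypothetical helper `proportional_allocation` below computes $n_s = \frac{s}{N} \times n$ for every stratum, taking $N$ as the sum of the stratum sizes. It assumes the city center holds the remaining $100\,000 - (10\,000 + 30\,000) = 60\,000$ people, and it uses exact fractions so that no floating-point rounding creeps in.

```python
from fractions import Fraction

def proportional_allocation(stratum_sizes, n):
    """Return n_s = (s / N) * n for each stratum, where N is the sum of all stratum sizes."""
    N = sum(stratum_sizes.values())
    return {name: Fraction(s, N) * n for name, s in stratum_sizes.items()}

# Example 5 regions (the city center size is inferred as 100 000 - 40 000).
city = {"city center": 60_000, "outer city": 30_000, "suburbs": 10_000}
allocation = proportional_allocation(city, 1_000)

# Prints one region per line: city center 600, outer city 300, suburbs 100.
for region, size in allocation.items():
    print(region, size)

# With proportional allocation, the stratum samples add back up to the overall sample size.
assert sum(allocation.values()) == 1_000
```

Computing every stratum together makes it easy to check that the stratum samples add up to the overall sample size $n$, which holds exactly whenever the allocations are whole numbers.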
In our case, we know the total population and the number of people in the suburbs and the outer city, but not in the city center:

Population: 100 000
Suburbs: 10 000
Outer city: 30 000
City center: ?

Although we do not need to know the number of people in the city center to answer the question, we note that there must be $100\,000 - (10\,000 + 30\,000) = 60\,000$ people in the city center.
The scientist wishes to take a representative sample of 1 000 people from the population, and we are asked how many of these should be selected from the suburbs. Applying stratified random sampling, the proportion of people from the suburbs in the sample must be the same as the proportion of people from the suburbs in the whole population. There are 10 000 people in the suburbs, and as a proportion of the total population, this is $\frac{10\,000}{100\,000} = 0.1$. As a percentage, that is $0.1 \times 100\% = 10\%$. Hence, 10% of the sample should be people from the suburbs. If the sample size is 1 000 people, then 10% of this is $\frac{10}{100} \times 1\,000 = 100$. Therefore, 100 people from the suburbs should be included in the sample.
Note that we could have reached this conclusion in a slightly different way, using the formula for the sample size of a stratum. That is, for a population of $N$ elements and an overall sample size of $n$, the sample size, $n_s$, for a single stratum containing $s$ elements is $n_s = \frac{s}{N} \times n$. In our case, $N = 100\,000$, $s = 10\,000$, and $n = 1\,000$, so $n_s = \frac{10\,000}{100\,000} \times 1\,000 = 100$.

Related to stratified, or layered, random sampling is a method of random sampling used to estimate population sizes, known as the capture-recapture method. Let's look at an example. Suppose that, as part of a large rehousing project, a cat rescue center wishes to estimate the population of stray cats within a particular urban area. On one day, 20 stray cats are captured, tagged, and released. The next day, 12 cats are captured, 4 of which are found to be tagged. As a proportion, $\frac{4}{12} = \frac{1}{3} \approx 0.33$; that is, one-third, or approximately 33%, of the cats captured on day 2 had tags.
We assume that the same proportion of the whole population is tagged. Hence, we estimate that the 20 tagged cats make up one-third, or about 33%, of the population. If 20 cats are one-third of the population, then the total population is three times this. That is, $3 \times 20 = 60$ cats.

Definition: The Capture-Recapture Method for Estimating Population Size
Equating capture with random selection from a population, to estimate the population size, $N$, let $M$ be the number of members of the population that are initially captured, tagged, and then released. If $n$ is the number of members of the population that are subsequently captured and $m$ is the number of those that are found to be tagged, then the overall population size is given by
$$N = \frac{Mn}{m}.$$
In our stray cats example above, we have $M = 20$, $n = 12$, and $m = 4$. Hence,
$$N = \frac{Mn}{m} = \frac{20 \times 12}{4} = 60.$$
We can also state this method as an equality of proportions: the proportion of tagged members in the second capture is assumed to equal the proportion of tagged members in the whole population,
$$\frac{m}{n} = \frac{M}{N},$$
which rearranges to $N = \frac{Mn}{m}$.

Example 6: Using the Capture-Recapture Method to Estimate the Size of a Population
In an HR study about the salaries in a certain company, the employees are divided into males and females. The total percentage of females in the company is 60 percent. A sample of 10 employees is selected from the company. The males in that sample represent 5 percent of the males in the company. What is the total number of employees in that company?

Answer
To begin with, we note that 60 percent of employees in the company are female and that the employees are divided into males and females. This means that $100 - 60 = 40$ percent of employees must be male. If we let $N$ be the total number of employees in the company, then the number of male employees is 40 percent of $N$, that is, $\frac{40}{100} \times N$, or $0.4N$.
To find the total number of employees, $N$, we use the capture-recapture formula. This tells us that the population size is $N = \frac{Mn}{m}$, where $M$ is the number initially captured, tagged, and then released; $n$ is the number subsequently captured; and $m$ is the number of those found to be tagged.
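The capture-recapture estimate $N = \frac{Mn}{m}$ translates directly into a small function; in the statistics literature, this estimator is commonly known as the Lincoln-Petersen estimator. The sketch below is illustrative only: the function name `capture_recapture` is our own, $M$ is the number initially tagged, $n$ the number captured the second time, and $m$ the number of tagged members among those $n$. It returns an exact fraction rather than rounding to a whole number of individuals.

```python
from fractions import Fraction

def capture_recapture(M, n, m):
    """Estimate the population size N = M * n / m.

    M: number initially captured, tagged, and released
    n: number captured on the second occasion
    m: number of those n found to be tagged
    """
    if m == 0:
        # With no tagged recaptures, the formula would divide by zero.
        raise ValueError("no tagged members were recaptured")
    return Fraction(M * n, m)

# Stray cats: 20 tagged on day 1; 12 captured on day 2, 4 of them tagged.
print(capture_recapture(20, 12, 4))  # 60
```

The estimate relies on the same assumption made in the stray cats example: that the second capture behaves like a random sample, so the tagged proportion $\frac{m}{n}$ matches the tagged proportion $\frac{M}{N}$ in the whole population.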
In our case, identifying "all male employees" as those "captured, tagged, and released," we have $M = 0.4N$. From the question, we know that our sample size, that is, the number subsequently "captured," $n$, is equal to 10. Further, the males in this subsequent sample represent 5 percent of the males in the company. This means that
$$m = 5\% \times 40\% \times N = 0.05 \times 0.4 \times N = 0.02N.$$
Hence, we have $M = 0.4N$, $m = 0.02N$, and $n = 10$. Substituting these values into the capture-recapture formula for population size then gives us
$$N = \frac{0.4N \times 10}{0.02N} = 200.$$
Hence, the total number of employees in the company is 200.

We complete this explainer by summarizing some of the key points.

Key Points
What is the name of the sampling method in which the population is divided into groups and then some of the groups are selected at random, with every unit in the selected groups included in the sample?
Cluster sampling divides the population into groups, or clusters. A number of clusters are selected randomly to represent the total population, and then all units within the selected clusters are included in the sample. No units from non-selected clusters are included in the sample.
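Which sampling methods divide the population into groups?
Both stratified sampling and cluster sampling divide the population into groups. Stratified sampling takes a random sample from each group, whereas cluster sampling randomly selects some of the groups and includes every unit in them. Systematic sampling and cluster sampling, like stratified sampling, are forms of random sampling, known as probability sampling, which stands in contrast to non-probability sampling.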
What is the name of the sampling method in which the population is divided into subgroups and then a random sample is chosen from each subgroup?
This is stratified sampling. In stratified sampling, researchers divide subjects into subgroups called strata based on characteristics that they share (e.g., race, gender, educational attainment). Once divided, each subgroup is randomly sampled using another probability sampling method.
How is cluster sampling related to multistage sampling?
When elements are further sampled from within the selected clusters, cluster sampling becomes multistage (or two-stage) sampling: sample clusters are selected at the first stage, and then elements are sampled from the selected clusters.
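To make the contrast between stratified and cluster sampling concrete, here is a minimal Python sketch. The population of four classes of 25 students is hypothetical, as are the function names `draw_stratified` and `draw_clusters`. Stratified sampling takes a simple random sample from every group, while single-stage cluster sampling picks whole groups at random and keeps all of their members.

```python
import random

def draw_stratified(groups, per_group, rng):
    """Stratified: a simple random sample of the given size from every group."""
    return {g: rng.sample(members, per_group[g]) for g, members in groups.items()}

def draw_clusters(groups, n_clusters, rng):
    """Single-stage cluster: choose whole groups at random and keep every member."""
    chosen = rng.sample(sorted(groups), n_clusters)
    return {g: list(groups[g]) for g in chosen}

# Hypothetical population: classes A to D, each with 25 students.
classes = {c: [f"{c}{i}" for i in range(25)] for c in "ABCD"}
rng = random.Random(1)

strat = draw_stratified(classes, {c: 5 for c in classes}, rng)
clus = draw_clusters(classes, 2, rng)
print({g: len(m) for g, m in strat.items()})  # {'A': 5, 'B': 5, 'C': 5, 'D': 5}
print(sorted(len(m) for m in clus.values()))  # [25, 25]
```

In the stratified sample every class is represented by a few students; in the cluster sample only the chosen classes appear, but each one appears in full.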