In this explainer, we will learn how to take a stratified, or layered, random sample. In general, a data set consists of observations, or measurements, from members of a population, or a sample of the population, in relation to a variable or variables of interest. Our aim in collecting data is to gain information about the population, and we have various statistical methods at our disposal to do this. However, for our results and conclusions to be as accurate and representative as possible, how we collect the data is itself an important part of statistical methodology.
In some instances, it may be possible to collect data on an entire population. For example, if we want to analyze the tennis form of the top 100 tennis players in the world in a particular year, we could actually collect data on all professional matches played by the top 100 players for that year.
Suppose, however, that we would like to analyze certain characteristics, such as mass, diameter, and bounce height, of the tennis balls used in professional tournaments in a particular year.
It would be neither sensible nor feasible to try to collect data on the whole population of tennis balls used that year. Instead, we might take a sample or samples and collect measurements on the balls in those samples. From the sample data, using statistical methods, we may draw conclusions about the population of tennis balls.
When sampling data, our aim is always to take a representative sample, that is, a sample that accurately represents or reflects the population from which it is taken. Another term for this is an unbiased sample, where no part of a population is over- or underrepresented.
There are a number of sampling methods we can use to collect data, one of which is called random sampling.
Definition: Random Sampling and Simple Random Sampling
A random sample is a subset of elements selected from a population such that each member of the population has some chance of being selected.
A simple random sample is one in which each member of the population has an equal chance of selection.
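As a minimal sketch of the definition above, the following Python snippet draws a simple random sample of 10 from a hypothetical population of 100 labeled members. The population, the labels, and the fixed seed are assumptions made purely for illustration.

```python
import random

# Hypothetical population: 100 members identified by the labels 1..100.
population = list(range(1, 101))

# A seeded generator so that the sketch gives the same sample on every run.
rng = random.Random(0)

# random.sample draws without replacement, and every member of the
# population is equally likely to be chosen.
simple_random_sample = rng.sample(population, 10)

print(sorted(simple_random_sample))
```

Because the draw is made without replacement, no member can appear in the sample twice.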
Now, it is often the case that a population contains natural, nonoverlapping subdivisions, or strata. In such cases, we might use random sampling to collect data within each stratum and collate the data into a sample representing the whole population.
For example, the population of professional tennis players consists of both male and female players. If the proportions of male and female players are not equal, this difference should be reflected within any sample we take. If it is not, and the sample is taken directly from the population as a whole, then the groups, male and female, may not be represented proportionately within the sample. We can remedy this by taking random samples of a proportionate number of male and female players, which we then combine to form the overall sample.
This process is called stratified, or layered, random sampling and is defined as follows.
Definition: Stratified or Layered Random Sampling
Stratified or layered random sampling is a sampling method used when a population may be naturally subdivided into distinct, nonoverlapping smaller groups, or strata.
Random samples are taken from each individual stratum and combined to form an overall sample. The size of the random sample from each stratum reflects the size of that stratum within the population. Hence, the strata are represented in the final sample in the same proportions as they are within the population.
For a population of $N$ elements and an overall sample size of $n$, we use the following formula to calculate the sample size, $s$, for a single stratum containing $m$ elements: $s = \frac{m}{N} \times n$.
Alternatively, if we know the percentage of the total population, $p\%$, that belong to a single stratum, the sample size for that stratum is given by $p\% \times n$.
As an example, suppose that 60% of registered professional tennis players are male and 40% are female. If we wish to take a small representative sample of, say, 10 from the population of professional tennis players, our sample should consist of $60\% \times 10 = \frac{60}{100} \times 10 = 6$ male players and $40\% \times 10 = \frac{40}{100} \times 10 = 4$ female players.
If we are instead given that out of a population of 80 professional tennis players, 48 are male and 32 are female, then using the formula for sample size for the two strata, we have $s_{\text{male}} = \frac{48}{80} \times 10 = 6$ male players and $s_{\text{female}} = \frac{32}{80} \times 10 = 4$ female players.
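The procedure described above can be sketched in Python as follows. This is an illustrative sketch rather than a definitive implementation: the function names, the player labels, and the seed are hypothetical, and the code assumes that each stratum's share of the sample works out to a whole number, as it does in the tennis example.

```python
import random
from fractions import Fraction


def stratum_sample_size(stratum_size, population_size, sample_size):
    """Proportional share: (stratum size / population size) * sample size."""
    return Fraction(stratum_size, population_size) * sample_size


def stratified_sample(strata, sample_size, rng):
    """Draw a simple random sample from each stratum and combine them.

    `strata` maps a stratum name to the list of its members.
    Assumption: every share is a whole number; otherwise ValueError is raised.
    """
    population_size = sum(len(members) for members in strata.values())
    combined = []
    for name, members in strata.items():
        size = stratum_sample_size(len(members), population_size, sample_size)
        if size.denominator != 1:
            raise ValueError(f"share for {name!r} is not a whole number: {size}")
        combined.extend(rng.sample(members, int(size)))
    return combined


# Hypothetical labels for the 48 male and 32 female players in the text.
strata = {
    "male": [f"M{i}" for i in range(1, 49)],
    "female": [f"F{i}" for i in range(1, 33)],
}

sample = stratified_sample(strata, 10, random.Random(1))
print(sample)
```

Exact fractions are used instead of floating-point numbers so that the whole-number check is reliable. With these inputs, the combined sample contains 6 labels beginning with M and 4 beginning with F.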
Let's look at some examples where we examine our understanding of the definition of stratified random sampling.
Example 1: Determining If a Sampling Scenario Is Stratified Random Sampling
In a certain survey about the colleges that some high school students wish to join, a sample of 2 000 students was randomly selected out of a population of 40 000. Is that considered to be stratified sampling?
Answer
Stratified, or layered, random sampling is used when a population naturally subdivides into groups, or strata. Such a sample reflects the proportions of each stratum within the population. This is achieved by taking random samples from each stratum proportional to the size of the individual stratum within the population as a whole.
In this example, the population consists of 40 000 students. We do not have any information on whether or not the population was subdivided into strata, so we must assume that the random sample of 2 000 students was selected directly from the population. Therefore, this is not considered to be stratified sampling.
The result in the example above is useful in the context of our next question, where we examine the definition of stratified random sampling.
Example 2: Stratified Random Sampling
Which of the following is not true about stratified sampling?
- (A) Stratified random sampling is also called proportional random sampling.
- (B) Stratified random sampling allows researchers to obtain a sample population that best represents the entire population being studied.
- (C) Stratified sampling is the random selection of data from an entire population.
- (D) Stratified random sampling is a method of sampling that involves the division of a population into smaller subgroups known as strata.
- (E) The stratified random sample is a statistical measurement tool.
Answer
We recall that stratified, or layered, random sampling is a sampling method used when a population may be naturally subdivided into distinct, nonoverlapping smaller groups, or strata.
Random samples are taken from each individual stratum and combined to form an overall sample. The size of the random sample taken from each stratum reflects the size of that stratum within the population.
Let's now see whether each of the given options fits with this definition.
- (A) Stratified random sampling is also called proportional random sampling. (True or False?)
In stratified random sampling, the population of interest is split into groups or strata. The size of the sample taken from each stratum reflects the proportion of the population represented by that stratum. Therefore, it would not be incorrect to give stratified random sampling an alternate name such as proportional random sampling. Hence, statement A is true about stratified random sampling.
- (B) Stratified random sampling allows researchers to obtain a sample population that best represents the entire population being studied. (True or False?)
We use stratified random sampling when the population can be split into nonoverlapping groups or strata. The proportions of these groups within the population are calculated, and the same proportions are applied to the random samples taken from each group. This means that the different groups are represented proportionally within the final combined sample. Hence, no group should be either over- or underrepresented, and the sample reflects the proportional makeup of the whole population. Such a sample will best represent the entire population being studied. Hence, statement B is true about stratified random sampling.
- (C) Stratified sampling is the random selection of data from an entire population. (True or False?)
By definition, a stratified random sample is one that combines a number of individual samples taken from distinct groups within the population. The size of the sample from each group reflects the proportion of that group, or stratum, within the population. The data is, therefore, not randomly selected from an entire population. Hence, statement C about stratified sampling is false.
- (D) Stratified random sampling is a method of sampling that involves the division of a population into smaller subgroups known as strata. (True or False?)
By definition, stratified random sampling involves the population being divided into smaller subgroups. These smaller groups are known as strata and the size of the sample from each group reflects the size of that group within the population. Hence, statement D is true about stratified sampling.
- (E) The stratified random sample is a statistical measurement tool. (True or False?)
A stratified random sample reflects the proportions of the distinct subgroups, or strata, within a population. By measuring the population, and hence a sample, in this way, we maintain the proportions inherent within the population so that statistical results and predictions gained from the sample data reflect the true makeup of the population. By this token, the stratified random sample is a statistical measurement tool. Hence, statement E is true about stratified sampling.
Hence, we find that only statement C is not true about stratified sampling.
In our next example, we calculate the sample size for a stratum within a population.
Example 3: Calculating the Sample Size of a Stratum given the Proportion of Sample Needed
In an HR study about the salaries in a certain company with 1 000 employees, the employees were divided into males and females. If the total percentage of females in the company was 60 percent and a sample of 40 people was selected, what was the number of males in the sample?
Answer
Since the population, that is, the employees in the company, naturally subdivides into two strata, male and female, we use stratified, or layered, random sampling as the sampling method. This means that the sample reflects the proportions of male and female employees within the company.
Since 60 percent of the employees were female, 60 percent of the sample must also have been female. This means that the remainder, that is, $100 - 60 = 40$ percent, of the sample must have been male. We are told that the sample consisted of 40 people. Hence, 40 percent of those 40 people must have been male. That is, $40\% \times 40 = \frac{40}{100} \times 40 = 16$ people in the sample were male.
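The percentage form of the calculation can be checked with a short Python sketch. The helper name `size_from_percentage` is hypothetical; the figures are those given in the question.

```python
from fractions import Fraction


def size_from_percentage(percent, sample_size):
    """Stratum sample size when the stratum is `percent` percent of the population."""
    return Fraction(percent, 100) * sample_size


sample_size = 40
female_percent = 60
male_percent = 100 - female_percent  # the remainder of the company is male

females_in_sample = size_from_percentage(female_percent, sample_size)
males_in_sample = size_from_percentage(male_percent, sample_size)

print(f"females: {females_in_sample}, males: {males_in_sample}")
```

The two stratum samples together account for all 40 people in the sample, which is a quick sanity check on the arithmetic.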
Example 4: Calculating the Sample Size of a Stratum given the Stratum and Population Sizes
Adel needs to conduct a study to determine whether the students in his school like playing football. He decides to divide the students into two groups, boys and girls, knowing that the school has a total of 200 students, 80 of whom are girls.
If Adel decides that his sample size will be 50, how many girls should he select for the study?
Answer
Since the population of students is split into 2 distinct strata, that is, boys and girls, the appropriate sampling method is stratified, or layered, random sampling.
A stratified random sample is one that combines a number of separate random samples taken from distinct groups within the population. The sample size for each group reflects the proportion of that group, or stratum, within the population.
Applying this to our population of students, 80 out of 200 students are girls. Therefore, the proportion of girls is $\frac{80}{200} = 0.4$, which as a percentage is $0.4 \times 100\% = 40\%$.
This means that to reflect the proportions of boys and girls in the population, 40% of Adel's sample should be girls. Adel's sample size is 50 students and 40% of 50 is $\frac{40}{100} \times 50 = 20$.
Hence, Adel should select 20 girls for the study.
Note that we could have reached this conclusion in a slightly different way, using a formula for strata sample size. That is, for a population of $N$ elements and an overall sample size of $n$, the sample size, $s$, for a single stratum containing $m$ elements is $s = \frac{m}{N} \times n$.
In our case, $N = 200$, $m = 80$, and $n = 50$, so $s = \frac{80}{200} \times 50 = 20$.
In our next example, we apply stratified, or layered, random sampling to a population that has been divided into 3 groups.
Example 5: Sample Size of a Stratum given the Sample Size of Other Strata and the Population Size
A scientist decides to conduct a survey on the effects of a certain medicine in a city of 100 000 people. He divides them into three groups based on their region: city center, outer city, and suburbs. There are 10 000 people in the suburbs and 30 000 people in the outer city. If the scientist decides to take a sample of 1 000 people, how many people from the suburbs should be included?
Answer
Since the city is divided into three distinct groups, or strata, an appropriate sampling method is stratified, or layered, random sampling.
We recall that a stratified random sample is one that combines a number of separate random samples, taken from distinct groups within the population. The size of the sample from each group reflects the proportion of that group, or stratum, within the population.
In our case, we know the total population and the number of people in the suburbs and outer city, but not the city center:
- Population: 100 000
- Suburbs: 10 000
- Outer city: 30 000
- City center: ?
Although we do not need to know the number of people in the city center in order to answer the question, we note that there must be $100\,000 - (10\,000 + 30\,000) = 60\,000$ people in the city center.
The scientist wishes to take a representative sample of 1 000 people from the population, and we are asked how many of these should be selected from the suburbs. Applying stratified random sampling, the proportion of people from the suburbs in the sample must be the same as the proportion of people from the suburbs in the whole population. There are 10 000 people in the suburbs, and as a proportion of the total population, this is $\frac{10\,000}{100\,000} = 0.1$.
As a percentage, that is $0.1 \times 100\% = 10\%$. Hence, 10% of the sample should be people from the suburbs. If the sample size is 1 000 people, then 10% of this is $\frac{10}{100} \times 1\,000 = 100$.
Therefore, 100 people from the suburbs should be included in the sample.
Note that we could have reached this conclusion in a slightly different way, using a formula for strata sample size. That is, for a population of $N$ elements and an overall sample size of $n$, the sample size, $s$, for a single stratum containing $m$ elements is $s = \frac{m}{N} \times n$.
In our case, $N = 100\,000$, $m = 10\,000$, and $n = 1\,000$, so $s = \frac{10\,000}{100\,000} \times 1\,000 = 100$.
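Although the question asks only about the suburbs, the same formula gives the share for every stratum. The sketch below, which uses hypothetical variable names and the figures from the question, computes all three shares and confirms that they add up to the overall sample size.

```python
from fractions import Fraction

population_size = 100_000
sample_size = 1_000

# Stratum sizes given in the question; the city center is the remainder.
strata_sizes = {"suburbs": 10_000, "outer city": 30_000}
strata_sizes["city center"] = population_size - sum(strata_sizes.values())

# Proportional share for each stratum: (stratum size / population size) * sample size.
allocation = {
    name: Fraction(size, population_size) * sample_size
    for name, size in strata_sizes.items()
}

for name, size in allocation.items():
    print(f"{name}: {size}")
```

Checking that the shares sum to the intended sample size is a useful habit whenever a population is split into more than two strata.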
Related to stratified, or layered, random sampling is a method of random sampling used in estimating population sizes known as the capture-recapture method. Let's look at an example.
Suppose that as part of a large rehousing project, a cat rescue center wishes to estimate the population of stray cats within a particular urban area.
On one day, 20 stray cats are captured, tagged, and released. The next day, 12 cats are captured, 4 of which are found to be tagged. As a proportion, $\frac{4}{12} = \frac{1}{3} \approx 0.33$; that is, one-third, or approximately 33%, of the cats captured on day 2 had tags.
We can assume that the same proportion of cats were tagged from the whole population. Hence, we estimate that 33% of the population comprised 20 cats. If this is one-third of the population, then the total population is three times this. That is, $3 \times 20 = 60$ cats.
Definition: The Capture-Recapture Method for Estimating Population Size
Equating capture with random selection from a population to estimate the population size, $N$, let $M$ be the number of the population that are initially captured, tagged, and then released.
If $C$ is the number of members of the population that are subsequently captured and $R$ is the number of those that are found to be tagged, then the overall population size is given by $N = \frac{MC}{R}$.
In our stray cats example above, we have $M = 20$, $C = 12$, and $R = 4$. Hence, $N = \frac{MC}{R} = \frac{20 \times 12}{4} = 60$.
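The capture-recapture estimate can be sketched as a small Python function. The function and parameter names are hypothetical, and the guard against a zero count is an added assumption: the formula is undefined when none of the recaptured members carry a tag.

```python
from fractions import Fraction


def estimate_population(tagged, recaptured, tagged_in_recapture):
    """Estimate population size as (tagged * recaptured) / tagged_in_recapture."""
    if tagged_in_recapture == 0:
        raise ValueError("no tagged members were recaptured; the estimate is undefined")
    return Fraction(tagged * recaptured, tagged_in_recapture)


# Stray cats example: 20 tagged, 12 captured the next day, 4 of them tagged.
cats = estimate_population(20, 12, 4)
print(cats)  # 60
```

The result is an estimate only: it relies on the assumption, stated in the text, that the tagged proportion in the second capture matches the tagged proportion in the whole population.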
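We can define this method alternatively as a proportion: the fraction of the second capture found to be tagged equals the fraction of the whole population that was initially tagged. Writing $N$ for the population size, $M$ for the number initially tagged, $C$ for the number subsequently captured, and $R$ for the number of those found to be tagged, this is $\frac{R}{C} = \frac{M}{N}$, which rearranges to $N = \frac{MC}{R}$.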
Example 6: Using the Capture-Recapture Method to Estimate the Size of a Population
In an HR study about the salaries in a certain company, the employees are divided into males and females. The total percentage of females in the company is 60 percent. A sample of 10 employees is selected from the company. The males in that sample represent 5 percent of the males in the company. What is the total number of employees in that company?
Answer
To begin with, we note that 60 percent of employees in the company are female and that the employees are divided into males and females. This means that $100 - 60 = 40$ percent of employees must be male. If we let $N$ be the total number of employees in the company, then the number of male employees is 40 percent of $N$, that is, $\frac{40}{100} \times N$, or $0.4N$.
To find the total number of employees, $N$, we use the capture-recapture formula. This tells us that the population size $N = \frac{MC}{R}$, where $M$ is the number initially captured, tagged, and then released; $C$ is the number subsequently captured; and $R$ is the number of those found to be tagged.
In our case, identifying "all male employees" as those "captured, tagged, and released," we have $M = 0.4N$.
From the question, we know that our sample size, that is, the number subsequently "captured," $C$, is equal to 10. Further, the males in this subsequent sample represent 5 percent of the males in the company. This means that $R = 5\% \times 40\% \times N = 0.05 \times 0.4 \times N = 0.02N$.
Hence, we have $M = 0.4N$, $R = 0.02N$, $C = 10$.
Substituting these values into the capture-recapture formula for population size then gives us $N = \frac{0.4N \times 10}{0.02N} = 200$.
Hence, the total number of employees in the company is 200.
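As a cross-check, the same answer can be reached without the formula, assuming (as the question implies) that the sample of 10 reflects the company's 60:40 split. The sketch below is one way to lay out that reasoning; the variable names are hypothetical.

```python
from fractions import Fraction

sample_size = 10
male_share = Fraction(100 - 60, 100)  # 40 percent of employees are male

# Males in the sample, assuming the sample mirrors the company's proportions.
males_in_sample = male_share * sample_size

# Those males are 5 percent of all males in the company.
males_in_company = males_in_sample / Fraction(5, 100)

# Males are 40 percent of the company, so scale up to the whole workforce.
total_employees = males_in_company / male_share

print(total_employees)
```

This route gives 4 males in the sample and 80 males in the company, which agrees with the total of 200 found above.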
We complete this explainer by summarizing some of the key points.
Key Points
- A random sample is a subset of elements selected from a population, such that each member of the population has some chance of selection. A simple random sample is a sample in which each member of the population has an equal chance of selection.
- Stratified, or layered, random sampling is a sampling method used when a population may be subdivided into smaller distinct groups, or strata. A random sample is taken from each stratum, with the sample sizes in the same proportions as the strata within the population. These smaller samples are then combined to form a representative sample of the whole population.
- For a population of $N$ elements and an overall sample size of $n$, we use the following formula to calculate the sample size, $s$, for a single stratum containing $m$ elements: $s = \frac{m}{N} \times n$. Alternatively, if we know the percentage of the total population, $p\%$, that belong to a single stratum, the sample size for that stratum is given by $p\% \times n$.
- The capture-recapture method is a proportional sampling method used to estimate overall population size, $N$, such that $N = \frac{MC}{R}$. Here, $M$ is the number of population members initially captured, tagged, and released; $C$ is the number of population members subsequently captured; and $R$ is the number of those found to have been tagged.