Which sampling method makes use of the sampling interval taken every Kth unit from an ordered population?

Random Sampling

Andrew F. Siegel, Michael R. Wagner, in Practical Business Statistics (Eighth Edition), 2022

A systematic sample is obtained by selecting a single, random starting place in the frame and then taking units separated by a fixed interval. It is easy and convenient to select a sample by taking, say, every fifth unit from the frame, as illustrated in Fig. 8.5.2. It is even possible to introduce some randomness to this sampling method by selecting the starting place at random. But this systematic sampling method has some serious problems because it is impossible to assess its precision. If you wish to select a systematic sample of n from a population of N, your interval between selected items will be N/n (note 15). If you select the starting place as a random number between 1 and N/n, the sample average will be a reasonable estimate of the population mean in the sense that it will be unbiased; that is, it will not be regularly too high or too low. This is the good news.
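Here is a minimal Python sketch of the procedure just described. It assumes N is an exact multiple of n and that units are labelled 1 to N. The function name `systematic_sample` and the population values are hypothetical and serve only as an illustration. Enumerating all k = N/n possible starts shows why the sample average is unbiased. Each unit belongs to exactly one of the k equally likely samples, so the average of the k sample means equals the population mean.

```python
import random
from statistics import mean

def systematic_sample(N, n, start=None, rng=random):
    """Return 1-based labels of a 1-in-k systematic sample, with k = N // n."""
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n
    if start is None:
        start = rng.randint(1, k)  # random start between 1 and k inclusive
    return list(range(start, N + 1, k))

# Hypothetical population values for units 1..20.
values = [12, 7, 9, 15, 3, 8, 11, 6, 14, 10, 5, 13, 9, 4, 16, 7, 8, 12, 6, 11]
N, n = len(values), 5
k = N // n

# Mean of every possible systematic sample, one per start 1..k.
sample_means = [mean(values[i - 1] for i in systematic_sample(N, n, start=s))
                for s in range(1, k + 1)]

print(systematic_sample(N, n, start=3))  # [3, 7, 11, 15, 19]
# Averaging over the k equally likely starts recovers the population mean.
print(mean(sample_means), mean(values))
```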


Fig. 8.5.2. A systematic sample is made through regular selection from the population. In this case, every fifth population unit is selected, beginning with number 3 of the frame.

The bad news is that you cannot know how good your estimate is. When you ask, “What’s the standard error?” the answer is, “Who knows? The sample is not sufficiently random.” In the words of W. Edwards Deming (who is famous for, among other things, bringing quality to Japanese products):

One method of sampling, used much in previous years, by me as well as by others, was to take a random start and every kth sampling unit thereafter (a patterned or systematic sample) …. As there is no replication, there is no valid way to compute an unbiased estimate of the variance of an estimate made by this procedure …. The replicated method [random sampling] is so simple to apply that there is no point in taking a chance with an estimate that raises questions. (note 16)

One way in which systematic sampling can fail is when the list is ordered in an important, meaningful way. In this case, your random start determines how large your estimate will be so that a low starting number, for example, guarantees a low estimate.
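The sketch below illustrates the ordering problem with a hypothetical frame sorted by size, in which unit i simply has value i. In a 1-in-10 sample, the sample mean rises with the random start. It is 46 for start 1 and 55 for start 10, while the population mean is 50.5.

```python
from statistics import mean

# Hypothetical frame sorted by size: unit i has value i (units 1..100).
values = list(range(1, 101))
k = 10  # 1-in-10 systematic sample, so n = 10

# Sample mean for each possible random start 1..k.
means_by_start = {s: mean(values[s - 1::k]) for s in range(1, k + 1)}
for s, m in means_by_start.items():
    print(s, m)  # the estimate grows steadily with the starting number
```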

A more serious failure of systematic sampling occurs if there is a repetitive pattern in the frame that matches your sampling interval. For example, if every 50th car that is produced gets special care and attention along the assembly line, and if you just happen to select every such 50th car to be in your systematic sample, your results will be completely useless in terms of representing the quality of typical cars.
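The next sketch shows the periodicity problem using a hypothetical production run. Every 50th car receives special care and has no defects, and every other car has two. A 1-in-50 sample whose start happens to land on car 50 sees only the special cars. Every other start misses them entirely, so no start reproduces the population mean of 1.96.

```python
from statistics import mean

# Hypothetical production line of 5,000 cars: every 50th car (50, 100, ...)
# gets special care and has 0 defects; every other car has 2.
N, k = 5000, 50
defects = [0 if car % 50 == 0 else 2 for car in range(1, N + 1)]

def systematic_mean(start):
    # Mean defects in the 1-in-k systematic sample beginning at car `start`.
    return mean(defects[start - 1::k])

print(mean(defects))        # population mean: 1.96
print(systematic_mean(50))  # start 50 picks only the special cars: 0
print(systematic_mean(7))   # any other start misses them entirely: 2
```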

So the reviews of systematic sampling are mixed. You might feel justified in using a systematic sample if (1) you are reasonably sure that there is no important ordering in the frame, (2) there are no important repetitive patterns in the frame, (3) you do not need to assess the quality of your estimate, and (4) you are sure that nobody will challenge your wisdom in selecting a systematic instead of a random sample.

Because a proper random sample will usually not cost very much more than a systematic sample, you may wonder why systematic samples are still used in some areas of business. So do we.


URL: https://www.sciencedirect.com/science/article/pii/B9780128200254000087

Systematic Sampling

Raghunath Arnab, in Survey Sampling Theory and Applications, 2017

4.6.1.5 Splitting of a Systematic Sample

Koop (1971) divided the systematic sample into two subsamples, each of size n/2 (assuming n is even), and proposed the following variance estimator:

(4.6.6) $\hat{v}_6 = \frac{1}{4}(\bar{y}_A - \bar{y}_B)^2$

where $\bar{y}_A$ and $\bar{y}_B$ are the sample means based on the odd and even labels of the systematic sample s. Koop (1971) derived an expression for the bias of this estimator, relative to the variance, in terms of the intraclass correlation coefficient.

Now, noting that $\bar{y}_s = \frac{1}{2}(\bar{y}_A + \bar{y}_B)$, we find

$V(\bar{y}_s) = \frac{1}{2}(1+\rho_0)\sigma_0^2 = \frac{1+\rho_0}{1-\rho_0}E(\hat{v}_6)$

where $\sigma_0^2 = V(\bar{y}_A) = V(\bar{y}_B)$ and $\rho_0$ is the correlation coefficient between the subsample means. If $\rho_0$ is known from a past survey, an almost unbiased estimator of $V(\bar{y}_s)$ is given by

(4.6.7) $\hat{v}_7 = \hat{V}(\bar{y}_s) = \frac{1+\rho_0}{1-\rho_0}\hat{v}_6$
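The following is a short sketch of Koop's split-sample estimators (4.6.6) and (4.6.7). It makes three assumptions: the odd and even labels refer to positions within the systematic sample s (1st, 3rd, 5th, ... versus 2nd, 4th, 6th, ...), n is even, and a value of $\rho_0$ is supplied from a past survey. The function name, the sample values, and $\rho_0 = 0.2$ are hypothetical.

```python
from statistics import mean

def koop_variance_estimates(sample, rho0):
    """Split a systematic sample into odd- and even-positioned halves and
    return (v6, v7) as in Eqs. (4.6.6) and (4.6.7)."""
    if len(sample) % 2 != 0:
        raise ValueError("n must be even so that each half has n/2 units")
    y_a = mean(sample[0::2])  # 1st, 3rd, 5th, ... units of s
    y_b = mean(sample[1::2])  # 2nd, 4th, 6th, ... units of s
    v6 = 0.25 * (y_a - y_b) ** 2
    v7 = (1 + rho0) / (1 - rho0) * v6  # rho0 assumed known from a past survey
    return v6, v7

# Hypothetical systematic sample of n = 6 values and an assumed rho0 = 0.2.
v6, v7 = koop_variance_estimates([10, 12, 9, 14, 11, 13], rho0=0.2)
print(v6, v7)
```

Here the two half-sample means are 10 and 13, so $\hat{v}_6 = 2.25$, and inflating it by $(1+\rho_0)/(1-\rho_0) = 1.5$ gives roughly 3.375.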


URL: https://www.sciencedirect.com/science/article/pii/B9780128118481000042

William L. Thompson, ... Charles Gowan, in Monitoring Vertebrate Populations, 1998

2.3.3 SYSTEMATIC SAMPLING WITH A RANDOM START

A sampling procedure designed to simplify the selection process and to select plots distributed across the sampling frame is called systematic sampling (Scheaffer et al., 1990). A systematic sampling procedure randomly chooses a plot from the first $k_s$ plots and then includes every $k_s$th plot thereafter (Scheaffer et al., 1990). Therefore, systematic sampling as we use it implies a random starting point. The choice of $k_s$ depends on the desired sample size; if there are 500 sampling units and the desired sample size is 20, then $k_s$ would be 500/20 = 25.

If the total number of plots is not a multiple of $k_s$, then drawing a 1-in-$k_s$ sample will produce biased estimates. The reason is that not all systematic samples will contain the same number of plots. For instance, suppose we wish to take a 1-in-4 systematic sample from a sampling frame of 10 plots. There are four possible systematic samples: (1, 5, 9), (2, 6, 10), (3, 7), and (4, 8). Thus, two of the possible samples are based on 3 plots, whereas the other two are based on 2 plots, and this results in biased estimates. However, Cochran (1977) suggested that the magnitude of this bias is trivial when sample sizes exceed 50 and is likely insignificant even when sample sizes are small. Therefore, we will ignore this possible source of bias in subsequent discussions. We refer interested readers to Cochran (1977, p. 206) and Levy and Lemeshow (1991, pp. 82–84) for descriptions of two modifications of the systematic sampling method that always yield unbiased estimates of means, totals, and proportions. Further, the Horvitz-Thompson estimator (Horvitz and Thompson, 1952) listed in Appendix C may be used to achieve the same result (Overton and Stehman, 1995).
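The 10-plot, 1-in-4 example can be checked by enumeration. The counts below are hypothetical, with animals only on plots 9 and 10, and were chosen to make the bias visible. Averaging the four equally likely sample means gives 5/3, about 1.67, whereas the true mean count per plot is 2.

```python
from statistics import mean

def all_systematic_samples(N, k):
    # One 1-in-k sample per possible start 1..k (plots labelled 1..N).
    return [list(range(start, N + 1, k)) for start in range(1, k + 1)]

samples = all_systematic_samples(10, 4)
print(samples)  # [[1, 5, 9], [2, 6, 10], [3, 7], [4, 8]]

# Hypothetical counts: only plots 9 and 10 contain animals.
counts = [0, 0, 0, 0, 0, 0, 0, 0, 10, 10]
sample_means = [mean(counts[i - 1] for i in s) for s in samples]

# Each start has probability 1/4, so E[sample mean] is the plain average.
expected_mean = mean(sample_means)
true_mean = mean(counts)
print(expected_mean, true_mean)  # about 1.67 versus 2: the estimator is biased
```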

Systematic sampling does not require a well-defined sampling frame. If distance or dimension data are available for an area of interest, then a 1-in-$k_s$ sample may be drawn based on distance. For instance, suppose a biologist wishes to select five sampling points using a 1-in-10 systematic sampling design along 50 km of a riparian area. She randomly chooses a point within the first 10 km (e.g., 2.5 km) to use as her starting point. She then places sampling points at 2.5, 12.5, 22.5, 32.5, and 42.5 km.
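The next sketch shows the distance-based version. It assumes a straight 50-km line and a 10-km interval, and `systematic_points` is a hypothetical helper. Passing `start_km=2.5` reproduces the biologist's points. Omitting it draws the start uniformly from the first interval.

```python
import random

def systematic_points(length_km, interval_km, start_km=None, rng=random):
    """Sampling points every `interval_km` along a line of `length_km`,
    beginning at a random point within the first interval."""
    if start_km is None:
        start_km = rng.uniform(0, interval_km)  # random start in the first interval
    points = []
    x = start_km
    while x < length_km:
        points.append(x)
        x += interval_km
    return points

print(systematic_points(50, 10, start_km=2.5))  # [2.5, 12.5, 22.5, 32.5, 42.5]
```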

Although a single systematic sample improves sample coverage over a frame, it is based on only a single random point, so its variance estimates must be approximated. Just as a minimum of 2 plots must be randomly selected from each stratum to directly compute variance estimates, so too must at least 2 random points be selected for systematic sampling. The variance approximation for a single systematic sample is based on the variance estimator for a simple random sample and is unbiased only if animals are randomly distributed across a frame. The magnitude of this bias will depend on the underlying distribution of animals.

Drawing a single systematic sample from a frame with either periodically arranged or spatially autocorrelated individuals will likely produce biased estimates of abundance and variance. Applying the Horvitz-Thompson estimator (Horvitz and Thompson, 1952) will yield unbiased abundance estimates regardless of the underlying arrangement of animals (Overton and Stehman, 1995); however, variance estimates still would be underestimated. Spatial autocorrelation can occur due to habitat. That is, two adjacent units occurring in habitat supporting a large number of animals are much more likely to both contain high numbers than two nonadjacent units occurring in different habitats with different animal densities. An example of periodicity could be the pool–riffle–run sequence in rivers. However, because these stream units vary in size, a systematic sample based on distances is unlikely to coincide exactly with only one or mostly one stream habitat type. Periodic variation is unlikely to occur in most situations because of the clumped distributions of many biological populations (Milne, 1959; Krebs, 1989). Further, periodicity will probably not occur at the scale at which most fish and wildlife populations are sampled.

Taking repeated systematic samples will yield unbiased estimates of abundance and variance in any situation. Instead of randomly selecting 1 point or plot within the first $k_s$, we could randomly select more than 1 and use the resulting counts from each to calculate an overall estimate of abundance and variance (Fig. 2.6). Alternatively, multiple starting points could be selected across the sampling frame (Scheaffer et al., 1990). Unfortunately, on average a repeated systematic sample will yield a less precise estimate than a single one based on the same number of plots. For instance, a single sample of 10 plots will produce a more precise estimate than two samples of 5 plots each. Larger repeated samples would produce more precise estimates, but would also increase the overall sample size. Estimators for repeated systematic sampling are similar to those for simple random sampling, except that they are based on multiple samples rather than simply multiple observations within a sample. Hence, the notation must be modified as follows (Scheaffer et al., 1990, pp. 221–222):


Figure 2.6. A repeated systematic sample, each sample containing five plots, from a sampling frame.

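(2.7) $\bar{N}^* = \frac{\sum_{j=1}^{n_s} \bar{N}_j}{n_s}$

(2.8) $\hat{N} = U \times \bar{N}^*$

and

(2.9) $\widehat{\mathrm{Var}}(\hat{N}) = U^2\left(1 - \frac{u}{U}\right)\frac{\sum_{j=1}^{n_s}(\bar{N}_j - \bar{N}^*)^2}{n_s(n_s - 1)}$

where $\bar{N}_j$ is the arithmetic mean of the jth systematic sample, $n_s$ is the number of systematic samples chosen, and $\bar{N}^*$ is the overall mean across all systematic samples chosen. All other terms are defined as before.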

Selecting a repeated systematic sample begins with setting its sampling interval, $k_s^*$. This interval is obtained by multiplying the number of repeated systematic samples chosen by the sampling interval for a single systematic sample ($k_s = U \div u$), so $k_s^* = n_s k_s$. Then, $n_s$ random starting points are selected from the first $k_s^*$ plots. Finally, $k_s^*$ is added repeatedly as a constant to each random starting value until $u/n_s$ numbers have been picked between each starting point and U. For example, say we wanted $n_s = 2$ systematic samples totaling $u = 10$ plots from a sampling frame of $U = 100$ plots. Our sampling interval for a single systematic sample would be $k_s = 100 \div 10 = 10$, or 1-in-10; hence, the sampling interval for 2 systematic samples would be $k_s^* = 2 \times 10 = 20$, or 1-in-20. We then randomly choose two numbers without replacement between 1 and 20, say, 3 and 19. Next, we generate our string of 10 ÷ 2 = 5 plot values for each systematic sample by consecutively adding 20 to each starting point. Thus, our first systematic sample would contain plot numbers 3, 23, 43, 63, and 83, whereas our second would contain 19, 39, 59, 79, and 99. We further illustrate this procedure in Example 2.3, although an actual survey may require a larger sample size (also see Scheaffer et al., 1990, pp. 222–223). The sample size formula for repeated systematic sampling is in Appendix C.
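The selection steps above can be sketched as follows. The sketch assumes plots are labelled 1 to U and that U/u and u/$n_s$ are integers, and `repeated_systematic_sample` is a hypothetical name. Supplying the starts 3 and 19 reproduces the two samples in the worked example. Omitting them draws $n_s$ distinct random starts from 1 to $k_s^*$.

```python
import random

def repeated_systematic_sample(U, u, ns, starts=None, rng=random):
    """Select ns systematic samples totalling u plots from a frame of U plots
    (plots labelled 1..U)."""
    if U % u != 0 or u % ns != 0:
        raise ValueError("this sketch assumes U/u and u/ns are integers")
    ks = U // u          # interval for a single systematic sample
    ks_star = ns * ks    # interval used by each of the ns repeated samples
    if starts is None:
        # ns distinct random starts between 1 and ks_star, without replacement
        starts = sorted(rng.sample(range(1, ks_star + 1), ns))
    per_sample = u // ns
    return [[s + j * ks_star for j in range(per_sample)] for s in starts]

# Reproduce the worked example: U = 100, u = 10, ns = 2, starts 3 and 19.
print(repeated_systematic_sample(100, 10, 2, starts=[3, 19]))
# [[3, 23, 43, 63, 83], [19, 39, 59, 79, 99]]
```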

Even though systematic sampling usually provides better coverage than simple random sampling, we can see from Fig. 2.6 and Example 2.3 that this is not always the case. That is, even a repeated systematic sample missed the majority of the clumped distribution of “animals” in Fig. 2.6 (i.e., a spatially autocorrelated population). A remedy for this is to stratify the frame and then take repeated systematic samples within each stratum. This would largely ensure good coverage of the frame and avoid the problems that occurred in Example 2.3.

Example 2.3. One-Stage Repeated Systematic Sample

Two systematic samples were taken from a sampling frame containing 100 plots (Fig. 2.6). The sampling interval for each, $k_s$, was set at 10 plots to correspond to an overall sample size of 100/10 = 10; therefore, the overall sampling interval was $k_s^* = (2)(10) = 20$. Two random numbers were chosen between 1 and 20 and used as the starting points for the two samples. The first systematic sample had counts of 0, 0, 0, 0, and 1; the second had counts of 1, 0, 1, 0, and 0. Therefore, the mean of the first sample, $\bar{N}_1$, was (0 + 0 + 0 + 0 + 1)/5 = 0.2, whereas the mean of the second, $\bar{N}_2$, was (1 + 0 + 1 + 0 + 0)/5 = 0.4. The overall mean across both samples was:

$\bar{N}^* = \frac{0.2 + 0.4}{2} = 0.3.$

The estimates for abundance and variance were

$\hat{N} = (100)(0.3) = 30$

and

$\widehat{\mathrm{Var}}(\hat{N}) = (100)^2\left(1 - \frac{10}{100}\right)\frac{(0.2 - 0.3)^2 + (0.4 - 0.3)^2}{2(2 - 1)} = 90.0.$

Although the above variance estimate is considerably smaller than those in Example 2.2, the abundance estimate is not even close to the true abundance of 100. A better approach would be to stratify the sampling frame and then take a repeated systematic sample within each stratum.
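To check the arithmetic of Example 2.3, the sketch below implements Eqs. (2.7)–(2.9) and feeds it the counts from the two samples. It uses the $U^2$ factor that the worked variance calculation applies. The function name is hypothetical. It reproduces $\hat{N} = 30$ and a variance estimate of 90.0.

```python
from statistics import mean

def repeated_systematic_estimates(sample_counts, U, u):
    """Eqs. (2.7)-(2.9): abundance and variance from ns systematic samples.

    sample_counts: one list of plot counts per systematic sample.
    U: plots in the frame; u: total plots sampled.
    """
    ns = len(sample_counts)
    sample_means = [mean(c) for c in sample_counts]       # N-bar_j
    overall_mean = sum(sample_means) / ns                  # Eq. (2.7)
    n_hat = U * overall_mean                               # Eq. (2.8)
    ss = sum((m - overall_mean) ** 2 for m in sample_means)
    var_hat = U ** 2 * (1 - u / U) * ss / (ns * (ns - 1))  # Eq. (2.9)
    return n_hat, var_hat

# Counts from Example 2.3.
n_hat, var_hat = repeated_systematic_estimates(
    [[0, 0, 0, 0, 1], [1, 0, 1, 0, 0]], U=100, u=10)
print(round(n_hat, 1), round(var_hat, 1))  # 30.0 90.0
```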

Single systematic samples are commonly used in fish and wildlife studies when a systematic scheme is employed. As stated previously, periodicity in animal distributions is probably not a problem but spatial autocorrelation likely is (e.g., Fig. 2.6). Stratifying a frame by habitat may alleviate this difficulty, but without knowing the exact distribution of animals, stratification alone is probably not the answer. Although a repeated systematic sample within each stratum does not offer the precision of a single systematic sample, it will produce an unbiased estimate of variance; data can be collected from two systematic samples at the same time as long as plot information from each sample is recorded separately, so there should not be a loss in efficiency when conducting a repeated sample. Therefore, we recommend a repeated systematic sample when a systematic design is used.


URL: https://www.sciencedirect.com/science/article/pii/B9780126889604500022

Spatial Sampling

Peter A. Rogerson, in Encyclopedia of Social Measurement, 2005

Systematic Spatial Sampling

With a list of N sample elements, a systematic sample of n may be chosen by dividing the list into n equal parts (where for simplicity it is assumed that N/n is an integer). The first observation is taken randomly from among the first N/n on the list; suppose we label this observation x, where 1 ≤ x ≤ N/n. Then the remainder of the sample is chosen by taking as the next observations x + N/n, x + 2 N/n, x + 3 N/n, and so on.

Systematic sampling in such aspatial situations is often done as a matter of convenience. Generalizations of systematic sampling to the spatial case are desirable because they ensure comprehensive coverage of the study area. In addition, the likelihood of collecting redundant information from spatially dependent nearby locations is reduced to a minimum.

One approach to systematic spatial sampling is shown in Fig. 4. A study area is first divided into square cells, and then a point (e.g., point A) is chosen randomly within the first cell. Points are next chosen at the same relative locations in the remaining cells. There are numerous variations of this procedure. For example, when the sampling points are taken to be the center of each cell, the design is known as a centric systematic sample.
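The following sketch shows the aligned design described above. It assumes a rectangular study area whose sides are whole multiples of the cell size, and the function name and parameters are hypothetical. One random offset, playing the role of point A, is drawn in the first cell and reused in every cell. Setting `centric=True` switches to the centric systematic sample, which uses the cell centres.

```python
import random

def systematic_spatial_sample(width, height, cell, centric=False, rng=random):
    """Aligned systematic sample over a width x height area split into square
    cells of side `cell`: one offset is chosen in the first cell and reused
    in every cell; centric=True places every point at its cell centre."""
    if centric:
        dx = dy = cell / 2
    else:
        dx, dy = rng.uniform(0, cell), rng.uniform(0, cell)  # random point A
    nx, ny = int(width // cell), int(height // cell)
    return [(i * cell + dx, j * cell + dy) for j in range(ny) for i in range(nx)]

print(systematic_spatial_sample(30, 20, 10, centric=True))
# [(5.0, 5.0), (15.0, 5.0), (25.0, 5.0), (5.0, 15.0), (15.0, 15.0), (25.0, 15.0)]
```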


Figure 4. Systematic spatial sampling.

One potential though uncommon difficulty with systematic spatial sampling is that spatial periodicities may affect the estimate. For example, suppose that the housing prices in an area are a function of location, and in particular are a function of elevation. There are some areas where housing prices are higher at higher elevations (because of the scenic amenities), and others where housing prices are higher at lower elevations (due to accessibility considerations). In either case, if hills and valleys are systematically spaced at roughly equal distances apart, it could be a mistake to take a systematic sample of housing prices, because the sampled locations might correspond entirely to high (or low) elevation locations. Judicious choice of the sampling interval is therefore called for. Where this possibility exists, the problem may be avoided entirely by geographic stratification.


URL: https://www.sciencedirect.com/science/article/pii/B0123693985003467

Study Designs

Ronald N. Forthofer, ... Mike Hernandez, in Biostatistics (Second Edition), 2007

6.3.4 Systematic Sampling

Systematic sampling is commonly used as an alternative to SRS because of its simplicity. It selects every kth element after a random start. Its procedural tasks are simple, and the process can easily be checked, whereas it is difficult to verify SRS by examining the results. It is often used in the final stage of multistage sampling when the field worker is instructed to select a predetermined proportion of units from the listing of dwellings in a street block. The systematic sampling procedure assigns each element in a population the same probability of being selected. This assures that the sample mean will be an unbiased estimate of the population mean when the number of elements in the population (N) is equal to k times the number of elements in the sample (n). If N is not exactly nk, then the equal probability is not guaranteed, although this problem can be ignored when N is large. When N is not exactly nk, we can use the circular systematic sampling scheme. In this scheme, the random starting point is selected between 1 and N (any element can be the starting point) and every kth element is selected assuming that the frame is circular (i.e., the end of list is connected to the beginning of the list).

Example 6.9

Suppose that we are taking a 1-in-4 systematic sample from a population of 11 elements: A, B, C, D, E, F, G, H, I, J, and K. Four possible samples can be drawn using the ordinary systematic sampling scheme and 11 possible samples using the circular systematic sampling. The possible samples and their selection probabilities using the ordinary systematic sampling and circular systematic sampling are shown in Table 6.2.

Table 6.2. Possible samples and selection probabilities taking 1-in-4 systematic samples from N = 11, using two different selection schemes.

| Ordinary systematic sampling: sample | Selection probability | Circular systematic sampling: sample | Selection probability |
|---|---|---|---|
| 1. A E I | 3/11 | 1. A E I | 3/11 |
| 2. B F J | 3/11 | 2. B F J | 3/11 |
| 3. C G K | 3/11 | 3. C G K | 3/11 |
| 4. D H | 2/11 | 4. D H A | 3/11 |
| | | 5. E I B | 3/11 |
| | | 6. F J C | 3/11 |
| | | 7. G K D | 3/11 |
| | | 8. H A E | 3/11 |
| | | 9. I B F | 3/11 |
| | | 10. J C G | 3/11 |
| | | 11. K D H | 3/11 |

Ordinary systematic sampling does not guarantee equal probability sampling. For example, here the fourth sample has a different selection probability. Under the circular systematic sampling, each element can be a starting point and equal probability sampling is guaranteed in this scheme.
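Table 6.2 can be regenerated with a short sketch. It assumes the circular scheme keeps n = 3 elements per sample, as in the table, and wraps indices modulo N. The helper names are hypothetical. The enumeration shows that the ordinary scheme's fourth sample has only two elements, while every element appears in exactly 3 of the 11 equally likely circular samples.

```python
from collections import Counter

LABELS = "ABCDEFGHIJK"  # the N = 11 elements of Example 6.9
N, k, n = len(LABELS), 4, 3

def ordinary_sample(start):
    # Start in 1..k, then take every kth element until the list runs out.
    return [LABELS[i - 1] for i in range(start, N + 1, k)]

def circular_sample(start):
    # Start in 1..N and wrap past the end of the list so n elements are always taken.
    return [LABELS[(start - 1 + j * k) % N] for j in range(n)]

ordinary = [ordinary_sample(s) for s in range(1, k + 1)]
circular = [circular_sample(s) for s in range(1, N + 1)]

print(["".join(s) for s in ordinary])  # ['AEI', 'BFJ', 'CGK', 'DH']
print("".join(circular[3]))            # DHA (sample 4 in Table 6.2)

# Under the circular scheme each element appears in exactly n of the N samples.
appearances = Counter(x for s in circular for x in s)
print(set(appearances.values()))       # {3}
```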

Systematic sampling is convenient to use, but it can give an unrealistic estimate when the elements in the frame are listed in a cyclical manner with respect to a survey variable and the selection interval coincides with the listing cycle. For example, if one selects every 40th patient coming to a clinic and the average daily patient load is about 40, then the resulting sample would contain only those who came to the clinic at a certain time of the day. Such a sample may not be representative of the clinic patients. Moreover, even when the listing is randomly ordered, unlike SRS, different sets of elements may have unequal inclusion probabilities. For example, the probability of including both the ith and (i + k)th element is 1/k in a systematic sample, whereas the probability of including both the ith and (i + k + 1)th element is zero. This situation complicates the variance calculation.

Another way of viewing systematic sampling is that it is equivalent to selecting one cluster from k systematically formed clusters of n elements each. The sampling variance (between clusters) cannot be estimated from the one cluster selected. Thus, variance estimation from a systematic sample requires special strategies.
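Both points above can be checked by treating the k possible systematic samples as clusters. The sketch below does this with hypothetical helper names, using N = 40 and k = 4 as an illustration. Units i and i + k always share a cluster, so they are selected together with probability 1/k. Units i and i + k + 1 never share a cluster.

```python
def systematic_clusters(N, k):
    # The k possible 1-in-k systematic samples: cluster s holds s, s+k, s+2k, ...
    return [set(range(s, N + 1, k)) for s in range(1, k + 1)]

def joint_inclusion(i, j, N, k):
    # Probability that units i and j are both selected; each cluster has chance 1/k.
    clusters = systematic_clusters(N, k)
    return sum(1 for c in clusters if i in c and j in c) / k

N, k = 40, 4
print(joint_inclusion(5, 5 + k, N, k))      # 0.25, i.e. 1/k
print(joint_inclusion(5, 5 + k + 1, N, k))  # 0.0
```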

A modification to overcome these problems with systematic sampling is the so-called repeated systematic sampling. Instead of taking a systematic sample in one pass through the list, several smaller systematic samples are selected going down the list several times with a new starting point in each pass. This procedure not only guards against possible periodicity in the frame but also allows variance estimation directly from the data. The variance of an estimate from all subsamples can be estimated from the variability of the separate estimates from each subsample.


URL: https://www.sciencedirect.com/science/article/pii/B978012369492850011X

Enumeration Methods

William L. Thompson, ... Charles Gowan, in Monitoring Vertebrate Populations, 1998

Line-Intercept Sampling

Becker (1991) presented a method of obtaining unbiased estimates of abundance that was based on the probability of transects intercepting animal tracks. The first design applies when an animal's set of tracks can be identified and followed from its beginning to its end. The second assumes that the number of animals crossing a transect line can be estimated based on movement data from a random sample of radio-marked animals. We will discuss the first design in more detail.

The probability of the kth track being intercepted depends on the horizontal distance ($d_k$) from one end of the track to the other compared with the width (W) of the x axis of a plot or study area boundary (Fig. 3.11). A repeated systematic random sample, based on a predetermined number of transects, is conducted on each plot or frame to obtain unbiased population estimates. Transects should be of equal length and spaced far enough apart so that tracks will not cross more than one transect.


Figure 3.11. A single systematic sample of three transects within a study area. Four sets of tracks are intercepted by transects but horizontal distances of only three are included; not enough of the fourth set of tracks is contained within the boundaries to warrant its inclusion in the sample. Note that the horizontal distance (d2) of the second set of tracks is a sum of two segments because the entire set of tracks is only partially contained within the boundaries.

The probability ($p_k$) that the kth track is included in one of the systematic samples of transects is computed using the formula (Becker, 1991, p. 732)

(3.6) $p_k = \frac{d_k}{W/q}$

for tracks with horizontal distances less than or equal to W/q ($p_k = 1$ otherwise), where q is the number of transects surveyed in each systematic sample and each track is assumed to be associated with a single individual. The abundance estimate for each systematic sample is calculated using

(3.7) $\hat{N}_j = \sum_{k \in S_j} \frac{1}{p_k}$

where $\sum_{k \in S_j}$ means that the quantity is summed over all tracks contained in the jth systematic sample ($S_j$). The previous equation may be redefined in terms of groups if more than one individual is associated with a given set of tracks: the “1” in the numerator would be replaced with $g_k$, the size of the group associated with the kth track. In either case, the total abundance is estimated by (Becker, 1991, p. 732)

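(3.8) $\hat{N} = \frac{\sum_{j=1}^{n_s} \hat{N}_j}{n_s}$

with a variance estimator of

(3.9) $\widehat{\mathrm{Var}}(\hat{N}) = \frac{\sum_{j=1}^{n_s} (\hat{N}_j - \hat{N})^2}{n_s(n_s - 1)}$

where $n_s$ is the number of systematic samples. Use of these formulas is demonstrated in Example 3.4.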

Example 3.4 Line-Intercept Sampling of Animal Tracks

Becker (1991) presented an example of line-intercept sampling for tracks applied to a wolverine population in southcentral Alaska. Four systematic samples were conducted on the study area that was 58 km wide (i.e., W = 58 km). Each sample was composed of q = 3 transects surveyed for tracks via airplane. Four sets of wolverine tracks, two of which were made by a group of two animals, were intercepted by transects during the course of the entire survey and yielded the following data (modified from Table 2 in Becker, 1991. p. 734).

| Set no. | Group size | Horizontal distance of tracks (km) | Probability of interception |
|---|---|---|---|
| 1 | 1 | 8.75 | 0.453 |
| 2 | 2 | 12.25 | 0.634 |
| 3 | 2 | 3.50 | 0.181 |
| 4 | 1 | 9.75 | 0.504 |

The probability that a given set of tracks was intercepted by a systematic sample of transects was computed using Eq. (3.6). For example, $p_k$ for the first set of tracks was $p_1$ = (8.75 km)/(58 km/3) = 0.453. The first, second, and fourth sets of tracks were encountered in the first systematic sample; the first and second sets were intercepted in the second sample; and the third and fourth sets were encountered in both the third and fourth systematic samples. We use these data in Eq. (3.7), modified for groups (i.e., the numerator now reflects group size and may be greater than 1), to obtain abundance estimates for each systematic sample. The estimated abundance for the first systematic sample was

$\hat{N}_1 = \frac{1}{0.453} + \frac{2}{0.634} + \frac{1}{0.504} = 7.35,$

with $\hat{N}_2 = 5.36$, $\hat{N}_3 = 13.03$, and $\hat{N}_4 = 13.03$. Therefore, the overall abundance estimate [Eq. (3.8)] was $\hat{N} = (7.35 + 5.36 + 13.03 + 13.03)/4 = 9.69$, with a variance estimate [Eq. (3.9)] of

$\widehat{\mathrm{Var}}(\hat{N}) = \frac{(7.35 - 9.69)^2 + \cdots + (13.03 - 9.69)^2}{4(4 - 1)} = 3.88.$
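The sketch below reproduces Example 3.4 from Eqs. (3.6)–(3.9), with group sizes in the numerator of Eq. (3.7). The helper names are hypothetical. The interception probabilities are rounded to three decimals to match the table. Without that rounding, the variance comes out near 3.87 rather than 3.88.

```python
def interception_probability(d_km, W_km, q):
    # Eq. (3.6): p_k = d_k / (W/q), capped at 1 when d_k exceeds W/q.
    return min(1.0, d_km / (W_km / q))

def becker_estimates(samples, W_km, q):
    """samples: for each systematic sample, a list of (group_size, d_km)
    for the track sets it intercepted. Returns (N_hat, var_hat, N_j list)."""
    n_j = []
    for tracks in samples:
        # Eq. (3.7) with group size g_k in the numerator; p_k rounded to
        # three decimals to mirror the published table.
        n_j.append(sum(g / round(interception_probability(d, W_km, q), 3)
                       for g, d in tracks))
    ns = len(n_j)
    n_hat = sum(n_j) / ns                                           # Eq. (3.8)
    var_hat = sum((x - n_hat) ** 2 for x in n_j) / (ns * (ns - 1))  # Eq. (3.9)
    return n_hat, var_hat, n_j

# Example 3.4 data: track set -> (group size, horizontal distance in km).
track = {1: (1, 8.75), 2: (2, 12.25), 3: (2, 3.50), 4: (1, 9.75)}
samples = [
    [track[1], track[2], track[4]],  # first systematic sample
    [track[1], track[2]],            # second
    [track[3], track[4]],            # third
    [track[3], track[4]],            # fourth
]
n_hat, var_hat, n_j = becker_estimates(samples, W_km=58, q=3)
print([round(x, 2) for x in n_j], round(n_hat, 2), round(var_hat, 2))
# [7.35, 5.36, 13.03, 13.03] 9.69 3.88
```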

Critical assumptions for this method are that all animals move during the course of the study, all animal tracks of the species of interest are readily recognizable, all animal tracks are continuous, animal movements are independent of the sampling process (i.e., animals do not move in response to the observer, so track lengths are fixed), pre- and post-snowstorm tracks are distinguishable, all animal tracks that intercept sampled transects are observed, the study area is rectangular in shape, and all the transects are oriented perpendicular to a specified reference axis (x axis) (Becker, 1991). Only the part of a set of tracks contained within the boundaries of the plot or study area is included in calculating horizontal distances. Tracks with more than half of their horizontal distance outside of the boundaries are not included in the survey (e.g., the set of tracks at the top and center in Fig. 3.11).

A number of the underlying assumptions of this method preclude it from general application; that is, its proper use is likely limited to specific situations. For instance, larger animals tend to have larger daily movements and lower densities, so surveys on foot are probably limited to studies of smaller animals because of logistic constraints. In addition, ground surveys would be difficult, and possibly dangerous, in areas of rugged terrain (e.g., areas susceptible to avalanches); use of snowmobiles (or other motorized vehicles) to survey lines and intercepted tracks will probably cause a flight response in the target species, which would violate a key assumption of fixed track lengths. Further, aerial surveys are limited to areas of open habitats; obstruction of tracks by overhead vegetation would preclude their use. Even in open habitats, tracks of the target species must be readily discernible from the air from tracks of other resident species. However, despite these and other related difficulties, Becker's method could be useful in situations where appropriate assumptions are satisfied.


URL: https://www.sciencedirect.com/science/article/pii/B9780126889604500034

Descriptive statistics

Kandethody M. Ramachandran, Chris P. Tsokos, in Mathematical Statistics with Applications in R (Third Edition), 2021

1.7 Chapter summary

In this chapter, we dealt with some basic aspects of descriptive statistics. First we gave basic definitions of terms such as population and sample. Some sampling techniques were discussed. We learned about some graphical presentations in Section 1.4. In Section 1.5 we dealt with descriptive statistics, in which we learned how to find mean, median, and variance and how to identify outliers. A brief discussion of the technology and statistics was given in Section 1.6. All the examples given in this chapter are for a univariate population, in which each measurement consists of a single value. Many populations are multivariate, where measurements consist of more than one value. For example, we may be interested in finding a relationship between blood sugar level and age, or between body height and weight. These types of problems will be discussed in Chapter 8.

In practice, it is always better to run descriptive statistics as a check on one's data. The graphical and numerical descriptive measures can be used to verify that the measurements are sound and that there are no obvious errors due to collection or coding.

We now list some of the key definitions introduced in this chapter.

Population

Sample

Statistical inference

Quantitative data

Qualitative or categorical data

Cross-sectional data

Time series data

Simple random sample

Systematic sample

Stratified sample

Proportional stratified sampling

Cluster sampling

Multiphase sampling

Relative frequency

Cumulative relative frequency

Bar graph

Pie chart

Histogram

Sample mean

Sample variance

Sample standard deviation

Median

Interquartile range

Mode

Mean

Empirical rule

Box plots

In this chapter, we have also introduced the following important concepts and procedures:

General procedure for data collection

Some advantages of simple random sampling

Steps for selecting a stratified sample

Procedures to construct frequency and relative frequency tables and graphical representations such as stem-and-leaf displays, bar graphs, pie charts, histograms, and box plots

Procedures to calculate measures of central tendency, such as mean and median, as well as measures of dispersion such as the variance and standard deviation for both ungrouped and grouped data

Guidelines for the construction of frequency tables and histograms

Procedures to construct a box plot


URL: https://www.sciencedirect.com/science/article/pii/B9780128178157000014

Content Analysis and Television

Erica Scharrer, in Encyclopedia of Social Measurement, 2005

Sampling

Sampling decisions are also not unique to content analysis, although there are particular opportunities and challenges that arise in sampling television content. The objective is to gather a sufficient and appropriate amount and type of television content to test the hypotheses and/or examine the research questions. Therefore, content analysis samples, like other samples in social science research, are ideally fairly large and representative of the general population about which the researcher wishes to infer.

Probability sampling techniques that ensure that each unit in the population has an equal chance of being included in the sample can be utilized. A simple random sample may be generated, for instance, by compiling a list of all programs, episodes, or characters (depending on the unit of analysis) in the population and then using a table of random numbers to determine which will be selected to comprise the sample. Similarly, if such a list can be created, a systematic sample can be derived by choosing every Nth entry in the list (e.g., every fifth, or every tenth, depending on the number of items in the list and the ultimate sample size desired).

However, restrictions concerning time, money, and access result in the fairly frequent use of nonprobability samples in studies of television content. There are two main types. First, convenience samples are selected simply because particular types of content are available to the researcher, rather than being chosen for any theoretical or methodological reason. Examples of convenience samples in television content research include sampling three episodes each of the currently running situation comedies (“sitcoms”) to represent the genre, or relying on syndicated programs from decades past to investigate changes in content over time. Second, purposive samples are drawn by selecting particular content for a conceptually valid reason, but not ensuring that every element of the population had an equal likelihood of selection. A very common example in studies of television content is the use of a composite or a constructed time period to sample. Researchers interested in prime-time programming on the networks, for instance, may videotape ABC, CBS, NBC, Fox, WB, and UPN for one full week from 8 to 11 pm. If they are careful to avoid atypical weeks, such as “sweeps” week (when programming may be unusual because of heightened competition for ratings) or a period with holiday specials rather than general programming, researchers could argue for the representativeness of the week. However, as with a convenience sample, inferences about the relationship between findings in these nonprobability samples and the overall population should be couched in cautious terms.


URL: https://www.sciencedirect.com/science/article/pii/B0123693985005053

Design of Single-Season Occupancy Studies

Darryl I. MacKenzie, ... James E. Hines, in Occupancy Estimation and Modeling (Second Edition), 2018

11.3 Unit Selection

In earlier chapters we briefly mentioned various probabilistic sampling schemes that one might use to select a sample of units from the population or area of interest (e.g., simple random sampling, stratified random sampling, etc.). A probabilistic sampling scheme is important because it allows the results from the analysis of data collected at the specific study units to be generalized to the wider population of interest. If there is no intent to generalize the results beyond the actual study units sampled, a probabilistic sampling scheme is not required (i.e., the study units represent the entire population of interest). However, if no probabilistic sampling scheme is used (i.e., units are selected purely haphazardly or for convenience), there is no statistical basis for generalizing the results beyond the specific study units.

Implicitly, the methods detailed in this book assume that the study units represent a simple random sample from the wider population. If a stratified random sample is used, then one could analyze the data for each stratum separately (or within a single analysis with covariates defined to represent each stratum), and then combine the estimates for each stratum using standard results (e.g., Cochran, 1977; Thompson, 2002; Gould et al., 2012). If a systematic sample is used (e.g., by sampling at regularly-spaced cells on a grid) then, as in any systematic sample, it is assumed the occupancy status of each unit is random relative to the sampled cells. Random placement of the initial location, which determines the other sample locations, helps to justify this assumption.
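The following Python sketch illustrates a systematic sample of regularly spaced grid cells in which only the initial location is random; once the offset is drawn, every other sampled cell is fixed. The grid dimensions, the spacing of 5 cells, and the function name are illustrative assumptions, not part of the authors' methods.

```python
import random

def systematic_grid_sample(n_rows, n_cols, spacing, rng=random):
    """Return (row, col) cells on a regular lattice with a random initial offset.

    The offset (r0, c0) is drawn uniformly from 0..spacing-1 in each direction;
    all remaining cells are then determined by the fixed spacing.
    """
    r0 = rng.randrange(spacing)
    c0 = rng.randrange(spacing)
    return [(r, c)
            for r in range(r0, n_rows, spacing)
            for c in range(c0, n_cols, spacing)]

# Hypothetical 20 x 30 grid, sampling every 5th cell in each direction
cells = systematic_grid_sample(20, 30, 5, random.Random(7))
print(len(cells), cells[:3])
```

Because 20 and 30 are multiples of the spacing, every possible offset yields 4 x 6 = 24 cells, so the sample size does not depend on the random start.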

For other sampling schemes there may be additional considerations that limit the inferences that can be made from the results, or modifications of the analysis methods may be required. Selecting clusters of units is sometimes desirable from a logistical perspective. For example, one can select a vertex of a grid and then survey the four surrounding cells, or place units along a transect. In such cases the surveyed units have not been selected independently of each other, so the effective sample size may be smaller than the number of units surveyed. If the occurrence of the species is spatially random at the scale of the defined units there may be little penalty, but in the presence of spatial correlation the effective sample size can be drastically reduced. The methods detailed in Section 6.6 to incorporate spatial correlation should be considered to analyze such data, or the multi-scale model (Section 6.3) might also be useful, such that unit-level occupancy is made conditional upon cluster-level occupancy. However, from a study design perspective, the relative benefits of a logistically convenient sampling scheme that may require a more complex analysis with a reduced effective sample size should be compared with those of a scenario that is less logistically convenient. It may be possible to achieve a similar level of precision in the estimates by actually going to fewer units that are independently selected, resulting in less overall field work. A simulation study to assess the relative tradeoffs among potential designs is generally recommended.
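One common way to quantify the loss described above is the Kish design effect from survey sampling, deff = 1 + (m - 1)ρ, where m is the number of units per cluster and ρ is the within-cluster (intraclass) correlation; the effective sample size is then roughly n / deff. This is a swapped-in textbook approximation for equal-sized clusters, not the occupancy-specific spatial models referred to in the text, and the correlation values below are assumptions chosen only to show the trend.

```python
def effective_sample_size(n_units, cluster_size, icc):
    """Approximate effective sample size for equal-sized clusters.

    Uses the Kish design effect deff = 1 + (m - 1) * icc, where m is the
    number of units per cluster and icc is the within-cluster correlation.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be >= 1")
    deff = 1 + (cluster_size - 1) * icc
    return n_units / deff

# Hypothetical design: 100 cells surveyed as 25 clusters of 4 cells per vertex
for icc in (0.0, 0.2, 0.5, 1.0):
    print(icc, round(effective_sample_size(100, 4, icc), 1))
```

With no correlation the 100 surveyed cells behave like 100 independent units; with perfect correlation the four cells around each vertex carry no more information than one, and the effective sample size falls to 25.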

Where random sampling occurs at multiple levels, e.g., first a random sample of forest stands within a region (primary-level units) and then a random sample of plots from within the selected stands (secondary-level units), with occupancy data being collected at the secondary-level units, the exact method of analysis depends upon the level at which inference is desired. No additional structure in the model is required if the results are not generalized beyond the selected primary-level units (e.g., inference is limited to the plots within the selected stands), as this sample represents the entire population of interest. However, if inference is to be generalized beyond the initial sample of primary-level units (e.g., to all stands within the region), then the method of analysis should reflect that random sampling has occurred at multiple levels. We believe the easiest way to accomplish this would be to include a cluster random effect term in a Bayesian analysis. The multi-scale occupancy model (Section 6.3) might also be considered for analyzing such data, to allow for some localized spatial correlation.
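A minimal sketch of the two-stage selection described above, assuming each primary-level unit (stand) is represented by a list of its secondary-level units (plots). The region layout, the numbers of stands and plots, and the function name are hypothetical; the sketch covers only unit selection, not the cluster random-effect analysis the authors recommend.

```python
import random

def two_stage_sample(stands, n_stands, n_plots, rng=random):
    """Randomly select stands, then randomly select plots within each chosen stand.

    `stands` maps a stand ID to the list of plot IDs it contains.
    Returns a dict {stand_id: [selected plot IDs]}.
    """
    chosen = rng.sample(sorted(stands), n_stands)  # stage 1: primary-level units
    return {s: rng.sample(stands[s], n_plots) for s in chosen}  # stage 2: plots

# Hypothetical region: 12 stands with 10 plots each
region = {f"stand{i}": [f"stand{i}-plot{j}" for j in range(10)] for i in range(12)}
sample = two_stage_sample(region, 4, 3, random.Random(1))
for stand, plots in sample.items():
    print(stand, plots)
```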

If an adaptive sampling scheme (Thompson and Seber, 1996; Thompson, 2002) is used (i.e., neighboring units are included in the sample once the species is detected at a focal unit), then an important consideration is that detection probability not only affects the ability to observe occupancy but also influences which set of units is eventually included in the sample. If the target species is present and detected at a unit, then the neighboring units are subsequently surveyed; however, if the species is present but undetected, then the neighboring units are not surveyed. The methods in this book would need to be extended to account for the probability of a unit being included in the sample, which is a function of detection probability (Thompson and Seber, 1996; Thompson, 2002). Using a model that accounts for spatial correlation may also be required with such a sampling scheme (Pacifici et al., 2016).
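To illustrate why detection affects inclusion, the sketch below simulates a simplified adaptive scheme along a one-dimensional transect. A unit's two neighbors are added to the survey only when the species is both present and detected there, and expansion continues from any newly detected unit. The transect, the single-visit detection model, and this neighbor rule are assumptions for illustration; the sketch does not compute the inclusion probabilities that a valid analysis would require.

```python
import random

def adaptive_sample(occupied, initial, p_detect, rng=random):
    """Simplified adaptive sampling along a transect of units 0..len(occupied)-1.

    Each surveyed unit is visited once. If the unit is occupied AND the species
    is detected (probability p_detect), both neighbors are queued for survey.
    Returns (set of surveyed units, set of units with detections).
    """
    to_visit = list(initial)
    surveyed, detected = set(), set()
    while to_visit:
        u = to_visit.pop()
        if u in surveyed:
            continue
        surveyed.add(u)
        if occupied[u] and rng.random() < p_detect:
            detected.add(u)
            for v in (u - 1, u + 1):
                if 0 <= v < len(occupied) and v not in surveyed:
                    to_visit.append(v)
    return surveyed, detected

# Hypothetical transect: units 10-14 are occupied; the initial sample is unit 12
transect = [False] * 10 + [True] * 5 + [False] * 10
for p in (1.0, 0.5, 0.0):
    surveyed, detected = adaptive_sample(transect, [12], p, random.Random(3))
    print(p, sorted(surveyed), sorted(detected))
```

With p_detect = 1 the survey spreads across the whole occupied run plus its two boundary units; with p_detect = 0 it never expands beyond the initial unit, even though the species is present there.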

While there are some probabilistic sampling schemes that are not directly compatible with the methods discussed in this book at present, we do not necessarily discourage the use of such schemes (although note that ‘convenience sampling’ is not a probabilistic sampling scheme). We are confident that in the future, techniques will be developed to estimate and model occupancy data collected using these schemes. In addition, it is frequently possible to obtain model-based occupancy estimates for small spatial units using the methods described in this book and then use these estimates in replication-based estimation approaches (e.g., Skalski, 1994). Although this two-step approach may not be optimal, it provides a reasonable approach until more inclusive modeling is completed. However, we urge people to think very carefully about why they wish to use a more complicated scheme if similar results could be obtained using a much simpler design.

A final comment with respect to selecting units is that we generally advise against selecting units only on the basis of knowledge of their likely occupancy state (e.g., units that were known to be occupied by the species in the recent past, or based upon casual observations) when the population of interest consists of units about which such knowledge is and is not available. Unless this group of units actually represents the population of interest, estimates of occupancy for the entire population may be biased. For example, suppose occupancy is to be estimated within a stream system for a particular salamander species. Within this system, there are locations where the salamanders have been reported as present, based on past sightings by members of the public and local herpetologists. If these locations were selected as study units, then the estimated level of occupancy is likely to be higher than for a random sample of study units from throughout the stream system. Alternatively, if interest lies both in locations where the salamanders have been reported and in all other places within the stream system, then these regions could be treated as separate strata within the population (the stream system), and a random sample of units selected from each stratum. We return to this point in Chapter 12, where we consider design issues for multiple-season occupancy studies, but note here that preferential selection of units that are occupied could lead to apparent trends in occupancy even for a population that is currently stable. However, one situation where this type of design may be appropriate is when the fraction of units that are still occupied now is of direct interest (e.g., the persistence probability for the species over the intervening time period). In this case the population of interest consists only of units that were known to have been occupied during the past (e.g., based on museum records, field notes of naturalists and explorers, etc.), and ‘occupancy’ may be interpreted accordingly as a measure of persistence (e.g., Karanth et al., 2010). Asking whether current ‘occupancy’ (of previously occupied units) is lower in human-developed units than in less developed units (for example) may be a worthwhile study objective.
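When stratum-level estimates are available, the standard stratified combination weights each stratum by its share of the population's units. The sketch below assumes the per-stratum occupancy estimates have already been obtained (for example, from the models in this book); the stratum sizes and estimates are invented for illustration, and the function name is hypothetical.

```python
def combine_strata(strata):
    """Combine per-stratum estimates into a population-level estimate.

    `strata` is a list of (N_h, estimate_h) pairs, where N_h is the number of
    units in stratum h. Each estimate is weighted by its share N_h / N.
    """
    total = sum(n for n, _ in strata)
    return sum(n * est for n, est in strata) / total

# Hypothetical stream system: 20 previously reported locations, 180 other units
strata = [(20, 0.80), (180, 0.10)]
print(round(combine_strata(strata), 3))
```

Here the weighted estimate is 0.17, whereas surveying only the previously reported locations would suggest an occupancy of 0.80 for the whole stream system.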

The key consideration is that all units within the population of interest must have a non-zero probability of being selected for surveying. Units that have no chance of being surveyed (e.g., because they are considered too difficult to access) are outside the scope of inference for the study. Extending the results to such units has no statistical support and is an act of faith rather than science. It is important to keep in mind that any deviation from a statistical ‘ideal’ in the name of convenience or expediency will have consequences for the type of analysis required (i.e., additional complexity) or the quality of the inferences that can be made (i.e., limitation of scope, additional caveats, or untestable assumptions).


URL: https://www.sciencedirect.com/science/article/pii/B9780124071971000156

What is the sampling method where every Kth unit is selected from a population?

Systematic sampling: a method of sampling from a list of the population in which the sample consists of every kth member on the list, after a starting point is selected at random from 1 to k.
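Because each of the k possible starting points is equally likely, the mean of a systematic sample is unbiased for the population mean when N is a multiple of k. The Python sketch below checks this by enumerating every start for a small, hypothetical ordered population (N = 12, k = 3); exact fractions avoid floating-point noise, and the function names are illustrative.

```python
from fractions import Fraction

def systematic_samples(values, k):
    """All k possible systematic samples: start s in 1..k, then every kth value."""
    return [values[s - 1::k] for s in range(1, k + 1)]

def mean(xs):
    return Fraction(sum(xs), len(xs))

# Hypothetical ordered population of N = 12 values, sampling interval k = 3
population = [3, 7, 2, 9, 4, 4, 8, 1, 6, 5, 7, 2]
k = 3
sample_means = [mean(s) for s in systematic_samples(population, k)]
print(sample_means)
print(mean(sample_means) == mean(population))  # True: each start has probability 1/k
```

The three possible sample means (25/4, 19/4 and 7/2) differ noticeably, yet only one of them is observed in practice, so a single systematic sample offers no replication from which to estimate their spread.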

What does kth mean in sampling?

In a sample taken from a population, the kth order statistic is the kth smallest element in the sample. We describe the distribution of the kth order statistic when a sample of size n is randomly drawn from the population {1, 2, …, N} (without replacement).
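For a sample of size n drawn without replacement from {1, 2, …, N}, the kth order statistic X(k) equals x exactly when k - 1 sampled values lie below x and n - k lie above it, so P(X(k) = x) = C(x - 1, k - 1) C(N - x, n - k) / C(N, n) for k ≤ x ≤ N - n + k. The Python sketch below (function name hypothetical) evaluates this formula and checks it against brute-force enumeration for N = 8, n = 3, k = 2.

```python
from itertools import combinations
from math import comb

def order_stat_pmf(N, n, k, x):
    """P(kth smallest of a size-n sample without replacement from {1..N} equals x).

    The k-1 smaller values come from {1..x-1} and the n-k larger ones from
    {x+1..N}; all C(N, n) samples are equally likely. math.comb returns 0
    when x lies outside the feasible range.
    """
    return comb(x - 1, k - 1) * comb(N - x, n - k) / comb(N, n)

# Brute-force check on a small population
N, n, k = 8, 3, 2
counts = {}
for s in combinations(range(1, N + 1), n):  # each tuple is already sorted
    counts[s[k - 1]] = counts.get(s[k - 1], 0) + 1
total = comb(N, n)
for x in range(1, N + 1):
    print(x, order_stat_pmf(N, n, k, x), counts.get(x, 0) / total)
```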

What type of random sampling technique selects every kth element of the population until the desired number of elements in the sample is obtained?

Systematic sampling involves the selection of every kth element in the population until the desired number of elements in the sample is obtained.

Which of the following is a sampling method in which every element of the population has an equal chance of being selected?

Probability sampling: this sampling technique uses randomization to make sure that every element of the population gets an equal chance to be part of the selected sample. It is also known as random sampling. Simple random sampling: every element has an equal chance of being selected as part of the sample.