17 September 2004, 13:13
หมากเขียบ
RESEARCH FORUM--The Research Sample, Part I: Sampling
Thomas R. Lunsford, MSE, CO
Brenda Rae Lunsford, MS, MAPT
ABSTRACT
The cost of studying an entire population to answer a specific question is usually prohibitive in terms of time, money and resources. Therefore, a subset of subjects representative of a given population must be selected; this is called sampling. The concepts involved in selecting subjects to represent the larger population are presented. Sampling errors and associated determining factors are reviewed.
Definitions of the research populations, including target and accessible groups, are given. The inclusion and exclusion criteria required to refine the accessible population to a researchable subgroup are explained, and an example is provided. The two types of sampling methods, probability and nonprobability, are defined and presented with their respective types. Probability sampling includes simple random sampling, systematic sampling, stratified sampling, cluster sampling and disproportional sampling. Nonprobability sampling includes convenience sampling, consecutive sampling, judgmental sampling, quota sampling and snowball sampling.
The goals and concepts related to recruitment are reviewed with application to survey and experimental research. Three steps are suggested for obtaining an appropriate research sample: (1) clearly define the target population, (2) define the accessible population, and (3) define the steps and effort that will be employed to recruit subjects for study.
Introduction
The first two questions most researchers ask once a research project has been defined are, "How many subjects will I need to complete my study, and how will I select them?" This article, Part I, addresses the issues related to selecting subjects for a research project. Part II, which will be published in the Fall JPO, will present in detail the factors relevant to determining sample size.
In clinical research it would be ideal to include the entire population when conducting a study; this would allow the results to be generalized to the population as a whole. In some cases this has been possible, such as when the 1976 Philadelphia Legionnaires' disease epidemic was studied. However, in most cases, the population in question is too large or too spread out over time and distance to allow for measuring or evaluating each member of the population.
Researchers have developed a number of techniques where only a small portion of the total population is sampled, and attempts to generalize the results and conclusions for the entire population are made. There are some distinct advantages and disadvantages in using samples. Advantages include that sampling involves a smaller number of subjects and is more time efficient, less costly and potentially more accurate (since it is more feasible to maintain control over a smaller number of subjects). Disadvantages include potential bias in the selection of subjects, which may lead to error in interpretation of results and decrease in ability to generalize the results beyond the subjects actually studied (1-3).
Cox and West describe a population as a well-defined group of people or objects that share common characteristics (1). All immigrants from Germany or all patients with left hemiplegia are examples of well-defined groups with common characteristics. A population in a research study is a group about which some information is sought. Most researchers cannot include all members of the population in their studies and must resort to limiting the number of subjects to only a sample from the population.
A sample is a small subset of the population that has been chosen to be studied (1,2). The sample should represent the population and have sufficient size so a given innovative orthosis or prosthesis can be subjected to a fair statistical analysis. Unfortunately, all samples deviate from the true nature of the overall population by a certain amount due to chance variations in drawing the sample's few cases from the population's many possible members. This is called sampling error and is distinguished from non-chance variations due to determining factors. Determining factors include items such as biased sampling procedures, effects of independent variables, research conditions and other causal agents or circumstances (2).
One of the most famous cases of biased sampling was the Literary Digest poll taken before the U.S. presidential election of 1936 (2,3). Two million ballots were mailed out, received back and tabulated; the results confidently predicted the easy election of Landon (57 percent) over Roosevelt. Unfortunately, the names on the mailing list were taken from telephone directories and lists of automobile owners. At that time, only people of certain wealth had telephones and/or drove cars, and there was a strong correlation between those with wealth and a preference for Landon. The larger mass of people without cars or telephones voted for Roosevelt, giving him the largest margin of victory in history at that time. This large error in prediction is a prime example of the consequences that biased sampling can produce.
Many clinical studies do not achieve their intended purposes because the researcher is unable to enroll enough subjects. Therefore, at some point in planning a study, consideration should be given to sample size. While the number of subjects studied is important, even more important is that the subjects accurately represent the larger population. In contrast to the previous example, where more than two million ballots gave a biased and erroneous result, polls taken by Gallup and Harris in 1968, in which only 2,000 voters were sampled, predicted that Richard Nixon would win with 41 and 43 percent of the vote, respectively. Nixon won with 42.9 percent (2).
Sampling Concepts
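Sampling is the process of selecting a subset of subjects to represent a larger population. The target population is the larger population to which the results of a study are to be generalized. External validity refers to how well the results obtained from the sample can be generalized to that target population. The example used throughout this article is a study of the effect of different ankle-foot orthosis (AFO) footplate contour designs on spasticity.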
Since it will not be practical to recruit every human with spasticity for this study, it is necessary to define an accessible population. The accessible population is a subset of the target population that reflects specific characteristics with respect to age, gender, diagnosis, etc., and who are accessible for study (4).
Therefore, in the AFO footplate example, it is critical to define or characterize the target population before a sample of subjects can be defined. For example, will all patients with spasticity be included? Is the question to be limited to children with diplegia, secondary to cerebral palsy, adults post CVA or adults and children post brain injury? This narrowing and refining of the research question is useful since it more clearly defines the target as well as accessible populations and has direct impact on the external validity of the inferences to be drawn at the conclusion of the study.
Once the specific clinical and demographic characteristics of the accessible population are defined, it is important to consider the geographic and time constraints with which both the researcher and potential subjects will have to contend.
Will the study intervention require more than one visit to the clinic, laboratory or office? If so, how far can subjects be expected to travel, and what means of transport are required to get them there? If the research is to be conducted in a large metropolitan area with good public transportation, repeated trips and distance may not be a problem. Transportation logistics can be an insurmountable problem if not planned for and resolved in designing the research plan.
In the case of the AFO footplate study, one constraint that might be placed on the accessible subjects is that they live within 30 minutes' travel by car and are able to commit to four visits within a one-month period. This leads to the next major consideration in the sampling process: defining the inclusion and exclusion criteria of the accessible population.
Inclusion Criteria
In the above example of specific AFOs for patients with spasticity, it is important to consider the research question and include factors that will enable a homogeneous selection of subjects, e.g., age, gender, diagnosis, degree of spasticity, muscle groups affected, etc. It may be determined that the specific variables under study (footplate contours and spasticity) are more likely to show an effect in the growing child than in the adult. Therefore, one inclusion criterion may be a specific population of children whose ages encompass the growing years.
Since patterns of tone are different depending on diagnosis, it may be desirable to specify cerebral palsy and not include other diagnoses. Also, since loads and deforming forces on feet are different if the subject has bilateral versus unilateral involvement, it may be desirable to include only those subjects with hemiplegia. Therefore, the inclusion criteria for this study may be children between the ages of 2 and 15 (growth with walking years) diagnosed with spastic hemiplegia secondary to cerebral palsy. Also, subjects who live within a specific distance, have convenient and affordable transportation and are able to commit to a specific number of visits may be included. A final inclusion criterion may be parental consent and support.
Exclusion Criteria
Exclusion criteria are applied to subjects who generally meet the inclusion criteria but must be excluded because they cannot complete the study or possess unique characteristics that may confound the results. For example, it may be necessary to exclude subjects with spastic hemiplegia secondary to cerebral palsy who were premature at birth or who have additional medical problems that may affect their outcomes. A child with epilepsy may be taking medication that can also affect his/her muscle tone, which could confound study results. If the ability to walk is an important dependent variable of the study, then subjects who do not walk or who have been walking for less than one year may be excluded. Subjects who may have unreliable sources of transportation or noncompliant parents also may need to be excluded.
An important ethical consideration is the willingness of the subject to participate. In the instance of a study of spastic children, parental permission must be obtained or the subject must be excluded. Also, withholding one treatment to evaluate another may pose a difficult ethical consideration. The exclusion criteria, considering all of the above, may result in sampling guidelines that exclude children who are less than 2 or more than 15 years of age, are on medication that affects muscle tone, do not walk or have walked less than one year, have inadequate transportation and/or whose parents will not agree to participate (see Table 1).
Sampling Methods
The process of defining a representative subpopulation to study is called sampling. There are two main categories of sampling, probability and nonprobability.
Probability Sampling
The first potential problem in any system of selection is bias. Bias can occur easily, as previously described in the Roosevelt and Landon election of 1936, and it also may be related to researcher preference. Patient volunteers can introduce bias since they tend to be healthier and produce results different from subjects chosen randomly. To avoid selection bias it is important to guarantee that each of the candidates for inclusion in the study has an equal opportunity for selection. That guarantee requires that subjects be selected at random, i.e., that randomization be employed. Randomization is important for two reasons: First, it provides a sample that is not biased, and second, it meets the requirements for statistical validity (2). Several methods exist that can be used to randomly select subjects.
Simple random sampling can be accomplished using an array of random numbers (1) (see Table 2). In this table the numbers are grouped into series of five digits. This grouping method is for ease of presentation only; the same numbers could be grouped in twos or threes. For grouping by threes, the first column would contain 104, 803, 757, 042, etc. How the random numbers are organized depends on the size of the population to be studied. Once the random numbers are organized into columns and rows, the researcher must decide where to start in the table and in what direction to proceed.
Suppose it has been decided that there are 900 patients (i.e., the accessible population) with spasticity from which to draw a sample for the AFO footplate study. From this accessible population it is desired to randomly select 90 subjects for the study. The first step is to assign a number from 1 to 900 to each member of the accessible population. Next, the starting number in the table must be selected. (An easy way to do this is to close your eyes and place the point of a pencil on the table.)
Assume that the number selected is 88974 (column 3, row 3), and it has been decided to use the last three digits to determine which subject is selected first. In this case, the last three digits are 974. However, the accessible population numbers range from 1 to 900, and 974 cannot be used. Arbitrarily proceeding downward, the next random number is 48237; therefore, patient 237 will become the first subject selected for the study. The next subjects would be numbers 306, 301, 802, 308, etc., until the entire sample of 90 subjects is selected. Obviously, a larger random number table would be required to select 90 subjects.
It is also possible to use all digits in the random number table. Beginning again with the five digit number 88974 and progressing downward, the first subjects selected would be 889, 744, 823, 725, 306, 012, 802, etc., until again all 90 subjects are selected for the sample.
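The last-three-digits procedure described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `select_from_table` is hypothetical, only the three table entries quoted in the text are used, and skipping a repeated subject number is an assumption the text does not spell out. The final lines show the usual software substitute for a printed table, a pseudorandom draw without replacement.

```python
import random

def select_from_table(table_numbers, population_size, sample_size):
    """Pick subject numbers from the last three digits of each table entry.

    Entries whose last three digits fall outside 1..population_size are
    skipped, as in the text. Skipping repeats is an added assumption.
    """
    selected = []
    for number in table_numbers:
        subject_id = number % 1000  # last three digits of the table entry
        if 1 <= subject_id <= population_size and subject_id not in selected:
            selected.append(subject_id)
        if len(selected) == sample_size:
            break
    return selected

# The three entries quoted in the text, read downward starting at 88974.
quoted_entries = [88974, 48237, 25306]
print(select_from_table(quoted_entries, population_size=900, sample_size=90))
# [237, 306]  (974 is rejected because it exceeds 900)

# Software substitute for the table: 90 distinct numbers from 1..900.
rng = random.Random(2024)  # fixed seed only so the draw can be repeated
sample = rng.sample(range(1, 901), 90)
```

As the text notes, three entries are far too few to fill a sample of 90; the table-based function simply stops when the supplied entries run out.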
Systematic random sampling is a method frequently chosen for its simplicity because it is a periodic process (1,2,4-6). This method could be carried out by selecting the first subject randomly as described above and then selecting every second or third subject who comes to the office and meets the inclusion/exclusion criteria. This method, however, is problematic in that other staff who know of the method can manipulate patient appointments to ensure inclusion. There is no advantage to this method over simple random sampling (5).
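A minimal sketch of the periodic selection just described, assuming the eligible subjects are already listed in order of arrival. The function name `systematic_sample` and the list of 30 arrivals are hypothetical.

```python
import random

def systematic_sample(eligible_subjects, interval, rng=None):
    """Choose a random starting point, then take every `interval`-th subject."""
    rng = rng or random.Random()
    start = rng.randrange(interval)  # random start within the first interval
    return eligible_subjects[start::interval]

# Hypothetical arrival order of 30 eligible patients, every third selected.
arrivals = list(range(1, 31))
picked = systematic_sample(arrivals, interval=3, rng=random.Random(0))
```

Only the starting point is random; once it is known, every later selection is predictable, which is why staff who know the scheme could steer particular patients into the sample.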
Stratified sampling is a method by which subjects are grouped according to strata such as age, gender or diagnosis (left hemi vs. right hemi), etc. (1,2,4-6). Using this method, subgroups of interest can be defined and equal numbers of subjects sampled for each group. For example, if there was interest in the functional outcomes for use of a certain type of AFO footplate in patients post brain injury, then it would be useful to define age as a subgrouping since age often relates to the functional challenge imposed on various orthoses. For example, a young child may engage in a lot of crawling, jumping, running, etc., when wearing an AFO whereas a senior citizen is more likely to walk cautiously. This permits comparison of the subgroups, such as children (5 to 12), teens (13 to 19), adults (20 to 55) and seniors (56 and up). Using this method, subjects would be recruited randomly for each subgroup, and, although each subgroup would have a different age range, the general inclusion/exclusion criteria would apply to each of the subgroups.
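The age strata above can be sketched as follows. This is an illustration only: the function names, the `(subject number, age)` record layout and the generated roster are assumptions, and the strata boundaries are the ones quoted in the text.

```python
import random
from collections import defaultdict

# Age strata from the text: (name, lowest age, highest age or None for open-ended).
AGE_STRATA = [("children", 5, 12), ("teens", 13, 19),
              ("adults", 20, 55), ("seniors", 56, None)]

def stratum_for(age):
    """Return the stratum name for an age, or None if no stratum applies."""
    for name, low, high in AGE_STRATA:
        if age >= low and (high is None or age <= high):
            return name
    return None

def stratified_sample(subjects, per_stratum, rng):
    """Randomly draw the same number of subjects from every stratum.

    subjects: list of (subject_number, age) pairs that already meet the
    general inclusion/exclusion criteria. Raises ValueError if a stratum
    holds fewer than per_stratum subjects.
    """
    groups = defaultdict(list)
    for subject_number, age in subjects:
        name = stratum_for(age)
        if name is not None:
            groups[name].append(subject_number)
    return {name: sorted(rng.sample(members, per_stratum))
            for name, members in groups.items()}

# Hypothetical roster: 700 subjects with ages spread over 5 to 74.
roster = [(i, 5 + i % 70) for i in range(700)]
by_stratum = stratified_sample(roster, per_stratum=10, rng=random.Random(1))
```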
Cluster sampling is a method used to enable random sampling to occur while limiting the time and costs that would otherwise be required to sample from either a very large population or one that is geographically diverse (1,2,4,5). An example of how this might be used is as follows.
To obtain as many subjects as possible and to eliminate any potential bias inherent in selecting subjects from one specific clinic, the researcher may wish to select subjects from all of the hospitals and outpatient clinics within a given area. However, this would be too costly and time-consuming. Therefore, use of the cluster approach is appropriate. Using this method, a one- or two-level randomization process is used. First, each hospital and outpatient clinic that meets the inclusion criteria is identified. Second, one of the selection methods described above is used to randomly select a portion of those facilities. All of the available subjects from the randomly selected facilities could be included, or subjects from each of the randomly selected facilities could themselves be randomly selected. The important element in this process is that each of the facilities and each of the subjects have an equal opportunity to be chosen, with no researcher or facility bias.
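The one- or two-level process can be sketched as below. The function name `cluster_sample`, the facility names and the subject labels are all hypothetical; the sketch assumes each facility has already been screened against the inclusion criteria.

```python
import random

def cluster_sample(facilities, n_facilities, n_per_facility=None, rng=None):
    """Randomly pick facilities, then take all or a random part of their subjects.

    facilities: dict mapping facility name -> list of eligible subjects.
    n_per_facility=None gives the one-level design (everyone at the chosen
    facilities); an integer gives the two-level design.
    """
    rng = rng or random.Random()
    chosen = rng.sample(sorted(facilities), n_facilities)  # level 1: facilities
    sample = {}
    for name in chosen:
        subjects = facilities[name]
        if n_per_facility is None:
            sample[name] = list(subjects)
        else:  # level 2: subjects within each chosen facility
            count = min(n_per_facility, len(subjects))
            sample[name] = rng.sample(subjects, count)
    return sample

# Hypothetical area with 10 clinics of 20 eligible subjects each.
clinics = {f"clinic_{i}": [f"c{i}_s{j}" for j in range(20)] for i in range(10)}
two_level = cluster_sample(clinics, n_facilities=3, n_per_facility=5,
                           rng=random.Random(7))
one_level = cluster_sample(clinics, n_facilities=3, rng=random.Random(7))
```

If facilities differ greatly in size, drawing a fixed number from each does not give every subject the same chance of selection; this simple sketch does not correct for that.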
Disproportional sampling is a method that addresses the difficulty encountered with stratified samples of unequal size (2). Suppose, for example, it is desired to conduct a survey of the members of the American Academy of Orthotists and Prosthetists. Also suppose an educational grant has been secured that will support study of only 200 members (subjects) and that the available population in the Academy is 2,000 individuals. In the available population of 2,000 there are 1,700 males and 300 females. Since the 200 subjects needed for the study comprise only 10 percent of the available population, how many of each gender are required? Simple proportioning suggests that 17/20 (85 percent) of the 200 be males and 3/20 (15 percent) be females. This would result in approximately 170 males and 30 females. The small number of females probably would not provide adequate representation for drawing conclusions about the entire membership.
One way of dealing with this situation is to use a simple random sample and leave the proportional representation to chance; however, unless the sample is unusually large, the differential effect of gender will probably not be controlled (6). A disproportional sampling design will permit random selection of Academy members of adequate size from each category. For example, 100 males and 100 females may be selected. This sample of 200 cannot be considered random because each female has a much greater chance (higher probability) of being chosen.
This approach creates an adequate sample size, but it presents problems for data analysis because the characteristics of one group (in this case, the females) will be overrepresented in the sample. Fortunately, this effect can be controlled by weighting the data so the males receive a proportionally larger mathematical representation in the analysis of scores than the females.
Calculating proportional weights involves determining the probability that any one male or female Academician will be selected. Selecting 100 male Academicians involves a probability of 100 out of 1,700, or 1 of 17 (1/17). The probability of any one female Academician being selected is 100 out of 300, or 1 of 3 (1/3). Therefore, each female has a probability of selection more than five times that of any male.
Next, the assigned weights are determined by taking the inverse of these probabilities. The weight for male scores is 17/1 = 17, and that for females is 3/1 = 3. This means that when the data are analyzed, each male's score will be multiplied by 17, and each female's score will be multiplied by 3. In any mathematical manipulation of the data, the total of the males' scores would be larger than the total of the females' scores. Therefore, the proportion of each group is differentiated in the total data set.
Because all Academy members in a group will have the same weight, the average scores for that group will not be affected; however, the relative contribution of these scores to overall data interpretation will be controlled.
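The weighting arithmetic above can be checked with a short sketch. The function names and the survey scores (every male scoring 4, every female scoring 8) are hypothetical and chosen only to make the effect of the weights visible; the group sizes are those of the Academy example.

```python
from fractions import Fraction

def inverse_probability_weight(n_sampled, n_in_population):
    """Weight = 1 / (probability that any one member is selected)."""
    probability = Fraction(n_sampled, n_in_population)
    return 1 / probability

male_weight = inverse_probability_weight(100, 1700)   # 1 / (1/17) = 17
female_weight = inverse_probability_weight(100, 300)  # 1 / (1/3)  = 3

def weighted_mean(groups):
    """groups: list of (scores, weight) pairs, one pair per stratum."""
    total = sum(weight * sum(scores) for scores, weight in groups)
    count = sum(weight * len(scores) for scores, weight in groups)
    return total / count

# Hypothetical scores: 100 sampled males all score 4, 100 females all score 8.
male_scores = [4] * 100
female_scores = [8] * 100

unweighted = sum(male_scores + female_scores) / 200            # 6.0
weighted = weighted_mean([(male_scores, male_weight),
                          (female_scores, female_weight)])     # 23/5 = 4.6
```

With these made-up scores the unweighted mean of 6.0 overstates the female contribution, while the weighted mean of 4.6 equals what a membership of 1,700 males and 300 females would give. Each group's own average (4 and 8) is unchanged, as the text states.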
Nonprobability Sampling
In the real world of clinical research true random sampling is very difficult to achieve. Time, cost and ethical considerations often prohibit researchers from making the necessary arrangements and securing the necessary clearances, for example, to obtain subjects from other facilities or professional practices to test a hypothesis. Therefore, it is often necessary to use other sampling techniques. These techniques produce nonprobability samples in that the sampling technique is not random (2,5).
With nonprobability sampling it is unlikely that the population selected will have the correct proportions because all members of the population do not have an equal chance of being selected. Therefore, it may not be assumed that the sample fully represents the target, and any statement generalizing the results beyond the actual sample tested must be stated with qualification.
Because the validity of statistical testing methods is based on random selection of subjects, it is important when using nonprobability sampling that random techniques be employed to the maximum extent possible. Five nonprobability sampling techniques have evolved: convenience sampling, consecutive sampling, judgmental sampling, quota sampling and snowball sampling.
Convenience sampling is probably the most commonly used technique in clinical research today (1,2,4,5). With convenience sampling, subjects are selected because of their convenient accessibility to the researcher. These subjects are chosen simply because they are the easiest to obtain for the study. This technique is easy, fast and usually the least expensive and troublesome. The famous sample description of "10 healthy young men" almost certainly refers to 10 male medical, prosthetic/orthotic or therapy students who have volunteered to be subjects for a study. The criticism of this technique is that bias is introduced into the sample. Volunteers always are suspect because they tend to be the healthiest, strongest, fastest, most skilled, etc. (7). They often volunteer because they like to show off or are competitive in nature and like to be tested. Volunteers may not be representative of the larger overall population.
Another common example of a convenience sample occurs when subjects are selected from the clinic, facility or educational institution at which the researcher is employed. Bias is likely to be introduced using this sampling technique because of the methods, styles and preferences of treatment employed at a given facility.
Consecutive sampling is a strict version of convenience sampling where every available subject is selected, i.e., the complete accessible population is studied. This is the best choice of the nonprobability sampling techniques since by studying everybody available, a good representation of the overall population is possible in a reasonable period of time (5).
Even though consecutive sampling does not allow randomization of the original subject pool to be studied, every effort should be made to randomize at all other levels. For example, assume it is desirable to test two different prosthetic feet. Once the study pool of subjects is defined, the assignment of prosthetic feet to subjects should be random. If all subjects will be tested with each of the feet, the order of testing should be randomized to remove as much bias as possible in the testing procedures.
If every subject is tested wearing foot A first and foot B second, foot B may prove to be the best foot, due to the learning effect. The learning effect gives an advantage to the subsequent items (prosthetic feet in this case) tested because the subjects become more familiar with the procedures and protocol and develop experimental skill. If foot B were truly superior and the testing was not random, its beneficial effect would be vulnerable to challenge because of the learning effect. The subjects become comfortable with the testing procedure with foot A and simply perform better the second time around with foot B. The results and generalizability would be flawed.
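One way to randomize the order of testing so that the learning effect does not always favor the same foot is sketched below. The function name `randomize_test_order` and the 20 subject numbers are hypothetical.

```python
import random

def randomize_test_order(subject_ids, conditions, rng=None):
    """Give each subject an independently shuffled order of test conditions."""
    rng = rng or random.Random()
    orders = {}
    for subject_id in subject_ids:
        order = list(conditions)  # copy so the caller's list is not changed
        rng.shuffle(order)
        orders[subject_id] = order
    return orders

# Hypothetical pool of 20 subjects, each tested with both prosthetic feet.
orders = randomize_test_order(range(1, 21), ["foot A", "foot B"],
                              rng=random.Random(3))
```

Because roughly half of the subjects can be expected to start with each foot, any practice gained on the first trial is spread across both feet instead of accumulating for foot B alone.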
Judgmental sampling, also called purposive sampling, is another form of convenience sampling where subjects are handpicked from the accessible population (2). This technique leaves much to be desired because of its inherent bias. Subjects usually are selected using judgmental sampling because the researcher believes that certain subjects are likely to benefit or be more compliant. For example, in the study of prosthetic feet, athletic amputees might be selected for the more athletic foot because they are more likely than a sedentary or geriatric patient to benefit from that foot.
Quota sampling is a nonprobability technique used to ensure equal representation of subjects in each layer of a stratified sample grouping (2). For example, in the study of the orthotic impact on spasticity using different footplate contour designs, assume there are four different designs, and it is desired that randomization be applied as to which subject gets which footplate to test.
Using Table 3, one method would be to assign subjects consecutively to footplate designs I to IV for the first four subjects (Round 1). The next round would assign subject 1 to footplate IV, subject 2 to footplate I, subject 3 to footplate II and subject 4 to footplate III, etc. In this manner there are equal numbers of subjects for each insert tested, and bias is managed as long as the subjects are assigned consecutively with no manipulation by anyone familiar or involved with the study. This allows control over the distribution of subjects across test situations and provides some protection from bias even though the original set of subjects was not randomly selected.
Snowball sampling is a technique used to identify potential subjects when appropriate candidates for study are hard to locate (2). For example, if locating an adequate number of amputees becomes difficult, an amputee belonging to a local support group could be recruited to assist in locating subjects willing to participate in a study. In other words, patients who are already enrolled help identify other people with similar disabilities or conditions who may serve as subjects. This process is known as snowballing or chain referral (2).
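The rotation described above can be written as a small rule. This sketch reproduces only the two rounds given in the text and assumes that later rounds continue the same one-step shift; the function name `rotated_assignment` is hypothetical, and Table 3 itself is not reproduced here.

```python
DESIGNS = ["I", "II", "III", "IV"]

def rotated_assignment(round_number, position, designs=DESIGNS):
    """Footplate design for the subject at `position` in `round_number`.

    Both arguments are 1-based. Each new round shifts the assignment by
    one design, so round 2 gives position 1 design IV, position 2 design I.
    """
    shift = round_number - 1
    index = (position - 1 - shift) % len(designs)
    return designs[index]

round_1 = [rotated_assignment(1, p) for p in range(1, 5)]  # I, II, III, IV
round_2 = [rotated_assignment(2, p) for p in range(1, 5)]  # IV, I, II, III
```

Over any four consecutive rounds each design is used the same number of times, which is the quota property the method is meant to guarantee.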
Recruitment
Once the decision to use a certain sampling approach has been made, subjects must be recruited. The goal of recruitment is to obtain a sample large enough to enable valid statistical analysis and allow subjects to be selected in such a manner as to avoid bias (4). Errors or problems in either of these areas can be prevented with a research design that employs controls and a carefully planned sampling technique.
The chosen method of recruitment usually is based on the type of study; for example, survey data collected via questionnaire may be obtained by a direct person-to-person interview, telephone or mailed form. Experimental research, such as for the AFO footplate study, requires that subjects be able to commit time and transportation to come to the study site and repeat this effort more than once.
There often is an inverse relationship between the ease of recruitment effort and the success in obtaining data. In survey research, for example, direct personal effort in recruitment often is not employed; the recruitment method frequently consists of obtaining a mailing list and sending questionnaires to the accessible population by mail. A frequent drawback in this type of recruitment effort is a very low response rate of 50 to 60 percent (7). Another disadvantage is that the researcher loses all control over the actual data gathering. If the low return is anticipated and an adequate number of questionnaires is sent, then problems caused by inadequate data may be avoided; however, this does not help the loss of control.
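Planning for a low return is simple arithmetic, sketched below. The function name `questionnaires_to_mail` and the target of 200 completed questionnaires are hypothetical; the 50 and 60 percent rates are the ones quoted above.

```python
import math

def questionnaires_to_mail(responses_needed, expected_response_rate):
    """Smallest mailing that should yield the needed number of returns."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    return math.ceil(responses_needed / expected_response_rate)

# Hypothetical target of 200 completed questionnaires.
at_50_percent = questionnaires_to_mail(200, 0.50)  # 400
at_60_percent = questionnaires_to_mail(200, 0.60)  # 334
```

Mailing more forms addresses only the quantity of data; as the text notes, it does nothing about who chooses to respond or how the forms are completed.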
Alternately, subjects are more difficult to recruit when more effort on their part is requested. For example, when multiple visits are required, such as in the AFO footplate example, recruiting subjects is a bigger chore for the researcher since subjects are asked to travel to the test site and do so on more than one occasion. However, because the researcher applies test conditions directly to the subject, not only is it probable that all necessary data will be obtained, but control over the experiment and the data acquisition is maintained.
Once the accessible population has been defined, every effort should be made to obtain subjects in the manner planned. If a systematic random sampling method has been chosen and a large proportion of the accessible subjects refuses to participate, then a bias error is introduced into the study. In the case of subject refusal, bias is introduced because those who refuse often share the same reason for refusing. For example, several subjects may refuse because the study seems physically too difficult; when this occurs, the researcher is left with only subjects who do not think the effort requested of them is too difficult. This implies that the remaining subjects may be more fit or healthy than those who refused. This is a threat to external validity and affects the researcher's ability to generalize the results to the original target population (3).
Recruitment techniques may include personal contact, follow-up phone calls, incentives (such as paying subjects for their time or parking), etc. Some researchers even make home visits to potential subjects to explain the research and its importance; others mail advertising brochures to make participating seem exciting and important.
Language also may present a potential difficulty with recruitment. Therefore, a brochure in the appropriate foreign language or a staff member or volunteer who can translate or interpret the foreign language may be required. Subjects may be recruited from the facility in which the researcher works or with which the researcher is familiar, or special efforts may have to be made to contact other similar facilities to obtain their permission to approach their patients.
Sometimes advance work can be done to assist the recruitment process once the study is ready to begin. Community groups, such as local churches, YMCA, youth organizations, patient support groups and local business groups such as Kiwanis and Elks, may be contacted for support in identifying potential subjects. Depending on the community impact, these groups may even invite a researcher to address their membership to explain the importance of their project to gain acceptance and willingness to participate.
Summary
The goals of sampling are to decrease time and money costs, to increase the amount of data and detail that can be obtained, and to increase accuracy of data collection by preventing errors.
To accomplish these goals it is necessary to follow these steps:
(1) Clearly define the target population to which the results will be generalized. For example, the AFO footplate study could be targeted to children in the growing years with flexible deformities or to adults with fixed deformities. Very specific inclusion criteria that outline the desired demographic and clinical characteristics of the desired target population are necessary.
(2) An accessible population representative of the target must be defined by additional inclusion criteria with specific characteristics regarding the geographic, social and time frames required for this subpopulation. For example, having transportation available, being English-speaking and not being Christian Scientists could be inclusion criteria. Also, exclusion criteria are developed in this step to avoid any ethical problems and eliminate characteristics that may invalidate the results. For example, if an ethical problem may arise in denying treatment to one of the groups, an exclusion criterion might include excluding anyone already on a treatment protocol for the clinical problem under study.
(3) The sampling process must be defined well ahead of subject selection whether it be a random (probability) or nonrandom (nonprobability) approach, and the researchers must adhere to a specific technique for recruitment appropriate for that approach. The recruitment effort must be vigorous enough to assure a large enough sample to enable statistical validity and must minimize probability of error and bias of selection.
THOMAS R. LUNSFORD, MSE, CO, is director of the orthotic department at the Institute for Rehabilitation and Research in Houston, Texas, and assistant professor of physical medicine and rehabilitation at Baylor College of Medicine in Houston.
BRENDA RAE LUNSFORD, MS, MAPT, is visiting assistant professor at Texas Woman's University's School of Physical Therapy in Houston.