A population is the entire group of individuals, animals, or items about which we want to draw conclusions. It includes every possible observation that meets a particular criterion, which makes it comprehensive and often very large. For example, if we wanted to study the average height of all adults in a country, the population would consist of every adult in that country. Collecting data from an entire population is usually time-consuming, logistically challenging, and expensive.
A sample is a subset of the population, selected to represent the population for statistical analysis. A sample is used to make inferences about the population. For instance, instead of measuring the height of every adult in the country, we might measure the height of a few thousand adults chosen randomly from different regions. The sampling methods employed must ensure that the sample collected is representative of the population. You can only draw reliable conclusions about the entire population when the sample is representative.
The following video demonstrates a few examples of how we can draw inferences from representative samples drawn from a population:
Nonrandom sampling is a method where individuals or units are selected for a sample based on specific criteria or convenience rather than randomly. This type of sampling does not give every member of the population an equal chance of being selected. As such, this approach to sampling can potentially introduce bias into the results. Common nonrandom sampling techniques include convenience sampling, where subjects are chosen because they are easily accessible; judgmental sampling, where subjects are selected based on the researcher’s criteria; and quota sampling, where subjects are chosen to meet specific characteristics. While nonrandom sampling can be quicker and less expensive, it may not accurately represent the population, limiting the generalizability of the findings.
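To make the risk of bias concrete, here is a minimal sketch of convenience sampling using only Python's standard library. The population is simulated, and the two regions, their average heights, and the sample size are assumptions chosen purely for illustration.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: two regions with different average heights (cm).
region_a = [rng.gauss(175, 10) for _ in range(50_000)]
region_b = [rng.gauss(165, 10) for _ in range(50_000)]
population = region_a + region_b

# Convenience sample: only 2,000 people from the easily reached region A.
convenience_sample = region_a[:2_000]

population_mean = statistics.fmean(population)
convenience_mean = statistics.fmean(convenience_sample)
bias = convenience_mean - population_mean

print(f"population mean:         {population_mean:.2f}")
print(f"convenience sample mean: {convenience_mean:.2f}")
print(f"bias:                    {bias:.2f}")
```

Because nobody from region B can enter the sample, the sample mean sits well above the population mean no matter how many people are measured.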
Random sampling is a technique used in statistics to select a subset of individuals or observations from a larger population in such a way that each member of the population has an equal chance of being chosen. Because selection does not depend on any characteristic of the individuals, this method minimizes selection bias and makes it likely that the sample is representative of the population, which supports sound inferences about the population as a whole. By using random sampling, researchers can generalize findings from the sample to the broader population with greater confidence. The randomness in the selection process helps avoid systematic errors, and a sufficiently large random sample tends to reflect the diversity and characteristics of the entire population.
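As an illustration, simple random sampling can be sketched with Python's standard library. The population below is simulated (heights in cm with an assumed mean of 170 and standard deviation of 10), so the numbers are hypothetical and only the procedure matters.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of 100,000 adults, simulated for illustration.
population = [rng.gauss(170, 10) for _ in range(100_000)]

# Simple random sample without replacement: every member has the same
# chance of being selected.
sample = rng.sample(population, 2_000)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

With a few thousand randomly chosen individuals, the sample mean typically lands within a fraction of a centimeter of the population mean.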
Sampling error refers to the difference between a sample statistic and the corresponding population parameter, caused by the fact that the sample is only a subset of the population. This error tends to decrease as the sample size increases, since larger samples reflect the population more accurately; N ≥ 30 is a common rule of thumb for a large sample. Normality refers to the assumption that the data follow a normal distribution, which is important for hypothesis testing. Under this assumption, the distribution of values can be represented by a symmetric bell curve centered on the mean. With that in mind, we can use the assumption of normality to determine probability values for any data point in the sample by converting its raw score to a z-score, which expresses how many standard deviations the score lies from the mean: z = (raw score - mean)/(standard deviation). In the following video, you will learn about the normal distribution and how z-scores are calculated and used.
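Both ideas in this paragraph can be tried out in a short sketch using Python's standard library. The population is simulated under an assumed normal distribution, and the helper names z_score and mean_abs_error are hypothetical, introduced only for this example.

```python
import random
import statistics
from statistics import NormalDist

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical, normally distributed population of heights (cm).
population = [rng.gauss(170, 10) for _ in range(50_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

def z_score(raw_score, mean, sd):
    """Number of standard deviations a raw score lies from the mean."""
    return (raw_score - mean) / sd

# Probability of observing a height below 185 cm under the normality assumption.
z = z_score(185, mu, sigma)
p_below = NormalDist().cdf(z)
print(f"z = {z:.2f}, P(height < 185) = {p_below:.3f}")

def mean_abs_error(n, trials=200):
    """Average absolute gap between the sample mean and the population mean."""
    gaps = [abs(statistics.fmean(rng.sample(population, n)) - mu)
            for _ in range(trials)]
    return statistics.fmean(gaps)

# Sampling error shrinks as the sample size grows.
for n in (10, 100, 1000):
    print(f"n = {n:4d}: average sampling error = {mean_abs_error(n):.3f}")
```

The loop at the end repeats the sampling many times for each sample size, which shows the typical sampling error getting smaller as n increases.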
When the population standard deviation is unknown and the sample size is small (N < 30), the Student’s t-distribution is used instead of the normal distribution to estimate population parameters, accounting for the increased uncertainty due to smaller sample sizes. Instead of a z-score, a t-score is calculated, which takes into account the standard error of the mean (SEM): the sample standard deviation divided by √N. A small SEM indicates that the sample mean is likely to be close to the population mean. A large SEM suggests more variability in the sample means, implying that the sample mean may be further from the population mean. The t-score is calculated from the sample mean and the sample standard deviation. It is used to assess the significance of a sample mean relative to a hypothesized population mean.
t = (sample mean - hypothesized population mean)/(sample standard deviation/√N)
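The formula above can be worked through in a few lines of Python. This is a minimal sketch: the function name t_score, the eight height values, and the hypothesized mean of 170 cm are made up for illustration.

```python
import math
import statistics

def t_score(sample, hypothesized_mean):
    """t = (sample mean - hypothesized mean) / (sample SD / sqrt(N))."""
    n = len(sample)
    sample_mean = statistics.fmean(sample)
    sample_sd = statistics.stdev(sample)  # sample SD (N - 1 in the denominator)
    sem = sample_sd / math.sqrt(n)        # standard error of the mean
    return (sample_mean - hypothesized_mean) / sem

# Hypothetical sample of eight adult heights (cm).
heights = [168.0, 172.0, 171.0, 169.0, 175.0, 173.0, 170.0, 174.0]

t = t_score(heights, 170.0)
print(f"t = {t:.2f} with {len(heights) - 1} degrees of freedom")
```

Here the sample mean is 171.5, the sample standard deviation is about 2.45, and the SEM is about 0.87, which gives t ≈ 1.73.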
In hypothesis testing, t-scores are often used instead of z-scores when the population standard deviation is unknown. In these cases, the sample standard deviation is used as an estimate. However, while the population standard deviation is a fixed value, the sample standard deviation varies from one sample to the next, especially with small sample sizes. The t-distribution is designed to account for the additional uncertainty introduced by using the sample standard deviation as an estimate. It has heavier tails than the standard normal distribution, which reflects the increased variability associated with estimating the population standard deviation, and it approaches the normal distribution as the sample size grows. As such, t-scores are generally the more appropriate choice when working with smaller sample sizes. The following video covers the basic concepts of hypothesis testing and demonstrates the usage of t-scores in the process.
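The heavier tails of the t-distribution can also be seen in a small simulation. This sketch uses only Python's standard library. The sample sizes, the number of trials, and the helper name share_beyond are assumptions made for the example. Every sample is drawn from a population whose true mean is 0, so any large t-score arises by chance alone.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

def t_score(sample, hypothesized_mean):
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.fmean(sample) - hypothesized_mean) / sem

def share_beyond(cutoff, n, trials=5_000):
    """Fraction of simulated t-scores whose absolute value exceeds cutoff."""
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        if abs(t_score(sample, 0)) > cutoff:
            hits += 1
    return hits / trials

# 1.96 cuts off about 5% of a standard normal distribution (two-sided).
small = share_beyond(1.96, n=5)
large = share_beyond(1.96, n=50)

print(f"share of |t| > 1.96 with N = 5:  {small:.3f}")
print(f"share of |t| > 1.96 with N = 50: {large:.3f}")
```

With N = 5, noticeably more than 5% of the t-scores fall beyond ±1.96, while with N = 50 the share is close to the 5% expected under the normal distribution. This is why a z-based cutoff is too lenient for small samples.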
© Unity Environmental University 2025. “America’s Environmental University.™”