![]() ![]() This method is often used when the different strata in the population have different rates of a certain outcome, and the researcher wants to be able to compare the rates between strata. ![]() Disproportionate Stratified Random Samplingĭisproportionate stratified random sampling is a method of selecting a sample from a population where the population is divided into strata, and each stratum is sampled at a rate proportional to its size. Within each stratum, individuals are then chosen at random to be included in the sample. The number of people in each stratum is then proportional to the size of the overall population. With proportionate stratified random sampling, the population is first divided into strata, or groups, based on some shared characteristic. This type of sampling is often used in situations where the population is spread out over a large geographic area or when the members of the population are not easy to identify. Proportionate Stratified Random Sampling is a method of sampling that is used when researchers want to study a population that is not easily accessible. Disproportionate Stratified Random Sampling.Proportionate Stratified Random Sampling.There are two types of Stratified Random Sampling: Stratified random sampling can also be used to oversample or undersample specific subgroups within the population Types of Stratified Random Sampling This would ensure that all four grades were represented in the sample. We could stratify the population by grade level (9th, 10th, 11th, 12th) and then draw a random sample from each stratum. This type of sampling is used when it is important to ensure that each stratum in the population is represented in the sample.įor example, suppose we want to study the reading habits of high school students. If we go ahead and train our model on the sample data which has the wrong proportions it is likely that the model will be over-fitted to the training data and it is also likely that when we run the model against real-world or testing data that is in the right proportions it will underperform.Stratified random sampling is a type of probability sampling in which the population is first divided into strata and then a random sample is drawn from each stratum. In machine learning algorithms this can cause problems down the line. If our sample data has 70% male undergraduates it will not represent the population. Female graduate students = 15% of the population.Male graduate students = 20% of the population.Female undergraduates = 20% of the population.Male undergraduates = 45% of the population.The example in this article shows a combination of two factors as follows. To make matters more complex, it might be that there are multiple feature columns involved. This will involve resampling the sample data so that the proportions match the population (see for more information). If we can establish that the sample data should better reflect the population then we can “stratify” the data. Perhaps the marketing team accidentally hit more males with their marketing campaign causing an imbalance. One possibility is that the data collection method might have been flawed. There could be many explanations for our 60% male sample data. In the real world the UK general population is closer to 49.4% male and 50.6% female (source: ) and certainly not 60% / 40%. For example, lets assume that the data science team were given survey data and we noticed that the survey respondents were 60% male and 40% female. Sometimes the sample data that data scientists are given does not fit what we know about the wider population data. Photo by Charles Deluvio on Unsplash Introduction ![]()
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |