"Sampling" is a term from statistics. This page covers the concept of sampling and how it is adapted to machine learning.


Sampling means selecting a subset of observations in order to acquire some knowledge of a statistical population. (Wikipedia)

In statistics, when complete enumeration of a population is impossible or requires quite a lot of resources, a subset of the population is selected to gain knowledge about the whole population. The subset should therefore represent the statistical population well.
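As a rough illustration of the idea, the sketch below compares the mean of a random subset with the mean of the full population. The population (the integers 0 to 99,999), the sample size of 1,000, and the seed are all assumptions made up for this example.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed, chosen arbitrarily for reproducibility

# Hypothetical population: the integers 0..99999.
population = range(100_000)

# Draw a random subset of 1,000 observations instead of enumerating everything.
sample = rng.sample(population, 1_000)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

# If the sample represents the population well, the two means should be close.
print(population_mean, sample_mean)
```

The sample mean will usually not equal the population mean exactly, but with a random sample of this size it tends to land close to it.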

Kinds of sampling

  • Random sampling : Literally select samples from the population at random.
  • Random sampling with replacement : Each selected sample is put back into the population, so the same sample can be selected several times.
  • Random sampling without replacement : After selection, the sample is not put back into the population, so there is no chance of selecting the same sample again.
  • Downsampling : Select fewer items from the larger group of data to reduce high-frequency items.
  • Upsampling : Select more items from the smaller group of data, repeating items if necessary, to increase low-frequency items.

For example, assume there are 6 data points: {0,0,0,0,1,1}. Downsampling takes 2 0s and 2 1s. Upsampling takes 4 0s and 4 1s.
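The kinds of sampling listed above can be sketched with Python's standard `random` module. This is a minimal sketch, not a definitive implementation: the helper names `group_by_label`, `downsample`, and `upsample` are hypothetical, and it assumes that downsampling shrinks every group to the size of the smallest one while upsampling grows every group to the size of the largest one, which matches the {0,0,0,0,1,1} example.

```python
import random

def group_by_label(data):
    """Group equal values together, e.g. [0, 0, 1] -> {0: [0, 0], 1: [1]}."""
    groups = {}
    for item in data:
        groups.setdefault(item, []).append(item)
    return groups

def downsample(data, rng):
    """Shrink every group to the size of the smallest group."""
    groups = group_by_label(data)
    target = min(len(items) for items in groups.values())
    result = []
    for items in groups.values():
        # sample() draws without replacement
        result.extend(rng.sample(items, target))
    return result

def upsample(data, rng):
    """Grow every group to the size of the largest group."""
    groups = group_by_label(data)
    target = max(len(items) for items in groups.values())
    result = []
    for items in groups.values():
        # choices() draws with replacement, so items may repeat
        extra = rng.choices(items, k=target - len(items))
        result.extend(items + extra)
    return result

rng = random.Random(0)  # fixed seed, chosen arbitrarily
population = list(range(10))

# With replacement: the same value may appear more than once.
with_replacement = rng.choices(population, k=5)
# Without replacement: every selected value is distinct.
without_replacement = rng.sample(population, 5)

data = [0, 0, 0, 0, 1, 1]
print(sorted(downsample(data, rng)))  # [0, 0, 1, 1]
print(sorted(upsample(data, rng)))    # [0, 0, 0, 0, 1, 1, 1, 1]
```

Because the items within a group are identical here, the result is the same regardless of the seed. With real data, where each item carries features as well as a label, which items get dropped or repeated would depend on the random draw.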