HSN Data Sampling

16 October 2018 - 15:59

The sampling of the HSN is based on tables with the official number of births for each year and each municipality. As a rule, the whole period under investigation (1812-1922) is stratified into cohorts of ten years, except the first cohort, which covers the eleven-year period 1812-1822.

These periods correspond to the administrative order of the certificates themselves. In addition to the stratification by period, the sample is stratified according to the type of residence. In the case of Noord-Holland, for example, this means that a division was made between Amsterdam, the other important cities (Haarlem, Alkmaar, Hoorn and Zaandam) and the rest of the province (the countryside).
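The two-way stratification described above can be illustrated with a minimal Python sketch. The function names and the stratum labels are hypothetical, and the cohort boundaries after 1822 (1823-1832, 1833-1842, and so on up to 1913-1922) are inferred from the statement that all later cohorts span ten years; only the Noord-Holland division is taken from the text.

```python
# Cities named in the text as the "other important cities" of Noord-Holland.
NOORD_HOLLAND_CITIES = {"Haarlem", "Alkmaar", "Hoorn", "Zaandam"}


def cohort_for_year(year):
    """Return (first_year, last_year) of the birth cohort containing `year`."""
    if not 1812 <= year <= 1922:
        raise ValueError("year outside the period under investigation")
    if year <= 1822:
        # The first cohort is the only one that spans eleven years.
        return (1812, 1822)
    # Assumption: later cohorts are consecutive ten-year blocks from 1823.
    start = 1823 + ((year - 1823) // 10) * 10
    return (start, start + 9)


def residence_type(municipality):
    """Classify a Noord-Holland municipality (labels are illustrative)."""
    if municipality == "Amsterdam":
        return "Amsterdam"
    if municipality in NOORD_HOLLAND_CITIES:
        return "other cities"
    return "countryside"


def stratum(year, municipality):
    """Combine cohort and residence type into one stratum key."""
    return (cohort_for_year(year), residence_type(municipality))
```

For instance, a birth in Hoorn in 1850 would fall into the stratum `((1843, 1852), "other cities")` under these assumptions.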

The number of births for each municipality is based on two sources. For the period 1811 to 1850 the numbers of births were provided by the so-called Hofstee database, which can be found at the Netherlands Interdisciplinary Demographic Institute (NIDI). For the period 1851 to 1922 data were used from the Historical Ecological Database (HED, Department of Social Geography, University of Amsterdam). For several reasons, however, this sample design does not fit exactly with the number of certificates. In the city of Utrecht, for example, more children were born and given a birth certificate than were officially counted as newborn children in that city. One of the reasons was that people from the countryside came to city hospitals only for the delivery and then went back to their villages or simply disappeared.

Because of the discrepancy between the number of births and the number of certificates, the official number of births was systematically enlarged by ten percent, so that every birth certificate had an equal chance of being sampled.
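The effect of the ten percent enlargement can be sketched as follows. This is an illustration under assumptions, not the HSN procedure itself: the text does not say how fractional results were rounded or how certificates were drawn, so rounding up and the idea of a numbered frame are assumptions, and the function names and figures are hypothetical.

```python
def enlarged_frame_size(official_births, percent=10):
    """Official birth count enlarged by `percent`, rounded up (assumed)."""
    # Integer arithmetic avoids floating-point surprises such as
    # 1000 * 1.1 evaluating to slightly more than 1100.
    extra = (official_births * percent + 99) // 100
    return official_births + extra


def frame_covers_certificates(official_births, actual_certificates):
    """True if the enlarged frame is at least as large as the number of
    certificates, so that no certificate falls outside the frame."""
    return actual_certificates <= enlarged_frame_size(official_births)


# Hypothetical figures: 1000 officially counted births, 1060 certificates.
frame = enlarged_frame_size(1000)
covered = frame_covers_certificates(1000, 1060)
```

With these hypothetical figures the frame grows from 1000 to 1100, which is enough to include all 1060 certificates; without the enlargement the last 60 certificates could never be selected.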

The basic sample of the HSN originally consisted of half a percent (0.5%) of the birth certificates. This led to rather unequal numbers per cohort of persons who survived to the age of twenty. Confronted with this result of the pilot project in Utrecht, it was decided to shift to a more sophisticated design and to differentiate the sample ratios according to period as follows: 0.75% for 1812-1872 and 0.5% for 1873-1922. In total about 85,500 births were sampled.
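The period-dependent sample ratios can be written down as a small Python sketch. The function names are hypothetical, and the text does not state whether the ratio was applied to the official birth count or to an adjusted count, so `births` here is simply whatever count the ratio is applied to.

```python
from fractions import Fraction


def sample_fraction(year):
    """Sampling ratio for a birth year, as stated in the text."""
    if 1812 <= year <= 1872:
        return Fraction(3, 400)   # 0.75%
    if 1873 <= year <= 1922:
        return Fraction(1, 200)   # 0.5%
    raise ValueError("year outside the period under investigation")


def expected_sample_size(births, year):
    """Expected number of sampled births for a given count and year."""
    return float(births * sample_fraction(year))
```

Under these ratios, 10,000 births in 1850 would yield an expected 75 sampled births, against 50 for the same number of births in 1900.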
