
Data

The datasets produced for the project are available for free download from the GESIS Data Archive.

The download link includes the following datasets and relevant materials:

  • Matrices of the number of collected dates per decade by country (.zip)

  • Matrices of the Jensen-Shannon divergence measure by country (computed based on years) (.zip)

  • Matrices of z-scores (by decade) with temporal focal points of all countries (by region) (.zip)

Evaluation:

  • Final error rates for each decade and language edition (.csv)

 

Code:

  • Extraction of 4-digit numbers from the main text of multilingual Wikipedia articles (Python)

Please cite both the paper and the dataset when using these data.

 

 

 

Data collection. We show parts of the article on Portuguese history and one of its outlinks, as they appeared in the Slovenian Wikipedia in 2016. We collect all 4-digit numbers from the main text of the article and of all its outlinks, and analyse the resulting distribution (bottom part of the figure).
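The collection step described in the caption can be illustrated with a minimal Python sketch. It is not the project's released extraction code: the regular expression, the function names `extract_years` and `count_by_decade`, and the default range of 1000 to 2016 are assumptions made for illustration only.

```python
import re
from collections import Counter

# Assumption: a candidate date is a run of exactly four digits that is not
# part of a longer number (so "123456" yields nothing).
FOUR_DIGIT = re.compile(r"(?<!\d)\d{4}(?!\d)")


def extract_years(text, lo=1000, hi=2016):
    """Return every standalone 4-digit number in `text` within [lo, hi].

    The bounds are hypothetical defaults; the upper bound mirrors the
    2016 snapshot mentioned on this page.
    """
    numbers = (int(match) for match in FOUR_DIGIT.findall(text))
    return [n for n in numbers if lo <= n <= hi]


def count_by_decade(years):
    """Count the extracted numbers per decade (e.g. 1974 -> 1970)."""
    return Counter((year // 10) * 10 for year in years)


sample = "The kingdom dates to 1143; the republic to 1910. See 123456, 1974, 1976."
years = extract_years(sample)
decades = count_by_decade(years)
```

Applying the same two functions to the main text of an article and of each of its outlinks, then summing the counters, would give a per-decade distribution of the kind shown in the bottom part of the figure.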

We study articles related to the history of all UN member states and compare them across 30 language editions. For each of the 193 countries, we locate an article in the English edition of Wikipedia titled 'History of X', where X is the country name. Using Wikipedia's inter-language links, we retrieve the other language versions of the article from sister editions.
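As a rough illustration of the inter-language lookup, the sketch below queries the public MediaWiki API for the `langlinks` of an English article. It is an assumption about how such a lookup could be done today, not the project's own code; the function names and the User-Agent string are hypothetical, and result continuation for very long link lists is not handled.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def langlinks_params(title):
    """Build the query parameters for a langlinks request."""
    return {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllimit": "max",
        "format": "json",
    }


def parse_langlinks(response):
    """Map language code -> article title from a default-format JSON response."""
    links = {}
    for page in response.get("query", {}).get("pages", {}).values():
        for link in page.get("langlinks", []):
            # In the default JSON format the linked title is stored under "*".
            links[link["lang"]] = link["*"]
    return links


def fetch_langlinks(title):
    """Fetch the inter-language links of `title` (performs a network request)."""
    url = API_URL + "?" + urlencode(langlinks_params(title))
    # Hypothetical User-Agent; replace with one identifying your own client.
    request = Request(url, headers={"User-Agent": "history-timelines-sketch/0.1"})
    with urlopen(request) as resp:
        return parse_langlinks(json.load(resp))
```

Calling `fetch_langlinks("History of Portugal")` would return a dictionary keyed by language code, from which the language editions of interest can be selected.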

 

We limit the analysis to the 30 largest Wikipedia editions (each with more than 125,000 articles in July 2016) whose languages are native to Europe. This setup avoids issues with extracting and aligning dates from languages that use different calendars and alphabets.

Retrieving dates
Data and materials availability