Historical landscapes:

A millennium of human history through the eyes of Wikipedia enthusiasts: An approach to quantifying historiography

Results

For each of the 193 UN member states, we locate the article in the English edition of Wikipedia titled 'History of X', where X is the country name. Using Wikipedia's inter-language links, we then retrieve the corresponding articles from the other language editions, along with their texts and the texts of all Wikipedia articles to which these pages link. Our unit of analysis is a mention of a year in the article text: we parse all texts and extract every 4-digit number between 1000 and 2016 (we refer to these as dates). Overall, we have retrieved approximately 17M dates from 773,121 articles in 30 language editions of Wikipedia. We discover:

  • RQ1: strong recency bias across countries and entire language editions. Most retrieved dates belong to recent decades, while dates before 1500 are very sparse.

  • RQ2: evidence of Eurocentric bias. Focal points are distributed throughout the entire timelines of European countries, while we see far fewer highlights in the pre-Columbian Americas and Oceania.

  • RQ3: high inter-lingual consensus across the examined editions in describing the histories of individual countries.
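The date-extraction step described above can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions, not the authors' code: a "date" is taken to be any run of exactly four digits that is not embedded in a longer number, and article texts are assumed to be plain strings already (the retrieval via inter-language links is not shown). The helper names `extract_dates` and `count_dates` are hypothetical.

```python
import re
from collections import Counter
from collections.abc import Iterable

# Assumption: a "date" is a run of exactly four digits that is not part of a
# longer number; the lookarounds reject "250000" but keep "1990" in "1990s".
YEAR_PATTERN = re.compile(r"(?<!\d)\d{4}(?!\d)")
MIN_YEAR, MAX_YEAR = 1000, 2016  # inclusive range used in the study


def extract_dates(text: str) -> list[int]:
    """Return all 4-digit numbers in [MIN_YEAR, MAX_YEAR], in order of appearance."""
    years = (int(match) for match in YEAR_PATTERN.findall(text))
    return [year for year in years if MIN_YEAR <= year <= MAX_YEAR]


def count_dates(texts: Iterable[str]) -> Counter:
    """Aggregate date mentions over many article texts (hypothetical helper)."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(extract_dates(text))
    return counts


if __name__ == "__main__":
    sample = "Founded in 1066, rebuilt in 1450; population 250000 in 2020, and 999 before."
    print(extract_dates(sample))  # [1066, 1450]
```

A pattern this simple also picks up 4-digit numbers that are not years (page counts, quantities), so any real pipeline would likely need extra filtering.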

Writing about history (historiography) is important in all social groups. Establishing some consensus on relevant historical periods and events provides a sense of roots and is at the core of building identities, whether of individuals, groups, or nations. Yet each description inherently presents a unique viewpoint on past events, which may be partial and contested. Quantifying such differences is a challenging task.

Fortunately, Wikipedia's open and digital nature allows for thorough quantitative analysis of historical narratives, even across a large number of languages. This is rarely possible with other historiographical sources, such as printed encyclopedias or history textbooks.

Research Questions

We look into descriptions of the national histories of 193 UN member states in 30 large Wikipedia language editions, compute their timelines over the last 1,000 years, and detect the periods of highest importance for each country across language editions. In particular, we ask:

RQ1: What are the most documented periods of history of the last 1,000 years in Wikipedia?

RQ2: What are the temporal focal points in descriptions of national histories in Wikipedia?

RQ3: Are country timelines consistent across language editions?
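All three questions rest on turning each article's dates into a timeline and comparing timelines across editions. The page does not spell out how this is done, so the following Python sketch rests entirely on our own assumptions: decade bins over 1000 to 2016, timelines normalized to shares, Pearson correlation as one possible similarity measure between two editions, and the top-k decades as a crude stand-in for focal points. The function names and the sample dates are hypothetical and are not the paper's method or data.

```python
from math import sqrt

# Assumptions (ours, for illustration): decade bins over 1000-2016,
# timelines normalized to shares, Pearson correlation as the similarity.
START, END, BIN_WIDTH = 1000, 2016, 10
N_BINS = (END - START) // BIN_WIDTH + 1  # 102 bins; the last covers 2010-2016


def timeline(dates: list[int]) -> list[float]:
    """Share of a country's date mentions that fall into each decade bin."""
    counts = [0] * N_BINS
    for year in dates:
        if START <= year <= END:
            counts[(year - START) // BIN_WIDTH] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation of two equal-length timelines (nan if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / sqrt(var_a * var_b)


def top_decades(tl: list[float], k: int = 3) -> list[int]:
    """Start years of the k decades with the largest share (crude focal-point proxy)."""
    ranked = sorted(range(len(tl)), key=lambda i: tl[i], reverse=True)
    return [START + i * BIN_WIDTH for i in ranked[:k]]


if __name__ == "__main__":
    # Made-up date lists for two editions of the same 'History of X' article.
    en = timeline([1066, 1066, 1215, 1945, 1945, 1945])
    de = timeline([1066, 1215, 1215, 1945, 1945])
    print(top_decades(en, k=2))  # [1940, 1060]
    print(round(pearson(en, de), 3))
```

Normalizing to shares lets editions of very different sizes be compared. Raw shares still carry the recency bias described under RQ1, so a more careful analysis might compare each timeline against a background distribution.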

Conclusion

The observed 'peaks' and 'lows' of interest in certain time periods, as well as the cross-lingual differences in national timelines, might have different explanations. If these dissimilarities are intentional, they might reflect cultural differences; in this case, our results could interest historians and culture scholars who wish to explore the topic in greater detail and with other methods.

If these differences are accidental or can be attributed to 'missing data', our findings could be actionable for the Wikimedia community and for enthusiastic editors who wish to improve the quality of articles in various language editions.

In any case, our results show that Wikipedia's historical reference articles are not free from gaps and biases. We hope that history teachers and students, as well as lay readers who use Wikipedia to enrich their knowledge of world history, will benefit from this awareness.

Full paper for download  [.pdf]   [data]

Paper in a nutshell (slides)   [.pdf]

How to cite:

@InProceedings{Samoilenko2017History,
    author = {Samoilenko, Anna and Lemmerich, Florian and Weller, Katrin and Zens, Maria and Strohmaier, Markus},
    title = {Analysing Timelines of National Histories across Wikipedia Editions: A Comparative Computational Approach},
    booktitle = {Proceedings of the Eleventh International AAAI Conference on Web and Social Media (ICWSM 2017)},
    pages = {210--219},
    year = {2017},
    eventdate = {2017-05-15/2017-05-18},
    location = {Montreal, Canada}
}