A data project with Yannick Pengl (ETH Zurich) and Nils-Christian Bormann (University of Essex). All data and the LEDA R package can be found on the LEDA GitHub page.
abstract: Social scientists increasingly combine multiple datasets to study ethnicity in Africa. We facilitate these efforts by systematically linking over 8,100 ethnic categories from eleven databases, including surveys, geographic data, and expert-coded lists. Exploiting the linguistic tree from the Ethnologue database, we propose a systematic solution to the grouping problem of ethnicity. Novel empirical results on trust in African heads of state highlight the importance of explicitly considering sample inclusion criteria and different ways of linking ethnic categories from multiple datasets. An R package allows researchers to link ethnic groups from any database with explicit rules and to easily add their own data on ethnic groups.
In total, we link ethnic lists drawn from eleven data sources:
- Afrobarometer Surveys
- All Minorities at Risk (AMAR)
- Census data from IPUMS
- Ethnic Power Relations Dataset
- Ethnologue languages
- Politically Relevant Ethnic Groups from Posner (2004)
- Ethnic groups in Francois, Trebbi & Rainer (2015)
- Ethnic groups from Fearon (2003)
- GREG Data (based on the Soviet Atlas Narodov Mira)
- Demographic and Health Surveys
- Murdock Atlas
- Spatially Interpolated Data on Ethnicity (SIDE)
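
To make the idea of linking groups through a language tree concrete, here is a minimal Python sketch. It is not the LEDA R package and does not reproduce its API. All language names, group names, tree paths, and the functions `shared_depth` and `link_groups` are hypothetical and chosen only for illustration. The assumption is that each ethnic group is mapped to one or more languages, and that two groups from different lists count as linked when any pair of their languages shares at least a chosen number of nodes from the root of the tree.

```python
from itertools import product

# Hypothetical toy tree: each language maps to its path from the root.
# These are placeholders, not real Ethnologue classifications.
LANGUAGE_PATHS = {
    "lang_a": ("family_1", "branch_1a", "lang_a"),
    "lang_b": ("family_1", "branch_1a", "lang_b"),
    "lang_c": ("family_1", "branch_1b", "lang_c"),
    "lang_d": ("family_2", "branch_2a", "lang_d"),
}

# Two hypothetical group lists, each mapping a group to its languages.
SURVEY_GROUPS = {"group_s1": ["lang_a"], "group_s2": ["lang_d"]}
EXPERT_GROUPS = {"group_e1": ["lang_b", "lang_c"], "group_e2": ["lang_c"]}


def shared_depth(path_x, path_y):
    """Count how many leading nodes two root-to-leaf paths have in common."""
    depth = 0
    for node_x, node_y in zip(path_x, path_y):
        if node_x != node_y:
            break
        depth += 1
    return depth


def link_groups(groups_x, groups_y, min_depth):
    """Return (group_x, group_y) pairs whose closest language pair
    shares at least `min_depth` nodes of the tree."""
    links = []
    for (name_x, langs_x), (name_y, langs_y) in product(
        groups_x.items(), groups_y.items()
    ):
        closest = max(
            shared_depth(LANGUAGE_PATHS[lang_x], LANGUAGE_PATHS[lang_y])
            for lang_x, lang_y in product(langs_x, langs_y)
        )
        if closest >= min_depth:
            links.append((name_x, name_y))
    return links


# A looser rule (same family) links more groups than a stricter one (same branch).
family_links = link_groups(SURVEY_GROUPS, EXPERT_GROUPS, min_depth=1)
branch_links = link_groups(SURVEY_GROUPS, EXPERT_GROUPS, min_depth=2)
```

In this sketch the `min_depth` parameter plays the role of an explicit linking rule: lowering it groups categories more coarsely, raising it requires closer linguistic relatedness.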