Downloads

Due to licensing restrictions, we cannot provide the raw data. Instead, we provide estimates obtained with the methods we developed; these estimates accompany our scientific publications for reproducibility purposes. The latest release below includes country-level and subnational datasets for both Scopus and OpenAlex.

Current Release: 2026 V2

Description Area Database File Name
Yearly measures by gender and field Country OpenAlex openalex_scholarlymigration_country_gender_and_field.parquet
Yearly bilateral flows by gender and field Country OpenAlex openalex_scholarlymigration_country_flows_gender_and_field.parquet
Yearly measures by gender and field Subnational OpenAlex openalex_scholarlymigration_subnational_gender_and_field.parquet
Yearly bilateral flows by gender and field Subnational OpenAlex openalex_scholarlymigration_subnational_flows_gender_and_field.parquet
Yearly measures by gender and field Country Scopus scopus_scholarlymigration_country_gender_and_field.parquet
Yearly bilateral flows by gender and field Country Scopus scopus_scholarlymigration_country_flows_gender_and_field.parquet
Yearly measures by gender and field Subnational Scopus scopus_scholarlymigration_subnational_gender_and_field.parquet
Yearly bilateral flows by gender and field Subnational Scopus scopus_scholarlymigration_subnational_flows_gender_and_field.parquet
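
The eight file names above follow a regular pattern, so they can be generated rather than typed by hand. The following is a minimal sketch: the helper `release_file_name` is hypothetical, and the pattern is inferred from the names listed in the table, not from a documented naming scheme.

```python
def release_file_name(database: str, area: str, flows: bool = False) -> str:
    """Build a 2026 V2 file name from the pattern visible in the table.

    Hypothetical helper: the pattern is inferred from the listed file names.
    """
    database = database.lower()
    area = area.lower()
    if database not in {"openalex", "scopus"}:
        raise ValueError(f"unknown database: {database!r}")
    if area not in {"country", "subnational"}:
        raise ValueError(f"unknown area: {area!r}")
    # Bilateral-flow files carry an extra "_flows" after the area level.
    middle = f"{area}_flows" if flows else area
    return f"{database}_scholarlymigration_{middle}_gender_and_field.parquet"


if __name__ == "__main__":
    for db in ("openalex", "scopus"):
        for level in ("country", "subnational"):
            print(release_file_name(db, level))
            print(release_file_name(db, level, flows=True))
```

Once downloaded, each file can be opened with any Parquet reader, for example `pandas.read_parquet`.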

The data can also be found in the Zenodo repository.

If you download or use this data, please subscribe to our newsletter to receive updates on the project.


Old release: 2024 V1

Description Database Research article File Name
Yearly measures of the number of scholars, emigrations, immigrations, net migration rates and other variables, per country Scopus scopus_2024_V1_scholarlymigration_country_enriched.csv
Yearly bilateral flows of scholars between countries Scopus scopus_2024_V1_scholarlymigration_countryflows_enriched.csv

Working Paper

More information about how the data is produced and processed can be found in our new working paper.

Any questions & comments

If you have any questions, comments, or problems accessing the data, please write to us.


How is the data produced?

Input data

Currently, our main data source is the Scopus bibliometric database, chosen for the high quality of its author name disambiguation. It covers the metadata and abstracts of over 50 million articles from more than 9,000 publishers, and over 17 million author profiles.

Via the Max Planck Digital Library, we use the infrastructure of the German Competence Centre for Bibliometrics to generate and download 240 million authorship records from the data. An authorship record is a unique combination of author, publication, and affiliation address.

Screenshot of Paper-headlines

Data processing

We filter out unreliable entries of the Scopus database (please read the Methods and Documentation Working Paper for more information). In the next step, we group the data by year and author:

year author affiliation country
2008 Jane Doe DEU
2008 Jane Doe DEU
2008 Jane Doe FRA
2012 Jane Doe USA

If an author has more than one affiliation country in a year, we take the most frequent one. If an author has no affiliation country in a year, we fill up to two years before a publication with the country from the next available year:

year author inferred residence country
2006 Jane Doe DEU
2007 Jane Doe DEU
2008 Jane Doe DEU
2010 Jane Doe USA
2011 Jane Doe USA
2012 Jane Doe USA
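
The two rules above can be sketched for a single author as follows. This is an illustration under stated assumptions, not the project's implementation: the function name `infer_residence`, the alphabetical tie-break between equally frequent countries, and the `max_backfill` parameter are all hypothetical.

```python
from collections import Counter, defaultdict


def infer_residence(records, max_backfill=2):
    """Infer one residence country per year for a single author.

    records: iterable of (year, affiliation_country) pairs.
    Returns a dict {year: country}, sorted by year.
    """
    counts_by_year = defaultdict(Counter)
    for year, country in records:
        counts_by_year[year][country] += 1

    # Rule 1: keep the most frequent country per year.
    # Assumption: ties are broken alphabetically.
    observed = {
        year: min(counts, key=lambda c: (-counts[c], c))
        for year, counts in counts_by_year.items()
    }

    # Rule 2: fill up to `max_backfill` years before a publication year
    # with the country of the next available (later) observed year.
    residence = dict(observed)
    for year in observed:
        for gap in range(1, max_backfill + 1):
            missing = year - gap
            if missing in observed:
                continue
            for step in range(1, max_backfill + 1):
                if missing + step in observed:
                    residence[missing] = observed[missing + step]
                    break
    return dict(sorted(residence.items()))


if __name__ == "__main__":
    jane = [(2008, "DEU"), (2008, "DEU"), (2008, "FRA"), (2012, "USA")]
    for year, country in infer_residence(jane).items():
        print(year, "Jane Doe", country)
```

Applied to the Jane Doe records, the sketch leaves 2009 empty, because it lies more than two years before the next publication year.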

Aggregating these inferred residence records by country and year yields the populations of researchers.

Migration events

If the country of residence changes, this will create a "migration event". In our example we register one migration event. The outmigration country is Germany, the inmigration country is the USA. The year of the migration is the first year with a new residence country: 2010.

year author outmigration country inmigration country
2010 Jane Doe DEU USA
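
Detecting migration events from a residence history can be sketched as below. The function name `migration_events` is hypothetical, and the sketch assumes that consecutive available years are compared even when a year in between has no inferred residence, which matches the example (2008 to 2010).

```python
def migration_events(residence):
    """Return migration events for one author.

    residence: dict {year: residence_country}.
    Each event is (year, outmigration_country, inmigration_country), where
    year is the first year with the new residence country.
    """
    events = []
    previous_country = None
    for year in sorted(residence):
        country = residence[year]
        if previous_country is not None and country != previous_country:
            events.append((year, previous_country, country))
        previous_country = country
    return events


if __name__ == "__main__":
    jane = {2006: "DEU", 2007: "DEU", 2008: "DEU",
            2010: "USA", 2011: "USA", 2012: "USA"}
    print(migration_events(jane))
```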

The migration numbers for each country are obtained by aggregating all migration events by country and year. The migration rates are calculated by dividing the migration numbers by the country's population of researchers.


Output Data

Area measures

The data is aggregated by area, gender, field of science and year.

The area can be a country or a subnational administrative unit (e.g. states in the USA). The field is one of three broad categories, or the aggregate of all of them: `Medical and Health Sciences` (abbreviated as "MhS"); `Humanities` and `Social Sciences` (abbreviated as "HumSS"); `Natural Sciences`, `Engineering and Technology`, and `Agricultural Sciences` (abbreviated as "AgEnNat"); and the all-encompassing category "All fields" (abbreviated as "all").

The gender is inferred from the full name of the authors using the Python package Nameprediction and falls into one of three categories: female ("f"), male ("m"), or unknown ("u"). A fourth value, "all", aggregates the three.

year area population of researchers inmigration total outmigration total netmigration outmigration rate inmigration rate netmigration rate
2016 DEU 131310 4499 4523 -24 0.034 0.034 -0.0002
2016 USA 759857 15296 14250 1046 0.019 0.020 0.0014
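
The derived columns follow from the three base columns: net migration is immigration minus emigration, and each rate is the corresponding number divided by the population of researchers. The sketch below recomputes them from the totals of the two example rows. The helper `area_measures` and its dictionary keys are hypothetical and are not the column names of the released files.

```python
def area_measures(population, inmigration, outmigration):
    """Derive net migration and migration rates for one area and year."""
    netmigration = inmigration - outmigration
    return {
        "netmigration": netmigration,
        "outmigration rate": outmigration / population,
        "inmigration rate": inmigration / population,
        "netmigration rate": netmigration / population,
    }


if __name__ == "__main__":
    examples = {
        "DEU": (131310, 4499, 4523),
        "USA": (759857, 15296, 14250),
    }
    for area, (population, inmig, outmig) in examples.items():
        measures = area_measures(population, inmig, outmig)
        print(area, {name: round(value, 4) for name, value in measures.items()})
```

A positive net migration rate means that more scholars arrived than left in that year.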


Bilateral flows

The data is aggregated by inmigration area, outmigration area and year. It shows the flows of scholars between all areas with at least one migration event.

year area_from area_to number of migrations
2010 DEU USA 2393
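
Aggregating individual migration events into bilateral flows amounts to counting events per combination of year, origin, and destination. A minimal sketch, assuming events are given as `(year, author, area_from, area_to)` tuples; the function name `bilateral_flows` and the sample events are hypothetical:

```python
from collections import Counter


def bilateral_flows(events):
    """Count migration events per (year, area_from, area_to).

    events: iterable of (year, author, area_from, area_to) tuples.
    Only combinations with at least one event appear in the result.
    """
    return Counter(
        (year, area_from, area_to)
        for year, _author, area_from, area_to in events
    )


if __name__ == "__main__":
    # Made-up events for illustration only.
    events = [
        (2010, "Jane Doe", "DEU", "USA"),
        (2010, "John Roe", "DEU", "USA"),
        (2010, "Ann Poe", "USA", "DEU"),
    ]
    for (year, area_from, area_to), n in sorted(bilateral_flows(events).items()):
        print(year, area_from, area_to, n)
```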