Open Datasets

This page contains information about open datasets provided by the Dimensions team in the BigQuery environment.

These datasets are provided free of charge. They allow data analysts working with research data, scientometricians, and data consumers in general to carry out advanced investigations by taking advantage of the speed and flexibility of the BigQuery platform.

As with other BigQuery datasets, end users can take advantage of BigQuery's free tier, which covers 1 TB of query processing per month.
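To get a feel for how far the free tier goes, here is a minimal sketch that converts the bytes scanned by a query into a share of the monthly allowance. The function names are hypothetical, and treating 1 TB as 2**40 bytes is an assumption made for illustration, so check the current BigQuery pricing page for the exact quota. The byte count itself would typically come from a BigQuery dry run.

```python
# Assumption: the monthly free allowance is 1 TB, interpreted here as 2**40 bytes.
FREE_TIER_BYTES = 2**40


def free_tier_share(bytes_processed: int, allowance: int = FREE_TIER_BYTES) -> float:
    """Return the fraction of the monthly free allowance used by one query."""
    if bytes_processed < 0:
        raise ValueError("bytes_processed must be non-negative")
    return bytes_processed / allowance


def queries_per_month(bytes_processed: int, allowance: int = FREE_TIER_BYTES) -> int:
    """Return how many identical queries fit within the monthly free allowance."""
    if bytes_processed <= 0:
        raise ValueError("bytes_processed must be positive")
    return allowance // bytes_processed


if __name__ == "__main__":
    scanned = 5 * 2**30  # hypothetical query that scans 5 GiB
    print(f"{free_tier_share(scanned):.2%} of the free tier per query")
    print(f"{queries_per_month(scanned)} such queries fit in one month")
```

Selecting only the columns you need is the simplest way to keep the scanned byte count low, since BigQuery bills on the columns a query reads.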

ORCID dataset

Release date: Feb 2025
Update frequency: yearly
Documentation: ORCID official schema
License: CC0

This dataset is a SQL representation of the ORCID public data file.

ORCID, which stands for Open Researcher and Contributor ID, is a free, unique, persistent identifier (PID) for individuals to use as they engage in research, scholarship, and innovation activities.
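When working with iDs taken from the dataset or from other sources, it can help to validate them first. The sketch below checks the final character of an iD against the ISO 7064 MOD 11-2 checksum that ORCID documents for its identifiers. The function names are hypothetical, and the sample iD is the one ORCID uses for its fictitious researcher Josiah Carberry.

```python
import re

# Four groups of four characters; only the last character may be "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has the expected layout and a matching checksum."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))
```

A check like this only catches typos and malformed strings; it does not confirm that the iD is registered with ORCID.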

An ORCID record may contain information about a researcher’s work, affiliations, funding, peer review, and more. Items on ORCID records can be broken down into assertions that connect the ORCID iD-holder with an activity or affiliation. These assertions can be added to an ORCID record by the researcher who owns the record, or by systems the researcher has granted permission to do so.

For more information, see also the ORCID FAQs.

COVID-19 dataset

Release date: Jan 2021
Update frequency: daily
License: CC-BY-NC

The COVID-19 dataset contains all published articles and preprints, grants, clinical trials, and research datasets from Dimensions.ai that are related to COVID-19.

The dataset can be used as a sandbox environment for the larger Dimensions.ai dataset, as well as for COVID-19 research.
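As a sandbox exercise, the sketch below builds a query that counts COVID-19 publications per year and, optionally, runs it with the google-cloud-bigquery Python client. The table path `covid-19-dimensions-ai.data.publications` and the `year` column are assumptions not stated on this page, so verify them against the dataset schema in the BigQuery console. The helper names are hypothetical, and running the query requires Google Cloud credentials and a project.

```python
# Assumption: table path and column name; verify both in the BigQuery console.
DEFAULT_TABLE = "covid-19-dimensions-ai.data.publications"


def build_yearly_count_query(table: str = DEFAULT_TABLE, year_column: str = "year") -> str:
    """Build a SQL statement that counts rows per publication year."""
    return (
        f"SELECT {year_column} AS pub_year, COUNT(*) AS publications\n"
        f"FROM `{table}`\n"
        "GROUP BY pub_year\n"
        "ORDER BY pub_year"
    )


def run_query(sql: str) -> list:
    """Run the query and return the result rows as a list."""
    # Imported lazily so the query builder works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # relies on application default credentials
    return list(client.query(sql).result())


if __name__ == "__main__":
    # Printing the SQL needs no credentials; call run_query(...) to execute it.
    print(build_yearly_count_query())
```

Because the query reads a single column, it scans little data, which makes it a reasonable first test before moving on to the larger Dimensions.ai dataset.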

The dataset grows every day. You can see the latest data in our interactive COVID-19 dashboard, which we built using this dataset and Google’s free Data Studio tool.