Release Notes and Deprecations

Versions History

Release — 2023/12

  1. New field: datasets.categories.rcdc_v2023 (structure).
    Description: Versioned category Research, Condition, and Disease Categorization (model version 2023).
  2. Removed field: datasets.categories.rcdc_v2020.

Release — 2023/11

  1. New field: patents.application_reference_id (string).
    Description: The identifier of the referenced patent application.
  2. New field: patents.document_category (string).
    Description: Present for US only. Categories are derived from kind codes and include: Defensive, Design, Plant, Reissue, Statutory, Utility.
  3. New field: patents.inventors (array).
    Description: List of details of the people who invented the patent.
  4. New field: patents.categories.rcdc_v2023 (structure).
    Description: Versioned category Research, Condition, and Disease Categorization (model version 2023).
  5. Modification: auxiliary structure Assignee extended with new fields.
    Description: Original and current assignee structures extended to include: city, country_code, state and street (where available).
  6. Removed field: patents.categories.rcdc_v2020.
  7. New field: publications.pubmed.keywords (array).
    Description: Keywords for the PubMed publication, if present.
  8. New field: publications.source (structure).
    Description: The source title this publication belongs to; this may be a journal, a book series or a preprint platform.

Release — 2023/10

  1. New field: publications.categories.rcdc_v2023 (structure).
    Description: Versioned category Research, Condition, and Disease Categorization (model version 2023).
  2. New field: publications.categories.uoa_v2023 (structure).
    Description: Versioned category Units of Assessment (model version 2023).
  3. New field: publications.pubmed (structure).
    Description: PubMed-related details of a publication (MeSH terms, publication types, etc.).
    Note: this new structure includes mesh and publication_types.
  4. New field: publications.pubmed.publication_types (array).
    Description: PubMed publication types, if known.
  5. New field: publications.pubmed.mesh.ui (array).
    Description: The MeSH unique identifiers, if known, including unique identifiers of subheadings.
  6. Deprecated field: publications.mesh_terms (please use publications.pubmed.mesh.terms).
  7. Deprecated field: publications.mesh_headings (please use publications.pubmed.mesh.headings).
  8. Removed field: publications.categories.rcdc_v2020.
  9. Removed field: publications.categories.for_v2020.
  10. Removed field: publications.document_type.versions.
  11. Process change: the field publications.journal is now populated with details from the associated source title regardless of the publication’s output type.
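
The MeSH deprecation in items 6 and 7 can be handled with a small fallback while existing code is migrated. The sketch below assumes a publication row has already been loaded into a nested Python dictionary; that record shape and the helper name read_mesh_terms are illustrative assumptions, not part of the schema documentation.

```python
from typing import Any, Dict, List


def read_mesh_terms(record: Dict[str, Any]) -> List[str]:
    """Prefer publications.pubmed.mesh.terms, fall back to the deprecated mesh_terms."""
    # Assumed nested layout: record["pubmed"]["mesh"]["terms"]
    pubmed = record.get("pubmed") or {}
    mesh = pubmed.get("mesh") or {}
    terms = mesh.get("terms")
    if terms is not None:
        return list(terms)
    # Deprecated top-level field, kept only as a fallback
    return list(record.get("mesh_terms") or [])


# Hypothetical records, used for illustration only
new_style = {"pubmed": {"mesh": {"terms": ["Humans", "Neoplasms"]}}}
old_style = {"mesh_terms": ["Humans"]}
```

Reading the new path first means the fallback can be deleted without other changes once the deprecated field is no longer needed.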

Release — 2023/09

  1. Removed field: reports.categories.rcdc_v2020.
  2. New field: reports.categories.rcdc_v2023 (structure).
    Description: Versioned category Research, Condition, and Disease Categorization (model version 2023).
  3. New field: reports.concepts (array).
    Description: Concepts describing the main topics of the report (note: automatically derived from the text using machine learning).

Release — 2023/08

  1. New field: policy_documents.categories.sdg_v2021 (structure).
    Description: Versioned category SDG - Sustainable Development Goals (model version 2021).

Release — 2023/07

  1. New field: grants.activity_code (string).
    Description: The NIH activity code for the grant, as provided by the source.
  2. New field: policy_documents.categories.rcdc_v2023 (structure).
    Description: Versioned category Research, Condition, and Disease Categorization (model version 2023).
  3. Removed field: policy_documents.categories.rcdc_v2020.
  4. Deprecated field: clinical_trials.category_uoa.
  5. Deprecated field: clinical_trials.categories.uoa_v1.
  6. Deprecated field: datasets.category_uoa.
  7. Deprecated field: datasets.categories.uoa_v1.
  8. Deprecated field: patents.category_uoa.
  9. Deprecated field: patents.categories.uoa_v1.
  10. Deprecated field: policy_documents.category_uoa.
  11. Deprecated field: policy_documents.categories.uoa_v1.

Release — 2023/06

  1. New field: patents.figures_amount (integer).
    Description: The number of figures provided with a patent.
  2. New field: clinical_trials.primary_completion_date (string).
    Description: Primary completion date provided by the trial, as year-month or just year.
  3. New field: clinical_trials.primary_completion_year (integer).
    Description: Primary completion year provided by trial.
  4. New field: clinical_trials.categories.rcdc_v2023 (structure).
    Description: Versioned category Research, Condition, and Disease Categorization (model version 2023).
  5. Deprecated field: clinical_trials.categories.rcdc_v2020.

Release — 2023/05

  1. New field: patents.claims_amount (integer).
    Description: The number of claims associated with a patent.

Release — 2023/04

  1. New field: grants.project_numbers (array).
    Description: Grant identifiers, as provided by the source (e.g., funder, aggregator) the grant was derived from.
  2. Deprecated field: grants.grant_number with replacement field grants.project_numbers.
  3. Removed field: grants.categories.rcdc_v2020.

Release — 2023/03

  1. New source: org_groups
    Description: The organisation groups data source is a curated set of research and funder groups (such as NSF - National Science Foundation, US Federal Funders).

Release — 2023/01

  1. New field: clinical_trials.secondary_ids (array).
    Description: Identifiers assigned to the study other than the parent registry identification number, such as grant or contract numbers, or an identification number assigned by another registry.
  2. New field: clinical_trials.organisation_details.type (string).
    Description: The type of organisation associated with the clinical trial (sponsor, collaborator, facility or affiliated).
  3. Modified field: patents.orange_book (array).
    Note: The structure of this field has changed to better reflect the layout of the source data.

Release — 2022/10

  1. New field: datasets.categories.sdg_v2021 (structure).
    Description: Versioned category SDG - Sustainable Development Goals (model version 2021).
  2. New field: datasets.concepts (array).
    Description: Concepts describing the main topics of a dataset (note: automatically derived from the dataset text using machine learning).
  3. New field: grants.categories.sdg_v2021 (structure).
    Description: Versioned category SDG - Sustainable Development Goals (model version 2021).
  4. New field: patents.concepts (array).
    Description: Concepts describing the main topics of a patent (note: automatically derived from the patent text using machine learning).
  5. New field: patents.orange_book (structure).
    Description: Data added from the FDA Orange Book Data Files.
  6. New field: publications.categories.sdg_v2021 (structure).
    Description: Versioned category SDG - Sustainable Development Goals (model version 2021).
  7. New field: reports.categories.sdg_v2021 (structure).
    Description: Versioned category SDG - Sustainable Development Goals (model version 2021).
  8. New field: *.categories.for_2020_v2022 (structure).
    Description: Versioned category ANZSRC Field of Research 2020 (model version 2022).

Note

The versioned category fields categories.bra_v1, categories.for_v1, categories.for_v2020, categories.hrcs_hc_v1, categories.hrcs_rac_v1, categories.icrp_cso_v1, categories.icrp_ct_v1 and categories.sdg_v1 have been deprecated in this release for all primary data source types (clinical trials, datasets, grants, patents, policy documents, publications and reports). The replacement versioned categories are categories.bra_v2020, categories.for_2020_v2022, categories.hrcs_hc_v2020, categories.hrcs_rac_v2020, categories.icrp_cso_v2020, categories.icrp_ct_v2020 and categories.sdg_v2021. Please see the documentation for the individual source type for details. The deprecated fields remain in the data schema, but we recommend changing any existing usage to the corresponding replacement field.

Usage of the top-level category fields such as category_for or category_uoa is recommended unless a specific versioned category model is required.

Release — 2022/08

  1. New field: clinical_trials.study_type (string).
    Description: The type of study associated with the clinical trial, e.g. ‘Interventional’, ‘Observational’, etc.
  2. New field: clinical_trials.study_arms (array).
    Description: The groups or subgroups of participants that receive a specific set of interventions/treatments, or no intervention, according to the trial’s protocol.
  3. New field: clinical_trials.study_designs (array).
    Description: Details regarding the design of a clinical trial.
  4. New field: clinical_trials.study_outcome_measures (array).
    Description: List of outcome measures associated with the clinical trial.
  5. New field: clinical_trials.study_participants (integer).
    Description: The actual or estimated target number of participants.
  6. New field: clinical_trials.study_eligibility_criteria (string).
    Description: Criteria for participation in the trial.
  7. New field: clinical_trials.study_minimum_age (string).
    Description: Minimum age eligible for the study.
  8. New field: clinical_trials.study_maximum_age (string).
    Description: Maximum age eligible for the study.
  9. New field: publications.copyright_statement (string).
    Description: The copyright status as found in the source document.
  10. New field: publications.document_type (structure).
    Description: The automated document type classification.
  11. New field: publications.funding_section (structure).
    Description: The funding section text as found in the source document.
  12. New field: publications.repository_dois (array).
    Description: Identified DOIs based on DOI prefixes from Figshare and DataCite.
  13. New field: *.categories.bra_v2020 (structure).
    Description: Versioned category BRA (model version 2020).
  14. New field: *.categories.for_v2020 (structure).
    Description: Versioned category ANZSRC Field of Research 2008 (model version 2020).
  15. New field: *.categories.hrcs_hc_v2020 (structure).
    Description: Versioned category Health Research Classification System (HRCS) – Health Categories (model version 2020).
  16. New field: *.categories.hrcs_rac_v2020 (structure).
    Description: Versioned category Health Research Classification System (HRCS) – Research Activity Codes (model version 2020).
  17. New field: *.categories.icrp_cso_v2020 (structure).
    Description: Versioned category International Cancer Research Partnership – Common Scientific Outline (CSO) (model version 2020).
  18. New field: *.categories.icrp_ct_v2020 (structure).
    Description: Versioned category International Cancer Research Partnership – Cancer Types (CT) (model version 2020).
  19. New field: *.categories.rcdc_v2020 (structure).
    Description: Versioned category Research, Condition, and Disease Categorization (RCDC) System (model version 2020).

Note

The v2020 categories detailed above (items 13 through 19) apply to all primary data source types (clinical trials, datasets, grants, patents, policy documents, publications and reports). Top-level categories such as category_for contain the current model version, matching the version used within the DSL API and the Dimensions web application.

Release — 2022/03

  1. New source: source_titles
    Description: The Source Titles table is a curated database of publication ‘containers’, for example journals, preprint servers, book series and others.
  2. New field: publications.source_id (string).
    Description: The source title for the publication.
  3. New field: publications.authors.orcid (array).
    Description: ORCID identifiers identified for the author.
  4. New field: grants.funding_cny (float).
    Description: Funding amount awarded in CNY.

Release — 2021/07

  1. New source: policy_documents
    Description: The policy document data includes sources that are designed to change or otherwise influence guidelines, policy or practice. Tracked policy sources and document types range from government guidelines, reports or white papers; independent policy institute publications; advisory committees on specific topics; research institutes; and international development organisations.
  2. New field: clinical_trials.concepts (array).
    Description: Concepts describing the main topics of a clinical trial (note: automatically derived from the clinical trial text using machine learning).
  3. New field: clinical_trials.date_inserted (timestamp).
    Description: Date when the record was inserted into Dimensions.
  4. New field: clinical_trials.interventions (array).
    Description: Information about the clinical trial’s interventions according to the research plan or protocol created by the investigators.
  5. New field: clinical_trials.mesh_terms (array).
    Description: Medical Subject Heading terms as used in PubMed.
  6. New field: grants.concepts (array).
    Description: Concepts describing the main topics of a grant (note: automatically derived from the grant text using machine learning).
  7. New field: grants.funder_org_acronyms (array).
    Description: List of acronyms for the funding organisation.
  8. New field: grants.language (string).
    Description: Grant original language, as ISO 639-1 language codes.
  9. New field: grants.original_abstract (structure).
    Description: Abstract of the grant in its original language.
  10. New field: grants.original_title (structure).
    Description: Title of the grant in its original language.

Release — 2021/06

  1. New field: publications.arxiv_id (string).
    Description: The publication’s arXiv identifier (e.g. ‘arXiv:2102.02477’).
  2. New field: publications.parent_id (string).
    Description: For supported publication types, the Dimensions publication identifier of the record considered to be the parent item of this publication (e.g. the parent book of a book chapter).
  3. New field: datasets.license.name (string).
    Description: The dataset licence name, e.g. ‘CC BY 4.0’.
  4. New field: datasets.license.url (string).
    Description: The dataset licence URL, e.g. ‘https://creativecommons.org/licenses/by/4.0/’.

Release — 2021/05

  1. New field: grants.funder_org_cities (array).
    Description: List of cities linked to the organisation funding the grant, expressed as GeoNames codes.
  2. New field: grants.funder_org_countries (array).
    Description: List of countries linked to the organisation funding the grant, expressed as GeoNames codes.
  3. New field: grants.funder_org_state_codes (array).
    Description: List of state codes linked to the organisation funding the grant, expressed as GeoNames codes (ISO 3166-2).
  4. New field: grants.research_org_cities (array).
    Description: List of cities of the research organisations receiving the grant, expressed as GeoNames identifiers.
  5. New field: grants.research_org_countries (array).
    Description: List of countries of the research organisations receiving the grant, expressed as GeoNames codes.
  6. New field: grants.research_org_state_codes (array).
    Description: List of state codes of the organisations receiving the grant, expressed as GeoNames codes (ISO 3166-2).
  7. New field: datasets.repository_name (string).
    Description: The name of the repository of the dataset.

Release — 2021/04

  1. New source: reports
    Description: A technical report is a document that describes the process, progress, or results of technical or scientific research or the state of a technical or scientific research problem. It might also include recommendations and conclusions of the research.
  2. New field: patents.assignee_countries (array).
    Description: List of countries of the assignees of the patent, expressed as GeoNames codes (note: this is the combination of the current_assignee_countries and original_assignee_countries fields).
  3. New field: patents.current_assignee_countries (array).
    Description: List of countries of the assignees who currently own the rights of the patent, expressed as GeoNames codes (note: this value is extracted independently from GRID).
  4. New field: patents.original_assignee_countries (array).
    Description: List of countries of the assignees who have owned the rights of the patent, expressed as GeoNames codes (note: this value is extracted independently from GRID).
  5. New field: patents.current_assignee.country.
    Description: Country of the assignee (note: this value can be extracted independently from GRID).
  6. New field: patents.original_assignee.country.
    Description: Country of the assignee (note: this value can be extracted independently from GRID).
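
Item 2 describes patents.assignee_countries as the combination of the current and original assignee country lists. A minimal sketch of one way to reproduce such a combination is shown below; the release note does not say whether duplicates are removed or how values are ordered, so the order-preserving de-duplication is an assumption, as are the function name and the placeholder codes.

```python
from typing import Iterable, List


def combine_assignee_countries(current: Iterable[str], original: Iterable[str]) -> List[str]:
    """Merge two country-code lists, keeping first-seen order and dropping repeats."""
    seen = set()
    combined: List[str] = []
    for code in [*current, *original]:
        if code not in seen:
            seen.add(code)
            combined.append(code)
    return combined


# Placeholder codes for illustration only; real values are GeoNames codes
example = combine_assignee_countries(["AA", "BB"], ["BB", "CC"])
```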

Release — 2021/03

  1. New field: publications.open_access_categories_v2.
    Description: Implementing recategorization of Open Access across Dimensions services. This is to align more closely with Unpaywall. Details here.
    Note: the old field publications.open_access_categories is being deprecated and should no longer be used.
    Details of deprecated publication fields can be found here.
  2. Deprecated field: publications.open_access_categories.
  3. New field: publications.subtitles (array).
    Description: Publication subtitle, when available.
  4. New field: publications.date_online.
    Description: The publication online date.
  5. New field: publications.date_print.
    Description: The publication print date.
  6. New field: publications.date_normal (date).
    Description: The publication date for the version of record of a document, normalised for partial dates which do not include a month/day value (e.g. ‘2020’ converts to ‘2020-01-01’).
  7. Feature: Documentation – schema descriptions can now be found within the BigQuery Web UI (see publications as an example), as well as in our documentation.
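
The normalisation described for publications.date_normal in item 6 can be illustrated with a short sketch. It assumes partial dates arrive as 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' strings and that a missing month or day defaults to 01, in line with the ‘2020’ example; the function name is hypothetical and this is not the pipeline's actual implementation.

```python
from datetime import date


def normalise_partial_date(value: str) -> date:
    """Fill a missing month and/or day with 1 and return a full date."""
    parts = [int(p) for p in value.strip().split("-")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported date format: {value!r}")
    # Pad [year] or [year, month] up to [year, month, day]
    year, month, day = parts + [1] * (3 - len(parts))
    return date(year, month, day)


year_only = normalise_partial_date("2020").isoformat()
year_month = normalise_partial_date("2020-07").isoformat()
```

Returning a datetime.date rather than a string means that invalid values such as month 13 raise a ValueError instead of passing through silently.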

Release — 2020/09

  1. Initial release

Deprecations History

Release — 2023/04

  1. Deprecated field: grants.grant_number with replacement field grants.project_numbers.

Release — 2022/11

  1. Deprecated field: policy_documents.category_sdg with no replacement.
  2. Deprecated field: policy_documents.categories.sdg_v1 with no replacement.

Release — 2022/10

  1. Deprecated field: *.categories.bra_v1 with replacement field *.categories.bra_v2020.
  2. Deprecated field: *.categories.for_v1 with replacement field *.categories.for_2020_v2022.
  3. Deprecated field: *.categories.for_v2020 with replacement field *.categories.for_2020_v2022.
  4. Deprecated field: *.categories.hrcs_hc_v1 with replacement field *.categories.hrcs_hc_v2020.
  5. Deprecated field: *.categories.hrcs_rac_v1 with replacement field *.categories.hrcs_rac_v2020.
  6. Deprecated field: *.categories.icrp_cso_v1 with replacement field *.categories.icrp_cso_v2020.
  7. Deprecated field: *.categories.icrp_ct_v1 with replacement field *.categories.icrp_ct_v2020.
  8. Deprecated field: *.categories.sdg_v1 with replacement field *.categories.sdg_v2021.
  9. Deprecated field: patents.category_sdg with no replacement.
  10. Deprecated field: patents.categories.sdg_v1 with no replacement.
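
When updating existing queries, the 2022/10 list above can be captured as a lookup table. The sketch below copies the field names from that list; the helper name replacement_for and the convention of returning None for fields with no replacement are illustrative assumptions.

```python
from typing import Optional

# Deprecated -> replacement category paths (items 1 through 8 of the list above)
REPLACEMENTS = {
    "categories.bra_v1": "categories.bra_v2020",
    "categories.for_v1": "categories.for_2020_v2022",
    "categories.for_v2020": "categories.for_2020_v2022",
    "categories.hrcs_hc_v1": "categories.hrcs_hc_v2020",
    "categories.hrcs_rac_v1": "categories.hrcs_rac_v2020",
    "categories.icrp_cso_v1": "categories.icrp_cso_v2020",
    "categories.icrp_ct_v1": "categories.icrp_ct_v2020",
    "categories.sdg_v1": "categories.sdg_v2021",
}

# Source-specific fields listed with no replacement (items 9 and 10)
NO_REPLACEMENT = {"patents.category_sdg", "patents.categories.sdg_v1"}


def replacement_for(field: str) -> Optional[str]:
    """Map a fully qualified field such as 'grants.categories.for_v1'.

    Returns the replacement field, None when the list gives no replacement,
    or the input unchanged when the field is not deprecated in this list.
    """
    if field in NO_REPLACEMENT:
        return None
    source, _, path = field.partition(".")
    if path in REPLACEMENTS:
        return f"{source}.{REPLACEMENTS[path]}"
    return field
```

For example, replacement_for("publications.categories.for_v1") yields "publications.categories.for_2020_v2022", while replacement_for("patents.categories.sdg_v1") yields None.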

Release — 2021/03

  1. Deprecated field: publications.open_access_categories with replacement field publications.open_access_categories_v2.