Data Sources

This section describes each source currently supported by the DSL (version 2.10.0) in its default configuration. For each source, it lists the available fields with their value types, noting whether a field may be used as a facet and whether it is a multi-value field. It also lists the fields included in each fieldset and the aggregation indicators available for each source.

There are eight primary content types that can be queried via the API:

Two additional sources consist of curated lists of disambiguated researchers and organizations. These objects are referenced in all of the content types above.

The figure below provides an overview of the main document sources and their respective cross links.

[Figure: cross source links - document sources]

The figure below provides an overview of the main relationships between document sources and researchers / organizations.

[Figure: cross source links - documents and researchers-organizations]

Finally, a number of auxiliary entities exist. They cannot be queried directly but are available as structured fields or facets attached to the primary sources.

Auxiliary Entities

Auxiliary entities are subsidiary objects that cannot be queried directly, but are available as structured data attached to primary sources via fields or facets.

States

States Literal Fields

| Field | Literal Type | Multi? | Filter? Entity Filter? | Description |
| --- | --- | --- | --- | --- |
| name | string | | | GeoNames state name (ISO-3166-2). |
| id | string | | | GeoNames state code (ISO-3166-2). E.g., ‘US.CA’ for geonames:5332921. |

States Fieldsets

| Fieldset | Fields |
| --- | --- |
| basics | id, name |

Repositories

Repositories Literal Fields

| Field | Literal Type | Multi? | Filter? Entity Filter? | Description |
| --- | --- | --- | --- | --- |
| name | string | | | The name of the repository of the dataset. |
| id | string | | | Dimensions ID of the repository. |

Repositories Fieldsets

| Fieldset | Fields |
| --- | --- |
| basics | id, name |

Open Access

Open Access Literal Fields

| Field | Literal Type | Multi? | Filter? Entity Filter? | Description |
| --- | --- | --- | --- | --- |
| name | string | | | Name of the open access category. E.g., ‘Closed’ or ‘Pure Gold’. |
| id | string | | | Dimensions ID of the open access category. E.g., one of ‘closed’, ‘oa_all’, ‘gold_bronze’, ‘gold_pure’, ‘green_sub’, ‘gold_hybrid’, ‘green_pub’, ‘green_acc’. See also the Publications field open_access. |
| description | string | | | Description of the open access category. |

Open Access Fieldsets

| Fieldset | Fields |
| --- | --- |
| basics | description, id, name |

Journals

Journals Literal Fields

| Field | Literal Type | Multi? | Filter? Entity Filter? | Description |
| --- | --- | --- | --- | --- |
| title | string | | | Title of a journal publication. E.g., ‘Nature’ or ‘The Lancet’. |
| id | string | | | Dimensions ID of a journal. E.g., jour.1016355 or jour.1077219. |

Journals Fieldsets

| Fieldset | Fields |
| --- | --- |
| basics | id, title |

Countries

Countries Literal Fields

| Field | Literal Type | Multi? | Filter? Entity Filter? | Description |
| --- | --- | --- | --- | --- |
| name | string | | | GeoNames country name. |
| id | string | | | GeoNames country code. E.g., ‘US’ for geonames:6252001. |

Countries Fieldsets

| Fieldset | Fields |
| --- | --- |
| basics | id, name |

Cities

Cities Literal Fields

| Field | Literal Type | Multi? | Filter? Entity Filter? | Description |
| --- | --- | --- | --- | --- |
| name | string | | | GeoNames city name. |
| id | string | | | GeoNames city ID. E.g., ‘5391811’ for geonames:5391811. |

Cities Fieldsets

| Fieldset | Fields |
| --- | --- |
| basics | id, name |

Categories

Categories Literal Fields

| Field | Literal Type | Multi? | Filter? Entity Filter? | Description |
| --- | --- | --- | --- | --- |
| name | string | | | Name of the category. Note: this may include an identifier from the original source. E.g., ‘2.1 Biological and endogenous factors’ (HRCS_RAC codes) or ‘1103 Clinical Sciences’ (FOR codes). |
| id | string | | | Dimensions ID of the category. |
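Because a category name may carry an identifier from the original source, client code sometimes needs to separate the two. The following is a minimal sketch, not part of the DSL: the helper name split_category_name is hypothetical, and it assumes the identifier, when present, is a leading run of digits (optionally dot-separated) followed by whitespace, as in the two examples given for the name field. Other classification schemes may not follow this pattern.

```python
import re
from typing import Optional, Tuple

# Assumption: the source identifier is a leading numeric token such as
# "1103" or "2.1", separated from the label by whitespace.
_CODE_PREFIX = re.compile(r"^(\d+(?:\.\d+)*)\s+(.+)$")


def split_category_name(name: str) -> Tuple[Optional[str], str]:
    """Split a category name into (source_code, label).

    Returns (None, name) when no leading numeric code is found.
    """
    cleaned = name.strip()
    match = _CODE_PREFIX.match(cleaned)
    if match:
        return match.group(1), match.group(2)
    return None, cleaned


if __name__ == "__main__":
    print(split_category_name("1103 Clinical Sciences"))
    print(split_category_name("2.1 Biological and endogenous factors"))
```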

Categories Fieldsets

| Fieldset | Fields |
| --- | --- |
| basics | id, name |

Literal Field Types

This section provides information about the most basic field types currently supported by the DSL (version 2.10.0).

| Field type | Description |
| --- | --- |
| date | String that represents a precise date in "YYYY-MM-DD" format. If MM or DD is omitted, it is automatically treated as 01. |
| float | Floating-point number. |
| integer | Whole number. |
| string | Short piece of text, a few characters or words long, e.g. a journal issue. |
| json | Nested unstructured data. |
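The defaulting rule for the date type can be illustrated client-side. The sketch below is an assumption about how a caller might mirror that rule, not the DSL's own implementation; the function name normalize_dsl_date is hypothetical. It fills a missing month or day with 01 and returns a standard-library date object.

```python
from datetime import date


def normalize_dsl_date(value: str) -> date:
    """Expand "YYYY", "YYYY-MM" or "YYYY-MM-DD" into a full date.

    A missing month or day is treated as 01, following the rule
    described for the date field type.
    """
    parts = value.strip().split("-")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a YYYY[-MM[-DD]] date: {value!r}")
    year = int(parts[0])
    month = int(parts[1]) if len(parts) > 1 else 1
    day = int(parts[2]) if len(parts) > 2 else 1
    # date() itself rejects out-of-range months and days.
    return date(year, month, day)


if __name__ == "__main__":
    for raw in ("2015", "2015-06", "2015-06-30"):
        print(raw, "->", normalize_dsl_date(raw).isoformat())
```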

Metadata API

The DSL exposes metadata programmatically, such as the supported sources and entities, along with their fields, facets, fieldsets, metrics and search fields.

This metadata is available via the describe command, which can be used in the following ways:

  • describe - returns a list of all sources and entities

  • describe version - returns information about the DSL version and release

  • describe source <source_name> - returns information about fields, fieldsets, metrics and search_fields of a source

  • describe entity <entity_name> - returns information about fields and fieldsets of an entity

  • describe schema - returns information about all sources and all entities at once

  • describe service <service_name> - returns information about a service. Supported services: grid

For example, the following query will retrieve metadata about all the fields relevant to the publications source.

describe source publications

The metadata is returned as JSON.
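The command forms listed above can be assembled with a small helper before being sent to the API. This is a minimal sketch under stated assumptions: the function build_describe is hypothetical and only builds the query string. How the query is submitted (endpoint, authentication, client library) is not covered here and is left to the caller.

```python
from typing import Optional

# Forms of the describe command listed above.
# True means the form takes a name argument (source, entity or service name).
_DESCRIBE_FORMS = {
    None: False,
    "version": False,
    "schema": False,
    "source": True,
    "entity": True,
    "service": True,
}


def build_describe(kind: Optional[str] = None, name: Optional[str] = None) -> str:
    """Return a describe query string for the given form.

    This only builds text; it does not contact any API.
    """
    if kind not in _DESCRIBE_FORMS:
        raise ValueError(f"unsupported describe form: {kind!r}")
    needs_name = _DESCRIBE_FORMS[kind]
    if needs_name and not name:
        raise ValueError(f"'describe {kind}' requires a name")
    if not needs_name and name:
        raise ValueError("this describe form does not take a name")
    if kind is None:
        return "describe"
    return f"describe {kind} {name}" if needs_name else f"describe {kind}"


if __name__ == "__main__":
    print(build_describe())
    print(build_describe("version"))
    print(build_describe("source", "publications"))
    print(build_describe("service", "grid"))
```

Validating the form up front keeps malformed commands, such as describe source with no source name, from reaching the API.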