Data Source Tables

Within the BigQuery datasets offered from the Dimensions BigQuery project, each table holds a specific entity type. These data sources are independent research outputs that directly map to the data types available from Dimensions (such as publications or datasets). These tables closely mirror the data available from the Dimensions web application and from the Dimensions DSL API.

Each table can hold further references to entities contained within other tables (such as publications to grants) or even references within the same table (e.g. citations/references for publications).

The primary data source tables available (depending on your subscription type) are detailed below:

Update frequency

The frequency of data source updates is detailed below:

| Source | Update Frequency |
| --- | --- |
| Publications | Daily incremental updates. New base set releases 2-4 times per year. |
| Grants | Monthly, full base set releases. |
| Clinical Trials | Daily incremental updates. New base set releases 2-4 times per year. |
| Datasets | Daily incremental updates. New base set releases 1-2 times per year. |
| Patents | Weekly incremental updates. New base set releases 1-2 times per year. |
| Reports | Daily incremental updates. |
| Researchers | Daily incremental updates. |
| Organizations (GRID) | As necessary with releases of new versions of GRID. |

Table partitions

The data source tables include details about which fields are used for partitioning and clustering the Dimensions dataset. This information is useful for writing more efficient queries.

With large sets of data in BigQuery tables, partitioning the data can increase the efficiency and reduce the cost of queries. A specific field can be nominated for a table, and the table's data is then grouped and partitioned on that field. Queries that filter on this partitioning field can cost drastically less, because BigQuery only needs to read the partitions that match the filter.

Taking the publications data source table as an example, we partition on the year field. When performing a query, if we apply a filter that restricts the year of publication to the range 2010-2015, the cost of the query is reduced by approximately 70%. This is because BigQuery only needs to access the data in the partitions for 2010, 2011, 2012, 2013, 2014 and 2015, rather than reading the data across the entire table (i.e. all years).
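The year-range filter described above can be sketched in Python. This is a minimal illustration, not an official client: the function names are hypothetical, the table identifier `dimensions-ai.data_analytics.publications` and the `id` field are assumptions about your environment, and the optional dry-run helper assumes the `google-cloud-bigquery` package is installed and credentials are configured. A dry run asks BigQuery how many bytes a query would read without executing it, which is one way to compare a filtered query against an unfiltered one.

```python
def build_year_filtered_query(table: str, start_year: int, end_year: int) -> str:
    """Build a query that filters on the `year` partitioning field.

    `table` is a fully qualified table name. The selected `id` field is an
    assumption used for illustration only.
    """
    if start_year > end_year:
        raise ValueError("start_year must not be greater than end_year")
    return (
        "SELECT id, year\n"
        f"FROM `{table}`\n"
        f"WHERE year BETWEEN {start_year} AND {end_year}"
    )


def estimate_bytes_processed(sql: str) -> int:
    """Return the bytes BigQuery reports it would scan, using a dry run.

    Assumes google-cloud-bigquery is installed and default credentials exist.
    The import is local so the query builder works without the package.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


if __name__ == "__main__":
    # Assumed table name; replace with the table available to your subscription.
    sql = build_year_filtered_query(
        "dimensions-ai.data_analytics.publications", 2010, 2015
    )
    print(sql)
```

Calling `estimate_bytes_processed` on this query and on the same query without the `WHERE` clause should show the difference in data scanned. The actual saving depends on how the data is distributed across years.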

The documentation we provide for each data source lists all of the available fields and the type of data each field contains. In the same listing we mark the field that is the primary partitioning key and indicate which fields are used for clustering.
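The same partitioning and clustering details can, in principle, also be read from BigQuery's own metadata. The sketch below builds a query against the standard `INFORMATION_SCHEMA.COLUMNS` view, which exposes the `is_partitioning_column` and `clustering_ordinal_position` columns. The function name is hypothetical, and whether your account is permitted to query metadata for the Dimensions datasets is an assumption; the field listings in the documentation remain the reference.

```python
def build_partition_metadata_query(table_id: str) -> str:
    """Build a query listing the partitioning and clustering columns of a table.

    `table_id` must have the form "project.dataset.table".
    """
    parts = table_id.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError("table_id must look like 'project.dataset.table'")
    project, dataset, table = parts
    return (
        "SELECT column_name, data_type, is_partitioning_column, "
        "clustering_ordinal_position\n"
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.COLUMNS`\n"
        f"WHERE table_name = '{table}'\n"
        "  AND (is_partitioning_column = 'YES' "
        "OR clustering_ordinal_position IS NOT NULL)\n"
        "ORDER BY clustering_ordinal_position"
    )


if __name__ == "__main__":
    # Assumed table identifier, used for illustration only.
    print(build_partition_metadata_query("dimensions-ai.data_analytics.publications"))
```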

Google's BigQuery documentation provides an excellent introduction to the topic: Introduction to partitioned tables.

See also

If you are using Google BI Engine, refer to the FAQ entry Partitioning and Google BI Engine.