Frequently Asked Questions

This section is organized into three parts: API access and support, queries and errors, and the data model. Please contact us if you cannot find an answer to your problem.

1. API Access & Support

What is my <your-url>.dimensions.ai?

Unless explicitly specified, please use https://app.dimensions.ai.

Can I access the DSL API using my ReadCube account?

Unfortunately no, this is currently not possible.

How can I get support?

Please send an email to supportapi@dimensions.ai, providing as much information about your problem as possible, in particular:

  • What you are trying to do and what problem you are facing.

  • The exact error message, including the full response body and headers.

  • Steps describing how we can reproduce the problem.

  • Any additional details that might be relevant for reproducing the issue, such as network proxies or firewalls.

Where can I find more examples?

You can explore real-world applications of the API at the Dimensions API Lab, an open-source repository of Jupyter notebooks demonstrating how to carry out common scholarly analytics tasks, e.g. building a citation network, analysing a journal's competitors, or tracking researchers' affiliations over time.

2. Queries and Errors

What are the API error codes?

HTTP Status Code    Explanation
400                 Semantic/Query Error: the DSL query is not valid.
401                 Authentication Failure / Token Expiration: the token has expired or is invalid.
500                 Evaluation/Data/Timeout Error: the query could not be evaluated.
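As an illustration, here is a minimal Python sketch of how a client might react to these status codes. The /dsl.json endpoint is mentioned later in this FAQ; the JWT-style Authorization header is an assumption based on the usual token authentication flow, so adapt it to your setup.

    import requests

    API_URL = "https://app.dimensions.ai/api/dsl.json"

    def run_query(query, token):
        """POST a raw DSL query and map the documented status codes to actions."""
        resp = requests.post(
            API_URL,
            data=query.encode("utf-8"),
            headers={"Authorization": f"JWT {token}"},  # assumed auth scheme
        )
        if resp.status_code == 400:
            # Semantic/Query Error: the DSL query itself needs fixing.
            raise ValueError(f"Invalid DSL query: {resp.text}")
        if resp.status_code == 401:
            # Authentication Failure / Token Expiration: obtain a new token and retry.
            raise PermissionError("Token expired or invalid")
        if resp.status_code == 500:
            # Evaluation/Data/Timeout Error: simplify the query or return less data.
            raise RuntimeError(f"Query could not be evaluated: {resp.text}")
        resp.raise_for_status()
        return resp.json()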

How often do you change the API?

Changes are introduced with each new minor release, roughly every two weeks.

We strive to keep the API backward-compatible by never introducing changes to the query syntax or the data model that significantly alter their behaviour. In the case of data fields, when an improved version of a data point introduces breaking changes (e.g. a single-value field becomes a multi-value one), the old field is deprecated (never deleted) and a brand-new field is introduced instead.

When a new major release happens, all deprecated fields are removed from the newest version and maintained only in the legacy version.

What is the proper Content-Type and character encoding information?

The DSL does not expect any specific content type or encoding information: UTF-8 and JSON are always assumed, with the exception of /dsl.json, where requests are raw DSL queries rather than JSON input. Similarly, the DSL always returns valid JSON objects, encoded in UTF-8.
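For example, using Python's requests library, a raw DSL query can be sent as a plain UTF-8 body without any Content-Type header, and the JSON response decoded as UTF-8 (a minimal sketch; the token placeholder is yours to fill in):

    import requests

    query = 'search publications for "machine learning" return publications limit 5'

    resp = requests.post(
        "https://app.dimensions.ai/api/dsl.json",
        data=query.encode("utf-8"),   # raw DSL text, not a JSON payload
        headers={"Authorization": "JWT <your-token>"},
    )
    data = resp.json()  # the response is always a UTF-8 encoded JSON object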

Are there any guidelines regarding query length/complexity?

To achieve optimal query performance, we recommend sticking to the following general principles:

  • Number of items used in an in filter clause: up to 400 (see the batching sketch after this list), example: search publications where id in [...]

  • Number of boolean filter conditions: up to 100, example: search publications where field1 = value and field2 = value

  • Number of boolean full text clauses: up to 100, example: search publications in authors for "\"Alan Turing\" OR \"Stephen Hawking\""
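To stay within the first limit, a long list of identifiers can be split into batches of at most 400 and queried separately. The sketch below assumes the run_query helper from the error-codes answer above; the IDs are illustrative.

    def chunked(items, size=400):
        """Yield successive batches of at most `size` items."""
        for i in range(0, len(items), size):
            yield items[i:i + size]

    all_ids = ["pub.1111111111", "pub.2222222222"]  # ... up to many thousands

    for batch in chunked(all_ids):
        id_list = ", ".join(f'"{pid}"' for pid in batch)
        query = f"search publications where id in [{id_list}] return publications"
        # results = run_query(query, token)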

Why do I get the error “Query is too long or complex”?

This error usually happens in these circumstances:

  • The query contains too many filters, such as too many and/or conditions. This can also occur when using the in filter with too many values.

  • The query returns too much data. This often happens because one of the records in your batch contains more data than usual (e.g. a publication with hundreds of authors/affiliations). Try reducing the number of records returned via the limit operator, or identify the record(s) causing the error so as to deal with them separately.

  • The query includes an entity filter (e.g. researchers.name) that returns too many results. In this case, try to make entity filters more specific. For instance, instead of filtering publications by a researcher’s first/last name, it is better to either use an authors search (i.e. search publications in authors for "<first name> <last name>"), or query the researchers source first (i.e. search researchers for "\"<first name> <last name>\"") to obtain the researcher ID, and then filter publications using this ID instead (see the sketch after this list).
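A sketch of the two-step approach from the last point, reusing the run_query helper from above (the researcher name is illustrative, and the response is assumed to nest results under the source name, as in the API's usual JSON shape):

    # Step 1: resolve the researcher ID by searching the researchers source.
    q1 = 'search researchers for "\\"Alan Turing\\"" return researchers[id]'
    researchers = run_query(q1, token)["researchers"]
    rid = researchers[0]["id"]

    # Step 2: filter publications by the disambiguated researcher ID.
    q2 = f'search publications where researchers.id = "{rid}" return publications'
    publications = run_query(q2, token)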

Why am I getting a lot of warnings about deprecated fields?

If you are using a deprecated field, you should consider updating your code as soon as possible.

A special case is if your queries use fieldsets like all or extras. These fieldsets return all fields, including deprecated ones, so as to avoid breaking legacy code and remain backward-compatible; hence the warnings. In general, when writing code used in integrations or long-standing extraction scripts, it is best to return specific fields rather than a predefined set. This also has the advantage of making queries faster by avoiding the extraction of unnecessary data.
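For instance, an extraction script is better served by an explicit field list than by a fieldset (the field names below are common publication fields; pick the ones your integration actually needs):

    # Fragile and slow: returns every field, including deprecated ones.
    q_all = "search publications return publications[all]"

    # Robust and faster: name exactly the fields you need.
    q_explicit = "search publications return publications[id + doi + title + year]"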

Why am I getting the “Something went wrong” bubble after running a query in the user interface?

This error is usually raised when the query is either too long (i.e. it has too many or clauses in the filter) or returns too much data (e.g. when using the all fieldset). Try rewriting the query using different filters, for example the in operator, and try returning fewer records.

Why am I getting HTTP 500 Internal Server Error when using API access?

As with the “Something went wrong” error above, this error is usually raised when the query is either too long (i.e. it has too many or clauses in the filter) or returns too much data (e.g. when using the all fieldset). Try rewriting the query using different filters, for example the in operator, and try returning fewer records.

Why am I getting the warning ‘Using more than one UNNEST may lead to long response times and timeouts.’?

This warning is added to the results of queries that contain multiple unnest expressions in the specification of returned fields, i.e. return publications[unnest(research_orgs) + unnest(researchers)]. Depending on the heaviness of the underlying data, these queries might produce excessive result sets and even result in query timeouts. When this happens, it is recommended to use a smaller page size by setting limit to a lower value, as shown below.
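For example, a double-unnest query can be paginated in smaller steps (the limit/skip values are illustrative):

    query = (
        "search publications "
        "return publications[unnest(research_orgs) + unnest(researchers)] "
        "limit 100 skip 0"  # smaller pages reduce the risk of timeouts
    )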

Why do boolean queries behave strangely in some cases?

The Dimensions API is powered by a Lucene-based search engine and as such inherits certain behaviours from it. As a consequence, the way we interpret queries containing boolean operators is not guaranteed to follow the familiar rules of precedence. For best results, use brackets to specify the order of precedence and include brackets around every NOT phrase. For example, lions AND tigers OR NOT bears would conventionally be parsed as (lions AND tigers) OR (NOT bears), but you must include the brackets to guarantee that it will be parsed this way by Dimensions. For a deep dive into how boolean operators are parsed in Lucene-based search engines, we recommend this article: Always Trouble with Boolean Queries.
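When sending such a phrase through the API, include the brackets explicitly (a sketch; the search terms are illustrative):

    # Ambiguous: operator precedence is not guaranteed by the Lucene-based engine.
    q_ambiguous = 'search publications for "lions AND tigers OR NOT bears" return publications'

    # Explicit: brackets force the intended parse, including around the NOT phrase.
    q_bracketed = 'search publications for "(lions AND tigers) OR (NOT bears)" return publications'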

3. Data Model

What is the difference between old-style FOR codes and the new ones?

As of version 1.18, we implemented a new methodology for assigning Fields of Research (FOR) categories to publication documents. The new methodology improves both the precision and recall of the FOR categorization. The API now provides both the 4-digit codes and their respective 2-digit codes via the single field category_for.
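For example, both code levels can be retrieved from this single field (a sketch; the search term is illustrative):

    # category_for carries both the 4-digit codes and their 2-digit parents.
    query = 'search publications for "graphene" return publications[id + category_for] limit 5'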

Do all author/investigator affiliations include GRID and GeoNames identifiers?

No. Identifiers are automatically extracted from the text of a person’s affiliations, so they can be missing in some cases. Also, these identifiers are the result of separate data mining processes, so some affiliations may include a GeoNames country or city code, but not a GRID ID.

What’s the difference between authors and researchers?

Researchers are the subset of (publications) authors that have been successfully disambiguated. See also the searching for researchers / authors section for more details on this topic.