A Tour of the DSL

The Dimensions Search Language (DSL) allows you to write concise expressions, called queries, to search and retrieve information from the Dimensions dataset. This page provides an interactive walkthrough of DSL query basics, to get you familiar with the language.

Todo

In this tutorial, suggested interactive exercises to build up your DSL knowledge will appear in boxes like this. This is the first one!

A simple query

Let’s start with a simple query example. One of the most basic and commonly used questions we could ask about Dimensions data is something like: “Which are the most recent scholarly works related to a specific research topic?”

We can answer such a question with a query like this, which asks: “Which are the most recent scientific publications related to malaria?”

search publications for "malaria" return publications

Todo

Click the “Run query” button above to view the results retrieved by this query in a new window.

Whitespace is ignored in DSL queries, so feel free to use line breaks and indentation as you wish. The following query is exactly the same as the query above:

search publications
  for "malaria"
return publications
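To make the equivalence concrete, here is a rough Python sketch that collapses runs of whitespace in a query string and compares the two forms. The helper name normalize_whitespace is made up for illustration and is not part of the DSL or of any Dimensions client. It also naively collapses whitespace inside quoted search terms, which a real parser would presumably leave alone.

```python
def normalize_whitespace(query: str) -> str:
    """Collapse every run of spaces, tabs and newlines into one space."""
    # Hypothetical helper: a naive approximation of "whitespace is ignored".
    return " ".join(query.split())


one_line = 'search publications for "malaria" return publications'
multi_line = """
search publications
  for "malaria"
return publications
"""

# Both spellings reduce to the same token sequence.
print(normalize_whitespace(multi_line) == one_line)
```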

The anatomy of a query

Each DSL query is composed of two main parts:

  • A search phrase, e.g. search publications for "malaria", and

  • At least one return phrase, e.g. return publications

The first part of the query, the search phrase, specifies the set of documents that we want to know something about. In the example search publications for "malaria", we are saying “I want to know something about publications related to malaria”.

The search phrase can optionally specify whether to search the full text of the documents or only their titles and abstracts. This is done with an in specifier, either in full_data or in title_abstract_only, placed directly after the name of the source to search, for example search grants in title_abstract_only.

The second part of the query, the return phrase, specifies what we want to know about the documents specified in the search phrase. In the example return publications, we are saying “I want to see basic information about the publications matched by the search phrase”.

A DSL query is not complete without both a search phrase and at least one return phrase; we must tell the DSL both which data we are interested in and what we want to see from that data. Trying to run a query that has only one phrase or the other will result in an error, as you can see for yourself by running each of these two incomplete queries on its own:

search publications for "malaria"
return publications
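The completeness rule can be sketched as a small pre-flight check in Python. This is only an illustration under simple assumptions: the function name missing_phrases is hypothetical, the check is a keyword match rather than a real parse, and the actual error reported by the DSL will look different.

```python
import re
from typing import List


def missing_phrases(query: str) -> List[str]:
    """Return the names of the required phrases that the query lacks."""
    # Blank out quoted search terms so that a word such as "return"
    # inside the quotes is not mistaken for a phrase keyword.
    stripped = re.sub(r'"[^"]*"', '""', query)
    missing = []
    if not re.search(r"\bsearch\s+\w+", stripped):
        missing.append("search")
    if not re.search(r"\breturn\s+\w+", stripped):
        missing.append("return")
    return missing


print(missing_phrases('search publications for "malaria"'))
print(missing_phrases("return publications"))
print(missing_phrases('search publications for "malaria" return publications'))
```

The first two calls report the phrase that is absent; the last one reports nothing missing, because the query contains both phrases.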

In the following sections we’ll explore what we can do with each of these two phrases.

search for source documents

As we’ve just seen, the search phrase specifies a set of documents that will provide the data for the results we’ll ask for in the return phrase.

As all Dimensions data must be taken from one of the supported sources, this phrase always starts with the word search followed by the name of the desired source, e.g. search publications. This is the only part of the search phrase that is not optional; the following is a valid (though perhaps not very useful) query where the search phrase specifies the set of all publications:

search publications return publications

Choose a document source

In the previous queries we’ve searched for publications, which is an example of a Dimensions data source, i.e. a type of scientific document/work that may be of interest. Others include grants, patents, clinical_trials, researchers and policy_documents.

We can query another source by replacing publications with the name of another source, like grants. For example:

search grants for "malaria" return grants

Todo

Try changing the query above to search for another type of work, e.g. patents. (Hint: make sure you update both the search and return phrases; what happens when you only change one?)

For a list of all currently supported sources, see the Data Model page.

Search for research topics

As we’ve seen in the preceding examples, we’re often interested in scientific work related to a particular topic, e.g. malaria. In the for phrase, we can specify such a topic as one or more search terms that we want the retrieved documents to match. We can specify a multi-word phrase as well, for example:

search grants for "attention deficit disorder" return grants

Todo

Try modifying the query above to search for different terms/subjects. Take a look at the titles of the returned documents to see how they may be related to the subject(s) you searched for.

How does this type of search work behind the scenes? First, the complete text of all documents is analyzed, and each document is assigned a score that represents how well it matches the given term(s). Documents are then ranked by their score, with the best-matching documents appearing first in the results.

Restrict the search with filters

We can restrict the set of documents matched by the search phrase by specifying certain filters that must apply to the retrieved documents, in other words certain properties these documents must have. We specify these as filter expressions in the where phrase, as follows:

search grants where start_year>=2010 and funders.acronym="DFG" for "attention deficit disorder" return grants

The query above only matches grants that started in the year 2010 or later and whose supporting funder has the acronym “DFG”.

Todo

Try to modify the query above to match grants that started only in 2010 (not in later years), then to match grants that started in 2010 or earlier. (You can probably guess what operators you’d need to use instead of >=, but consult the list of filter operators if you get stuck.)

Filters in the where phrase may also be used to retrieve one or more specific documents using an identifier such as a publication DOI, like so:

search publications
where doi in ["10.1186/s12888-017-1463-3", "10.1186/s40479-017-0057-5", "10.1186/s12888-017-1222-5"]
return publications
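When the identifiers come from a program rather than being typed by hand, the in [...] filter can be assembled from a list. The Python sketch below is one possible way to do that; doi_filter is a hypothetical helper, and it assumes that JSON-style double-quoted strings are acceptable as DSL string literals, which holds for the plain DOIs shown here but is not verified for values containing quotes or backslashes.

```python
import json


def doi_filter(dois):
    """Render a `doi in [...]` filter expression for a where phrase."""
    # json.dumps wraps each value in double quotes, matching the literals
    # used in the query above (assumption: DSL strings escape like JSON).
    quoted = ", ".join(json.dumps(doi) for doi in dois)
    return f"doi in [{quoted}]"


dois = [
    "10.1186/s12888-017-1463-3",
    "10.1186/s40479-017-0057-5",
    "10.1186/s12888-017-1222-5",
]
query = f"search publications where {doi_filter(dois)} return publications"
print(query)
```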

The fields, entities, etc. that may appear in filter expressions depend on the source being searched; in the queries above, we see the start_year field as well as the funders entity’s acronym field (funders.acronym) being used to search grants, and the doi field being used to search publications.

If an unsupported field name is used, the DSL will report an error message to that effect, and provide a list of the supported fields to help you find the correct field name. For example, if you want to find grants that started in the year 2010 and create a filter for this using a field called startyear, this will result in an error, because the field name is not quite correct. Try it for yourself - this query will raise an error:

search grants where startyear=2010 return grants

Todo

Fix the query above to use the correct field name (hint: it’s in the error message).
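The idea of checking a field name against a list of supported fields can be mimicked with a toy lookup in Python. Everything here is illustrative: closest_fields is a hypothetical helper, GRANT_FIELDS is a deliberately incomplete placeholder list (only start_year and funders appear in this tutorial), and the "closest match" step uses Python's difflib, which says nothing about how the DSL itself builds its error message.

```python
import difflib

# Placeholder list for illustration only; consult the documentation for
# the real set of fields supported by the grants source.
GRANT_FIELDS = ["start_year", "funders", "title", "id"]


def closest_fields(name, supported=GRANT_FIELDS):
    """Return supported field names that look similar to `name`."""
    if name in supported:
        return [name]
    # get_close_matches ranks candidates by string similarity.
    return difflib.get_close_matches(name, supported, n=3)


print(closest_fields("startyear"))
print(closest_fields("start_year"))
```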

Consult the documentation for the where phrase (including the types of expressions/operators that may be used) to find out more about the types of filters and operators you can use, and the lists of supported fields for each supported source to see which field and entity names can be used in filter expressions.

Todo

Play with the queries above to use other filter expression(s) that narrow the set of documents in different ways.

Note

As you can see in the various queries above, the where phrase may be used in combination with a for phrase, or by itself. If both phrases are used, they can appear in either order - it does not matter which comes first. Behind the scenes, first the where filters will be applied to narrow down the set of documents, and then this restricted set of documents will be scored & ranked against the for search terms as described above.
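The filter-then-rank order described in this note can be illustrated with a toy model in Python. This is a conceptual sketch only: the records are invented, the score is a bare word count, and the real scoring used by Dimensions is not documented here. The function name toy_search is hypothetical.

```python
# Invented records for illustration; not real Dimensions data.
docs = [
    {"id": "A", "start_year": 2008, "text": "malaria malaria malaria"},
    {"id": "B", "start_year": 2012, "text": "malaria control"},
    {"id": "C", "start_year": 2015, "text": "malaria vaccine for malaria prevention"},
    {"id": "D", "start_year": 2016, "text": "bird migration"},
]


def toy_search(docs, term, predicate):
    """Apply the filter first, then score and rank what is left."""
    # Step 1: the "where" filters narrow down the candidate set.
    candidates = [d for d in docs if predicate(d)]
    # Step 2: each remaining document is scored against the "for" term
    # (here: how many times the term occurs in the text).
    scored = [(d["text"].lower().split().count(term.lower()), d) for d in candidates]
    # Step 3: keep matching documents and rank the best match first.
    matching = [pair for pair in scored if pair[0] > 0]
    matching.sort(key=lambda pair: pair[0], reverse=True)
    return [d["id"] for _, d in matching]


ranked = toy_search(docs, "malaria", lambda d: d["start_year"] >= 2010)
print(ranked)
```

Record A has the highest raw score but never reaches the ranking step, because the filter removes it first.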

return information about documents

As we said earlier, the search phrase allows us to specify which documents we’re interested in, but it doesn’t allow us to specify exactly what we want to know about those documents; for that we use the return phrase(s).

A return phrase comes after the search phrase, and always starts with the word return followed by a specification of the result we wish to see. In the queries we’ve seen so far, this has been only the most basic type of result: the source documents themselves, i.e. return publications or return grants.

However, the return phrase gives us the flexibility to be much more specific about what we wish to see; in the following sections we’ll explore some of the possibilities.

Return source documents

If we are interested in knowing something about the source documents themselves, we can ask the DSL to show us only certain information (metadata) about each document by indicating the specific field(s) we wish to see, like so:

search publications for "malaria" return publications [doi + title + year]

As in the search phrase, the field(s) that may appear in such a return phrase depend on the source being searched/returned.

Todo

Modify the query above to return different metadata by adding/removing/changing fields in the return phrase. (Consult the list of publications fields for ideas.)
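If queries are generated from a script, the field list of a return phrase can be built from a Python list. The sketch below assumes only the syntax shown in this section (field names joined with + inside square brackets); return_phrase is a hypothetical helper name.

```python
def return_phrase(source, fields=None):
    """Build a return phrase, optionally restricted to specific fields."""
    if not fields:
        # With no fields given, the bare source name is returned.
        return f"return {source}"
    return f"return {source}[{' + '.join(fields)}]"


print(return_phrase("publications", ["doi", "title", "year"]))
print(return_phrase("publications"))
```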

As a shorthand for commonly requested data, the DSL also supports a few fieldsets for each source, so that the name of a fieldset can be used instead of a long list of fields. For example, the extras fieldset includes additional fields we may not have seen before:

search publications for "malaria" return publications [extras]

Compare the results of the query above to those of the query below, which uses the smaller basics fieldset:

search publications for "malaria" return publications[basics]

The basics fieldset is the default used when no fields or fieldsets are specified in the return phrase, such that return publications returns the exact same results as return publications[basics]. Try deleting [basics] from the query above to see for yourself.

For a complete list of supported fields and fieldsets for each source, see the supported data page of the documentation.