A Tour of the DSL¶
The Dimensions Search Language (DSL) allows you to write concise expressions, called queries, to search and retrieve information from the Dimensions dataset. This page provides an interactive walkthrough of DSL query basics, to get you familiar with the language.
Todo
In this tutorial, suggested interactive exercises to build up your DSL knowledge will appear in boxes like this. This is the first one!
A simple query¶
Let’s start with a simple query example. One of the most basic and commonly-used types of questions we could ask about Dimensions data is something like, “Which are the most recent scholarly works related to a specific research topic?”.
We can answer such a question with a query like this, which asks: “Which are the most recent scientific publications related to malaria?”
search publications for "malaria" return publications
Todo
Click the “Run query” button above to view the results retrieved by this query in a new window.
Whitespace is ignored in DSL queries, so feel free to use line breaks and indentation as you wish. The following query is exactly the same as the query above:
search publications
for "malaria"
return publications
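To illustrate why the two layouts are interchangeable, here is a minimal Python sketch that collapses runs of whitespace in a query string. The helper name normalize_query is hypothetical and not part of any Dimensions tooling. It would also collapse whitespace inside quoted search terms, so treat it as an illustration rather than a DSL parser.

```python
def normalize_query(query: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space.

    Hypothetical helper for illustration only; it does not treat quoted
    search terms specially.
    """
    return " ".join(query.split())


one_line = 'search publications for "malaria" return publications'
multi_line = """
search publications
    for "malaria"
return publications
"""

# Both layouts reduce to the same normalized text.
print(normalize_query(one_line) == normalize_query(multi_line))
```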
The anatomy of a query¶
Each DSL query is composed of two main parts:
A search phrase, e.g. search publications for "malaria", and
At least one return phrase, e.g. return publications
The first part of the query, the search phrase, specifies the set of documents that we want to know something about. In the example search publications for "malaria", we are saying “I want to know something about publications related to malaria”.
The search phrase can optionally specify whether to search the full text or only titles and abstracts. This is done with the specifier in [full_data | title_abstract_only], which follows the name of the source to search, for example search grants in title_abstract_only.
The second part of the query, the return phrase, specifies what we want to know about the documents specified in the search phrase. In the example return publications, we are saying “I want to see basic information about the publications matched by the search phrase”.
A DSL query is not complete without both a search phrase and at least one return phrase; we must tell the DSL both which data we are interested in and what we want to see from that data. Trying to run a query that has only one phrase or the other will result in an error, as you can see for yourself by trying to run these incomplete queries:
search publications for "malaria"
return publications
In the following sections we’ll explore what we can do with each of these two phrases.
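The two-part structure described above can be sketched in Python as a small helper that refuses to assemble a query unless both parts are present. The name build_query is hypothetical, and the check only looks at the leading keyword of each phrase; it does not validate DSL syntax.

```python
def build_query(search_phrase: str, *return_phrases: str) -> str:
    """Join one search phrase and one or more return phrases into a query.

    Hypothetical helper: it only checks the leading keywords, not DSL syntax.
    """
    if not search_phrase.strip().startswith("search "):
        raise ValueError("a query needs a search phrase")
    if not return_phrases:
        raise ValueError("a query needs at least one return phrase")
    for phrase in return_phrases:
        if not phrase.strip().startswith("return "):
            raise ValueError(f"not a return phrase: {phrase!r}")
    parts = [search_phrase.strip(), *(p.strip() for p in return_phrases)]
    return " ".join(parts)


query = build_query('search publications for "malaria"', "return publications")
print(query)
```

Calling build_query with only a search phrase raises a ValueError, mirroring the error the DSL reports for the incomplete queries above.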
search for source documents¶
As we’ve just seen, the search phrase specifies a set of documents that will provide the data for the results we’ll ask for in the return phrase.
As all Dimensions data must be taken from one of the supported sources, this phrase always starts with the word search followed by the name of the desired source, e.g. search publications. This is the only part of the search phrase that is not optional; the following is a valid (though perhaps not very useful) query where the search phrase specifies the set of all publications:
search publications return publications
Choose a document source¶
In the previous queries we’ve searched for publications, which is an example of a Dimensions data source, i.e. a type of scientific document/work that may be of interest. Others include grants, patents, clinical_trials, researchers and policy_documents.
We can query another source by replacing publications with the name of another source, like grants. For example:
search grants for "malaria" return grants
Todo
Try changing the query above to search for another type of work, e.g. patents. (Hint: make sure you update both the search and return phrases; what happens when you only change one?)
For a list of all currently supported sources, see the Data Model page.
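As a rough sketch of how the source name is swapped in both phrases, the Python snippet below generates the same topic query for each source named in this section. The helper topic_query and the SOURCES list are illustrative assumptions; the Data Model page remains the authoritative list.

```python
# Source names as mentioned in this tutorial; the Data Model page is authoritative.
SOURCES = [
    "publications",
    "grants",
    "patents",
    "clinical_trials",
    "researchers",
    "policy_documents",
]


def topic_query(source: str, topic: str) -> str:
    """Build a topic query; the same source name appears in both phrases.

    Hypothetical helper; assumes the topic contains no double quotes.
    """
    return f'search {source} for "{topic}" return {source}'


for source in SOURCES:
    print(topic_query(source, "malaria"))
```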
Search for research topics¶
As we’ve seen in the preceding examples, we’re often interested in scientific work related to a particular topic, e.g. malaria. In the for phrase, we can specify such a topic as one or more search terms that we want the retrieved documents to match. We can specify a multi-word phrase as well, for example:
search grants for "attention deficit disorder" return grants
Todo
Try modifying the query above to search for different terms/subjects. Take a look at the titles of the returned documents to see how they may be related to the subject(s) you searched for.
How does this type of search work behind the scenes? First, the complete text of all documents is analyzed, and each document is assigned a score that represents how well it matches the given term(s). Documents are then ranked by their score, with the best-matching documents appearing first in the results.
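The score-then-rank idea can be illustrated with a toy Python example. This is not how Dimensions actually computes relevance (the real scoring is not described here); the sketch simply counts term occurrences, and the documents and the score function are invented for illustration.

```python
def score(text: str, terms: list[str]) -> int:
    """Toy relevance score: how many times the search terms occur in the text."""
    words = text.lower().split()
    return sum(words.count(term.lower()) for term in terms)


# Invented sample documents, keyed by a made-up identifier.
documents = {
    "doc-a": "Malaria vaccine trial results",
    "doc-b": "Urban transport planning",
    "doc-c": "Malaria control and malaria mosquito nets",
}

terms = ["malaria"]
scored = {doc_id: score(text, terms) for doc_id, text in documents.items()}

# Keep only matching documents, best score first.
ranked = sorted(
    (doc_id for doc_id in scored if scored[doc_id] > 0),
    key=lambda doc_id: scored[doc_id],
    reverse=True,
)
print(ranked)
```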
Restrict the search with filters¶
We can restrict the set of documents matched by the search phrase by specifying certain filters that must apply to the retrieved documents, in other words certain properties these documents must have. We specify these as filter expressions in the where phrase, as follows:
search grants where start_year>=2010 and funders.acronym="DFG" for "attention deficit disorder" return grants
The query above matches only grants that started in the year 2010 or later and whose funder has the acronym “DFG”.
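A minimal Python sketch of how such a where phrase could be assembled from individual filters follows. The helpers filter_expr and where_phrase are hypothetical; they assume that string values are double-quoted, numbers are written bare, and filters are combined with and, as in the query above.

```python
def filter_expr(field: str, operator: str, value) -> str:
    """Format one filter; strings are double-quoted, numbers are left bare."""
    rendered = f'"{value}"' if isinstance(value, str) else str(value)
    return f"{field}{operator}{rendered}"


def where_phrase(*filters: str) -> str:
    """Join filters with `and`, so that every one of them must hold."""
    return "where " + " and ".join(filters)


phrase = where_phrase(
    filter_expr("start_year", ">=", 2010),
    filter_expr("funders.acronym", "=", "DFG"),
)
query = f'search grants {phrase} for "attention deficit disorder" return grants'
print(query)
```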
Todo
Try to modify the query above to match grants that started only in 2010 (not in later years), then to match grants that started in 2010 or earlier. (You can probably guess what operators you’d need to use instead of >=, but consult the list of filter operators if you get stuck.)
Filters in the where phrase may also be used to retrieve one or more specific documents using an identifier such as a publication DOI, like so:
search publications
where doi in ["10.1186/s12888-017-1463-3", "10.1186/s40479-017-0057-5", "10.1186/s12888-017-1222-5"]
return publications
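When the identifiers come from a program rather than being typed by hand, the list can be rendered from a Python list. The sketch below uses json.dumps only because its list syntax (square brackets, double-quoted strings) happens to match the query above; that is an assumption that holds for plain ASCII DOIs, and the helper name doi_filter is hypothetical.

```python
import json


def doi_filter(dois: list[str]) -> str:
    """Render a `where doi in [...]` phrase from a list of DOIs.

    Relies on json.dumps producing ["a", "b"] style output, which matches
    the list syntax used in the query above for plain ASCII strings.
    """
    return f"where doi in {json.dumps(dois)}"


dois = [
    "10.1186/s12888-017-1463-3",
    "10.1186/s40479-017-0057-5",
    "10.1186/s12888-017-1222-5",
]
query = f"search publications {doi_filter(dois)} return publications"
print(query)
```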
The fields, entities, etc. that may appear in filter expressions depend on the source being searched; in the queries above, we see the start_year field and the funders entity’s acronym field being used to search grants, and the doi field being used to search publications.
If an unsupported field name is used, the DSL will report an error message to that effect, and provide a list of the supported fields to help you find the correct field name. For example, if you want to find grants that started in the year 2010 and create a filter for this using a field called startyear, this will result in an error, because the field name is not quite correct. Try it for yourself - this query will raise an error:
search grants where startyear=2010 return grants
Todo
Fix the query above to use the correct field name (hint: it’s in the error message).
Consult the documentation for the where phrase to find out more about the types of filter expressions and operators you can use, and the lists of supported fields for each source to see which field and entity names can be used in filter expressions.
Todo
Play with the queries above to use other filter expression(s) that narrow the set of documents in different ways.
Note
As you can see in the various queries above, the where phrase may be used in combination with a for phrase, or by itself. If both phrases are used, they can appear in either order - it does not matter which comes first. Behind the scenes, first the where filters will be applied to narrow down the set of documents, and then this restricted set of documents will be scored and ranked against the for search terms as described above.
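The two stages described in this note can be sketched as a toy Python pipeline: filter first, then score and rank only what survives. The records, the single start-year filter, and the word-count score are all invented for illustration and are not the actual Dimensions implementation.

```python
# Invented sample records standing in for grants.
grants = [
    {"id": "g1", "start_year": 2008, "text": "attention deficit disorder in adults"},
    {"id": "g2", "start_year": 2012, "text": "attention and memory"},
    {"id": "g3", "start_year": 2015, "text": "attention deficit disorder attention training"},
]


def filter_then_rank(records, min_start_year, terms):
    """Apply the filter stage, then rank the remaining records by a toy score."""
    # Stage 1: the `where` filters narrow the candidate set.
    candidates = [r for r in records if r["start_year"] >= min_start_year]

    # Stage 2: only the surviving records are scored against the `for` terms.
    def score(record):
        words = record["text"].lower().split()
        return sum(words.count(term) for term in terms)

    matches = [r for r in candidates if score(r) > 0]
    return [r["id"] for r in sorted(matches, key=score, reverse=True)]


result = filter_then_rank(grants, 2010, ["attention", "deficit", "disorder"])
print(result)
```

Note that g1 matches the search terms well but never reaches the scoring stage, because the filter has already excluded it.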
return information about documents¶
As we said earlier, the search phrase allows us to specify which documents we’re interested in, but it doesn’t allow us to specify exactly what we want to know about those documents; for that we use the return phrase(s).
A return phrase comes after the search phrase, and always starts with the word return followed by a specification of the result we wish to see. In the queries we’ve seen so far, this has been only the most basic type of result: the source documents themselves, i.e. return publications or return grants.
However, the return phrase gives us the flexibility to be much more specific about what we wish to see; in the following sections we’ll explore some of the possibilities.
Return source documents¶
If we are interested in knowing something about the source documents themselves, we can ask the DSL to show us only certain information (metadata) about each document by indicating the specific field(s) we wish to see, like so:
search publications for "malaria" return publications [doi + title + year]
As in the search phrase, the field(s) that may appear in such a return phrase depend on the source being searched/returned.
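As a small illustration, a return phrase with a field list could be generated from Python like this. The helper return_phrase is hypothetical; it joins field names with + inside square brackets, as in the query above, and falls back to the bare source name when no fields are given.

```python
from typing import Optional, Sequence


def return_phrase(source: str, fields: Optional[Sequence[str]] = None) -> str:
    """Build a return phrase; with no fields, the bare source name is used."""
    if not fields:
        return f"return {source}"
    return f"return {source}[{' + '.join(fields)}]"


with_fields = return_phrase("publications", ["doi", "title", "year"])
without_fields = return_phrase("publications")
print(with_fields)
print(without_fields)
```

Because whitespace is ignored in DSL queries, it makes no difference that this sketch omits the space before the opening bracket.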
Todo
Modify the query above to return different metadata by adding/removing/changing fields in the return phrase. (Consult the list of publications fields for ideas.)
As a shorthand for commonly requested data, the DSL also supports a few fieldsets for each source, so that the name of a fieldset can be used instead of a long list of fields. For example, the extras fieldset includes additional fields we may not have seen before:
search publications for "malaria" return publications [extras]
Compare the results of the query above to those of the query below, which uses the smaller basics fieldset:
search publications for "malaria" return publications[basics]
The basics fieldset is the default used when no fields or fieldsets are specified in the return phrase, so return publications returns exactly the same results as return publications[basics]. Try deleting [basics] from the query above to see for yourself.
For a complete list of supported fields and fieldsets for each source, see the supported data page of the documentation.