Accessing Dimensions on GBQ using PowerBI¶
This guide will walk you through the steps necessary to access Dimensions BigQuery data using Microsoft PowerBI.
Please note that the guide and screenshots below have been taken using PowerBI Desktop; if you are using other versions of the software, there might be some differences due to the way the user interface looks, however the fundamental steps will be the same. This guide assumes the user is already proficient with PowerBI and SQL based data analysis, this is not intended to be a general PowerBI guide, rather general notes for creating PowerBI data sources utilising Google BigQuery.
There are two different approaches that can be utilised to access Google BigQuery data from within PowerBI. Each approach has pros and cons and it is dependent on the type of analysis being performing as to which is best suited.
One important note, the native PowerBI Google BigQuery data source provider and the ODBC data provider both do not deal with BigQuery’s nested and repeated data schemas. These structures, also known as STRUCTS and ARRAYS in standard SQL are not represented natively in PowerBI so are displayed as JSON encoded results. It is recommended to instead write custom SQL queries that flatten the GBQ nested data schema and structures into a single/flat SQL result and utilise this as the data source within PowerBI.
Both of the options below require that the Google Account you are utilising has access to a GCP project with BigQuery admin/user roles (used to initiate and charge query jobs).
At minimum, a GCP project is required in order to be utilised as a “Billing Project”. The billing project is one which charges for queries and other BigQuery operations are charged to. It is possible however to utilise the BigQuery Sandbox, providing free queries of up to 1TB per month without requiring a billing account to be configured.
For more information regarding GCP projects and billing project configuration, details can be found on the Google Cloud Platform Setup page.
1. Built-in GBQ provider¶
The built in Google BigQuery data provider within PowerBI has a limitation that only projects that the users credentials has project related roles associated are listed by the data source provider. This is problematic when it comes accessing shared datasets, this is the case with datasets like our public COVID-19 dataset and the overall Dimensions dataset. In order to overcome this limitation, it is necessary to create views within projects that are owned by the users credentials. SQL views are essentially wrappers, that create a virtual table that contains the results of a SQL query and presents that result as a table. Details regarding Google BigQuery views can be found here.
The view requires a GCP project and an owning dataset. The Google Cloud Platform Setup page provides useful details regarding configuring the prerequisite GCP project.
The first step is to create a new view that encapsulates a SQL statement that extracts data from a dataset/table of your choice. It is recommended that the SQL statement flattens the data schema, un-nesting arrays and structures into a flat schema instead. The name of the view is fully specified, in that it includes the name of the view, in addition to the project and dataset name.
1 2 3 4 5 6 7 8 9 10 11
CREATE OR REPLACE VIEW `local-shoreline-282601.powerbi_views.covid_pubs` SELECT p.id, p.title.preferred as title, date, IF(BYTE_LENGTH(date) > 4, CAST(SPLIT(date, "-")[SAFE_OFFSET(1)] as INT64), null) as month, journal.id as journal_id, CONCAT(ct.code, " ", ct.name) as category, altmetrics.score as attention_score FROM `covid-19-dimensions-ai.data.publications` p, UNNEST(category_for.second_level.full) ct
The newly created view should now appear within the Google BigQuery console explorer. Within PowerBI it is now possible to create a Google BigQuery data source after following the required steps to provide authentication details.
After configuring the Google BigQuery source it should now be possible to select the created view as the table to import into PowerBI.
2. Simba ODBC Drivers¶
Google make freely available standard ODBC drivers for BigQuery that provides applications (such as PowerBI) access to Google BigQuery data resources using the common ODBC framework.
Install the ODBC drivers¶
ODBC drivers need to be installed first before being able to use Google BigQuery from an application that provides ODBC as a data source. The drivers and the installation instructions can be found here. For Windows 10 with PowerBI desktop you will want to install the Windows 64-bit drivers.
Configure the ODBC driver¶
With the installation of the ODBC driver complete, it is now necessary to connect a ODBC data source definition.
From the Windows menu, open the application “ODBC Data Sources (64-bit)”.
Create a ODBC data source¶
Create a new data source for Google BigQuery, selecting “Simba ODBC drivers for Google BigQuery”. It is possible to define a ODBC data source that is available to only the currently signed-in user on the computer you are using, or define a ODBC data source that is available and configured for all users on the same desktop machine. Enter a name and description for the newly created data source.
Change the OAuth mechanism to “User Authentication” and then click “Sign In…” and follow the required steps within the browser window that opens. At the end of this process, you will be presented with a Google OAuth confirmation code. Copy this code back to the ODBC data source configuration window, into the field “Confirmation Code”. The next step is to select/enter a “Catalog (Project)”. This is the GDP project that is used for the billing of queries (as discussed at the beginning of this guide).
It is possible to define an additional project that access can be accessed from the listing of available projects/datasets that wouldn’t otherwise be visible. Any project/dataset that is available to the credentials configured in the above section can be appended onto the list of accessible data sources. Click “Advanced Options” and fill in the “Additional Projects” section to include the name of the project that contains datasets you wish to access (such as “dimensions-ai”).
Use ODBC within PowerBI¶
From within PowerBI, create a new data source and select the type of data source as “ODBC”.
From the ODBC options, select the data source name (DSN) which is the same as the one defined in the previous section.
Due to Google BigQuery supporting nested and repeated data sources, it is heavily recommended that a custom SQL query is used to extract the data needed for analysis. In the above screenshot, the “SQL Statement” field has been populated with a query that UNNESTs the ANZSRC FOR 4-digit classifications codes and creates a flat set of results.