How to Query from Google BigQuery and Azure SQL Database at the same time in Python

Google BigQuery and Microsoft Azure are both powerful platforms for storing data. Google BigQuery can process a couple of terabytes of data within a couple of minutes, and you pay for what you query, store, and process; the detailed pricing is here. Azure SQL Database provides fast and convenient storage for the first 32 GB/month at ~$5/month; the detailed pricing is here. Normally, business intelligence analysts, database managers, and data scientists access the two platforms through their separate consoles. As a result, the data stays scattered and has to be read into a central programming IDE such as RStudio or PyCharm. This post introduces a quick-and-dirty way to query data from both an Azure database and Google BigQuery in Python, using packages developed for this purpose.

Connect to Azure SQL Database using the pyodbc package

To query data from an Azure SQL database, we will use the pyodbc package to connect to the database with the necessary engine credentials. Below, I take the example of querying some weather data from an established Azure database. The main function we will use is pyodbc.connect().

import os
import pyodbc
import pandas as pd

# fill in your own server and credentials
server = 'server name'
database = 'database name'
username = 'your user name'
password = 'your password'
driver = '{ODBC Driver 13 for SQL Server}'

# open a connection to the Azure SQL database (default port 1433)
cnxn = pyodbc.connect('DRIVER=' + driver + ';SERVER=' + server + ';PORT=1433;DATABASE=' + database + ';UID=' + username + ';PWD=' + password)
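
Before running the real query, it is worth confirming that the connection works. A quick sanity check, using the cnxn object created above:

# sanity check: run a trivial statement and print the result
cursor = cnxn.cursor()
cursor.execute('SELECT 1')
print(cursor.fetchone())  # should print (1, )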

After connecting to the database, suppose we want temperature and precipitation data for the time frame 2017-08-01 to 2018-07-31. We run the query below and pass it to the pandas function pd.read_sql(), which reads the result directly into a pandas DataFrame.

sql1 = '''
    SELECT
        [StationID] AS station_id,
        [ObservationDate] AS date,
        [TEMPERATURE_Average] AS temp_avg,
        [TEMPERATURE_Maximum] AS temp_max,
        [TEMPERATURE_Minimum] AS temp_min,
        [PRECIPITATION_Minimum] AS precip_min,
        [PRECIPITATION_Maximum] AS precip_max,
        [PRECIPITATION_Average] AS precip_avg,
        [TOTAL_DAILY_WATER_EQUIVALENT] AS tot_water
    FROM [Aggregated].[StationData_Daily]
    WHERE [ObservationDate] >= '2017-08-01' AND [ObservationDate] <= '2018-07-31'
        AND [StationID] IN (45956, 31921, 62138)
    '''
temp_precip1 = pd.read_sql(sql1, cnxn)
temp_precip1.head(10)
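
If you rerun this query with different dates or stations, you can let pyodbc substitute the values rather than hard-coding them into the SQL string. A minimal sketch of the same idea with parameter placeholders (pyodbc uses ? markers), shown here with a shortened column list:

# same pattern with ? placeholders; pd.read_sql passes the values through to pyodbc
sql2 = '''
    SELECT [StationID] AS station_id,
           [ObservationDate] AS date,
           [TEMPERATURE_Average] AS temp_avg
    FROM [Aggregated].[StationData_Daily]
    WHERE [ObservationDate] >= ? AND [ObservationDate] <= ?
    '''
temp_precip2 = pd.read_sql(sql2, cnxn, params=['2017-08-01', '2018-07-31'])
temp_precip2.head(10)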

Pulling data from Google BigQuery

Once you have some temperature and precipitation data from the Azure database, you may want to connect to Google BigQuery to download some data from there as well. To do that, install the Python packages pandas-gbq and google-cloud-bigquery (for example, pip install pandas-gbq google-cloud-bigquery). You will also need your project ID from Google BigQuery (BQ), since everything on BQ is project based, as well as a private key, which is a service account credential (a JSON file) that you can generate if you have sufficient access to the project. The code below assumes the JSON file is saved in the same directory as the script.

import pandas_gbq

sql = """
SELECT *
FROM `yourproject.yourdataset.yourtable`
"""

# make sure the service account JSON file is saved in the same directory as this script
df = pandas_gbq.read_gbq(query=sql, project_id='xxx', private_key='xxx.json', dialect='standard')
df.head(10)

Connecting to BQ is even easier, because pandas_gbq.read_gbq() combines running the query and converting the result into a pandas DataFrame in a single function call.
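
If you prefer the official client library, google-cloud-bigquery can do the same job directly. Here is a minimal sketch, assuming the same placeholder project ID and service account JSON file as above:

from google.cloud import bigquery

# build a client from the service account JSON file and run the same query
client = bigquery.Client.from_service_account_json('xxx.json', project='xxx')
df_alt = client.query(sql).to_dataframe()  # returns a pandas DataFrame
df_alt.head(10)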

Once you have both datasets, you can merge them and run further analysis on the combined data. There you have it: it is not as difficult as you might think, so there is nothing stopping you from putting the data from the two platforms together and enjoying the benefits of both at the same time.
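
As a final illustration, here is a minimal merge sketch. The join keys station_id and date are assumptions about what your BigQuery table contains, so adjust them to your own schema:

# join the Azure weather data with the BigQuery data on hypothetical keys;
# station_id and date must exist in both DataFrames for this to work
combined = temp_precip1.merge(df, on=['station_id', 'date'], how='inner')
combined.head(10)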