Google Analytics has proven to be one of the most influential tools ever created for marketing analysis. Google has been relentless in its pursuit of innovation for Google Analytics, and that innovation shows in the number of other tools it has built for analysts. From Google Sheets to BigQuery to Google Data Studio, these complementary tools are a great aid for dealing with the wealth of data that can be mined from Google Analytics. One of the lesser-known yet game-changing tools available for use with Google Analytics data is the Google Analytics BigQuery Export.

This tool, which is only available to users of Google Analytics’ premium product, is in essence a raw data export of a website’s Google Analytics data. It frees any analyst with a decent knowledge of SQL from the shackles of Google’s canned reports and allows much more robust logic when building reports. For instance, if an analyst wanted to create a report for all users who viewed a particular page during their session and returned to the site within 6 days, they would be limited only by their knowledge of SQL and their ability to fork out the $150K Google charges for its premium product!
Google Analytics does not provide data at the user level out of the box. However, with the aid of a process outlined in Simo Ahava’s tremendously useful blog, you can use Google Tag Manager to pull Google’s user and session IDs out of the cookie (also known as the Client ID) and feed them back to the interface in a custom dimension or event. This gives an analyst the ability to report on user activity at the user ID level.
Remember: passing personally identifiable information to Google Analytics is a violation of the terms of service, so don’t pass any personally identifiable information, such as email addresses, to Google.
Below are the steps I use for passing pageview data along with user and session data to the GTM data layer for logging in Google Analytics. You could technically use a slightly different process to pass ecommerce, event, goal, and custom metric/dimension data as well; I’ll cover that in a later post. I’ll assume that the reader has already tagged all of their pages with a Google Tag Manager container, but if not, start by reading this post and make sure to tag your pages.
Steps:
- Create a Custom Dimension by going to the admin page in your Google Analytics view: under “Custom Definitions” select “Custom Dimensions”, create a dimension, and call it “Client ID” or whatever name you prefer. This dimension will have a scope of “Session”. Make note of the dimension index (you’ll need to enter it later).
- Create a Custom JavaScript Variable in Google Tag Manager and give it a title such as {{Set Client ID in Dimension 1}}.
Here is the code:

function() {
  // Modify customDimensionIndex to match the index number you want to send the data to
  var customDimensionIndex = 1;
  return function(model) {
    model.set('dimension' + customDimensionIndex, model.get('clientId'));
  };
}
Make sure to assign the correct index to the customDimensionIndex variable. If you’ve completed this step correctly, you will be able to see the Client ID being passed under whichever custom dimension you have set it up for in the Google Analytics Debugger tool.
Client ID being passed into dimension 1

- If everything shows up, move back to Google Tag Manager and edit the pageview tag for your site. Under “Fields to Set”, type “customTask” and under “Value” use the dropdown to select the variable we created earlier, {{Set Client ID in Dimension 1}}.
That concludes the first part of the process. Once you’ve reached this step, you could technically start playing with the user and session Client ID dimension in Google Analytics’ custom reports.

Pull Client ID Custom Dimension Data

So we’ve tagged our site to send user and session data to Google Analytics and have dealt with sampling; now for the fun part. This string pulls page URLs, user and session IDs by date based on the dimensions detailed above.
# get_ga() comes from the RGA package
library(RGA)

ga_data <- get_ga(<replace with your view id>,
                  start.date = "2017-01-01",
                  end.date = "2017-01-02",
                  dimensions = "ga:date, ga:pagePath, ga:dimension1",
                  metrics = "ga:pageviews",
                  fetch.by = "day")
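If the tagging from the earlier steps is working, the dimension1 column returned above should contain the Client ID, which normally looks like two numbers joined by a period (a random component and a first-visit timestamp). That period is what the splitting script further down keys on. A quick, purely illustrative check:

# peek at a few Client ID values to confirm the "number.number" format
head(ga_data$dimension1)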
Run some other scripts
Using some other scripts, an analyst can answer a number of other questions, such as how long it takes a new user to become a repeat user. These scripts rely heavily on data.table syntax instead of base R. Please take a look at my prior post on using data.table to learn why I do so.
ga_data <- get_ga(<replace with your view id>,
                  start.date = "2017-06-18",
                  end.date = "2017-06-24",
                  dimensions = "ga:date, ga:hour, ga:minute, ga:medium, ga:dimension1",
                  metrics = "ga:sessions",
                  fetch.by = "day")
Data returned

Then run the rest:
# split Client ID into user and session IDs
library(data.table)
a <- do.call(rbind, sapply(ga_data$dimension1, function(x) strsplit(as.character(x), "\\.")))
# name the split columns so they can be referenced below
colnames(a) <- c("user_id", "session_id")
ga_data <- cbind(ga_data, a)

# count sessions for each user in sequence
ga_data$userseq <- sequence(rle(as.character(ga_data$user_id))$lengths)
setDT(ga_data)

# since the dimension is a "hit"-based dimension, data will be returned for all hits.
# This limits the data to "session" rows only and sorts by user, session and date.
ga_data <- ga_data[sessions > 0][order(user_id, session_id, date, hour, minute)]

# grab the date of the first session for each user
ga_data[, mindate := min(date), by = "user_id"]

# calculate the number of days since the first user session for each subsequent session
ga_data[, datediff := as.numeric(difftime(date, mindate, units = "days"))]

# calculate and return the average
ga_data[, mean(datediff)]
This will return the number of days on average it takes a new user to become a repeat user.
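Depending on how you define that figure, you may want to exclude each user’s first-day rows (where datediff is 0) so the mean only reflects return visits. A one-line variation on the script above, offered as a sketch rather than the original calculation:

# average days since first session, counting only rows from later days
ga_data[datediff > 0, mean(datediff)]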
There are a number of other uses for the Client ID data in GA. For instance, a marketer might want to do some attribution modelling, or a content manager might want to know whether viewing an article in one session affects subsequent sessions.
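As a rough taste of the content question (not a proper attribution model), here is a sketch that assumes a data.table like the first pull above, with pagePath plus the split user_id column; the article path is purely hypothetical:

library(data.table)

# hypothetical article path - replace with the page you care about
article_path <- "/blog/my-article/"

# flag users who viewed the article at least once
viewers <- ga_data[pagePath == article_path, unique(user_id)]
ga_data[, viewed_article := user_id %in% viewers]

# crude comparison: average number of distinct active days per user, by group
ga_data[, .(active_days = uniqueN(date)), by = .(user_id, viewed_article)][
  , .(avg_active_days = mean(active_days)), by = viewed_article]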
One consideration around doing this type of analysis is scale. Most smaller websites won’t pose an issue, but some larger sites (like the one I currently work on) will. Pulling an individual non-aggregated row for every session, page or event can yield some extremely large datasets. In this case, it would make sense to send the data to a cloud storage data warehouse such as BigQuery. Want to learn more on using R to solve for this? Stay tuned…
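For what it’s worth, pushing a pull like the ones above into BigQuery from R only takes a few lines; here is a minimal sketch assuming the bigrquery package, with hypothetical project, dataset and table names (authentication via bq_auth() is assumed):

library(bigrquery)

# hypothetical destination - replace with your own project/dataset/table
tbl <- bq_table("my-gcp-project", "ga_exports", "client_id_sessions")

# upload the session-level data built earlier
bq_table_upload(tbl, values = as.data.frame(ga_data))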