I started my voyage into learning R by taking DataCamp’s online courses. After finishing courses on data manipulation in both base R and dplyr, I stumbled upon a course on the data.table library. I was taken aback a bit after learning that data.table uses a different syntax than base R. This was unnerving, as I didn’t know what I would gain from learning data manipulation in yet another syntax. My skepticism, however, changed to optimism once I began working on a rather large dataset a few weeks later. This dataset (a 43M-row set of email opens and clickthroughs) took something like 30-40 minutes to read into R using the base read.csv function. Instead, I tried the fread function in data.table. Lo and behold, what took 30-40 minutes using base R took about 5 minutes with fread. Here’s timing data for a 3M row text file:
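The original timing figures aren’t reproduced here, but the comparison is easy to run yourself with system.time(). A minimal sketch, using a small generated CSV as a stand-in for the real file (the file name and size are placeholders, not the author’s data):

```r
library(data.table)

# Write a sample CSV to benchmark against (stand-in for the real multi-million-row file)
sample_file <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:100000, opens = rbinom(100000, 1, 0.2)),
          sample_file, row.names = FALSE)

# Base R reader
base_time <- system.time(df_base <- read.csv(sample_file))

# data.table's fread
fread_time <- system.time(dt_fast <- fread(sample_file))

base_time["elapsed"]
fread_time["elapsed"]
```

On files this small the gap is negligible; the difference grows dramatically with file size, and fread also detects separators and column types automatically.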
The speed improvements are not limited to reading data. Manipulations are also faster using data.table. Here are two functions written to group and count rows using the world cities population dataset.
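The original function definitions aren’t shown here, so the following is a sketch of what dt_func and df_func likely look like, using a toy stand-in for the world cities population dataset (the column names are my assumptions):

```r
library(data.table)
library(dplyr)

# Toy stand-in for the world cities population dataset
cities <- data.frame(country = c("us", "us", "gb", "fr", "fr", "fr"),
                     population = c(8.4, 3.9, 8.9, 2.1, 0.9, 0.5))
cities_dt <- as.data.table(cities)

# dplyr: group by country and count rows
df_func <- function() {
  cities %>%
    group_by(country) %>%
    summarise(count = n())
}

# data.table: the same grouping and count in one expression
dt_func <- function() {
  cities_dt[, .N, by = country]
}
```

Wrapping each in system.time() (or the microbenchmark package) on the full dataset shows the data.table version finishing faster, and the .N idiom keeps it to a single line.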
Lastly, as you can see in the functions written above, the data.table function (dt_func) is less verbose than the dplyr function (df_func). One reason for this is that dplyr is designed to read clearly from one programmer to another; however, not every programmer needs to share code with other users. Nevertheless, once I learned the data.table syntax, I preferred it over the dplyr syntax. This seems to be the case for many programmers with a previous foundation in SQL.
While learning syntax can be a tough task, I have to admit that the extra work of learning data.table syntax is worth it.
If you haven’t had a chance to read my last post on using R with Google Analytics data, please take a look. Also, if you have any comments, questions, or simply want to call me crazy, drop a comment below.
In my opinion, Google Analytics is the single most influential development in marketing analytics ever. Quantcast estimates that 70% of its top 10,000 websites have GA installed. Google has shown a relentless drive to improve the product over the years, and its free price tag ensures access for almost anyone who runs a website. That said, Google Analytics is a service, and no service (great or lacking) is without flaws. One of GA’s hidden advantages is a robust API, which allows users to build some of the features that are missing from the standard interface. I wanted to cover some of the ways a user could use R to deal with features not available in GA.
In order to use any of these techniques, you will have to install R as well as the rga and dplyr packages. Other packages used include ggplot2 for visualization, scales, lubridate, and zoo. Use the script below to install.
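The original install script isn’t shown, so here is a minimal version. Note one assumption: the rga package (skardhamar/rga) is typically installed from GitHub rather than CRAN, so devtools is used for that step; the remaining packages come from CRAN.

```r
# CRAN packages used in this post
install.packages(c("dplyr", "ggplot2", "scales", "lubridate", "zoo"))

# rga is commonly installed from GitHub (assumed source: skardhamar/rga)
install.packages("devtools")
devtools::install_github("skardhamar/rga")
```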
Event Conversion Rate Script
#Authorization for RGA
##Enter your view ID here
#Enter you start and end date here
#RGA script pulls event parameters and event page
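The script body isn’t reproduced above, so the following is a sketch of the two pieces: the rga pull (shown commented out, since it needs OAuth and a real view ID — the ID, dates, and metric choices are placeholders) and the conversion-rate step, demonstrated on a toy stand-in for the pulled data. Defining event conversion rate as total events divided by pageviews per page is my assumption, not necessarily the author’s exact definition.

```r
library(dplyr)

# The rga pull itself requires OAuth, so it is shown but not run here:
# library(rga)
# rga.open(instance = "ga")                       # Authorization for RGA
# events <- ga$getData("ga:XXXXXXXX",             # placeholder view ID
#                      start.date = "2016-01-01", # placeholder dates
#                      end.date   = "2016-03-31",
#                      metrics    = "ga:totalEvents,ga:pageviews",
#                      dimensions = paste0("ga:eventCategory,ga:eventAction,",
#                                          "ga:eventLabel,ga:pagePath,ga:contentGroup1"))

# Toy stand-in for the pulled data, to show the conversion-rate step
events <- data.frame(
  pagePath    = c("/a", "/a", "/b"),
  totalEvents = c(20, 10, 5),
  pageviews   = c(100, 100, 50)
)

# Event conversion rate per page: events relative to pageviews
event_cr <- events %>%
  group_by(pagePath) %>%
  summarise(conv_rate = sum(totalEvents) / sum(pageviews))
```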
This gives all event parameters (Category, Action and Label) as well as the page URL and content group 1, allowing the user to easily aggregate pages if they are passing content groups. I strongly encourage using content groupings.
Analyze Acquisition Mediums with ggplot2
Google Analytics has some good embedded graphs for analyzing traffic mediums, and the advent of Google Data Studio gives users even more flexibility. However, sites with a high number of marketing mediums (10+) will pose issues for these tools. Using ggplot2 in R, a user can create what analysts call “small multiples”: a series of similar graphs or charts that share the same scale and axes, allowing them to be easily compared. Below is a script that returns small multiples for a year-over-year comparison of marketing mediums.
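The original script isn’t reproduced here, so this is a minimal sketch of the small-multiples idea using facet_wrap, with simulated data standing in for a GA sessions-by-medium pull (the column names and medium list are my assumptions):

```r
library(ggplot2)

# Simulated stand-in for a GA pull of monthly sessions by medium and year
set.seed(42)
traffic <- expand.grid(
  month  = 1:12,
  year   = c(2015, 2016),
  medium = c("organic", "cpc", "email", "referral", "social", "display")
)
traffic$sessions <- rpois(nrow(traffic), lambda = 500)

# Small multiples: one panel per medium, a line per year, shared scales and axes
p <- ggplot(traffic, aes(x = month, y = sessions,
                         colour = factor(year), group = year)) +
  geom_line() +
  facet_wrap(~ medium) +
  scale_x_continuous(breaks = seq(2, 12, 2)) +
  labs(colour = "Year", x = "Month", y = "Sessions")

# print(p) renders the grid of comparable panels
```

Because every panel shares the same axes, a dozen or more mediums can be scanned at a glance, which is exactly where the standard GA charts start to break down.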