Today is a very auspicious day.
In less than 24 hours, Google Analytics will stop processing data from tens of millions of websites. Sites that have been using Google Analytics for a very long time.
Getting data out of Google Analytics is hard. It’s tenuous. It’s frustrating. And there are a number of services out there that can help you.
Analytics Canvas has a Universal Analytics Backup Service at a really amazing $99 starting price point.
Piped Out also has a great Universal Analytics migration service at a similar £150 starting point.
At Analytics Ninja, we took a bit of a different approach for our clients. I spoke to Dominic at Piped Out and shared with him what I’m about to share with you. His comment was “how did you manage to butcher the API to get what you got?” I smiled. The answer is proprietary, though I know a handful of super smart people out there who can figure it out. It seems impossible, but it’s not impossible. We did it.
So yes, Analytics Ninja has a Universal Analytics Backup / Archival service that we offer too. This is something that we built for our clients, and we have decided to make it public because I think that there are larger companies out there, not on GA360, that would benefit. I haven’t finalized the pricing, but it will probably be between 50 and 100 times that of the two companies above. Because we aren’t competing in the same space. We have a different level of offering.
The thing about the Reporting API is that it is limited to a certain number of dimensions and metrics per query. And it’s grumpy. And you can’t get raw data out of it.
So my friends in the space built services that basically pull reports.
Our service takes the GA Reporting API, pulls out all of the data, and stitches it together in BigQuery to make a proper database of Universal Analytics Data.
Including a Session ID.
[UPDATE]
A friend recently pointed out to me that SuperMetrics has a “Data Warehouse” product for Universal Analytics. Apparently they charge quite a lot for it, but it also does not have a Session ID. Maybe I should brand our solution as the Analytics Ninja Super Data Warehouse Plus Plus Advanced Google Universal Analytics Warehouse for Google BigQuery.
#meh. I’m not great at branding….
In any case, I really love the schema we built. We based it generally on the format of the GA360 BigQuery export, making a sessions table that has one row per session. Then within each session, we’re able to use nested and repeated columns to capture the Pages, Events, and Products the user interacted with during that session.
I’m feeling AWE-STRUCT, for all those BigQuery nerds who appreciate that major dad joke.
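To make the one-row-per-session shape concrete, here is a minimal Python sketch of folding flat, hit-level rows into nested session records. It only illustrates the general idea of nested and repeated columns; the field names (`kind`, `page_path`, `sku`, and so on) and the `nest_sessions` helper are illustrative placeholders, not the actual schema or pipeline.

```python
def nest_sessions(flat_rows):
    """Group flat hit-level rows into one record per session.

    Each input row is a dict with a hypothetical "session_id", "date",
    and "kind" ("page", "event", or "product") plus kind-specific fields.
    The output mirrors a table with repeated STRUCT-style columns.
    """
    sessions = {}
    for row in flat_rows:
        sid = row["session_id"]
        # Create the session record the first time this session_id is seen.
        session = sessions.setdefault(sid, {
            "session_id": sid,
            "date": row["date"],
            "pages": [],
            "events": [],
            "products": [],
        })
        kind = row["kind"]
        if kind == "page":
            session["pages"].append({"page_path": row["page_path"]})
        elif kind == "event":
            session["events"].append(
                {"category": row["category"], "action": row["action"]}
            )
        elif kind == "product":
            session["products"].append(
                {"sku": row["sku"], "revenue": row["revenue"]}
            )
    return list(sessions.values())


# Made-up sample data: two sessions, three hits in total.
rows = [
    {"session_id": "a_20230101_1", "date": "20230101", "kind": "page",
     "page_path": "/home"},
    {"session_id": "a_20230101_1", "date": "20230101", "kind": "product",
     "sku": "SKU-1", "revenue": 19.99},
    {"session_id": "b_20230102_3", "date": "20230102", "kind": "event",
     "category": "video", "action": "play"},
]
nested = nest_sessions(rows)
```

Session-level reporting then only needs the top-level fields, while page, event, and product detail lives in its own repeated field.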
The session_id that we use for Universal Analytics is clientId + date + sessioncount. This is super important because the sessioncount actually RESETS after a 6-month period. So if you have any relatively long-lived cookies, you’ll end up with overlaps if you use just a clientId + sessioncount combination. Something that we learned the hard way.
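A small Python sketch of that key, assuming an underscore separator and a YYYYMMDD date string (both are assumptions for illustration; the post only says the three parts are combined). The `build_session_id` and `naive_key` helpers are hypothetical names.

```python
def build_session_id(client_id: str, date: str, session_count: int) -> str:
    """Combine clientId, date, and sessioncount into one session key.

    The "_" separator and the YYYYMMDD date format are assumptions.
    """
    return f"{client_id}_{date}_{session_count}"


def naive_key(client_id: str, session_count: int) -> str:
    """The overlapping variant: clientId + sessioncount only."""
    return f"{client_id}_{session_count}"


# Same long-lived cookie, sessioncount back at 1 after a reset.
client = "1234567890.1600000000"  # made-up clientId
before_reset = ("20230101", 1)
after_reset = ("20230901", 1)

# The naive key cannot tell the two sessions apart...
collides = naive_key(client, before_reset[1]) == naive_key(client, after_reset[1])

# ...while including the date keeps them distinct.
distinct = (
    build_session_id(client, *before_reset)
    != build_session_id(client, *after_reset)
)
```

Here `collides` and `distinct` both come out `True`, which is the overlap described above and the reason the date is part of the key.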
Please note that this schema does NOT assume that the clientId was stored as a custom dimension. If you have a clientId and timestamp as custom dimensions, we’ll build a proper EVENTS table for you, and model that into a sessions table. (Then a Customer Journey table and MultiTouch Attribution tables, but that’s for a different blog post which I’ll never find the time to write).
Because we’re using repeated columns, we can store all of the Pages, Events, and Product information in a very COST EFFECTIVE structure, since BigQuery on-demand querying charges by the bytes scanned in the columns a query actually references.
So, if you want to build standard Session Level reporting, you can do so with very inexpensive aggregations.
If you want to rebuild ecommerce reporting, all of the ecommerce data is available in a single column.
For those of you who are interested in Storage Costs, it’s about 2mb per session (based upon some averages we see with our clients).
Active logical storage is $0.02 per GiB per month. So that’s approximately two dollars ($2) per 100,000,000 sessions of storage (per month).
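If you want to run the numbers for your own property, the arithmetic is simple enough to sketch in Python. The only figure taken from above is the $0.02 per GiB per month rate; the `monthly_storage_cost` function name and the example inputs are hypothetical, and you should plug in your own average bytes per session.

```python
GIB = 1024 ** 3  # bytes in one GiB


def monthly_storage_cost(
    sessions: int,
    bytes_per_session: float,
    usd_per_gib: float = 0.02,  # active logical storage rate quoted above
) -> float:
    """Estimate the monthly storage cost in USD for a sessions table."""
    total_gib = sessions * bytes_per_session / GIB
    return total_gib * usd_per_gib


# Example call with made-up inputs; replace both with your own figures.
estimate = monthly_storage_cost(sessions=1_000_000, bytes_per_session=2_048)
print(f"Estimated monthly storage cost: ${estimate:.2f}")
```

The estimate scales linearly with both inputs, so the average row size matters just as much as the session count.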
If you’re interested in saving your data for reporting, I’d recommend you turn to my friends / colleagues at PipedOut or Analytics Canvas. Their services are inexpensive and solve an important problem, namely being able to store reports.
As I noted, our solution is something altogether different. It’s a proper database. It’s capable of segmentation.
It can be integrated truly seamlessly with GA4 data. The seamless GA3<>GA4 sessions integration is on our Phase II roadmap.
If you’re interested, please contact me [email protected].
Lastly, props to my brother-in-arms and dear friend Timur. You’re an amazing engineer. Thank you for building this together. It’s awesome.