Analytics Ninja

The Importance of Clean and Meaningful Google Analytics data

February 10, 2013 by Analytics Ninja

Ever since returning from Superweek in beautiful Galyateto, Hungary, I’ve been thinking a lot about data and the utility of Google Analytics as a tool. Yes, I know, I spend a lot of time thinking about those things, but the conference was particularly inspiring in that regard. Google Analytics is no different from any other digital analytics tool in that it is critical to understand what the reported values actually mean and how they get there in the first place. But that’s not enough. When we analyze data, we need it to be presented in a meaningful way. Data visualization is tremendously important in this regard, and I believe that one of the reasons Google Analytics has such great adoption and market penetration (besides the enticing $0.00 entry price point) is that the UI is crisp, FAST, and easy to use.

One catalyst for this post is another post, entitled “Are You Being Misled by Google Analytics?” While I am about to critique it, I do want to point out that one of the ideas from Tien Nguyen (whom Chris credits in his article as the source) is indeed insightful: namely, that without configuration, Google Analytics may not provide as much visibility into traffic sources as one needs. While I urge you to take a look at the article, I’ll briefly summarize the main idea here.

Currently, when traffic is not tagged with campaign tracking parameters, Google Analytics by default sets its campaign cookie according to the document referrer. In other words, GA looks at the source of traffic (which website the user was on before they clicked through to your site) and then uses a set of rules to classify it. If the source is one of GA’s predefined search engines, the traffic will be listed as organic. If it is not a predefined search engine, the traffic will be listed as a referral. When a user arrives on your site via a URL with campaign tracking parameters appended, the source of traffic will be reported according to those parameters.
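For the geeks, here is a rough Python sketch of those classification rules. It is a simplified model and not GA's actual code: the function name `classify_visit`, the three-entry search engine table, and the example URLs are all my own assumptions, and the "(direct)" branch for an empty referrer is added only for completeness.

```python
from urllib.parse import urlparse, parse_qs

# Illustrative subset only (an assumption); GA's real list of predefined
# search engines is much longer and also inspects the query parameter.
SEARCH_ENGINES = {
    "google": "google.",
    "bing": "bing.",
    "yahoo": "search.yahoo.",
}


def classify_visit(landing_url, referrer):
    """Guess source/medium for the first hit of a visit (simplified model)."""
    params = parse_qs(urlparse(landing_url).query)

    # 1. Campaign tracking parameters win whenever they are present.
    if "utm_source" in params:
        return {
            "source": params["utm_source"][0],
            "medium": params.get("utm_medium", ["(not set)"])[0],
        }

    # 2. No referrer at all: treat the visit as direct.
    if not referrer:
        return {"source": "(direct)", "medium": "(none)"}

    host = urlparse(referrer).hostname or ""

    # 3. Referrer host matches a predefined search engine: organic.
    for name, fragment in SEARCH_ENGINES.items():
        if fragment in host:
            return {"source": name, "medium": "organic"}

    # 4. Anything else is a plain referral (raw hostname used as the source).
    return {"source": host, "medium": "referral"}
```

Calling `classify_visit("http://www.example.com/", "http://www.bing.com/search?q=shoes")` falls through to rule 3, while the same referrer with `?utm_source=adcenter&utm_medium=cpc` on the landing URL is caught by rule 1 and never reaches the referrer checks.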

I will start with the case from one of my own clients, and then move on to Chris’s post. One of my clients is a retailer who advertised heavily on Google, but decided that they should also try advertising on Bing Ads. As most people know, Microsoft and Yahoo created what they call the “Search Alliance” through which Microsoft adCenter (now Bing Ads) powers the ad displays on Microsoft, Yahoo, and all of their partner websites.

[Screenshot: regular campaign tagging]

My client was interested in knowing which traffic was coming from Yahoo and which traffic was coming from Bing. It was easy enough to get this data; all we needed to do was create a filter:

[Screenshot: referral from yahoo]

The impact of making this change gives us a Traffic Sources report that looks like this:

[Screenshot: partner networks]

The thing is, who cares? Seriously, I did my very best to explain to my client that this simply doesn’t matter. I was told that the “type of person” who searches on Yahoo may be very different from the type of person who searches on Bing. Ok… Let’s accept that assumption for now. So why does it not matter? Well, there is no way to change ad targeting in Bing Ads based upon ad network. In other words, there is no way to optimize the traffic one is getting based upon this information. As a result, instead of providing insightful data, we are creating fragmented data.

The blog post that I referenced above goes to great lengths to explain what a CSE Co-brand is and why there is an “issue” with normal link tracking vs. their solution for “advanced” link tracking. Their suggestion for understanding the true source of Pricegrabber traffic:

[Screenshot: cobrand analysis]

AHHHH!!!  So many rows.

Reporting [utm_source], [real referrer] in GA.  Geeky, yes. Cool, yes (especially for us geeks).

But WHY? Optimization as it relates to Pricegrabber or any other comparison shopping engine is, for the most part, a matter of product suppression. That is to say, if certain products are getting lots of clicks but are not converting, then lower their bids or remove them completely from your feed. Knowing the true referral source for this Pricegrabber traffic is not actionable (an important word that we web analysts like to throw around a lot).

Much more useful advice would have been to suggest concatenating the product category to utm_campaign and the product SKU to utm_content.  Usable data… novel…
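As a minimal sketch of that tagging idea in Python: the helper name `tag_feed_url`, the default source, medium, and campaign values, and the example product URL are all hypothetical, so adapt them to whatever your feed tool actually produces.

```python
from urllib.parse import urlencode, urlparse, urlunparse, parse_qsl


def tag_feed_url(url, category, sku,
                 source="pricegrabber", medium="cse", campaign="feed"):
    """Append campaign parameters to a product URL for a shopping feed.

    The product category is concatenated onto utm_campaign and the SKU
    goes into utm_content, so both show up as dimensions in GA reports.
    """
    parts = urlparse(url)
    # Keep any query parameters the product URL already has.
    query = parse_qsl(parts.query)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", f"{campaign}-{category}"),
        ("utm_content", sku),
    ]
    return urlunparse(parts._replace(query=urlencode(query)))


tagged = tag_feed_url("http://www.example.com/widget?id=7",
                      category="kitchen", sku="SKU123")
```

With the category in utm_campaign and the SKU in utm_content, the question “which products get clicks but no sales?” can be answered straight from the campaign reports, which is exactly the data needed for product suppression.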

 

Using New Profiles

The real kicker for me, however, was that the author left out a critical piece of advice: namely, that anybody who is interested in using this “clever” filter should ONLY do so in a new profile. Profiles cannot be cleaned up retroactively. Once poor data is in there, it is there to stay. Since many readers out there may not know about the best practice of using test profiles, unfiltered profiles, new profiles for these sorts of changes, etc., I feel that it is borderline negligent to leave out that critical advice.

 

Where are my ecommerce sales?

[Screenshot: ecommerce data]

Another important “detail” worth mentioning is that applying a filter like this will yield some very “wonky” data. In the image above, we see that traffic specifically from Bing and Yahoo isn’t registering any sales. Why would that be? Let’s start by applying a Visits with Transactions advanced segment.

[Screenshot: visits with transactions]

Where did these transactions go??? Ah, here they are, under the adcenter-tagged traffic.

 

[Screenshot: hidden transactions]

But wait another darn minute!!  Something doesn’t look right here.

I’ll give you a moment to figure it out…

 

Answer:  Filters function on the “hit” level.

Filters allow for the processing of data that is pumped into Google Analytics before it goes into profiles. Data gets processed in Google Analytics on the “hit” level. A “hit” is any request for the __utm.gif file. The parameters appended to the file location contain the data that GA uses to build all reports.
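To make a “hit” concrete, here is a small Python sketch that pulls the parameters out of a __utm.gif request. The two request URLs are made up for illustration and carry only a few of the parameters a real hit sends (utmhn for the hostname, utmp for the page, utmr for the referrer); the helper name `hit_params` is my own.

```python
from urllib.parse import urlparse, parse_qs


def hit_params(request_url):
    """Return the query parameters of a __utm.gif request as a flat dict."""
    qs = parse_qs(urlparse(request_url).query, keep_blank_values=True)
    # Each parameter appears once per hit, so keep only the first value.
    return {name: values[0] for name, values in qs.items()}


# Hypothetical first hit of a visit: utmr carries the full external referrer.
first_hit = (
    "http://www.google-analytics.com/__utm.gif"
    "?utmhn=www.example.com&utmp=%2F"
    "&utmr=http%3A%2F%2Fwww.google.com%2Fsearch%3Fq%3Dexample"
)

# Hypothetical later hit on the same site: utmr is just "0".
second_hit = (
    "http://www.google-analytics.com/__utm.gif"
    "?utmhn=www.example.com&utmp=%2Fabout&utmr=0"
)

for hit in (first_hit, second_hit):
    params = hit_params(hit)
    print(params["utmp"], "->", params["utmr"])
```

Everything a filter can see is in that dictionary. If a value is not in the hit, the filter has nothing to match against.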

In the example below, I did a search for [analytics ninja] in Google. As you can see, I was signed in, which is why the keyword reported is (not provided).

 

[Screenshot: landing page]

When I visit another page on the same site, you will notice that the utmr parameter gets set to zero.

 

[Screenshot: 2nd page viewed]

The very first hit of the session determines much of the “visit information” about the session. This is why there are visits with all of the different PriceGrabber information neatly shoved into the traffic sources reports. The Co-Brand is visible to the “visit” because that data from utmr existed during the first hit of the session. However, on all subsequent hits the utmr parameter did not contain the information stipulated in the filter. That “Co-Brand” data (i.e. document.referrer) is no longer available to GA to process. The e-commerce hits are therefore not tied back to the way the visit is displayed in Google Analytics, and they come up as ghost sessions.
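Here is a toy Python model of that behavior. It is only a sketch under my own assumptions, not GA's processing pipeline: the hit dictionaries, the `apply_filter` function, and the Yahoo pattern are invented for illustration, and I am assuming the later hits (including the transaction) carry a utmr of "0".

```python
import re

# Hypothetical filter rule: match Yahoo search in the referrer field.
YAHOO_REFERRER = re.compile(r"search\.yahoo\.com")


def apply_filter(hit):
    """Hit-level filter: rewrite the campaign source when utmr matches."""
    out = dict(hit)
    if YAHOO_REFERRER.search(hit.get("utmr", "")):
        out["source"] = "yahoo"
    return out


# One visit: an ad click from Yahoo, a second pageview, then a purchase.
session = [
    {"type": "pageview", "utmr": "http://search.yahoo.com/search?p=shoes",
     "source": "adcenter"},
    {"type": "pageview", "utmr": "0", "source": "adcenter"},
    {"type": "transaction", "utmr": "0", "source": "adcenter"},
]

# The filter runs on each hit independently; it has no memory of the visit.
filtered = [apply_filter(hit) for hit in session]
sources = [(hit["type"], hit["source"]) for hit in filtered]
print(sources)
```

Only the first hit is rewritten to "yahoo". The transaction hit keeps the original "adcenter" source because its utmr has nothing for the filter to match, which is how the sales end up separated from the visit that produced them.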

 

Bottom line: Please be very careful, folks, with all of the snazzy Google Analytics advice out there. You might wind up with a bunch of fragmented data that is totally wonky, not actionable, and probably a permanent stain on your GA data since you didn’t create a new profile.

Comments

  1. Tien V Nguyen says

    February 11, 2013 at 3:23 am

    Love this detailed post! I replied on our blog to the points you made, I’ll post them here for your readers too:
    —
    Here are a few points I’d like to address:

    - “Knowing the true referral source for this Pricegrabber traffic is not actionable”

    In some cases this may be true, but there are instances where knowing the “true” source can be extremely important. For instance, if the traffic we’re getting is from an international domain (e.g. South America, England, India, etc.), then we know that the traffic is completely useless to the merchant since they don’t ship there.

    Or, if we observe that 40% of traffic is from a source that doesn’t convert, or is from a “questionable” site, the CSE that sends that traffic can shut off those sources. So we’ll tell the CSE, “we noticed that we’re getting traffic from a cobrand based in South Africa, can you shut it off?” and the merchant will no longer be paying for clicks that are completely useless to them.

    You’re 100% right that product suppression (as well as brand or category suppression) is a major factor in optimization, but the actual source, if it’s of very low quality, can be just as important.

    “Namely, that anybody who is interested in using this “clever” filter should ONLY do so in a new profile.”

    Another good point. I think the screenshots taken were a bit dated and we had one minor update that we do in-house, and that’s instead of using the “output to -> constructor” set to Campaign Source, we’ll use “User Defined” so that it doesn’t interfere with any data moving forward.

    You’re right in that if we did set it to campaign source, that would really screw up how the old and new data work with each other, but if the output goes to a field in analytics that is not being populated by any data, using the current profile shouldn’t mess anything up.

    • Yehoshua Coren says

      February 11, 2013 at 9:19 am

      Thank you very much for taking the time to comment on this post. There are a number of important things that you bring up, and I’ll go ahead and make an update to my post noting them as well.

      Re: Being able to tell CSEs to turn off certain co-brands.

      This indeed is actionable! I was unaware that CSEs honored such requests (it certainly isn’t in their user interfaces). The point you raise changes everything. Now we can certainly make decisions based upon properly configured GA data.

      Re: User Defined Value

      This is key. I chose not to mention this in my post, because I was unaware that CSEs will manually turn off co-brands upon request. So pushing data to user defined didn’t make much of a difference. BUT, now that knowing the true traffic source can impact our profitability, using user defined is very important.

      As you can see from my blog post, the referral source is filtered on the hit level; the first hit is the time when utmr sends the value we need. As such, sending referral values to user defined for the first hit will indeed get matched to the visit, and we won’t get those ghost sessions that I mentioned in my post. In other words, it won’t “interfere” with data as you mentioned. (Please feel free to confirm this based upon your own data).

      You referred to passing values to user defined as a “minor update.” I respectfully disagree with you and would like to suggest that this is a “major update.” 🙂 Without doing this, we wouldn’t have the conversion data necessary to tell the CSEs to turn off one of the co-brands.

  2. Rob Kingston says

    February 18, 2013 at 1:04 pm

    Great post, Yehoshua (I only just managed time to sit down and read it all).

    I find most of the filter hacks people share can be teased out with a handful of broad filters and then analysed down the track using advanced segments or the API to drill into the data. The only filters I regularly use are:

    – Full referrer filter
    – Internal address filter
    – Include domain name filter (or exclude development environment filter(s))
    – Excluding hits on a cross domain proxy. e.g. http://www.citricle.com/blog/how-to-integrate-netsuite-with-google-analytics/

    Sure there’s some funky stuff you can do with filters, but for the most part, they don’t offer a lot of value.


Copyright (c) 2010 - 2025 Analytics Ninja LLC