Why you can’t trust your website analytics: Part One

Written by Edward Kay on November 29, 2018

How much do you trust your website stats, e.g. Google Analytics?

The answer should be ‘not that much’.

Data from three separate UK membership organizations reveals a steady decline in the proportion of visitors who are captured by Google Analytics.

Initial data discrepancies

Year Out Group were concerned that their website visitor numbers in Google Analytics had been dropping year-on-year since 2013. This was despite strong search engine rankings and a new website launched in March 2018.

The new Year Out Group website is hosted with WP Engine, which provides detailed analytics on site traffic based on its server log files. These log file stats are processed to separate non-human requests (e.g. search engine bots) from real visitors, just as Google Analytics tries to do.
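I don't know exactly how WP Engine filters its logs, so the following is only a minimal sketch of the general idea. It assumes Apache/Nginx "combined" format log lines and a crude user-agent heuristic. The regex, the `BOT_MARKERS` list and all function names are hypothetical, and counting distinct IP/user-agent pairs is a very rough stand-in for however WP Engine actually defines a visit.

```python
import re

# Assumed format: Apache/Nginx "combined" log lines, i.e.
# host ident user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, far-from-complete list of substrings found in crawler user agents.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_bot(user_agent: str) -> bool:
    """Crude heuristic: treat a request as non-human if its user agent contains a bot marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def human_requests(lines):
    """Yield parsed entries for lines that look like real visitors; skip bots and unparseable lines."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and not is_bot(match.group("agent")):
            yield match.groupdict()


def count_unique_visitors(lines) -> int:
    """Very rough visitor proxy: distinct (IP address, user agent) pairs among human requests."""
    return len({(entry["host"], entry["agent"]) for entry in human_requests(lines)})
```

A request from Googlebot is dropped because its user agent contains "bot", while two page views from the same desktop browser and IP address count as one visitor. Real log processing uses far richer signals than this.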

I decided to compare the traffic reported by WP Engine’s log files with that in Google Analytics.

Web traffic analysis is complex. Each available data set will include some form of filtering or processing. I fully expected some variation between the two data sets in absolute terms, but thought the general trends would be broadly consistent.

I was wrong.

And this is where it gets interesting.

The WP Engine logs show a strong upward trend in visit numbers, while Google Analytics shows a continued gradual decline:


Year Out Group stats. Google Analytics shows a decline while WP Engine logs show a growing audience.

Some variation in the actual numbers is one thing.

But when the two sets of data are reporting fundamentally different trends, there are huge implications.

Further proof Google Analytics is inaccurate

The next step was to see if other clients with the same WP Engine logs available showed similar characteristics.

And they did.

I ran the same analysis for the Scottish Association of Landlords and the Professional Speaking Association. For these two sites, the hosting logs go back a little further, providing a larger data set.

All three data sets show large differences in visitor numbers: Google Analytics captures only 20% to 80% of the visits recorded in the log analyses.

More crucially, the proportion of log file visits reported in Google Analytics is decreasing for all three sites. The sites with the highest percentage of log file visits in Google Analytics show the sharpest declines:


Proportion of log file visits recorded as Google Analytics sessions.

Note the downwards trend across three separate websites.

(The data points with ratios over 100% are due to Cloudflare caching; see notes for details.)

Over the same time period, the log file data for all three sites show strong growth in visits, while the corresponding Google Analytics data show visits as either static or in decline.
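The proportion in the chart is a simple ratio: Google Analytics sessions divided by log file visits for the same period. A minimal sketch of that calculation, plus an ordinary least-squares slope to put a number on whether the ratio is trending down, might look like this. The function names and the figures in the usage note are hypothetical; they are not the data behind the chart.

```python
from statistics import mean


def capture_ratios(ga_sessions, log_visits):
    """Proportion of log file visits that Google Analytics recorded as sessions, per period."""
    if len(ga_sessions) != len(log_visits):
        raise ValueError("need one GA figure per log file figure")
    return [ga / logs for ga, logs in zip(ga_sessions, log_visits)]


def trend_slope(values):
    """Least-squares slope of values against the period index 0, 1, 2, ...
    A negative slope means the ratio is falling over time."""
    if len(values) < 2:
        raise ValueError("need at least two periods to estimate a trend")
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den
```

With made-up figures of 800, 750 and 700 GA sessions against 1,000, 1,250 and 1,400 log file visits, the ratios are 0.8, 0.6 and 0.5 and the slope is -0.15 per period. A ratio above 1.0 corresponds to the Cloudflare caching effect mentioned above: presumably, pages served from the cache never reach the origin server's logs, but the visitor's browser still reports them to Google Analytics.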

Possible causes

Incorrect data

We have to trust that the data available from Google and WP Engine are accurate. There will always be an element of filtering and processing applied to these data.

I am working on the assumption that all such processing is applied consistently.

Ad blockers, privacy settings and network filters

Google Analytics relies on the user’s browser to send data to it:

How Google Analytics works under normal circumstances.

Clearly, if this information is not sent, the visit will not be tracked in Google Analytics.

Many ad blockers – whose purpose is to hide adverts on the websites you visit – also block tracking and analytics services, including Google Analytics.

Firefox even includes a tracking protection option that works without any extensions. When enabled, it explicitly stops data being sent to a huge list of services, including Google Analytics.

Network administrators can also filter out traffic. Net neutrality laws (in the UK and EU at least) prevent telecoms companies from blocking adverts. These laws thwarted the attempts by mobile operator Three to block adverts at the network level. But there is nothing to stop corporate network managers filtering out such traffic before it reaches the public internet.

Visit data in server log files are not affected by the use of an ad blocker:


How ad blockers stop Google Analytics collecting data. Note how the log file data are not affected.

(Fun fact: the URL of this image initially included google-analytics, but was then blocked by my ad blocker!)
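Conceptually, the blocker sits in the browser and checks each outgoing request before it is made. The page request itself still reaches the web server, and so its log file, but the request to Google Analytics never leaves the browser. The sketch below is a heavily simplified, hypothetical blocker: real ad blockers use large, regularly updated filter lists with their own rule syntax, and the rule lists and function names here are assumptions for illustration only.

```python
from urllib.parse import urlparse

# Hypothetical, heavily simplified rules. Real blockers rely on large,
# regularly updated filter lists written in their own rule syntax.
BLOCKED_DOMAINS = ("google-analytics.com",)
BLOCKED_PATH_SUBSTRINGS = ("google-analytics",)


def host_is_blocked(host: str) -> bool:
    """Match a blocked domain exactly, or any of its subdomains."""
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def would_block(url: str) -> bool:
    """Return True if this toy blocker would stop the browser requesting url."""
    parts = urlparse(url)
    if host_is_blocked(parts.hostname or ""):
        return True
    # Substring rules are blunt: they also catch harmless URLs that merely
    # mention the name, such as an image file about Google Analytics.
    return any(s in parts.path for s in BLOCKED_PATH_SUBSTRINGS)
```

A page such as `https://example.org/about/` is allowed through and lands in the server log, while a request for a Google Analytics script such as `https://www.google-analytics.com/analytics.js` is blocked, so that visit never reaches Google Analytics. A hypothetical image URL like `https://example.org/images/how-google-analytics-works.png` is blocked too, the same kind of false positive as in the fun fact above.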

The use of ad blocking technology is difficult to quantify. Various reports from 2017 (the latest available) suggest ad blockers are used by between 11% and 58% of users.
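To see how rising use of blocking could produce diverging trends like those above, assume (purely for illustration) that ad blocking is the only reason visits go missing. Google Analytics would then record each period's log file visits multiplied by the share of visitors who are not blocking it. At the 11% and 58% extremes quoted above, that would mean Google Analytics recording roughly 89% and 42% of visits. The function name and all figures below are hypothetical.

```python
def projected_ga_sessions(log_visits, blocking_rates):
    """Hypothetical model: GA sessions expected if the only loss were visitors
    blocking the tracker. blocking_rates are fractions between 0 and 1."""
    if len(log_visits) != len(blocking_rates):
        raise ValueError("need one blocking rate per period")
    projected = []
    for visits, rate in zip(log_visits, blocking_rates):
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"blocking rate out of range: {rate}")
        # Only visitors who are *not* blocking the tracker reach Google Analytics.
        projected.append(visits * (1.0 - rate))
    return projected


# Made-up figures: real visits grow by 50% while the share blocking the tracker rises.
log_visits = [1000, 1250, 1500]
blocking_rates = [0.20, 0.36, 0.50]
print([round(s) for s in projected_ga_sessions(log_visits, blocking_rates)])
```

This prints `[800, 800, 750]`: a 50% rise in real visits shows up as flat-to-declining traffic in Google Analytics.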

What does this mean for associations?

Don’t blindly assume that just because some numbers are on a fancy dashboard they represent the truth.

We need to understand how the data are collected and any factors that may affect their accuracy.

Only then can we determine the appropriate level of trust to place in these data and use them effectively in our decision making.

Watch out for part two of this report, where we'll dive into this in more detail.