Google Analytics backup – using Urchin
Keeping a local copy of your Google Analytics data can be very useful for your organisation: for example, for third-party data audits, reprocessing data, troubleshooting, and viewing data older than 25 months (Google’s current data retention commitment). Having a local copy of the collected data allows you to achieve all of these, and protects you from the accidental or malicious deletion of your GA account and profiles.
If you are looking for an outsourced backup service for Google Analytics, contact GA-Experts.com, who can do this all for you…
[This post updated: 04-Aug-2009]
Benefits of keeping a local copy of Google Analytics visitor data
What you can do with your local copy of your data:
- Greater control over your data e.g. for third-party audits
- Troubleshoot Google Analytics implementation issues
- Process historical data as far back as you wish – using Urchin
- Re-process data when you wish – using Urchin
Use the following 2-step guide to keep a local copy: 1) modify your Google Analytics Tracking Code (the GATC); 2) add a small transparent GIF file to your web root.
1. Modifying your GATC
If you are using urchin.js in your Google Analytics tracking, then you should upgrade! However, if that is not possible just yet, add the following line immediately before
urchinTracker(); as follows:
For urchin.js users:
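A minimal sketch of the urchin.js change, assuming the standard urchin.js `_userv` setting, where a value of 2 sends each hit to your own web server as well as to Google. The stand-in `urchinTracker` and the `urchinHits` array are hypothetical; they exist only so the snippet can run outside a page that has loaded urchin.js:

```javascript
// Hypothetical stand-in: on a real page urchin.js defines urchinTracker().
// This stub only records the mode in use so the snippet runs on its own.
var urchinHits = [];
if (typeof urchinTracker !== 'function') {
  var urchinTracker = function () { urchinHits.push(_userv); };
}

// Assumed urchin.js setting: 2 = request __utm.gif from your own server
// as well as from Google (urchin.js normally sends to Google only).
var _userv = 2;

urchinTracker();
```

On a live page only the `_userv = 2;` line is new; it sits directly above your existing `urchinTracker();` call.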
For ga.js users, add the following line immediately before
pageTracker._trackPageview(); as follows:
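A minimal sketch of the ga.js change, assuming the ga.js tracker method `_setLocalRemoteServerMode()`. The account ID `UA-XXXXX-X` is a placeholder, and the fallback tracker object and the `gaCalls` array are hypothetical stand-ins so the snippet can run without ga.js loaded:

```javascript
// Hypothetical stand-in: on a real page ga.js provides _gat._getTracker().
// The stub records method calls so the snippet runs on its own.
var gaCalls = [];
var pageTracker = (typeof _gat !== 'undefined')
  ? _gat._getTracker('UA-XXXXX-X') // placeholder account ID
  : {
      _setLocalRemoteServerMode: function () { gaCalls.push('_setLocalRemoteServerMode'); },
      _trackPageview: function () { gaCalls.push('_trackPageview'); }
    };

// Send each hit to your own web server as well as to Google.
pageTracker._setLocalRemoteServerMode();
pageTracker._trackPageview();
```

On a live page only the `pageTracker._setLocalRemoteServerMode();` line is new.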
For ga.js async users, add the following line immediately before
_gaq.push(['_trackPageview']); as follows:
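A minimal sketch of the async change, assuming the async command name `_setLocalRemoteServerMode`. The account ID `UA-XXXXX-X` is a placeholder for your own:

```javascript
// Standard async command queue; ga.js replays these commands in order once loaded.
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXX-X']); // placeholder account ID
_gaq.push(['_setLocalRemoteServerMode']); // also request __utm.gif from your own server
_gaq.push(['_trackPageview']);
```

Order matters here: the mode has to be queued before `_trackPageview`, otherwise the first pageview is sent to Google only.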
2. Add __utm.gif to your web server root
As a consequence of step 1, a file named
__utm.gif is requested from your web server each time a page loads. This is simply a 1×1 pixel transparent image that Urchin uses for processing. In fact, these requests are the only lines in your logfile that Urchin uses (apart from error pages and robot activity). You can create the file yourself or use this one (right-click and save the file): __utm.gif. Upload it to your document root, i.e. where your home page resides.
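If you prefer to create the file yourself, one possible approach under Node.js is sketched below. The base64 string is a commonly used 1×1 transparent GIF and is an assumption, so it may not match the downloadable file byte for byte; `buildUtmGif` is a hypothetical helper name:

```javascript
// Base64 of a commonly used 1x1 transparent GIF89a image (an assumption;
// the file offered in the original post may differ byte for byte).
var TRANSPARENT_GIF_BASE64 =
  'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7';

// Hypothetical helper: returns the image bytes as a Node.js Buffer.
function buildUtmGif() {
  return Buffer.from(TRANSPARENT_GIF_BASE64, 'base64');
}

var gif = buildUtmGif();
console.log(gif.slice(0, 6).toString('ascii')); // prints the GIF file signature
```

To save it, something like `require('fs').writeFileSync('__utm.gif', buildUtmGif())` run from your document root should be enough.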
That’s it! With these two simple steps, Google Analytics visitor data is simultaneously streamed to your web server logfiles in addition to being sent to Google Analytics for processing. This is simple to achieve as all web servers log their activity by default, usually in plain text format.
Once implemented, open your logfiles to verify the presence of additional
__utm.gif entries that correspond to the visit data as ‘seen’ by Google Analytics.
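A quick way to run this check is to count the logfile lines that request the image. The sketch below is illustrative only: `countUtmGifHits` is a hypothetical helper, and the sample log lines are made up:

```javascript
// Hypothetical helper: counts log lines that request /__utm.gif.
function countUtmGifHits(logText) {
  return logText
    .split(/\r?\n/)
    .filter(function (line) { return line.indexOf('/__utm.gif') !== -1; })
    .length;
}

// Made-up sample log, for illustration only.
var sampleLog = [
  '192.0.2.1 - - [01/Oct/2007:03:34:02 +0100] "GET /pageX.htm HTTP/1.1" 200 5120',
  '192.0.2.1 - - [01/Oct/2007:03:34:02 +0100] "GET /__utm.gif?utmwv=1&utmn=2108116629 HTTP/1.1" 200 35'
].join('\n');

console.log(countUtmGifHits(sampleLog));
```

For a real check, pass in the contents of your own access log, for example via `require('fs').readFileSync(pathToLog, 'utf8')`, where `pathToLog` depends on your server setup.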
A typical Apache logfile line entry (line wrapped here) looks like this:
188.8.131.52 www.mysite.com - [01/Oct/2007:03:34:02 +0100] "GET /__utm.gif?utmwv=1&utmt=var&utmn= 2108116629 HTTP/1.1" 200 35 "http://www.mysite.com/pageX.htm" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" "__utma=1.117971038.1175394730.1175394730.1175394730.1; __utmb=1; __utmc=1; __utmz=1.1175394730.1.1.utmcid=23|utmgclid=CP-Bssq-oIsCFQMrlAodeUThgA| utmccn=(not+set)|utmcmd=(not+set)|utmctr=looking+for+site; __utmv=1.Section One"
[GA-Experts.co.uk has a best practice guide on configuring an Apache logfile format]
For Microsoft IIS, the format (line wrapped) can be as follows:
2007-10-01 01:56:56 184.108.40.206 - - GET /__utm.gif utmn=1395285084&utmsr=1280x1024&utmsa=1280x960 &utmsc=32-bit&utmbs=1280x809&utmul=en-us&utmje=1&utmce=1&utmtz=-0500&utmjv=1.3&utmcn=1&utmr =http://www.yoursite.com/s/s.dll?spage=search%2Fresultshome1.htm&startdate=01%2F01%2F2010& man=1&num=10&SearchType=web&string=looking+for+mysite.com&imageField.x=12&imageField.y=6&utmp =/ 200 878 853 93 - - Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.0.3705; +Media+Center+PC+3.1;+.NET+CLR+1.1.4322) - http://www.yoursite.com/
In both examples, the extra information added by the GATC is the set of utmX name-value pairs attached to the __utm.gif request. This is known as HYBRID data collection, the benefits of which are discussed in the post Software v Page Tags v HYBRIDS.
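To see these pairs for yourself, the sketch below pulls them out of a single logfile line. It assumes the query string follows /__utm.gif after either a "?" (as in the Apache example) or a space (as in the IIS example); `extractUtmParams` is a hypothetical helper, not part of Urchin or Google Analytics:

```javascript
// Hypothetical helper: pulls the utmX name/value pairs out of one log line.
// Handles "/__utm.gif?query" (Apache style) and "/__utm.gif query" (IIS style).
function extractUtmParams(logLine) {
  var match = /\/__utm\.gif[?\s]([^\s"]+)/.exec(logLine);
  var result = {};
  if (!match) return result; // not a __utm.gif request
  new URLSearchParams(match[1]).forEach(function (value, name) {
    // Keep only the fields added by the GATC.
    if (name.indexOf('utm') === 0) result[name] = value;
  });
  return result;
}

var apacheLine = '"GET /__utm.gif?utmwv=1&utmt=var&utmn=2108116629 HTTP/1.1" 200 35';
console.log(extractUtmParams(apacheLine));
```

Note that `URLSearchParams` also decodes percent-encoding and turns "+" into a space, which is convenient for reading values such as search keywords.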
The benefits explained in detail
1. Greater control over your data
Some organisations simply feel more comfortable having their data sitting physically within their premises and are prepared to invest in the IT resources to do so. Of course, you cannot simply run this data through an alternative web analytics vendor, as the GATC page tag information will be meaningless to anyone else. However, it does give you the option of passing your data to a third-party auditing service such as ABCE. Audit companies are used to verify web site visitor numbers, which is useful for content publishing sites that sell advertising and therefore wish to validate their rate cards.