Google Analytics backup – using Urchin

Keeping a local copy of your Google Analytics data can be very useful for your organisation: for example, for third-party data audits, reprocessing data, troubleshooting, and viewing data older than 25 months (Google’s current data retention commitment). Having a local copy of the collected data allows you to achieve all of these, and protects you from the accidental or malicious deletion of your GA account and profiles.

[This post updated: 04-Aug-2009]

Benefits of keeping a local copy of Google Analytics visitor data

What you can do with your local copy of your data:

Use the following two-step guide to keep a local copy: 1) modify your Google Analytics Tracking Code (the GATC); 2) add a small transparent gif file to your web root.

1. Modifying your GATC

If you are still using urchin.js for your Google Analytics tracking, then you should upgrade! However, if that is not possible just yet, urchin.js users should add the following line immediately before urchinTracker();

var _userv=2;

For ga.js users, add the following line immediately before pageTracker._trackPageview();

pageTracker._setLocalRemoteServerMode();


For ga.js async users, add the following line immediately before _gaq.push(['_trackPageview']);

_gaq.push(['_setLocalRemoteServerMode']);

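For context, here is a minimal sketch of where the extra call sits in the asynchronous snippet. The account ID UA-XXXXX-X is a placeholder, and the loader script that pulls in ga.js is left out; only the order of the queued commands is shown.

```javascript
// Async ga.js command queue. 'UA-XXXXX-X' is a placeholder account ID.
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXX-X']);
// Send each hit to your own web server (as a __utm.gif request) as well as to Google.
_gaq.push(['_setLocalRemoteServerMode']);
_gaq.push(['_trackPageview']);
```

The mode must be queued before the pageview so that the first hit is also written to your own logfiles.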

2. Add __utm.gif to your web server root

The consequence of step 1 is a request for a file named __utm.gif from your web server each time a page is loaded. This is simply a 1×1 pixel transparent image that Urchin uses for processing. In fact, these requests are the only lines in your logfile that Urchin uses (apart from error pages and robot activity). You can create the file yourself or use this __utm.gif file (right click, save file as). Upload it to your document root, i.e. where your home page resides.
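If you prefer to create the file yourself, the sketch below spells out the bytes of a commonly used minimal 1×1 transparent GIF. The variable name utmGifBytes is made up for illustration, and the resulting file will not necessarily match the size of the one offered above; any valid transparent 1×1 gif should serve the same purpose.

```javascript
// Bytes of a minimal 1x1 transparent GIF (GIF89a), 42 bytes in total.
var utmGifBytes = new Uint8Array([
  0x47, 0x49, 0x46, 0x38, 0x39, 0x61,             // "GIF89a" signature
  0x01, 0x00, 0x01, 0x00, 0x80, 0x00, 0x00,       // 1x1 canvas, 2-colour global palette
  0x00, 0x00, 0x00, 0xff, 0xff, 0xff,             // palette entries: black, white
  0x21, 0xf9, 0x04, 0x01, 0x00, 0x00, 0x00, 0x00, // graphic control: colour 0 is transparent
  0x2c, 0x00, 0x00, 0x00, 0x00,                   // image descriptor at position (0,0)
  0x01, 0x00, 0x01, 0x00, 0x00,                   // image is 1x1, no local palette
  0x02, 0x01, 0x44, 0x00,                         // compressed pixel data
  0x3b                                            // trailer
]);

// To save it with Node.js (run from your document root):
// require('fs').writeFileSync('__utm.gif', utmGifBytes);
```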

That’s it! With these two simple steps, Google Analytics visitor data is simultaneously streamed to your web server logfiles in addition to being sent to Google Analytics for processing. This is simple to achieve as all web servers log their activity by default, usually in plain text format.

Once implemented, open your logfiles to verify the presence of additional __utm.gif entries that correspond to the visit data as ‘seen’ by Google Analytics.

A typical Apache logfile line entry (line wrapped here) looks like this:

- [01/Oct/2007:03:34:02 +0100] "GET /__utm.gif?utmwv=1&utmt=var&utmn=
2108116629 HTTP/1.1" 200 35 "" "Mozilla/4.0 (compatible; MSIE 6.0;
Windows NT 5.1; SV1; .NET CLR 1.1.4322)" "__utma=1.117971038.1175394730.1175394730.1175394730.1;
__utmb=1; __utmc=1; __utmz=1.1175394730.1.1.utmcid=23|utmgclid=CP-Bssq-oIsCFQMrlAodeUThgA|
utmccn=(not+set)|utmcmd=(not+set)|utmctr=looking+for+site; __utmv=1.Section One"
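To check such entries programmatically rather than by eye, something like the following sketch could be used. The function name parseUtmHit is hypothetical, the sample line is a shortened version of the entry above, and the regular expression assumes the request is logged in the usual quoted "GET ... HTTP/1.x" form.

```javascript
// Hypothetical helper: pull the utm* name-value pairs out of one Apache logfile line.
function parseUtmHit(logLine) {
  // Match the request portion: "GET /__utm.gif?<query> HTTP/1.x"
  var match = /"GET \/__utm\.gif\?(\S*) HTTP\/[\d.]+"/.exec(logLine);
  if (!match) {
    return null; // not a __utm.gif request
  }
  var pairs = {};
  new URLSearchParams(match[1]).forEach(function (value, name) {
    pairs[name] = value;
  });
  return pairs;
}

// Shortened sample based on the entry shown above.
var sample = '- [01/Oct/2007:03:34:02 +0100] ' +
  '"GET /__utm.gif?utmwv=1&utmt=var&utmn=2108116629 HTTP/1.1" 200 35';
var hit = parseUtmHit(sample);
console.log(hit.utmwv, hit.utmt, hit.utmn);
```

Lines for ordinary page and image requests return null, so the same function can be used to count how many __utm.gif hits a logfile contains.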

[ has a best practice guide on configuring an Apache logfile format ]

For Microsoft IIS, the format (line wrapped) can be as follows:

2007-10-01 01:56:56 - - GET /__utm.gif utmn=1395285084&utmsr=1280x1024&utmsa=1280x960
=/ 200 878 853 93 - - Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.0.3705;
+Media+Center+PC+3.1;+.NET+CLR+1.1.4322) -
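IIS writes the query string as a separate space-delimited field after the requested path, so reading the utm values is slightly different. The sketch below assumes that layout; the function name parseIisUtmHit is hypothetical, and the sample is a shortened version of the entry above. The exact field order in your own logs depends on how IIS logging is configured.

```javascript
// Hypothetical helper: read the utm* name-value pairs from one IIS logfile line.
function parseIisUtmHit(logLine) {
  var fields = logLine.trim().split(/\s+/);
  // Locate the requested path; the query string is assumed to be the next field.
  var stemIndex = fields.indexOf('/__utm.gif');
  if (stemIndex === -1 || stemIndex + 1 >= fields.length) {
    return null; // not a __utm.gif request
  }
  var query = fields[stemIndex + 1];
  if (query === '-') {
    return {}; // IIS logs "-" for an empty field
  }
  return Object.fromEntries(new URLSearchParams(query));
}

// Shortened sample based on the entry shown above.
var iisSample = '2007-10-01 01:56:56 - - GET /__utm.gif ' +
  'utmn=1395285084&utmsr=1280x1024&utmsa=1280x960 200';
var iisHit = parseIisUtmHit(iisSample);
console.log(iisHit.utmn, iisHit.utmsr, iisHit.utmsa);
```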

In both examples, the extra information added by the GATC is the set of utmX name-value pairs. This is known as HYBRID data collection, the benefits of which are discussed in the post Software v Page Tags v HYBRIDS.

The benefits explained in detail

1. Greater control over your data
Some organisations simply feel more comfortable having their data sitting physically within their premises and are prepared to invest in the IT resources to do so. Of course, you cannot simply run this data through an alternative web analytics vendor, as the GATC page tag information will be meaningless to anyone else. However, it does give you the option of passing your data to a third-party auditing service such as ABCE. Audit companies are used to verify web site visitor numbers, which is useful for content publishing sites that sell advertising and therefore wish to validate their rate cards.

NOTE: Be aware that when doing this, protecting the privacy of your visitors is your responsibility, and you should be transparent about this in your privacy policy.

Comments (most recent first)

  1. BTW, take a look at AngelFish as a possible replacement to Urchin:

  2. Niamh says:

    Hi Brian

    I am interested in keeping a local copy of GA data for a start up I am working with. Will this work if I just have GA or do I need to have purchased the Urchin product also?

    Essentially I want pageviews stamped with visitor ID, visit ID and timestamp. (I can get through API using custom variables but raw files are preferred for a number of reasons). If the method above doesnt work with standalone GA, I guess that we could write some code to read the GA cookie and create these log files. That might be preferred because as far as I know this is not a widely used function and maybe there is a risk it would be deprecated?

    Many thanks for your insight. Love to meet in person if you find yourself in Ireland!

    Best wishes

    • @Niamh – this method is not applicable any more as Urchin sales have been discontinued. Essentially saving a local copy of the data is a bit useless if there is no tool to analyse it.

      BTW, be careful with any kind of visitor ID labelling using GA. It is against the Terms of Service to collect such data – even if anonymous. That is, all GA reporting is at the aggregate level.

  3. Thinkerati says:

    With this, does the data get replicated on the usual GA server AND your local server, or is it only recorded on your local server but not on GA server?

  4. Claudia says:

    *-* This is really too complicated for me, but I’ll try to follow this blog… analytics are simply the most important tool for promoting my business.

  5. Heena says:

    Hi Brian, I’m using GA data for generating web traffic reports, but it’s taking too long to get data from GA, so I want to store it locally in my database. Is there any way that I can store it like this, and can I run a cron job to fetch the data overnight?

  6. Kiran BM says:

    Cool! This helps. Thanks Brian

  7. Kiran: In Urchin, setting the method of log file processing to UTM means that it will only process these lines for pageview data. Errors and file downloads will also be reported on.

    Event Tracking (not publicly released) is a Google Analytics only feature at present.

  8. Kiran BM says:

    This is really cool!

    I have a question. May sound a bit stupid.
    How do I re-create the stats using the “utm.gif”ed log file and Urchin? As the log file will have extra lines (or calls) to utm.gif in addition to usual weblog lines, how do I tell Urchin to use utm lines alone? Also will I be able to backup the event-tracking related stuff using this approach?

  9. Sergio: You can’t back up data that you were not streaming to your logfiles at the time. However, if you are logging your web server activity you could process it through Urchin:

    This would only be achieved by identifying visits by their ip address+user-agent identifier, which is notoriously inaccurate, but it could at least give you an idea. Moving forward you should use the HYBRID method for greater accuracy.

    Nathanial: I am in the process of updating my previous posts for the new ga.js implementation. The equivalent ga.js variable setting is:

    pageTracker._setLocalRemoteServerMode();


    The migration guide can help you with this:

  10. Hi Brian,

    Interesting idea. I notice that this uses the old version of the GA code. Has this been tested with the latest version?


  11. Hi Brian, excellent article!
    Do you know of any way to back up previously acquired GA data?

  12. Emma Kane says:

    Now this is interesting – I didn’t know you could do this. Off now to do this and see how it works in practice – and if I can usefully analyse it using something other than Urchin 🙂

