URGENT!! Sum of newVisits returned is different to manually adding up the new visits value per day??


Nate Norrish

Sep 1, 2011, 2:45:29 PM
to google-analytics...@googlegroups.com
Using the data feed query explorer, if I set up my query with just "ga:newVisits" as the metric, it returns a total "newVisits" of 9916 for 1 month.

If I add a "ga:date" to the dimensions field, it returns a list of all newVisits per day of the selected month. Manually adding up all these values returns a total of 9931.

Any ideas why this is happening? Not sure if I'm missing a point here, but it seems that they should both be returning the same total of new visits.
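
For reference, the two queries being compared can be sketched as query strings. This is only an illustration: the profile ID `ga:12345` and the dates are hypothetical placeholders, not values from my account, and the only thing that differs between the two is the `dimensions` parameter.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: the profile ID and date range are made up for illustration.
common = {
    "ids": "ga:12345",
    "start-date": "2011-08-01",
    "end-date": "2011-08-31",
}

# Query 1: metric only, returns a single total for the date range.
aggregate_query = urlencode({**common, "metrics": "ga:newVisits"}, safe=":")

# Query 2: same metric, broken down per day via the ga:date dimension.
daily_query = urlencode(
    {**common, "dimensions": "ga:date", "metrics": "ga:newVisits"}, safe=":"
)

print(aggregate_query)
print(daily_query)
```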

Some screen shots representing the problem are here:

Without ga:date dimension - shows the sum of newVisits for the selected date range, which apparently is 1187
http://oecms.com/shared/nate/rkvh7i.png

With ga:date dimension - add each value up and it returns 1188
http://oecms.com/shared/nate/8ednym.png 



Thanks 

Christian G. Warden

Sep 1, 2011, 2:55:55 PM
to google-analytics...@googlegroups.com
If a visit spans multiple days, this would make sense. For example,
if the visit by a new visitor started at 11:45pm on 8/31 and ended at
12:05am on 9/1, there would be a new visit on both days.

Christian

Nate Norrish

Sep 1, 2011, 3:12:28 PM
to google-analytics...@googlegroups.com
Interesting point. But the visit started on 8/31, not 9/1, and a new visit is defined as a visit by someone who has never been to the site before. GA should therefore know that they were already there on 8/31, so 9/1 would not be a new visit; it would just be a visit, right?


Nate Norrish

Sep 1, 2011, 3:12:58 PM
to google-analytics...@googlegroups.com
btw, thanks for the prompt reply :)

Brandt Dainow

Sep 2, 2011, 12:00:04 PM
to google-analytics-api - GA Data Export API
I've tested on a few sites and I get the same behavior. It may be
related to end of day. Google Analytics closes all visits at the end
of each day and starts new ones. If someone gets to your site at
11:55PM and stays till 12:15AM, it will be counted as 2 visits - 1
each day. However, since it determines new or repeat visitor via the
cookie when you first arrive, I have no idea whether it is treating
the new day as a repeat or carrying over the new visit status.

The real lesson is that Google Analytics is not 100% accurate. No web
analytics system is, and you can get different numbers for the same
thing from different reports in most packages, but Google Analytics
system is less accurate than any other.

Nate Norrish

Sep 2, 2011, 5:06:42 PM
to google-analytics...@googlegroups.com
Brandt,

I've also noticed that the New Visits percentage is different on the Overview page of Google Analytics in comparison to the New Visits percentage on every other page. A little strange, seems like a bug to me!

A new visit shouldn't be counted on 2 days regardless of the end of day scenario, a new visit should be recorded immediately as the person makes a connection for the first time and doesn't have a cookie. If a new visit is being recorded in the above stated possible reason, this to me is not logical.

Could cleared browser cookies be a possible reason? Or perhaps the new visit was recorded in the GA database, but the connection to the website was somehow lost, so the client didn't receive the cookie.

Surely though, if there are these extra new visits for whatever reason, GA should be outputting the same values regardless of whether it is a bug or not. Also, if GA has some way to detect duplicate new visits (i.e. it can produce a different total new visits value), the duplicates should be removed from their database.

This just doesn't make any sense! 


Thanks anyway for the reply!!

Joris W

Sep 5, 2011, 8:24:41 AM
to google-analytics...@googlegroups.com
I just posted a message to this group that seems related:

I get significantly different numbers for the same dimension/metric between the API and the Analytics website.

If I retrieve the number of visits for a single day for a single custom variable, the difference between API and web is 700 out of 3000.

Nick

Sep 5, 2011, 5:34:55 PM
to google-analytics...@googlegroups.com
Hi, thanks for bringing this up!

It's a bit complicated, so let me try to explain what's going on.

First off, the data accessed by the Web UI and the API is the same, and the calculation of each dimension and metric is the same. The main issue people run into is that the queries they issue in the two interfaces are different, which is why the numbers appear to differ.

In your query, you're comparing results for only ga:newVisits vs. ga:date and ga:newVisits.

This is easy to do in the API, but more difficult to represent in the UI. In the UI:
- to get new visits, you must look at the overview report and multiply visits * percent new visits.
- to get date and new visits, you can create a custom report, export to csv and sum the results.

If you do this, you'll notice the exact same problem. Both sums are off. So technically our API is working exactly like the UI, and this 'feature' has been around for a while.

So what's going on?

It has to do with sessions which span day boundaries. GA stores data at a daily level, where the end of the day is midnight in the timezone of the configured profile. If a session spans this boundary, GA will split the session into 2 sessions where all the values of the session are exactly the same. This causes the session to effectively be duplicated.

To determine new visits, the tracking code has a visit counter which is incremented for each new session. So if you visit a site on your first session, the visit count == 1. Then on your next session, visit count == 2, etc...

ga:newVisits is the number of occurrences of visit count == 1, for the first hit of a session, within the date range.

ga:date & ga:newVisits is the number of occurrences of visit count == 1 within the date range, for the first hit of a session, where the session is not a duplicated session which occurred from the day boundary splitting. So this value is de-duped.

That's why you see a difference. And if you subtract the two sums, you can actually get a sense of the number of times this duplication is happening.
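
As a rough sketch of that subtraction, using the two totals reported in the first post of this thread (the variable names are just for illustration):

```python
# Totals quoted in the first post of the thread.
aggregate_new_visits = 9916      # ga:newVisits queried with no dimension
summed_daily_new_visits = 9931   # per-day ga:newVisits values added up by hand

# abs() because only the size of the gap matters for this estimate.
difference = abs(summed_daily_new_visits - aggregate_new_visits)

print(difference)  # 15
```

The result is only an estimate of how many sessions were affected by the day boundary in that month.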

When I get a chance, I'll update the ga:newVisits docs.

Hope this helps,
-Nick



Nate Norrish

Sep 5, 2011, 11:52:48 PM
to google-analytics...@googlegroups.com
Nick,

Thanks for the information. I still don't quite understand, although I think I understand what you're saying. 

Let's just say for example:

I select a date range from June 1 2011 to June 30 2011. On each of these days I received 1 new visit, except that the visit on the 30th of June started at 11:59pm and ran over to 00:03 on July 1st.

newVisits is defined as "The number of new visits by people who had never been to the site before."

So for each day there is a newVisit value of 1. July 1st could be counted as a newVisit as well, as I guess the cookie will have expired? Or is the cookie set to expire 30 days after creation? Regardless of July 1st being a newVisit, I have selected the range June 1st to June 30th, excluding July. So the total newVisits should be 30, both for the sum over the date range and when adding up each daily total manually.

If a newVisit starts on June 5th at 11:59pm and the visit continues to June 6th 00:03, the newVisit would have already been recorded in the database on June 5th and logged in the cookie for that day. Therefore, any new requests to the website should not be recorded as a new visit, as the cookie says they have already previously visited the site (as stated in the definition). It should in fact be logged as a regular visit, as it isn't new.

I really don't understand why each newVisit of a date range would not add up to the sum of new visits when no dimension is specified.

How is the data stored? Is there a delay between the user browsing a page and that information being stored in the database, e.g. is it held in some kind of memory buffer and gradually added to a database? Or is each piece of data stored instantly, but only processed and made accessible via GA at the end of each day?

What information is stored in the cookie? Is it an ID that refers to the logged data, so GA can pull it from the database, like a PHP session? Or does it state the time the user first accessed the site?



Nick

Sep 6, 2011, 1:49:59 AM
to google-analytics...@googlegroups.com
Hi,

Yes it's a bit confusing since we're thinking about this differently.

The tracking code maintains a count of the visit number for a visitor. This count starts at 1 and is incremented with each new session.

ga:newVisits is defined as the number of times that count equals one on the first hit of a session.

Let's say a visitor sends GA the following sequence of hits in a session:

date=June1 time=11:56 visit_count=1 page=foo
date=June1 time=11:59 visit_count=1 page=bar
date=June2 time=12:01 visit_count=1 page=cat
date=June2 time=12:02 visit_count=1 page=bat

See how the dates and times pass midnight between the 2nd and 3rd hits?

GA will split this up into 2 sessions, where:
> date=June1 time=11:56 visit_count=1 page=foo

is still the first hit of the original session, 

but:
> date=June2 time=12:01 visit_count=1 page=cat

becomes the new first hit of the new session because of crossing the date-time boundary

notice how visit_count equals 1 in the first hit of both sessions.

Querying a date range of June1 to June2 for ga:newVisits returns 2, since there are 2 sessions and the first hit of each session has a visit count of 1.

but when you query with ga:date, the second session's hit is not used in the calculation so you get:

ga:date  ga:newVisits
June1    1
June2    0
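
The example above can be sketched as a toy model. This is only an illustration of the counting rules as described in this post, not GA's actual processing code; the `Hit` class, the `from_split` flag, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    date: str
    time: str
    visit_count: int
    page: str


# The four hits from the example session above.
hits = [
    Hit("June1", "11:56", 1, "foo"),
    Hit("June1", "11:59", 1, "bar"),
    Hit("June2", "12:01", 1, "cat"),
    Hit("June2", "12:02", 1, "bat"),
]


def split_at_day_boundary(session_hits):
    """Split one session's hits into per-day sessions.

    Every piece after the first is flagged as created by the split.
    """
    sessions = []
    for hit in session_hits:
        if not sessions or sessions[-1]["date"] != hit.date:
            sessions.append(
                {"date": hit.date, "hits": [], "from_split": bool(sessions)}
            )
        sessions[-1]["hits"].append(hit)
    return sessions


sessions = split_at_day_boundary(hits)

# Without ga:date: count every session whose first hit has visit_count == 1.
total_new_visits = sum(1 for s in sessions if s["hits"][0].visit_count == 1)

# With ga:date: same rule, but sessions created by the split are skipped.
per_day = {s["date"]: 0 for s in sessions}
for s in sessions:
    if s["hits"][0].visit_count == 1 and not s["from_split"]:
        per_day[s["date"]] += 1

print(total_new_visits)  # 2
print(per_day)           # {'June1': 1, 'June2': 0}
```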

-Nick