Increases in Sampling in API?


Aaron Toledo

Apr 4, 2013, 12:49:57 AM
to google-analytics...@googlegroups.com
Has anyone noticed a higher occurrence of sampled data in the API in the last 4-8 weeks? I keep a log of all my queries, and when I re-run some that I originally ran back in Jan 2013, they now return TRUE for the sampling flag. When I originally ran them, they returned FALSE. I stored the old data, so I know it wasn't sampled originally.
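
For anyone who wants to run the same comparison, here is a minimal sketch of checking logged sampling flags against re-run results. It assumes each query is logged under some key together with the boolean sampling flag it returned (in the Core Reporting API v3 response that flag is the `containsSampledData` field; the thread does not say which API version is in use, so treat that as an assumption). The function name, the keys, and the data are hypothetical.

```python
from typing import Dict, List


def newly_sampled(logged: Dict[str, bool], rerun: Dict[str, bool]) -> List[str]:
    """Return the query keys whose sampling flag was False in the log
    but is True when the same query is re-run."""
    return sorted(
        key
        for key, was_sampled in logged.items()
        if not was_sampled and rerun.get(key, False)
    )


# Hypothetical log entries: query key -> sampling flag at the time of the run.
logged_flags = {"date_medium_source": False, "date_only": False}
# Hypothetical flags from re-running the same queries today.
rerun_flags = {"date_medium_source": True, "date_only": False}

print(newly_sampled(logged_flags, rerun_flags))  # ['date_medium_source']
```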

Aaron Toledo

Apr 4, 2013, 12:57:40 AM
to google-analytics...@googlegroups.com
Also, here's an example of what I passed into the query that originally did not sample:

'ids' : 'ga:XXXXX',
'dimensions' : 'ga:date,ga:medium,ga:source',
'metrics' : 'ga:visits,ga:pageviews,ga:bounces,ga:timeonsite,ga:goal4completions,ga:goal6completions,ga:goal7completions,ga:goal9completions,ga:goal10completions',
'start-date' : '2010-08-01',
'end-date' : '2011-12-05',
'start-index' : '1',
'max-results' : '5'

I noticed that if I remove medium and source, the query stops sampling. Those dimensions are commonly requested, and until recently they did not trigger sampling. Here's the query that removed the sampling (but I do need those additional dimensions for my analysis):

'ids' : 'ga:XXXXX',
'dimensions' : 'ga:date',
'metrics' : 'ga:visits,ga:pageviews,ga:bounces,ga:timeonsite,ga:goal4completions,ga:goal6completions,ga:goal7completions,ga:goal9completions,ga:goal10completions',
'start-date' : '2010-08-01',
'end-date' : '2011-12-05',
'start-index' : '1',
'max-results' : '5'

Aaron Toledo

Apr 4, 2013, 1:29:54 AM
to google-analytics...@googlegroups.com
I also just found this again (https://developers.google.com/analytics/resources/concepts/gaConceptsSampling)


500,000 maximum sessions for special queries where the data is not already stored.
In many of the reports, the list dimension is fixed, so Analytics can store this data. This enables Analytics to deliver timely reporting information for large data sets. However, if you request an ad hoc set of dimensions, that information is not stored and Analytics will need to perform the calculation at the time of the request. In this case, only 500,000 sessions will be processed in order to improve the response time. Your report query might easily exceed 500,000 sessions if you request an adhoc dimension over an expanded date range. To get a sense of how many sessions might appear in your request, you can use the visits metric over the date range you intend to query. This maximum of 500,000 session applies per web property.
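
The quoted passage suggests using the visits metric to estimate how many sessions a date range covers. A rough sketch of acting on that estimate: given daily visit counts (for example from a `ga:date` / `ga:visits` query, which per the earlier post did not sample), group consecutive days so that each group stays at or below 500,000 visits. Whether querying each smaller range separately actually avoids sampling is an assumption, not something the quoted page promises. The function name and the sample numbers are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

SESSION_LIMIT = 500_000  # threshold quoted from the sampling documentation


def split_date_range(
    daily_visits: Dict[date, int], limit: int = SESSION_LIMIT
) -> List[Tuple[date, date]]:
    """Greedily group days (in date order) so each group's total visits
    stays at or below `limit`.

    Assumes every day of the overall range has an entry in `daily_visits`.
    A single day that exceeds `limit` gets a range of its own.
    """
    ranges: List[Tuple[date, date]] = []
    start = None
    prev = None
    total = 0
    for day in sorted(daily_visits):
        visits = daily_visits[day]
        # Close the current group if adding this day would exceed the limit.
        if start is not None and total + visits > limit:
            ranges.append((start, prev))
            start, total = None, 0
        if start is None:
            start = day
        total += visits
        prev = day
    if start is not None:
        ranges.append((start, prev))
    return ranges


# Hypothetical daily visit counts for four days.
sample = {
    date(2010, 8, 1): 200_000,
    date(2010, 8, 2): 200_000,
    date(2010, 8, 3): 200_000,
    date(2010, 8, 4): 200_000,
}
for first, last in split_date_range(sample):
    print(first.isoformat(), last.isoformat())
```

Each resulting pair could then be used as the `start-date` and `end-date` of a separate query, with the rows combined afterwards.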

It looks like Medium and Source might no longer be among the stored data sets in Analytics, so they now fall into the "ad hoc set of dimensions". I used to be able to pull Medium and Source in queries covering more than 500,000 sessions, but maybe that's no longer the case.

Does anyone know which dimensions/metrics are "ad hoc" and which aren't?