regular expression and other filter to get page view


Marketing SA

Jul 7, 2011, 5:45:50 AM
to google-analytics...@googlegroups.com
I have been puzzled by the following queries in the Data Feed Query Explorer.

Query Set A
dimensions= ga:date,ga:hostname
metrics= ga:pageviews
filters= pagePath=~^/cat[0-9].+$;ga:hostname==OUR HOST NAME

This somehow returns about 3 times as many pageviews as the following query:
Query Set B
dimensions= ga:date
metrics= ga:pageviews
filters= pagePath=~^/cat[0-9].+$

As I understand it, the value returned by A should be smaller than B's... Also, if the filter combination is
filters= pagePath==index.do;ga:hostname==OUR HOST NAME
it returns what seems to be an accurate value...

Am I using the filters the wrong way?
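
The expectation behind this question can be sketched in a few lines. In the filter syntax, `;` combines conditions with AND, so adding a hostname condition to the same path filter should only ever remove rows. The rows below are invented sample data, `pageviews` is a hypothetical helper rather than part of any Google library, and Python's `re.search` stands in for the `=~` operator.

```python
import re

# Invented sample rows: (hostname, pagePath, pageviews). Not real GA data.
ROWS = [
    ("www.example.com", "/cat1/shoes", 10),
    ("www.example.com", "/cat2/bags", 5),
    ("192.0.2.10", "/cat1/shoes", 2),      # site reached by IP address
    ("www.example.com", "/index.do", 40),  # does not match the path regex
]

PATH_RE = re.compile(r"^/cat[0-9].+$")


def pageviews(rows, hostname=None):
    """Sum pageviews for rows whose path matches PATH_RE.

    If hostname is given, it is ANDed with the path condition,
    mirroring a filter of the form `pagePath=~...;ga:hostname==...`.
    """
    total = 0
    for host, path, views in rows:
        if not PATH_RE.search(path):
            continue
        if hostname is not None and host != hostname:
            continue
        total += views
    return total


query_b = pageviews(ROWS)                              # path filter only
query_a = pageviews(ROWS, hostname="www.example.com")  # path AND hostname
print(query_a, query_b)
```

Under these assumptions Query Set A can never exceed Query Set B, which is why a result about 3 times larger looks wrong.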




Nick

Jul 7, 2011, 6:50:20 PM
to google-analytics...@googlegroups.com
Your filter looks syntactically correct. Maybe you are collecting data from different hostnames and query A filters those out?

What if you just query for ga:hostname,ga:pageviews over the same date range? How many results do you get?

-Nick
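
The check suggested above could be written as query parameters like the following. This is only a sketch: the `ids` value is a hypothetical profile ID, the dates are placeholders, and no request is actually sent.

```python
from urllib.parse import urlencode

# Parameters for a per-hostname pageview breakdown.
params = {
    "ids": "ga:12345",           # hypothetical profile ID, an assumption
    "dimensions": "ga:hostname",
    "metrics": "ga:pageviews",
    "start-date": "2011-07-01",  # placeholder date range
    "end-date": "2011-07-01",
}

# urlencode percent-encodes reserved characters such as ":".
query_string = urlencode(params)
print(query_string)
```

Each row of the response would then show one hostname with its pageview count, which makes stray hostnames (IP addresses, staging hosts) easy to spot.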

Marketing SA

Jul 7, 2011, 8:35:57 PM
to google-analytics-api - GA Data Export API
Thank you for the quick reply.

Well, I am just setting the filter ga:hostname== to make sure that I
can exclude access to the site through IP address and so on.

When I query a single day with
dimensions= ga:date, ga:hostname
metrics= ga:pageviews

I get 452678, which is pretty close to the 458996 for all access:
dimensions= ga:date
metrics= ga:pageviews

Somehow this "increase" in ga:pageviews happens when

I use a combination of a regular expression and an exact match for ga:hostname
in the filter
or
I use a regular expression in the filter and set ga:hostname in the dimensions.

The numbers are so far apart (36122 without ga:hostname in the filter vs
99363 with ga:hostname in the filter, in a case where I try to get pageviews
for one particular kind of page) that

I don't even know which number to believe...

Kaz Koda

Marketing SA

Jul 13, 2011, 8:40:08 PM
to google-analytics...@googlegroups.com
As I mentioned, I wanted to filter on hostname just to be more precise... but any idea why this has happened?

Is this a bug of some sort?

For my purpose I do not have to use this filter (I can live with that level of precision), and I think I should believe the number without the hostname filter and treat the one that combines it with the regular expression filter as wrong. I just want to make sure this is really so; then I can go on to gather the stats....


Marketing SA

Jul 14, 2011, 5:58:43 AM
to google-analytics...@googlegroups.com
I have been digging into this, and I am getting more puzzled.

dimensions=ga:date,ga:pagePath,ga:hostname
*metrics=ga:pageviews
filters=ga:pagePath=~^/sw_;ga:hostname=<our www domain>
sort=ga:pagePath
*start-date=2011-07-01
*end-date=2011-07-01
max-results=3000

results in 2109 pageviews and 609 different pagePath values.

but 
dimensions=ga:date,ga:pagePath
*metrics=ga:pageviews
filters=ga:pagePath=~^/sw_
sort=ga:pagePath
*start-date=2011-07-01
*end-date=2011-07-01
max-results=3000

results in 1717 pageviews and 300 different pagePath values
(I have to remove ga:hostname not only from the filter but also from the dimensions).

The thing is, with hostname filtered, it does give back what look like correct path and hostname values. Also, the result of the query with ga:hostname seems to contain the result of the query without it (in other words, the latter group is a subset of the query with ga:hostname filtered...). Any idea? Could this be a bug?
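
The subset observation could be checked mechanically along these lines. This is a minimal sketch that assumes both result sets have been exported as lists of dicts; the rows shown are invented, and `views_by_path` and `compare` are hypothetical helper names.

```python
from collections import defaultdict


def views_by_path(rows):
    """Sum pageviews per pagePath; other keys such as hostname are ignored."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["pagePath"]] += row["pageviews"]
    return dict(totals)


def compare(with_host, without_host):
    """Return paths unique to each result set, plus paths whose totals differ."""
    a = views_by_path(with_host)
    b = views_by_path(without_host)
    only_with = sorted(set(a) - set(b))
    only_without = sorted(set(b) - set(a))
    differing = {p: (a[p], b[p]) for p in sorted(set(a) & set(b)) if a[p] != b[p]}
    return only_with, only_without, differing


# Invented sample rows standing in for the two exported results.
with_host = [
    {"pagePath": "/sw_1", "hostname": "www.example.com", "pageviews": 3},
    {"pagePath": "/sw_2", "hostname": "www.example.com", "pageviews": 1},
]
without_host = [
    {"pagePath": "/sw_1", "pageviews": 3},
]

only_with, only_without, differing = compare(with_host, without_host)
print(only_with, only_without, differing)
```

If `only_without` and `differing` come back empty, the result without ga:hostname really is a strict subset, and `only_with` lists the paths that the smaller result is missing.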

Marketing SA

Jul 14, 2011, 11:36:40 PM
to google-analytics...@googlegroups.com
Nick, while waiting to see whether you could check the problem, I tried to narrow down the difference by limiting the date range and getting results with and without ga:hostname in the dimensions. While checking the results I found an even weirder result from the Data Feed Query Explorer.

Query 1:
dimensions=ga:date,ga:hostname
metrics=ga:pageviews
filters=ga:pagePath==/sw_&|20082
start-date=2011-07-01
end-date=2011-07-01
This results in:
ga:date ga:hostname ga:pageviews
07-01-2011 www.sekaimon.com 1

However,
Query 2 (without ga:hostname in dimensions):
dimensions=ga:date
metrics=ga:pageviews
filters=ga:pagePath==/sw_&|20082
start-date=2011-07-01
end-date=2011-07-01
This results in: no result.
Now I am really puzzled. I am not even sure which number to believe... Can you at least tell me which query is more accurate?
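
One thing that may be worth ruling out for a path containing `&` and `|` is URL encoding of the filter value. Whether the Query Explorer encodes it automatically is not established here, and nothing below shows that encoding explains the difference between Query 1 and Query 2. The sketch only shows how a generic query-string parser (Python's `urllib.parse`, used as a stand-in) treats the same filter with and without percent-encoding.

```python
from urllib.parse import parse_qs, quote

raw_filter = "ga:pagePath==/sw_&|20082"

# Unencoded: the "&" is read as a parameter separator, so the filter
# value is cut off right after "/sw_".
unencoded = "filters=" + raw_filter
seen_unencoded = parse_qs(unencoded)["filters"][0]

# Percent-encoded: the whole expression survives as one value.
encoded = "filters=" + quote(raw_filter, safe="")
seen_encoded = parse_qs(encoded)["filters"][0]

print(encoded)
print(repr(seen_unencoded))
print(repr(seen_encoded))
```

If the filter is ever sent outside the Query Explorer, for example from a script, comparing the encoded and unencoded forms this way is a cheap sanity check.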