Problem with COUNT DISTINCT

92 views
Skip to first unread message

Chiefumpire

unread,
Mar 7, 2011, 7:58:11 AM3/7/11
to bigquery...@googlegroups.com
Hi all,

seems I posted to the old google group....

I encounter a error when I issue a query like this:
bq query "select user,count(distinct tag) from [chiefumpire_bucket/
tables/del_tags] group by user"

The following is the Error:
Traceback (most recent call last):
 File "/usr/bin/bq", line 9, in <module>
   load_entry_point('bigquery-shell==1.1.0', 'console_scripts', 'bq')
()
 File "/usr/lib/python2.6/site-packages/bigquery_shell-1.1.0-
py2.6.egg/bq.py", line 296, in run_main
   appcommands.Run()
 File "/usr/lib/python2.6/site-packages/google_apputils-0.2-py2.6.egg/
google/apputils/appcommands.py", line 665, in Run
   return app.run()
 File "/usr/lib/python2.6/site-packages/google_apputils-0.2-py2.6.egg/
google/apputils/app.py", line 202, in run
   return _actual_start()
 File "/usr/lib/python2.6/site-packages/google_apputils-0.2-py2.6.egg/
google/apputils/app.py", line 229, in _actual_start
   really_start()
 File "/usr/lib/python2.6/site-packages/google_apputils-0.2-py2.6.egg/
google/apputils/appcommands.py", line 651, in _CommandsStart
   sys.exit(command.CommandRun(GetCommandArgv()))
 File "/usr/lib/python2.6/site-packages/google_apputils-0.2-py2.6.egg/
google/apputils/appcommands.py", line 258, in CommandRun
   ret = self.Run(argv)
 File "/usr/lib/python2.6/site-packages/google_apputils-0.2-py2.6.egg/
google/apputils/appcommands.py", line 340, in Run
   return self._cmd_func(argv)
 File "/usr/lib/python2.6/site-packages/bigquery_shell-1.1.0-
py2.6.egg/bq.py", line 195, in _Query
   (meta, query_result) = _BigQueryClient().Query(query)
 File "/usr/lib/python2.6/site-packages/bigquery_shell-1.1.0-
py2.6.egg/bq_client.py", line 256, in Query
   cursor.execute(query)
 File "/usr/lib/python2.6/site-packages/bigquery_shell-1.1.0-
py2.6.egg/bigquery_db_api.py", line 344, in execute
   params=dict(q=statement)))
 File "/usr/lib/python2.6/site-packages/bigquery_shell-1.1.0-
py2.6.egg/bigquery_db_api.py", line 479, in Call
   raise DatabaseError('Database error: %s' % message)
bigquery_db_api.DatabaseError: Database error: None

There is no error if I move the field "user" like:
bq query "select count(distinct tag) from [chiefumpire_bucket/tables/
del_tags]"

The schema:
[
 { "id": "user", "type": "string", "mode": "REQUIRED" },
 { "id": "url", "type": "string", "mode": "REQUIRED" },
 { "id": "tag", "type": "string", "mode": "REQUIRED" }
]

I want to find the average number of unique tags per user.
Could anyone give me a suggestion?

Craig Citro

unread,
Mar 8, 2011, 5:20:46 AM3/8/11
to bigquery...@googlegroups.com
Hi,

You're going past the max result size we return in a query. It's documented here:


Can you describe what you want to do with the result? Was that the query you wanted, or did you want to sort the results a little further?

-cc

Chiefumpire

unread,
Mar 8, 2011, 5:51:55 AM3/8/11
to bigquery...@googlegroups.com
Hi CC,

Thanks for your reply.
The query is not completed. I want to know the average number of unique tags per user. The following is the original query:

SELECT SUM(tag_cnt) / COUNT(user) 
FROM
(
SELECT user,COUNT(DISTINCT tag) tag_cnt
FROM [chiefumpire_bucket/tables/del_tags]
GROUP BY user
)

which return the same error. I expect the result will be only one row, but it returned an error "bigquery_db_api.DatabaseError: Database error: None". I have tried another query which return an error like "result size too large" (I can't remember the query). So I though the result size limit is not exceed. Could it be my query is incorrect or COUNT DISTINCT with fields is not supported?

Message has been deleted

Michael Sheldon

unread,
Mar 9, 2011, 5:21:34 PM3/9/11
to bigquery...@googlegroups.com, Chiefumpire
Hello Chiefumpire,

Your query looks well formed to me.

Can you capture the rpc log and send it? There may be some useful details there that could help us figure out what is going on. Add "--rpclog=path/to/file" to the command line you launch bqshell with to capture the request/response logs. (Or if you use bq.py, --bigquery_db_api_rpclog.)

--Michael Sheldon

Jaka Jancar

unread,
Mar 9, 2011, 7:59:49 PM3/9/11
to BigQuery discuss
Are you sure the table contains data?

Anthony

unread,
Mar 10, 2011, 4:15:14 AM3/10/11
to bigquery...@googlegroups.com, Chiefumpire
Hi Michael,

Attached is the rpc log. The error code is 500.

Hi Jaka,

The table contains data.
bq query "select count(*) from  [chiefumpire_bucket/tables/del_tags]"
---------
COUNT(*)
---------
140126024
---------

bq_rpc.txt

Anthony

unread,
Mar 27, 2011, 5:17:57 AM3/27/11
to bigquery...@googlegroups.com
Hi all,

Is there any update on the problem?

Thanks.
Reply all
Reply to author
Forward
0 new messages