Google BigQuery Updates: AVG, VARIANCE and STDDEV Functions, Browser Tool Improvements, job.kind in Jobs List method (March 1st, 2012)

Michael Manoochehri

Mar 1, 2012, 8:39:47 PM
to bigquery...@googlegroups.com
Hello Google BigQuery Developers!

The Google BigQuery engineering team has been hard at work pushing new improvements, thanks in no small part to feedback from our Limited Preview partners! Here's a summary of this week's BigQuery updates:


1. New Aggregate Functions: Average, Variance, and Standard Deviation
BigQuery now supports the AVG, VARIANCE, and STDDEV functions to make your statistical analysis queries more straightforward. For example, here's a query that will return the maximum, minimum, average, variance, and standard deviation of the number of characters used in each record of our public Wikipedia revision history dataset:

SELECT MAX(num_characters) as max_chars, MIN(num_characters) as min_chars, ROUND(AVG(num_characters)) as avg_chars, VARIANCE(num_characters) as variance_chars, ROUND(STDDEV(num_characters)) as std_dev_chars FROM publicdata:samples.wikipedia;

You can also use these aggregate functions with other query functions that return a numerical expression. For example, here's a query that will report aggregate information about the length of article titles featured in our Wikipedia revision dataset:

SELECT MAX(LENGTH(title)) as title_length_max, MIN(LENGTH(title)) as title_length_min, VARIANCE(LENGTH(title)) as title_length_variance, STDDEV(LENGTH(title)) as title_length_stddev FROM publicdata:samples.wikipedia;
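
To make the relationship between these aggregates concrete, here is a minimal local sketch in Python using only the standard library. It is not BigQuery code. The sample values are made up, and since this announcement does not say whether VARIANCE and STDDEV use the population or the sample definition, the sketch computes both. The function name summarize is hypothetical.

```python
import math
import statistics

def summarize(values):
    """Return max, min, mean, and both definitions of variance and stddev."""
    return {
        "max": max(values),
        "min": min(values),
        "avg": statistics.mean(values),
        # Population definitions divide by n.
        "variance_pop": statistics.pvariance(values),
        "stddev_pop": statistics.pstdev(values),
        # Sample definitions divide by n - 1.
        "variance_sample": statistics.variance(values),
        "stddev_sample": statistics.stdev(values),
    }

# Hypothetical character counts, not taken from the Wikipedia dataset.
num_characters = [2, 4, 4, 4, 5, 5, 7, 9]
stats = summarize(num_characters)

for name, value in stats.items():
    print(name, value)
```

Under either definition, the standard deviation is the square root of the variance, which is why the two are usually reported together.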

2. More improvements to the BigQuery Browser Tool
The BigQuery Browser Tool now loads much faster than before, and we've recently added support for one of our most commonly requested API features: the ability to skip a user-defined number of invalid records during data ingestion. This feature is very useful when ingesting large CSV input files that may contain a few invalid characters, unexpected newlines, or other bad formatting.
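
The idea of tolerating a bounded number of bad records can be sketched locally as follows. This is a simulation in plain Python, not the BigQuery ingestion code: the names load_csv, max_bad_records, and TooManyBadRecords are hypothetical, and "invalid" here only means a row with the wrong number of fields.

```python
import csv
import io

class TooManyBadRecords(Exception):
    """Raised when the input has more invalid rows than the caller allows."""

def load_csv(text, expected_columns, max_bad_records=0):
    """Parse CSV text, skipping up to max_bad_records malformed rows.

    Returns (good_rows, bad_row_count). A row counts as malformed when its
    field count differs from expected_columns.
    """
    good_rows = []
    bad = 0
    for row in csv.reader(io.StringIO(text)):
        if len(row) == expected_columns:
            good_rows.append(row)
            continue
        bad += 1
        if bad > max_bad_records:
            raise TooManyBadRecords(
                f"{bad} invalid records exceeds the limit of {max_bad_records}"
            )
    return good_rows, bad

# Hypothetical input with one malformed line.
sample = "a,1\nb,2\nbroken line\nc,3\n"
rows, skipped = load_csv(sample, expected_columns=2, max_bad_records=1)
print(rows, skipped)
```

With the limit left at zero, the same input would fail on the first malformed line, which is the behavior this feature lets you relax.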



For more information on using BigQuery via the Browser Tool, see our Browser Tool Getting Started Guide and our Browser Tool Tips and Tricks article.

3. The Jobs.list method of the BigQuery API now returns the Job type (via the "kind" property) for each entry
The BigQuery API supports a variety of asynchronous operations through the Jobs methods: load, query, extract, and table copy. We've recently added the job[].kind property to the Jobs.list method, meaning that a user can retrieve information about BigQuery job types without having to call the Jobs.get method separately for each job. This update is one of many useful features for building BigQuery admin tools that keep track of the number and status of a user's jobs. Read more about the BigQuery API Jobs methods.
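
As a rough sketch of how an admin tool might use this, the Python below tallies jobs by their "kind" value from a single list response, with no per-job lookup. It makes no API calls. The response shape (a "jobs" array whose entries each carry "kind"), the job ids, and the "kind" strings are all assumptions for illustration, and count_jobs_by_kind is a hypothetical helper.

```python
from collections import Counter

def count_jobs_by_kind(list_response):
    """Tally the entries of a Jobs.list-style response by their "kind" value."""
    jobs = list_response.get("jobs", [])
    # Entries without a "kind" are grouped under "unknown".
    return Counter(job.get("kind", "unknown") for job in jobs)

# Hypothetical response; ids and "kind" strings are placeholders,
# not values documented in this announcement.
response = {
    "jobs": [
        {"id": "job_1", "kind": "load"},
        {"id": "job_2", "kind": "query"},
        {"id": "job_3", "kind": "query"},
    ]
}

summary = count_jobs_by_kind(response)
print(summary)
```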

4. Getting started with BigQuery and the Google API Client for Java
A common request from our Limited Preview partners has been to provide more documentation for BigQuery development with Java, so we've added a new Codelab demonstrating how to use the BigQuery API with the Google Java API Client library.

Please share your questions, comments, or feedback by posting to the bigquery-discuss list. We appreciate it!

Thanks,
The Google BigQuery Team