Google BigQuery Updates: AVG, VARIANCE and STDDEV Functions, Browser Tool Improvements, job.kind in Jobs List method (March 1st, 2012)
Michael Manoochehri
Mar 1, 2012, 8:40:56 PM
to bigquery...@googlegroups.com
Hello Google BigQuery Developers!
The Google BigQuery engineering team has been hard at work pushing new improvements, thanks in no small part to feedback from our Limited Preview partners! Here's a summary of this week's BigQuery updates:
1. New Aggregate Functions: Average, Variance, and Standard Deviation
BigQuery now supports the AVG, VARIANCE, and STDDEV functions to make your statistical analysis queries more straightforward. For example, here's a query that will return the maximum, minimum, average, variance, and standard deviation of the number of characters used in each record of our public Wikipedia revision history dataset:
SELECT
  MAX(num_characters) as max_chars,
  MIN(num_characters) as min_chars,
  ROUND(AVG(num_characters)) as avg_chars,
  VARIANCE(num_characters) as variance_chars,
  ROUND(STDDEV(num_characters)) as std_dev_chars
FROM publicdata:samples.wikipedia;
You can also use these aggregate functions with other query functions that return a numerical expression. For example, here's a query that will report aggregate information about the length of article titles featured in our Wikipedia revision dataset:
SELECT
  MAX(LENGTH(title)) as title_length_max,
  MIN(LENGTH(title)) as title_length_min,
  VARIANCE(LENGTH(title)) as title_length_variance,
  STDDEV(LENGTH(title)) as title_length_stddev
FROM publicdata:samples.wikipedia;
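To make concrete what these aggregates report, here is a small local sketch in Python using only the standard library. It is not BigQuery code: the list of titles is made-up sample data, and because this post does not say whether VARIANCE and STDDEV use the population or the sample formula, the sketch computes both rather than guessing.

```python
import statistics

# Hypothetical sample data standing in for the "title" column.
titles = ["Ada", "Turing", "BigQuery", "Go"]

# Analogous to LENGTH(title) evaluated for every row.
lengths = [len(title) for title in titles]

summary = {
    "title_length_max": max(lengths),
    "title_length_min": min(lengths),
    "title_length_avg": statistics.mean(lengths),
    # Population formulas divide by n; sample formulas divide by n - 1.
    "variance_population": statistics.pvariance(lengths),
    "variance_sample": statistics.variance(lengths),
    "stddev_population": statistics.pstdev(lengths),
    "stddev_sample": statistics.stdev(lengths),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Comparing a query's output against both variants on a small table is one way to check which formula a given engine applies.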
2. More improvements to the BigQuery Browser Tool
The BigQuery Browser Tool now loads much faster than before, and we've recently added support for one of our most commonly requested API features: the ability to skip a user-defined number of invalid records during data ingestion. This feature is very useful for ingestion of large CSV input files that may contain a few invalid characters, unexpected newlines, or other bad formatting.
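The idea of tolerating a bounded number of invalid records can be sketched locally in Python. This is only an illustration of the concept, not how BigQuery implements it: the function name load_csv, the parameter name max_bad_records, and the rule that a record is invalid when its column count is wrong are all assumptions made for the example.

```python
import csv
import io


def load_csv(text, expected_columns, max_bad_records=0):
    """Parse CSV text, skipping up to max_bad_records malformed records.

    A record counts as malformed here when its column count differs from
    expected_columns (an assumption chosen for this sketch).
    Returns (good_rows, skipped), where skipped holds
    (record_number, row) pairs for every record that was dropped.
    """
    good_rows, skipped = [], []
    reader = csv.reader(io.StringIO(text))
    for record_number, row in enumerate(reader, start=1):
        if len(row) == expected_columns:
            good_rows.append(row)
            continue
        skipped.append((record_number, row))
        if len(skipped) > max_bad_records:
            raise ValueError(
                f"more than {max_bad_records} invalid records: {skipped}"
            )
    return good_rows, skipped


# Hypothetical input: the second record is missing a column.
sample = "alice,30\nbob\ncarol,41\n"
rows, skipped = load_csv(sample, expected_columns=2, max_bad_records=1)
print(rows)     # [['alice', '30'], ['carol', '41']]
print(skipped)  # [(2, ['bob'])]
```

With max_bad_records left at 0 the same input raises ValueError, which mirrors the stricter behavior of failing on the first bad record.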
3. The Jobs.list method of the BigQuery API now returns the Job type (via the "kind" property) for each entry
The BigQuery API supports a variety of asynchronous operations through the Jobs methods: load, query, extract, and table copy. We've recently added the job[].kind property to the Jobs.list method, meaning that a user can retrieve information about BigQuery job types without having to call the Jobs.get method separately for each job. This update is one of many useful features for building BigQuery admin tools that keep track of the number and status of a user's Jobs. Read more about the BigQuery API Jobs methods.
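As a rough sketch of how an admin tool might use this, the Python below tallies jobs by their "kind" value from an already-parsed Jobs.list response, with no per-job Jobs.get calls. Everything about the response here is an assumption for illustration: the top-level "jobs" key, the "id" field, and the placeholder kind strings are hypothetical, and the values the API actually returns may differ.

```python
from collections import Counter


def count_jobs_by_kind(list_response):
    """Tally jobs by their "kind" value in a Jobs.list-style response dict.

    Jobs without a "kind" entry are counted under "unknown".
    """
    jobs = list_response.get("jobs", [])
    return Counter(job.get("kind", "unknown") for job in jobs)


# Hypothetical response; field names and values are placeholders,
# not output captured from the BigQuery API.
response = {
    "jobs": [
        {"id": "job_1", "kind": "load"},
        {"id": "job_2", "kind": "query"},
        {"id": "job_3", "kind": "query"},
        {"id": "job_4"},
    ]
}

counts = count_jobs_by_kind(response)
for kind, total in sorted(counts.items()):
    print(f"{kind}: {total}")
```

Because the tally works on a single list response, the cost stays at one API call per page of jobs rather than one call per job.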
4. Getting started with BigQuery and the Google API Client for Java
A common request from our Limited Preview partners has been to provide more documentation for BigQuery development with Java, so we've added a new Codelab demonstrating how to use the BigQuery API with the Google Java API Client library.
Please share your questions, comments, or feedback by posting to the bigquery-discuss list. We appreciate it!