Available now: training video, faster import. Coming up: CSV import, cleaner clustering

Jonathan Stray

Nov 2, 2012, 3:55:42 PM
to overvie...@googlegroups.com
Hello all. Thought I'd talk about some of the work we've just finished, and some of the changes coming in the next few weeks.

Training video
We've created an 8-minute tutorial video which demonstrates the process of exploring a document set. It's on Vimeo, and will shortly also be available on the Overview help page.

Faster import
We continue to work on speeding Overview up and supporting larger document sets. The current maximum is 10,000 documents. We've also just implemented changes that speed up the clustering portion of the import -- the last 10% on the progress bar -- by up to ten times. This should shave several minutes off your import time for large document sets.

Coming up: CSV import
First, and this is big: we're working on the ability to import CSV files, in the same format that the prototype uses. Basically, you put the text of each document in a CSV, one row per document, with an optional additional column giving the URL where that document can be viewed. If you do this, you will no longer need to upload to DocumentCloud.
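
As a rough illustration of the layout described above, here is a minimal Python sketch that writes one row per document with an optional URL column. The column names "text" and "url" and the helper write_documents_csv are assumptions made for this example; check the prototype's format for the exact header names before relying on them.

```python
import csv
import io


def write_documents_csv(documents, out):
    """Write one CSV row per document: its text plus an optional URL.

    The header names "text" and "url" are hypothetical placeholders,
    not confirmed Overview column names.
    """
    writer = csv.writer(out)
    writer.writerow(["text", "url"])
    for doc in documents:
        # A document without a URL gets an empty second column.
        writer.writerow([doc["text"], doc.get("url", "")])


# Example input: scraped items, tweets, or rows from a database dump.
documents = [
    {"text": "First tweet, with a comma", "url": "http://example.com/1"},
    {"text": "Second document\nspanning two lines"},
]

# newline="" lets the csv module control line endings itself.
buffer = io.StringIO(newline="")
write_documents_csv(documents, buffer)
csv_text = buffer.getvalue()
```

The csv module quotes fields that contain commas or line breaks, so each document still occupies exactly one CSV record. To write a real file instead of an in-memory buffer, open it with open("documents.csv", "w", newline="", encoding="utf-8") and pass that file object in place of buffer.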

I know some of you are saying, "so what?" or "wouldn't getting the text into a CSV be harder?" But many of you already have your documents in tabular form. CSV import makes it easy to load web site scrapes, tweets, or database dumps into Overview.

We expect this feature to go live the week of Nov 12.

Coming up: cleaner clustering
We're also working on the clustering algorithm to make the tree that Overview produces a little easier to interpret. There will still likely be many single documents -- what we call "dust" -- because real document sets contain lots of outliers, documents that are just different from all the others. But the big groups should hopefully be clearer.

As always, don't hesitate to get in touch with feedback!

  - Jonathan 