Will Refine be transitioned to the Google cloud, or will it just stay as a local service using our computers' resources before hitting Google? I've experienced issues loading extremely large datasets. Hopefully you guys will integrate Refine with Google Storage sooner rather than later. It would help a lot.
As someone who uses Refine for medical data with protected health
information, if it were *only* offered as a hosted service it wouldn't
be usable to me anymore. Please keep uses like this in mind also. As
you say in your intro, it could have repercussions for journalists etc.
also.
Thanks.
--
Judson Dunn
http://sleepyhead.org
Agreed. The point I'm trying to make is that if a user doesn't have the CPU resources to handle such big datasets, Google's cloud could be used to do such transformations. In the future, I'm sure Google will handle sensitive data, as they cater to enterprise users.
Randall
On Nov 14, 2010 3:47 PM, "Judson Dunn" <cohe...@sleepyhead.org> wrote:
On Sun, Nov 14, 2010 at 2:12 PM, Randall Amiel <randy1...@gmail.com> wrote:
> will refine be tra...
David,
Corpus linguists -- who collaboratively or privately harvest mono- or multi-lingual data from the web, clean it, and do all sorts of text manipulation and analysis on the text data.
We're definitely seeing several audiences:

1. Individual people who have small to medium-sized, public or at least not sensitive data sets -- Google-hosted service would be best.
2. Individual people who have small to medium-sized, private/sensitive data sets -- desktop application (the current form) would be best.
3. Communities who want to collaborate on processing some public data sets -- Google-hosted service would be best.
   a. Freebase sub-communities who want to load some data sets into Freebase.
   b. Citizen reporters who want to collaborate on sifting through some government public data sets.
4. Communities who want to collaborate on processing some private data sets -- self-hosted service would be best.
   a. News agencies working through data for news stories.
   b. Government agencies like data.uk.gov cleaning up data before publishing it.
   c. Crisis response teams handling private data, and who can't rely on connectivity to the cloud.

The two different hosted options will require significantly different technology stacks.

Am I missing any other audience?

David
I don't mean to start such a debate, but what if you want to start connecting datasets without using Freebase? I mean, Freebase isn't a centralized location to link all datasets $yet$. What about Facebook Graph?
On Nov 14, 2010 9:19 PM, "Resty Cena" <rest...@gmail.com> wrote:
#3 is collaboration, #4 is private. Multi-lingual corpus sets. Individual language sets could be private or collaboration, but multilingual corpus sets must be collaborative.
On Sun, Nov 14, 2010 at 5:42 PM, Stefano Mazzocchi <stef...@google.com> wrote:
>
> On Sun, Nov 14...
Stefano:
I guess that's what I was getting at: the costly operations. Not all operations can be put into one DB. We must support operations over any dataset that is exposed in a standardized format (RDF etc.), and maybe a URI to another. Costly refining must take place in the cloud, especially refining a join between Facebook Graph and Freebase ;)
On Nov 14, 2010 8:00 PM, "Stefano Mazzocchi" <stef...@google.com> wrote:
On Sun, Nov 14, 2010 at 3:09 PM, Randall Amiel <randy1...@gmail.com> wrote:
>
> agreed. the point im trying to make is that, if a user doesnt have the cpu resources to handle s...
There is one important thing that needs to be understood: the types of operations supported by Google Refine are very hard to scale. You can slice and dice thru hundreds of thousands of rows in a few seconds, but even if we were able to map-reduce the hell out of this (which is not a given, btw!) and slice and dice thru hundreds of millions of rows in a few minutes (assuming one could keep the cost per row linear), the overall UI experience would be so poor you wouldn't be able to stand it.

Moreover, Refine was designed from the start to be a locally hosted web service, which means that it heavily depends on ajax latencies being tiny, local bandwidth being very high and I/O concurrency being small to none.

These are all issues that will need to be addressed, and while some of them just require engineering resources to be executed, others require hard-core research in distributed computing, which will take time and has a high risk of failure.

Plus, we're a very small team (at least so far), so set your expectations accordingly.
>>
>> On Nov 14, 2010 3:47 PM, "Judson Dunn" <cohe...@sleepyhead.org> wrote:
>>
>> On Sun, Nov 14,...
This is a good division of audiences!

Another dimension to keep in mind is presence or absence of DeveloperChops and DeveloperPower. Some things need to be available exclusively through point -n- click to reach their broader audience; other times it's OK to assume someone can do a little coding or ask someone else to develop something.

Refine currently seems to straddle this a bit - you need some expression-writing chops to really use its power, although no actual development code is required.
I think the realtime collaboration was based off Openfire, and then ported to Google Wave. Docs and Gmail probably use a variation of this. Furthermore, I don't think you would need realtime collaboration in Refine, unless you're refining realtime military data or some type of realtime stream flows.
On Nov 17, 2010 12:19 PM, "David Huynh" <dfh...@gmail.com> wrote:
On Tue, Nov 16, 2010 at 4:32 PM, Rebecca Shapley <rsha...@google.com> wrote:
>
> This is a good division of audiences!
>
> Another dimension to keep in mind is presence or abse...