Problems


carolinebeavon

Jul 4, 2011, 3:36:13 PM
to google...@googlegroups.com
Hey guys

Thank you to everyone who gave me advice last time I posted here.
I hinted then that I was having some glitchy problems, but I persisted.
Here's the background of my dataset/project again (apologies for the repetition).

I am working with a database of 2 text columns and 85,000+ rows
They are ALBUM TITLE and ARTIST - I wanted to add a third column of GENRE

My process:

1. Reconciling the ARTIST column: I used a combination of Freebase/DBpedia and the various types ("things"), e.g. Musical Artist, Band, etc.
2. Adding a "musical genre" column from Freebase based on ARTIST

However, at various stages (e.g. 95% of the way through reconciling the ARTIST column, halfway through adding genre, and when I also tried to reconcile ALBUM TITLE), Refine freezes.

I leave it for a while (knowing it's a large dataset), but nothing happens.

Knowing I'd had problems in the past, I had another go today and exported at various stages, but after it crashed again, these exported project files also "freeze" on the "starting up" page when I try to open them.

Also, an Excel table I exported was saved with no data.

I'm working with a lot of data, and I lose several hours' work every time it crashes, as I have to do a lot of cleaning up and manual adding after reconciliation.

Not sure if this helps solve the problem!




 
 

 

David Huynh

Jul 4, 2011, 4:59:47 PM
to google...@googlegroups.com
Hi Caroline,

Thanks for continuing to try Refine! Looking at the screenshot, I can't tell exactly what happened. But I know there was one bug in version 2.0 that could corrupt a project. Something quite unique in your data, in conjunction with reconciliation against DBpedia, might trigger it. It's just a hunch.

Are you using the RDF extension? Which version of Refine are you using? If you haven't tried version 2.1-RC1, would you mind trying it?

Also, I would not recommend reconciling 85,000 rows all at once. I'd split it up into several smaller jobs. For example, you can create a custom text facet using this expression
    row.index / 5000
and then select one choice at a time in that facet, which selects 5000 rows, and do reconciliation on that batch. If anything goes wrong with reconciliation, then you would only lose at most 4999 results.
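For anyone following along outside Refine, the batching idea can be sketched in Python. This is a minimal sketch, not Refine's own code: it assumes the facet value is the whole-number row index divided by 5000 with the fractional part dropped (which is what makes each facet choice cover a block of 5000 rows), reproduced here with Python's `//`. `reconcile_batch` is a hypothetical stand-in for whatever call goes out to the reconciliation service.

```python
from typing import Any, Callable, Dict, List, Sequence

# Rows per batch, matching the 5000 in the facet expression row.index / 5000.
BATCH_SIZE = 5000


def batch_key(row_index: int, batch_size: int = BATCH_SIZE) -> int:
    """Batch number for a 0-based row index (integer division, like the facet value)."""
    return row_index // batch_size


def batches(n_rows: int, batch_size: int = BATCH_SIZE) -> List[range]:
    """Split row indices 0..n_rows-1 into consecutive ranges of at most batch_size."""
    return [range(start, min(start + batch_size, n_rows))
            for start in range(0, n_rows, batch_size)]


def process_in_batches(rows: Sequence[Any],
                       reconcile_batch: Callable[[List[Any]], Any],
                       batch_size: int = BATCH_SIZE) -> Dict[int, Any]:
    """Call reconcile_batch (hypothetical) once per batch.

    Results from batches that finish are kept even if a later batch raises,
    so a failure loses at most one batch of work.
    """
    results: Dict[int, Any] = {}
    for block in batches(len(rows), batch_size):
        key = batch_key(block.start, batch_size)
        try:
            results[key] = reconcile_batch([rows[i] for i in block])
        except Exception as exc:  # keep going; only this batch is lost
            print(f"batch {key} failed: {exc}")
    return results
```

With exactly 85,000 rows this gives 17 batches (keys 0 to 16); a single extra row would spill into an 18th. Because each batch is handled on its own, a failure only costs the batch that was in flight, which is the point of the facet trick.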

And if your data isn't too sensitive, you could zip up the project directory and send it to me privately so I can try some surgery on it. On the home screen of Refine, you'd see a link called "Browse workspace directory". That should open up Windows Explorer at the workspace directory, under which you'd find a subdirectory whose name contains the project ID 2213877593623. Just zip that up.
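If you'd rather script the zipping than do it by hand in Explorer, here is a minimal Python sketch. It assumes only what's described above: the project lives in a workspace subdirectory whose name contains the project ID. `zip_project` is a hypothetical helper, not part of Refine, and the workspace path in the usage line is a placeholder for the folder that "Browse workspace directory" opens.

```python
import shutil
from pathlib import Path


def zip_project(workspace: str, project_id: str, out_dir: str = ".") -> str:
    """Zip the workspace subdirectory whose name contains project_id.

    Hypothetical helper: returns the path of the .zip file it creates.
    """
    ws = Path(workspace)
    matches = [p for p in ws.iterdir() if p.is_dir() and project_id in p.name]
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected one directory containing {project_id!r} in {ws}, found {len(matches)}")
    project_dir = matches[0]
    archive_base = Path(out_dir) / project_dir.name
    # make_archive adds the ".zip" suffix itself; root_dir/base_dir keep the
    # project folder name as the top-level entry inside the archive.
    return shutil.make_archive(str(archive_base), "zip",
                               root_dir=str(ws), base_dir=project_dir.name)
```

Usage would look like `zip_project(r"C:\path\to\workspace", "2213877593623")`, which writes the archive to the current directory. Refusing to guess when zero or several directories match keeps it from zipping the wrong project.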

Thanks,

David

carolinebeavon

Jul 5, 2011, 4:52:09 AM
to google...@googlegroups.com
Hey David

Thank you for your message. I wasn't using the version of Refine that you suggested; I'll give that a go!

Yes, I am using the RDF extension.

I had been reluctant to break the data up, as there are a lot of identical cells which need to stay there, so I was hoping to process them all in one go, but I do think breaking them up might be the answer, you're right. 

My data isn't sensitive and I'd be happy to send it over to you, but I'd rather not share the link online right now. I'll drop you an email at the address on your website, if that's OK?

Thank you!

Caroline

carolinebeavon

Jul 7, 2011, 6:49:24 AM
to google...@googlegroups.com
Just a quick update: that version of Refine, plus working in 5000-row batches with the facet, worked perfectly!

Thank you for your assistance!

C

David Huynh

Jul 7, 2011, 1:26:56 PM
to google...@googlegroups.com
BTW, just for clarification, this batching trick is only necessary for features involving calling out to some external services, like reconciliation. For other features that deal with the data locally, there's no need to batch, even for 100,000 rows or more.

David