Is OpenRefine hard on RAM/memory?

Andrea Zanni

Aug 8, 2016, 10:33:19 AM
to openr...@googlegroups.com
Hello everyone, 
sorry for the dumb question:
I have to buy a new laptop, and one of the factors I need to consider is how powerful it should be for working with large projects in OpenRefine.
I usually work with projects of tens of thousands of rows, but I'd love to have a machine that can scale up smoothly by an order of magnitude (in rows).

I was thinking about an i5 CPU + 8 GB RAM, but I don't know whether I also need an SSD instead of a normal HDD.

Thanks!

Andrea



Joe Wicentowski

Aug 8, 2016, 9:25:40 PM
to openr...@googlegroups.com
Not a dumb question at all! I think OR benefits from both lots of RAM and fast disk access (SSD). I'd suggest maxing out the RAM: go for 16 GB if available, and definitely no less than 8 GB. OR files aren't huge, so I don't think storage size is a major consideration, but disk access is much faster with an SSD than with an HDD, so I'd go for the largest SSD within your budget.

Sent from my iPhone




--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Andrea Zanni

Aug 9, 2016, 3:45:57 AM
to openr...@googlegroups.com
Thanks!
I must say I was not sold on SSDs, but everyone kept telling me they're very good, so I changed my mind.
Happy that the OR community thinks the same: OR is definitely the most resource-expensive software I use.

Andrea


Joe Wicentowski

Aug 12, 2016, 9:13:34 AM
to OpenRefine
Andrea,

I take it from the lack of response from others here that my advice is not controversial, and I hope you'll be pleased with your purchase. OR aside, SSDs give an incredible boost to performance, and plenty of RAM gives your work plenty of headroom. See this article on configuring OR for large datasets: https://github.com/OpenRefine/OpenRefine/wiki/FAQ:-Allocate-More-Memory. Note the suggestion that you use a 64-bit edition of Java. This also requires a 64-bit operating system; most are these days, but it's worth checking anyway.
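If you want to check the 64-bit Java requirement from a script, here is a minimal Python sketch. It assumes a HotSpot-based JVM (such as Oracle Java or OpenJDK), which lists a `sun.arch.data.model` property (32 or 64) when run with `-XshowSettings:properties`; other JVMs may not report it, in which case the sketch returns `None`. The function names are hypothetical. Since a 64-bit JVM only runs on a 64-bit OS, a result of 64 also settles the operating system question.

```python
import re
import subprocess
from typing import Optional


def java_data_model(settings_output: str) -> Optional[int]:
    """Extract 32 or 64 from `java -XshowSettings:properties -version` output."""
    match = re.search(r"sun\.arch\.data\.model\s*=\s*(\d+)", settings_output)
    return int(match.group(1)) if match else None


def installed_java_data_model(java: str = "java") -> Optional[int]:
    """Run the JVM and parse its data model; None if java is missing or silent."""
    try:
        result = subprocess.run(
            [java, "-XshowSettings:properties", "-version"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return None
    # HotSpot prints these settings on stderr; include stdout just in case.
    return java_data_model(result.stderr + result.stdout)


if __name__ == "__main__":
    model = installed_java_data_model()
    if model is None:
        print("Could not determine the Java data model (is java on the PATH?)")
    else:
        print(f"Java reports a {model}-bit data model")
```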

Joe

Owen Stephens

Aug 12, 2016, 9:47:55 AM
to OpenRefine
Nothing controversial, but I'd just flag a couple of things around increased memory:

1) You need to make sure OR is able to make use of the memory available - see https://github.com/OpenRefine/OpenRefine/wiki/FAQ:-Allocate-More-Memory
2) I wouldn't be confident that increasing the memory allocation will let you scale up smoothly by an order of magnitude (in rows). Increasing memory definitely helps, but OR just isn't designed for dealing with millions of rows and may struggle with projects of hundreds of thousands of rows. Note that it isn't just about the number of rows but also the number of columns (try importing a project with many blank columns and you'll see it causes problems). Also note this thread https://groups.google.com/forum/#!searchin/openrefine/memory%7Csort:relevance/openrefine/FEn0Wm5rBnk/4e-z6QCOyp4J where a user reports that, despite the additional memory allocation, OR didn't seem to use the memory allocated to it (unfortunately the thread didn't resolve why this happened).
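On the first point, the usual way to let OpenRefine use more memory is to pass a larger maximum heap size to its launcher. The Python sketch below only prints a launch command; it assumes the Linux/macOS `refine` script accepts `-m <memory>` (check the Allocate More Memory FAQ for your version and platform), and the "half of physical RAM" default is a hypothetical rule of thumb, not an official recommendation. The helper names are illustrative.

```python
import os
import shlex
from typing import List


def physical_memory_bytes() -> int:
    """Total physical RAM in bytes; uses os.sysconf, so POSIX only (Linux, macOS)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def heap_size_arg(total_bytes: int, fraction: float = 0.5) -> str:
    """Format a JVM-style heap size in megabytes, e.g. '8192M'.

    The default of half the RAM is an assumed rule of thumb that leaves the
    rest for the operating system and other applications.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    megabytes = int(total_bytes * fraction) // (1024 * 1024)
    return f"{megabytes}M"


def refine_command(heap: str, launcher: str = "./refine") -> List[str]:
    """Build the launch command, assuming the launcher accepts `-m <memory>`."""
    return [launcher, "-m", heap]


if __name__ == "__main__":
    heap = heap_size_arg(physical_memory_bytes())
    # Print rather than run, so the command can be checked before use.
    print(shlex.join(refine_command(heap)))
```

On a 16 GB machine this prints something like `./refine -m 8192M`; the exact figure depends on how much RAM the OS reports. As the thread linked above shows, it is still worth confirming that OpenRefine actually uses the larger allocation.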

So, overall, I'd go along with Joe's recommendations, but be aware that you'll still hit limits on the amount of data OR can comfortably handle, and you may not get the order-of-magnitude scaling you expect from doubling or even tripling the memory allocation.
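Owen's point about blank columns suggests trimming a file before importing it. Here is a minimal Python sketch, using only the standard `csv` module, that copies a CSV while omitting columns whose data cells are all empty or whitespace. The function name is hypothetical, and it reads the whole file into memory, which is fine for tens of thousands of rows but worth revisiting for much larger files.

```python
import csv
import io
from typing import List, TextIO


def drop_blank_columns(src: TextIO, dst: TextIO) -> List[str]:
    """Copy CSV rows from src to dst, omitting columns with no non-blank data cell.

    The first row is treated as the header; returns the names of dropped columns.
    """
    rows = list(csv.reader(src))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    width = max(len(row) for row in rows)
    # Keep a column if at least one data row has a non-whitespace value in it.
    keep = [
        i for i in range(width)
        if any(i < len(row) and row[i].strip() for row in body)
    ]
    kept = set(keep)
    dropped = [
        header[i] if i < len(header) else f"column {i + 1}"
        for i in range(width) if i not in kept
    ]
    writer = csv.writer(dst)
    for row in rows:
        # Pad short rows so every output row has the same columns.
        writer.writerow([row[i] if i < len(row) else "" for i in keep])
    return dropped


if __name__ == "__main__":
    sample = io.StringIO("id,notes,name\n1,,Ada\n2, ,Grace\n")
    out = io.StringIO()
    print("dropped:", drop_blank_columns(sample, out))
    print(out.getvalue())
```

For real files, open both with `newline=""` (as the `csv` module documentation recommends) and an explicit encoding, then import the trimmed copy into OpenRefine.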

Owen


On Friday, August 12, 2016 at 2:13:34 PM UTC+1, Joe Wicentowski wrote:
Andrea,

I take it from the lack of response from others here that my advice is not controversial, but I hope you'll be pleased with your purchase.  OR aside, SSDs are an incredible boost to performance, and plenty of RAM gives your work plenty of overhead.  See this article on configuring OR for large datasets: https://github.com/OpenRefine/OpenRefine/wiki/FAQ:-Allocate-More-Memory.  Note the suggestion that you use a 64-bit edition of Java.  This also means having a 64-bit operating system; most are these days, but it's worth checking anyway.

Joe
On Tue, Aug 9, 2016 at 2:45 AM, Andrea Zanni <zanni.a...@gmail.com> wrote:
Thanks!
I must say I was not sold on SSD, but everyone kept me saying that I very good so I changed my mind. 
Happy that the OR community thinks the same: OR is definitely the most resource-expensive software I use.

Andrea
On Tue, Aug 9, 2016 at 3:25 AM, Joe Wicentowski <joe...@gmail.com> wrote:
Not a dumb question at all! I think OR benefits from both lots of RAM and fast disk access (SSD). I'd suggest maxing out the RAM, going for 16 GB if available, but definitely no less than 8. OR files aren't huge, so I don't think storage size is a.major consideration, but disk access speed is much better with SSD than HD, so I'd go for the largest SSD within your budget.

Sent from my iPhone




On Mon, Aug 8, 2016 at 10:33 AM -0400, "Andrea Zanni" <zanni.a...@gmail.com> wrote:

Hello everyone, 
sorry for the dumb question:
I have to buy a new laptop and one of the factor I need considering is how powerful should it be for working with openrefine with large projects.
I ususally work with projects with tens of thousands of rows, but I'd love to have a machine capable to scale up smoothly of an order of magnitude (of rows).

I was thinking about i5 CPU + 8 gb RAM, but I don't know if I need also a SDD instead of a normal HDD.

Thanks!

Andrea



--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.

Thad Guidry

Aug 12, 2016, 10:02:44 AM
to openrefine
If you're working with incredibly large data (or think you might be) and you need the fastest storage available, the current options are:

1. Lots of RAM, with a RAM drive set up on it (using an OS that supports your expected maximum RAM size)
2. NVMe storage (NVMe-capable drive(s) and connectors, such as the U.2 form factor, with motherboard or PCIe support for NVMe) https://en.wikipedia.org/wiki/NVM_Express
3. Another option is looking into other technologies that scale a bit better than our OpenRefine, such as:
