Large dataset advice

Lalitha Rajagopalan

Apr 12, 2021, 3:11:42 PM
to OpenRefine
Hi,

Would it be possible to use OpenRefine to reconcile large datasets of entity data (see the list below)? We are trying to see whether we can store the data in AWS S3 and use DynamoDB/Elasticsearch with OpenRefine, and possibly Kibana.

LEI Entity Information - 1.85 million records
SAM Entity Data - 7 million records
Open Ownership - 20 million BODS statements, in a 10 GB JSON Lines file

Fran Parras

Apr 13, 2021, 5:27:31 AM
to openr...@googlegroups.com
It should be possible. You need to think about how you want to work, but if you can build the recipe on a decent sample of the data and then operationalise it, that should work well. Plan around that approach, and also be prepared to create your own reconciliation services if the ones you need are not among the existing open source services.
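
One way to get a decent sample without loading a whole file into memory is reservoir sampling. This is only a sketch: the function name, the sample size and the file name in the usage comment are my own assumptions, not something OpenRefine provides.

```python
import random
from typing import Iterable, List


def reservoir_sample(lines: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Pick up to k lines uniformly at random in a single pass (Algorithm R)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            sample.append(line)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = line
    return sample


# Hypothetical usage (file names are placeholders):
# with open("statements.jsonl", encoding="utf-8") as src:
#     rows = reservoir_sample(src, 50_000)
# with open("sample.jsonl", "w", encoding="utf-8") as dst:
#     dst.writelines(rows)
```

The sample file can then be loaded into OpenRefine to build the recipe, and the resulting operation history applied to the full data later.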

Also, I would recommend converting the last dataset from JSON to a flat file to reduce data ingress and egress.
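
A rough sketch of that conversion, streaming line by line so the 10 GB file never has to fit in memory. The helper names and the column names in the usage comment are hypothetical; you would pick the columns you actually need from your data.

```python
import csv
import json
from typing import Any, Dict, Iterable, List, TextIO


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested objects into dotted column names, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        elif isinstance(value, list):
            # Keep lists as a JSON string so each record stays on one row.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def jsonl_to_csv(src: Iterable[str], dst: TextIO, columns: List[str]) -> int:
    """Write the chosen columns of each JSON Lines record to CSV; return the row count."""
    # Columns missing from a record are written as empty cells;
    # fields not listed in `columns` are dropped.
    writer = csv.DictWriter(dst, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for line in src:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        writer.writerow(flatten(json.loads(line)))
        count += 1
    return count


# Hypothetical usage (file and column names are placeholders):
# with open("statements.jsonl", encoding="utf-8") as src, \
#         open("statements.csv", "w", newline="", encoding="utf-8") as dst:
#     jsonl_to_csv(src, dst, ["id", "type", "name"])
```

Keeping only the columns you need for reconciliation is where most of the size reduction would come from.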
