What is the suggested way to create datasets from a CSV file and to do partial synchronization?


Ladislav Nesnera

Jul 31, 2017, 9:42:22 PM
to CKAN Global User Group (Non-technical questions)
We have two use cases:
  • Dozens of datasets are described in a spreadsheet (xls, csv). Creating them individually by hand is a little bit irritating.
  • One CKAN instance contains all our datasets; a second one contains only the open data subset (tagged "open data" in the primary catalog).
What is the suggested way to create datasets from a CSV file and to publish the open data to the second server?

Thank you in advance for your recommendation.

Florian May

Jul 31, 2017, 10:38:26 PM
to ckan-global...@googlegroups.com
Hi Ladislav,

My gut feeling is that the most flexible yet maintainable way for you might be a script in your favourite language rather than using / customising existing harvesting infrastructure (like ckanext-harvest) to do the same.
I found my sweet spot with ckanapi and IPython notebooks - see what you can re-use: https://github.com/datawagovau/harvesters
Alternatively, R, ckanr and RMarkdown workbooks are able to do the same, but Python's list comprehensions are just a joy to work with.
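To make the script idea concrete for the first use case, here is a minimal sketch rather than a finished tool. It assumes the spreadsheet is exported as CSV with columns named title, description and tags (tags separated by ";"); those column names, the helper names and the example URL are illustrative assumptions, not something from the repo linked above. The `ckan` argument is expected to be a ckanapi `RemoteCKAN` object.

```python
import csv
import re


def slugify(title):
    """Build a CKAN-style dataset name: lowercase alphanumerics, '-' and '_', max 100 chars."""
    slug = re.sub(r"[^a-z0-9_-]+", "-", title.lower()).strip("-")
    return slug[:100]


def rows_to_datasets(csv_lines, owner_org):
    """Turn CSV lines (an open file or any iterable of strings) into package_create payloads.

    The column names 'title', 'description' and 'tags' are assumptions about
    the spreadsheet layout; adjust them to match the real header row.
    """
    datasets = []
    for row in csv.DictReader(csv_lines):
        raw_tags = row.get("tags") or ""
        tags = [{"name": t.strip()} for t in raw_tags.split(";") if t.strip()]
        datasets.append({
            "name": slugify(row["title"]),
            "title": row["title"],
            "notes": row.get("description") or "",
            "owner_org": owner_org,
            "tags": tags,
        })
    return datasets


def create_datasets(ckan, datasets):
    """Create each dataset through the CKAN action API."""
    for dataset in datasets:
        ckan.action.package_create(**dataset)


# Example wiring (needs `pip install ckanapi` and a real API key):
#
#   from ckanapi import RemoteCKAN
#   ckan = RemoteCKAN("https://ckan.example.org", apikey="...")
#   with open("datasets.csv", newline="") as f:
#       create_datasets(ckan, rows_to_datasets(f, "my-organization"))
```

Keeping the CSV parsing separate from the API calls means the payloads can be inspected in a notebook before anything is sent to the server.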


ckanext-harvest is a very elegant way to automate / schedule very simple jobs (for which harvest has a configuration), but customising ckanext-harvest for your specific use case (with proper handling of all edge cases) will require the above work plus modifying the actual extension (adding your own custom harvester). My use case was too one-off (it only ran a few times in test/UAT, then in prod) to warrant that kind of sophistication.

Hope that helps!
Florian



--
You received this message because you are subscribed to the Google Groups "CKAN Global User Group (Non-technical questions)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ckan-global-user-group+unsub...@googlegroups.com.
To post to this group, send email to ckan-global-user-group@googlegroups.com.
Visit this group at https://groups.google.com/group/ckan-global-user-group.
To view this discussion on the web, visit https://groups.google.com/d/msgid/ckan-global-user-group/2b45fe69-8831-40d0-9072-d0ccdb01c448%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Steven De Costa

Aug 1, 2017, 12:10:02 AM
to ckan-global...@googlegroups.com

We built that to help on a use case where custodians wanted to upload and import a dataset with its resources without having to enter all the metadata via the UI. It doesn't handle bulk creation of datasets from a single Excel file, but take a look and see if it might help in your case :)

Cheers,
Steven

Steven De Costa

Aug 1, 2017, 12:15:12 AM
to ckan-global...@googlegroups.com
Sorry, for the second use case we'd use ckanext-syndicate and push from the first CKAN into the second :)

You'll find the forked repo we use here: https://github.com/DataShades/ckanext-syndicate
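For comparison, the same push idea can also be sketched as a standalone script. This is not how ckanext-syndicate works internally; it is a rough illustration using ckanapi, where `source` and `target` are assumed to be `RemoteCKAN` objects, the helper names are hypothetical, and only a handful of core fields are copied (extras and custom schema fields are left out).

```python
def iter_tagged(source, tag, page_size=100):
    """Yield every dataset on `source` that carries `tag`, paging through package_search."""
    start = 0
    while True:
        page = source.action.package_search(
            fq='tags:"%s"' % tag, rows=page_size, start=start)
        results = page["results"]
        if not results:
            break
        for dataset in results:
            yield dataset
        start += len(results)


def to_target_payload(dataset, target_org):
    """Copy a small whitelist of fields, dropping ids generated by the source instance."""
    return {
        "name": dataset["name"],
        "title": dataset.get("title", ""),
        "notes": dataset.get("notes", ""),
        "license_id": dataset.get("license_id", ""),
        "owner_org": target_org,  # organization on the target, assumed to exist already
        "tags": [{"name": t["name"]} for t in dataset.get("tags", [])],
        "resources": [
            {"url": r["url"], "name": r.get("name", ""), "format": r.get("format", "")}
            for r in dataset.get("resources", [])
        ],
    }


def sync_tagged(source, target, tag, target_org):
    """Create or update each tagged dataset on the target instance."""
    from ckanapi import NotFound  # imported lazily so the helpers above run without ckanapi

    for dataset in iter_tagged(source, tag):
        payload = to_target_payload(dataset, target_org)
        try:
            existing = target.action.package_show(id=payload["name"])
        except NotFound:
            target.action.package_create(**payload)
        else:
            payload["id"] = existing["id"]
            target.action.package_update(**payload)


# Example wiring (URLs and keys are placeholders):
#
#   from ckanapi import RemoteCKAN
#   source = RemoteCKAN("https://catalog.example.org")
#   target = RemoteCKAN("https://opendata.example.org", apikey="...")
#   sync_tagged(source, target, "open data", "open-data-org")
```

A script like this has to be re-run (for example from cron) to pick up changes, and it does not remove datasets from the second server once they lose the tag.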

Cheers,
Steven

STEVEN DE COSTA | EXECUTIVE DIRECTOR
www.linkdigital.com.au

   


Ladislav Nesnera

Aug 1, 2017, 6:53:34 PM
to CKAN Global User Group (Non-technical questions)
Guys, thanks a lot for the very inspiring posts.