Hi Philip,
that's quite interesting; however, there are many details in the
project that are unclear to me (maybe due to the terminology: I am
familiar with Dataverse and Spark, but not with OpenShift and
Kubernetes).
I am working on a (meta)data quality measurement framework, partly
based on Spark. I can imagine that this Boston University project
could serve as a connector between Dataverse and Spark. Would it be
possible to use it without OpenShift, and with all the languages Spark
supports?
In the
https://github.com/dataverse-broker/sample-dataverse-app/blob/master/spark_wordcount.py
file (the Spark wordcount implementation), the details of the
connection are hidden behind the Spark API:
rdd = sc.textFile(filename)
If I am not mistaken, the "filename" parameter is actually a URL of
the Dataverse API for retrieving a file.
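Just to check my understanding, here is roughly how I would call it
from my framework (only a sketch; the server and file id are made up,
though I believe /api/access/datafile/{id} is the Dataverse data
access endpoint, and I download the file first because I am not sure
textFile() can resolve a plain HTTP URL):

    import urllib.request
    from pyspark import SparkContext

    # Hypothetical Dataverse server and file id, for illustration only
    url = "https://demo.dataverse.org/api/access/datafile/42"
    local_path = "/tmp/datafile.txt"
    urllib.request.urlretrieve(url, local_path)

    sc = SparkContext(appName="dataverse-wordcount")
    counts = (sc.textFile(local_path)
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
    print(counts.take(10))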
Do you know how much of the Dataverse API is implemented? Can one
access metadata as well?
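For the (meta)data quality measurements I would need something like
this (again only a sketch with a made-up DOI; I am assuming the native
API serves dataset metadata as JSON under /api/datasets/:persistentId
and that the response nests metadata blocks under "latestVersion"):

    import json
    import urllib.request

    # Hypothetical dataset DOI, for illustration only
    doi = "doi:10.5072/FK2/EXAMPLE"
    url = ("https://demo.dataverse.org/api/datasets/:persistentId/"
           "?persistentId=" + doi)
    with urllib.request.urlopen(url) as response:
        dataset = json.load(response)

    # Walk the citation metadata block, e.g. to check field completeness
    blocks = dataset["data"]["latestVersion"]["metadataBlocks"]
    for field in blocks["citation"]["fields"]:
        print(field["typeName"])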
BTW: is there anybody else working on data quality in the Dataverse context?
Best,
Péter
--
Péter Király
software developer
GWDG, Göttingen - Europeana - eXtensible Catalog - The Code4Lib Journal
http://linkedin.com/in/peterkiraly