Supporting scientific workflows


Thad Guidry

May 28, 2021, 3:43:01 PM
to openref...@googlegroups.com
I was thinking the other day[1] about how we might keep making OpenRefine usable as a step in larger scientific workflows. I've read plenty of books on workflow management and used many workflow management tools, but I wanted to see what scientific workflows look like in real practice and what their particular needs tend to be.

A few folks mentioned Steep by Michel Krämer
Steep additionally provides some nice outlines and research work surrounding scientific workflows.
There's a lot here that we could borrow from, like workflow state, variables, arguments, policies, etc.
One of the more interesting ideas I had was to look at how Abstract Wikipedia is approaching things with an Orchestrator and Evaluator.
Here we could imagine an Orchestrator as a service that orchestrates a Process Chain in Steep.
For OpenRefine, we might think of our History of Changes as being a Process Chain itself in Steep. "A process chain is a sequential list of instructions that will be sent to Steep’s remote agents to execute processing services in a distributed environment"
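
To make the analogy a bit more concrete, here is a rough Python sketch that takes an operation history as exported from OpenRefine's Undo/Redo tab (a JSON array where each entry has an "op" field) and wraps it as a sequential chain. The `ProcessChain` and `Executable` classes and their fields are made up for illustration and are not Steep's actual data model; the example history is also abbreviated compared to what OpenRefine really exports.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Executable:
    """One step of the chain (hypothetical shape, not Steep's schema)."""
    id: str
    service: str                      # here: the OpenRefine operation id
    arguments: dict = field(default_factory=dict)


@dataclass
class ProcessChain:
    """An ordered list of executables, run one after the other."""
    executables: list


def history_to_chain(history_json: str) -> ProcessChain:
    """Wrap each operation of an exported history as an Executable, in order."""
    operations = json.loads(history_json)
    executables = []
    for index, operation in enumerate(operations):
        # Everything except the operation id and its description is treated
        # as an argument of the step.
        arguments = {
            key: value
            for key, value in operation.items()
            if key not in ("op", "description")
        }
        executables.append(
            Executable(id=f"step-{index}", service=operation["op"], arguments=arguments)
        )
    return ProcessChain(executables)


# Abbreviated example history; real exports carry more fields per operation.
EXAMPLE_HISTORY = """[
  {"op": "core/text-transform", "description": "Trim whitespace",
   "columnName": "name", "expression": "grel:value.trim()"},
  {"op": "core/column-removal", "description": "Remove column notes",
   "columnName": "notes"}
]"""

chain = history_to_chain(EXAMPLE_HISTORY)
for executable in chain.executables:
    print(executable.id, executable.service)
```

The point is only that the history is already an ordered list of self-describing steps, which is about all a process chain needs to be.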

In OpenRefine, we currently have essentially only one kind of processing service: our OpenRefine concept of an Operation.
It might make sense to eventually allow many different kinds of processing services. I think that was something Antonin hinted at as a possible next step in our grandiose vision of a new architecture, but it has not yet landed in master and is not fully baked or thought through.

Anyways, I don't want to dive too deeply into scientific workflows or FAIR practices, but just generally acknowledge that the Workflow ideas that Antonin suggested in his paper make sense against the idea of a Process Chain and its Executable (essentially an OpenRefine Operation). Some of our Operations are in fact Process Chains with Service metadata themselves (we call them Long Running Operations), such as Reconciling, Fetch URLs, etc.: the things that require a retry policy, some runtime, some runtime arguments, some parameters, etc.
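
As a sketch of what "Service metadata" such as a retry policy could mean for a Long Running Operation, here is a small Python example. `RetryPolicy`, `run_with_retry`, and the simulated fetch are all hypothetical names invented for this illustration; none of this is existing OpenRefine or Steep code.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetryPolicy:
    """Hypothetical metadata attached to a long-running operation."""
    max_attempts: int = 3
    delay_seconds: float = 0.0


def run_with_retry(operation: Callable[[], str], policy: RetryPolicy) -> str:
    """Call the operation until it succeeds or the policy is exhausted."""
    last_error = None
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return operation()
        except Exception as error:
            last_error = error
            # Wait before the next attempt, but not after the final one.
            if attempt < policy.max_attempts:
                time.sleep(policy.delay_seconds)
    raise RuntimeError(f"gave up after {policy.max_attempts} attempts") from last_error


def make_flaky_fetch(failures_before_success: int) -> Callable[[], str]:
    """Build a stand-in for something like Fetch URLs that fails a few times first."""
    remaining = {"count": failures_before_success}

    def fetch() -> str:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise ConnectionError("simulated timeout")
        return "fetched"

    return fetch


result = run_with_retry(make_flaky_fetch(2), RetryPolicy(max_attempts=3))
print(result)
```

Keeping the policy as data next to the operation, rather than hard-coding it inside the operation, is what would let an orchestrator apply the same retry logic to Reconciling, Fetch URLs, or any other service.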

[1] around grant RFPs among other things, since money is always a good motivator for development recruiting :-)

Thanks for your attention,

LAN LI

May 28, 2021, 4:49:40 PM
to OpenRefine Development
Thanks for sharing, this sounds great!

Could you please point me to the paper you mentioned ("just generally acknowledge that the Workflow ideas that Antonin suggested in his paper make sense")? Thanks a lot!

Cheers,
Lan

Thad Guidry

May 28, 2021, 5:38:35 PM
to openref...@googlegroups.com
Hi Lan!

Antonin's ORCID profile is here: https://orcid.org/0000-0002-8612-8827
which lists his papers. The one I was referring to can be viewed here: https://arxiv.org/abs/1906.05937v2