Validator updates and a developer meeting (Thursday 19th, 8AM PDT / 11AM EDT / 5PM CEST)


Melissa Kline

Sep 16, 2019, 12:11:34 PM
to Psych-Data-Standards
Dear all - I hope your semesters/quarters (as relevant) are starting off well! Our big hurdle for Psych-DS right now is the validator - ensuring we have working code, aligning it with final decisions on the specification, and then testing the heck out of it!

Felix Henninger and Alexander Hart have been hard at work on a validator and have gotten quite a ways - it's time to regroup with anyone who is interested in being involved in this process to see what remains to be done, what edge cases need testing, and how this code should integrate with/be used by the dataset creators and other coding projects that have been going on. 

(Felix/Alexander, will you post the update you sent me & links to this thread before the meeting so people can take a look at your work?)

We'll be having an initial call to take stock this Thursday, September 19th at 8AM PDT / 11AM EDT / 5PM CEST. Since this is short notice, we will also plan on having more regular developer meetings going forward, but I hope this can start to get a larger group on the same page!

If you cannot make it to the meeting but want to be involved in coding or testing for the validator, please shoot me a note so I make sure you have access to the necessary repositories! 

Here's the zoom link we'll use for the meeting: https://zoom.us/j/7464261529


Hope to talk to many of you on Thursday!

- Melissa

 

Felix Henninger

Sep 16, 2019, 6:30:17 PM
to psych-data...@googlegroups.com, Alexander Hart

Hi Melissa, hej everyone,

thanks a lot for the shout-out! We fondly remember our discussions back at SIPS, where a few of us got together and debated how to approach building a Psych-DS validator, and even started building a prototype. Fast-forward a long trip home and some rainy afternoons, and we (Alexander and Felix) found ourselves with something we think might be workable, and we’d like to present it to you all, with the very warm invitation to take a look, critique our prototype, and of course join forces!


Status quo

Right now, we have a fairly solid validator that works in the browser as well as in R. We’ve also begun work on a command-line client (works in principle, needs better UI), and Alexander has started building a downloadable stand-alone application based on Electron.

All of these work using the same core code and support the following checks:

  • Alphanumeric file names
  • UTF-8 file encoding
  • Presence and validity of JSON metadata files
  • Adherence of metadata files to the schema
  • TSV file formatting
  • Correspondence between metadata and TSV file content (matching column names)
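To give a flavor of the kind of logic these checks involve, here is a minimal sketch of the filename check in plain JavaScript (the validator's language). The exact character set Psych-DS allows is defined by the spec and the prototype; the regex below is our assumption for illustration only:

```javascript
// Sketch of a filename check, for illustration only: the actual rule
// set lives in the spec and the validator prototype. Here we assume
// letters, digits, dots, underscores and hyphens are allowed.
function isValidFilename(name) {
  return /^[A-Za-z0-9._-]+$/.test(name);
}

console.log(isValidFilename('study-1_data.tsv')); // true
console.log(isValidFilename('study 1 data.tsv')); // false (contains spaces)
```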

Do check it out! We have a series of broken test cases for you to try, and we’ve updated the example datasets so that they work with the validator.

For the impatient among you, here’s what it looks like:



Design goals

At SIPS, we had a lively discussion about how to realize the validator, and how to trade off different goals (this is from our memory, please feel free to correct us):

  • On one hand, we all saw that a no-installation, in-browser validator would be the most widely usable and user-friendly option. At the same time, given that many of us (ourselves included) are R users, an R integration/plugin would be nice, and Ian Hussey has already demonstrated how awesome a (local) Shiny UI for validation would be. However, we also agreed that it would not be ideal to build and support multiple validators with possibly divergent features.
  • Following the example of BIDS, whose validator is largely based on a machine-readable set of rules for file paths, we agreed that the validation logic should be re-usable to the greatest possible extent. However, we quickly determined that the Psych-DS validation logic would need to go far beyond file paths, so it was unclear whether we would be able to use the same design.

At the end of the hackathon, we hadn’t resolved these conflicting requirements, and while a few of us had started an in-browser prototype, there wasn’t a broad consensus on how to go forward.

Over the following weeks, however, we discovered that these goals weren’t quite as incompatible as we had originally thought. Specifically, we found that we could reuse the JavaScript logic from a browser-based validator in R thanks to the awesome V8 package, which embeds Chrome’s JS engine in R, and this is what we’ve done. We also found that we could express the JSON-LD metadata format using the JSON schema standard, for which validators exist for many different programming languages (not for R though, at least not natively).
So right now, both the online validator and the R package share the same validation code, and provide an independent interface on top of it. In the long run, adding a GUI-based Electron app as a further interface would allow for easy deployment outside of R and would also remove the need for an internet connection.
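To make the JSON Schema idea concrete, here is a toy version of a required-properties check. Real JSON Schema validators do far more (types, nesting, formats), and the property names below are our assumptions, not the actual Psych-DS schema:

```javascript
// Toy illustration of a JSON-Schema-style check: verify that required
// metadata properties are present. Real JSON Schema validators handle
// much more; the property names here are assumptions for illustration.
const schema = { required: ['name', 'description', 'variableMeasured'] };

function missingProperties(metadata, schema) {
  return schema.required.filter((key) => !(key in metadata));
}

const metadata = { name: 'Example study', description: 'Reaction times' };
console.log(missingProperties(metadata, schema));
// → [ 'variableMeasured' ]
```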


Next steps

We’re new to your project, and so the first thing we’d want to confirm is a clear community consensus that this is a useful approach in all of your eyes – we wouldn’t want to push things further without making sure we’re aligned with the project as a whole. Though we would love to continue development with the community, neither of us can afford to keep building the project single-handedly ad infinitum, so we will also depend on your support.

Thankfully, we think there are lots of ways in which we can all join together. Here are some:

  • The validator is still a bit rough around the edges, and would benefit massively from feedback and testing. If you have a Psych-DS compliant dataset or are building one, please give the validator a spin!
  • Translating the specification into checks, and discussing whether our checks correspond to the spec: We haven’t always found this easy. For example, do filename and encoding checks apply only to data and JSON files, or to everything in the project folder? Should we ignore CSV files in the raw_data directory, or should we complain about them? Quite probably you have answers to all of these questions, but this is something we’d be thrilled to discuss, and maybe there’s also room for improving the spec at the same time.
  • Implementing new checks, for example for validating data file content (column data types), and fixing/adapting anything we find through testing and discussions.
  • UI improvements: There’s a lot to do here, and especially the R package output isn’t yet as exhaustive as that of the web-based UI. It’s definitely worth exploring whether we can use Ian’s prototype to create an even better R integration (maybe an RStudio plugin to validate the current project?), and as mentioned above, a stand-alone desktop app is also a possibility. There's also a lot of room for exploration there: Which functionality would you like to see in a desktop/CLI-based validator? For example, we could include an offline version of the spec documentation, or update/fix files based on the validation results.
  • Technical improvements: The validators aren’t yet as fast as they could be, and especially the file encoding detection is fairly resource-intensive. This is a symptom of the current bare-bones implementation, and something we know how to improve. We’ve also discussed several designs for the validator internals that we’d like to implement long-term. Finally, we’d like to add automated testing to make continued development easier.
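To sketch what one of the proposed new checks might look like — validating column data types against the metadata — here is a hypothetical example; the function name and metadata layout are our assumptions, not the prototype's API:

```javascript
// Hypothetical sketch of a possible new check: verify that every value
// in a TSV column parses as the type declared in the metadata.
// The data layout and names here are assumptions for illustration.
function checkColumnTypes(rows, columnTypes) {
  const problems = [];
  rows.forEach((row, i) => {
    for (const [column, type] of Object.entries(columnTypes)) {
      const value = row[column];
      if (type === 'number' && Number.isNaN(Number(value))) {
        problems.push(`row ${i + 1}: "${value}" in column "${column}" is not a number`);
      }
    }
  });
  return problems;
}

const rows = [
  { participant: 'p01', rt: '532' },
  { participant: 'p02', rt: 'fast' }, // invalid: not numeric
];
console.log(checkColumnTypes(rows, { rt: 'number' }));
// → [ 'row 2: "fast" in column "rt" is not a number' ]
```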

For those of you who are interested, but uncertain about programming JS, we would be glad to show you around the code, and would be thrilled to give an introduction to the programming language (we hope you’ll find that it isn’t that different from R). There's also a lot to do in pure R, so for the purists among you, please don't be deterred :-) .


Ok, that’s it for now! Sorry for the wall of text. We would love to hear your thoughts, and hope to chat with many of you soon. Best,


Alexander & Felix


Felix Henninger

Sep 17, 2019, 3:38:32 PM
to Felix Schönbrodt, psych-data...@googlegroups.com, Alexander Hart

Hej Felix, hi everyone,

thanks a lot for your kind message, it was fun working on this, and I'm looking forward to having you on board!

Is the issue tracker at https://github.com/psych-ds/validator-prototype/issues your preferred place for feedback?
I can't think of a better one for now (Melissa, can you?), so as far as I'm concerned, please do leave us some issues!

Kind regards, and talk to y'all soon,


-Felix

On 17/09/2019 14:58, Felix Schönbrodt wrote:

Hi guys,

you did an impressive job - thanks a lot for that!
It's very helpful that you found a way to implement a common code base for the validator, and I already like the solution a lot.

I am happy to contribute, both in R and, with my limited knowledge, also in JS.

Playing around a bit with the validator, I also agree with you that this is a good opportunity for a refinement of both the validator and the spec; I started to leave some comments in the spec Google doc.

Is the issue tracker at https://github.com/psych-ds/validator-prototype/issues your preferred place for feedback?

Best,
Felix


Melissa Kline

Sep 17, 2019, 3:41:04 PM
to Felix Henninger, Felix Schönbrodt, Psych-Data-Standards, Alexander Hart
That makes sense to me, though if the questions have to do with e.g. the tech spec not making clear what the validator *should* do, we need to fix up both the doc and the code. I'll make sure we save some time on Thursday to talk over how we want to stay organized and let new folks know where to leave issues!

-m

Melissa Kline

Sep 19, 2019, 9:34:48 AM
to Psych-Data-Standards
A reminder that the developer meeting is today (in about 90 minutes). I get access to my meeting room right on the hour, so I'm nudging the start time to (your local hour) + 10 minutes. Here are the call details again:
And here is a link to the meeting minutes - all attendees should plan on accessing this document during the call, and we'll also use it as a way to track decisions/let other folks know what we get up to!

- Melissa

Melissa Kline

Sep 19, 2019, 11:48:53 AM
to Psych-Data-Standards
And here is the new link!

