--
You received this message because you are subscribed to the Google Groups "Psych-Data-Standards" group.
To unsubscribe from this group and stop receiving emails from it, send an email to psych-data-stand...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/psych-data-standards/f89e4396-a6de-4b01-b62b-2d5d8c437551%40googlegroups.com.
Hi everyone,
Thanks for your awesome thoughts! Here are some more, less thought-out ones :-)
[Melissa: ...] there was a healthy contingent (including, I think, Tal Yarkoni?) who argued for the removal of *spec controlled* folder naming for anything other than raw/source/unmodified/unformatted data, on the grounds that the tools and validator only need to know about this folder (i.e. whatever the primary/formatted/unmodified folder, first in the reproducible analytic pipeline chain, is called, the validator will verify its contents).
For what it's worth, I've been thinking along similar lines: in my opinion, the metadata standard shouldn't be too strongly tied to the folder naming, and should (potentially) be able to exist separately. I don't think that means giving up on the folder structure, however (see below).
More specifically, I think it would be useful to make explicit in the metadata which files the information applies to, rather than leaving that implicit. For example, by default the top-level dataset_description.json would include an entry like "applies_to": "./raw_data/**/*_data.tsv", signalling that the metadata is valid for all files in raw_data ending in _data.tsv, subfolders included.
I would imagine that this would make it much easier for tools that only understand JSON-LD to deal with the data (surely there's a standard JSON-LD key for this already that I just haven't found yet?).
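To make the "applies_to" idea a bit more concrete, here is a minimal Python sketch of how a tool might resolve such a pattern to the files it covers. It assumes the value is a glob relative to the dataset root, with "**" meaning "this folder and all subfolders" as in Python's pathlib. The "applies_to" key is only the proposal above, not part of any existing standard, and the function name files_covered_by is hypothetical.

```python
from pathlib import Path


def files_covered_by(dataset_root, description):
    """Return the files (as POSIX paths relative to dataset_root) that a
    metadata record applies to, according to its "applies_to" glob.

    "applies_to" is the key proposed in this thread (hypothetical), not an
    existing Psych-DS or JSON-LD key.
    """
    root = Path(dataset_root)
    pattern = description["applies_to"]
    # pathlib patterns are already relative to root, so drop a leading "./".
    if pattern.startswith("./"):
        pattern = pattern[2:]
    # In pathlib, "**" matches the folder itself and all subfolders recursively.
    # Keep regular files only (a folder could also match "*_data.tsv").
    matches = {p for p in root.glob(pattern) if p.is_file()}
    return sorted(p.relative_to(root).as_posix() for p in matches)
```

For instance, in a dataset containing raw_data/b_data.tsv, raw_data/sub/a_data.tsv, raw_data/notes.txt and source/c_data.tsv, the pattern "./raw_data/**/*_data.tsv" would select only the first two files. Keeping the scope as an explicit pattern means a tool doesn't need to know any folder-naming convention to find out what the metadata describes.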
The counterargument to eliminating 'unmodified-formatted/primary-structured' IMO would be the role of Psych-DS in encouraging good practice and comparability across datasets.
I agree, and I don't think that's incompatible with the above. I personally wouldn't give up the project structure as part of the standard: in my view, it would also be great to (independently) enforce a folder structure, and only give the Psych-DS stamp to datasets that meet this part of the standard too.
Terminology is a *major* challenge here, and one of the main things that user-testers at SIPS struggled to understand (what goes in 'source'? what goes in 'raw'?). Even 'unmodified-formatted' (which I otherwise like) may, I suspect, seem contradictory to users encountering the folder in the wild.
Yeah, as a non-native speaker, I don't find the raw/source distinction intuitive, but I don't have better ideas (I like Ian's suggestion of "unmodified"; "unprocessed" might be another alternative?). I've added some comments to the doc to this end. I'm also with Felix S. in that I like the hierarchical structure with a top-level data directory.
OK, so much for my unprocessed raw thoughts, straight from the source, if you will 🙊
Kind regards, and have a great weekend, y'all!
-Felix
Hi all,
Apologies, I won’t make the dev call today but will try to join in future.
From a cursory reading of this thread, it seems to be a question of where Psych-DS sits in terms of “convention over configuration” (https://en.wikipedia.org/wiki/Convention_over_configuration). It would be nice to agree on a set of standard folder names, but doing so may be a barrier to entry for some adopters. I wonder if the spec could:
a) clearly define the discrete types of data (i.e. source vs raw etc.) and state which ones are relevant for validation purposes;
b) suggest a ‘reasonable convention’ of ‘sensible defaults’ for people who don’t have a strong preference (most newcomers may just follow this convention);
c) have a mechanism in the metadata file to ‘configure’ non-standard folder names, i.e. someone could designate a folder called “source” as corresponding to the Psych-DS notion of raw-formatted.
A validator could look first for the configuration and, failing that, check for the convention. Felix, I think this is similar to what you are suggesting with your "applies_to" property?
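The "configuration first, convention second" lookup could work roughly like the Python sketch below. Everything named here is an assumption for illustration: the "folder_roles" key in dataset_description.json, the role label "raw-formatted", the default folder name raw_data (the thread hasn't settled on one) and the function resolve_folder are all hypothetical.

```python
import json
from pathlib import Path

# Hypothetical conventional defaults ("sensible defaults" in b) above);
# the actual folder names are still under discussion.
CONVENTIONAL_FOLDERS = {"raw-formatted": "raw_data"}


def resolve_folder(dataset_root, role):
    """Find the folder playing a given data role: configuration first,
    convention second. "folder_roles" is a hypothetical metadata key."""
    root = Path(dataset_root)
    description_file = root / "dataset_description.json"
    configured = None
    if description_file.is_file():
        description = json.loads(description_file.read_text(encoding="utf-8"))
        configured = description.get("folder_roles", {}).get(role)
    # An explicit mapping in the metadata wins; otherwise fall back to the convention.
    name = configured if configured is not None else CONVENTIONAL_FOLDERS.get(role)
    if name is None:
        raise KeyError(f"no configured or conventional folder for role {role!r}")
    folder = root / name
    if not folder.is_dir():
        raise FileNotFoundError(f"expected folder {folder} for role {role!r}")
    return folder
```

With {"folder_roles": {"raw-formatted": "source"}} in the metadata, a validator would check the "source" folder. Without any mapping, it would expect the conventional folder. Newcomers who follow the convention never need to touch the configuration.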
Apologies, though, I haven’t really digested the issue enough to be sure this really makes sense, so feel free to disregard!
Best,
Eoghan
I guess I am on board with a mix of ideas … I like the idea of raw and source being in different folders, but I was 100% on board with everything Felix S. said: having numbers or alphabetical order is very appealing. I might agree with this because it mostly matches what I do.
If I think about an end user who is trying this for the first time, having a set of common rules would be very helpful. Oh, you want me to name it “source/swiss_cheese”? Great, I’ll do that. In my experience, people like black-and-white instructions when they are novices. See you guys in a few.
erin