Designing JSON Response for BIDS Data Sharing Between Systems


Daniel Duc Trinh

Mar 24, 2025, 10:50:18 AM
to bids-discussion

Hello,

I am currently trying to determine the appropriate response format for this case.

Let's say I have System A, which contains a large amount of BIDS data and actively collects data from patients, converting it into BIDS format. System A serves as the primary storage for BIDS data. All data is stored in S3 and is exposed to other systems via an API.

Now, let's say I have System B, which needs to download data from System A (using the API response and S3 storage from System A). System B will then insert this data into its database and display it on the frontend.

I am designing the JSON response from System A to ensure System B can efficiently utilize and download the data. Here is my initial idea:

{
  "dataset": "Name of dataset",
  "subject": "Subject ID",
  "session": "Session ID",
  ...
  "other_entities": "Additional relevant fields"
}
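
For instance, the response could be fleshed out with an explicit file listing and download locations, so System B knows exactly what to fetch from S3. All field names, bucket names, and values below are purely illustrative assumptions, not a settled schema:

```json
{
  "dataset": "ds-example",
  "subject": "sub-001",
  "session": "ses-01",
  "files": [
    {
      "path": "sub-001/ses-01/anat/sub-001_ses-01_T1w.nii.gz",
      "s3_uri": "s3://system-a-bids/ds-example/sub-001/ses-01/anat/sub-001_ses-01_T1w.nii.gz",
      "etag": "9b2cf535f27731c974343645a3985328",
      "size_bytes": 10485760
    }
  ],
  "last_modified": "2025-03-24T10:50:18Z"
}
```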

Would this structure make it easier for System B to process and download the data? Any suggestions or improvements are welcome!
Thank you so much

Eric Earl

Mar 24, 2025, 11:13:17 AM
to bids-discussion
Hi Daniel Trinh,

This sounds to me like a job for PyBIDS or bids2table maybe. Take a look at the PyBIDS tutorial here to see what fields are available to query. Alternatively, you could use bids2table found here.

You also said (paraphrasing) "System B needs to download data, insert this data into its database, and display it on the frontend to process the data." I'm assuming you're designing an application for visualizing and interacting with BIDS data and metadata. Generally, I would keep the UI uncluttered and only show or ask the user for essential things, unless your users ask for more.

Eric Earl
Scientist, NIMH Data Science & Sharing Team

yarikoptic

Mar 24, 2025, 11:57:29 AM
to bids-discussion
Interesting question/use-case!
- unclear if you just want to download metadata or the entire data/diff
- with my datalad hat on, I would have just established a DataLad version of the dataset (e.g. as an import from S3), like all those at https://registry.datalad.org/overview/?query=bids&sort=update-desc, and used git or the datalad CLI to fetch changes (with or without data), and then reload/extract anything as needed
  - well -- an instance of a similar registry could be established to re-extract any desired metadata upon changes
https://git-annex.branchable.com/assistant/ could also have been used to keep various systems in sync. We started to use it for the https://github.com/ReproNim/reprostim/ project to sync all videos of stimuli we grab at the scanner -- it works lovely
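
The DataLad route sketched above might look roughly like this on the System A (producer) side; the dataset name and the `urls.csv` table are made-up assumptions, and `datalad addurls` is only one of several ways to populate a dataset from S3 URLs:

```shell
# One-time: create a DataLad dataset mirroring System A's bucket
datalad create bids-mirror
cd bids-mirror

# Populate from a table of S3 URLs (assumed columns: url, filename)
datalad addurls ../urls.csv '{url}' '{filename}'
datalad save -m "Initial import from S3"

# After each correction/deletion on S3, re-run addurls and save;
# the git history then becomes the change log consumers can diff against
```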

Daniel Duc Trinh

Mar 25, 2025, 9:57:20 AM
to bids-discussion

Thank you for your response.

I’m familiar with PyBIDS and currently use it in System B to digest data and insert it into System B's database, but I hadn’t heard of bids2table before; thanks for the suggestion!

System B also needs to download data from System A’s S3 storage, which I consider the original BIDS dataset (it only allows downloads).

My main concern is how System B can detect and apply changes when data in System A is modified. For example, suppose System A (via API and S3) initially contains 125 subjects, and System B successfully imports and loads them into its database. Now, if System A later corrects errors for two subjects, or deletes two subjects, how would System B know to update or remove them? Additionally, what happens if a user has already run a pipeline on the incorrect data before the update?

Don't hesitate to let me know if you have any questions. I'm trying my best to explain, since I feel this is a really interesting use case.

Thanks again

Eric Earl

Mar 25, 2025, 11:11:42 AM
to bids-discussion
Have you used DataLad before? Like Yarik suggested, it's got a very stable and powerful set of git-annex tools to index and detect changes to data. Using DataLad, you get a version-controlled record of any and all changes to System A's data, and System B can interact with that log to apply changes downstream. If a user has already run a pipeline on incorrect data, then the next logical steps are to:

1. Notify the user their pipeline was run on incorrect data
2. Inform the user to re-download and run their pipeline on the corrected data
3. Suggest to the user their downstream analyses be updated with the corrected processed data
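
On System B's (consumer) side, the update cycle with DataLad could be sketched as follows; the clone URL and subject paths are assumptions for illustration:

```shell
# Initial: clone System A's DataLad dataset (metadata only, no file content yet)
datalad clone https://system-a.example.org/bids-dataset.git bids-data
cd bids-data

# Periodically: pull the latest history and see what changed
datalad update --merge
datalad diff --from HEAD~1 --to HEAD   # lists added/modified/removed files

# Re-fetch content only for the subjects that changed
datalad get sub-042 sub-117
```

The `datalad diff` output is what would drive the database updates and the "your pipeline ran on stale data" notifications in System B.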

~Eric Earl

yarikoptic

Mar 25, 2025, 11:20:50 AM
to bids-discussion
If not DataLad -- then you could find or design a system to identify changes on S3 from a prior state based on ETags. If you are in control of multipart uploads to S3, you could even arrange for those ETags to serve as checksums for the multipart uploads. We do that in DANDI (ref: https://github.com/dandi/dandi-cli/blob/386e50db53a8fc0236b666927402ec1bee3e7759/dandi/files/bases.py#L434).  But again -- that would be for you to code, and that is where resorting to git-annex, with or without DataLad, could be of assistance, since then it would be just an analysis of `git diff`.
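
As a rough illustration of the ETag-based approach: assuming you have captured two key-to-ETag listings of the bucket (e.g. from successive `list_objects_v2` calls, not shown here), change detection reduces to a dictionary diff. Keys and ETag values below are made up:

```python
def diff_listings(old, new):
    """Compare two {key: etag} snapshots of a bucket prefix.

    Returns (added, removed, modified) key lists, where "modified"
    means the key exists in both snapshots but its ETag changed.
    """
    old_keys, new_keys = set(old), set(new)
    added = sorted(new_keys - old_keys)
    removed = sorted(old_keys - new_keys)
    modified = sorted(k for k in old_keys & new_keys if old[k] != new[k])
    return added, removed, modified


# Example: between snapshots, System A corrected sub-002,
# deleted sub-003, and added sub-004
before = {
    "sub-001/anat/T1w.nii.gz": "etag-aaa",
    "sub-002/anat/T1w.nii.gz": "etag-bbb",
    "sub-003/anat/T1w.nii.gz": "etag-ccc",
}
after = {
    "sub-001/anat/T1w.nii.gz": "etag-aaa",
    "sub-002/anat/T1w.nii.gz": "etag-ddd",  # content changed
    "sub-004/anat/T1w.nii.gz": "etag-eee",  # new subject
}
added, removed, modified = diff_listings(before, after)
```

System B would then re-download the added/modified keys and drop the removed ones from its database.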

If the amount of data on S3 grows, you might want to employ S3 Inventory, which would prepare a listing of the keys you have on S3.  You might also want to enable versioning for the S3 bucket so you could gain access to prior versions of the files.  Then for you it could be a 'diff' on TSV files.
FWIW -- for the purposes of backing up the DANDI bucket (which has billions of keys), we are developing https://github.com/dandi/s3invsync - to be able to back up a versioned S3 bucket locally while retaining copies of prior versions of the keys as well, operating on inventory dumps instead of direct listing of the S3 bucket due to its size.

Ali Khan

Mar 25, 2025, 11:38:19 AM
to bids-discussion
Also wanted to point out that PyBIDS can now (since version 0.18.0) query directly from S3, or any filesystem supported by universal_pathlib (e.g. Google Cloud Storage, etc.).

Best,
Ali
