Hi Romain,
I assume by the raw data you mean the pixel data? Typically, everything else is very small compared to the pixel data. If you do not need the pixel data at all, then you can pass the stop_before_pixels=True argument to read_file(). That avoids loading the pixel data into memory; it does not even seek past the pixel data in the file, but closes the file as soon as it reaches that element. It is the fastest read you can do with pydicom, and the parsing into DICOM structures adds very little time compared with just reading the same bytes with a plain Python read() call. I'm not sure if that would be compatible with dcmstack unless it specifically checks for that case.
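For illustration, a header-only read might look like the sketch below. The helper name read_header_only and its reader parameter are made up for this example (the parameter only lets you swap in another reader function). I am also assuming a pydicom release that is imported as "pydicom" and exposes dcmread() or read_file(); with older releases you would import dicom and call dicom.read_file() instead.

```python
def read_header_only(path, reader=None):
    """Read a DICOM file but stop before the pixel data element.

    `reader` is a hypothetical hook for this sketch; by default the
    pydicom file reader is used.
    """
    if reader is None:
        # Assumption: the package is importable as `pydicom`.
        # Older releases used `import dicom` and `dicom.read_file`.
        import pydicom
        reader = getattr(pydicom, "dcmread", None) or pydicom.read_file
    # stop_before_pixels=True ends the read before the pixel data element.
    return reader(path, stop_before_pixels=True)
```

The returned dataset should then have the usual header elements but no pixel data, so anything that later asks for the pixels would need to handle that.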
If you will also need the pixel data (to convert to another file, for example), then I recommend you try the defer_size argument to read_file(). Any data element whose value is larger than that size is not read into memory up front. When the pixel data is needed, pydicom reads it from disk transparently, so that should work with any code. It avoids the memory use for as long as possible, so you could, for example, read a large number of files, filter them by some of the DICOM information, and write out only the ones you need.
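A rough sketch of that read-filter-write pattern follows. The function select_datasets, its keep callback and its reader parameter are names invented for this example, and the 1024-byte threshold is an arbitrary choice; the same import assumption as before applies (a release importable as "pydicom").

```python
def select_datasets(paths, keep, reader=None, defer_size=1024):
    """Read files with large values deferred and keep those matching `keep`.

    `keep` is a caller-supplied function taking a dataset and returning
    True or False. `reader` is a hypothetical hook for this sketch.
    """
    if reader is None:
        # Assumption: the package is importable as `pydicom`.
        import pydicom
        reader = getattr(pydicom, "dcmread", None) or pydicom.read_file
    selected = []
    for path in paths:
        # Values larger than defer_size bytes stay on disk until accessed.
        ds = reader(path, defer_size=defer_size)
        if keep(ds):
            selected.append((path, ds))
    return selected


# Example use (not run here): keep only MR files, then write them out.
# for path, ds in select_datasets(paths, lambda d: getattr(d, "Modality", None) == "MR"):
#     ds.save_as(path + ".out")
```

Only the files that pass the filter ever have their pixel data pulled in, and that happens at the point the data is accessed or written.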
You might also want to look at the time_test.py script in the test/performance subdirectory. You could edit that to run some timing tests with a subset of your files, to assess what works best.
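If you would rather start from scratch, a very small timing harness could look like the following. This is not the contents of time_test.py, just a sketch; time_reads is a made-up name, and you pass in whichever reader function your pydicom version provides.

```python
import time


def time_reads(paths, reader, **read_kwargs):
    """Return the seconds taken to read every path with the given options."""
    start = time.perf_counter()
    for path in paths:
        reader(path, **read_kwargs)
    return time.perf_counter() - start


# Example use (not run here), assuming `reader` is the pydicom read function
# and `paths` is a list holding a subset of your files:
# full = time_reads(paths, reader)
# header_only = time_reads(paths, reader, stop_before_pixels=True)
# deferred = time_reads(paths, reader, defer_size=1024)
```

Run each variant more than once, since the first pass over a set of files usually includes disk cache effects.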
Regards,
Darcy