The best choice depends on lots of factors, so it's hard to give absolute advice.
Some of the factors include: your I/O usage patterns, the upper filesystem type, network bandwidth, and whether you are doing local caching (and if so, how big the cache is and how you have s3backer's caching configured).
To take a specific example, if you have a big local cache that's configured for delayed write-back, then it should absorb a lot of the "noise" created by the upper filesystem as it reads and writes its tiny 4k blocks. When it finally does come time to write the data back, s3backer can do so more efficiently using its own larger blocks.
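
To make that intuition concrete, here is a rough toy model in Python. It is not s3backer's actual code, and it is no substitute for real measurements. The 256k backing block size, the LRU eviction policy, and all function names are assumptions made up for illustration. It just counts how many backing-block uploads a stream of 4k writes would cause with and without a delayed write-back cache.

```python
import random
from collections import OrderedDict

FS_BLOCK = 4 * 1024          # small block size used by the upper filesystem
BACKING_BLOCK = 256 * 1024   # hypothetical larger block size of the backing store


def uploads_without_cache(offsets):
    """Write-through model: every small write uploads its backing block once."""
    return len(offsets)


def uploads_with_writeback(offsets, cache_blocks):
    """Delayed write-back model with a simple LRU cache of dirty blocks.

    A dirty backing block is uploaded only when it is evicted to make
    room for another block, or at the final flush.
    """
    dirty = OrderedDict()
    uploads = 0
    for off in offsets:
        block = off // BACKING_BLOCK
        if block in dirty:
            dirty.move_to_end(block)       # already dirty: absorbed by the cache
            continue
        if len(dirty) >= cache_blocks:
            dirty.popitem(last=False)      # evict least recently used block
            uploads += 1
        dirty[block] = True
    return uploads + len(dirty)            # final flush of what is still dirty


def sample_workload(n_writes, device_bytes, seed=0):
    """Hypothetical workload: random, 4k-aligned write offsets."""
    rng = random.Random(seed)
    n_fs_blocks = device_bytes // FS_BLOCK
    return [rng.randrange(n_fs_blocks) * FS_BLOCK for _ in range(n_writes)]


if __name__ == "__main__":
    writes = sample_workload(n_writes=10_000, device_bytes=64 * 1024 * 1024)
    print("no cache:  ", uploads_without_cache(writes))
    print("write-back:", uploads_with_writeback(writes, cache_blocks=128))
```

In this model the write-back count can never exceed the write-through count, and the gap grows as more writes land in blocks that are already dirty. How big the gap is on a real system depends on the factors listed above, which is why only measurement gives the true answer.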
But I'm certainly not an expert... and the true answer can only be found by doing real measurements. Maybe setting up such a performance profiling analysis would be a good project for an undergraduate CS student :)
-Archie