Thanks, Brian, for the guidance.
1) We are using Thanos Querier to consolidate metrics across the shards.
2) We are using Thanos Store for long-term storage, which more or less serves our needs.
My only concern is that with shards we always have to consider the "single point of failure" problem and how to address it technically.
If any shard goes down, we lose "x" minutes of metrics, so we can't achieve the fault tolerance of four nines (99.99% availability).
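
To put a number on that target, here is a rough sketch of the arithmetic in Python. The function name `downtime_budget_minutes` is just illustrative, and it assumes a 365-day year and that availability is measured as the fraction of time metrics are actually being collected:

```python
# Downtime budget implied by an availability target (plain arithmetic).
# Assumption: a 365-day year, ignoring leap years.
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes


def downtime_budget_minutes(availability: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return how many minutes of outage the target allows in the period."""
    return (1.0 - availability) * period_minutes


if __name__ == "__main__":
    # Four nines over a year
    print(round(downtime_budget_minutes(0.9999), 2))                       # 52.56
    # Four nines over a 30-day window
    print(round(downtime_budget_minutes(0.9999, MINUTES_PER_30_DAYS), 2))  # 4.32
```

So under these assumptions, any gap of "x" minutes counts against a budget of roughly 52 minutes per year, or a little over 4 minutes per 30 days.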
Any thoughts along these lines on how we can become more resilient?