Hi all,
For the parallel read_sql implementation, Modin uses LIMIT and OFFSET to paginate the data and then unions the partial results. Consider a partitioned table in the database: how is the correctness of the result ensured here? If no ORDER BY clause is specified, the row order is not guaranteed, so the LIMIT/OFFSET combination may return different rows for the same page when the same query is run multiple times. The union of the pages can then contain some rows twice and miss others. (Even if an ORDER BY is specified, the order of tuples sharing the same sort key is not guaranteed, which is a problem if the sort key is not unique.)
As a consequence, I get different results from the parallel read_sql and the ordinary pandas read_sql for the same query.
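To illustrate what I would expect, here is a minimal sketch of LIMIT/OFFSET pagination that orders by a unique key. It is not Modin's actual code: it uses sqlite3 from the standard library, and the table `t`, its columns, and the helper `read_chunks` are made up for illustration. It also assumes the table is not modified between the chunk queries.

```python
import sqlite3


def read_chunks(conn, query, order_by, chunk_size):
    """Hypothetical helper: fetch `query` page by page with LIMIT/OFFSET.

    `order_by` must define a total order (e.g. end with a unique column);
    otherwise the pages may overlap or skip rows.
    """
    rows = []
    offset = 0
    while True:
        # Wrap the user query so the same ORDER BY applies to every page.
        paged = (
            f"SELECT * FROM ({query}) AS q "
            f"ORDER BY {order_by} LIMIT ? OFFSET ?"
        )
        chunk = conn.execute(paged, (chunk_size, offset)).fetchall()
        if not chunk:
            break
        rows.extend(chunk)
        offset += chunk_size
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, grp INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, i % 3) for i in range(10)])

# `grp` alone is not unique, so `id` is appended as a tie-breaker.
paged = read_chunks(conn, "SELECT id, grp FROM t", "grp, id", chunk_size=4)
full = conn.execute("SELECT id, grp FROM t ORDER BY grp, id").fetchall()
```

With the unique tie-breaker, the concatenated pages in `paged` match the single-query result in `full`. Ordering by `grp` alone would leave the order of rows within each group up to the database, which is exactly the case I am worried about.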
Best regards,
Steffen