I've recently started using Kite to write and read Parquet datasets, and I've noticed the following behavior, which looks like a bug to me.
I created a Parquet dataset with the following partition strategy:
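import org.kitesdk.data.PartitionStrategy
// Partition by the year and month derived from the "day" field
val ps: PartitionStrategy = new PartitionStrategy.Builder().
  year("day", "year").
  month("day", "month").
  build()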
("day" is a field containing a timestamp)
Whenever a record contains a "day" value corresponding to the first day of a month,
the record gets serialized under the partition for the preceding month.
E.g., if "day" = 631148400000 [i.e. 1990-01-01],
the record gets serialized under ".../year=1989/month=12" instead of ".../year=1990/month=01".
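For reference, here is a minimal sketch of how that epoch value decodes in two time zones. It only uses java.time and does not call Kite at all. The object name DayDecode is hypothetical, and the UTC+1 offset is an assumption, chosen because it maps this value to midnight on 1990-01-01. If the partitioners compute year and month in UTC (I have not confirmed this), that might explain the paths above.

```scala
import java.time.{Instant, ZoneOffset, ZonedDateTime}

// Hypothetical helper, not part of Kite: decodes the example "day" value.
object DayDecode {
  val dayMillis: Long = 631148400000L
  val instant: Instant = Instant.ofEpochMilli(dayMillis)

  // The same instant viewed in UTC and in an assumed UTC+1 local zone.
  val utc: ZonedDateTime = instant.atZone(ZoneOffset.UTC)
  val local: ZonedDateTime = instant.atZone(ZoneOffset.ofHours(1))

  def main(args: Array[String]): Unit = {
    println(s"UTC:   year=${utc.getYear} month=${utc.getMonthValue}")     // UTC:   year=1989 month=12
    println(s"UTC+1: year=${local.getYear} month=${local.getMonthValue}") // UTC+1: year=1990 month=1
  }
}
```

In UTC the value is 1989-12-31T23:00, one hour before the month boundary, while in UTC+1 it is 1990-01-01T00:00.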
Is this a known issue?
Cheers,
GG