On 03/11/2015 01:50 PM, Micah Whitacre wrote:
> If I have a partition strategy based on time field in my data, is there
> a way to delete all partitions older than a given time?
>
> As an example I have a PartitionStrategy that looks like the following:
>
> PartitionStrategy.Builder builder = new PartitionStrategy.Builder();
> builder.dateFormat("extractionTime",
> "extractionTime_partition","yyyyMMddHH");
>
> So I will be writing data into essentially hourly partitions. Obviously
> hourly partitions might lose their value over time so I'll want to roll
> those partitions up into daily/monthly by copying the data into a
> different dataset with a different partitioning strategy. However I
> would want to delete all the hourly partitions older than 2 weeks. I
> thought I should be able to do something like this:
>
> dataset.toBefore("extractionTime", <millis for 2 weeks ago>).deleteAll();
>
> I however am not able to do this because of the following exception:
>
> java.lang.UnsupportedOperationException: Cannot cleanly delete view:
> FileSystemView...
> at
> org.kitesdk.data.spi.filesystem.FileSystemView.deleteAll(FileSystemView.java:108)
>
> In this case the underlying issue is the DateFormatPartitioner's
> projectStrict[1] doesn't support Range predicates like I am using so the
> constraints never align with the boundaries.
You're right about the cause here. The date format partitioner doesn't
yet implement the range projection logic that the other time-based
partitioners have. It wouldn't be too difficult to add.

We originally didn't include it because we would have to detect the
partitioning order (e.g., year before month) in the format. But we
currently assume that the partition order is correct for the separate
time-based partitioners, so it wouldn't be too bad to make the same
assumption.
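
To make the boundary-alignment point concrete, here is a rough sketch in
plain Java (java.time only, not Kite code). It assumes UTC, the
"yyyyMMddHH" format from your example, and that toBefore is an exclusive
upper bound. The class and method names are made up for illustration and
are not part of the Kite API. The idea is that a delete can only be clean
when the strict projection (partitions where every record matches) and
the permissive projection (partitions where some record might match)
agree:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration only; this is not how Kite implements projection.
public class HourlyProjectionSketch {
    // Same pattern as the dateFormat partitioner in the example, fixed to UTC (an assumption).
    private static final DateTimeFormatter HOURLY =
        DateTimeFormatter.ofPattern("yyyyMMddHH").withZone(ZoneOffset.UTC);

    // Strict projection of "time < cutoff": exclusive upper bound on hourly
    // partitions in which EVERY record satisfies the predicate.
    static String strictUpperExclusive(long cutoffMillis) {
        Instant hourStart = Instant.ofEpochMilli(cutoffMillis).truncatedTo(ChronoUnit.HOURS);
        return HOURLY.format(hourStart);
    }

    // Permissive projection of "time < cutoff": exclusive upper bound on hourly
    // partitions in which SOME record might satisfy the predicate.
    static String permissiveUpperExclusive(long cutoffMillis) {
        Instant cutoff = Instant.ofEpochMilli(cutoffMillis);
        Instant hourStart = cutoff.truncatedTo(ChronoUnit.HOURS);
        // A cutoff inside an hour means that hour's partition is only partly covered.
        Instant bound = cutoff.equals(hourStart) ? hourStart : hourStart.plus(1, ChronoUnit.HOURS);
        return HOURLY.format(bound);
    }

    // Whole partitions can be dropped only when both projections agree.
    static boolean canDeleteCleanly(long cutoffMillis) {
        return strictUpperExclusive(cutoffMillis).equals(permissiveUpperExclusive(cutoffMillis));
    }

    public static void main(String[] args) {
        long aligned = Instant.parse("2015-03-11T00:00:00Z").toEpochMilli();
        long unaligned = Instant.parse("2015-03-11T00:30:00Z").toEpochMilli();
        System.out.println(canDeleteCleanly(aligned));   // cutoff on an hour boundary
        System.out.println(canDeleteCleanly(unaligned)); // cutoff in the middle of an hour
    }
}
```

With a cutoff on an hour boundary both bounds are the same partition
value, so whole directories can be removed. With a cutoff in the middle
of an hour they differ by one partition, and that is the case where a
clean delete isn't possible.
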
> I tried switching around the PartitionStrategy to be something like this:
>
> builder.year("extractionTime", "extractionTimeYear_partition");
> builder.day("extractionTime", "extractionTimeDay_partition");
> builder.hour("extractionTime", "extractionHourDay_partition");
>
> Which should be equivalent functionally. I still can't delete data
> because of the same exception. In this case it seems that when it is
> trying to evaluate strict.equals(permissive)[2] the TimePredicateImpl[3]
> is failing equivalency because despite both strict/permissive having the
> same upper[] = {2015} because they don't differ they are not equivalent.
> I'm still trying to wrap my head around that logic a bit but seems odd
> at first glance.
This is something we should fix immediately. Could you open a bug report
for it? I doubt it is a really difficult fix.
Thanks for letting us know about these! Opening issues for both of these
would be great if you have the time. Thanks!
rb
--
Ryan Blue
Software Engineer
Cloudera, Inc.