안녕하세요,
이제 막 tajo를 사용해보려고하는 user입니다.
test query로 tpch query들을 사용하다보니,
date와 interval의 조합이 timestamp로 표현되서 compare operator들이 제대로 동작하지 않는 문제가 발생합니다.
예를들면 아래와 같습니다. (tpch Q1)
---------------------------------------
table name: default.lineitem
table path: hdfs://localhost/lineitem.tbl
store type: CSV
number of rows: unknown
volume: 7.7 GB
Options:
'csvfile.delimiter'='|'
schema:
l_orderkey INT8
l_partkey INT8
l_suppkey INT8
l_linenumber INT8
l_quantity FLOAT4
l_extendedprice FLOAT4
l_discount FLOAT4
l_tax FLOAT4
l_returnflag TEXT
l_linestatus TEXT
l_shipdate DATE
l_commitdate DATE
l_receiptdate DATE
l_shipinstruct TEXT
l_shipmode TEXT
l_comment TEXT
-----------------------------------------
select
> l_returnflag,
> l_linestatus,
> sum(l_quantity) as sum_qty,
> sum(l_extendedprice) as sum_base_price,
> sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
> sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
> avg(l_quantity) as avg_qty,
> avg(l_extendedprice) as avg_price,
> avg(l_discount) as avg_disc,
> count(*) as count_order
> from
> lineitem
> where
> l_shipdate <= date'1998-12-01' - interval '97 days'
> group by
> l_returnflag,
> l_linestatus
> order by
> l_returnflag,
> l_linestatus;
에러를 따라가다보니 현재 compareto data type이 date인데,
timestamp와의 비교에 type이 다르다보니 실패를 하는 것이네요.
ERROR: org.apache.tajo.exception.InvalidOperationException: Cannot compare to TIMESTAMP type datum
java.io.IOException: org.apache.tajo.exception.InvalidOperationException: Cannot compare to TIMESTAMP type datum
at org.apache.tajo.engine.planner.physical.HashShuffleFileWriteExec.next(HashShuffleFileWriteExec.java:152)
at org.apache.tajo.worker.Task.run(Task.java:446)
at org.apache.tajo.worker.TaskRunner$1.run(TaskRunner.java:276)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tajo.exception.InvalidOperationException: Cannot compare to TIMESTAMP type datum
at org.apache.tajo.datum.DateDatum.compareTo(DateDatum.java:257)
at org.apache.tajo.datum.Datum.lessThanEqual(Datum.java:184)
at org.apache.tajo.engine.eval.BinaryEval.eval(BinaryEval.java:136)
at org.apache.tajo.engine.planner.physical.SeqScanExec.next(SeqScanExec.java:283)
at org.apache.tajo.engine.planner.physical.HashAggregateExec.compute(HashAggregateExec.java:51)
at org.apache.tajo.engine.planner.physical.HashAggregateExec.next(HashAggregateExec.java:87)
at org.apache.tajo.engine.planner.physical.HashShuffleFileWriteExec.next(HashShuffleFileWriteExec.java:107)
... 3 more
그래서 몇가지 더 확인을 해보니
date + interval 조합의 결과가 timestamp인것 같네요.
add_months(), add_days()에서도 동일하게 return이 timestamp라 비교에 실패하게 됩니다.
이 문제를 해결하고자 아래와 같이 timestamp를 다시 date type으로 변경하니 에러없이 동작합니다.
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= to_date(to_char(date'1998-12-01' - interval '97 days', 'YYYY-MM-DD'), 'YYYY-MM-DD')
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
-----------------
개인적인 생각으로는 date.compareTo(timestamp)를 구현하거나,
내부적으로 to_date(timestamp)로 변환해서 비교하는건 어떨까 싶습니다.
아니면 to_date(timestamp, format)를 제공하는것도 방법이 될 수 있을 것 같구요.