parquet-generator example


김소현

Apr 17, 2020, 5:33:09 AM
to zrlio-users
Hi,

I want to run zrlio/sql-benchmarks, so I need to generate the input files with parquet-generator.

Could you provide instructions for generating the EquiJoin and PageRank inputs, similar to the "How to generate TPC-DS dataset" section in github.com/zrlio/parquet-generator?

Thanks,
Sohyun

sohyu...@sk.com

Apr 17, 2020, 5:38:21 AM
to d...@crail.apache.org, zrlio...@googlegroups.com

Hi,

I want to run zrlio/sql-benchmarks, so I need to generate the input files with parquet-generator.

Could you provide instructions for generating the EquiJoin and PageRank inputs, similar to the "How to generate TPC-DS dataset" section in github.com/zrlio/parquet-generator?

Thanks,
Sohyun

SK hynix

김 소 현 ( Sohyun Kim )

Memory System R&D Platform Software



Animesh Trivedi

Apr 17, 2020, 5:51:43 AM
to sohyu...@sk.com, d...@crail.apache.org, zrlio...@googlegroups.com
Join and PageRank are computations on top of parquet data sets. They are not part of the parquet generator code. 

I recommend using a distributed data processing framework such as Spark to read in the generated Parquet datasets and then perform the joins on them. See here for an example: https://stackoverflow.com/questions/43495883/how-to-join-two-parquet-datasets
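
To make the suggestion above concrete, here is a minimal PySpark sketch of an equi-join over two generated Parquet datasets. It is only an illustration, not code from parquet-generator or sql-benchmarks: the helper name `equi_join_parquet`, the file paths, and the join column `key` are all hypothetical and need to be replaced with your own values.

```python
def equi_join_parquet(spark, left_path, right_path, key):
    """Read two Parquet datasets and inner-join them on a shared column."""
    left = spark.read.parquet(left_path)
    right = spark.read.parquet(right_path)
    # Joining on a column name performs an equi-join on that column.
    return left.join(right, on=key, how="inner")


def main():
    # Imported here so the helper above can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parquet-equijoin-sketch").getOrCreate()
    # Hypothetical paths and column name: replace with your generated data.
    joined = equi_join_parquet(
        spark, "/data/left.parquet", "/data/right.parquet", key="key"
    )
    print(joined.count())
    spark.stop()
```

To try it, call `main()` from a script submitted with spark-submit, after checking the schema of the generated files for the actual name of the join column.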

--
Animesh


sohyu...@sk.com

Apr 17, 2020, 11:48:24 PM
to Animesh Trivedi, d...@crail.apache.org, zrlio...@googlegroups.com

Thanks for your reply :)

I have one more question.

When I run HiBench's KMeans and PageRank workloads on crail-spark-io, I get a serializer issue. It is the same issue as https://github.com/zrlio/crail-spark-io/issues/3.

According to that link, the issue appears to be in the process of being fixed. Has it been solved?

P.S. I also ran with the Kryo serializer, and the result is the same. I set:

======================================================

spark.serializer    org.apache.spark.serializer.KryoSerializer   in spark-defaults.conf

======================================================
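
When a serializer setting seems to have no effect, one thing worth checking is whether the file Spark actually loads contains the key in the exact form Spark expects. As far as I know, spark-submit ignores properties whose key does not start with lowercase `spark.`. The sketch below is a hypothetical checker, not part of Spark or HiBench: it parses text in the spark-defaults.conf style and reports the value of `spark.serializer` along with any keys that would be ignored.

```python
import re

KRYO = "org.apache.spark.serializer.KryoSerializer"


def parse_spark_conf(text):
    """Parse spark-defaults.conf style text into a dict of key -> value."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Key and value are separated by whitespace or '='; split only once
        # so that '=' characters inside the value are preserved.
        parts = re.split(r"[=\s]+", line, maxsplit=1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def check_serializer(text):
    """Return (serializer value or None, list of keys not starting with 'spark.')."""
    conf = parse_spark_conf(text)
    ignored = [k for k in conf if not k.startswith("spark.")]
    return conf.get("spark.serializer"), ignored
```

For example, `check_serializer(open("conf/spark-defaults.conf").read())` (path assumed) returns the configured serializer class, or `None` if the key is missing or mis-cased.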

 

Thanks,

Sohyun.

sohyu...@sk.com

Apr 18, 2020, 12:06:00 AM
to Animesh Trivedi, d...@crail.apache.org, zrlio...@googlegroups.com

Oh, I solved this issue.

After I added the serializer path to HiBench's spark.conf, it ran perfectly.

Thanks :)

Sohyun.

