Druid CSV custom separator


Alexandru Voinescu

Nov 19, 2013, 9:28:44 AM
to druid-de...@googlegroups.com
Hello,
We are trying to use Druid with our custom CSV files. We use | (pipe) as the separator in these files. Does Druid support custom separators?

Thank you,
Alex.

Fangjin Yang

Nov 21, 2013, 12:03:19 AM
to druid-de...@googlegroups.com
Hi Alexandru,

Druid does support a delimited data spec where you can pass your own delimiter.

You can pass this data spec as part of the JSON blob for Druid ingestion endpoints.

Taeyang Oh

Dec 11, 2013, 12:44:49 AM
to druid-de...@googlegroups.com
Dear Fangjin, 

I am implementing a solution based on Druid and am in love with it so far! Thank you for your great work and for sharing it.

I have a question for you about the CSV delimiter, since I'm stuck trying to use a custom delimiter in the realtime spec file.

I was using version 0.6.30, wanted to use a custom delimiter (like "|"), and found that the delimited data spec (DelimitedDataSpec) does not seem to be implemented in the source code.

I could find it under the "test" folder in the source code, but not under the "main" folder.

Could you confirm that for me?

Thank you so much.

With all good wishes & love, 
TY. 

On Thursday, November 21, 2013 at 2:03:19 PM UTC+9, Fangjin Yang wrote:

Fangjin Yang

Dec 11, 2013, 1:47:43 PM
to druid-de...@googlegroups.com
Hi Taeyang,

Awesome to hear you are trying out Druid. DelimitedDataSpec is part of the druid-api repository, which Druid has a dependency on. If you are not able to download sources in your IDE, you can look at the source code here: https://github.com/druid-io/druid-api/blob/master/src/main/java/io/druid/data/input/impl/DelimitedDataSpec.java?source=cc

The default delimiter in the delimited data spec is a tab (for TSV), but it should be able to take any custom delimiter. Things *should* just work if you use it with your custom delimiter.

Let me know if that helps.

Thanks,
FJ

Reza

Mar 28, 2014, 11:19:22 PM
to druid-de...@googlegroups.com
Looking at:

https://github.com/druid-io/druid-api/blob/6047855dd98ccfe14d6c8387714bb5d7c926cdeb/src/main/java/io/druid/data/input/impl/DataSpec.java

@JsonSubTypes(value = {
    @JsonSubTypes.Type(name = "json", value = JSONDataSpec.class),
    @JsonSubTypes.Type(name = "csv", value = CSVDataSpec.class),
    @JsonSubTypes.Type(name = "tsv", value = DelimitedDataSpec.class)
})

only "tsv" can take a custom delimiter, which is a bit confusing (I would have expected either both or neither of "csv" and "tsv" to work with a custom delimiter). Maybe it would be better to have an explicit format="custom" that uses DelimitedDataSpec, and to leave "tsv" mapped to DelimitedDataSpec too for backward compatibility.

I couldn't find any explicit mention of this difference in delimiter support between "csv" and "tsv".

Xavier Léauté

Mar 28, 2014, 11:40:26 PM
to druid-de...@googlegroups.com
CSV supports quoted fields and uses a dedicated csv parser to handle the most common formats out there, since there is no proper CSV standard to speak of. 

TSV was originally just that, tab-separated, but was later extended to support other delimiters.
However, it does not support quoted fields or anything like that; it just splits each line on the delimiter.
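
To make the "just splits on the delimiter" behavior concrete, here is a minimal sketch in plain Java. It is not Druid's actual parser code; the class and method names are hypothetical and only illustrate what a split with no quote handling does to a pipe-delimited line.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not the DelimitedDataSpec implementation.
public class DelimiterSplitSketch {

    // Splits a line on a literal delimiter, with no awareness of quotes.
    static List<String> splitOnDelimiter(String line, String delimiter) {
        // Pattern.quote treats "|" as a literal instead of regex alternation;
        // the -1 limit keeps trailing empty fields.
        return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) {
        // A plain pipe-delimited row splits into three fields.
        System.out.println(splitOnDelimiter("2013-11-19|page_a|42", "|"));

        // A quoted field that contains the delimiter is still split,
        // because quotes are not interpreted.
        System.out.println(splitOnDelimiter("2013-11-19|\"a|b\"|42", "|"));
    }
}
```

The second call yields four fields rather than three, so a delimiter handled this way should be one that never appears inside the field values themselves.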

I agree it's a little confusing, and maybe we should use "delimited" as the format name and leave "tsv" as an alias for backwards compatibility. 