import csv file in elasticsearch


prateek gera

Apr 30, 2016, 8:35:45 AM
to Fluentd Google Group
Hi All,
I want to import data from a CSV file into Elasticsearch using Fluentd. Can you please help me with this?




Regards
Prateek Gera

prateek gera

May 2, 2016, 3:57:23 AM
to Fluentd Google Group
I am using the configuration from http://docs.fluentd.org/articles/recipe-csv-to-elasticsearch but with no success.

Mr. Fiber

May 2, 2016, 4:27:00 AM
to Fluentd Google Group
no success.

Please write more detailed information; we have no shared context to work from.
If you want to use Fluentd in a batch-like way, try read_from_head.
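For example, a minimal source might look like this (a sketch; path, tag, and keys are placeholders for your file):

<source>
  type tail
  path /path/to/your.csv
  tag import.csv
  format csv
  keys key1,key2,key3
  read_from_head true
</source>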



prateek gera

May 2, 2016, 5:19:21 AM
to Fluentd Google Group
Below is my configuration:
<source>
  type tail
  path /var/log/example.csv
  tag example.log
  format csv
  keys key1,key2,key3,key4,key5,key6
  time_key key6
  pos_file /tmp/fluentd--1462169027.pos
</source>
but I am not getting any error in the td-agent log.

Mr. Fiber

May 2, 2016, 5:22:04 AM
to Fluentd Google Group
So you want to import logs in a batch-like way, not as a stream?

prateek gera

May 2, 2016, 5:25:52 AM
to Fluentd Google Group
Currently I am testing parsing this CSV data into Elasticsearch, but later on the file will be updated every x minutes.

Mr. Fiber

May 2, 2016, 5:43:59 AM
to Fluentd Google Group
Could you try the read_from_head option I mentioned before?

prateek gera

May 2, 2016, 5:49:29 AM
to Fluentd Google Group
I have set this now:
<source>
  type tail
  path /var/log/export.csv
  tag example.log
  format csv
  keys key1,key2,key3,key4,key5,key6
  time_key key6
  pos_file /tmp/fluentd--1462169027.pos
  read_from_head true
</source> 
but the result is still the same.

Mr. Fiber

May 2, 2016, 5:56:52 AM
to Fluentd Google Group
Did you remove the pos_file before retrying?
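(The pos_file stores how far each file has already been read, so read_from_head only affects files without a stored position. Removing it, e.g. % rm /tmp/fluentd--1462169027.pos, and restarting td-agent forces a reread from the beginning.)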

prateek gera

May 2, 2016, 7:38:22 AM
to Fluentd Google Group
I removed the pos_file, and now it's showing a timestamp error in the td-agent log:
2016-05-02 16:37:47 +0530 [warn]: "ID,NAME,CARDNO,DOORID,INOUTSTATE,ACTUALTIME" error="invalid time format: value = ACTUALTIME, error_class = ArgumentError, error = no time information in \"ACTUALTIME\"" .
For your information, the content of my CSV is as follows:

ID     NAME  CARDNO      DOORID  INOUTSTATE  ACTUALTIME
19592  ABC   3111914228  1       In          30/01/16 13:52

Mr. Fiber

May 2, 2016, 8:01:09 AM
to Fluentd Google Group
Please paste your complete configuration here.
The previously pasted configuration doesn't match your CSV, e.g. the keys.

Mr. Fiber

May 2, 2016, 8:03:51 AM
to Fluentd Google Group
error="invalid time format: value = ACTUALTIME, error_class = ArgumentError, error = no time information in \"ACTUALTIME\"" .

irb(main):002:0> Time.parse('30/01/16 13:52')
=> 2030-01-16 13:52:00 +0900

Time.parse can parse your datetime format without raising an error. Hmm...
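Note that the result above is misparsed, though: Time.parse guessed year 2030 from the '30'. An explicit format avoids the guessing, e.g.:

irb(main):003:0> Time.strptime('30/01/16 13:52', '%d/%m/%y %H:%M')
=> 2016-01-30 13:52:00 +0900

which corresponds to time_format %d/%m/%y %H:%M in the tail source.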

Mr. Fiber

May 2, 2016, 8:04:58 AM
to Fluentd Google Group
Ah, I understand the problem.
Fluentd's CSV parser doesn't skip the header row, so you should not include a header in your CSV.
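If you can't change how the file is generated, stripping the header before tailing also works, e.g. (the output path is just an example):

% tail -n +2 /var/log/export.csv > /var/log/export_noheader.csv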

prateek gera

May 2, 2016, 8:11:56 AM
to Fluentd Google Group
Below is my configuration:
<source>
  type tail
  path /var/log/export.csv
  tag Door-Controller.log
  format csv
  keys key1,key2,key3,key4,key5,key6
  time_key "key6"
  #time_format %d/%m/%y %H:%M
 # pos_file /tmp/fluentd--1462169027.pos
  read_from_head true
</source>

prateek gera

May 2, 2016, 8:28:34 AM
to Fluentd Google Group
I removed the header from the CSV and set the keys in td-agent.conf. The error is gone from the log, but it's still not parsing to Fluentd.

Mr. Fiber

May 2, 2016, 8:34:48 AM
to Fluentd Google Group
its still not parsing to fluentd.

What does this sentence mean?

prateek gera

May 2, 2016, 8:37:23 AM
to flu...@googlegroups.com
The CSV data is not being sent to Elasticsearch.




--
Thanks & Regards
Prateek Gera

Mr. Fiber

May 2, 2016, 8:38:54 AM
to Fluentd Google Group
I tried your conf and it worked.
So either Elasticsearch has a problem or you didn't wait for the flush interval (a buffered output flushes only every 60 seconds by default).

% fluentd -c prateek_case.conf
2016-05-02 21:37:11 +0900 [info]: reading config file path="prateek_case.conf"
2016-05-02 21:37:11 +0900 [info]: starting fluentd-0.12.22
2016-05-02 21:37:11 +0900 [info]: gem 'fluent-plugin-flowcounter-simple' version '0.0.4'
2016-05-02 21:37:11 +0900 [info]: gem 'fluentd' version '0.12.22'
2016-05-02 21:37:11 +0900 [info]: adding match pattern="Door-Controller.log" type="stdout"
2016-05-02 21:37:11 +0900 [info]: adding source type="tail"
2016-05-02 21:37:11 +0900 [warn]: 'pos_file PATH' parameter is not set to a 'tail' source.
2016-05-02 21:37:11 +0900 [warn]: this parameter is highly recommended to save the position to resume tailing.
2016-05-02 21:37:11 +0900 [info]: using configuration file: <ROOT>
  <source>
    @type tail
    path /Users/repeatedly/tmp/fluentd/export.csv

    tag Door-Controller.log
    format csv
    keys key1,key2,key3,key4,key5,key6
    time_key key6
    time_format %d/%m/%y %H:%M
    read_from_head true
  </source>
  <match Door-Controller.log>
    @type stdout
  </match>
</ROOT>
2016-05-02 21:37:11 +0900 [info]: following tail of /Users/repeatedly/tmp/fluentd/export.csv
2016-01-30 13:52:00 +0900 Door-Controller.log: {"key1":"19592","key2":"ABC","key3":"3111914228","key4":"1","key5":"In"}
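(Note that key6 does not appear in the record above: as the time_key, it is promoted to the event time, 2016-01-30 13:52:00, and by default removed from the record.)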

prateek gera

May 3, 2016, 1:54:50 AM
to Fluentd Google Group
I am not getting data in Elasticsearch. I have set the match pattern in td-agent.conf on the Elasticsearch/Fluentd server:
<match Door-Controller.log>
  type elasticsearch
  logstash_format true
  flush_interval 5s
  host localhost #(optional; default="localhost")
  port 9200 #(optional; default=9200)
  index_name fluentd #(optional; default=fluentd)
  type_name Door-Controller.log #(optional; default=fluentd)
</match>

<match elasticsearch>
  type copy
  <store>
    type stdout
  </store>
  <store>
    type elasticsearch
    logstash_format true
    flush_interval 5s #debug
  </store>
</match>
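Or should these be combined into a single match for the tag, with copy fanning out to both stdout and Elasticsearch? Something like this sketch (host/port left at the defaults above):

<match Door-Controller.log>
  type copy
  <store>
    type stdout
  </store>
  <store>
    type elasticsearch
    logstash_format true
    flush_interval 5s
  </store>
</match>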

Also, there are some errors in my Elasticsearch log:
[2016-05-03 11:20:07,016][DEBUG][action.fieldstats        ] [Cottonmouth] [logstash-2016.01.18][4], node[TL-aJ0LVRG-Cxpn0cZBQXg], [P], v[12], s[STARTED], a[id=7fX2HJWVQei0sOenSt5qSw]: failed to execute [org.elasticsearch.action.fieldstats.FieldStatsRequest@420952e0]
RemoteTransportException[[Cottonmouth][127.0.0.1:9300][indices:data/read/field_stats[s]]]; nested: IllegalArgumentException[field [@timestamp] doesn't exist];
Caused by: java.lang.IllegalArgumentException: field [@timestamp] doesn't exist
at org.elasticsearch.action.fieldstats.TransportFieldStatsTransportAction.shardOperation(TransportFieldStatsTransportAction.java:166)
at org.elasticsearch.action.fieldstats.TransportFieldStatsTransportAction.shardOperation(TransportFieldStatsTransportAction.java:54)
at org.elasticsearch.action.support.broadcast.TransportBroadcastAction$ShardTransportHandler.messageReceived(TransportBroadcastAction.java:282)
at org.elasticsearch.action.support.broadcast.TransportBroadcastAction$ShardTransportHandler.messageReceived(TransportBroadcastAction.java:278)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
It says the [@timestamp] field doesn't exist, but my CSV has a field which contains a timestamp.

prateek gera

May 3, 2016, 2:50:18 AM
to Fluentd Google Group
When I query Elasticsearch via curl, it shows data:
{"took":350,"timed_out":false,"_shards":{"total":1415,"successful":1415,"failed":0},"hits":{"total":581580,"max_score":1.0,"hits":[{"_index":"logstash-2001.12.15","_type":"Door-Controller.log","_id":"AVRxffvFES2CCSisa3_T","_score":1.0,"_source":{"key1":"8148","key2":"Isha Kukreja","key3":"2428599652","key4":"1","key5":"In","@timestamp":"2001-12-15T17:24:00+05:30"}},{"_index":"logstash-2001.12.15","_type":"Door-Controller.log","_id":"AVRxffvNES2CCSisa5zD","_score":1.0,"_source":{"key1":"17895","key2":"Pallavi Samyal","key3":"2519429815","key4":"1","key5":"Out","@timestamp":"2001-12-15T18:04:00+05:30"}},{"_index":"logstash-2001.12.15","_type":"Door-Controller.log","_id":"AVRxffvNES2CCSisa5zE","_score":1.0,"_source":{"key1":"17894","key2":"Pallavi Samyal","key3":"2519429815","key4":"1","key5":"In","@timestamp":"2001-12-15T10:01:00+05:30"}},{"_index":"logstash-2001.12.15","_type":"Door-Controller.log","_id":"AVRxffvOES2CCSisa53z","_score":1.0,"_source":{"key1":"1248","key2":"Prashant Patni","key3":"2380273796","key4":"1","key5":"Out","@timestamp":"2001-12-15T19:18:00+05:30"}},{"_index":"logstash-2001.12.15","_type":"Door-Controller.log","_id":"AVRxffvRES2CCSisa6jA","_score":1.0,"_source":{"key1":"17388","key2":"Shivalika Agarwal","key3":"2519303527","key4":"1","key5":"Out","@timestamp":"2001-12-15T18:10:00+05:30"}},{"_index":"logstash-2001.12.15","_type":"Door-Controller.log","_id":"AVRxffvRES2CCSisa6jB","_score":1.0,"_source":{"key1":"17387","key2":"Shivalika Agarwal","key3":"2519303527","key4":"1","key5":"In","@timestamp":"2001-12-15T09:57:00+05:30"}},{"_index":"logstash-2002.01.16","_type":"Door-Controller.log","_id":"AVRxIeRMES2CCSisYk24","_score":1.0,"_source":{"key1":"19466","key2":"Abhishek Arya","key3":"3111914228","key4":"1","key5":"Out","@timestamp":"2002-01-16T13:08:00+05:30"}},{"_index":"logstash-2002.01.16","_type":"Door-Controller.log","_id":"AVRxIeRQES2CCSisYlbn","_score":1.0,"_source":{"key1":"9807","key2":"ajeet kumar","key3":"2518714135","key4":"1","key5":"Out","@timestamp":"2002-01-16T19:56:00+05:30"}},{"_index":"logstash-2002.01.16","_type":"Door-Controller.log","_id":"AVRxIeRQES2CCSisYlbo","_score":1.0,"_source":{"key1":"9806","key2":"ajeet kumar","key3":"2518714135","key4":"1","key5":"In","@timestamp":"2002-01-16T18:15:00+05:30"}},{"_index":"logstash-2002.01.16","_type":"Door-Controller.log","_id":"AVRxIeRSES2CCSisYll-","_score":1.0,"_source":{"key1":"8679","key2":"Akshat Goel","key3":"2517902359","key4":"1","key5":"In","@timestamp":"2002-01-16T17:06:00+05:30"}}]}}
but I am not able to see it in Kibana using the _type:"Door-Controller.log" search pattern.

prateek gera

May 3, 2016, 3:31:14 AM
to Fluentd Google Group
I am able to get data in Kibana by creating a new, non-time-based index pattern logstash*, but I want the data with the timestamp: the 6th field of my CSV contains the timestamp, yet only 5 keys (key1, key2, key3, key4, key5) are shown. Kindly help me get the data with the timestamp.

prateek gera

May 5, 2016, 5:21:52 AM
to Fluentd Google Group
I got it working using logstash_format true.

Mr. Fiber

May 5, 2016, 8:51:08 PM
to Fluentd Google Group
good :)

prateek gera

May 11, 2016, 1:02:17 AM
to Fluentd Google Group
Hi Mr. Fiber,
I see that CSV data is coming into Elasticsearch using Fluentd, but not completely. I am using the configuration below:
<source>
  type tail
  path /var/log/abc.csv
  tag Door-Access.log
  format csv
  keys ID,NAME,CARDNO,DOORID,INOUTSTATE,ACTUALTIME
  time_key ACTUALTIME
  time_format %d/%b/%Y:%H:%M:%S
  read_from_head true
  pos_file /tmp/fluentd--1462883938.pos
</source>
How can I import the data completely?


Prateek Gera

Mr. Fiber

May 11, 2016, 11:19:08 AM
to Fluentd Google Group
How can I import data completely ?

What does 'completely' mean?

prateek gera

May 11, 2016, 11:28:27 AM
to flu...@googlegroups.com

I mean the complete CSV.

prateek gera

May 11, 2016, 11:28:34 AM
to flu...@googlegroups.com

Currently the CSV is not being updated on a daily basis.

Mr. Fiber

May 11, 2016, 11:31:58 AM
to Fluentd Google Group
I mean complete csv.

I still can't understand what you mean by 'complete'.

prateek gera

May 11, 2016, 11:36:11 AM
to flu...@googlegroups.com

The whole data of the CSV: the data currently in ES is incomplete, even though I have set read_from_head true.

Mr. Fiber

May 11, 2016, 11:37:22 AM
to Fluentd Google Group
A lack of data, or a lack of columns?

prateek gera

May 11, 2016, 11:40:00 AM
to flu...@googlegroups.com

Lack of data.

Mr. Fiber

May 11, 2016, 11:44:13 AM
to Fluentd Google Group
No error?

prateek gera

May 11, 2016, 11:45:20 AM
to flu...@googlegroups.com

No error, but the data from the head of the CSV isn't all there; a few lines are missing.

Mr. Fiber

May 11, 2016, 11:58:14 AM
to Fluentd Google Group
This use case is popular and most users don't have a problem with it,
so I assume the cause is in your ES settings or in the data itself.
Please write more detailed information.

prateek gera

May 13, 2016, 1:35:32 AM
to Fluentd Google Group
Hi,
I am getting the below error in the td-agent log:
[warn]: "" error="undefined method `map' for nil:NilClass"




Regards
Prateek Gera

Mr. Fiber

May 13, 2016, 2:36:23 AM
to Fluentd Google Group
If this error happens in the in_tail parser,
your file has lines that are broken as CSV.
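For example, that `map' error fires when a line parses to nothing (such as an empty line). You could locate the offending lines with a small Ruby script like this sketch (the path is your file; it uses Ruby's standard CSV library, which the csv parser is based on):

require 'csv'

# Print every line that cannot be parsed as CSV, plus lines that
# parse to nothing (CSV.parse_line returns nil for an empty line).
File.foreach('/var/log/abc.csv').with_index(1) do |line, i|
  begin
    parsed = CSV.parse_line(line)
    puts "line #{i}: empty line" if parsed.nil?
  rescue CSV::MalformedCSVError => e
    puts "line #{i}: #{e.message}"
  end
end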

prateek gera

May 13, 2016, 2:38:38 AM
to flu...@googlegroups.com
Thanks, I got it resolved. One more question: is it necessary to have a time_key field in the CSV parser? I want to import a CSV that does not have a timestamp key.

Mr. Fiber

May 13, 2016, 2:44:18 AM
to Fluentd Google Group
"estimate_current_event true" without time_key parameter may work.

prateek gera

May 13, 2016, 2:49:38 AM
to flu...@googlegroups.com
Worked like a charm :)

prateek gera

May 13, 2016, 6:49:34 AM
to flu...@googlegroups.com
My CSV contains each value a single time, but in Elasticsearch it shows up twice or thrice. How can I avoid duplicate records in Elasticsearch via Fluentd?