Hello,
I am trying to implement syslog data collection into Kafka using Kafka Connect with the syslog connector from:
https://github.com/rmoff/kafka-connect-syslog
After many attempts, I can now successfully produce and consume the syslog data, BUT the issue I see is that the resulting message always carries some modifications compared to the original syslog event.
Example of a raw event sent from a rsyslog client:
<86>Aug 19 21:51:34 ip-10-0-0-xxx sshd[31126]: pam_unix(sshd:session): session closed for user ubuntu
With the string converter, using the following properties:
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
Example of events produced to the Kafka topic:
Struct{date=Sun Aug 19 21:23:33 UTC 2018,facility=3,host=ip-10-0-0-6,level=6,message=ip-10-0-0-6 systemd[1]: Started Session 897 of user ubuntu.,charset=UTF-8,remote_address=ip-172-18-0-1.eu-west-2.compute.internal/172.18.0.1:35656,hostname=ip-172-18-0-1.eu-west-2.compute.internal}
With the JSON converter, using the following properties:
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
Example:
{"date":1534714091000,"facility":10,"host":"ip-10-0-0-6","level":6,"message":"ip-10-0-0-6 sshd[30008]: pam_unix(sshd:session): session closed for user ubuntu","charset":"UTF-8","remote_address":"ip-172-18-0-1.eu-west-2.compute.internal/172.18.0.1:47500","hostname":"ip-172-18-0-1.eu-west-2.compute.internal"}
With the Avro converter, using the following properties:
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
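(Side note: the properties I pasted above still reference the JSON converter. If I read the Confluent documentation correctly, the Avro converter would normally be configured along these lines; the Schema Registry URL below is just a placeholder for my setup:)

```properties
# Assumed Avro converter configuration (Schema Registry URL is a placeholder)
key.converter=io.confluent.connect.avro.AvroConverter
value.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter.schema.registry.url=http://localhost:8081
```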
Example:
� ip-10-0-0-6 sudo: ubuntu : TTY=pts/2 ; PWD=/home/ubuntu ; USER=root ; COMMAND=/usr/bin/vi /usr/share/kafka-connect-syslog/config/syslog.properties
UTF-8 rip-xxxxxxx.eu-west-2.compute.internal/172.18.0.1:52186 Pip-xxxxxxx.eu-west-2.compute.internal
So far the best result is with the JSON converter, but that is still not the original data as streamed by syslog.
Is it possible to preserve and produce the original syslog event without modifying its structure?
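(For what it's worth, one idea I have not fully tested: since the connector exposes the log body in a `message` field, a Single Message Transform might strip away the surrounding Struct, e.g.:)

```properties
# Sketch only: keep just the "message" field from the connector's Struct
transforms=extractMessage
transforms.extractMessage.type=org.apache.kafka.connect.transforms.ExtractField$Value
transforms.extractMessage.field=message
value.converter=org.apache.kafka.connect.storage.StringConverter
```

Though judging from the examples above, the `message` field already drops the priority and timestamp header, so even this would not recover the full raw line.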
Thank you for your help,
Guilhem