GeoIP config for nginx-ingress - no data in Elasticsearch


m.przy...@biotcloud.com

Jul 17, 2018, 4:51:27 PM
to Fluentd Google Group
Hello everyone, this is my first post in this group. I'm a beginner with Fluentd, so here is what I have and what I would like to achieve ;) The main goal is to publish nginx-ingress logs to Elasticsearch with GeoIP location data, so that I can visualize metrics on a dashboard map based on client IPs. Everything runs on a Kubernetes v1.10 cluster, and most components were installed from Helm charts. I've extended the Docker image gcr.io/google-containers/fluentd-elasticsearch by installing fluent-plugin-geoip-1.2.0. I'm already publishing nginx-ingress logs to Elasticsearch in JSON format, and the output looks like this:

{"proxy_protocol_addr": "89.72.XXX.XXX","remote_addr": "89.72.XXX.XXX", "proxy_add_x_forwarded_for": "89.72.110.107", "request_id": "8a49870b2cee49911b0793ec97226036","remote_user": "", "time_local": "17/Jul/2018:20:42:15 +0000", "request" : "GET /app/kibana HTTP/1.1", "status": "200", "vhost": "kibana.staging.domain.com","body_bytes_sent": "14225", "http_referer": "http://kibana.staging.domain.com/", "http_user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36", "request_length" : "480", "request_time" : "0.246", "proxy_upstream_name": "monitoring-kibana-80", "upstream_addr": "100.100.72.251:5601", "upstream_response_length": "14196", "upstream_response_time": "0.244", "upstream_status": "200"}

The Fluentd config for GeoIP looks like this:

geoip-filter.conf: |-
  <filter kubernetes.**>
    @type geoip
    # Specify one or more geoip lookup field which has ip address (default: host)
    # in the case of accessing nested value, delimit keys by dot like 'host.ip'.
    geoip_lookup_keys remote_addr
    # Specify optional geoip database (using bundled GeoLiteCity database by default)
    geoip_database "/var/lib/gems/2.3.0/gems/fluent-plugin-geoip-1.2.0/data/GeoLiteCity.dat"
    # Set adding field with placeholder (more than one settings are required.)
    <record>
      city ${city["remote_addr"]}
      lat ${latitude["remote_addr"]}
      lon ${longitude["remote_addr"]}
      country_code3 ${country_code3["remote_addr"]}
      country ${country_code["remote_addr"]}
      country_name ${country_name["remote_addr"]}
      dma ${dma_code["remote_addr"]}
      area ${area_code["remote_addr"]}
      region ${region["remote_addr"]}
      geoip '{"location":[${longitude["remote_addr"]},${latitude["remote_addr"]}]}'
    </record>
    # To avoid get stacktrace error with `[null, null]` array for elasticsearch.
    skip_adding_null_record true
    # Set log_level for fluentd-v0.10.43 or earlier (default: warn)
    @log_level info
    # Set buffering time (default: 0s)
    # flush_interval 1s
  </filter>

In the end, I don't see any geoip-related data in Elasticsearch. Can anyone take a look and advise me on what I'm missing in the configuration? The components are at the following versions:
Elasticsearch v6.3.1
Fluentd 1.2.2

Mr. Fiber

Jul 20, 2018, 2:17:01 AM
to Fluentd Google Group
"In the end, I don't see any geoip-related data in Elasticsearch. Can anyone take a look and advise me on what I'm missing in the configuration?"

What does this mean?
Is the data not stored in ES, or does Kibana not show the stored data?


Masahiro

On Wed, Jul 18, 2018 at 10:42 PM, <m.przy...@biotcloud.com> wrote:


On Tuesday, July 17, 2018 at 22:51:27 UTC+2, m.przy...@biotcloud.com wrote:

Config for fluentd for GeoIP looks like this:

nginxingress.conf: |-
  <source>
    @type tail
    path /var/log/containers/*.log
    pos_file /var/log/es-containers.log.pos
    time_format %Y-%m-%dT%H:%M:%S.%NZ
    format json
    read_from_head true
    keep_time_key true
  </source>

  <filter kubernetes.ingress-nginx-ingress-controller.nginx-ingress.>
    @type geoip
    # Specify one or more geoip lookup field which has ip address (default: host)
    # in the case of accessing nested value, delimit keys by dot like 'host.ip'.
    geoip_lookup_keys remote_addr
    # Specify optional geoip database (using bundled GeoLiteCity database by default)
    geoip_database "/var/lib/gems/2.3.0/gems/fluent-plugin-geoip-1.2.0/data/GeoLiteCity.dat"
    # Set adding field with placeholder (more than one settings are required.)
    backend_library geoip
    <record>
      city ${city["remote_addr"]}
      lat ${latitude["remote_addr"]}
      lon ${longitude["remote_addr"]}
      country_code3 ${country_code3["remote_addr"]}
      country ${country_code["remote_addr"]}
      country_name ${country_name["remote_addr"]}
      dma ${dma_code["remote_addr"]}
      area ${area_code["remote_addr"]}
      region ${region["remote_addr"]}
      geoip '{"location":[${longitude["remote_addr"]},${latitude["remote_addr"]}]}'
    </record>
    # To avoid get stacktrace error with `[null, null]` array for elasticsearch.
    skip_adding_null_record false
    # Set log_level for fluentd-v0.10.43 or earlier (default: warn)
    @log_level info
    # Set buffering time (default: 0s)
    # flush_interval 1s
  </filter>


m.przy...@biotcloud.com

Jul 20, 2018, 3:39:50 AM
to Fluentd Google Group
Hi, there is no geoip.location field available in Kibana. I believe the problem lies in the filtering. I'm attaching my whole Fluentd config, which is generated from a single YAML file as separate conf files. There's a geoip2-filter.conf section, but as far as I know it applies only to inputs. Should I also add something for outputs, such as a match section? Does the pattern in <filter "somevalues"> at the beginning of the block matter, as in the example below? Running the basic tests from https://github.com/y-ken/fluent-plugin-geoip works as expected.

    <filter kubernetes.**>  <---DOES_IT_MATTER ?
      @type geoip
      # Specify one or more geoip lookup field which has ip address (default: remote_addr)
      # in the case of accessing nested value, delimit keys by dot like 'host.ip'.
      geoip_lookup_keys remote_addr

Here are my template mappings:

    "mappings": {
      "fluentd": {
        "dynamic_templates": [
          {
            "message_field": {
              "path_match": "message",
              "match_mapping_type": "string",
              "mapping": {
                "type": "text",
                "norms": false
              }
            }
          },
          {
            "string_fields": {
              "match": "*",
              "match_mapping_type": "string",
              "mapping": {
                "type": "text",
                "norms": false,
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        ],
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "@version": {
            "type": "keyword"
          },
          "geoip": {
            "dynamic": true,
            "properties": {
              "ip": {
                "type": "ip"
              },
              "location": {
                "type": "geo_point"
              },
              "latitude": {
                "type": "half_float"
              },
              "longitude": {
                "type": "half_float"
              }
            }
          }
        }
      }
    },


config.yaml

Mr. Fiber

Jul 23, 2018, 2:24:17 AM
to Fluentd Google Group
According to your config.yaml, none of the input plugins emits events whose tag starts with "kubernetes.".
If you use <filter kubernetes.**>, the events need to be emitted with a tag such as "kubernetes.app".
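
To make the tag requirement concrete, here is a minimal sketch. It assumes a tail source like the one posted earlier in the thread; the `tag` value and the single `<record>` entry are illustrative choices, not taken from the attached config.yaml.

```
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/es-containers.log.pos
  format json
  # The "*" is replaced with the file path, with "/" turned into ".",
  # so events get tags like kubernetes.var.log.containers.<file name>.log
  tag kubernetes.*
</source>

# "kubernetes.**" matches every tag that starts with "kubernetes."
<filter kubernetes.**>
  @type geoip
  geoip_lookup_keys remote_addr
  backend_library geoip2_c
  <record>
    country ${country.iso_code["remote_addr"]}
  </record>
</filter>
```

Fluentd does not apply a filter to events whose tag does not match its pattern, so a missing field in Elasticsearch can simply mean the events never passed through the geoip filter.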

m.przy...@biotcloud.com

Jul 23, 2018, 2:37:51 AM
to Fluentd Google Group
@repeatedly So, if I understood you correctly, I should change the output setting to:

  output.conf: |
    # Enriches records with Kubernetes metadata
    <filter kubernetes.**> <-- here ?
      @type kubernetes_metadata
    </filter>

Is this correct?

Mr. Fiber

Jul 23, 2018, 2:59:36 AM
to Fluentd Google Group
Yes, or change the input plugin's tag parameter.



m.przy...@biotcloud.com

Jul 23, 2018, 4:43:44 AM
to Fluentd Google Group
@repeatedly Moving on, there is still no success at all. I've attached a screenshot of how an Nginx log entry looks in Kibana. I can now see the city/country/longitude/latitude fields, but it's still not working. The fields are visible in the index pattern (see attachment). I have no idea what I'm doing wrong.

Here's the current output configuration:
output.conf: |
    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      @type kubernetes_metadata
    </filter>

    <match **>
      @id elasticsearch
      @type elasticsearch
      @log_level info
      include_tag_key true
      host "#{ENV['OUTPUT_HOST']}"
      port "#{ENV['OUTPUT_PORT']}"
      logstash_format true
      <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.system.buffer
        flush_mode interval
        retry_type exponential_backoff
        flush_thread_count 2
        flush_interval 5s
        retry_forever
        retry_max_interval 30
        chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
        queue_limit_length "#{ENV['OUTPUT_BUFFER_QUEUE_LIMIT']}"
        overflow_action block
      </buffer>
    </match>

and geoip config:
  geoip2-filter.conf: |-
    <filter kubernetes.**>
      @type geoip
      # Specify one or more geoip lookup field which has ip address (default: remote_addr)
      # in the case of accessing nested value, delimit keys by dot like 'host.ip'.
      geoip_lookup_keys remote_addr

      # Specify optional geoip database (using bundled GeoLiteCity database by default)
      # geoip_database    "/path/to/your/GeoIPCity.dat"
      # Specify optional geoip2 database
      # geoip2_database   "/path/to/your/GeoLite2-City.mmdb" (using bundled GeoLite2-City.mmdb by default)
      # Specify backend library (geoip2_c, geoip, geoip2_compat)
      backend_library geoip2_c

      # Set adding field with placeholder (more than one settings are required.)
      <record>
        city            ${city.names.en["remote_addr"]}
        latitude        ${location.latitude["remote_addr"]}
        longitude       ${location.longitude["remote_addr"]}
        country         ${country.iso_code["remote_addr"]}
        country_name    ${country.names.en["remote_addr"]}
        postal_code     ${postal.code["remote_addr"]}
      </record>

      # To avoid get stacktrace error with `[null, null]` array for elasticsearch.
      skip_adding_null_record  false

      # Set @log_level (default: warn)
      @log_level         info
    </filter>

screenshot.png
index_patterns.png
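
One observation on the snippets above, offered as a sketch rather than a confirmed fix: the template mapping posted earlier types `geoip.location` as `geo_point`, while this `<record>` section only writes top-level `latitude` and `longitude` fields. Assuming the plugin parses a quoted JSON value into a nested object (the first config in this thread relies on the same behaviour for its `geoip` key), a record along these lines would populate `geoip.location`:

```
<record>
  # Assumption: the quoted JSON value is parsed into a nested object.
  # Elasticsearch accepts a geo_point as an object with "lat" and "lon" keys.
  geoip '{"location":{"lat":${location.latitude["remote_addr"]},"lon":${location.longitude["remote_addr"]}}}'
</record>
```

Elasticsearch also accepts a geo_point as a "lat,lon" string or as a [lon, lat] array; note that the array form puts longitude first.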

Mr. Fiber

Jul 25, 2018, 1:07:41 AM
to Fluentd Google Group
Maybe the problem is that the GeoIP2 database returns a different set of fields depending on the IP address.
You can test it with the following code.

require 'geoip2'
geoip = GeoIP2::Database.new('/path/to/GeoLite2-City.mmdb')
geoip.lookup('89.72.110.107').to_h  # contains the city field
geoip.lookup('8.8.8.8').to_h        # contains continent but not city

I'm not sure whether this also happens with the legacy GeoIP database, but this looks like the cause.


Mr. Fiber

Jul 25, 2018, 2:32:05 AM
to Fluentd Google Group
This seems to be how GeoIP is specified to behave.
Currently, with "skip_adding_null_record true", the geoip plugin skips adding fields
when the first placeholder value is missing.
So moving "city" down from the first position avoids this problem.

```
<record>

   latitude        ${location.latitude["remote_addr"]}
   longitude       ${location.longitude["remote_addr"]}
   country         ${country.iso_code["remote_addr"]}
   country_name    ${country.names.en["remote_addr"]}
   city            ${city.names.en["remote_addr"]}      # "city" field is missing when geoip database doesn't have information 
   postal_code     ${postal.code["remote_addr"]}  # "postal" field too
</record>
```

m.przy...@biotcloud.com

Jul 26, 2018, 11:43:26 AM
to Fluentd Google Group
@repeatedly - changing the order didn't help. It's really hard for me to understand what's wrong at all :/ I'm more than sure something is wrong on the config side. I've removed unused and unneeded things from the config, so it's a bit shorter now. Maybe you can take a look when you have a free moment and spot something wrong:

INPUT:

general.conf: |-
  # Prevent fluentd from handling records containing its own logs. Otherwise
  # it can lead to an infinite loop, when error in sending one message generates
  # another message which also fails to be sent and so on.
  <match fluentd.**>
    @type null
  </match>
  # Used for health checking
  <source>
    @type http
    port 9880
    bind 0.0.0.0
  </source>
system.conf: |-
  <system>
    root_dir /tmp/fluentd-buffers/
  </system>
containers.input.conf: |-
  <source>
    @id fluentd-containers.log
    @type tail
    path /var/log/containers/*.log
    pos_file /var/log/fluentd-containers.log.pos
    time_format %Y-%m-%dT%H:%M:%S.%NZ
    tag raw.kubernetes.*
    format json
    read_from_head true
  </source>
  # Detect exceptions in the log output and forward them as one log entry.
  <match raw.kubernetes.**>
    @id raw.kubernetes
    @type detect_exceptions
    remove_tag_prefix raw
    message log
    stream stream
    multiline_flush_interval 5
    max_bytes 500000
    max_lines 1000
  </match>
system.input.conf: |-
  # Example:
  # 2015-12-21 23:17:22,066 [salt.state       ][INFO    ] Completed state [net.ipv4.ip_forward] at time 23:17:22.066081
  <source>
    @id minion
    @type tail
    format /^(?<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?<severity>[^ \]]*) *\] (?<message>.*)$/
    time_format %Y-%m-%d %H:%M:%S
    path /var/log/salt/minion
    pos_file /var/log/salt.pos
    tag salt
  </source>
  <source>
    @id docker.log
    @type tail
    format /^time="(?<time>[^)]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=(?<status_code>\d+))?/
    path /var/log/docker.log
    pos_file /var/log/docker.log.pos
    tag docker
  </source>
geoip-filter.conf: |-
  <filter **>
    @type geoip
    # Specify one or more geoip lookup field which has ip address (default: remote_addr)
    # in the case of accessing nested value, delimit keys by dot like 'host.ip'.
    geoip_lookup_keys remote_addr
    # Specify optional geoip database (using bundled GeoLiteCity database by default)
    # geoip_database    "/path/to/your/GeoIPCity.dat"
    # Specify optional geoip2 database
    # geoip2_database   "/path/to/your/GeoLite2-City.mmdb" (using bundled GeoLite2-City.mmdb by default)
    # Specify backend library (geoip2_c, geoip, geoip2_compat)
    #backend_library geoip2_c

    # Set adding field with placeholder (more than one settings are required.)
    <record>
      latitude        ${location.latitude["remote_addr"]}
      longitude       ${location.longitude["remote_addr"]}
      country         ${country.iso_code["remote_addr"]}
      country_name    ${country.names.en["remote_addr"]}
      city            ${city.names.en["remote_addr"]}        
    </record>
    # To avoid get stacktrace error with `[null, null]` array for elasticsearch.
    skip_adding_null_record  false
    # Set @log_level (default: warn)
    @log_level         info
  </filter>     
forward.input.conf: |-
  # Takes the messages sent over TCP
  <source>
    @type forward
  </source>

OUTPUT:
output.conf: |

m.przy...@biotcloud.com

Jul 26, 2018, 1:01:40 PM
to Fluentd Google Group
And here's the mapping from the index (since this is a staging environment, deleting and recreating indices is no problem):

{
  "mapping": {
    "fluentd": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "@version": {
          "type": "keyword"
        },
        "docker": {
          "properties": {
            "container_id": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        "geoip": {
          "dynamic": "true",
          "properties": {
            "ip": {
              "type": "ip"
            },
            "latitude": {
              "type": "half_float"
            },
            "location": {
              "type": "geo_point"
            },
            "longitude": {
              "type": "half_float"
            }
          }
        },
        "kubernetes": {
          "properties": {
            "container_image": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "container_image_id": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "container_name": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "host": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "labels": {
              "properties": {
                "app": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "component": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "controller-revision-hash": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "k8s-app": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "module": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "operator": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "pod-template-hash": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "prometheus": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "release": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "role_kubernetes_io/networking": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "stage": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "statefulset_kubernetes_io/pod-name": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "submodule": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "version": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                }
              }
            },
            "master_url": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "namespace_id": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "namespace_name": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "pod_id": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "pod_name": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        "kubernetes_namespace_container_name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "log": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "stream": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "tag": {
          "type": "text",

Mr. Fiber

Jul 26, 2018, 2:51:41 PM
to Fluentd Google Group
"changing the order didn't help"

You can use the stdout output instead of the elasticsearch output to check whether the geoip plugin works.
stdout shows incoming events in the Fluentd log.
If there are no geoip-related fields, either the geoip filter config is wrong or the geoip database has no information for your IP address.
If there are geoip-related fields in the record, your Elasticsearch settings are wrong.
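
A minimal sketch of that check, with one substitution: it uses Fluentd's built-in stdout filter instead of the stdout output, so events are printed to the Fluentd log and still continue on to the existing elasticsearch match. It assumes the geoip filter is declared earlier in the config and reuses the `kubernetes.**` pattern from this thread.

```
# Place this after the geoip filter so that it prints the enriched record.
<filter kubernetes.**>
  @type stdout
</filter>
```

Filters run in the order they appear in the config, so adding the same block before the geoip filter as well shows the record going in. That makes it easy to see whether `remote_addr` is present as a top-level key at that point.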


Masahiro

m.przy...@biotcloud.com

Jul 27, 2018, 5:06:46 AM
to Fluentd Google Group
One more thing on my mind: with the config shown above, the fields specified in the geoip filter (city, country, latitude, etc.) are applied to all logs in ES. I haven't found a working way to apply them only to the Nginx logs. The fields show up in Kibana only when the filter is set to <filter kubernetes.**>. Changing the output to stdout means that no logs reach ES.
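
One possible way to limit the lookup to the ingress logs is to narrow the filter pattern. This is a sketch under assumptions: it presumes the tags produced by containers.input.conf above have the form `kubernetes.var.log.containers.<pod>_<namespace>_<container>-<id>.log`, and that the controller's pod name contains `nginx-ingress`. Check the actual tag first (for example via the `tag` field that `include_tag_key true` adds to each record) before relying on it.

```
# Only events whose tag contains "nginx-ingress" pass through this filter;
# all other container logs skip the geoip lookup.
<filter kubernetes.var.log.containers.**nginx-ingress**.log>
  @type geoip
  # The key named here must exist in the record at this point in the pipeline.
  geoip_lookup_keys remote_addr
  backend_library geoip2_c
  <record>
    latitude     ${location.latitude["remote_addr"]}
    longitude    ${location.longitude["remote_addr"]}
    country      ${country.iso_code["remote_addr"]}
    country_name ${country.names.en["remote_addr"]}
    city         ${city.names.en["remote_addr"]}
  </record>
</filter>
```

The record section is copied from the geoip-filter.conf posted earlier; only the filter pattern differs.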

karthi...@accionlabs.com

Oct 17, 2018, 10:09:28 AM
to Fluentd Google Group
Were you able to fix this? I am also facing the same issue now.