Kong new install failing with "Error during migration 2016-02-25-160900_remove_null_consumer_id"


brad...@gmail.com

Feb 20, 2017, 7:10:06 PM
to Kong

New install of Kong on RHEL 6.8 using kong-0.9.8.el6.noarch.rpm.

The Kong data store is a new Cassandra 2.2.8 DB with no kong keyspace initially. The Cassandra cluster is 2 nodes: 1 node in each of 2 data centers.

We get the following error the first time we issue kong start.

 

Error:

/usr/local/share/lua/5.1/kong/cmd/start.lua:37: /usr/local/share/lua/5.1/kong/cmd/start.lua:18: Error during migration 2016-02-25-160900_remove_null_consumer_id: [cassandra error] [Invalid] Predicates on non-primary-key columns (consumer_id) are not yet supported for non secondary index queries

 

We have implemented a work-around, which I will describe below, but I would appreciate help with understanding the root cause and a better solution.

 

Error logs:

 

2017/02/20 10:19:29 [verbose] prefix in use: /usr/local/kong

2017/02/20 10:19:29 [verbose] preparing nginx prefix directory at /usr/local/kong

2017/02/20 10:19:29 [verbose] saving serf identifier to /usr/local/kong/serf/serf.id

2017/02/20 10:19:29 [debug] searching for OpenResty 'resty' executable

2017/02/20 10:19:29 [debug] /usr/local/openresty/bin/resty -V: 'nginx version: openresty/1.11.2.1'

2017/02/20 10:19:29 [debug] found OpenResty 'resty' executable at /usr/local/openresty/bin/resty

2017/02/20 10:19:29 [verbose] saving serf shell script handler to /usr/local/kong/serf/serf_event.sh

2017/02/20 10:19:29 [verbose] SSL enabled, no custom certificate set: using default certificate

2017/02/20 10:19:29 [verbose] default SSL certificate found at /usr/local/kong/ssl/kong-default.crt

2017/02/20 10:19:29 [warn] ulimit is currently set to "1024". For better performance set it to at least "4096" using "ulimit -n"

2017/02/20 10:19:29 [verbose] running datastore migrations

2017/02/20 10:19:29 [warn] 21319#0: *2 [lua] log.lua:22: warn(): No cluster infos in shared dict cassandra, context: ngx.timer

2017/02/20 10:19:29 [info] migrating core for keyspace kong

2017/02/20 10:19:30 [info] core migrated up to: 2015-01-12-175310_skeleton

2017/02/20 10:19:39 [info] core migrated up to: 2015-01-12-175310_init_schema

2017/02/20 10:19:41 [info] core migrated up to: 2015-11-23-817313_nodes

2017/02/20 10:19:41 [verbose] could not start Kong, stopping services

2017/02/20 10:19:41 [verbose] leaving serf cluster

2017/02/20 10:19:41 [verbose] left serf cluster

2017/02/20 10:19:41 [verbose] stopping serf agent at /usr/local/kong/pids/serf.pid

2017/02/20 10:19:41 [debug] sending signal to pid at: /usr/local/kong/pids/serf.pid

2017/02/20 10:19:41 [debug] no pid file at: /usr/local/kong/pids/serf.pid

2017/02/20 10:19:41 [verbose] serf agent stopped

2017/02/20 10:19:41 [verbose] stopping dnsmasq at /usr/local/kong/pids/dnsmasq.pid

2017/02/20 10:19:41 [debug] sending signal to pid at: /usr/local/kong/pids/dnsmasq.pid

2017/02/20 10:19:41 [debug] no pid file at: /usr/local/kong/pids/dnsmasq.pid

2017/02/20 10:19:41 [verbose] stopped services

Error:

/usr/local/share/lua/5.1/kong/cmd/start.lua:37: /usr/local/share/lua/5.1/kong/cmd/start.lua:18: Error during migration 2016-02-25-160900_remove_null_consumer_id: [cassandra error] [Invalid] Predicates on non-primary-key columns (consumer_id) are not yet supported for non secondary index queries

stack traceback:

        [C]: in function 'error'

        /usr/local/share/lua/5.1/kong/cmd/start.lua:37: in function 'cmd_exec'

        /usr/local/share/lua/5.1/kong/cmd/init.lua:89: in function </usr/local/share/lua/5.1/kong/cmd/init.lua:89>

        [C]: in function 'xpcall'

        /usr/local/share/lua/5.1/kong/cmd/init.lua:89: in function </usr/local/share/lua/5.1/kong/cmd/init.lua:45>

        /usr/local/bin/kong:13: in function 'file_gen'

        init_worker_by_lua:38: in function <init_worker_by_lua:36>

        [C]: in function 'pcall'

        init_worker_by_lua:45: in function <init_worker_by_lua:43>

 

That migration ("2016-02-25-160900_remove_null_consumer_id") appears in /usr/local/share/lua/5.1/kong/dao/migrations/cassandra.lua as follows:

 

  {

    name = "2016-02-25-160900_remove_null_consumer_id",

    up = function(_, _, dao)

      local rows, err = dao.plugins:find_all {consumer_id = "00000000-0000-0000-0000-000000000000"}

      if err then

        return err

      end

 

      for _, row in ipairs(rows) do

        row.consumer_id = nil

        local _, err = dao.plugins:update(row, row, {full = true})

        if err then

          return err

        end

      end

    end

  }

 

I am not familiar enough with Kong or Lua to figure out exactly which table it was trying to update.

 

WORK-AROUND:

 

  1. Run the migrations manually.

    kong migrations reset --config /etc/kong/kong.conf
    kong migrations up --config  /etc/kong/kong.conf

  2. Prevent Kong from rerunning the migrations every time it starts.

    I assume the start-up logic should detect that the migrations are not needed, but that isn't working.

In the two files below, we commented out the "assert(dao:run_migrations())" line.
 

In /usr/local/share/lua/5.1/kong/cmd/start.lua

  local dao = DAOFactory(conf)

  local err

  xpcall(function()

    assert(prefix_handler.prepare_prefix(conf, args.nginx_conf))

    --assert(dao:run_migrations())

 

In  /usr/local/share/lua/5.1/kong.lua

  local events = Events() -- retrieve node plugins

  local dao = DAOFactory(config, events) -- instanciate long-lived DAO

  --assert(dao:run_migrations()) -- migrating in case embedded in custom nginx
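
For reference, the duplicate-migration concern can be checked directly: Kong records the migrations it has applied as an ordered list in the schema_migrations table of the kong keyspace. A quick look from cqlsh (a sketch; contact point and credentials as in our kong.conf):

```sql
-- One row per DAO ("core" plus one per enabled plugin); the migrations
-- column is an ordered list<text> of what Kong believes has already run.
SELECT id, migrations FROM kong.schema_migrations;
```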

 

 

Since we are new to this whole stack (Kong, Cassandra, and Lua), we don't know what side-effects this work-around will have, aside from needing to run the migrations manually if/when we upgrade Kong. My fear is that after we have started using Kong we won't be able to simply "reset" the schema without losing the metadata we have entered up to that point, and unless the migration code can detect what has already been applied, we will wind up with errors and duplicates in the schema_migrations table. This is mentioned in https://github.com/Mashape/kong/issues/1118

 

This could be a bug in Kong, a bug in Cassandra, or even a bug in the Cassandra driver for Lua; however, I am past my limits on all of them.


I should note that we did not hit this problem in our first environment.  The main difference I can think of is that the data store in the working environment is a 3-node Cassandra cluster in a single data center.  The environment that isn't working is a multi-datacenter Cassandra cluster.

 

brad...@gmail.com

Feb 23, 2017, 11:06:49 AM
to Kong, brad...@gmail.com
Update: Applied Cassandra patch 2.2.9 and retried with the same result. Not happy with the work-around, so hoping for some help. I think this should be easy to replicate.

brad...@gmail.com

Feb 23, 2017, 12:09:02 PM
to Kong, brad...@gmail.com
Update #2: The issue seems to be related to my multi-datacenter Cassandra configuration. The error happens with either a 2-node configuration (1 node per DC) or a 6-node configuration (3 nodes per DC). In the two-node cluster I decommissioned one of the nodes, changed kong.conf to use only the remaining node, and Kong started fine with no error.
 
For the two-node cluster with the error, my configuration is below, with the node names and IP addresses masked:

$CASSANDRA_HOME/conf/cassandra.yaml

 

cassandra.yaml on node-1

 

cluster_name: 'Cassandra PAT'

data_file_directories:

    - /data/cassandra/data

commitlog_directory: /data/cassandra/commitlog

saved_caches_directory: /data/cassandra/saved_caches

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:

         - seeds: "10.900.12.10"

listen_address: 10.900.12.10

rpc_address: 0.0.0.0

broadcast_rpc_address: 10.900.12.10

endpoint_snitch: GossipingPropertyFileSnitch

 

cassandra.yaml on node-2

 

cluster_name: 'Cassandra PAT'

data_file_directories:

    - /data/cassandra/data

commitlog_directory: /data/cassandra/commitlog

saved_caches_directory: /data/cassandra/saved_caches

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
         - seeds: "10.900.12.10"

listen_address: 10.800.17.30

rpc_address: 0.0.0.0

broadcast_rpc_address: 10.800.17.30

endpoint_snitch: GossipingPropertyFileSnitch

 

 

$CASSANDRA_HOME/conf/cassandra-rackdc.properties

 

Because we are using the GossipingPropertyFileSnitch, we also need to update/confirm the cassandra-rackdc.properties file on each node:

 

cassandra-rackdc.properties file on node 1 

 

dc=dc1

rack=rack1

 

cassandra-rackdc.properties file on node 2 

 

dc=dc2

rack=rack1


--------------------------------

The kong.conf is:

database = cassandra            

 

cassandra_contact_points = 10.900.12.10,10.800.17.30

                                

cassandra_port = 9042          

 

cassandra_keyspace = kong      

 

cassandra_consistency = LOCAL_QUORUM     

 

cassandra_timeout = 5000        

 

cassandra_ssl = off             

 

cassandra_ssl_verify = off      

 

cassandra_username = kong      

 

cassandra_password = kong       

 

cassandra_repl_strategy = NetworkTopologyStrategy

 

cassandra_data_centers = ldc:1,egi:1



Any insight into the root cause or another troubleshooting idea would be much appreciated.



Thibault Charbonnier

Feb 23, 2017, 3:23:04 PM
to kong...@googlegroups.com
Hi,

Thanks for the details. One question: if you change your
cassandra_consistency to something else than LOCAL_QUORUM (maybe, QUORUM
or ALL) for the migrations step only (kong migrations up -c kong.conf),
would those setups still raise this error?

Thank you,
t.

brad...@gmail.com

Feb 23, 2017, 10:39:23 PM
to Kong
I get the error with cassandra_consistency = ALL, ONE, or LOCAL_QUORUM unless there is only one datacenter in the cluster, which completely baffles me. I don't know if I tried QUORUM, but I will if you think it may help.
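
A minimal sketch of trying that suggestion (the flip_consistency helper and the /tmp dry run are illustrations; the real file is /etc/kong/kong.conf):

```shell
# Rewrite the cassandra_consistency line for a one-off migrations run,
# keeping a .bak of the original so it can be restored afterwards.
flip_consistency() {
  # $1 = path to kong.conf, $2 = desired consistency level
  sed -i.bak "s/^cassandra_consistency *=.*/cassandra_consistency = $2/" "$1"
}

# Dry run against a scratch copy (use /etc/kong/kong.conf for real):
printf 'cassandra_consistency = LOCAL_QUORUM\n' > /tmp/kong-demo.conf
flip_consistency /tmp/kong-demo.conf QUORUM
grep '^cassandra_consistency' /tmp/kong-demo.conf   # -> cassandra_consistency = QUORUM

# kong migrations up --config /etc/kong/kong.conf   # then run just the migrations
# mv /etc/kong/kong.conf.bak /etc/kong/kong.conf    # and restore LOCAL_QUORUM
```

Assuming Kong reads the consistency setting from kong.conf at migration time, restoring the .bak afterwards puts normal traffic back on LOCAL_QUORUM.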

I do have some additional info that I hope will be useful, and thanks for responding.

As I mentioned, we get the error when we run kong start against a multi-DC Cassandra cluster with no kong keyspace created. If I run the migrations manually (reset followed by up) I don't get the error. I said earlier that I didn't know which table was being updated when the error is triggered; I have since determined that it is the plugins table, and in fact it has no index on the consumer_id column, even though the CREATE INDEX statement appears in the migration DDL right after the creation of the plugins table.

Am I correct in assuming that the order of the list of core migrations should match the order in which they were executed?

The log shows that the schema migrations are happening in the right order, but the order doesn't match the list of migrations in the schema_migrations table for the core ID. Hypothetically, if the migration that sets consumer_id to nil ran before, or concurrently with, 2015-01-12-175310_init_schema, it might attempt the update before the plugins(consumer_id) index was created. Or the CREATE INDEX might simply have failed for some reason (more likely). That index was not the only one missing: of the 6 indexes in 2015-01-12-175310_init_schema, only 2 existed when I checked the tables after the error. The middle index for each table is there, but the other two are not.
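
One way to confirm which secondary indexes actually exist, and to recreate the one this migration depends on, from cqlsh (a sketch; the system.schema_columns query assumes Cassandra 2.x's pre-3.0 schema tables):

```sql
-- List the indexed columns of kong.plugins; index_name is null for
-- columns with no secondary index:
SELECT column_name, index_name
  FROM system.schema_columns
 WHERE keyspace_name = 'kong' AND columnfamily_name = 'plugins';

-- Recreate the missing index the failing migration relies on
-- (IF NOT EXISTS makes this safe to rerun):
CREATE INDEX IF NOT EXISTS ON kong.plugins (consumer_id);
```

The index change still has to propagate to schema agreement on all nodes before retrying kong start.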

I can decommission one of the two Cassandra nodes and remove the reference to the second DC from the keyspace replication, and it works just fine. Or, as I mentioned, I can run the migrations manually and it also works fine, but then I have to stop the migrations from running automatically on kong start or else I get duplicate-migration issues.

If I can't get past this, what would I need to do to try the Kong version that works with Cassandra 3.x? I think I read that some of the migration-related code has been reworked in that version.

brad...@gmail.com

Feb 24, 2017, 2:45:25 PM
to Kong, brad...@gmail.com
Update: We found that port 7000 was not open bi-directionally between the Cassandra node in DC1 and the node in DC2. Hoping this is the issue, but we will have to wait for the firewall rule change to go through.

In our Prod cluster, port 7000 is open bi-directionally between the Cassandra nodes, but port 9042 is not open between the Kong app servers and the Cassandra nodes in DC2.

Will re-try after all these firewall issues are addressed. Please stand by :)
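
Once the rules are in, a quick way to verify reachability from the Kong app servers is a TCP probe (a sketch using bash's /dev/tcp so no extra tools are needed; the probe helper and the local demo are illustrations):

```shell
# Probe the ports Cassandra needs across DCs: 7000 (inter-node gossip/
# streaming) and 9042 (CQL clients, i.e. what Kong itself connects to).
probe() {
  if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "$1:$2 open"
  else
    echo "$1:$2 blocked"
  fi
}

# Hosts below are the masked addresses from this thread:
# probe 10.900.12.10 7000
# probe 10.800.17.30 7000
# probe 10.800.17.30 9042
probe 127.0.0.1 1   # demo against a port nothing listens on -> 127.0.0.1:1 blocked
```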



Thibault Charbonnier

Feb 24, 2017, 7:41:02 PM
to kong...@googlegroups.com
Would love to be updated as soon as you find out more about this.

I just tried 'kong start' on a 6-node cluster (2 DCs) with the following
config without any issue:

database = cassandra
cassandra_contact_points = xxx,xxx
cassandra_keyspace = kong
cassandra_username = ...
cassandra_password = ...
cassandra_ssl = off
cassandra_repl_strategy = NetworkTopologyStrategy
cassandra_data_centers = DC1:2,DC2:3
cassandra_consistency = LOCAL_QUORUM

t.

brad...@gmail.com

Feb 27, 2017, 10:51:45 AM
to Kong

 

Still getting the error. Right now I don't know of any remaining firewall rule issues, but maybe there are additional ports to consider.

 

Below is the complete output from the failing start command.

 

[root@sc1ucbtyapi01 ~]# kong start --vv

2017/02/27 09:21:58 [verbose] Kong: 0.9.8

2017/02/27 09:21:58 [debug] ngx_lua: 10006

2017/02/27 09:21:58 [debug] nginx: 1011002

2017/02/27 09:21:58 [debug] Lua: LuaJIT 2.1.0-beta2

2017/02/27 09:21:58 [debug] PRNG seed: 111655171229

2017/02/27 09:21:58 [verbose] no config file found at /etc/kong.conf

2017/02/27 09:21:58 [verbose] reading config file at /etc/kong/kong.conf

2017/02/27 09:21:58 [debug] admin_listen = "0.0.0.0:8001"

2017/02/27 09:21:58 [debug] anonymous_reports = true

2017/02/27 09:21:58 [debug] cassandra_consistency = "ALL"

2017/02/27 09:21:58 [debug] cassandra_contact_points = {"10.211.12.10","10.218.17.30"}

2017/02/27 09:21:58 [debug] cassandra_data_centers = {"ldc:1","egi:1"}

2017/02/27 09:21:58 [debug] cassandra_keyspace = "kong"

2017/02/27 09:21:58 [debug] cassandra_password = "******"

2017/02/27 09:21:58 [debug] cassandra_port = 9042

2017/02/27 09:21:58 [debug] cassandra_repl_factor = 1

2017/02/27 09:21:58 [debug] cassandra_repl_strategy = "NetworkTopologyStrategy"

2017/02/27 09:21:58 [debug] cassandra_ssl = false

2017/02/27 09:21:58 [debug] cassandra_ssl_verify = false

2017/02/27 09:21:58 [debug] cassandra_timeout = 5000

2017/02/27 09:21:58 [debug] cassandra_username = "kong"

2017/02/27 09:21:58 [debug] cluster_listen = "0.0.0.0:7946"

2017/02/27 09:21:58 [debug] cluster_listen_rpc = "127.0.0.1:7373"

2017/02/27 09:21:58 [debug] cluster_profile = "wan"

2017/02/27 09:21:58 [debug] cluster_ttl_on_failure = 3600

2017/02/27 09:21:58 [debug] custom_plugins = {}

2017/02/27 09:21:58 [debug] database = "cassandra"

2017/02/27 09:21:58 [debug] dnsmasq = true

2017/02/27 09:21:58 [debug] dnsmasq_port = 8053

2017/02/27 09:21:58 [debug] log_level = "notice"

2017/02/27 09:21:58 [debug] lua_code_cache = "on"

2017/02/27 09:21:58 [debug] lua_package_cpath = ""

2017/02/27 09:21:58 [debug] lua_package_path = "?/init.lua;./kong/?.lua"

2017/02/27 09:21:58 [debug] lua_ssl_verify_depth = 1

2017/02/27 09:21:58 [debug] mem_cache_size = "128m"

2017/02/27 09:21:58 [debug] nginx_daemon = "on"

2017/02/27 09:21:58 [debug] nginx_optimizations = true

2017/02/27 09:21:58 [debug] nginx_worker_processes = "auto"

2017/02/27 09:21:58 [debug] pg_database = "kong"

2017/02/27 09:21:58 [debug] pg_host = "127.0.0.1"

2017/02/27 09:21:58 [debug] pg_port = 5432

2017/02/27 09:21:58 [debug] pg_ssl = false

2017/02/27 09:21:58 [debug] pg_ssl_verify = false

2017/02/27 09:21:58 [debug] pg_user = "kong"

2017/02/27 09:21:58 [debug] prefix = "/usr/local/kong/"

2017/02/27 09:21:58 [debug] proxy_listen = "0.0.0.0:8000"

2017/02/27 09:21:58 [debug] proxy_listen_ssl = "0.0.0.0:8443"

2017/02/27 09:21:58 [debug] serf_path = "serf"

2017/02/27 09:21:58 [debug] ssl = true

2017/02/27 09:21:58 [verbose] prefix in use: /usr/local/kong

2017/02/27 09:21:58 [verbose] preparing nginx prefix directory at /usr/local/kong

2017/02/27 09:21:58 [verbose] saving serf identifier to /usr/local/kong/serf/serf.id

2017/02/27 09:21:58 [debug] searching for OpenResty 'resty' executable

2017/02/27 09:21:58 [debug] /usr/local/openresty/bin/resty -V: 'nginx version: openresty/1.11.2.1'

2017/02/27 09:21:58 [debug] found OpenResty 'resty' executable at /usr/local/openresty/bin/resty

2017/02/27 09:21:58 [verbose] saving serf shell script handler to /usr/local/kong/serf/serf_event.sh

2017/02/27 09:21:58 [verbose] SSL enabled, no custom certificate set: using default certificate

2017/02/27 09:21:58 [verbose] default SSL certificate found at /usr/local/kong/ssl/kong-default.crt

2017/02/27 09:21:58 [warn] ulimit is currently set to "1024". For better performance set it to at least "4096" using "ulimit -n"

2017/02/27 09:21:58 [verbose] running datastore migrations

2017/02/27 09:21:58 [warn] 13324#0: *2 [lua] log.lua:22: warn(): No cluster infos in shared dict cassandra, context: ngx.timer

2017/02/27 09:21:58 [info] migrating core for keyspace kong

2017/02/27 09:22:00 [info] core migrated up to: 2015-01-12-175310_skeleton

2017/02/27 09:22:09 [info] core migrated up to: 2015-01-12-175310_init_schema

2017/02/27 09:22:10 [info] core migrated up to: 2015-11-23-817313_nodes

2017/02/27 09:22:10 [verbose] could not start Kong, stopping services

2017/02/27 09:22:10 [verbose] leaving serf cluster

2017/02/27 09:22:11 [verbose] left serf cluster

2017/02/27 09:22:11 [verbose] stopping serf agent at /usr/local/kong/pids/serf.pid

2017/02/27 09:22:11 [debug] sending signal to pid at: /usr/local/kong/pids/serf.pid

2017/02/27 09:22:11 [debug] no pid file at: /usr/local/kong/pids/serf.pid

2017/02/27 09:22:11 [verbose] serf agent stopped

2017/02/27 09:22:11 [verbose] stopping dnsmasq at /usr/local/kong/pids/dnsmasq.pid

2017/02/27 09:22:11 [debug] sending signal to pid at: /usr/local/kong/pids/dnsmasq.pid

2017/02/27 09:22:11 [debug] no pid file at: /usr/local/kong/pids/dnsmasq.pid

2017/02/27 09:22:11 [verbose] stopped services

Error:

/usr/local/share/lua/5.1/kong/cmd/start.lua:37: /usr/local/share/lua/5.1/kong/cmd/start.lua:18: Error during migration 2016-02-25-160900_remove_null_consumer_id: [cassandra error] [Invalid] Predicates on non-primary-key columns (consumer_id) are not yet supported for non secondary index queries

stack traceback:

        [C]: in function 'error'

        /usr/local/share/lua/5.1/kong/cmd/start.lua:37: in function 'cmd_exec'

        /usr/local/share/lua/5.1/kong/cmd/init.lua:89: in function </usr/local/share/lua/5.1/kong/cmd/init.lua:89>

        [C]: in function 'xpcall'

        /usr/local/share/lua/5.1/kong/cmd/init.lua:89: in function </usr/local/share/lua/5.1/kong/cmd/init.lua:45>

        /usr/local/bin/kong:13: in function 'file_gen'

        init_worker_by_lua:38: in function <init_worker_by_lua:36>

        [C]: in function 'pcall'

        init_worker_by_lua:45: in function <init_worker_by_lua:43>

 

Kong keyspace at the time of the error. Note that there is no index on plugins.consumer_id:

 

cqlsh> desc kong

 

CREATE KEYSPACE kong WITH replication = {'class': 'NetworkTopologyStrategy', 'egi': '1', 'ldc': '1'}  AND durable_writes = true;

 

CREATE TABLE kong.schema_migrations (

    id text PRIMARY KEY,

    migrations list<text>

) WITH bloom_filter_fp_chance = 0.01

    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'

    AND comment = ''

    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}

    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}

    AND dclocal_read_repair_chance = 0.1

    AND default_time_to_live = 0

    AND gc_grace_seconds = 864000

    AND max_index_interval = 2048

    AND memtable_flush_period_in_ms = 0

    AND min_index_interval = 128

    AND read_repair_chance = 0.0

    AND speculative_retry = '99.0PERCENTILE';

 

CREATE TABLE kong.plugins (

    id uuid,

    name text,

    api_id uuid,

    config text,

    consumer_id uuid,

    created_at timestamp,

    enabled boolean,

    PRIMARY KEY (id, name)

) WITH CLUSTERING ORDER BY (name ASC)

    AND bloom_filter_fp_chance = 0.01

    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'

    AND comment = ''

    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}

    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}

    AND dclocal_read_repair_chance = 0.1

    AND default_time_to_live = 0

    AND gc_grace_seconds = 864000

    AND max_index_interval = 2048

    AND memtable_flush_period_in_ms = 0

    AND min_index_interval = 128

    AND read_repair_chance = 0.0

    AND speculative_retry = '99.0PERCENTILE';

CREATE INDEX plugins_api_id_idx ON kong.plugins (api_id);

 

CREATE TABLE kong.nodes (

    name text PRIMARY KEY,

    cluster_listening_address text,

    created_at timestamp

) WITH bloom_filter_fp_chance = 0.01

    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'

    AND comment = ''

    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}

    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}

    AND dclocal_read_repair_chance = 0.1

    AND default_time_to_live = 3600

    AND gc_grace_seconds = 864000

    AND max_index_interval = 2048

    AND memtable_flush_period_in_ms = 0

    AND min_index_interval = 128

    AND read_repair_chance = 0.0

    AND speculative_retry = '99.0PERCENTILE';

CREATE INDEX nodes_cluster_listening_address_idx ON kong.nodes (cluster_listening_address);

 

CREATE TABLE kong.consumers (

    id uuid PRIMARY KEY,

    created_at timestamp,

    custom_id text,

    username text

) WITH bloom_filter_fp_chance = 0.01

    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'

    AND comment = ''

    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}

    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}

    AND dclocal_read_repair_chance = 0.1

    AND default_time_to_live = 0

    AND gc_grace_seconds = 864000

    AND max_index_interval = 2048

    AND memtable_flush_period_in_ms = 0

    AND min_index_interval = 128

    AND read_repair_chance = 0.0

    AND speculative_retry = '99.0PERCENTILE';

CREATE INDEX consumers_username_idx ON kong.consumers (username);

CREATE INDEX consumers_custom_id_idx ON kong.consumers (custom_id);

 

CREATE TABLE kong.apis (

    id uuid PRIMARY KEY,

    created_at timestamp,

    name text,

    preserve_host boolean,

    request_host text,

    request_path text,

    strip_request_path boolean,

    upstream_url text

) WITH bloom_filter_fp_chance = 0.01

    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'

    AND comment = ''

    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}

    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}

    AND dclocal_read_repair_chance = 0.1

    AND default_time_to_live = 0

    AND gc_grace_seconds = 864000

    AND max_index_interval = 2048

    AND memtable_flush_period_in_ms = 0

    AND min_index_interval = 128

    AND read_repair_chance = 0.0

    AND speculative_retry = '99.0PERCENTILE';

CREATE INDEX apis_request_host_idx ON kong.apis (request_host);


----------------------------------------------------------------------------

Thibault Charbonnier

Feb 27, 2017, 1:00:16 PM
to kong...@googlegroups.com
I just gave the 2-DC cluster a try with Kong 0.9.8 and did not encounter any trouble running the migrations from 'kong start'. I have no doubt you are encountering this issue, but I am having a hard time reproducing it.

If you execute the following:

SELECT data_center, peer, rpc_address, release_version, schema_version
FROM system.peers;

Are the resulting rows consistent with what you'd expect your cluster to
look like? Is the schema_version the same on each node?

--
t.

brad...@gmail.com

Feb 28, 2017, 8:05:01 AM
to Kong
The schema versions look correct, or at least the same between the two nodes in the cluster. I compared the output from system.peers to system.local and they matched.

While looking at this, we found that the system clocks are not in sync between the 2 servers.

Server-1

time correct to within 12714 ms

   polling server every 64 s


Server-2

time correct to within 42 ms

   polling server every 256 s


Could this explain what we are seeing? How much time drift is too much, and what would the side-effects be?
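
For context on why drift can matter at all: Cassandra resolves conflicting writes last-write-wins on wall-clock timestamps, so a 12-second skew between nodes can reorder operations. A toy check against the two offsets above (drift_ok and its 100 ms threshold are arbitrary illustrations, not a documented Cassandra limit):

```shell
# drift_ok OFFSET_MS -> succeeds when the reported NTP offset is under
# the illustrative 100 ms threshold.
drift_ok() { [ "$1" -le 100 ]; }

drift_ok 42    && echo "Server-2 (42 ms): acceptable"
drift_ok 12714 || echo "Server-1 (12714 ms): resync NTP"
```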

BTW, we have a 6-node cluster that had been failing with a similar error, but last evening I started Kong against it (no keyspace yet) and didn't get an error. I did notice that it was VERY slow to start; it took ~5 minutes to complete the migrations (I didn't time it, it just seemed that long). I haven't done anything since starting it to validate that it is working correctly, but the output from the start command looked okay.

Thibault Charbonnier

Mar 1, 2017, 1:45:43 PM
to kong...@googlegroups.com
Hi,

Would you mind trying out this patch:
    https://github.com/Mashape/kong/pull/2145

I believe it should help with such inconsistent migrations (as well as make them a bit faster). I would appreciate it if you let me know your results with it. Thanks.

t.

brad...@gmail.com

Mar 2, 2017, 11:48:36 AM
to Kong

I replied earlier but don't see it, so forgive me if this is a duplicate response.

Yes, we will try that patch.   

Also, now that we have resolved the NTP time synchronization between the Cassandra servers, I have been able (for the first time) to successfully start Kong against an empty kong keyspace on the two-node cluster (1 node per DC).

I don't know if I can conclude that this was the whole problem, but I had someone with a good bit of Cassandra experience look at my configuration, and they didn't see any issue other than the time synchronization. This may be something you could simulate if you really want to replicate the issue: one server was 40 seconds off from the other. The fix was to restart the NTP daemon on the server that was off and wait for the clock to synchronize, which took a long time.

I will update this thread if I learn more or if I still have problems after applying your patch. Thanks for your help and interest.