Can't get data to Redis

107 views
Skip to first unread message

Or Yagel

unread,
Dec 17, 2014, 6:54:24 AM12/17/14
to skyli...@googlegroups.com
Hi,

I'm trying to deploy Skyline.
I was able to get data from seed_data.
Every once in a while when I restart horizon I do get data sent to Redis for a short period.
We send more than 10K metrics from Graphite, if I do succeed to get data to Redis, it's partial and only for a short period.

I added a few logs in the listener.py file, it looks like the listener starts, it does some work for a few seconds and then hangs in the while loop in read_all function.

Can you please help me figure out what's wrong?

2014-12-17 11:50:12 :: 40048 :: starting horizon agent
2014-12-17 11:50:12 :: 40050 :: started worker
2014-12-17 11:50:12 :: 40049 :: started worker
2014-12-17 11:50:12 :: 40052 :: started listener
2014-12-17 11:50:12 :: 40048 :: started roomba
2014-12-17 11:50:12 :: 40052 :: listening over udp for messagepack on 2025
2014-12-17 11:50:12 :: 40051 :: started listener
2014-12-17 11:50:12 :: 40051 :: listening over tcp for pickles on 2024
2014-12-17 11:50:12 :: 40054 :: operated on metrics. in 0.004786 seconds
2014-12-17 11:50:12 :: 40054 :: metrics. keyspace is 0
2014-12-17 11:50:12 :: 40054 :: blocked 0 times
2014-12-17 11:50:12 :: 40054 :: euthanized 0 geriatric keys
2014-12-17 11:50:12 :: 40054 :: sleeping due to low run time...
2014-12-17 11:50:15 :: 40051 :: connection from 73.48.22.172:2024
2014-12-17 11:50:15 :: 40049 :: queue size at 0
2014-12-17 11:50:17 :: 40049 :: queue size at 1
2014-12-17 11:50:17 :: 40049 :: queue size at 0
2014-12-17 11:50:19 :: 40049 :: queue size at 1
2014-12-17 11:50:20 :: 40049 :: queue size at 0
2014-12-17 11:50:22 :: 40049 :: queue size at 2
2014-12-17 11:50:22 :: 40049 :: queue size at 0
2014-12-17 11:50:23 :: 40060 :: operated on metrics. in 0.234799 seconds
2014-12-17 11:50:23 :: 40060 :: metrics. keyspace is 0
2014-12-17 11:50:23 :: 40060 :: blocked 0 times
2014-12-17 11:50:23 :: 40060 :: euthanized 336 geriatric keys
2014-12-17 11:50:23 :: 40060 :: sleeping due to low run time...
2014-12-17 11:50:24 :: 40049 :: queue size at 0
2014-12-17 11:50:27 :: 40049 :: queue size at 0
2014-12-17 11:50:30 :: 40049 :: queue size at 1
2014-12-17 11:50:32 :: 40049 :: queue size at 0
2014-12-17 11:50:33 :: 40082 :: operated on metrics. in 0.131834 seconds
2014-12-17 11:50:33 :: 40082 :: metrics. keyspace is 0
2014-12-17 11:50:33 :: 40082 :: blocked 0 times
2014-12-17 11:50:33 :: 40082 :: euthanized 189 geriatric keys
2014-12-17 11:50:33 :: 40082 :: sleeping due to low run time...
2014-12-17 11:50:33 :: 40049 :: queue size at 0
2014-12-17 11:50:43 :: 40105 :: operated on metrics. in 0.079526 seconds
2014-12-17 11:50:43 :: 40105 :: metrics. keyspace is 0
2014-12-17 11:50:43 :: 40105 :: blocked 0 times
2014-12-17 11:50:43 :: 40105 :: euthanized 42 geriatric keys
2014-12-17 11:50:43 :: 40105 :: sleeping due to low run time...
2014-12-17 11:50:48 :: 40049 :: worker queue is empty and timed out
2014-12-17 11:50:48 :: 40050 :: worker queue is empty and timed out
2014-12-17 11:50:53 :: 40109 :: operated on metrics. in 0.005590 seconds
2014-12-17 11:50:53 :: 40109 :: metrics. keyspace is 0
2014-12-17 11:50:53 :: 40109 :: blocked 0 times
2014-12-17 11:50:53 :: 40109 :: euthanized 0 geriatric keys
2014-12-17 11:50:53 :: 40109 :: sleeping due to low run time...

Abe Stanway

unread,
Dec 17, 2014, 8:01:49 PM12/17/14
to Or Yagel, skyli...@googlegroups.com
What does keys * in Redis return? Likely your data is in Redis, it's just not long enough to trigger analysis.

--
You received this message because you are subscribed to the Google Groups "skyline-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to skyline-dev...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Abe Stanway

Or Yagel

unread,
Dec 18, 2014, 8:05:28 AM12/18/14
to skyli...@googlegroups.com, ory...@gmail.com
Thanks,
I looked in Redis before posting.
The data sometimes get to Redis, but even if it does, it arrives for few seconds and then stops.

I noticed that Horizon uses 100% CPU on a quad core machine, I guess that something is wrong but I'm not sure what.

Abe Stanway

unread,
Dec 18, 2014, 12:37:23 PM12/18/14
to Or Yagel, skyli...@googlegroups.com
I see. What is your FULL_DURATION set to?

Or Yagel

unread,
Dec 18, 2014, 1:36:44 PM12/18/14
to skyli...@googlegroups.com, ory...@gmail.com
First, please allow me to say, what you're doing (helping others) is not trivial and it is highly appreciated - thanks!

The FULL_DURATION is 86400.

More info:
I added some logs to the read_all function in the listener.py:
The function:
def read_all(self, sock, n):
       
"""
        Read n bytes from a stream
        """

        logger
.info('read_all started')
        data
= ''
        counter
= 0
       
while n > 0:
            counter
+= 1
            logger
.info("{}={}".format("n", n))
            buf
= sock.recv(n)
            n
-= len(buf)
            data
+= buf
            logger
.info("{}={}".format("buf", len(buf)))
        logger
.info('read_all ended')
       
return data

I added the Horizon log below.
You can see that it looks like it's getting data at start and then stops.
It sometimes write to Redis, and some times doesn't, but it always stops after few seconds.
If I run tcpdump while horizon is running, I can see data arrives on port 2024.

started with pid 2327
2014-12-18 16:16:13 :: 2327 :: starting horizon agent
2014-12-18 16:16:13 :: 2328 :: started worker
2014-12-18 16:16:13 :: 2329 :: started listener
2014-12-18 16:16:13 :: 2330 :: started listener
2014-12-18 16:16:13 :: 2330 :: listening over udp for messagepack on 2025
2014-12-18 16:16:13 :: 2329 :: listening over tcp for pickles on 2024
2014-12-18 16:16:13 :: 2327 :: started roomba
2014-12-18 16:16:16 :: 2332 :: operated on metrics. in 2.387138 seconds
2014-12-18 16:16:16 :: 2332 :: metrics. keyspace is 3488
2014-12-18 16:16:16 :: 2332 :: blocked 0 times
2014-12-18 16:16:16 :: 2332 :: euthanized 0 geriatric keys
2014-12-18 16:16:16 :: 2332 :: sleeping due to low run time...
2014-12-18 16:16:21 :: 2329 :: connection from 78.86.92.172:2024
2014-12-18 16:16:21 :: 2329 :: read_all started
2014-12-18 16:16:21 :: 2329 :: n=4
2014-12-18 16:16:21 :: 2329 :: buf=4
2014-12-18 16:16:21 :: 2329 :: read_all ended
2014-12-18 16:16:21 :: 2329 :: read_all started
2014-12-18 16:16:21 :: 2329 :: n=25735
2014-12-18 16:16:21 :: 2329 :: buf=8564
2014-12-18 16:16:21 :: 2329 :: n=17171
2014-12-18 16:16:21 :: 2329 :: buf=4284
2014-12-18 16:16:21 :: 2329 :: n=12887
2014-12-18 16:16:21 :: 2329 :: buf=12887
2014-12-18 16:16:21 :: 2329 :: read_all ended
2014-12-18 16:16:21 :: 2328 :: queue size at 2
2014-12-18 16:16:21 :: 2328 :: queue size at 20
2014-12-18 16:16:21 :: 2328 :: queue size at 20
2014-12-18 16:16:21 :: 2329 :: read_all started
2014-12-18 16:16:21 :: 2329 :: n=4
2014-12-18 16:16:21 :: 2328 :: queue size at 19
2014-12-18 16:16:21 :: 2328 :: queue size at 18
2014-12-18 16:16:21 :: 2328 :: queue size at 17
2014-12-18 16:16:21 :: 2328 :: queue size at 16
2014-12-18 16:16:21 :: 2328 :: queue size at 15
2014-12-18 16:16:21 :: 2328 :: queue size at 14
2014-12-18 16:16:21 :: 2328 :: queue size at 13
2014-12-18 16:16:21 :: 2328 :: queue size at 12
2014-12-18 16:16:21 :: 2328 :: queue size at 11
2014-12-18 16:16:21 :: 2328 :: queue size at 10
2014-12-18 16:16:21 :: 2328 :: queue size at 9
2014-12-18 16:16:21 :: 2328 :: queue size at 8
2014-12-18 16:16:21 :: 2328 :: queue size at 7
2014-12-18 16:16:21 :: 2328 :: queue size at 6
2014-12-18 16:16:21 :: 2328 :: queue size at 5
2014-12-18 16:16:21 :: 2328 :: queue size at 4
2014-12-18 16:16:21 :: 2328 :: queue size at 3
2014-12-18 16:16:21 :: 2328 :: queue size at 2
2014-12-18 16:16:21 :: 2328 :: queue size at 1
2014-12-18 16:16:21 :: 2328 :: queue size at 0
2014-12-18 16:16:22 :: 2329 :: buf=4
2014-12-18 16:16:22 :: 2329 :: read_all ended
2014-12-18 16:16:22 :: 2329 :: read_all started
2014-12-18 16:16:22 :: 2329 :: n=679
2014-12-18 16:16:22 :: 2329 :: buf=679
2014-12-18 16:16:22 :: 2329 :: read_all ended
2014-12-18 16:16:22 :: 2329 :: read_all started
2014-12-18 16:16:22 :: 2329 :: n=4
2014-12-18 16:16:22 :: 2328 :: queue size at 0
2014-12-18 16:16:24 :: 2329 :: buf=4
2014-12-18 16:16:24 :: 2329 :: read_all ended
2014-12-18 16:16:24 :: 2329 :: read_all started
2014-12-18 16:16:24 :: 2329 :: n=3075
2014-12-18 16:16:24 :: 2329 :: buf=3075
2014-12-18 16:16:24 :: 2329 :: read_all ended
2014-12-18 16:16:24 :: 2328 :: queue size at 1
2014-12-18 16:16:24 :: 2329 :: read_all started
2014-12-18 16:16:24 :: 2329 :: n=4
2014-12-18 16:16:24 :: 2328 :: queue size at 0
2014-12-18 16:16:25 :: 2329 :: buf=4
2014-12-18 16:16:25 :: 2329 :: read_all ended
2014-12-18 16:16:25 :: 2329 :: read_all started
2014-12-18 16:16:25 :: 2329 :: n=951
2014-12-18 16:16:25 :: 2329 :: buf=951
2014-12-18 16:16:25 :: 2329 :: read_all ended
2014-12-18 16:16:25 :: 2329 :: read_all started
2014-12-18 16:16:25 :: 2329 :: n=4
2014-12-18 16:16:25 :: 2328 :: queue size at 0
2014-12-18 16:16:26 :: 2329 :: buf=4
2014-12-18 16:16:26 :: 2329 :: read_all ended
2014-12-18 16:16:26 :: 2329 :: read_all started
2014-12-18 16:16:26 :: 2329 :: n=103
2014-12-18 16:16:26 :: 2329 :: buf=103
2014-12-18 16:16:26 :: 2329 :: read_all ended
2014-12-18 16:16:26 :: 2329 :: read_all started
2014-12-18 16:16:26 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=4
2014-12-18 16:16:27 :: 2329 :: read_all ended
2014-12-18 16:16:27 :: 2329 :: read_all started
2014-12-18 16:16:27 :: 2329 :: n=3338
2014-12-18 16:16:27 :: 2329 :: buf=3338
2014-12-18 16:16:27 :: 2329 :: read_all ended
2014-12-18 16:16:27 :: 2329 :: read_all started
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2328 :: queue size at 2
2014-12-18 16:16:27 :: 2328 :: queue size at 1
2014-12-18 16:16:27 :: 2328 :: queue size at 0
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4
2014-12-18 16:16:27 :: 2329 :: buf=0
2014-12-18 16:16:27 :: 2329 :: n=4

Abe Stanway

unread,
Dec 19, 2014, 12:00:46 AM12/19/14
to Or Yagel, skyli...@googlegroups.com
And you're using seed_data.py? That isn't supposed to seed forever - at the end you should get a message as per https://github.com/etsy/skyline/blob/master/utils/seed_data.py#L69

Gary Wilson

unread,
Dec 19, 2014, 4:57:32 AM12/19/14
to Or Yagel, skyli...@googlegroups.com
Hi Or

In both your horizon log examples you paste in only a short timeframe having, your last log is less than a minute.
I assume it just continues to do that, by that I mean the queue does not increase again after the first short burst.

However note that having "stops and starts" in the queue size is normal at 10K metrics.  Over a 60 second period the probably will spike with data and then drop to 0 and then spiking up and then back to 0 as graphite sends the queue data.

Glad to see that you now have the data definitely coming from graphite.  What are your graphite metrics relating to the new skyline relay telling you, are you see full queue drops, etc?

carbon.relays.<hostname>.
destinations.<ip_address>:2024:None.fullQueueDrops
carbon.relays.<hostname>.destinations.<ip_address>:2024:None.attemptedRelays
carbon.relays.<hostname>.destinations.<ip_address>:2024:None.*

And you now have keys in the keyspace.

However you have mentioned a few things in your mail so perhaps we can get more info relating to what your settings and your environment.

You mention horizon using 100% CPU on a quad-core.  That is normal :)

What are your settings.py values for:

ANALYZER_PROCESSES =
WORKER_PROCESSES =

We run 2 ANALYZER_PROCESSES and 1 WORKER_PROCESS on our quad core skyline node and it runs at 100% flat out all the time on our 20K metrics, that is normal.

And on your carbon side what are your:

MAX_QUEUE_SIZE =
USE_FLOW_CONTROL =

Is this server doing anything other than redis, horizon and analyzer?

What OS and python are you running?

There is nothing in the redis log or your syslog/messages logs that may be related - no syn coookies floods or anything (although never seen that with a carbon-relay) but you never know.

If you are getting the queue data and data into redis e.g. metrics. keyspace is 3488 and the queue is going UP and then DOWN and then...

I suppose you are expecting 10K metrics, however remember you will only the active metrics sent.  For instance you could have 10K whisper files, but only 3488 active metrics.

Your graphite carbon.relay.* metrics have all the answers to these question.

If your horizon queue goes up for 10 seconds and then redis gets data and stops, then does the same thing for short periods every minute, then I would say it is all probably working as it should be.
Bearing in mind analyzer will not analyse anything until it has FULL_DURATION of metrics.

What does the analyzer log say?

Reply all
Reply to author
Forward
Message has been deleted
Message has been deleted
0 new messages