Major delays in Cloudfront log files making it into our S3 bucket


Peter Vandenberk

Sep 22, 2014, 6:25:03 AM
to snowpl...@googlegroups.com, anal...@simplybusiness.co.uk, dani...@simplybusiness.co.uk
Hi,

Probably not Snowplow-specific, but I thought I'd ask the question on this forum regardless, as others may have the same issue...

Basically we're currently experiencing major delays in Cloudfront log files showing up in the dedicated S3 bucket.

At present, we're starting to see Cloudfront log files with a "2014-09-21-17.*.gz" timestamp, so 5PM yesterday afternoon, almost an 18-hour delay.

Are other people seeing the same?  Is there anything we can do about this, or is that just the way it is with Cloudfront?

I've checked the AWS "health" dashboard, and there don't appear to be any known/reported issues with Cloudfront at the moment, or over the last couple of days.

Any help/comments/pointers would be much appreciated.

Best regards,

Peter Vandenberk
Principal Developer
(Simply Business)

Yali Sassoon

Sep 22, 2014, 7:09:03 AM
to snowpl...@googlegroups.com, anal...@simplybusiness.co.uk, dani...@simplybusiness.co.uk
Hi Peter,

I don't know about the particular issue you've raised, but in general we're recommending all our users migrate from the Cloudfront collector to the Clojure collector. There are lots of benefits:
  1. Support POST requests (especially from mobile devices, but also useful for server-side tracking)
  2. Much smaller number of collector log files produced - one per collector instance per hour
  3. Shorter latency before events arrive in S3 - events will always be written within the hour - addressing your specific issue here
  4. Support for third party cookies




--
You received this message because you are subscribed to the Google Groups "Snowplow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snowplow-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Co-founder
Snowplow Analytics
The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
+44 (0)203 589 6116
+44 7841 954 117
@yalisassoon

Peter Vandenberk

Sep 22, 2014, 7:13:39 AM
to snowpl...@googlegroups.com, anal...@simplybusiness.co.uk, dani...@simplybusiness.co.uk
Hi Yali,

Thanks for your reply... yes, that is very much on our roadmap, but for a variety of reasons, we can't make that switch just yet and we're "stuck" with the Cloudfront collector for the time being.

We've seen Cloudfront delays in the past, but usually of a couple of hours at most... an 18-hour delay is quite severe, and is impacting our downstream reporting :-(

Peter

Daniel Ramagem

Sep 22, 2014, 9:17:15 AM
to snowpl...@googlegroups.com, anal...@simplybusiness.co.uk, dani...@simplybusiness.co.uk
Hi Peter,

Yours was a timely message, as my team also started noticing a similar increase in the lag for the CloudFront logs to show up in our S3 buckets. In our case it seems to have been a 9-hour delay. We've set up some monitoring around the contents of our S3 buckets, and this weekend our alarms went off when they couldn't detect new logs within a 3-hour window, which has generally been the lag we've observed since we adopted Snowplow a few months ago.
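
A freshness check of the kind described above could be sketched roughly as follows. This is only an illustration, not the monitoring actually in use: the function name `is_stale` is made up, and it assumes the last-modified timestamps of the objects in the log bucket have already been fetched by whatever means you use to list the bucket.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_modified_times, now, max_lag=timedelta(hours=3)):
    """Return True if no log file has arrived within max_lag of now.

    last_modified_times: iterable of timezone-aware datetimes, one per
    object in the log bucket (hypothetical input for this sketch).
    """
    times = list(last_modified_times)
    if not times:
        return True  # an empty bucket is treated as stale
    return now - max(times) > max_lag
```

The 3-hour default mirrors the window mentioned above; a scheduled job would call this and raise an alarm when it returns True.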

Our pipeline (enrichment, processing, and importing into our data warehouse) is executed daily, so the 3-hour lag had been acceptable, but I'm now getting concerned about the unpredictability of CloudFront. Like you, we're also stuck with CloudFront for now, but I'm going to start investigating the effort to switch to the Clojure-based collector.

Daniel

Peter Vandenberk

Sep 22, 2014, 11:23:00 AM
to snowpl...@googlegroups.com, anal...@simplybusiness.co.uk, dani...@simplybusiness.co.uk
Hey Daniel,

I've done some digging, and even though I couldn't find anything like a Cloudfront SLA, the "File Name Format and Timing of File Delivery" section of this page mentions:

Typically, CloudFront saves log files within 24 hours after receiving the corresponding requests.

24 hours... it could all be a whole lot worse!

  :-)

Peter

Daniel Ramagem

Sep 22, 2014, 11:34:41 AM
to snowpl...@googlegroups.com
Peter,

It's actually even worse than that (my emphasis in bold):

We recommend that you use the logs to understand the nature of the requests for your content, not as a complete accounting of all requests. CloudFront delivers access logs on a best-effort basis. The log record for a particular request might be delivered long after the request was actually processed, or not at all. In rare cases, usage that appears in the AWS usage tracking and billing systems might not appear in CloudFront access logs.

So no guarantees at all. My team has been aware of this "limitation" for a while, but we were focused on standing up the other parts of our platform and we concluded that CloudFront had been recording numbers very close to other tools in our platform (e.g., Google Analytics). So we've trusted CloudFront and we've enjoyed the fact that it requires no maintenance. But Amazon's breaking changes to the log format and now this increasing delay are starting to put pressure on us to revisit our collector strategy.

Daniel
Daniel Bloomfield Ramagem
Software Engineer
Opower



Peter Vandenberk

Sep 23, 2014, 10:40:22 AM
to snowpl...@googlegroups.com
Hey Daniel,

Just a quick update from our end: the situation has gotten worse for us overnight, and we're now looking at a 28-hour delay   :-(
We raised a support ticket with AWS, but no joy on that front yet... when we hear from them, I'll post an update to this forum!

Best,

Peter

Daniel Ramagem

Sep 23, 2014, 11:44:09 AM
to snowpl...@googlegroups.com
Thanks for the update Peter!  I just checked and we are also experiencing a 26-hour delay in the logs.  And our S3 logs bucket currently contains about 23 zipped CloudFront log files from July and early September.  Yikes.  I am also going to try to file a support ticket about this.  Let's keep this thread alive as we both get news.

Daniel

PS: Can anyone else on the list confirm similar behavior with CloudFront?

Gabor Ratky

Sep 23, 2014, 11:45:53 AM
to snowpl...@googlegroups.com
Yes, we're seeing the same behavior, using us-east-1 S3 buckets.

Gabor

Peter Vandenberk

Sep 23, 2014, 12:18:20 PM
to snowpl...@googlegroups.com
FWIW - our Cloudfront distribution is for an S3 bucket in the "eu-west-1" (Ireland) region...  our Cloudfront log bucket is also in that region.

Christian Lubasch

Sep 23, 2014, 2:24:01 PM
to snowpl...@googlegroups.com
Same super delay here. Annoying.

joachim.fr...@springlane.de

Sep 23, 2014, 4:44:29 PM
to snowpl...@googlegroups.com
We're seeing a similar situation: today with a 16-hour delay.

Robert Kingston

Sep 23, 2014, 6:31:54 PM
to snowpl...@googlegroups.com
Silly question, but how are you checking log file delays? Filenames, metadata, queries?

Will confirm on my end.

Gabor Ratky

Sep 23, 2014, 7:22:00 PM
to snowpl...@googlegroups.com
Robert,

The filename contains the day and the hour (in UTC) that it contains data for. The latest time period CloudFront delivered logs for us is:

XXXXXXXXXXXXXX.2014-09-23-18.ZzacBMY8.gz

This is promising because the lag seems to be only 5 hours now which is closer to what we're used to. During normal operations, ~90% of the logs should arrive within 3 hours and 99.9% of the logs should arrive within 24 hours for any given hour period. At least that's been our experience.

The sad part is that we have not received a single log file for the period between 2014-09-22-10 and 2014-09-23-17 and I don't think we ever will. So that's ~31 hours of lost event data.
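
For anyone wanting to script this kind of check, here is a minimal sketch that reads the hour out of each file name, reports the current lag, and lists hours with no file at all. It assumes file names follow the `<prefix>.YYYY-MM-DD-HH.<id>.gz` pattern shown above; the function names are made up for illustration, and how you obtain the list of file names from the bucket is left to you.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches the "YYYY-MM-DD-HH" part of a name such as
# "XXXXXXXXXXXXXX.2014-09-23-18.ZzacBMY8.gz".
HOUR_RE = re.compile(r"\.(\d{4}-\d{2}-\d{2}-\d{2})\.")


def log_hour(filename):
    """Return the UTC hour a log file covers, or None if the name doesn't match."""
    match = HOUR_RE.search(filename)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%Y-%m-%d-%H")
    return parsed.replace(tzinfo=timezone.utc)


def lag(filenames, now):
    """Time between `now` and the start of the most recent hour seen, or None."""
    hours = [h for h in map(log_hour, filenames) if h is not None]
    return now - max(hours) if hours else None


def missing_hours(filenames):
    """Hours between the earliest and latest file that have no file at all."""
    seen = {h for h in map(log_hour, filenames) if h is not None}
    if not seen:
        return []
    missing = []
    hour, last = min(seen), max(seen)
    while hour < last:
        if hour not in seen:
            missing.append(hour)
        hour += timedelta(hours=1)
    return missing
```

Note that a gap reported by `missing_hours` only means nothing has arrived yet for that hour; it cannot tell delayed logs apart from lost ones.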

Gabor

Daniel Ramagem

Sep 23, 2014, 8:26:36 PM
to snowpl...@googlegroups.com
Gabor,

I can confirm that we're experiencing the exact same behavior (back to a 3-hour gap) and the same issue (31 hours of lost logs) that you reported. The lost data is a super bummer, as the AWS health dashboard didn't even report any issues with the service--although I understand delivery of the usage logs was never guaranteed. I think we all got caught by Amazon's non-SLAs for CloudFront.

Daniel

Robert Kingston

Sep 23, 2014, 9:37:23 PM
to snowpl...@googlegroups.com
Ditto, guys. The contents of our in bucket also show the same symptoms.

However the throughput of logs is still really low - it's almost lunchtime here and since running enrichment at 8am, I would normally be seeing thousands of logfiles in the bucket. We're only seeing a couple of hundred.

Robert Kingston

Sep 23, 2014, 9:45:51 PM
to snowpl...@googlegroups.com
Can't find anything around missing logfiles - I've made a post here to see if we can get a response:


Feel free to jump on.

Peter Vandenberk

Sep 24, 2014, 7:12:03 AM
to snowpl...@googlegroups.com
Hi all,

We've received the following notice from the Cloudfront team in response to a support issue we raised with them:

We can confirm that there has been a delay in writing a portion of your Amazon CloudFront access logs to your Amazon S3 bucket. Specifically, there has been a delay in delivering access logs for usage that occurred between September 22nd at approximately 9AM PDT and September 23rd at approximately 11AM PDT.
 
Logs for usage that occurred from September 23rd at 11AM PDT onward are currently being delivered normally. Over the next several hours, the system will automatically fill in logs that were delayed. There’s no need to take any action, and no access logs will be lost.
 
We apologize for any inconvenience this delay may have caused. We are continuing to track this issue closely and will provide you with a further update when we have caught up with the backlog.
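
Since the log file names use UTC hours, it can help to convert the PDT times in this notice before comparing them with what is in the bucket. A small sketch, assuming PDT is a fixed UTC-7 offset and treating "approximately 9AM" and "approximately 11AM" as exactly on the hour (the helper name `pdt_to_utc` is made up):

```python
from datetime import datetime, timedelta, timezone

# Pacific Daylight Time is 7 hours behind UTC.
PDT = timezone(timedelta(hours=-7), "PDT")


def pdt_to_utc(year, month, day, hour):
    """Convert a PDT wall-clock hour to the equivalent UTC datetime."""
    local = datetime(year, month, day, hour, tzinfo=PDT)
    return local.astimezone(timezone.utc)


start = pdt_to_utc(2014, 9, 22, 9)
end = pdt_to_utc(2014, 9, 23, 11)
print(start.strftime("%Y-%m-%d-%H"))  # 2014-09-22-16
print(end.strftime("%Y-%m-%d-%H"))    # 2014-09-23-18
```

The end of the window, 2014-09-23-18 in file name terms, lines up with the first file name reported as arriving normally earlier in this thread.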


Seems to confirm what everyone in this thread has been reporting, and I guess the good news is that there won't be any data loss... having said that, though, it does sound like it will be a long time before things are back to normal.

Peter

Daniel Ramagem

Sep 24, 2014, 8:52:55 AM
to snowpl...@googlegroups.com
That's great news Peter, thanks for sharing it with us!

Kudos and thanks also to everyone who jumped on this thread to share their status.

Simon Rumble

Sep 24, 2014, 11:08:45 PM
to snowpl...@googlegroups.com
And Amazon have deleted the thread Robert linked to!

Simon Rumble

Sep 25, 2014, 8:39:41 PM
to snowpl...@googlegroups.com
How is everyone's Cloudfront data looking now? We seem to be back to normal but still missing a bunch of data on 22nd September:
[Inline image: collection sanity check]
(not sure if Google Groups allows attachments, so if not, see this: https://dl.dropboxusercontent.com/u/62533350/Collection%20sanity%20check.png )

This has given me impetus to finally switch to direct S3 collection. I've made some changes to SnowCannon to get it outputting in Clojure format. Will get some testing in over the next few days and get it into production.

Simon Rumble

Sep 25, 2014, 8:43:34 PM
to snowpl...@googlegroups.com
I should point out that we're seeing large numbers of files turning up in our S3 buckets, so they do seem to be clearing the backlog. Will wait and see whether we get 22nd September back.

Daniel Ramagem

Sep 25, 2014, 9:16:00 PM
to snowpl...@googlegroups.com
Hey Simon,

Same observations on my end: since last night we've been getting a boatload of log files from the missing 31-hour gap.  I'm waiting on our regularly scheduled Snowplow run tomorrow morning to compare with the expected daily data.

FWIW, I'm also looking into the Clojure collector option.

Daniel

joachim.fr...@springlane.de

Sep 26, 2014, 3:00:55 AM
to snowpl...@googlegroups.com
Yesterday and today we also received a bunch of log files, and for now it looks as if all past days are fine. Let's see what happens over the next few days...

Gabor Ratky

Sep 26, 2014, 5:08:10 AM
to snowpl...@googlegroups.com
Hi Simon,

With September 22 (NEVER FORGET) the shortcomings of the CloudFront collector became very apparent to all of us but I wonder if you ever considered the client-side latency hit of recording with SnowCannon and on your own servers.

One benefit of the CloudFront collector is that the edge locations are close to the user, so the fire-and-forget GET requests complete faster and before users navigate away. Even a short delay can mean losing a few percentage points of events, because the browser can't complete the request if it takes longer.

I'd be super curious to hear about your experience or if you ever considered multiple collectors in different DCs.

Gabor

Simon Rumble

Sep 26, 2014, 5:15:15 PM
to snowpl...@googlegroups.com
Hi Gabor.

For us the difference between Cloudfront and SnowCannon is pretty insignificant in the Australian region. Cloudfront only has two nodes here, Sydney and Melbourne. Our SnowCannon instances are in Sydney. Regardless, the benefits of a third party cookie are well worth it with 85+ sites.

There's no reason you couldn't run a multi-region implementation of SnowCannon with Anycast DNS pointing people to the nearest instance.

Robert Kingston

Sep 26, 2014, 8:15:21 PM
to snowpl...@googlegroups.com

Our data is also still missing swathes of Sept 22. I last ran enrichment 2 hours ago.

@Gabor, my thoughts exactly. Would love to see someone comparing CF/Clojure collectors running side by side. Last time I checked, Yali and Alex are running them simultaneously on the Snowplow site. 

At least the local storage queue should go some way to ensuring events are fired off regardless of latency.

Peter Vandenberk

Sep 29, 2014, 4:47:53 AM
to snowpl...@googlegroups.com
Hi all,

I just thought I'd update everyone with a further reply from the AWS Cloudfront team, in response to our original support ticket:

I have received confirmation from our engineering team that the backfill of missing logs has been completed.
I apologize, as there was some additional delay beyond our estimated end-time in this process.
Please let me know if you believe you are missing any logs from this period. I apologize for any inconvenience this may have caused.

So I guess if you're missing logs from September 22nd, it might be worth raising a support ticket with Amazon.

FYI,

Peter V.

Yali Sassoon

Sep 29, 2014, 8:35:17 AM
to snowpl...@googlegroups.com
We do run the Cloudfront and Clojure collectors side-by-side on our own website, as indicated by @Rob. (This is very easy with v2.0 of the JS tracker, because it supports sending data to multiple collectors.)

When I get a chance, I'll post a detailed comparison of the data we receive from each collector, but we receive slightly more events from the Clojure collector than from the Cloudfront collector, so the Clojure collector is the more reliable of the two. (Although the difference is very small.)

@Gabor was making a different point on latency, which is that the Cloudfront collector likely responds faster than the Clojure collector because it's effectively running at many more end points. It would be good to find an objective way of testing this (any ideas?). This is one of the reasons we've kept the Clojure collector as simple as possible, so that at least the server responds in the fastest possible time.
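
One rough way to compare response times would be to time repeated requests against each collector from several client locations. A minimal sketch, assuming a caller-supplied `fetch` function (hypothetical) that performs one request against a collector; the name `measure_latency` is made up for illustration:

```python
import statistics
import time


def measure_latency(fetch, attempts=20):
    """Call fetch() repeatedly and return (median, worst) latency in seconds."""
    samples = []
    for _ in range(attempts):
        started = time.perf_counter()
        fetch()
        samples.append(time.perf_counter() - started)
    return statistics.median(samples), max(samples)
```

Here `fetch` could be something like `lambda: urllib.request.urlopen(COLLECTOR_URL).read()`, where `COLLECTOR_URL` is a placeholder for a collector's tracking endpoint. Timing from a script only approximates what a browser experiences, so it would not by itself show how many events are lost when users navigate away.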