So what DID happen with the internet this week?

mar...@ranyard.info
Sep 11, 2016, 5:58:37 PM
to Leeds Hack Space
Hi All,

I thought I'd drop a line to the list to explain why I lost my Friday
evening to fixing the space internet (only to have it cut on Saturday
morning!). If you're not techie, this won't interest you, but from a
techie point of view it's really fascinating!

So, a few people reported the laser not working, and we pinned it down
to certain domains not resolving from inside the hackspace. I was about
to put in place a hacky fix when someone suggested safely rebooting the
server to clear its DNS cache. Once that was done, internal .hackspace
domains resolved but external domains did not. Pinging external
internet IPs worked, so there was definitely a major issue with our DNS
resolution.

I had to fetch my laptop from work (thanks to Alex for the lift) and
head to the space to figure out what on earth was happening. By the time
we'd done the round trip, some of the internet was resolvable again, so
it clearly wasn't a total failure, but it was still weird.

The DNS logs (for info, we're running bind9; the last time I touched
bind it was bind8) were full of cryptic messages about being unable to
validate something-or-other (Alex may have the exact error). Googling
that message came back with a pretty conclusive "your clock is wrong",
plus something about DNSSEC.
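For anyone wondering why a wrong clock breaks DNS at all: a DNSSEC-validating resolver such as bind checks that the current time falls between each signature's inception and expiration timestamps (the validity window of the RRSIG record). A minimal sketch of that check is below. The dates and the five-hour drift are made-up illustrative values, not what marvin actually saw, and real validators compare the 32-bit timestamps using serial-number arithmetic rather than plain datetimes.

```python
from datetime import datetime, timedelta, timezone


def rrsig_in_window(inception: datetime, expiration: datetime, now: datetime) -> bool:
    """True if 'now' lies inside an RRSIG's validity window.

    Simplified: real validators compare 32-bit timestamps using
    serial-number arithmetic (RFC 4034, section 3.1.5).
    """
    return inception <= now <= expiration


# Illustrative values only (hypothetical, not taken from marvin's logs).
real_now = datetime(2016, 9, 9, 20, 0, tzinfo=timezone.utc)
drifted_now = real_now - timedelta(hours=5)  # server clock lost some hours

# (inception, expiration) pairs for two hypothetical signatures
fresh_sig = (real_now - timedelta(hours=2), real_now + timedelta(days=14))
older_sig = (real_now - timedelta(days=3), real_now + timedelta(days=11))

for label, (start, end) in [("fresh", fresh_sig), ("older", older_sig)]:
    print(label,
          "real clock:", rrsig_in_window(start, end, real_now),
          "drifted clock:", rrsig_in_window(start, end, drifted_now))
```

With the clock running behind, the recently made signature looks like it comes from the future and is rejected, while the older one still validates. That is one way only some domains could fail while others kept resolving.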

Now, like any well-set-up system, marvin gets its time from the
internet (NTP), both on boot and at intervals during the day. Here's
where it gets a little weird: if the server had been able to look up
pool.ntp.org, it would have fixed itself.
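The catch is that NTP normally finds its servers by name, so when the resolver stops working the clock stops being corrected. One way to check the clock without touching DNS is to send a bare SNTP request straight to a server's IP address. The sketch below assumes you already know the IP of a reachable NTP server (`server_ip` is a placeholder you supply), and it ignores network round-trip delay, which is fine for spotting drift measured in hours but not for precise timekeeping.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_TO_UNIX = 2208988800


def build_request() -> bytes:
    """48-byte SNTP client request: LI=0, version 3, mode 3 (client)."""
    return b"\x1b" + bytes(47)


def transmit_time(packet: bytes) -> float:
    """Extract the server's transmit timestamp (bytes 40-47) as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("NTP reply too short")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_TO_UNIX + fraction / 2**32


def clock_offset(server_ip: str, timeout: float = 2.0) -> float:
    """Rough server-minus-local offset in seconds (ignores round-trip delay).

    server_ip should be an IP address, so no DNS lookup is involved.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server_ip, 123))
        reply, _ = sock.recvfrom(512)
    return transmit_time(reply) - time.time()
```

For example, calling `clock_offset()` with a real NTP server's IP would return something around +18000 if the local clock had lost five hours.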

The conclusion I have come to (having eliminated the impossible,
whatever remains, however improbable...) is that when the server halted
due to heat, it did not do so irrevocably. It wasn't an HCF (Halt and
Catch Fire) instruction, so when the temperature came back down, the CPU
resumed ticking, with the clock having lost some number of hours. And of
course a clock that far out broke DNSSEC validation, so DNS lookups from
the local network failed and NTP couldn't look up its servers to reset
the time.

To combat this, we added Verisign's no-track public DNS[1] both to the
server itself (so NTP doesn't break if bind goes into a tizwaz) and to
the list of DNS servers handed out by DHCP (so if bind dies, clients
still have internet DNS). So, in theory, heat taking out the server
should no longer leave us crippled when it regains consciousness.
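The idea behind listing a second resolver is simple fallback: ask the local bind first and, if it times out or returns an error such as SERVFAIL (what a validating resolver typically returns when DNSSEC validation fails), move on to the next server. The sketch below models only that ordering logic. The `query` function is a hypothetical stand-in supplied by the caller (a real client sends actual DNS packets), and the addresses are placeholders: 127.0.0.1 for bind and 192.0.2.53, a documentation-only address, for the public resolver. Real stub resolvers differ in exactly when they give up on a server, so treat this as an illustration rather than how any particular OS behaves.

```python
from typing import Callable, List, Sequence, Tuple


class AllResolversFailed(Exception):
    """Raised when every configured resolver fails."""


def resolve_with_fallback(
    name: str,
    servers: Sequence[str],
    query: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Ask each server in order; return (server, answer) from the first success.

    `query(server, name)` is assumed to raise an exception on a timeout
    or on an error response such as SERVFAIL.
    """
    errors: List[str] = []
    for server in servers:
        try:
            return server, query(server, name)
        except Exception as exc:  # timeout, SERVFAIL, refused, ...
            errors.append(f"{server}: {exc}")
    raise AllResolversFailed("; ".join(errors))


# Hypothetical stand-in for a real DNS query, for illustration only.
def fake_query(server: str, name: str) -> str:
    if server == "127.0.0.1":
        raise RuntimeError("SERVFAIL (DNSSEC validation failed)")
    return "203.0.113.10"  # documentation-range address, not a real answer


print(resolve_with_fallback("pool.ntp.org", ["127.0.0.1", "192.0.2.53"], fake_query))
```

One consequence worth knowing: the public resolver knows nothing about the internal .hackspace names, so while a client is falling back to it those names won't resolve.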

Of course, that definitely doesn't stop the peddlers cutting the wire
the next day!

Cheers,
--
Martyn

[1] I use Google DNS, but some members dislike Google's data policies.
OpenDNS is also an option, but it has been known to return bogus
results for failed and blocked lookups. Verisign spotted this distrust
and chose to launch a no-track DNS service:
https://www.verisign.com/en_US/security-services/public-dns/index.xhtml

Stanto
Sep 11, 2016, 8:04:02 PM
to Leeds Hack Space
Pbbbtfpt, good grief. Weird behaviour and physical problems at nearly the same time, how frustrating.

Thanks for sorting it out, and good info on Verisign, too. Wasn't aware of that.

Alex Silcock
Sep 11, 2016, 8:34:53 PM
to leeds-ha...@googlegroups.com
If anyone's interested in seeing what changes were made to alter the DNS settings, there are a number of merged pull requests on the Marvin repo on GitHub.

--
You received this message because you are subscribed to the Google Groups "Leeds Hack Space" group.
To unsubscribe from this group and stop receiving emails from it, send an email to leeds-hack-space+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Andy Cook
Sep 12, 2016, 4:04:20 AM
to Leeds Hack Space
Thanks Martyn,

I read it all... I didn't understand it, of course, but it was still interesting. 😀

Aidan Dunbar
Sep 12, 2016, 4:19:14 AM
to leeds-ha...@googlegroups.com
Thank you Callum for fixing the physical cable-cut problem on Saturday.

A

Samwise Wilson
Sep 12, 2016, 4:51:10 AM
to Leeds Hack Space
This update broke my workflow! I had configured emacs to interpret a temperature rise and time drift as my ctrl key, as it's hard to reach.

Can you please add an option to re-enable heat-based time drift?

