Mac Freezing/Stuck in Standby


John Huntington

Jul 7, 2014, 1:06:04 PM
to ql...@googlegroups.com
I'm testing out QLab for an upcoming 24/7 attraction, and I've been getting a weird freeze/crash.  

Here's a little video documenting the system in the frozen state:

The system is rock solid when I'm messing with it, but when I leave it for longer it goes into this weird crash/standby state, where QLab and the Mac OS freeze, but the machine still responds to ping commands over the network from the connected PC.  (This time I started it on Thursday around noon and it froze around 3am Saturday.)

The QLab Mac is being driven from Medialon Manager on a PC via QLab's pseudo-OSC, and then outputting 48 channels of audio onto a Dante network. Control and Dante Audio are on two different Ethernet adapters/IP Addresses on the Mac (through two different switches), and I'm running QLab 3.0.15, and OSX 10.9.1.  

I had the system preferences Computer Sleep set to "never" and then Display Sleep set to "never", although I had a screen saver set to turn on after one hour.  

I did have "Put hard discs to sleep when possible" checked; it seems like if it ran for 36 hours or so just fine then the disc thing shouldn't be a problem, but I have now unchecked that and am starting another test.  

But in the meantime I thought I'd put it out to the experts here.  

Thanks!

John


Dave "luckydave" Memory

Jul 7, 2014, 1:11:28 PM
to ql...@googlegroups.com
On Monday, July 7, 2014 at 10:06 AM, John Huntington wrote:
I'm running QLab 3.0.15, and OSX 10.9.1.  

I had the system preferences Computer Sleep set to "never" and then Display Sleep set to "never", although I had a screen saver set to turn on after one hour.  

I did have "Put hard discs to sleep when possible" checked

First, I'd strongly recommend updating both the OS and QLab to their current versions. I don't know of any reason to stay on older versions, and both have fixed bugs in the updates, so it could be that you're running into issues that have been resolved already by Apple or by us.

There's no reason to allow the hard disc to sleep, and plenty of reason not to allow it. While unlikely, there's still the possibility that a disc spin-down, and resulting spin-up, is causing a hang when data that's needed is taking too long to access.
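
The sleep timers discussed here can also be inspected from Terminal with `pmset -g`, where a value of 0 means "never". The sketch below is a minimal example, assuming the output lists each setting as an indented name followed by a number. The sample text is illustrative and was not captured from the machine in this thread, and the function names are made up for the example.

```python
import re

# Illustrative text in the style of `pmset -g` output (layout is an assumption).
SAMPLE = """\
Currently in use:
 disksleep            10
 sleep                0
 displaysleep         0
"""

# Timers that should all be 0 ("never") on an unattended show machine.
WATCHED = ("sleep", "disksleep", "displaysleep")


def parse_pmset(text):
    """Return {setting: int} for lines shaped like ' name   value'."""
    settings = {}
    for line in text.splitlines():
        match = re.match(r"^\s+(\w+)\s+(-?\d+)\b", line)
        if match:
            settings[match.group(1)] = int(match.group(2))
    return settings


def nonzero_sleep_settings(settings):
    """List the watched timers that are present and not set to 0."""
    return [name for name in WATCHED if settings.get(name, 0) != 0]


print(nonzero_sleep_settings(parse_pmset(SAMPLE)))
```

With the sample text, only the disk sleep timer is reported, which matches the "Put hard discs to sleep when possible" box being checked.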

-- 

Paul Gotch

Jul 7, 2014, 1:13:02 PM
to ql...@googlegroups.com
On Mon, Jul 07, 2014 at 10:06:04AM -0700, John Huntington wrote:
> But in the meantime I thought I'd put it out to the experts here.

Is there anything interesting in the console logs from about the time
that the freeze happened?

-p
--
Paul Gotch
--------------------------------------------------------------------

John Huntington

Jul 7, 2014, 1:14:24 PM
to ql...@googlegroups.com
Thanks, unfortunately our technician has to do that for this machine and he's on vacation.  I'll try that when he gets back.

John

John Huntington

Jul 7, 2014, 1:37:39 PM
to ql...@googlegroups.com, pa...@chiark.greenend.org.uk, paulg...@chiark.greenend.org.uk
On Monday, July 7, 2014 1:13:02 PM UTC-4, Paul Gotch wrote:
Is there anything interesting in the console logs from about the time
that the freeze happened?


Interesting, I'm a Mac Noob and didn't know about this.  But I checked and there's a bunch of stuff around that time:

Jul  5 03:34:53 ENT-PROD-052.local QLab[225]: Error: _performOscMethod: received no cues.

Jul  5 03:34:56 ENT-PROD-052.local bndaemon[69]: Sent,req=12742,resp=0,Ping,ecsa.citytech.cuny.edu,Sat Jul  5 07:34:56 2014

NOTE: (The bndaemon line is part of our school’s network access control system, and this machine is off the network so it keeps failing this ping, so I deleted a bunch of these from the log going forward)

Jul  5 03:34:56 ENT-PROD-052 kernel[0]: NetworkAudioEngine[0xffffff803c0e7000]::refreshARP - bundle 0/169.254.41.173 next ARP in 285 seconds @ 384603.536147479

Jul  5 03:34:58 ENT-PROD-052.local QLab[225]: Error: _performOscMethod: received no cues.

Jul  5 03:35:03 ENT-PROD-052.local QLab[225]: Error: _performOscMethod: received no cues.

Jul  5 03:35:04 ENT-PROD-052.local QLab[225]: Warning: audioDeviceDidOverload: Dante Virtual Soundcard

 

This "audioDeviceDidOverload" message seems to be happening a lot in the logs even up through this morning.  This is the current version of Dante Virtual Soundcard, and it's running now and these errors are not being generated.

 

Jul  5 03:35:08 ENT-PROD-052.local QLab[225]: Error: _performOscMethod: received no cues.

Jul  5 03:35:19 --- last message repeated 2 times ---


I assume this is after QLab froze but the messages are still coming in from Medialon--these messages continue until this morning.


Jul  5 03:35:19 ENT-PROD-052 kernel[0]: NetworkAudioEngine[0xffffff803c0e7000]::refreshARP - bundle 2/169.254.43.201 next ARP in 292 seconds @ 384633.536693022

. . .

 

Jul  5 03:35:53 ENT-PROD-052.local QLab[225]: Warning: audioDeviceDidOverload: Dante Virtual Soundcard

Jul  5 03:35:56 --- last message repeated 9 times ---

 

Jul  5 03:35:56 ENT-PROD-052.local QLab[225]: Error: _performOscMethod: received no cues.

Jul  5 03:35:56 ENT-PROD-052.local QLab[225]: Warning: audioDeviceDidOverload: Dante Virtual Soundcard

Jul  5 03:36:01 --- last message repeated 15 times ---

 

Etc… 

Thanks!

John
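
When reading a log like this, it can help to total the messages per minute, folding in the "last message repeated N times" lines, to see when the overload warnings start and how dense they get. A minimal sketch, assuming the line layout shown in the excerpt above (the regular expressions and the function name are illustrative, not part of QLab or OS X):

```python
import re
from collections import Counter

# A few lines copied from the excerpt above.
LOG = """\
Jul  5 03:35:53 ENT-PROD-052.local QLab[225]: Warning: audioDeviceDidOverload: Dante Virtual Soundcard
Jul  5 03:35:56 --- last message repeated 9 times ---
Jul  5 03:35:56 ENT-PROD-052.local QLab[225]: Error: _performOscMethod: received no cues.
Jul  5 03:35:56 ENT-PROD-052.local QLab[225]: Warning: audioDeviceDidOverload: Dante Virtual Soundcard
Jul  5 03:36:01 --- last message repeated 15 times ---
"""

STAMP = r"^(\w{3}\s+\d+ \d\d:\d\d):\d\d "
ENTRY = re.compile(STAMP + r"\S+ QLab\[\d+\]: (?:Warning|Error): (\w+):")
REPEAT = re.compile(STAMP + r"--- last message repeated (\d+) times? ---")


def count_per_minute(text):
    """Count QLab messages per (minute, message name), including repeats."""
    counts = Counter()
    last_name = None  # the message a following "repeated" line refers to
    for line in text.splitlines():
        entry = ENTRY.match(line)
        repeat = REPEAT.match(line)
        if entry:
            minute = " ".join(entry.group(1).split())
            last_name = entry.group(2)
            counts[(minute, last_name)] += 1
        elif repeat:
            if last_name is not None:
                minute = " ".join(repeat.group(1).split())
                counts[(minute, last_name)] += int(repeat.group(2))
        elif line.strip():
            last_name = None  # some other process logged in between
    return counts


for (minute, name), total in sorted(count_per_minute(LOG).items()):
    print(minute, name, total)
```

Repeat lines are credited to the most recent QLab message, and blank lines (as in a pasted excerpt) are ignored so they do not break that association.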

John Huntington

Jul 7, 2014, 1:44:16 PM
to ql...@googlegroups.com, pa...@chiark.greenend.org.uk, paulg...@chiark.greenend.org.uk
Oh I should point out that I've been intentionally pounding this machine beyond normal usage to try to force any problems to appear now (as they have) and not this fall.  

But even with this load the activity monitor shows 84% idle, and the Dante controller shows no errors.

Thanks!

John

Andy Lang

Jul 7, 2014, 1:58:36 PM
to ql...@googlegroups.com
On Mon, Jul 7, 2014 at 1:37 PM, John Huntington <jch3ny...@gmail.com> wrote:
> This "audioDeviceDidOverload" message seems to be happening a lot in the
> logs even up through this morning. This is the current version of Dante
> Virtual Soundcard, and it's running now and these errors are not being
> generated.

Hey John,

I offer this with the caveat that any and all of this might perform
differently once you update, since you're 5 updates behind on QLab
plus a few OS revs. That said, audioDeviceDidOverload means that the
audio device--in this case DVS--was ready for the next chunk of audio
data, and QLab had the audio data ready to give it to it (i.e., it
wasn't waiting to get it from a slow hard drive).

In the absence of other concurrent error logs, this generally
indicates that the CPU isn't able to keep up with things*, so you
either need to simplify the show, or upgrade to a more powerful
computer. If you're stress testing the machine well above what you'd
normally see in show conditions, then, I'd vote on the former being
the easier solution.

But, since these computers are on the network, and configured with all
the access control stuff that the IT department puts on it, you're
already fighting a losing battle for CPU capacity. You really need to
get the IT department to treat these as show control devices, and not
regular computers, and properly optimize them so that they're ONLY
running QLab and show critical software. Sam posted a guide to
important things to tweak on your computer for best performance in
QLab, which you can find at
http://figure53.com/notes/2013-10-29-prepare-execute-troubleshoot/

I know it's a struggle, but you really will see a vast performance
improvement if you can pry these things out of IT's hands and treat it
just like any other dedicated piece of show control gear. You wouldn't
let them install all sorts of extra management stuff on your PLC
controller, right?

Good luck,
Andy

*-In conjunction with other error messages, audioDeviceDidOverload can
be caused by other problems, which is why we ask users to send us the
full log files, rather than just try to self-serve parsing them.
Sometimes two or three different messages can combine to point to
something very different.

John Huntington

Jul 7, 2014, 2:03:49 PM
to ql...@googlegroups.com
Thanks...  Unfortunately, if these machines ever go on the network then we have to put the school's horrible and stupid network access control on there.  We have no other choice.

Usually it's not too onerous, but I didn't even know it was trying to ping home like that until I looked at the logs.

I will request the upgrades, but do you not trust the Mac activity monitor saying the machine's at 87% or so idle?  And when I've been listening to the audio it's all been fine--no glitches, etc.  

Thanks!

John


Paul Gotch

Jul 7, 2014, 2:13:20 PM
to ql...@googlegroups.com
On Mon, Jul 07, 2014 at 10:37:39AM -0700, John Huntington wrote:
> Jul 5 03:34:56 ENT-PROD-052.local bndaemon[69]:
> Sent,req=12742,resp=0,Ping,ecsa.citytech.cuny.edu,Sat Jul 5 07:34:56 2014

The Bradford Security Agent is a nasty horrible thing which could
easily be randomly running 'compliance' scans at silly times.
Unfortunately too many sites try and secure the machines connected to
the network rather than the network itself.

It's really worth having a clean 'show optimised' machine that doesn't
have anything other than the standard OS install, QLab and supporting
software such as DVS on it and is treated as an embedded device rather
than a computer.

Chris Ashworth

Jul 7, 2014, 2:15:27 PM
to ql...@googlegroups.com
On July 7, 2014 at 2:03:50 PM, John Huntington (jch3ny...@gmail.com) wrote:
I will request the upgrades, but do you not trust the Mac activity monitor saying the machine's at 87% or so idle?  And when I've been listening to the audio it's all been fine--no glitches, etc.  


It’s possible those logs are showing up at moments that do not reflect audible glitches. I think in general it is worth delaying detailed analysis/investigation until QLab and the OS are updated, since there are a great many significant bugs fixed since that version.  That’ll clear out those as possible sources of problems.

Cheers,

C

Andy Lang

Jul 7, 2014, 2:17:23 PM
to ql...@googlegroups.com
On Mon, Jul 7, 2014 at 2:03 PM, John Huntington <jch3ny...@gmail.com> wrote:
> Thanks... Unfortunately, if these machines ever go on the network then we
> have to put the school's horrible and stupid network access control on
> there. We have no other choice.

Well, if it's for show use, it shouldn't touch the network. It should
only touch your show LAN. So that seems like it'd solve that problem?

> Usually it's not too onerous, but I didn't even know it was trying to ping
> home like that until I looked at the logs.

Not onerous on the surface != not onerous under the hood :-)

> I will request the upgrades, but do you not trust the Mac activity monitor
> saying the machine's at 87% or so idle? And when I've been listening to the
> audio it's all been fine--no glitches, etc.

Chris just beat me to the punch on this; if they're not audible
glitches, they may not be problematic, and, in either case, we can't
really troubleshoot more until they do those updates at a minimum, and
ideally get it entirely off the WAN so that you can properly configure
it as a show control machine.

-Andy

Chris Ashworth

Jul 7, 2014, 2:19:23 PM
to ql...@googlegroups.com
On July 7, 2014 at 2:17:23 PM, Andy Lang (an...@figure53.com) wrote:

Chris just beat me to the punch on this; 


Sorry!  {hangs head and goes back to the coding cubicle}  

John Huntington

Jul 7, 2014, 2:22:57 PM
to ql...@googlegroups.com
On 7/7/2014 2:16 PM, Andy Lang wrote:
> On Mon, Jul 7, 2014 at 2:03 PM, John Huntington <jch3ny...@gmail.com> wrote:
>> Thanks... Unfortunately, if these machines ever go on the network then we
>> have to put the school's horrible and stupid network access control on
>> there. We have no other choice.
> Well, if it's for show use, it shouldn't touch the network. It should
> only touch your show LAN. So that seems like it'd solve that problem?

Kind of hard to upgrade the OS these days without internet access. :-)
And that used to be our policy, but our spring show had three Watchout
machines live on the internet as part of the show
(http://www.wingmantheshow.com/). Being on the internet is,
unfortunately, part of the future and we have to figure out how to
address this issue going forward somehow.

We run Watchout, Medialon, etc etc with virus detection and the network
access control stuff on the machine, although unless needed the machines
aren't on the internet.


> Chris just beat me to the punch on this; if they're not audible
> glitches, they may not be problematic, and, in either case, we can't
> really troubleshoot more until they do those updates at a minimum, and
> ideally get it entirely off the WAN so that you can properly configure
> it as a show control machine.

Unfortunately, it has to keep internet access, but I've requested the
updates from our technician who has all the license info, etc. This is
why I test things months in advance around here :-)

Thanks!

John

--
www.controlgeek.net/blog
www.johnhuntington.photography

Paul Gotch

Jul 7, 2014, 2:23:59 PM
to ql...@googlegroups.com
On Mon, Jul 07, 2014 at 11:03:49AM -0700, John Huntington wrote:
> Thanks... Unfortunately, if these machines ever go on the network then we
> have to put the school's horrible and stupid network access control on
> there. We have no other choice.

Don't put it on the network then. Or only put it on show control
networks which are physically separate from the school's network.

It is actually possible to download OS X updates from another computer
and install them; it's also possible to update QLab that way.

It's possible that audioDeviceDidOverload is being caused by some
security software hogging the disk or the CPU at an inopportune moment.
Unfortunately the activity monitor doesn't contain a logging function.
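
One way to work around the lack of logging is to record the system load to a file at intervals and compare the timestamps with the overload warnings in the console log afterwards. A minimal sketch using only the Python standard library (the function names, file name, and interval are hypothetical; load average is a coarser measure than Activity Monitor's idle percentage and says nothing about disk activity):

```python
import os
import time
from datetime import datetime


def sample_line(now=None):
    """Format one timestamped sample of the 1/5/15-minute load averages."""
    now = now or datetime.now()
    one, five, fifteen = os.getloadavg()
    return (f"{now:%Y-%m-%d %H:%M:%S} "
            f"load1={one:.2f} load5={five:.2f} load15={fifteen:.2f}")


def log_load(path, interval_s=5.0, samples=3):
    """Append `samples` lines to `path`, one every `interval_s` seconds."""
    with open(path, "a") as log:
        for i in range(samples):
            log.write(sample_line() + "\n")
            log.flush()  # keep the file current in case the machine freezes
            if i < samples - 1:
                time.sleep(interval_s)
```

Calling, for example, `log_load("load.log", interval_s=5.0, samples=17280)` would cover roughly 24 hours. A load average well above the number of CPU cores around the time of an overload warning would point toward another process competing for the machine.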

John Huntington

Jul 7, 2014, 2:28:26 PM
to ql...@googlegroups.com, pa...@chiark.greenend.org.uk, paulg...@chiark.greenend.org.uk
On Monday, July 7, 2014 2:23:59 PM UTC-4, Paul Gotch wrote:
Don't put it on the network then.

Kind of hard to do Dante without it :-)
 
Or only put it on show control
networks which are physically separate from the school's network.

Don't have a choice any more--see above.

Being on the internet is, unfortunately, the future for a lot of show computers.

John 

Dave "luckydave" Memory

Jul 7, 2014, 2:29:22 PM
to ql...@googlegroups.com
On Monday, July 7, 2014 at 11:22 AM, John Huntington wrote:
Unfortunately, it has to keep internet access, but I've requested the
updates from our technician who has all the license info, etc. This is
why I test things months in advance around here :-)

There's no license information needed for either update. Just download the combo updater on another computer from Apple's site for the OS, and QLab 3.0.20 from ours. Install, and you're done.

-- 

John Huntington

Jul 7, 2014, 2:32:54 PM
to ql...@googlegroups.com
Thanks, I just sent you and Andy a private message about this.

Thanks!

John

Dave "luckydave" Memory

Jul 7, 2014, 2:34:16 PM
to ql...@googlegroups.com
On Monday, July 7, 2014 at 11:32 AM, John Huntington wrote:
Thanks, I just sent you and Andy a private message about this.

For the record, sending to us directly is less efficient than sending to sup...@figure53.com, where all support requests should go. :)

-- 

Paul Gotch

Jul 7, 2014, 2:47:57 PM
to ql...@googlegroups.com
On 07/07/2014 19:28, John Huntington wrote:
> Kind of hard to do Dante without it :-)

Dante is also notoriously fickle and requires specific support for
specific network features in switches. I'd always run an entire separate
network for it rather than putting it across a corporate network.

Similar things apply to ACN for lighting.

> Don't have a choice any more--see above.
>
> Being on the internet is, unfortunately, the future for a lot of show
> computers.

I'm afraid we are arguing in circles. The kind of problems caused by
Antivirus and NAC software are not something that can be solved.

For example most lighting desks these days are Embedded Windows
computers and some only have ACN outputs and require external ethernet
connected nodes to create traditional DMX. Trying to install all this
kind of stuff on one of those would result in the vendor voiding
your warranty and not supporting you.

-p

John Huntington

Jul 7, 2014, 3:13:02 PM
to ql...@googlegroups.com
On 7/7/2014 2:47 PM, Paul Gotch wrote:
On 07/07/2014 19:28, John Huntington wrote:
Kind of hard to do Dante without it :-)

Dante is also notoriously fickle and requires specific support for specific network features in switches.

What specific support for specific network features are you talking about? AVB, of course, needs specific switch features, which is one of its downfalls (I just wrote about this, in fact http://controlgeek.net/blog/2014/6/24/avb-and-audinate-dante-an-update-after-infocomm)  I've had lots of problems getting the Dante controller to properly punch through Windows firewall, but once that's done I've run Dante on a wide variety of switches without incident. 

That said, we generally run Dante over managed small business grade Cisco switches.  For our haunted hotel, I put Dante into its own VLAN, and I just leave everything for link local IP address assignment and it's worked well since 2011:
http://controlgeek.net/blog/2012/3/16/managed-switchrouting-ethernet-infrastructure-for-the-graves.html

 I'd always run an entire separate network for [Dante] rather than putting it across a corporate network. 


Similar things apply to ACN for lighting. 


Actually the graphic in this 2007 article shows ACN devices being discovered all over the ETC corporate network.  
http://controlgeek.net/articles-and-other-work/2007/9/1/the-acnfuture-is-here.html
(And boy was I wrong about the (full) ACN future being "here" :-) 

Lots of show control devices (some with embedded Windows) are now designed to sit (at least on one ethernet jack) on the internet.  It's up to us now to manage the security, etc. 

In a school environment, we have to run anti virus stuff.  We used to not run it and leave machines off the networks, but the students managed to bring in all kinds of crazy malware on USB drives, etc.  (think Stuxnet).





Don't have a choice any more--see above.

Being on the internet is, unfortunately, the future for a lot of show
computers.

I'm afraid we are arguing in circles. The kind of problems caused by Antivirus and NAC software are not something that can be solved.


I think we mostly agree, but the problem is that shows we are doing today require live internet access, and I don't think that will go away.  Unfortunately, just saying "don't put it on the internet" (as I have for years), won't cut it going forward.  I've already lost that argument :-)

John

-- 
www.controlgeek.net/blog
www.johnhuntington.photography

Paul Gotch

Jul 7, 2014, 4:28:07 PM
to ql...@googlegroups.com
On 07/07/2014 20:13, John Huntington wrote:
> What specific support for specific network features are you talking
> about?

- The switch must have enough backplane bandwidth to be non-blocking
between pairs of ports.
- Energy Efficient Ethernet must be switched off.
- The switch must be able to forward packets between pairs of ports at
1.488 Mpps, i.e. Gigabit line rate with minimum-size (64-byte) frames.
Sadly there are switches around which claim to be Gigabit but can't do this.
- For non-trivial systems IGMP snooping is a required feature.
- For non-trivial systems Diffserv QoS is a required feature.
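
The 1.488 Mpps figure follows from Ethernet framing: every frame on the wire also carries an 8-byte preamble and is followed by a 12-byte inter-frame gap, so a 64-byte minimum-size frame occupies 84 byte times. A short worked calculation (the function name is made up for this example):

```python
PREAMBLE_AND_SFD = 8   # bytes sent on the wire before every frame
INTERFRAME_GAP = 12    # minimum idle time between frames, in byte times


def max_frames_per_second(link_bps, frame_bytes):
    """Theoretical frame rate of a link saturated with one frame size."""
    bits_on_wire = (frame_bytes + PREAMBLE_AND_SFD + INTERFRAME_GAP) * 8
    return link_bps / bits_on_wire


for size in (64, 1518):
    print(size, round(max_frames_per_second(1_000_000_000, size)))
# 64 1488095
# 1518 81274
```

A switch that can forward 1.488 Mpps per port can therefore sustain Gigabit line rate for any frame size, since larger frames need far fewer packets per second.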

It goes without saying that the switch must be stable. I've had Netgear
L2 managed switches crash and stop forwarding packets due to a machine
going crazy and flooding it with ARP packets. In this kind of situation
it doesn't matter if things are separated by VLANs: your show network, and
possibly a show in front of paying customers, has been interrupted due to
something that happened on a logically different network.

> That said, we generally run Dante over managed small business grade
> Cisco switches.

Yamaha even recommend the SB 300 series switches if you dig deep enough
for example.

> away. Unfortunately, just saying "don't put it on the internet" (as I
> have for years), won't cut it going forward. I've already lost that
> argument :-)

There are degrees. However IT departments tend to have a binary view of
'it has our stack of stuff on it otherwise it doesn't go on any of our
networks' this is basically incompatible with 'I can't have my show in
front of paying customers interrupted by your virus scan blocking disk
IO so that my video doesn't play'.

There are lots of ways of providing similar levels of security but all
of them require discipline for example ensuring USB keys have been
scanned on a different machine before being plugged into the show
machine. Similarly, if you ultimately need some internet access, then
everything but the very small number of endpoints you actually need is
firewalled out in both directions.

I fear that without clear thinking based on a proper threat model and
procedures based on prevention rather than detection the next edition of
your book is going to be quite short.

Taking your example of Stuxnet, the sad thing is that not solving these
problems gets us to a point where there are no general-purpose machines
running software like QLab, and lots of supposedly embedded devices which
are actually far less secure.

-p

John Huntington

Jul 7, 2014, 4:40:17 PM
to ql...@googlegroups.com
On 7/7/2014 4:27 PM, Paul Gotch wrote:
> There are degrees. However IT departments tend to have a binary view
> of 'it has our stack of stuff on it otherwise it doesn't go on any of
> our networks' this is basically incompatible with 'I can't have my
> show in front of paying customers interrupted by your virus scan
> blocking disk IO so that my video doesn't play'.

Well I guess we work in different worlds. I don't have the option of
going around our IT department.

>
> There are lots of ways of providing similar levels of security but all
> of them require discipline for example ensuring USB keys have been
> scanned on a different machine before being plugged into the show
> machine.

That's nice in theory, but it doesn't work in a school environment.

> Similarly, if you ultimately need some internet access, then everything
> but the very small number of endpoints you actually need is firewalled
> out in both directions.
>
> I fear that without clear thinking based on a proper threat model and
> procedures based on prevention rather than detection the next edition
> of your book is going to be quite short.

Well, that's coming off to me as an insult, so I don't really see any
point in continuing this conversation.

John

--
www.controlgeek.net/blog
www.johnhuntington.photography

Paul Gotch

Jul 7, 2014, 4:47:23 PM
to ql...@googlegroups.com
On 07/07/2014 21:40, John Huntington wrote:
> Well, that's coming off to me as an insult, so I don't really see any
> point in continuing this conversation.

It was not meant as such and I apologise for causing you to take it as
such.

-p

John Huntington

Jul 7, 2014, 4:52:38 PM
to ql...@googlegroups.com
Thanks for clarifying--I actually (to a point) advocate exactly what you
are saying in the book. :-)

It's been an interesting discussion in any case, I owe you a drink next
time you're in NYC for the misunderstanding....

Now I really need to step away to get some other work done :-)

John



--
www.controlgeek.net/blog
www.johnhuntington.photography
