Ah. Mote is not yet capable of delayed or otherwise scheduled events. It seems like an interesting feature to pursue in the future, though.
What you can do is play audio that's tied to a triggered event. For example, Mote has what we call MELs, or movement event layers, which are basically trigger areas that fire a script when a token enters their bounds.
Audio can also be tied to macro buttons, so it plays when the macro is used.
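
To make the trigger-area idea concrete, here is a rough sketch in plain Python. To be clear, this is not Mote's scripting API: `TriggerArea`, `token_moved`, `play_audio`, and the clip name are all made up for illustration. It only shows the general pattern of a region that runs a callback at the moment a token crosses into its bounds.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Stand-in for a real audio call; it only records which clips were requested.
played: List[str] = []


def play_audio(clip: str) -> None:
    played.append(clip)


@dataclass
class TriggerArea:
    """Hypothetical rectangular trigger area (an assumption, not Mote's API)."""
    x: float
    y: float
    width: float
    height: float
    on_enter: Callable[[str], None]
    # Remembers whether each token was inside after its previous move.
    _inside: Dict[str, bool] = field(default_factory=dict)

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)

    def token_moved(self, token: str, px: float, py: float) -> None:
        was_inside = self._inside.get(token, False)
        now_inside = self.contains(px, py)
        self._inside[token] = now_inside
        # Fire only on the outside-to-inside transition, not on every
        # move made while the token is already within the area.
        if now_inside and not was_inside:
            self.on_enter(token)


area = TriggerArea(10, 10, 5, 5,
                   on_enter=lambda token: play_audio("creaky_door.ogg"))
area.token_moved("Rogue", 0, 0)    # outside: nothing happens
area.token_moved("Rogue", 12, 12)  # enters: the clip is requested
area.token_moved("Rogue", 13, 12)  # still inside: no repeat
```

A macro button would follow the same shape, just with the button press calling `play_audio` directly instead of a movement check.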