
WAI:ARIA politeness


Gijs Kruitbosch

Apr 26, 2007, 10:49:11 AM
to Peter Thiessen, Charles L. Chen, Aaron Leventhal
All,

I'm having some issues with WAI:ARIA politeness levels. The idea I had
was that in ChatZilla, normally messages would be spoken at the 'polite'
level. However, I also thought it would be best if there was some way
for the user to designate which messages they thought more important
than others. Usually this would include messages with their names in
them, or messages in IRC channels with very low traffic/activity, or
messages in high-traffic channels including keywords they're interested
in (say "WAI:ARIA" in #developers on moznet, or something).

This is where things get problematic. Maybe I'm just plain wrong, but
judging from what I've heard from Charles Chen, if I were to set the
politeness level to "assertive" for such messages, the user would be
interrupted, and all previous live region changes would be discarded, or
otherwise the user would be unable to identify the context/order in
which such changes had appeared.

To me, this makes the live region politeness settings mostly useless. On
busy IRC channels, while reading incoming messages, discarding all
messages before an important message (and after the currently read
message) means the user will have no indication anything followed
between the last 'normal' message and the important message. This is
obviously horribly disorienting and should be avoided.

Personally, I feel the 'assertive' politeness setting should be used for
emphasis, and the screenreader should itself give an indication of
reading things out of order. I think the following sequence of read
messages would make perfect sense even for people dependent on their AT
(imagine the user is Tim):

<John> Oops, I forgot I had to do that other thing.
<Tim> What other thing?
Important: <John> Tim: could we discuss live regions for a bit?
Earlier changes:
<John> I forgot to walk the dog.


(The above would be read as follows, given the actual order of messages:
<John> Oops, I forgot I had to do that other thing.
<Tim> What other thing?
<John> I forgot to walk the dog.
...
<John> Tim: could we discuss live regions for a bit?)


I could also imagine that you might want to read such important messages
even if the tab or window on which they appear is in the background,
though I'm less sure about that.

As everyone knows I'm new to a11y, so perhaps I'm horribly wrong - but
to me it seems disadvantageous to have no way to designate a difference
in importance for what might otherwise be far too much information for a
user to handle.

I'd love comments on this issue, and any help / ideas you may have to
offer on this particular problem, or the problem of accessible chat in
general.


~ Gijs

Charles Chen

Apr 26, 2007, 8:47:56 PM
The main problem here is that there are two ways to do this.
1. Higher priority messages speak before lower priority messages but do
not clear the lower priority messages from the queue in front of them.
2. Higher priority messages speak before lower priority messages AND
clear the lower priority messages from the queue in front of them.

Both methods could be valid in different contexts.
#1 is good for the situation that you are discussing in this post.
#2 is good for reducing verbosity and keeping things in chronological order.
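The difference between the two modes can be captured in a single queue-insertion routine. This is a minimal sketch with invented names (not any AT's actual API); higher numbers mean higher priority:

```python
from collections import deque

def enqueue(queue, message, priority, clear_lower=False):
    """Insert a live-region update ahead of lower-priority ones.

    clear_lower=False -> mode #1: jump ahead, but keep the earlier
                         lower-priority messages behind it.
    clear_lower=True  -> mode #2: jump ahead AND discard the earlier
                         lower-priority messages.
    """
    if clear_lower:
        queue = deque(m for m in queue if m[1] >= priority)
    # insertion point: after the last equal-or-higher-priority message,
    # before any lower-priority ones
    idx = 0
    for i, (_, p) in enumerate(queue):
        if p >= priority:
            idx = i + 1
    queue.insert(idx, (message, priority))
    return queue
```

In mode #1 the earlier messages are still spoken after the important one; in mode #2 they are gone.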

Since Aaron had mentioned the idea of minimizing verbosity and how much
is spoken when a region changes, I chose to implement #2. But Gijs is
making a strong case for having #1.

A way to temporarily sidestep this issue would be for me to provide both
modes in Fire Vox and just let the user choose which one they want. That
is something that I am looking into. However, my own personal belief is
that web developers should be able to specify whichever mode is best
for their web app (with the user able to override that, of course). A
good way to allow that would be to have another WAI-ARIA attribute that
indicates whether a message should clear out earlier, lower priority
messages. This is an idea that I've put forth to the PFWG, but for me to
make a convincing case, I need more use cases.

ChatZilla is a good use case; can anyone think of other uses? Or better
yet, point me to some? Thanks.

-Charles

Peter

Apr 26, 2007, 10:29:32 PM
Hi Gijs,

It's nice to hear that great minds think alike :) In Charles's paper
and mine (http://www.w4a.info/2007/prog/), I had a similar idea of
ranking the importance of messages using the three (four if you
include off) politeness levels: polite, assertive, rude. This would
give a neat-o queue that would allow users to hear the most important
messages first (information filtering).

Charles pointed out that the problem with this method was that
messages would constantly interrupt each other and potentially
really confuse the user (as he mentioned in the previous post). Still,
the different politeness settings are in place to give different live
regions on an information-rich and active page different importance. A
system message, for example, might be given a rude setting, though if
you used this in a chat setting it would probably disorient the
user. I'm not aware of any user tests where this has been shown, though.

But right, from what I understand, live regions aren't really meant to
be used for ranking information, but rather just for "dumbly" exposing
DOM updates. After a few posts (on this newsgroup, for example), the
general consensus was that the AT should carry the ranking intelligence.
Charles did a smashing job of modifying Fire Vox
(firevox.clcworld.net) to filter messages in ReefChat
(reefchat.overscore.com) based on username - this is similar to giving
a specific user name a higher-ranked priority (so the
intelligence is in Fire Vox rather than in the ReefChat live region
markup).

-peter

Sina Bahram

Apr 26, 2007, 10:36:35 PM
to Charles Chen, dev-acce...@lists.mozilla.org
I'd like to offer my thoughts from the perspective of someone who has to
listen to such messages, as a screen reader/blind user, and from a
programming perspective as well, since my formal training is in computer
science.

I'd like to propose the following priorities, and I would like to extend the
definition of these priorities to depend upon a cancel functionality which
allows for the cancellation of messages at a given priority or higher.

I shall demonstrate this in my examples below.

Priorities in ascending precedence order:

Supplemental

Message

Notify

Alert


Explanations follow.

Name:

Supplemental

Summary:

Supplemental information is only important if it can be said "now".

Action:

Condition #1:

If no utterance is being spoken ==> speak

Condition #2:

If another utterance of priority "notify" is being spoken ==> queue to be
spoken.


Name:

Message

Summary:

Messages are the most common priority. They flush the supplemental queue,
and they queue up to be spoken.

Action:

#1: flush the supplemental queue

Note: this includes any utterances of priority "supplemental" that are
currently being spoken

#2: if no utterance of priority "message" or higher is being spoken ==>
speak; otherwise, queue up

Note:

The above could be concisely expressed as just action #2, but I wanted to
make it explicitly clear that "message" priority utterances override,
interrupt, and cancel all supplemental messages.


Name:

Notify

Summary:

This priority does not flush any queues, and in turn is not flushed by
"message" or "supplemental" utterances.

Actions:

#1: if nothing is being spoken, then speak

#2: if "message" utterances are being spoken, then wait for the specific
utterance to be completed, then speak. This does not cancel the rest of the
message utterances, as they will simply resume speaking after this utterance
finishes.

#3: if something of priority "notification" or "alert" is speaking, then
queue up.


Name:

Alert

Summary:

This priority flushes all other queues of "supplemental", "message", and
"notify".

Action:

#1: unless something of type "alert" is already being spoken, interrupt
and cancel all other utterances, and speak immediately.


Now, there are a lot of places where the above could be improved. For
example, it would be nice to have a priority level that does not interrupt
other "alert" messages but queues them up.

Basically there can be shades within each partition ... But I think this
gives the most versatility.

In this way, we can have the following situation:

We will assign information such as the time of an IM as supplemental

We will assign the IM itself as a message


We will assign the change in status of a user as notify.


We will assign the exit dialog as alert.

So we are messaging along, and we have the following, perhaps:

(message) "Bob: hi Sina"
*some time passes by before Bob responds*
(supplemental) "the time of the message"
(message) "Sina: hi Bob!"
(supplemental) "the time of the message"
*as Sina's message is finishing being read, this notification comes along*
(notify) "John has signed in!"
(message) "Sina: Bob, I'm going to exit now, so I can run away from John!"
(supplemental) "the time of the message"
(alert) "Are you sure you want to exit?"

I would hear the following:

"Bob: hi Sina"
"the time of the message"
"Sina: hi Bob!"
"John has signed in!"
"Sina: Bob, I'm going to exit now, so I can run away from John!"
"Are you sure you want to exit?"

... I am actually going to leave this open for questions.

I would point out that because of the way the cancels work ... This has the
capability of being quite versatile with some very basic implicit
definitions up front.

Hope this makes sense?

Take care,
Sina


_______________________________________________
dev-accessibility mailing list
dev-acce...@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-accessibility

Gijs Kruitbosch

Apr 27, 2007, 4:36:16 AM
to Sina Bahram, Charles Chen, dev-acce...@lists.mozilla.org
Hey Sina,

This actually seems like it would work great for ChatZilla's purposes.
The one question I have is about chronological processing, the answer to
which I can't clearly find in your example. Given a multi-user chat
where people mentioning your name will have a "Notify" priority, what
would happen in this case (you're Sina, obviously):

<Bob> Hi Tim, how was your weekend?
<Tim> Oh, it was great, thanks for asking. Yours?
<Bob> Not too bad, a bit busy I guess. Did you go out at all?
<Tim> I did on saturday, nothing spectacular though.
<Bob> Ah, ok. Any plans to hang out together again?
<Tim> Sina: how was your weekend, by the way?


Imagine your screenreader is still reading out Bob's second message as
Tim's message with your name in it comes along. As I understand it, you
would want Tim's message (with priority "Notify") to be read before
continuing with the other (earlier) messages which had priority "Message".

The question I now have is this: would the screenreader give any
(spoken) indication of the fact that it processed something out of order? I
attempted to make it clear in my earlier example that I thought it
should (i.e., the "Important:" and "Earlier changes:" text). Obviously I'm
not in as good a position as you to judge whether this would be
necessary, but I imagined it would cause confusion if nothing was done
to point out this out-of-order-ness. On the other hand, if you're well
aware of what conditions make a message have "Notify" priority, I can
imagine it would be surplus information (because you'd realize what had
happened). So I'm curious where you stand on this particular issue (as
well as anyone else's opinions, for that matter).

~ Gijs

Al Gilman

Apr 27, 2007, 8:03:46 AM
to dev-acce...@lists.mozilla.org
First, I'd like to emphasize Peter's point. The WAI-ARIA politeness
levels are primarily targeted at giving the author and server a way
to hint suggestions to the queuing and verbosity control that the
AT and other client-side software manage.

The slogan is "author proposes, user disposes."

Further, based on how Aaron has been presenting it, there
is nothing in WAI-ARIA semantics that would ever force actual
message content in a chat to be lost. The successive messages
are not successive values of the same object, but incremental
bits of content that are concatenated onto the end of a wairole:log
object. The auto-browse mode of the voicing layer might indeed
skip to a 'rude' addition

<rude>[John] Charles, your pants are on fire! (22:14:12)</rude>

over the

<polite>[Sue] Oh, Charles, that was sweet. (22:12:48)</polite>

But all the messages would remain in the log until the buffer overflows,
and the AT could save the entire stream on the client side any time
it was skipping actual message content to keep up.

And the user could drop out of auto-browse mode to read the log
in order in non-real time at any time. (hypothetical AT modes; no
implementation in FireVox is implied).

[floodgate warning]

In the case of a chat application, the verbosity level wants to be
highly adaptive, with the level of filtering varying in real time
depending on the amount of free speech time available for
supplemental utterances, more or less like video description content.
When there's too much dialog, you can't squeeze in much description.
When there's dead air as far as fresh messages, you can take the time
to voice "Charles is typing a comment..."

Sina, one of my technology pipedreams is to get people describing
protocols such as the one you laid out for us in Harel state chart form.
This could be done in UML or in the new State Chart XML from the
Voice Browser Working Group in W3C. It is the same underlying model.

I also want to point out that for the kind of text content that Sina
is adaptively filtering, there is metadata that is stronger than the
WAI-ARIA politeness to base pruning decisions on.

The name of the speaker and the timestamp of the message are
properties attached to the utterance, the main message text, by the
system. This is something that the AT or voicing layer in a self-voicing
App such as Fire Vox should know. This semantics is available in markup
if you use the W3C Timed Text markup directly, and possibly also if
you refer to this in defining your metadata. The latter takes more
semantic web processing, but I wouldn't rule it out.

The point is, I believe, that 'speaker' and 'timeAtEnd' are chat-domain
concepts and we should put these at arms length from the more
broad-brush WAI-ARIA terms. These are things where we should work with
the Synchronized Multimedia Working Group (SYMM) -- the people who brought
you SMIL. The Mobile community and SYMM are unhappy with the fact
that the Timed Text Working Group did not put out a streaming-friendly
markup specification. They are pushing to carry on the work in that direction.
This is exactly where we want common or at least interoperable terms as
to timing for a timestamped capture of a multi-speaker conference call and
for a text chat that operates by typing and send-on-CarriageReturn.

The filter rule: in chat, "alert on any message addressed to me" is in the
same "system layer" as Jaws scripting for Excel. It's a rule that applies
only under the rapid, real-time situation of text chat and it can key off
metadata not recognized by the base configuration of the AT.

Al

Charles Chen

Apr 27, 2007, 9:03:17 AM

> But all the messages would remain in the log until the buffer overflows,
> and the AT could save the entire stream on the client side any time
> it was skipping actual message content to keep up.
>
> And the user could drop out of auto-browse mode to read the log
> in order in non-real time at any time. (hypothetical AT modes; no
> implementation in FireVox is implied).

An implementation of that in Fire Vox is being worked on.

But even before that is ready, the ChatZilla support for that already
exists since you can go into the transcript window and scroll through
that quickly with Fire Vox.

Gijs Kruitbosch

Apr 27, 2007, 9:33:42 AM

Right, but if there's no clue for the user that messages were dropped,
then that's not really good enough, in my opinion. If users
continuously have to check / read scrollback then that's almost as bad
as no WAI:ARIA support at all.

~ Gijs

Sina Bahram

Apr 27, 2007, 9:59:08 AM
to dev-acce...@lists.mozilla.org
Aha, I see your question ... Great point!

Ok, here are my thoughts on that.

Because the user understands the semantics of the environment they
are in (i.e., a chat with multiple people), the user should not have the
order switched up on them.

Instead, I think they should all be of the same priority, but now I would
like to introduce the concept of channels.


I think that it would be very easy to express interest in a specific type of
message, or in a given priority of messages that could be put into a
channel.

For example:

I'd like all messages on channel 1

I'd like to add "Bob: Sina:" messages to channel 2

I'd like to add "Tim: Sina:" messages on channel 3

I'd like "Bob: Tim:" messages on channel 4


Obviously there is no intelligence here ... It is simply any messages from
the given parties, not messages directed at other users ... We're not doing
AI here :)

But, here is what this allows me to do

Only follow all messages in the channel

Only follow messages from Bob

Only follow messages from Tim


Only follow messages from Tim and Bob --> if the three of us are the only
members, then essentially I can follow their conversation even if, say,
two other members suddenly join and start jabbering away.

Because I will have Bob and Tim on my listen specification for channel 4, I
will hear them respond, and know to switch over to channel 1, if I wish to
follow their conversations with anyone else, at which point I can setup
additional channels.

Also, don't forget that user status changes are notifications, which means
I'll be notified of these users coming in, no matter what channel I'm in.
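The channel-subscription idea can be sketched as a simple routing function. The data shapes here are purely illustrative (a channel number mapped to the set of nicks it follows, with None meaning "follow everyone"); as Sina says, there is no intelligence, just sender matching:

```python
def channels_for(subscriptions, chat_line):
    """Return the channels a chat line lands on, by sender nick.

    subscriptions: {channel_number: set_of_nicks or None}
    chat_line:     e.g. "Bob: Sina: lunch?" - sender is the first token
    """
    nick = chat_line.split(":", 1)[0].strip("<> ")
    return [ch for ch, nicks in subscriptions.items()
            if nicks is None or nick in nicks]
```

With channel 1 following everyone, 2 following Bob, 3 following Tim, and 4 following both, a line from Bob lands on channels 1, 2, and 4, which is exactly the "hear them respond on channel 4, then switch to channel 1" behavior described above.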

Sina Bahram

Apr 27, 2007, 10:09:55 AM
to dev-acce...@lists.mozilla.org
Should we three set up a conference call to discuss this over the phone?

I think there are some good ideas here, and we could benefit from a
strategy session to figure out the best way forward.

Possibly Saturday sometime?

If anyone is not in the US, then pick a time better for you, as I am up all
hours of the day and night.

I can make the calls, as I have what is essentially free planetary long
distance

Aaron, would you like to join this?

Take care,
Sina



Sina Bahram

Apr 27, 2007, 10:22:33 AM
to Al Gilman, dev-acce...@lists.mozilla.org
I think that's a great idea, but how do we proceed?

One of my personal goals is to adapt this priority system for reading
subtitled content for visually impaired folks. I would really love to be
able to consume some foreign content, with JAWS or something else jabbering
in my ear about the subtitles. Because of how fast I listen to my AT ... I
could completely get the semantic content of what is being said entire
seconds before it is actually spoken.

Take care,
Sina

Al Gilman

Apr 27, 2007, 11:14:39 AM
to dev-acce...@lists.mozilla.org
At 10:22 AM -0400 27 04 2007, Sina Bahram wrote:
>I think that's a great idea, but how do we proceed?

I realized after I hit 'send' that I hadn't worked through that part.

I have sent a reference to this thread on this group/list to the chair
of the SYMM WG. I have a stinky, old action item from PFWG to set
up some dialog with them about items of mutual interest, including
politeness and the timing model.

I'll include Sina in scheduling discussions if we get so far as scheduling
a real-time call or chat. Gijs, would you like me to include you, too?
Charles? Aaron? Anybody else?

Caveat: There is a big gain in terms of getting their attention if
one has read the current SMIL3 Working Draft:

http://www.w3.org/TR/SMIL3/

... so as to get your head into their world.

Bonus points for Timed Text:

http://www.w3.org/TR/2006/CR-ttaf1-dfxp-20061116/

Al

David Bolter

Apr 27, 2007, 11:36:33 AM
to Sina Bahram, dev-acce...@lists.mozilla.org
Sina,

I think your channels approach is interesting. It sounds like you are
describing channels as a subset of conversation from the uber-channel
(i.e. the chat), but I'm wondering about what is actually spoken. Is it
only the currently selected channel that is actually producing TTS? Or
are we thinking of making use of multiple audio channels here? Or is
perhaps this a user preference?

I've been following this thread a bit and I agree that reordering the
chat, for any reason, is far from ideal. What if the chat ordering is
preserved, but another asynchronous TTS audio channel is used for
rude-level live region stuff?

cheers,
David


Sina Bahram

Apr 27, 2007, 11:58:57 AM
to David Bolter, dev-acce...@lists.mozilla.org

Only the channel that is selected is being spoken.

This can be extended to select multiple channels and assign them to left ear
and right ear.

Or it can be further extended, with appropriate audio mixing, to offer
different voices talking simultaneously with 300 ms offsets.

Let's stick to one channel at a time, and offer two channels simultaneously?

Take care,
Sina


Peter Parente

Apr 27, 2007, 2:15:05 PM
Politeness levels, chat channels, priorities, etc. could all be mapped
to separate audio channels. I think the answer to the question of
"What's the 'proper' mapping?' is dependent both on the capabilities
of the listener and the task at hand. For instance, the ability to
switch attention among multiple audio streams and select one of N
varies with operating environment (to name just one of many factors).
In a noisy real-world environment (e.g., classroom), a user might
prefer listening to one chat voice over many simultaneous voices. As
for a task, say the user is in 10 busy chat rooms at once. A chat
channel to audio channel mapping is probably not preferable in this
situation. For another example, pretend the user is in just one
channel, but some chat bot keeps sending high priority, rude messages.
In this case, the mapping from priority to simultaneous audio channel
may make listening to the lower priority audio stream extremely
difficult.

What I'm hinting at here is that the user has to be in control of the
mapping from live region data and metadata to some "rendering
technique." In this context, "in control" may mean the ability to
customize some reasonable default mapping (e.g., configurable queue
time for rude messages before they interrupt) as well as the ability
to choose an alternative rendering (e.g., switch from rude messages =
speech in second audio channel to rude messages = audio icons in
second audio channel with speech queued for a later announce during a
period of inactivity). I'm using concurrent audio streams as an
example here, but I think my point applies to other mappings as well
(e.g., queued speech, auditory icons).
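The idea of a user-controlled mapping from live-region metadata to a rendering technique could look roughly like this. Everything here (the mapping keys, the rendering names, the override mechanism) is invented for illustration, not a real configuration format:

```python
# Default mapping from politeness level to a rendering technique,
# which the user can override per level.
DEFAULT_MAPPING = {
    "polite":    {"render": "queued_speech"},
    "assertive": {"render": "second_audio_channel", "queue_grace_ms": 500},
    "rude":      {"render": "audio_icon", "announce_when_idle": True},
}

def render_plan(politeness, overrides=None):
    """Return the rendering settings for one politeness level,
    applying any user overrides on top of the defaults."""
    plan = dict(DEFAULT_MAPPING.get(politeness, DEFAULT_MAPPING["polite"]))
    if overrides and politeness in overrides:
        plan.update(overrides[politeness])
    return plan
```

The point is only that the defaults are a starting position; the user, not the author, has the final say over how each level is rendered.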

Pete

Charles Chen

Apr 27, 2007, 6:17:15 PM
Sure, I'd love to be in on this conference call. Could it be some time
other than this Saturday? I have to be at Austin Penguin Day this
Saturday so that won't work for me...

Also, just a heads up, I will be at W4A next weekend.

Thanks.

-Charles

Gijs Kruitbosch

Apr 27, 2007, 6:54:42 PM

[Note: Google groups is horrible about doing threading, so I'm not
sure I'm replying to the right post, but in essence I'm trying to
reply to everything said after my last post ;-) ]

So, I think the idea of multiple channels is very promising, realistic
and useful. I also think Charles and Peter Thiessen have been working
on this for the Ajax chat thing, though I'm not sure if it turned up
anything, as I think the last I heard was that they were trying to use
different volumes instead, but perhaps I'm mistaken - I'll let them
speak for themselves.

The issue I have with this, and I'm having trouble figuring it out
from your replies, is where the logic is going to live. Where does the
user specify what messages end up in what channel, and how does the AT
pick up on that?

My initial idea is that it would somehow be indicated in the chat
output, using some accessible interface (AT-SPI, IAccessible2,
WAI:ARIA, "something else") and that the user would configure it as a
semantic issue in the chat application.

Why? I don't personally believe that having the logic in the AT is
going to be useful, simply because it doesn't have *all* the semantic
information a chat app has access to, and it would require various
different algorithms to cope with different kinds of chat, social
situations, and traffic intensity (i.e., using different channels for a
group chat with fewer than 10 messages an hour is not very effective,
but using one channel for a group chat with more than 100 messages in
5 minutes isn't either).

So then, for me, the logical conclusion would be to have the logic in
the application, but having logic in there just for AT purposes seems
'wrong', and even if one disagrees with that (I can see some
might), the practical problem is that a lot of people might not
implement it, because they don't want to go to such an effort. Or so I
think.


This is where I currently get stuck - I'm having trouble figuring out
where the logic would live, and what kind of logic it would be, and
how free the ATs would be in using it. I'd love any input on how silly
the above thoughts are, or a different explanation of something you
might already have said which was supposed to answer the 'question'
above.

==
A little bit about the ChatZilla situation which I believe is
relevant. I'm currently working on a "message filter" implementation.
This will enable users and plugins/extensions to define filters, much
like one is currently able to use filters in mail applications
(Thunderbird, Outlook (Express)) and GMail. So you could match
messages based on nickname, content, location (in IRC, networks/
channels matter and nicks are not unique across them). I have a
working prototype. I still need to integrate the current algorithms we
use to determine what to "ping" a user for, when to create new message
tabs etc. in it, test the save/load functionality and write a UI for
it (none of the above is trivial). Other than that, it's going well.

The point of telling you this is, once this is done, it would be
trivial to allow the user to configure what 'channel' messages should
go in, using whatever criteria they would want. Originally, I thought
one would use it to specifically specify when messages are
"important" (which we currently assess by checking the nickname and
content against a list of "stalk" words). Important messages are
highlighted, copied to the tab of the network they appeared on (in
addition to the tab of the channel they were sent to), and they will
cause a 'beep' or user-configurable sound to fire. Important messages,
or so I thought, should also be given a higher WAI:ARIA politeness
setting. Clearly this would be Bad if it just caused (unannounced)
drops of earlier messages. Which is, in fact, why I made this
newsgroup topic.
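A filter of the kind described above might look roughly like this. The rule shapes, field names, and patterns are all hypothetical, invented for this sketch rather than taken from ChatZilla's actual filter implementation:

```python
import re

# Ordered filter rules: first match wins. Each rule maps a predicate
# over a message dict to the politeness level it should be exposed with.
FILTERS = [
    # "stalk" words: the user's nick anywhere in the message text
    (lambda m: re.search(r"(?i)\bgijs\b", m["text"]), "assertive"),
    # keyword in a specific network/channel
    (lambda m: m["network"] == "moznet" and m["channel"] == "#developers"
               and re.search(r"WAI.?ARIA", m["text"]), "assertive"),
]

def politeness_for(message):
    """Return the politeness level a message should be marked with."""
    for predicate, politeness in FILTERS:
        if predicate(message):
            return politeness
    return "polite"  # the default for ordinary chat traffic
```

The same first-match-wins rule list could just as easily return a channel number instead of a politeness level.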

So clearly the message filter could be used for such a "channels"
approach, though I imagine that one might want a specific UI or
command for the channels stuff, which is probably very dynamic, and
something you'd want to be able to change quickly.

So much for ChatZilla background. Again, I'd like to stress I'm new
here, so apologies if I'm not making sense or saying particularly dumb
things :-)

~ Gijs

PS: I think a conference call would definitely be good. I can use
Skype, which is relatively cheap. I'm available after about 6pm GMT on
Saturday, and from then on practically all the way up to and including
Tuesday, though I live in the Netherlands, and I do sleep (at fairly
normal times, though I won't mind getting up early or going to bed
late for the occasion). Oh, and I have an appointment at 8am European
time on Monday, so staying up late on Sunday is also not a good plan,
nor is meeting early on Monday. Anyway, I'm breaking up; it's 1am, and
I'm going to go get some sleep.

Charles Chen

Apr 27, 2007, 7:06:30 PM
This post is a response to some of the ideas that have been put forth so
far.


1. The idea of a separate audio channel is interesting and it's
something that Peter Thiessen and myself have been examining. Having
looked at it, my view is that it's a great concept and something that
should be explored further, but it is not something that is ready for
prime time yet. The main reason is that commonly available synthesizers
(like SAPI) do not allow developers to start a second audio channel. If
I create a second TTS object and tell it to speak, SAPI still waits for
the first one to finish (or interrupts the first one) before starting
the second one. Currently, the only way I've been able to get more than
one audio channel of speech has been to use two different synthesizers
(using SAPI + Java FreeTTS).


2. Many of these ideas are good (especially Sina's), but I am concerned
that they may be too specific to the issue of chat and not generalizable
enough to be included in WAI-ARIA. So, is there any way to generalize
this to other web applications?


3. There is also the issue of "how complex is too complex for web
developers". That's a concern too, as something that is too complex may
intimidate web developers into not using it at all, and may also lead to
misuse, which could potentially do more harm than good. This is a complex
problem, but is there a way to greatly simplify these ideas and still
get most of the functionality?


4. My own two cents on how to approach this problem:
Add something to WAI-ARIA that specifies whether an update should clear
out earlier updates, jump the queue without clearing updates, or jump
the queue without clearing earlier updates AND have itself inserted
at the back of the queue in chronological order.

In the case of chat, the last option, jumping the queue without
clearing the others and also inserting itself at the back, would be the
one to use when there is a high priority message (such as the user being
addressed directly by username).

Let's use the chat scenario put forth by Gijs in the first post.

The actual order is:


<John> Oops, I forgot I had to do that other thing.
<Tim> What other thing?
<John> I forgot to walk the dog.

<John> Tim: could we discuss live regions for a bit?

What the user hears is:


<John> Tim: could we discuss live regions for a bit?

<John> Oops, I forgot I had to do that other thing.
<Tim> What other thing?
<John> I forgot to walk the dog.

<John> Tim: could we discuss live regions for a bit?

This is good because it addresses both issues:
The user will hear the higher priority message first, but the user will
also know the relative order of the messages since it will be spoken
again in chronological order. If the user realizes that it is an old
message, the user can tell the AT to skip it.

Also, given those semantics, the AT can do something smarter by adding
in information for the user. In fact, the AT could give the user
something like:
*High priority message* <John> Tim: could we discuss live regions for a
bit?


<John> Oops, I forgot I had to do that other thing.
<Tim> What other thing?
<John> I forgot to walk the dog.

*Message presented earlier* <John> Tim: could we discuss live regions
for a bit?

The content between the asterisks is inserted by the AT and could be in a
different pitch, or even an earcon. Also, in the case of users who would
not want any message to be played twice, they could tell the AT to treat
such a case as either a queue jumper that does not clear the queue, or
as a lower priority message that does not jump the queue.

The easiest way to do this would be to divide assertive up into the three types:
Rude would clear everything out.
Assertive would do one of the three things specified above.
Polite would still be polite.
Off would still be off.
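To make the semantics concrete, here is a minimal Python sketch of such a queue. All names here are invented for illustration; nothing in it is part of WAI-ARIA or any actual AT:

```python
from collections import deque

# Invented level names mirroring the proposal above; not part of any spec.
OFF, POLITE, ASSERTIVE, RUDE = "off", "polite", "assertive", "rude"

# The three possible behaviours proposed for "assertive":
CLEAR, JUMP, JUMP_AND_REQUEUE = "clear", "jump", "jump-and-requeue"

class SpeechQueue:
    def __init__(self, assertive_mode=JUMP_AND_REQUEUE):
        self.queue = deque()
        self.assertive_mode = assertive_mode

    def push(self, message, level=POLITE):
        if level == OFF:
            return                      # never spoken at all
        if level == POLITE:
            self.queue.append(message)  # waits its turn
        elif level == RUDE:
            self.queue.clear()          # discard everything pending
            self.queue.append(message)
        elif level == ASSERTIVE:
            if self.assertive_mode == CLEAR:
                self.queue.clear()
                self.queue.append(message)
            elif self.assertive_mode == JUMP:
                self.queue.appendleft(message)  # speak next, keep the rest
            else:  # JUMP_AND_REQUEUE: speak now AND again in order
                self.queue.appendleft(message)
                self.queue.append(message)

    def next_utterance(self):
        return self.queue.popleft() if self.queue else None
```

With the jump-and-requeue behaviour, pushing an assertive message over three queued polite ones produces exactly the sequence from the chat scenario: the high priority message first, then the backlog in order, then the high priority message again in its chronological place.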

Thoughts?


-Charles

Sina Bahram

Apr 27, 2007, 7:55:48 PM
to Peter Parente, dev-acce...@lists.mozilla.org
But this is why the actual channel focus, as well as the mapping, is up to
the user.

Am I missing a part of your question?

Take care,
Sina

-----Original Message-----
From: dev-accessib...@lists.mozilla.org
[mailto:dev-accessib...@lists.mozilla.org] On Behalf Of Peter
Parente
Sent: Friday, April 27, 2007 2:15 PM
To: dev-acce...@lists.mozilla.org
Subject: Re: WAI:ARIA politeness

Pete


Peter Parente

Apr 27, 2007, 9:54:56 PM
to
> 1. The idea of a separate audio channel is interesting and it's
> something that Peter Thiessen and myself have been examining. Having
> looked at it, my view is that it's a great concept and something that
> should be explored further, but it is not something that is ready for
> prime time yet.

Charles,

Let me know if you need help getting concurrent speech streams working
with SAPI. Part of my thesis is on using more advanced audio rendering
(e.g. concurrent speech and auditory icons, ambiance, 3D
spatialization) to improve the mapping from the high-bandwidth visual
display to audio. I have some working code that pipes SAPI output
through the FMOD (www.fmod.org) auditory mixer (free as in beer, but
not open source). I'm sure you could find another OSS library that can
accomplish the same goal, and apply the same technique. See
http://www.cs.unc.edu/~parente/clique/#demo for a demo.

I also have a pretty large bibtex database on research into auditory
information display, audiology, auditory scene analysis, auditory icon
design, auditory display evaluation, etc. If you'd like a copy to read
up on how to make more complex auditory displays usable, let me know.
I'll export the relevant references and my brief notes for you to
grok.

Pete

Charles Chen

Apr 27, 2007, 10:01:19 PM
to
Well, what I am doing is creating a SAPI object and using it to generate
speech. Even if I create a second object, SAPI doesn't seem to treat it
as a second, independent channel.

So how do I get a second SAPI object that will speak on a separate
stream? Also, how do I get this to work on Java FreeTTS?

Peter Parente

Apr 27, 2007, 10:44:04 PM
to
On Apr 27, 10:01 pm, Charles Chen <clc4...@HotPOP.com> wrote:
> Well, what I am doing is creating a SAPI object and using it to generate
> speech. Even if I create a second object, SAPI doesn't seem to use it as
> a second.

Right. There's a way to make SAPI generate the speech waveform data,
but then have it hand it back to you so you can mix it with other
waveforms and send it to an output device yourself. In the case of
Clique, I generate the waveform buffers and pipe them through FMOD in
multiple threads. The method also gives you the index markers paired
with the byte (or sample, I forget which) offsets in the waveform. I
monitor FMOD playback progress in the threads, and trigger the
appropriate callbacks when the index markers are reached.
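The mixing step itself is independent of SAPI and FMOD. As a naive illustration in Python (assuming both buffers are already 16-bit signed mono PCM at the same sample rate; a real mixer also handles resampling, channel counts, and gain):

```python
import array

def mix_pcm16(a: bytes, b: bytes) -> bytes:
    """Mix two 16-bit signed PCM buffers by summing samples with clipping."""
    sa = array.array("h", a)   # "h" = signed 16-bit on common platforms
    sb = array.array("h", b)
    # Pad the shorter stream with silence so the lengths match.
    if len(sa) < len(sb):
        sa.extend([0] * (len(sb) - len(sa)))
    else:
        sb.extend([0] * (len(sa) - len(sb)))
    mixed = array.array("h")
    for x, y in zip(sa, sb):
        s = x + y
        # Clamp to the int16 range instead of wrapping around.
        mixed.append(max(-32768, min(32767, s)))
    return mixed.tobytes()
```

In a real pipeline the two inputs would be the waveform buffers handed back by the synthesizer, and the mixed bytes would be fed to the output device (or to a mixer library such as FMOD, which does all of this for you in a callback).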

Instead of inundating you with the Clique code, I think it's best if
you look at the pyTTS Python wrapper for SAPI I wrote a few years
back. The tutorial at http://www.cs.unc.edu/~parente/tech/tr02.shtml
has a section titled "Memory buffers" which briefly mentions that
pyTTS has the ability to synthesize speech waveforms into a buffer for
later playback. That section has a link to how you can pipe the speech
through pySonic, my Python wrapper for FMOD. If you look at the pyTTS
code and the pySonic tutorial for FMOD, you can probably piece
together how it's done.

If you need more concrete guidance, let me know. I figured it's best to
let you explore and not hold you up while I write out a giant
explanation.

I'm not familiar enough with FreeTTS to tell you if a similar approach
is possible.

Pete

>
> > Charles,
>
> > Let me know if you need help getting concurrent speech streams working
> > with SAPI. Part of my thesis is on using more advanced audio rendering
> > (e.g. concurrent speech and auditory icons, ambiance, 3D
> > spatialization) to improve the mapping from the high-bandwidth visual
> > display to audio. I have some working code that pipes SAPI output
> > through the FMOD (www.fmod.org) auditory mixer (free as in beer, but
> > not open source). I'm sure you could find another OSS library that can
> > accomplish the same goal, and apply the same technique. See

> >http://www.cs.unc.edu/~parente/clique/#demo for a demo.

Sina Bahram

Apr 28, 2007, 1:25:57 AM
to dev-acce...@lists.mozilla.org
Two thoughts on this:

#1:

The system I presented only had an example of chat so that it would stay
relevant ... It's completely applicable to almost anything.

For example: the SSIP protocol has a similar definition of such priority
levels, and that abstracts out very well to reading arbitrary output from an
AT or other synthesis-requiring software.

#2:

I really don't think this idea of repeating the message, or even moving it
up in the queue as an actual message, is clearly defined or will be clear
to the user.

For example, what gives that message such a high importance?

Who dictates this importance?


Why not simply read it chronologically, exactly like the visual reader?


Chats are temporal and linear proceedings with multiple streams going on at
once, but they do not usually have future requirements. What I mean to say
is that I usually read the messages in a chat as they are received. I might
want to skip some messages, and I might want to read a certain set, then go
back and read another set, but even when I do this, I will be reading them
in order.

Can someone give me a reason for the importance of a message dictating its
position in the queue such as in this example?

For notifications and alerts, I completely understand, but in this case: it
is simply another message from someone.


Also, I would like to make one additional point.


It is a common misconception to assume that the AT will be talking at a
slow rate.

Here is an example:

*

This is an example of how long it takes to read this sentence.

*

I hear the above sentence in under one second while using my AT.

If I slow my AT down to the rate it originally came at, it takes 3 to
3.5 seconds.


Most chats involve something like this:

Bob: how's it going?
Tim: well dude, you?
Bob: alright, I guess: tons of work this week!
Tim: yeh, I hear that ... Got projects due all of Thursday and Friday
Bob: I better get back to some of this stuff
Tim: ditto: ttyl dude


It took my AT around four seconds to read me that entire snippet.

So ... I'm just thinking that this so-called queue will not back up as much
as one thinks.
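Sina's back-of-the-envelope arithmetic can be made explicit. The listening rate below is illustrative, not a measured figure:

```python
def seconds_to_speak(messages, words_per_minute=400):
    """Rough estimate of how long an AT needs to read a list of messages."""
    total_words = sum(len(m.split()) for m in messages)
    return total_words * 60.0 / words_per_minute

# A few typical short chat lines (from the example above).
chat = [
    "Bob: how's it going?",
    "Tim: well dude, you?",
    "Bob: alright, I guess: tons of work this week!",
]
# At 400 words per minute this snippet drains in well under 3 seconds.
```

The point is simply that at AT listening speeds, a queue of short polite messages empties faster than new ones usually arrive.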

I hope this makes sense?

Please ask if it doesn't ... I have no problem explaining further.

One side note ... Not including this sentence, I read this entire email in
33 seconds ... Granted, that's probably two times slower than most of you
who are reading this visually, but I also don't have my stuff cranked up all
the way either.

Take care,
Sina

-----Original Message-----
From: dev-accessib...@lists.mozilla.org
[mailto:dev-accessib...@lists.mozilla.org] On Behalf Of Charles
Chen
Sent: Friday, April 27, 2007 7:07 PM
To: dev-acce...@lists.mozilla.org
Subject: Re: WAI:ARIA politeness

Thoughts?


-Charles

Sina Bahram

Apr 28, 2007, 1:27:32 AM
to dev-acce...@lists.mozilla.org
Peter,

Can you please send me those resources?

Thanks so much

Take care,
Sina

-----Original Message-----
From: dev-accessib...@lists.mozilla.org
[mailto:dev-accessib...@lists.mozilla.org] On Behalf Of Charles
Chen
Sent: Friday, April 27, 2007 10:01 PM
To: dev-acce...@lists.mozilla.org
Subject: Re: WAI:ARIA politeness

Sina Bahram

Apr 28, 2007, 1:30:08 AM
to dev-acce...@lists.mozilla.org
If you guys can hold on about a week ... We'll be meeting with university
legal over here to decide the licensing of the Remote Access Bridge
(www.RemoteAccessBridge.com)

In that code, there exists a complete abstraction of SAPI, FreeTTS, and
JSAPI that allows for this.

The mixing is not done yet, but the abstraction of the TTS's is, and the
mixing is the next step.

Incidentally, this is what allows one to make use of something like Clique
efficiently over a remote connection.

Take care,
Sina

-----Original Message-----
From: dev-accessib...@lists.mozilla.org
[mailto:dev-accessib...@lists.mozilla.org] On Behalf Of Peter
Parente
Sent: Friday, April 27, 2007 10:44 PM
To: dev-acce...@lists.mozilla.org
Subject: Re: WAI:ARIA politeness

Pete

Sina Bahram

Apr 28, 2007, 1:37:08 AM
to dev-acce...@lists.mozilla.org
Sure ... Whenever is good for you. Let's start with that and get others in?

Sunday?

Charles Chen

Apr 28, 2007, 8:47:23 AM
to

> #2:
>
> I really don't think this idea of repeating the message, or even moving it
> up in the queue, as an actual message, is clearly defined or will be clear
> to the user.
>
> For example, what gives that message such a high importance?
In the case of chat, I see it as equivalent to a bolded message. The
system will bold the message if it contains your username.

>
> Who dictates this importance?
>
The author can set it as a default (which the user can override).

>
> Why not simply read it chronologically, exactly like the visual reader.
>
>
> Chats are temporal and linear proceedings with multiple streams going on at
> once, but they do not usually have future requirements. What I mean to say
> is that I usually read the messages in a chat as they are received. I might
> want to skip some messages, and I might want to read a certain set, then go
> back and read another set, but even when I do this, I will be reading them
> in order.
>
> Can someone give me a reason for the importance of a message dictating its
> position in the queue such as in this example?

I was trying to replicate my own visual browsing style for chat with
that idea.
I keep my cursor in the chat window if I'm trying to read through it -
if there is a flood of messages and I see a bolded message, I scroll
down to that message without moving my cursor position. I read the
bolded message, then I go back up and resume reading where I was before.
I end up reading the first few words of the bolded message, then,
realizing that I've already read it, I skip forward to the next message.


> It is a common misconception, to assume that the AT will be talking at a
> slow rate.

The problem is that not everyone has the same speed setting. Would you
say that the majority of users have it faster, slower, or at the same rate
as yourself?


The priority system of Supplemental, Message, Notify, and Alert seems to
map directly (with the exception of not having a supplemental politeness
level) to the solution of:


1. Higher priority messages speak before lower priority messages but do
not clear the lower priority messages from the queue in front of them.

The disadvantage of this method is that it would not maintain
chronological order and it would lead to more things being spoken...

Since you don't like the idea of something being both a notify and a
message, perhaps what we should have is:

Off
Supplemental
Polite
ClearingAssertive
NonclearingAssertive
Rude

where Supplemental maps to Supplemental, Polite maps to Message,
NonclearingAssertive maps to Notify, and Rude maps to Alert.
ClearingAssertive would be an assertive which clears out Supplemental
and Polite messages but not other ClearingAssertives or Rudes. Clearing
and Nonclearing assertives would just queue (considered to be at the
same politeness level).
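As a sketch of how those six levels could behave in an AT's queue (Python; all names invented here, illustrating the proposal rather than any spec):

```python
# Proposed levels and their speaking priority. Both assertives are
# "considered to be at the same politeness level". Illustrative only.
PRIORITY = {"supplemental": 0, "polite": 1,
            "clearingassertive": 2, "nonclearingassertive": 2, "rude": 3}

def push(queue, message, level):
    """Return a new queue of (level, message) pairs after applying the
    proposed clearing rules, keeping higher priorities first and FIFO
    order within a level."""
    if level == "off":
        return queue              # never spoken at all
    if level == "rude":
        queue = []                # Rude clears everything out
    elif level == "clearingassertive":
        # Clears out Supplemental and Polite messages, but not other
        # ClearingAssertives, NonclearingAssertives, or Rudes.
        queue = [(lv, m) for lv, m in queue if PRIORITY[lv] >= 2]
    queue = queue + [(level, message)]
    queue.sort(key=lambda item: -PRIORITY[item[0]])  # stable: FIFO per level
    return queue
```

Pushing a ClearingAssertive discards pending Supplemental and Polite messages but leaves the other assertives alone, while a NonclearingAssertive merely jumps ahead of the lower-priority backlog.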

Does that make more sense to you?


Regarding the multiple synthesizers going at once: I think I should have
been clearer on what type of mixing I need. I need a way to do this in
C/C++ for SAPI and a way to do this in Java for Java FreeTTS. Having it
all abstracted into one C/C++ component will not work for me since it
would mean that I couldn't do the same support on Mac/Linux. Also, there
needs to be a fallback for situations where the platform does not allow
multiple synthesizers, or support for multiple synthesizers has not been
implemented yet.


A teleconference this Sunday would be fine. What time and what number?

Thanks again for your help.

-Charles

Gijs Kruitbosch

Apr 28, 2007, 12:27:29 PM
to Sina Bahram, Peter Parente
Sina Bahram wrote:
> But this is why the actual channel focus, as well as the mapping is up to
> the user.
>
> Am I missing a part of your question?
>
> Take care,
> Sina
>

If you were replying to me, yeah. Where does the user input this
mapping? To the application? The AT? Something else? How does the
app/something else expose the user's choice to the AT?

~ Gijs

Sina Bahram

Apr 28, 2007, 12:42:03 PM
to dev-acce...@lists.mozilla.org
Hi Charles,

Yes, that does make a bit more sense, thank you. I suppose this is all
trivial as long as we can allow this mapping to be controlled by the user. I
don't think it would be that hard to have three to four possible actions
that can dictate five or so politeness levels; clear, queue, and interrupt,
for example, would be nice primitives to play with ... That way, users
can be presented with one or two presets, and then they can change them
around and define their own preset.

In fact, if we can export these presets, we can then come up with very nice
presets for stock ticker pages as opposed to chat as opposed to search
engines that display the search results with ajax before the user hits
enter?
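One way to picture those presets (everything below is invented for illustration): a preset is just a mapping from politeness level to a primitive action, shipped per page type and overridable by the user.

```python
# Primitive actions an AT could support (invented names).
CLEAR, INTERRUPT, QUEUE, DROP = "clear", "interrupt", "queue", "drop"

# Example presets: a chat page vs. a stock ticker. Purely illustrative.
PRESETS = {
    "chat": {
        "rude": CLEAR,
        "assertive": INTERRUPT,   # jump the queue, keep pending messages
        "polite": QUEUE,
        "off": DROP,
    },
    "stock-ticker": {
        "rude": CLEAR,
        "assertive": CLEAR,       # old quotes are stale; discard them
        "polite": DROP,           # routine updates stay silent
        "off": DROP,
    },
}

def action_for(preset_name, level, user_overrides=None):
    """Look up the action for a level, letting the user's own mapping win."""
    mapping = dict(PRESETS[preset_name])
    mapping.update(user_overrides or {})
    return mapping[level]
```

Exporting a preset is then just serializing one small dictionary, which fits the idea of sharing tuned presets for different kinds of pages.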

Take care,
Sina

-----Original Message-----
From: dev-accessib...@lists.mozilla.org
[mailto:dev-accessib...@lists.mozilla.org] On Behalf Of Charles
Chen
Sent: Saturday, April 28, 2007 8:47 AM
To: dev-acce...@lists.mozilla.org
Subject: Re: WAI:ARIA politeness

-Charles

Sina Bahram

Apr 28, 2007, 12:45:00 PM
to dev-acce...@lists.mozilla.org
I think that it would have to be the application, because there's absolutely
no chance of convincing the AT to do something like this. This is light years
ahead of anything that they do, and innovation is not something AT companies
focus on.


It would be nice to have an abstracted middle layer to handle all TTS
activities, which is why I suggested the TTS wrapper that the remote access
bridge uses; however, that is written in Java, and doesn't help us in C/C++.

Although, I would point out that because it understands and fully parses
SSIP, which allows for everything we've discussed, the C/C++ side could
simply output SSIP on localhost and have it parsed by the Java TTS wrapper.

Believe it or not, this actually does work pretty well.
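For concreteness, the SSIP traffic in question is plain line-oriented text, so the C/C++ side only needs to write a few lines to a localhost socket. The command strings below follow the Speech Dispatcher SSIP protocol as I understand it; treat the exact spelling as an assumption to check against the SSIP documentation:

```python
def ssip_speak(text, priority="message"):
    """Build the SSIP command lines to speak `text` at a given priority.

    SSIP priorities (per the Speech Dispatcher protocol) include
    'important', 'message', 'text', 'notification', and 'progress',
    which map naturally onto the politeness levels discussed here.
    """
    lines = [
        "SET self PRIORITY %s" % priority,
        "SPEAK",
        text,
        ".",          # a lone dot terminates the message body
    ]
    # SSIP is line-oriented; each command ends with CRLF.
    return "".join(line + "\r\n" for line in lines)
```

A client would send this over a socket to the speech server (and read the numeric status replies); the point is only that priority is a one-line, plain-text setting.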

Take care,
Sina


-----Original Message-----
From: Gijs Kruitbosch [mailto:gijskru...@gmail.com]
Sent: Saturday, April 28, 2007 12:27 PM
To: Sina Bahram
Cc: 'Peter Parente'
Subject: Re: WAI:ARIA politeness

Sina Bahram

Apr 28, 2007, 1:03:02 PM
to dev-acce...@lists.mozilla.org
Two more things I forgot to address.

I would say most advanced users are either a little bit faster than me or
maybe 10% slower than me.

Beginning and average users are about 25% slower than the speed I quoted.

Phone call time:

How about Sunday 1:00pm EDT --> folks can send me their number and we can do
this?

If it is more than three, I'll need someone else to bring in the fourth
person, but hopefully we can figure that out?

Take care,
Sina

-----Original Message-----
From: dev-accessib...@lists.mozilla.org
[mailto:dev-accessib...@lists.mozilla.org] On Behalf Of Charles
Chen

Sent: Saturday, April 28, 2007 8:47 AM

To: dev-acce...@lists.mozilla.org
Subject: Re: WAI:ARIA politeness

-Charles

Gijs Kruitbosch

Apr 28, 2007, 1:22:01 PM
to Sina Bahram, dev-acce...@lists.mozilla.org
I won't be able to make it at that time. 3 or 4pm would work better for
me. (Yeah, I've accepted some other appointments since my last post,
hence the change in my availability.)

~ Gijs

Sina Bahram

Apr 28, 2007, 1:23:08 PM
to dev-acce...@lists.mozilla.org
Sounds like a plan; how about 3:00 then?

Take care,
Sina

-----Original Message-----
From: Gijs Kruitbosch [mailto:gijskru...@gmail.com]
Sent: Saturday, April 28, 2007 1:22 PM
To: Sina Bahram

David Bolter

Apr 28, 2007, 3:21:46 PM
to Peter Parente, dev-acce...@lists.mozilla.org
Charles,

Just throwing this out there. I'm sure Peter's stuff is awesome.

I don't have the code in front of me ATM but quite some time ago I used
MSAPI and DirectSound together to get multiple TTS voices + sound file
mixing. It was a native (C++) "stop gap" windows app, which I called
"Cacophony". It was hacked together (along with a dll using jni) for a
Java Swing Auxiliary Audio Look and Feel project we did here at the ATRC
many years ago (before Java Speech), but we ended up using it for years
(for a few projects). It is probably possible to give the source away
(I'd need to check). Let me know if you are interested, and I'll dig it
up. Hmmmm... I wonder if it still works.

cheers,
David

> I'm not familiar enough with FreeTTS to tell you if a similar approach
> is possible.
>
> Pete
>
>
>>> Charles,
>>>
>>> Let me know if you need help getting concurrent speech streams working
>>> with SAPI. Part of my thesis is on using more advanced audio rendering
>>> (e.g. concurrent speech and auditory icons, ambiance, 3D
>>> spatialization) to improve the mapping from the high-bandwidth visual
>>> display to audio. I have some working code that pipes SAPI output
>>> through the FMOD (www.fmod.org) auditory mixer (free as in beer, but
>>> not open source). I'm sure you could find another OSS library that can
>>> accomplish the same goal, and apply the same technique. See
>>> http://www.cs.unc.edu/~parente/clique/#demo for a demo.
>>>
>>> I also have a pretty large bibtex database on research into auditory
>>> information display, audiology, auditory scene analysis, auditory icon
>>> design, auditory display evaluation, etc. If you'd like a copy to read
>>> up on how to make more complex auditory displays usable, let me know.
>>> I'll export the relevant references and my brief notes for you to
>>> grok.
>>>
>>> Pete
>>>
>
>

Peter Parente

Apr 30, 2007, 3:30:15 PM
to
> Regarding the multiple synthesizers going at once: I think I should have
> been clearer on what type of mixing I need. I need a way to do this in
> C/C++ for SAPI and a way to do this in Java for Java FreeTTS. Having it
> all abstracted into one C/C++ component will not work for me since it
> would mean that I couldn't do the same support on Mac/Linux. Also, there
> needs to be a fall back for situations where the platform does not allow
> multiple synthesizers/support for multiple synthesizers has not been
> implemented yet.

Hi Charles,

I just saw this section of one of our older posts. The ability to get
the waveform back from SAPI in C/C++ at least should mirror what I did
in Python using COM. Again, I can't give you much advice about Java or
Mac. On Linux, you're at the mercy of the particular speech engine to
give you the waveform back. gnome-speech certainly can't do it, and I
haven't seen support for such a feature in Speech Dispatcher yet.

Pete
