[Standards] XEP-0301 feedback

Florian Zeitz

Mar 6, 2012, 6:04:19 PM
to stan...@xmpp.org, ma...@realjabber.org
Hello standards-list, hello Mark,

"inspired" by the recent discussions surrounding XEP-0301 (Real Time
Text) I had a look over its current status and felt I should provide
some feedback.
Some of it is rather minor, some I'm somewhat more concerned about. I've
ordered it sequentially rather than by importance.

Section 3: Glossary
The "RTT" entry seems superfluous to me. It'd be better to just note the
acronym in the "real-time text" entry. Also the remark about the
element's name is misleading as that is generally lower-case.

Section 4.2.2: 'seq' attribute
It seems to me the start of a session (and therefore when to reset this
to 0) is not clearly defined, since the start and cancel events are
purely optional. In a more general sense I'm somewhat concerned about
the attempt to reimplement the transport layer on the application layer.
(see below)

Section 4.2.3: 'event' attribute
I feel the requirement for session start and cancel needs to be either
tightened (if we decide we absolutely need it for this protocol) or
removed. Having it truly optional makes it useless for detecting the
actual session start and end IMHO. Its sole purpose appears to be
mapping to SIP, which is a problem possibly best handled separately.

The new and reset events appear to have been introduced under the
assumption that messages get lost. If they are not, the reset event
can be safely removed and the new event is implicit upon receipt of a
<body/> element.

Section 4.5.1: action elements
The normative text in this section should be further explained.
E.g.: What is REQUIRED for the <t/> element? Support, inclusion in each
<rtt/> element, etc. (It is relatively clear to me what you mean, I just
wish it was somewhat more fleshed out)
"A client conforming to this specification MUST accept <t/>, <e/> and
<d/> elements and handle them as described in the following..."

Section 4.5.1.3: counting
It appears to me that the rules for determining the position and count
of code points are somewhat backwards. In particular, if the sending
client performs any normalization before sending, the counts need to
be based on the normalized version, since the receiving client cannot
undo such normalization (this is the opposite of what is described in
the text). Also most of the described transformations are only relevant
for display on screen and should not change the string.
IMHO it should suffice to count code points based on what is sent over
the wire.

Section 4.5.2: action elements
I'd like to hear some rationale on why there is forward and backward
delete. Both appear to be able to generate the same results.
It did occur to me that they are meant to be used in conjunction with
cursor display. However, it appears that this would cause interesting
possible situations. E.g. what happens if a character is forward deleted
at a position preceding the cursor. In that situation the absolute
position of the cursor should move one to the left, but will instead
move 1 to the right relative to the text (it might move over the right
end). I'd prefer expected cursor position to always be transmitted
explicitly in these cases and have either delete variant removed.

Section 4.6: error recovery
As mentioned before, the attempt to correct errors is my biggest concern
about this XEP. For the case of reconnects it appears to me that the
sending client will always be able to notice this situation and treat it
as a new RTT session.
Messages being dropped by servers on the other hand is an issue I've not
yet experienced. I'm however willing to believe there are servers in
existence that do this. Ideally I'd expect this to be discoverable via
an error response, but seeing how this itself generates traffic I can
see how implementations would not send this. Do we have other protocols
that need to respect this?
Ideally I'd like to simplify the protocol a bit in these matters.
E.g. sequence numbers could be reused for each new RTT message,
seq="0" would then imply event="new", etc. However, I've not thought
through all implications of such a system yet.

Section 6.4.1: message length limit
In the second example with the split messages I would not have expected
an empty <rtt/> element. If that is actually intended to indicate the
<body/> is part of the RTT session this should be mentioned elsewhere.

Regards,
Florian Zeitz

Mark Rejhon

Mar 6, 2012, 8:10:04 PM
to Florian Zeitz, stan...@xmpp.org
Hello Florian!

Excellent comments, and I'll make some minor tweaks to my almost-finished update based on your comments.
Some things I'd like to address:

- REQUIRED for Senders/Recipients ... Event='start' is REQUIRED for the first element of a real-time message.
- REQUIRED for Recipients. event='reset' is also REQUIRED
- The seq was developed, because, it was found in fact to be necessary. For example, bad wireless reception using servers that don't do offline message delivery. And also, situations of extreme congestion has happened. Also, recipients may disconnect and reconnect while the sender is still composing a message. You can

Mark Rejhon

Mar 6, 2012, 9:38:28 PM
to Florian Zeitz, stan...@xmpp.org
(Ooops -- I accidentally sent my last message early.)
(Please disregard the reply I sent 1 hour ago)

About XEP-0301 In-Band Real-Time Text 


On Tue, Mar 6, 2012 at 6:04 PM, Florian Zeitz <flo...@babelmonkeys.de> wrote:
Hello standards-list, hello Mark, "inspired" by the recent discussions surrounding XEP-0301 (Real Time Text) I had a look over its current status and felt I should provide some feedback. Some of it is rather minor, some I'm somewhat more concerned about. I've ordered it sequentially rather than by importance.

Thanks for writing!  You have some excellent suggestions of areas to clarify in the specification.


Section 3: Glossary
The "RTT" entry seems superfluous to me. It'd be better to just note the
acronym in the "real-time text" entry. Also the remark about the
element's name is misleading as that is generally lower-case.

The glossary has been simplified in the next version of the spec, amongst several other simplifications that have now been done.
 

Section 4.2.2: 'seq' attribute
It seems to me the start of a session (and therefore when to reset this
to 0) is not clearly defined, since the start and cancel events are
purely optional. In a more general sense I'm somewhat concerned about
the attempt to reimplement the transport layer on the application layer.
(see below)

The 'seq' attribute was developed because it was found to be necessary in practice:
-- Disconnect and reconnect cycles, including those caused by bad wireless reception (WiFi, 3G, mobile phone).
-- Not all servers support offline message delivery.
-- BOSH failures in a web browser at the client side (i.e. intermittent HTTP request failures).
-- And, less often, situations of extreme congestion have happened.
My XEP-0301 is intentionally designed to survive a wide variety of situations that have actually happened in my tests.  This attribute is critical to my applications, as well as to Next Generation 9-1-1 experimental demos.  See "Emergency Services Functionality with the Extensible Messaging and Presence Protocol (XMPP)" at http://tools.ietf.org/html/draft-tschofenig-ecrit-xmpp-es-00 ... section 4.5 mentions XEP-0301 as one of the possible functionalities that may become a part of NG9-1-1.  Real-time text technology is very useful for accessibility too, as it replaces the need to carry around a teletypewriter (TTY).

We need to keep the 'seq' attribute, as it is essential for message integrity during less-than-ideal situations.  I actually expanded it to recommend an event='reset' once every 10 XMPP messages, to improve resilience even further -- the latest version of RealJabber now does this.  (You can run the new RealJabber on two computers and test this concept by disconnecting and reconnecting in the middle of a conversation while the other person is still typing in real time.  The real-time text recovers automatically within a few seconds of reconnecting because of the automatic event='reset' occurring at regular intervals.)

Also, very rarely (it has only happened with BlackBerry's Google Talk client), several XMPP messages transmitted at nearly the same time (i.e. during network congestion) all get the same timestamp and are then delivered in a different order than they were transmitted.  This should never happen, but it occasionally does.   An earlier version of XEP-0301 was more complex, having a 'msg' attribute for the message number.  That was removed, leaving just one 'seq' value for simplification.

-- 'seq' does not need to start at 0.  I'll eliminate that requirement (clarification)
-- You can change the 'seq' value anytime there's an event='new' or event='reset'.  
Setting it back to 0 again works, although I prefer not to because of the following danger: a user disconnects while seq='0'; meanwhile seq is incremented, reset back to '0' again, and incremented to '1'; the user then reconnects and receives consecutive seq numbers that belong to totally different real-time messages, parts of which were never delivered.  In this case, the wrong <rtt> will be displayed, resulting in rare text scrambling.  This actually happened once in my random testing, so I stopped resetting seq back to 0 every time there was an event='reset'.

 
Section 4.2.3: 'event' attribute
I feel the requirement for session start and cancel needs to be either
tightened (if we decide we absolutely need it for this protocol) or
removed. Having it truly optional makes it useless for detecting the
actual session start and end IMHO. Its sole purpose appears to be
mapping to SIP, which is a problem possibly best handled separately.
 The new and reset events appear to have been introduced under the
assumption that messages get lost. If they are not, the reset event
can be safely removed and the new event is implicit upon receipt of a
<body/> element.

You are right that 'start' and 'cancel' are not really required, so I may be removing them.  However, I think that 'cancel' may still be necessary to signal the other party to stop transmitting <rtt> for the remainder of the chat session, in order to save bandwidth, whenever the recipient wants to turn off RTT (i.e. via a button or switch, while in the middle of a conversation).   So 'cancel' is still useful even if 'start' is not needed (the start of RTT during a chat session is simply the first delivery of an <rtt> element).

 
Section 4.5.1: action elements
The normative text in this section should be further explained.
E.g.: What is REQUIRED for the <t/> element? Support, inclusion in each
<rtt/> element, etc. (It is relatively clear to me what you mean, I just
wish it was somewhat more fleshed out)
"A client conforming to this specification MUST accept <t/>, <e/> and
<d/> elements and handle them as described in the following..."

Agreed, it should be clarified.
 

Section 4.5.1.3: counting
It appears to me that the rules for determining the position and count
of code points are somewhat backwards. In particular, if the sending
client performs any normalization before sending, the counts need to
be based on the normalized version, since the receiving client cannot
undo such normalization (this is the opposite of what is described in
the text). Also most of the described transformations are only relevant
for display on screen and should not change the string.
IMHO it should suffice to count code points based on what is sent over
the wire.

We have to keep "Unicode code points" (more on that later) but I agree that normalization paragraph is definitely confusing, so I've modified it to the following wording:

   "For interoperability of p and n values, processing MUST be done on the transmitted Unicode real-time message. For senders, this is the version of the Unicode message text after any Unicode normalization, emoticon graphics images conversion to Unicode, display text formatting, processing of Unicode combining marks, etc. For recipients obtaining text from the <t> element, this is the Unicode text immediately after XML processing, and before any further processing. From the perspective of p and n values, a real-time message is treated as an editable array of Unicode code points."

Now, the reason why we have to keep "Unicode code points":
The section 9 "Internationalization Considerations" explains why XEP-0301 uses "code points" technique:
Different programming platforms use different internal Unicode encodings, which may be different from the transmission encoding (UTF-8) for XMPP. 
 -- Multiple Unicode code points may represent one displayable Unicode character (i.e. combining marks).
Action elements operate on Unicode code points, not on displayable characters.
-- Characters U+10000 through U+10FFFF, which are single code points, but are represented as multiple surrogate code units in certain Unicode encodings (e.g. UTF-16).
Action elements operate on Unicode code points, not on individual surrogate code units.
-- Some Unicode encodings use a variable number of bytes per Unicode code point (i.e. UTF-8).
Action elements operate on Unicode code points, not on individual bytes.

Real-time editing (mid-text inserts/deletes) of Unicode text containing variable-length encodings causes major text scrambling if the recipient and sender do not count positions the same way.
So unfortunately, a standardization of counting is essential, especially when international text is involved.

The 'char' of some programming languages is sometimes 8-bit, sometimes 16-bit, and sometimes 32-bit.
Often, XMPP libraries already pre-convert the text to different Unicode encodings.
Not all of them have access to the original UTF-8 "wire" text, so I can't depend on counting via "wire format" (UTF-8) unless I ask them to convert everything back to UTF-8 before processing.  But that itself is a catch-22 because in many programming languages, String.Insert / String.Delete operations, don't operate on UTF-8.

Therefore, we decided it ended up being necessary to standardize on 'unicode code points'.
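
To make the difference concrete, here is a small Python sketch (not taken from RealJabber or from the XEP). It counts the same text three ways and then inserts at a code point offset. The function names, the sample string, and the 0-based offset are assumptions made purely for illustration.

```python
def lengths(text: str) -> dict:
    """Count the same text in code points, UTF-16 code units and UTF-8 bytes."""
    return {
        "code_points": len(text),                           # a Python str is indexed by code point
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # surrogate pairs count as two units
        "utf8_bytes": len(text.encode("utf-8")),            # 1 to 4 bytes per code point
    }


def insert_at_code_point(message: str, p: int, inserted: str) -> str:
    """Insert text at code point offset p (0-based here, an assumption of this sketch)."""
    return message[:p] + inserted + message[p:]


# 'a', an emoji outside the Basic Multilingual Plane, then 'e' + combining acute accent
sample = "a\U0001F600e\u0301"

print(lengths(sample))
print(insert_at_code_point(sample, 2, "X"))
```

The three counts differ for this sample, so a sender counting UTF-16 code units and a recipient counting code points would disagree about where offset 2 is.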
In addition, here's a flowchart diagram that may help understand the Unicode preservation scenario better:
(This file will need to be updated for the latest XEP-0301 draft)


Section 4.5.2: action elements
I'd like to hear some rationale on why there is forward and backward
delete. Both appear to be able to generate the same results.

-- Note that cursor position never changes with a forward delete operation.
-- The cursor position only changes if a backspace operation is done.
-- Subsequent action elements do not require knowledge of the cursor position of preceding action elements.

However, it is true that the standard could still work with just one of the two text-deleting codes (and instead using the <c/> element to correct any cursor position).  I have seriously considered removing one of the two codes.  However, we found that bandwidth is more efficient if I kept both codes.   
 
It did occur to me that they are meant to be used in conjunction with
cursor display. However, it appears that this would cause interesting
possible situations. E.g. what happens if a character is forward deleted
at a position preceding the cursor. In that situation the absolute
position of the cursor should move one to the left, but will instead
move 1 to the right relative to the text (it might move over the right
end). I'd prefer expected cursor position to always be transmitted
explicitly in these cases and have either delete variant removed.

It actually optimizes bandwidth to have both codes available, because it reduces the number of cursor-position-correcting <c/> elements transmitted.  (<c/> is only transmitted when needed, i.e. cursor position is not where it should be after the specific delete operation is done)

The source code that I wrote to comply with section 6.2.1 "Monitoring Message Edits" ( http://xmpp.org/extensions/xep-0301.html#monitoring_message_edits ) is actually implemented at line 298 of RealTimeText.cs of RealJabber .... function EncodeRawRTT viewable at http://code.google.com/p/realjabber/source/browse/trunk/CSharp/RealTimeText.cs#298 intelligently decides whether to do a <e> or a <d> based on where the cursor should go at the end, located at lines 360-370 of the above hyperlink.   
That said, if bandwidth wasn't as important, I could just very easily remove either <e> or <d> and it would not make any difference to the end user of RealJabber, since a cursor position correction is done.

If we keep <d> and eliminate <e> it means backspace operations might need to be accompanied by a corrective cursor repositioning.
If we keep <e> and eliminate <d> it means delete key operations  might need to be accompanied by a corrective cursor repositioning.
(Note: This is optional -- this is only for clients that decide to support transmission / reception of cursor positioning)

Also, I've actually stopped using <c/> because I realized an empty <t p='#'/> element does exactly the same thing as <c p='#'/>  ... 
Therefore I am actually thinking of removing two action elements from the next XEP-0301
- Remove <c/> because I can use empty <t/> to do exactly the same thing.
- Remove <g/> because I can use XEP-0224 instead successfully anyway.  
That reduces the number of action elements to just 4, and it makes it easy to merge Tier 1 with Tier 2 into one unified table for simplicity. 
However, I've found enough reason to keep both <d> and <e> -- but our team can still be swayed by further arguments against having both.


Section 4.6: error recovery
As mentioned before, the attempt to correct errors is my biggest concern
about this XEP. For the case of reconnects it appears to me that the
sending client will always be able to notice this situation and treat it
as a new RTT session.

Error recovery is actually simpler than it looks -- it consumes less than 5% of the source code in RealTimeText.cs
One of my business clients had serious problems without error recovery (we had an attempt to make it optional), so we actually expanded Error Recovery to RECOMMEND event='reset' at regular intervals, such as once every 10 <rtt> messages (or once every 10 seconds).  Also, sometimes you can't detect online/offline transitions, for example, Google Talk network sometimes can't see the online/offline status of jabber.org users, and you can have RTT conversations with users that appear offline (i.e. invisible).
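
As a rough sketch of the periodic reset described above (this is not the RealJabber code), a sender could count outgoing <rtt> packets and send the full text with event='reset' on every tenth one. The class name, the tuple layout, and the use of plain strings instead of action elements are all hypothetical simplifications, and the sketch only covers a single real-time message.

```python
from typing import Optional, Tuple

RESET_INTERVAL = 10  # "once every 10 <rtt> messages", as suggested above


class RttSender:
    """Hypothetical sender for one real-time message."""

    def __init__(self, first_seq: int = 0) -> None:
        self.seq = first_seq  # per the discussion, seq need not start at 0
        self.count = 0        # packets sent so far for this message

    def next_packet(self, full_text: str, delta: str) -> Tuple[int, Optional[str], str]:
        """Return (seq, event, payload) for the next outgoing <rtt> element."""
        if self.count == 0:
            event, payload = "new", full_text      # first packet of the message
        elif self.count % RESET_INTERVAL == 0:
            event, payload = "reset", full_text    # periodic full retransmission
        else:
            event, payload = None, delta           # ordinary incremental edit
        packet = (self.seq, event, payload)
        self.seq += 1
        self.count += 1
        return packet
```

A recipient that missed a packet would then be out of sync for at most nine further packets before the next full-text reset arrives.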


Section 6.4.1: message length limit
In the second example with the split messages I would not have expected
an empty <rtt/> element. If that is actually intended to indicate the
<body/> is part of the RTT session this should be mentioned elsewhere.

Yes, that was the intent.  This will need to be clarified.
It's not essential, but it is a very useful indicator.

Your comments are useful.  I'd appreciate hearing back from you, and you are very welcome to run RealJabber (www.realjabber.org) and test it out with me -- email me privately to make an appointment -- I would appreciate your comments, given your useful insights.

Thanks!
Mark Rejhon

Florian Zeitz

Mar 7, 2012, 1:15:15 PM
to Mark Rejhon, stan...@xmpp.org
Am 07.03.2012 03:38, schrieb Mark Rejhon:
>
> The seq was developed, because, it was found in fact to be necessary.
> -- Disconnect and reconnect cycles. Including those caused by bad wireless
> reception (WiFi, 3G, mobile phone)
> -- Not all servers support offline message delivery.
> -- BOSH failures in a web browser at the client side (i.e. intermittent
> HTTP request failures)
> -- And less often, situations of extreme congestion has happened.
> My XEP-0301 is intentionally designed to survive a wide variety of
> situations that have actually happened in my tests. This attribute is
> critical to my applications, as well as Next Generation 9-1-1 experimental
> demos. See http://tools.ietf.org/html/draft-tschofenig-ecrit-xmpp-es-00 ...
> "Emergency Services Functionality with the Extensible Messaging and
> Presence Protocol (XMPP)", section 4.5 mentions XEP-0301 as one of the
> possible functionalities that may become a part of NG9-1-1. Real-time text
> technologies here is very useful for accessibility too; as it replaces the
> need to carry around a teletypewriter (TTY).
>
One of my main points here was that I don't see the use specifically for
reconnects. Upon reconnect it is clearly the sender's responsibility to
start a new session with the receiver. After all the receiver might have
lost all state at that point. I think it would be way more sensible to
specify that the first RTT message sent to a resource that was
previously offline needs to have event="new" or event="reset".
For stanzas getting lost on live streams sequence numbers appear like a
sane solution though.

> We need to keep the 'seq' attribute, as it is essential for message
> integrity during less-than-ideal situations. I actually expanded it to
> recommend an event='reset' once every 10 XMPP messages, to improve
> resilence even further -- the latest version of RealJabber now does this.
> (you can the new RealJabber on two computers -- and test this concept by
> disconnecting-reconnecting in the middle of a conversation while the other
> person is still typing real-time in RealJabber, and the real-time text
> recovers automatically within a few seconds of reconnecting because of the
> automatic event='reset' occuring at regular intervals.)
>

As stated above I do believe sync should be explicit and instant upon
reconnect. Effectively transmitting everything that has already been
typed every 10 messages just to compensate congestion control seems like
overdoing it to me...

> Also, very rarely (it only happened on BlackBerry's Google Talk client)
> when you've transmitted several XMPP messages simultaneously (i.e. network
> congestion) and they all get the same timestamp, and then they get
> delivered out-of-order (wrong order) that they were transmitted. This
> should never happen, but it actually occasionally does. An earlier
> version of XEP-0301 was more complex, having a 'msg' attribute for message
> number. That was removed, reduced down to just one 'seq' value for
> simplification.
>

Side note: I'd rather have us fix implementations than design protocols
to survive them. XMPP does guarantee in-order delivery.

> -- 'seq' does not need to start at 0. I'll eliminate that requirement
> (clarification)
> -- You can change the 'seq' value anytime there's an event='new' or
> event='reset'.
> Setting it back to 0 again works, although I prefer not to reset it to 0
> because of the danger of a user disconnecting while seq='0' and
> reconnecting after it's incremented, reset back to '0' again, and
> incremented to '1', and the user had reconnected, getting consecutive seq
> numbers for totally different real-time messages that were never delivered.
> In this case, the wrong <rtt> will be displayed, resulting in rare text
> scrambling. This actually happened once in my random testing, so I stopped
> resetting seq back to 0 everytime there was an event='reset'.
>

Which reminds me of something I forgot to mention in my last message.
Unless I overlooked it there is no defined behaviour once 'seq' is
incremented past 2^32-1. Assuming wrap around, it would be nice to have
a note that implementations need to make sure to not accidentally assume
desynchronization when this happens.
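
A minimal sketch of what I mean, assuming 'seq' is treated as an unsigned 32-bit counter that wraps to 0 (the XEP does not currently say so; the function names are made up for illustration):

```python
SEQ_MODULUS = 2 ** 32  # assumption: seq is an unsigned 32-bit counter


def next_seq(seq: int) -> int:
    """Successor of seq, wrapping from 2^32-1 back to 0."""
    return (seq + 1) % SEQ_MODULUS


def is_in_sequence(last_seq: int, received_seq: int) -> bool:
    """True if received_seq directly follows last_seq, including across the wrap."""
    return received_seq == next_seq(last_seq)
```

With a naive `received_seq == last_seq + 1` check, the step from 2^32-1 to 0 would wrongly be reported as a lost stanza.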

> You are right, that 'start' and 'cancel' is not really required, so I may
> be removing them. However, I think that 'cancel' may still be necessary to
> signal the other recipient to stop transmitting incoming <rtt> for the
> remainder of the chat session, in order to save bandwidth, whenever the
> recipient wants to turn off RTT (i.e. via a button or switch, while in
> middle of conversation). So there's still a usefulness for 'cancel' even
> if 'start' is not neeed (the start of RTT during a chat session is simply
> the first delivery of an <rtt> element.)
>

I'm not convinced "cancel" is needed. Not advertising the disco feature
seems perfectly sufficient to avoid receiving RTT messages. I doubt
there are cases where this is a per contact choice, or RTT would be
disabled after half a conversation. It would also always be possible to
politely ask the sender to turn off RTT as a last resort.

> We have to keep "Unicode code points" (more on that later) but I agree that
> normalization paragraph is definitely confusing, so I've modified it to the
> following wording:

> "For interoperability of p and n values, processing MUST be done on
> the transmitted Unicode real-time message. For senders, this is the
> version of the Unicode message text after any Unicode normalization,
> emoticon graphics images conversion to Unicode, display text
> formatting, processing of Unicode combining marks, etc. For recipients
> obtaining text from the <t> element, this is the Unicode text immediately
> after XML processing, and before any further processing. From the
> perspective of p and n values, a real-time message is treated as an
> editable array of Unicode code points."
>
This text seems fine to me. Note that I never questioned that code points
were the right thing to count. I fully agree with you there.

> Also, I've actually stopped using <c/> because I realized an empty <t
> p='#'/> element does exactly the same thing as <c p='#'/> ...
> Therefore I am actually thinking of removing two action elements from the
> next XEP-0301
> - Remove <c/> because I can use empty <t/> to do exactly the same thing.
> - Remove <g/> because I can use XEP-0224 instead successfully anyway.
> That reduces the number of action elements to just 4, and it makes it easy
> to merge Tier 1 with Tier 2 into one unified table for simplicity.
> However, I've found enough reason to keep both <d> and <e> -- but our team
> can still be swayed by further arguments against having both.
>

I like that approach. Always having cursor repositioning implicit in the
edit actions seems compelling to me. At that point having both <d/> and
<e/> would not seem as icky to me either. As I understand it the main
change here would be that <d/> and <e/> are redefined to perform
absolute repositioning of the cursor instead of performing a relative
movement of 0 and -1 respectively. This also addresses my other concern
about the possibility of positioning the cursor outside of the text.

> Error recovery is actually simpler than it looks -- it consumes less than
> 5% of the source code in RealTimeText.cs
> One of my business clients had serious problems without error recovery (we
> had an attempt to make it optional), so we actually expanded Error Recovery
> to RECOMMEND event='reset' at regular intervals, such as once every 10
> <rtt> messages (or once every 10 seconds). Also, sometimes you can't
> detect online/offline transitions, for example, Google Talk network
> sometimes can't see the online/offline status of jabber.org users, and you
> can have RTT conversations with users that appear offline (i.e. invisible).
>

Starting a RTT conversation with an already offline user should be
impossible due to the requirement to check the stream feature. Also I
tend to believe that when invisible users are talking to someone they
should "smoke out" (as someone called it) first. Meaning an invisible
user should send directed presence to whomever he is going to message first.
Interop problems are always unfortunate, but again I'd rather see them
fixed than designing protocols to work around them.

I'm glad I could provide some helpful feedback,
Florian

Florian Zeitz

Mar 7, 2012, 3:27:35 PM
to mark...@gmail.com, stan...@xmpp.org
Am 07.03.2012 19:28, schrieb Mark Rejhon:
> Hello,
>
> I am on the go (on phone), so replying off the mailing list but two things caught my attention:
>
I assume it was accidentally not CCed to the list then. It is CCed on
this reply, I hope that is okay.

> 1. Xep-0301 never uses relative cursor positioning. I am confused.
> Can you re-explain, because <e> and <d> have always used absolute positioning and aren't being changed here; just the discussion whether to eliminate one or the other.
>
Maybe that was the case in your implementation, however XEP-0301
currently states:
"* For <e> element (Backspace), the cursor position is moved left as
text is deleted.
* For <d> element (Forward Delete) and all other action elements, the
cursor position is unaffected."
This sounds like relative positioning to me. Absolute would be setting
the cursor position to the 'p' attribute's value for forward delete and
to the difference of the 'p' and 'n' attributes on backwards delete.
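
To illustrate, here is a rough sketch of the absolute interpretation. It is my reading, not the current XEP text: the class and method names are hypothetical, positions are assumed to be 0-based code point offsets, and clamping to the text length is my own addition so the cursor cannot end up outside the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RttBuffer:
    """Hypothetical receiver-side buffer: one list entry per Unicode code point."""
    chars: List[str] = field(default_factory=list)
    cursor: int = 0

    def _clamp(self, p: int) -> int:
        # Keep positions inside the text so the cursor can never leave it.
        return max(0, min(p, len(self.chars)))

    def insert(self, p: int, text: str = "") -> None:
        """<t p='..'>text</t>: insert at p; an empty text only moves the cursor."""
        p = self._clamp(p)
        self.chars[p:p] = list(text)
        self.cursor = p + len(text)

    def forward_delete(self, p: int, n: int = 1) -> None:
        """<d p='..' n='..'/>: remove n code points starting at p; cursor = p."""
        p = self._clamp(p)
        del self.chars[p:p + n]
        self.cursor = p

    def backspace(self, p: int, n: int = 1) -> None:
        """<e p='..' n='..'/>: remove n code points before p; cursor = p - n."""
        p = self._clamp(p)
        start = max(p - n, 0)
        del self.chars[start:p]
        self.cursor = start

    def text(self) -> str:
        return "".join(self.chars)
```

Every action sets the cursor from its own attributes, so no action depends on where a previous one left the cursor.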

> 2. Advertising of RTT must always be done, for accessibility reasons.
> We can't disable section 5 or it defeats the ability of deaf people to
> attempt to initiate a RTT conversation in Adium/Pidgin (and soon
> google, microsoft, etc).
> The goal of RTT is to become a widespread protocol in ten years, much like closed captioning.
> We want to be able to let the recipient be notified of an incoming RTT and to let the recipient decide whether to turn on/off RTT.
> If we stop advertising RTT, deaf people can't make a call and the hearing person with RTT off won't ever be notified.
> Can you suggest an alternative method of refusing RTT that does not require disabling disco?
>
That is possibly quite different though. If a client is configured to
always reject RTT sessions there is no point in ever offering it to that
client. Therefore it's perfectly fine for such a client to not send the
disco feature.

I think what you're concerned about is not that people can't receive RTT
messages, but rather that they can't be prompted to send them.
A possible way forward might be to define one feature indicating
willingness to receive RTT messages, and one indicating willingness to
send them upon request. Which would of course imply introducing a way to
request a session, as opposed to having a way to cancel a session you
never asked for.
Possibly someone else can come up with a yet better idea though.

Mark Rejhon

Mar 7, 2012, 6:29:31 PM
to Florian Zeitz, stan...@xmpp.org
Hello! 
Thanks for your reply -- your comments are useful in guiding areas that I need to clarify to make sections of XEP-0301 less confusing to others.

On Wed, Mar 7, 2012 at 1:15 PM, Florian Zeitz <flo...@babelmonkeys.de> wrote:
> The seq was developed, because, it was found in fact to be necessary.
[snip]
One of my main points here was that I don't see the use specifically for
reconnects. Upon reconnect it is clearly the sender's responsibility to
start a new session with the receiver. After all the receiver might have
lost all state at that point.

Actually, in real practice, disconnects and reconnects are seamless in a lot of software.  For example, many mobile chat programs are designed to automatically reconnect (even while in the middle of composing a long instant message) when reception comes back & the original XMPP session timed out due to lack of reception.

It is very simple, and very "in-line" with the way of prevailing use cases that are common in certain brands of wireless mobile chat applications (iPhone, Android).  Although not all of them do things this way, many of them do the following, for disconnect-reconnect:

SITUATION: Automatic disconnect-reconnect cycle done in background
EXAMPLE: Cellphone loses reception.  Server times out the connection.  Spontaneous disconnect.  User still has a partially composed message that's not yet sent.
WHAT HAPPENS: Chat window stays open. Send button becomes disabled.  Partially composed message is not erased (he can copy and paste it somewhere else or reconnect)  In polite software design, the chat window is still visible so that the user can still view chat history even though the user lost connection.  Often, the user will have a partially-composed message when the chat software suddenly disconnected.   (You're on a train that just moved through a tunnel, and the reception suddenly got lost.  You've suddenly walked into an elevator, etc)  Suddenly, the chat client reconnects in the background, and the chat window becomes "live" again.  The user now has an opportunity to send the partially-composed message.

Both the seq number and event='reset' make it really easy (yet still optional) for a chat client to resume the real-time message.  I found it a rather simple programming practice in RealJabber because event='reset' behaves exactly like event='new', with the only difference that 'new' starts a new message and 'reset' starts the current message over.   (In RealJabber, event='new' and event='reset' are actually treated identically at the moment)

 
I think it would be way more sensible to
specify that the first RTT message sent to a resource that was
previously offline needs to have event="new" or event="reset".
For stanzas getting lost on live streams sequence numbers appear like a
sane solution though.

I actually already specify this -- I already mention this as part of the "Error Recovery" -- that a client SHOULD do an event="new" or event="reset" if the user comes back online.   It is not currently a MUST because if the client doesn't do it, it is not fatal to interoperability of real-time text -- it just means real-time text automatically resumes a little bit later.   (Using fewer "MUST"s is beneficial to a specification)

 
As stated above I do believe sync should be explicit and instant upon
reconnect. Effectively transmitting everything that has already been
typed every 10 messages just to compensate congestion control seems like
overdoing it to me...
 
True, I thought so too, and I already specify that sync should be instant upon reconnect. 
However, the retransmission is only useful for the lost stanzas.   

Lost stanzas actually do happen in actual practice, mostly caused by bad cellular reception, but also sometimes over unreliable BOSH connections.  Also, imagine being lost in a forest or at sea with only 1 bar of reception, and a deaf person needs to make a text-based "phone call" using NG9-1-1 over a real-time text protocol (XEP-0301 is one of the candidates).   The phone drops and regains reception randomly.   XEP-0301 must survive that.   It's already being covered in documents internally, as well as in some public documents -- like the proposed IETF document that I already linked to.

When real-time text is enabled, sometimes people stop hitting Enter, and start typing large messages (i.e. 500+ characters).  For a slow typist, that could mean more than 1 or 2 minutes before real-time text resumes, unless I specified retransmission.   That time lost can be dangerous during a Next Generation 9-1-1 telephone call by a deaf person.   A single real-time transmission of a few keypresses can often be over 250 bytes (about 500 bytes when including keypress delays) , but can expand to up to 1 kilobyte for or about 400 bytes when including TCP/IP overhead.   That would be 2.5 kilobytes over 10 seconds.   (About 5 kilobytes if including keypress delays).   The average length of an instant message is actually less than 40 characters.   So this adds only about 2-3% bandwidth to require retransmission.   For a 500 byte message, that includes keypress delays, the retransmission adds only 10% bandwidth (500 extra bytes every 10 seconds of continuous typing).   

Note that retransmissions are OPTIONAL. (the planned addendum to XEP-0301)
Note that no retransmissions occur when no typing is done.  (so we don't lose bandwidth during idle moments)
Disclaimer: With one of my clients, I have done work to help implement XEP-0301 in a Next Generation 9-1-1 experimental demo, and this retransmission is actually added because of this paid-work experience.  Work for NG9-1-1 is currently one of my sources of income for XEP-0301.


Side note: I'd rather have us fix implementations than design protocols
to survive them. XMPP does guarantee in-order delivery.

Agreed.  It does not, however, impact some other very good reasons to keep 'seq' and the error recovery/optional redundancy.

 
Which reminds me of something I forgot to mention in my last message.
Unless I overlooked it there is no defined behaviour once 'seq' is
incremented past 2^32-1. Assuming wrap around, it would be nice to have
a note that implementations need to make sure to not accidentally assume
desynchronization when this happens.

You're right -- I knew I should have added a note -- but then I realized it would take about 50,000 years for this to happen, and the desynchronization would only be very temporary (it would only last until the next event='new' or event='reset').   Since seq increments once per second during continuous typing, it would take 50,000 years of continuous typing at the default transmission interval for the wraparound scenario to happen.  And even if it did wrap around, the penalty is only a stall of a few seconds caused by not defining wraparound behaviour.    Also, some programming languages (e.g. Java) do not have unsigned integers (only signed integers), which complicates trying to add a note about an event that would never happen in normal practice.   Would you like to suggest a single-sentence note that can be added?


> However, I've found enough reason to keep both <d> and <e> -- but our team
> can still be swayed by further arguments against having both.
>
I like that approach. Always having cursor repositioning implicit in the
edit actions seems compelling to me. At that point having both <d/> and
<e/> would not seem as icky to me either. As I understand it the main
change here would be that <d/> and <e/> are redefined to perform
absolute repositioning of the cursor instead of performing a relative
movement of 0 and -1 respectively. This also addresses my other concern
about the possibility of positioning the cursor outside of the text.

Actually, there's a bit of a confusion here:
XEP-0301 always uses absolute cursor positioning for all action elements.
Cursor positioning has never been relative for XEP-0301.
No action element in XEP-0301 depended on the cursor position of the previous action element.

I suspect that your confusion might be caused by the fact that "n" is relative, but "p" is always absolute.  The omission of "p" simply means the cursor is absolutely put at the end of the line (p = length of real-time message), as specified by section 4.5.1.3 "Rules for Attribute Values" .... And section 4.5.3 says to always keep the cursor within the confines of the real-time message.

However, yes, I agree, this is probably confusing.  I am now re-wording things to try to clear up confusion; I will contact you later with a draft wording to confirm, before submitting v0.2 of XEP-0301 to XSF.
You are right, I confused you, so I will fix the spec to avoid that.

 
Interop problems are always unfortunate, but again I'd rather see them
fixed than designing protocols to work around them.

Agreed, agreed!
I do agree some justifications are not good (i.e. out-of-order message delivery).   

However, the below fully justifies the usage of rudimentary (and mostly optional) error recovery that XEP-0301 supports:
-- Wireless reception is not always good.  We can't always do perfect wireless signal.
-- And a group of us needs to meet requirements for mission-critical reliability of Next Generation 9-1-1

Thanks,
Mark Rejhon

Mark Rejhon

Mar 7, 2012, 6:46:17 PM3/7/12
to Florian Zeitz, stan...@xmpp.org
However, the below fully justifies the usage of rudimentary (and mostly optional) error recovery that XEP-0301 supports:
-- Wireless reception is not always good.  We can't always do perfect wireless signal.
-- And a group of us needs to meet requirements for mission-critical reliability of Next Generation 9-1-1

I noticed that you have a Denmark email address, so I forgot to mention that 9-1-1 in North America is equivalent to 1-1-2 in Europe (the emergency phone number service).   There are actual trials of real-time text being used in Europe in emergency service -- see http://www.reach112.eu that is already live in several countries (List of countries at http://www.reach112.eu/view/en/project/description.html ) ... It is using a different real-time text standard, but some other systems are considering XEP-0301, including a client of mine.

Thanks
Mark Rejhon

Florian Zeitz

Mar 7, 2012, 7:43:43 PM3/7/12
to Mark Rejhon, stan...@xmpp.org
Thank you. Believe it or not, I have in fact used an instant messenger
before. I however fail to see how this is relevant to any of this.
The point is that such a "seamless" reconnect (from the receiver's UI
point of view) is (for the sender) indistinguishable from, let's say, my
device crashing and me signing in with the same resource from a
completely different device almost immediately after.

>> I think it would be way more sensible to
>> specify that the first RTT message sent to a resource that was
>> previously offline needs to have event="new" or event="reset".
>> For stanzas getting lost on live streams sequence numbers appear like a
>> sane solution though.
>>
>
> I actually already specify this -- I already mention this as part of the
> "Error Recovery" -- that a client SHOULD do an event="new" or event="reset"
> if the user comes back online. It is not currently a MUST because if the
> client doesn't do it, it is not fatal to interoperability of real-time text
> -- it just means real-time text automatically resumes a little bit later.
> (Using fewer "MUST"s is beneficial to a specification)
>

I can only assume you're talking about your working copy, because there
is not a single SHOULD requirement in that section in the published
version of XEP-0301.
I do also believe this is entirely the wrong place for such a statement.
The matter of fact is simply that a newly online resource (no matter
whether it reconnected, or connected for the first time) SHOULD be sent
an <rtt/> containing either event="new" or event="reset" as the first
RTT message. Doing anything else is bound to imply sending data to the
receiver that it can not possibly understand.
I do however agree that the way XEP-0301 works this is in fact a SHOULD
requirement, as failure to comply with this is not fatal.

> Note that retransmissions are OPTIONAL. (the planned addendum to XEP-0301)
> Note that no retransmissions occur when no typing is done. (so we don't
> lose bandwidth during idle moments)
> Disclaimer: With one of my clients, I have done work to help implement
> XEP-0301 in a Next Generation 9-1-1 experimental demo, and this
> retransmission is actually added because of this paid-work experience.
> Work for NG9-1-1 is currently one of my sources of income for XEP-0301.
>

Ack. "OPTIONAL" seems like an appropriate requirement level.

>> Which reminds me of something I forgot to mention in my last message.
>> Unless I overlooked it there is no defined behaviour once 'seq' is
>> incremented past 2^32-1. Assuming wrap around, it would be nice to have
>> a note that implementations need to make sure to not accidentally assume
>> desynchronization when this happens.
>>
>
> You're right -- I knew I should have added a note -- but then I
> realized it would take about 50,000 years for this to happen, and the
> desynchronization would only be very temporary (it would only last until
> the next event='new' or event='reset'). Since seq increments once per
> second during continuous typing, it would take 50,000 years of continuous
> typing at a default transmission interval, for the wraparound scenario to
> happen. And if it even did wraparound, the penalty is only a stall of a
> few seconds caused by not defining wraparound behaviour. Also, some
> programming languages (e.g. Java) do not have unsigned integers (only
> signed integers), which complicates trying to add a note about an event that
> would never happen in normal practice. Would you like to suggest an idea
> of a single-sentence note that can be added?
>

If you're going to (as you said earlier) drop the requirement of
starting at "0" this becomes more of a problem. Clients could implement
TCP-ish behaviour of starting at a random sequence number. The upper
limit would then potentially be hit sooner.
Somewhat inspired by the text in RFC 793 I'd suggest:
"Arithmetic on the sequence number MUST be performed modulo 2**32. In
particular the sequence number wraps around to 0 when incremented past
(2**32 - 1). Note that comparison of sequence numbers has to accommodate
for this fact."
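
A minimal sketch of what that wording could mean in code, assuming plain modulo-2**32 arithmetic. The function names are made up for illustration and are not taken from XEP-0301 or any implementation:

```python
SEQ_MOD = 2 ** 32  # modulus from the suggested wording above


def next_seq(seq: int) -> int:
    """Increment a sequence number, wrapping to 0 after 2**32 - 1."""
    return (seq + 1) % SEQ_MOD


def is_expected(previous: int, received: int) -> bool:
    """True if `received` directly follows `previous`, even across the wrap."""
    return received == next_seq(previous)


def seq_distance(a: int, b: int) -> int:
    """Forward distance from a to b in modular arithmetic (0 .. 2**32 - 1)."""
    return (b - a) % SEQ_MOD
```

With this, a receiver that last saw 2**32 - 1 treats 0 as the next expected value instead of assuming desynchronization.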

At this point I'm actually very confused. Let me quote the published
version of XEP-0301 again:


"For <d> element (Forward Delete) and all other action elements, the
cursor position is unaffected."

So I would assume that (following that text) after performing
<c p="0"/><d p="42"/>
the cursor would remain at position "0".
What you suggested appeared to me like you wanted this to move the
cursor to position "42". That seems quite sensible to me, but would be a
clear change from that text. Am I overlooking something?

>> Interop problems are always unfortunate, but again I'd rather see them
>> fixed than designing protocols to work around them.
>>
>
> Agreed, agreed!
> I do agree some justifications are not good (i.e. out-of-order message
> delivery).
>
> However, the below fully justifies the usage of rudimentary (and mostly
> optional) error recovery that XEP-0301 supports:
> -- Wireless reception is not always good. We can't always do perfect
> wireless signal.
> -- And a group of us needs to meet requirements for mission-critical
> reliability of Next Generation 9-1-1
>

So, maybe I just live in a too idealistic mindset, but I have severe
doubts that a terrible wireless signal can ever yield lost stanzas.
That would not only imply that TCP packets were completely lost (no
retransmits etc.), but also that they were lost in such a way that the
stream's XML was still valid. For now I refuse to believe that until
proven wrong.
Mission-critical reliability is only really an issue if it is endangered.
I've currently accepted that we need sequence numbers only because
you're telling me that some XMPP servers perform congestion control that
necessitates this (if that's the case, so be it). I've not yet heard any
other plausible example of stanzas being lost.

Regards,
Florian Zeitz
P.S.: Having seen your other mail.
.de is in fact Germany, Denmark is .dk.
Also I was fully aware of what 9-1-1 is, but thank you for thinking
about local differences :).

Mark Rejhon

Mar 7, 2012, 10:46:50 PM3/7/12
to Florian Zeitz, stan...@xmpp.org
Hello Florian,
You did an excellent job of pointing out a couple of mistakes I made.  (Thank you!)


On Wed, Mar 7, 2012 at 7:43 PM, Florian Zeitz <flo...@babelmonkeys.de> wrote:
Thank you. Believe it or not, I have in fact used an instant messenger
before. I however fail to see how this is relevant to any of this.
The point is that such a "seamless" reconnect (from the receiver's UI
point of view) is (for the sender) indistinguishable from, let's say, my
device crashing and me signing in with the same resource from a
completely different device almost immediately after.

That's true.  The XEP-0301 spec allows real-time text to automatically appear again regardless of how you reconnect (restart app, different client, manual relogin, automatic reconnect, etc).  Even as the sender continuously types.

Another reason I forgot to mention -- XEP-0301 works with concurrent login.

With concurrent login, you can log in from separate applications, even from separate computers (phone vs PC).  Real-time text still works, and it gets reinitialized via the regular reset retransmissions -- another useful purpose for retransmission.   You can't always detect concurrent logins, especially since your online status stays online.   All concurrently logged-in clients successfully receive incoming RTT.

Related scenario (crash & phantom concurrent login)
-- Imagine that your client crashes (hopefully not due to XEP-0301 ;-) or it's abruptly terminated (e.g. by Task Manager)
-- The crash occurs while the other person is sending incoming RTT.
-- The other person sending you RTT doesn't notice you're offline yet, because the XMPP server doesn't time you out (turn you offline) instantly.  Sometimes the timeout is longer than the time it takes you to re-launch your chat program.
-- You restart your client quickly
-- The other person's still-in-progress RTT reappears in your software quickly, thanks to the event='reset' retransmissions.
-- There is no offline/online status change in this situation, because the server still thinks there are 2 concurrent logins (including the original login that hasn't timed out yet after the crash)
-- Since XEP-0301 is designed to work successfully with concurrent login, RTT works as expected, and causes no further inconvenience.
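
The recovery in this scenario can be sketched roughly as follows. This is a simplified illustration, not RealJabber's code: the class and method names are hypothetical, each <rtt> is assumed to carry only appended text, and a 'reset' is assumed to carry the whole message so far.

```python
class RttReceiver:
    """Toy receiver: stays out of sync until it sees event='new' or 'reset'."""

    def __init__(self):
        self.in_sync = False      # a freshly (re)started client has no state
        self.expected_seq = None
        self.text = ""

    def handle(self, seq, event=None, text=""):
        """Process one <rtt> element; return True if the display is in sync."""
        if event in ("new", "reset"):
            # Assumption: 'new' and 'reset' both replace the displayed text.
            self.text = text
            self.expected_seq = seq + 1
            self.in_sync = True
        elif self.in_sync and seq == self.expected_seq:
            self.text += text
            self.expected_seq = seq + 1
        else:
            # Unknown state or a gap in seq: wait for the next new/reset.
            self.in_sync = False
        return self.in_sync
```

A client restarted mid-message ignores the first elements it cannot interpret, then picks the message up again at the sender's next event='reset'.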

Related info about concurrent logins in Supplement document (including two different methods of handling incoming RTT from two or more concurrently logged-in clients) -- http://www.marky.com/realjabber/XMPP-RTT-Supplement_2011-06-17.pdf
This is just background information about MUC and concurrent-logins (which XEP-0301 was designed to be compatible with), but has been left out of XEP-0301 spec, for simplicity.


> I actually already specify this -- I already mention this as part of the
> "Error Recovery" -- that a client SHOULD do an event="new" or event="reset"
> if the user comes back online.   It is not currently a MUST because if the
> client doesn't do it, it is not fatal to interoperability of real-time text
> -- it just means real-time text automatically resumes a little bit later.
> (Using fewer "MUST"s is beneficial to a specification)
>
I can only assume you're talking about your working copy, because there
is not a single SHOULD requirement in that section in the published
version of XEP-0301.

Ooops, I meant "MAY".  Thanks for correcting me!
I'm referring to the word used in section 4.6.4 "Helping the Recipient Stay In Sync"
(I apologize -- I was going by memory)

 
> Note that retransmissions are OPTIONAL. (the planned addendum to XEP-0301)
> Note that no retransmissions occur when no typing is done.  (so we don't
> lose bandwidth during idle moments)
> Disclaimer: With one of my clients, I have done work to help implement
> XEP-0301 in a Next Generation 9-1-1 experimental demo, and this
> retransmission is actually added because of this paid-work experience.
>  Work for NG9-1-1 is currently one of my sources of income for XEP-0301.
>
Ack. "OPTIONAL" seems like an appropriate requirement level.

In regards to "Ack" -- if you are referring to concern about potential "commercial influences" -- don't worry.  I am a strong advocate of open-sourcing anyway -- my RealTimeText.java and RealTimeText.cs (C#) are released at  http://code.google.com/p/realjabber/ under the Apache 2.0 source code license which allows commercial usage -- so others can benefit too.  More widely-available XEP-0301 is good for accessibility reasons.  There is a strong incentive to give out free source code in this case.  Especially as I cannot use the telephone, and I benefit from conversational text technologies like RTT.  Eventually I'll produce a good developer webpage and package my various RTT code into "libraries" for various platforms that there's demand for.  In fact, I've noticed even non-deaf people love the feature (even more than video), though they need to be told about the existence of the feature.

It wasn't an income source when I originally submitted.  I initially created a deaf-friendly chat program prototype back in late 2010 that warranted XEP-0301, then I got approached after I had already submitted the XEP-0301 spec.  It's all good though, since it's one possible way of making XEP-0301 mainstream, and it benefits the accessibility initiatives long-term if XEP-0301 gets used for such an important purpose as NG9-1-1 emergency service for the deaf.  The reason why XEP-0301 hasn't been updated recently is because I've had to work on a boring non-XMPP job -- so finally getting a little bit of XEP-0301 specific income from some sources is actually a good thing, and allows me to focus more effort on improving the standard.

 
If you're going to (as you said earlier) drop the requirement of
starting at "0" this becomes more of a problem. Clients could implement
TCP-ish behaviour of starting at a random sequence number.
"Arithmetic on the sequence number MUST be performed modulo 2**32. In
particular the sequence number wraps around to 0 when incremented past
(2**32 - 1). Note that comparison of sequence numbers has to accommodate
for this fact."

Excellent point.  I'll find a single sentence to insert, probably similar to your suggestion.
I have to accommodate programming languages that don't support unsigned integers (Java only supports up to 2**31 - 1 for positive values), so I think I'll need to make it modulo 2**31.  I would have to add some confusing arithmetic in order to implement proper wraparound behaviour between my RealTimeText.java and RealTimeText.cs ... Right now, is there any way I can avoid adding wraparound logic?

I have a new idea to pass by you, because I had also wanted to keep 'seq' as simple as possible.
-- For <rtt> elements WITHOUT an 'event' attribute, seq MUST increment for each consecutive <rtt> element.
-- For <rtt> elements WITH an 'event' attribute, seq MAY continue incrementing, or be changed to a random value.  (Reusing the same seq value, such as 0, is NOT RECOMMENDED, since it results in duplicate seq values used within a short time period.)
-- The seq value for an <rtt> element with an 'event' attribute MUST be less than 2^30 (1,048,576).  This reasonably prevents any potential wraparound issue (from increments) from ever occurring within a human's lifetime with integer variables in any modern programming language (signed 32-bit being the worst case), and eliminates the need to program wraparound arithmetic in software clients.
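
As a rough sketch, the three rules could look like this on the sender side. The class name and the injectable random source are hypothetical. The bound is a parameter because the two figures given for it above differ (1,048,576 is 2**20, not 2^30); the default below is only a placeholder.

```python
import random

# Placeholder default: 1,048,576 == 2**20. Adjust to whatever the spec settles on.
DEFAULT_EVENT_SEQ_BOUND = 2 ** 20


class SeqGenerator:
    """Sender-side seq values: increment normally, re-randomize on an event."""

    def __init__(self, bound=DEFAULT_EVENT_SEQ_BOUND, rng=None):
        self.bound = bound
        self.rng = rng or random.Random()
        self.seq = None

    def next_seq(self, event=None):
        """Return the seq for the next <rtt>; `event` is e.g. 'new' or 'reset'."""
        if event is not None or self.seq is None:
            # Pick a random value below the bound, avoiding an immediate repeat.
            candidate = self.rng.randrange(self.bound)
            while candidate == self.seq:
                candidate = self.rng.randrange(self.bound)
            self.seq = candidate
        else:
            # No 'event' attribute: seq MUST increment.
            self.seq += 1
        return self.seq
```

Starting below 2**20 leaves roughly 2**31 increments before a signed 32-bit integer overflows, which is about 68 years at one increment per second.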

I'll need to convert this wording into a spec-friendly format, but what do you think of the general concept of avoiding wraparound?
I eventually plan to maintain several XEP-0301 codebases, and I'd rather not have to test "wraparound math interop" between Java vs C# vs C/C++
So this is simpler, as I'm not really trying to go the "full TCP replacement" -- but a simplest possible error-recovery algorithm.

 
> However, yes, I agree, this is probably confusing.  I am now re-wording
> things to try to clear up confusion; I will contact you later with a draft
> wording to confirm, before submitting v0.2 of XEP-0301 to XSF.
> You are right, I confused you, so I will fix the spec to avoid that.
>
At this point I'm actually very confused. Let me quote the published
version of XEP-0301 again:
"For <d> element (Forward Delete) and all other action elements, the
cursor position is unaffected."

Ooops, I see your confusion.  I apologize.  What it really means:
"For <d> element (Forward Delete) and all other action elements, the cursor position is unaffected by the 'n' value."

Which means, for <d> elements, the cursor is indeed moved by the 'p' attribute, but never affected by the 'n' attribute (as it is for the <e> element).
I sincerely apologize for the confusion -- I will fix the wording, as you have pointed out an excellent potential source of confusion in XEP-0301.
Possible variant of a new simplified rewrite of the section (also accommodates removal of the <c> and <g> elements, for simplification):


Optional Remote Cursor

Implementation of cursor (caret indicator) for the incoming real-time text is OPTIONAL.  Recipient clients that do not support a remote cursor, can simply ignore keeping track of a cursor position, and skip this section.  

All action elements always have absolute cursor positioning.  When a <t>, <e>, or <d> action element is processed, the cursor position calculation begins at the absolute position in attribute "p".  If the "p" attribute is not specified, the cursor is put at the absolute position of the end of the message (p = length of message text).   The cursor position is then modified as follows:
  • With <t> element (Insert Text), the cursor position (from 'p') is incremented by length of the text being inserted, putting cursor at end of inserted text.
    This mimics the forward cursor movement of regular insertion of text.

  • With <e> element (Backspace), the cursor position  (from 'p') is decremented by 'n'. If 'n' is greater than 'p', then cursor position becomes 0.
    This mimics common cursor behaviour of a Backspace key.

  • With <d> element (Forward Delete), the cursor position (from 'p') is unaffected.
    This mimics common cursor behaviour of a Delete key.
Note that the <t/> element may be empty, and in this case, only the cursor position is updated in clients that support a remote cursor (and the element is otherwise fully ignored by clients that do not support a remote cursor).  Note that <t/> without a 'p' value means the remote cursor should get put at the end of a line.  Senders MAY transmit empty <t/> elements (with or without a 'p' value) whenever the sender is repositioning the cursor without any text changes.  Note that a missing 'n' value for any element is assumed to be a value of n='1'.
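
The cursor rules in this draft can be sketched as a single function. This is an illustration only: the function name and signature are hypothetical, positions are counted in Python string units, and clamping 'p' to the message is an assumption based on the earlier remark that the cursor stays within the confines of the real-time message.

```python
def cursor_after(action, message_length, p=None, n=1, text=""):
    """Return the remote cursor position after one action element.

    action         -- 't' (insert text), 'e' (backspace) or 'd' (forward delete)
    message_length -- length of the real-time message before the action
    p              -- absolute position; None means end of message
    n              -- count for <e>/<d>; defaults to 1 when omitted
    text           -- inserted text for <t> (may be empty)
    """
    pos = message_length if p is None else p
    pos = max(0, min(pos, message_length))  # assumption: clamp into the message
    if action == "t":
        return pos + len(text)   # cursor ends up after the inserted text
    if action == "e":
        return max(0, pos - n)   # backspace moves the cursor left, floor at 0
    if action == "d":
        return pos               # forward delete leaves the cursor at 'p'
    raise ValueError(f"unsupported action element: {action!r}")
```

An empty <t/> with a 'p' value then simply moves the cursor to 'p', and an empty <t/> without 'p' moves it to the end.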

Thank you for pointing out a valuable confusion that needs to be cleared up.
Note -- this is actually the way RealJabber does it right now too, so it's backwards compatible with what I tried to make XEP-0301 do.  (In fact, this proposed XEP-0301 "Remote Cursor" edit is apparently still compatible with old implementations including older versions of RealJabber, since now-unsupported action elements such as <c> and <g> can be safely ignored)


So, maybe I just live in a too idealistic mindset, but I have severe

Realistically, in real practice, it's proven beneficial for an unexpectedly large number of purposes beyond those anticipated.
-- Does the concept of multiple concurrent logins convince you, though? 
-- Does the concept of BOSH (with random HTTP errors) convince you? 

Nonetheless, I share your ideals in general, but I also have to be realistic.
In real-life (where it matters), I've actually found retransmits to be massively beneficial in everyday use, especially due to multiple-login.   
Conversation is no longer nearly as disrupted whenever I switch clients (switch computers) while the sender is still composing a real-time message, because RTT automatically resumes (since the online indicator never changes, and can't be used to trigger a retransmit).   Especially if the sender is taking 1-2 minutes to finish composing and I don't want to wait for the <body> to see the RTT message.

Note -- BOSH over HTTP can disconnect-reconnect many times, while making it look like a continuous XMPP session.  But sometimes the web server messes up too as a separate chain of events (i.e. random HTTP failures between successes) that's unrelated to the XMPP server.  Yes, I realize, theoretically it's possible to design the javascript in the web browser to buffer and retransmit failed HTTP requests (i.e. random HTTP successes, random HTTP errors, timeouts, etc), but that is not always the case, and it's often even difficult to do so with the strophe library.   

I promise to keep it flexible in my open source that I do now and the future -- allow retransmits to be turned off.
The Apache 2.0 licensed C# module RealTimeText.cs has a configurable retransmit interval that can be turned off.  (Encoder.Redundancy property = true|false) for those who want to turn off retransmit.

Your comments have been good!

Thanks,
Mark Rejhon

Mark Rejhon

Mar 8, 2012, 12:34:59 AM3/8/12
to Florian Zeitz, stan...@xmpp.org
On Wed, Mar 7, 2012 at 3:27 PM, Florian Zeitz <flo...@babelmonkeys.de> wrote:
> 2. Advertising of RTT must always be done, for accessibility reasons.
> We can't disable section 5 or it defeats the ability of deaf people to
> attempt to initiate a RTT conversation in Adium/Pidgin (and soon
> Google, Microsoft, etc).
> The goal of RTT is to become a widespread protocol in ten years, much like closed captioning.
> We want to be able to let the recipient be notified of an incoming RTT and to let the recipient decide whether to turn on/off RTT.
> If we stop advertising RTT, deaf people can't make call and the hearing person with the RTT off, won't ever be notified.
> Can you suggest an alternative method of refusing RTT that does not require disabling disco?
>
That is possibly quite different though. If a client is configured to
always reject RTT sessions there is no point in ever offering it to that
client. Therefore it's perfectly fine for such a client to not send the
disco feature.

I originally thought so, but unfortunately, that does not meet planned accessibility requirements. 
There are two excellent reasons why we need to be able to advertise capability even if the client auto-rejects

1. It's easier to design a client to display a subtle notification.  
("*** Incoming real-time text was rejected since you have it turned off. [_Click to turn on_]")
This serves as a reminder, especially if they only want it turned off 99% of the time.
This meets accessibility requirements.

2. It's easier to measure popularity of XEP-0301 supported clients.  Capability advertising provides an important data point.  This may eventually become important statistical data for advocating future paperwork (e.g. government legislation that requires real-time text capability on cell phones, as a replacement for obsolete deaf TTY/teletypewriters)

Also, AOL Instant Messenger's Real-Time IM is an enable/disable feature much like audio/video can be turned on/off in the middle of a conversation.  I am also preparing for the eventuality of Google adding XEP-0301 someday to their products, so I need to make sure I'm covered with a good balance.  I am trying to avoid complicated session control, though. Different vendors may do things differently; here is my plan for my work with common open-source clients (e.g. Adium and Pidgin):

This is the plan when adding XEP-0301 to clients, including Adium and Pidgin:
-- RTT should be available by default (to stay accessible)
-- RTT should not automatically transmit by default in mainstream clients. (People aren't expecting their typing to be transmitted live by default.)
-- RTT should be easy to enable "in-situ" even when it's disabled 99% of the time.
-- RTT should NOT be easy to forget about after it's turned off/disabled.
-- It should be possible to have RTT have at least three settings: auto-accept, auto-reject, and confirm-first.
-- Compromise default setting is 'confirm-first'.  Some people love RTT, and some people hate RTT.

Session Control is accomplished by:
-- Starting an RTT session is simply by starting to transmit <rtt> elements.
-- Receiving an RTT session is simply detecting incoming <rtt> elements.
-- Deactivating an RTT session might be done by sending <rtt event='cancel'> (decision still needs to be made)

Clients will, generally, be configurable to at least:
-- auto-accept: Always accept and display incoming RTT, always transmit outgoing RTT
-- auto-reject: Don't display incoming RTT, don't transmit outgoing RTT
-- confirm-first: (default) If incoming RTT is detected, ask user if they want outgoing RTT 

Clients will automatically display notifications in all three cases.
Example non-intrusive status messages added to the message history (or status bar) may include the following:
-- auto-accept: "*** Real-time text is active for this chat session"
-- auto-reject: "*** Incoming real-time text detected, but you rejected it. [Click to Configure...]"
-- confirm-first: (default) "*** Incoming real-time text detected.  [Click Activate] to enable live transmission of your typing too."
Note, these are just example/potential messages that aim to make it as easy/accessible as possible even when RTT is turned off.
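
A rough sketch of how a client might map the three settings to behaviour. The names are hypothetical, and the confirm-first branch assumes incoming text is displayed while outgoing RTT waits for the user's confirmation, which the plan above does not spell out.

```python
from enum import Enum


class RttMode(Enum):
    AUTO_ACCEPT = "auto-accept"
    AUTO_REJECT = "auto-reject"
    CONFIRM_FIRST = "confirm-first"  # proposed default


ACTIVE = "*** Real-time text is active for this chat session"
REJECTED = ("*** Incoming real-time text detected, but you rejected it. "
            "[Click to Configure...]")
CONFIRM = ("*** Incoming real-time text detected.  [Click Activate] to "
           "enable live transmission of your typing too.")


def on_incoming_rtt(mode, user_confirmed=False):
    """Return (display_incoming, transmit_outgoing, notice) for one chat."""
    if mode is RttMode.AUTO_ACCEPT:
        return True, True, ACTIVE
    if mode is RttMode.AUTO_REJECT:
        return False, False, REJECTED
    # CONFIRM_FIRST (assumption: show incoming, transmit only once confirmed)
    if user_confirmed:
        return True, True, ACTIVE
    return True, False, CONFIRM
```

In every mode the user sees a notice, so RTT stays discoverable even when it is turned off.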

In the goal of introducing real-time text to the mainstream, I plan to make it as easy as possible (while not being annoying) for non-technical users to enable real-time text, even if they have real-time text disabled most of the time.  This is extremely important for deaf-accessibility reasons, especially since my goal is to make real-time text a standard feature in instant messaging software.  My users include older family members that barely know how to use a computer, and it's difficult to tell them step-by-step instructions on how to turn on real-time text.

Now, there are many different session-control standards that can be invented for XEP-0301, but I'm trying to avoid using protocol or out-of-band negotiation layers (e.g. Jingle) at the moment.   Ideas are welcome, but make sure that the XEP-0301 specification is able to accommodate the above plans for a few mainstream clients.  This was a difficult compromise agreement in the Real Time Text Taskforce (R3TF) that took several days to agree on, and we need to make sure that XEP-0301 has enough capability to permit the above scenarios.

Thanks,
Mark Rejhon