"inspired" by the recent discussions surrounding XEP-0301 (Real Time
Text) I had a look over its current status and felt I should provide
some feedback.
Some of it is rather minor, some I'm somewhat more concerned about. I've
ordered it sequentially rather than by importance.
Section 3: Glossary
The "RTT" entry seems superfluous to me. It'd be better to just note the
acronym in the "real-time text" entry. Also the remark about the
element's name is misleading as that is generally lower-case.
Section 4.2.2: 'seq' attribute
It seems to me the start of a session (and therefore when to reset this
to 0) is not clearly defined, since the start and cancel events are
purely optional. In a more general sense I'm somewhat concerned about
the attempt to reimplement the transport layer on the application layer.
(see below)
Section 4.2.3: 'event' attribute
I feel the requirement for session start and cancel needs to be either
tightened (if we decide we absolutely need it for this protocol) or
removed. Having it truly optional makes it useless for detecting the
actual session start and end IMHO. Its sole purpose appears to be
mapping to SIP, which is a problem possibly best handled separately.
The new and reset events appear to have been introduced under the
assumption that messages get lost. If they are not, the reset event
can be safely removed and the new event is implicit upon receipt of a
<body/> element.
Section 4.5.1: action elements
The normative text in this section should be further explained.
E.g.: What is REQUIRED for the <t/> element? Support, inclusion in each
<rtt/> element, etc. (It is relatively clear to me what you mean, I just
wish it was somewhat more fleshed out)
"A client conforming to this specification MUST accept <t/>, <e/> and
<d/> elements and handle them as described in the following..."
Section 4.5.1.3: counting
It appears to me that the rules for determining the position and count
of code points are somewhat backwards. In particular, if the sending
client does perform any normalization before sending, the counts need to
be based on the normalized version, since the receiving client cannot
undo such normalization (this is the opposite of what is described in
the text). Also most of the described transformations are only relevant
for display on screen and should not change the string.
IMHO it should suffice to count code points based on what is sent over
the wire.
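
To illustrate why the counts have to refer to the transmitted form: the same visible text can consist of a different number of code points depending on normalization. A minimal Python sketch using only the standard library; the sample string is an assumption chosen for illustration and does not come from the XEP:

```python
import unicodedata

# "e" followed by a combining acute accent (decomposed form of "é").
typed = "cafe\u0301"

# What a sender that normalizes to NFC would actually put on the wire.
on_wire = unicodedata.normalize("NFC", typed)

# Python's len() counts code points, which is the unit under discussion.
print(len(typed))    # code points before normalization
print(len(on_wire))  # code points after NFC normalization
```

A position computed against the pre-normalization string would be off by one for everything after the accent, and the receiver has no way to reconstruct the original form.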
Section 4.5.2: action elements
I'd like to hear some rationale on why there is forward and backward
delete. Both appear to be able to generate the same results.
It did occur to me that they are meant to be used in conjunction with
cursor display. However, it appears that this would cause interesting
situations. E.g. what happens if a character is forward deleted
at a position preceding the cursor? In that situation the absolute
position of the cursor should move one to the left, but will instead
move 1 to the right relative to the text (it might move over the right
end). I'd prefer expected cursor position to always be transmitted
explicitly in these cases and have either delete variant removed.
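
The claim that both delete variants can produce the same result can be sketched as follows. This assumes that backspace removes the n code points before position p and that forward delete removes the n code points starting at position p; the function names are hypothetical and cursor handling is deliberately left out:

```python
def backspace(text: str, p: int, n: int = 1) -> str:
    """Assumed <e/> semantics: remove the n code points before position p.

    Assumes 0 <= n <= p <= len(text).
    """
    return text[:p - n] + text[p:]


def forward_delete(text: str, p: int, n: int = 1) -> str:
    """Assumed <d/> semantics: remove the n code points starting at position p.

    Assumes 0 <= p and p + n <= len(text).
    """
    return text[:p] + text[p + n:]


text = "Hello World"
# Erasing 5 code points before position 11 removes the same range as
# deleting 5 code points starting at position 6.
assert backspace(text, 11, 5) == forward_delete(text, 6, 5) == "Hello "
```

Under these assumptions, a backspace at p with count n is always expressible as a forward delete at p - n with count n, so the two only differ in what they imply for the cursor.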
Section 4.6: error recovery
As mentioned before, the attempt to correct errors is my biggest concern
about this XEP. For the case of reconnects it appears to me that the
sending client will always be able to notice this situation and treat it
as a new RTT session.
Messages being dropped by servers on the other hand is an issue I've not
yet experienced. I'm however willing to believe there are servers in
existence that do this. Ideally I'd expect this to be discoverable via
an error response, but seeing how this itself generates traffic I can
see how implementations would not send this. Do we have other protocols
that need to respect this?
Ideally I'd like to simplify the protocol a bit in these matters.
E.g. sequence numbers could be reused for each new RTT message,
seq="0" would then imply event="new", etc. However, I've not thought
through all implications of such a system yet.
Section 6.4.1: message length limit
In the second example with the split messages I would not have expected
an empty <rtt/> element. If that is actually intended to indicate the
<body/> is part of the RTT session this should be mentioned elsewhere.
Regards,
Florian Zeitz
> We need to keep the 'seq' attribute, as it is essential for message
> integrity during less-than-ideal situations. I actually expanded it to
> recommend an event='reset' once every 10 XMPP messages, to improve
> resilience even further -- the latest version of RealJabber now does this.
> (you can run the new RealJabber on two computers -- and test this concept by
> disconnecting-reconnecting in the middle of a conversation while the other
> person is still typing real-time in RealJabber, and the real-time text
> recovers automatically within a few seconds of reconnecting because of the
> automatic event='reset' occurring at regular intervals.)
>
As stated above I do believe sync should be explicit and instant upon
reconnect. Effectively transmitting everything that has already been
typed every 10 messages just to compensate for congestion control seems like
overdoing it to me...
> Also, very rarely (it only happened on BlackBerry's Google Talk client)
> when you've transmitted several XMPP messages simultaneously (i.e. network
> congestion) and they all get the same timestamp, and then they get
> delivered out-of-order (wrong order) that they were transmitted. This
> should never happen, but it actually occasionally does. An earlier
> version of XEP-0301 was more complex, having a 'msg' attribute for message
> number. That was removed, reduced down to just one 'seq' value for
> simplification.
>
Side note: I'd rather have us fix implementations than design protocols
to survive them. XMPP does guarantee in-order delivery.
> -- 'seq' does not need to start at 0. I'll eliminate that requirement
> (clarification)
> -- You can change the 'seq' value anytime there's an event='new' or
> event='reset'.
> Setting it back to 0 again works, although I prefer not to reset it to 0
> because of the danger of a user disconnecting while seq='0' and
> reconnecting after it's incremented, reset back to '0' again, and
> incremented to '1', and the user had reconnected, getting consecutive seq
> numbers for totally different real-time messages that were never delivered.
> In this case, the wrong <rtt> will be displayed, resulting in rare text
> scrambling. This actually happened once in my random testing, so I stopped
> resetting seq back to 0 everytime there was an event='reset'.
>
Which reminds me of something I forgot to mention in my last message.
Unless I overlooked it there is no defined behaviour once 'seq' is
incremented past 2^32-1. Assuming wrap around, it would be nice to have
a note that implementations need to make sure to not accidentally assume
desynchronization when this happens.
> You are right, that 'start' and 'cancel' is not really required, so I may
> be removing them. However, I think that 'cancel' may still be necessary to
> signal the other recipient to stop transmitting incoming <rtt> for the
> remainder of the chat session, in order to save bandwidth, whenever the
> recipient wants to turn off RTT (i.e. via a button or switch, while in
> middle of conversation). So there's still a usefulness for 'cancel' even
> if 'start' is not needed (the start of RTT during a chat session is simply
> the first delivery of an <rtt> element.)
>
I'm not convinced "cancel" is needed. Not advertising the disco feature
seems perfectly sufficient to avoid receiving RTT messages. I doubt
there are cases where this is a per-contact choice, or RTT would be
disabled after half a conversation. It would also always be possible to
politely ask the sender to turn off RTT as a last resort.
> We have to keep "Unicode code points" (more on that later) but I agree that
> normalization paragraph is definitely confusing, so I've modified it to the
> following wording:
> "For interoperability of p and n values, processing MUST be done on
> the transmitted Unicode real-time message. For senders, this is the
> version of the Unicode message text after any Unicode normalization,
> emoticon graphics images conversion to Unicode, display text
> formatting, processing of Unicode combining marks, etc. For recipients
> obtaining text from the <t> element, this is the Unicode text immediately
> after XML processing, and before any further processing. From the
> perspective of p and n values, a real-time message is treated as an
> editable array of Unicode code points."
>
This text seems fine to me. Note that I never questioned that code points
are the right thing to count. I fully agree with you there.
> Also, I've actually stopped using <c/> because I realized an empty <t
> p='#'/> element does exactly the same thing as <c p='#'/> ...
> Therefore I am actually thinking of removing two action elements from the
> next XEP-0301
> - Remove <c/> because I can use empty <t/> to do exactly the same thing.
> - Remove <g/> because I can use XEP-0224 instead successfully anyway.
> That reduces the number of action elements to just 4, and it makes it easy
> to merge Tier 1 with Tier 2 into one unified table for simplicity.
> However, I've found enough reason to keep both <d> and <e> -- but our team
> can still be swayed by further arguments against having both.
>
I like that approach. Always having cursor repositioning implicit in the
edit actions seems compelling to me. At that point having both <d/> and
<e/> would not seem as icky to me either. As I understand it the main
change here would be that <d/> and <e/> are redefined to perform
absolute repositioning of the cursor instead of performing a relative
movement of 0 and -1 respectively. This also addresses my other concern
about the possibility of positioning the cursor outside of the text.
> Error recovery is actually simpler than it looks -- it consumes less than
> 5% of the source code in RealTimeText.cs
> One of my business clients had serious problems without error recovery (we
> had an attempt to make it optional), so we actually expanded Error Recovery
> to RECOMMEND event='reset' at regular intervals, such as once every 10
> <rtt> messages (or once every 10 seconds). Also, sometimes you can't
> detect online/offline transitions, for example, Google Talk network
> sometimes can't see the online/offline status of jabber.org users, and you
> can have RTT conversations with users that appear offline (i.e. invisible).
>
Starting an RTT conversation with an already offline user should be
impossible due to the requirement to check the stream feature. Also I
tend to believe that when invisible users are talking to someone they
should "smoke out" (as someone called it) first. Meaning an invisible
user should send directed presence to whomever he is going to message first.
Interop problems are always unfortunate, but again I'd rather see them
fixed than designing protocols to work around them.
I'm glad I could provide some helpful feedback,
Florian
> 1. Xep-0301 never uses relative cursor positioning. I am confused.
> Can you re-explain, because <e> and <d> have always used absolute positioning and aren't being changed here; just the discussion whether to eliminate one or the other.
>
Maybe that was the case in your implementation; however, XEP-0301
currently states:
"* For <e> element (Backspace), the cursor position is moved left as
text is deleted.
* For <d> element (Forward Delete) and all other action elements, the
cursor position is unaffected."
This sounds like relative positioning to me. Absolute would be setting
the cursor position to the 'p' attribute's value for forward delete and
to the difference of the 'p' and 'n' attributes on backward delete.
> 2. Advertising of RTT must always be done, for accessibility reasons.
> We can't disable section 5 or it defeats the ability of deaf people to
> attempt to initiate a RTT conversation in Adium/Pidgin (and soon
> google, microsoft, etc).
> The goal of RTT is to become a widespread protocol in ten years, much like closed captioning.
> We want to be able to let the recipient be notified of an incoming RTT and to let the recipient decide whether to turn on/off RTT.
> If we stop advertising RTT, deaf people can't make call and the hearing person with the RTT off, won't ever be notified.
> Can you suggest an alternative method of refusing RTT that does not require disabling disco?
>
That is possibly quite different though. If a client is configured to
always reject RTT sessions there is no point in ever offering it to that
client. Therefore it's perfectly fine for such a client to not send the
disco feature.
I think what you're concerned about is not that people can't receive RTT
messages, but rather that they can't be prompted to send them.
A possible way forward might be to define one feature indicating
willingness to receive RTT messages, and one indicating willingness to
send them upon request. Which would of course imply introducing a way to
request a session, as opposed to having a way to cancel a session you
never asked for.
Possibly someone else can come up with a yet better idea though.
> The seq was developed, because, it was found in fact to be necessary.
[snip]
One of my main points here was that I don't see the use specifically for
reconnects. Upon reconnect it is clearly the sender's responsibility to
start a new session with the receiver. After all the receiver might have
lost all state at that point.
I think it would be way more sensible to
specify that the first RTT message sent to a resource that was
previously offline needs to have event="new" or event="reset".
For stanzas getting lost on live streams sequence numbers appear like a
sane solution though.
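
The receiver side of this idea can be sketched roughly as below. This is a hypothetical illustration, not the XEP's normative algorithm: the class and method names are made up, 'seq' is assumed to wrap modulo 2**32, and event="new" or event="reset" is assumed to be enough to (re)synchronize:

```python
from typing import Optional

SEQ_MOD = 2**32  # assumed wrap-around point for 'seq'


class RttReceiver:
    """Hypothetical receiver-side bookkeeping for the 'seq' attribute."""

    def __init__(self) -> None:
        # None means "out of sync": wait for event="new" or event="reset".
        self.expected: Optional[int] = None

    def accept(self, seq: int, event: Optional[str] = None) -> bool:
        """Return True if the <rtt/> element can be applied."""
        if event in ("new", "reset"):
            # Assumed to (re)start the message, so resynchronize here.
            self.expected = (seq + 1) % SEQ_MOD
            return True
        if self.expected is not None and seq == self.expected:
            self.expected = (seq + 1) % SEQ_MOD
            return True
        # Gap or unknown state: ignore edits until the next new/reset.
        self.expected = None
        return False


receiver = RttReceiver()
print(receiver.accept(5, "new"))     # session start
print(receiver.accept(6))            # consecutive stanza
print(receiver.accept(8))            # seq 7 never arrived
print(receiver.accept(20, "reset"))  # explicit resynchronization
```

A freshly connected resource starts in the out-of-sync state here, which is why the first RTT message it receives needs to carry event="new" or event="reset" to be of any use.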
>> I think it would be way more sensible to
>> specify that the first RTT message sent to a resource that was
>> previously offline needs to have event="new" or event="reset".
>> For stanzas getting lost on live streams sequence numbers appear like a
>> sane solution though.
>>
>
> I actually already specify this -- I already mention this as part of the
> "Error Recovery" -- that a client SHOULD do an event="new" or event="reset"
> if the user comes back online. It is not currently a MUST because if the
> client doesn't do it, it is not fatal to interoperability of real-time text
> -- it just means real-time text automatically resumes a little bit later.
> (Using fewer "MUST"s is beneficial to a specification)
>
I can only assume you're talking about your working copy, because there
is not a single SHOULD requirement in that section in the published
version of XEP-0301.
I do also believe this is entirely the wrong place for such a statement.
The fact of the matter is simply that a newly online resource (no matter
whether it reconnected, or connected for the first time) SHOULD be sent
an <rtt/> containing either event="new" or event="reset" as the first
RTT message. Doing anything else is bound to imply sending data to the
receiver that it cannot possibly understand.
I do however agree that the way XEP-0301 works this is in fact a SHOULD
requirement, as failure to comply with this is not fatal.
> Note that retransmissions are OPTIONAL. (the planned addendum to XEP-0301)
> Note that no retransmissions occur when no typing is done. (so we don't
> lose bandwidth during idle moments)
> Disclaimer: With one of my clients, I have done work to help implement
> XEP-0301 in a Next Generation 9-1-1 experimental demo, and this
> retransmission is actually added because of this paid-work experience.
> Work for NG9-1-1 is currently one of my sources of income for XEP-0301.
>
Ack. "OPTIONAL" seems like an appropriate requirement level.
>> Which reminds me of something I forgot to mention in my last message.
>> Unless I overlooked it there is no defined behaviour once 'seq' is
>> incremented past 2^32-1. Assuming wrap around, it would be nice to have
>> a note that implementations need to make sure to not accidentally assume
>> desynchronization when this happens.
>>
>
> You're right -- I knew I should have added a note -- but then I
> realized it would take about 50,000 years for this to happen, and the
> desynchronization would only be very temporary (it would only last until
> the next event='new' or event='reset'). Since seq increments once per
> second during continuous typing, it would take 50,000 years of continuous
> typing at a default transmission interval, for the wraparound scenario to
> happen. And if it even did wraparound, the penalty is only a stall of a
> few seconds caused by not defining wraparound behaviour. Also, some
> programming languages (i.e. Java) do not have unsigned integers (only
> signed integers), which complicates trying to add a note about an event that
> would never happen in normal practice. Would you like to suggest an idea
> of a single-sentence note that can be added?
>
If you're going to (as you said earlier) drop the requirement of
starting at "0" this becomes more of a problem. Clients could implement
TCP-ish behaviour of starting at a random sequence number. The upper
limit would then potentially be hit sooner.
Somewhat inspired by the text in RFC 793 I'd suggest:
"Arithmetic on the sequence number MUST be performed modulo 2**32. In
particular the sequence number wraps around to 0 when incremented past
(2**32 - 1). Note that comparison of sequence numbers has to account
for this fact."
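
What such modulo arithmetic could look like in practice is sketched below. The helper names are hypothetical; the only assumption taken from the suggested wording is that 'seq' lives in the range 0 to 2**32 - 1:

```python
SEQ_MOD = 2**32


def next_seq(seq: int) -> int:
    """Increment a sequence number, wrapping to 0 past 2**32 - 1."""
    return (seq + 1) % SEQ_MOD


def is_successor(prev: int, cur: int) -> bool:
    """True if cur directly follows prev, including across the wrap."""
    return cur == next_seq(prev)


def seq_distance(a: int, b: int) -> int:
    """Number of increments needed to get from a to b, modulo 2**32."""
    return (b - a) % SEQ_MOD


last = 2**32 - 1
print(next_seq(last))         # wraps around instead of overflowing
print(is_successor(last, 0))  # not a desynchronization
print(seq_distance(last, 1))  # two increments apart
```

A naive check such as `cur == prev + 1` would report a lost stanza at the wrap, which is exactly the accidental desynchronization the note is meant to prevent.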
At this point I'm actually very confused. Let me quote the published
version of XEP-0301 again:
"For <d> element (Forward Delete) and all other action elements, the
cursor position is unaffected."
So I would assume that (following that text) after performing
<c p="0"/><d p="42"/>
the cursor would remain at position "0".
What you suggested appeared to me like you wanted this to move the
cursor to position "42". That seems quite sensible to me, but would be a
clear change from that text. Am I overlooking something?
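
The two readings in question can be put side by side in a small sketch. Both functions are hypothetical models that track only the cursor, not the text: the first follows the quoted published wording (backspace moves the cursor left, everything else leaves it alone), the second follows the absolute reading (forward delete sets the cursor to p, backspace to p - n). Actions are modelled as (name, p, n) tuples, and <c/> is assumed to set the cursor to p in both cases:

```python
from typing import Iterable, Tuple

Action = Tuple[str, int, int]  # (element name, p, n)


def cursor_published(actions: Iterable[Action], cursor: int = 0) -> int:
    """Cursor after the actions, following the quoted published wording."""
    for name, p, n in actions:
        if name == "c":
            cursor = p       # explicit cursor positioning
        elif name == "e":
            cursor -= n      # moved left as text is deleted
        # "d" and all other elements: cursor position is unaffected
    return cursor


def cursor_absolute(actions: Iterable[Action], cursor: int = 0) -> int:
    """Cursor after the actions, under the absolute reading."""
    for name, p, n in actions:
        if name in ("c", "d"):
            cursor = p       # forward delete leaves the cursor at p
        elif name == "e":
            cursor = p - n   # backspace leaves the cursor at p - n
    return cursor


# <c p="0"/><d p="42"/>
actions = [("c", 0, 0), ("d", 42, 1)]
print(cursor_published(actions))
print(cursor_absolute(actions))
```

With these models the example sequence ends at position 0 under the published wording and at position 42 under the absolute reading, which is the discrepancy being asked about.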
>> Interop problems are always unfortunate, but again I'd rather see them
>> fixed than designing protocols to work around them.
>>
>
> Agreed, agreed!
> I do agree some justifications are not good (i.e. out-of-order message
> delivery).
>
> However, the below fully justifies the usage of rudimentary (and mostly
> optional) error recovery that XEP-0301 supports:
> -- Wireless reception is not always good. We can't always do perfect
> wireless signal.
> -- And a group of us needs to meet requirements for mission-critical
> reliability of Next Generation 9-1-1
>
So, maybe I just live in a too idealistic mindset, but I have severe
doubts that a terrible wireless signal can ever yield lost stanzas.
That would not only imply that TCP packets were completely lost (no
retransmits etc.), but also that they were lost in such a way that the
stream's XML was still valid. For now I refuse to believe that until
proven wrong.
Mission-critical reliability is only really an issue if it is endangered.
I've currently accepted that we need sequence numbers only because
you're telling me that some XMPP servers perform congestion control that
necessitates this (if that's the case, so be it). I've not yet heard any
other plausible example of stanzas being lost.
Regards,
Florian Zeitz
P.S.: Having seen your other mail.
.de is in fact Germany, Denmark is .dk.
Also I was fully aware of what 9-1-1 is, but thank you for thinking
about local differences :).
Thank you. Believe it or not, I have in fact used an instant messenger
before. I however fail to see how this is relevant to any of this.
The point is that such a "seamless" reconnect (from the receiver's UI
point of view) is (for the sender) indistinguishable from, let's say, my
device crashing and me signing in with the same resource from a
completely different device almost immediately after.
> However, yes, I agree, this is probably confusing. I am now re-wording
> things to try to clear up confusion; I will contact you later with a draft
> wording to confirm, before submitting v0.2 of XEP-0301 to XSF.
> You are right, I confused you, so I will fix the spec to avoid that.
>