(1) Which characters can be used in a header key?
(2) Which characters can be used in a header value?
(3) Which character encoding can be used to encode the headers?
(4) Is "foo:bar" the same as "foo: bar"?
(5) Can the same key appear twice in the headers part of a given frame?
Then, the headers are used for multiple purposes:
- STOMP itself: content-length, ack, receipt...
- technology specific: http://stomp.codehaus.org/Stomp+JMS
- broker specific: activemq.exclusive or ack-timeout (ocamlmq)
- user-supplied message headers
It would be nice to have a naming convention to avoid name clashes.
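
To make the clash problem concrete, here is a minimal Python sketch that tries to sort header keys into the categories listed above. The set of protocol headers and the "activemq." prefix rule are assumptions made for illustration only; nothing here is defined by STOMP.

```python
# Illustrative only: these names and the prefix rule are assumptions,
# not part of any STOMP specification.
STOMP_HEADERS = {"content-length", "ack", "receipt", "destination"}
BROKER_PREFIXES = ("activemq.",)  # hypothetical "broker." style namespace


def classify(key):
    """Guess which category a header key belongs to."""
    if key in STOMP_HEADERS:
        return "stomp"
    if key.startswith(BROKER_PREFIXES):
        return "broker"
    # Anything else is indistinguishable from a user-supplied header.
    return "user"


print(classify("content-length"))      # stomp
print(classify("activemq.exclusive"))  # broker
print(classify("ack-timeout"))         # user
```

Note that "ack-timeout" (ocamlmq) falls through to "user": without an agreed prefix, a broker-specific key cannot be told apart from a user header.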
Cheers,
Lionel
--
Regards,
Hiram
Blog: http://hiramchirino.com
Open Source SOA
http://fusesource.com/
https://svn.apache.org/repos/asf/activemq/stomp/trunk/webgen/src/stomp10/specification.page
Got any objections to this BNF?
CHAR = <any US-ASCII character (octets 0 - 127)>
header = header-name ":" header-value
header-name = 1*<any CHAR except LF or ":">
header-value = 1*<any CHAR except LF>
>>
>> (4) Is "foo:bar" the same as "foo: bar"?
>
> I would say yes, though I guess it is unclear :-(
>
For the sake of consistency, it generally needs to be no. The only people
you help with that kind of whitespace laxness are those using STOMP
from a keyboard, playing with a STOMP server over telnet. I don't
think that should be too common.
>>
>> (5) Can the same key appear twice in the headers part of a given frame?
>
> Yes, though this may be a semantic error depending on the header, and
> I think this is probably wrong.
>
> Multiple destination headers is an interesting thought experiment, and
> I think adds more conceptual overhead for the nifty hacks it would
> allow. For 1.1, I'll propose that "specification headers (ie,
> destination, etc) must appear only once in a message" or some such
> verbiage.
>
I posted an earlier topic related to this at:
http://groups.google.com/group/stomp-spec/browse_thread/thread/3e8715770a63effa#
I'd love to get your comments on that.
> -brian
>
>
>
>>
>> Then, the headers are used for multiple purposes:
>> - STOMP itself: content-length, ack, receipt...
>> - technology specific: http://stomp.codehaus.org/Stomp+JMS
>> - broker specific: activemq.exclusive or ack-timeout (ocamlmq)
>> - user-supplied message headers
>>
>> It would be nice to have a naming convention to avoid name clashes.
>>
>> Cheers,
>>
>> Lionel
--
I think this is too limiting - the number of cases where we encounter
non-ASCII data that needs to be attached is quite large, e.g. the name
of a person encoded in an X509 DN parsed by stomp+ssl that becomes
the JMSXUserId is one case we see right now when using activemq.
I think UTF-8 is more appropriate for the value (perhaps not necessary
in the header key).
cheers,
James.
Of course if we limit it to ASCII, it would be also ok to instead
support the same MIME header encoding mechanisms as HTTP:
http://www.ietf.org/rfc/rfc2047.txt.
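
As a rough illustration of what RFC 2047 encoding would look like for a header value, here is a sketch using Python's standard email.header module. The helper names encode_value and decode_value are made up for this example, and this is only one possible way to apply the RFC, not something any broker is known to do.

```python
from email.header import Header, decode_header, make_header


def encode_value(text):
    """Encode text as an RFC 2047 encoded-word (ASCII only on the wire)."""
    # maxlinelen=0 disables folding, so no LF is introduced into the value.
    return Header(text, charset="utf-8").encode(maxlinelen=0)


def decode_value(wire):
    """Decode an RFC 2047 encoded value back to a Unicode string."""
    return str(make_header(decode_header(wire)))


wire = encode_value("CN=Jos\u00e9")
print(wire)                # an "=?utf-8?...?=" encoded-word
print(decode_value(wire))  # the original string
```

The encoded form stays within ASCII, at the cost of being unreadable on the wire and requiring every client to implement the decoding.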
cheers,
James.
--
I wonder if we really need header keys like ` or {\,
> > (2) Which characters can be used in a header value?
>
> ASCII (but not control chars)
IMHO, this is not enough.
Today, we use X.509 certificates for client-broker authentication and
they end up in the header (to track who connected to the broker). Some
of our DNs do already contain accented characters.
I think that, for values at least, we really need more than ASCII. I
would suggest to allow any Unicode string (without a newline).
In fact, even for the header keys, we may allow Unicode strings too
(without newline and colon). Since the headers in STOMP also carry the
message header, users may want to put meaningful strings wrt their
language...
> > (3) Which character encoding can be used to encode the headers?
>
> ASCII :-)
If we agree that Unicode is needed, UTF-8 would seem natural.
> > (4) Is "foo:bar" the same as "foo: bar"?
>
> I would say yes, though I guess it is unclear :-(
If yes then how do you encode a value which is a string starting with
the space character?
Another look at the problem: what is the advantage of allowing spaces
after the colon?
> > (5) Can the same key appear twice in the headers part of a given frame?
>
> Yes, though this may be a semantic error depending on the header, and
> I think this is probably wrong.
Same question as above: what would be the advantage of allowing dups?
I can see problems (e.g. software handling them differently from what
is expected) but no advantages.
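
A small Python sketch of the "handled differently" problem: two plausible ways a client library might collapse duplicate keys into a map. Both function names are hypothetical and neither behaviour is mandated by the spec.

```python
def first_wins(pairs):
    """Keep the first value seen for each key."""
    headers = {}
    for key, value in pairs:
        headers.setdefault(key, value)
    return headers


def last_wins(pairs):
    """Keep the last value seen for each key (later entries overwrite)."""
    return dict(pairs)


frame_headers = [("destination", "/queue/a"), ("destination", "/queue/b")]
print(first_wins(frame_headers))  # {'destination': '/queue/a'}
print(last_wins(frame_headers))   # {'destination': '/queue/b'}
```

The same frame ends up addressed to two different queues depending on which library parses it.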
Cheers,
Lionel
See http://groups.google.com/group/stomp-spec/browse_thread/thread/3e8715770a63effa#
This brings back my initial question: which characters can be used in
a header value?
If \n is allowed, it must be encoded/escaped somehow.
If \n is not allowed, we do not have the problem.
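
If \n were allowed, one possible escaping scheme could look like the Python sketch below. The choice of backslash escapes (backslash + "n" for LF, doubled backslash for a literal backslash) is an assumption for illustration; the thread has not agreed on any scheme.

```python
def escape_value(value):
    """Escape backslash and LF so the value fits on one header line."""
    return value.replace("\\", "\\\\").replace("\n", "\\n")


def unescape_value(wire):
    """Reverse escape_value(); reject unknown escape sequences."""
    out = []
    chars = iter(wire)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)
        if nxt == "n":
            out.append("\n")
        elif nxt == "\\":
            out.append("\\")
        else:
            raise ValueError("invalid escape sequence")
    return "".join(out)


original = "line1\nline2"
wire = escape_value(original)
assert "\n" not in wire
assert unescape_value(wire) == original
```

Escaping the backslash first is what keeps the round trip unambiguous for values that already contain a literal backslash followed by "n".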
Cheers,
Lionel
CHAR = <any US-ASCII character (octets 0 - 127)>
LF = <US-ASCII LF, linefeed (octet 10)>
header = header-name ":" header-value
header-name = 1*<any CHAR except LF or ":">
header-value = 1*<any CHAR except LF>
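
For reference, a minimal Python sketch of a parser that follows the BNF above literally. The function name parse_header is made up for this example, and it assumes the line has already been split off at the LF.

```python
def parse_header(line):
    """Parse one header line (bytes, without the trailing LF) per the BNF."""
    if b"\n" in line:
        raise ValueError("LF is not allowed inside a header")
    if any(octet > 127 for octet in line):
        raise ValueError("non-ASCII octet in header")
    # Split at the first ":" only; header-value may itself contain ":".
    name, sep, value = line.partition(b":")
    if not sep:
        raise ValueError("missing ':' separator")
    if not name or not value:
        raise ValueError("header-name and header-value need at least one CHAR")
    return name.decode("ascii"), value.decode("ascii")


print(parse_header(b"foo:bar"))   # ('foo', 'bar')
print(parse_header(b"foo: bar"))  # ('foo', ' bar')
```

Read this way, "foo:bar" and "foo: bar" are different headers: the space is part of the value. The "1*" also means an empty value such as "foo:" is rejected.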
--
For the purposes of interoperability with other transport protocols
used in the same broker I think we need to specify a character
encoding which should be used if a broker has header values which
contain characters that are not US-ASCII. In particular since JMS
allows UTF-16 in any header (i.e. Java String) things will break
badly for a consumer in STOMP reading data sent by a JMS producer.
This works right now because actually most implementations do support
UTF-8/UTF-16 out of the box for header values.
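
A short Python sketch of why the choice of wire encoding matters for framing. It checks whether the encoded bytes of a value contain octets that STOMP framing treats specially: LF (end of header), ":" (key/value separator) and NUL (the frame terminator in STOMP 1.0). The helper name reserved_octets is made up for this example.

```python
RESERVED = {0x00, 0x0A, 0x3A}  # NUL, LF, ":"


def reserved_octets(text, encoding):
    """Return the framing octets that appear in text once encoded."""
    return RESERVED & set(text.encode(encoding))


# U+010A encodes to the bytes 01 0A in UTF-16 (big endian): a stray LF.
print(reserved_octets("\u010a", "utf-16-be"))     # {10}
print(reserved_octets("\u010a", "utf-8"))         # set()
# Plain Latin text in UTF-16 is full of NUL octets.
print(reserved_octets("Jos\u00e9", "utf-16-be"))  # {0}
```

UTF-8 never produces octets below 128 for non-ASCII characters, so it cannot introduce a stray LF, colon or NUL; raw UTF-16 can.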
cheers,
James.
IMHO, we have to be pragmatic here.
The big difference between header and body is that the header is used
and seen by the messaging infrastructure. A good example is selectors.
In the header, users usually want to put text/string (I leave aside
the question of typed values since it has a separate thread). This
probably covers 99% of the use cases and we should have good support
for this. Obviously, ASCII is too restrictive. We could specify that
headers can only contain Unicode text (= sequence of Unicode
characters) and that we always use UTF-8 on the wire.
If some users want binary data in the header, they can simply Base64
encode it. I doubt that they want the messaging infrastructure to
decode and use this binary.
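
A minimal Python sketch of that proposal: text values go on the wire as UTF-8, and binary values are Base64-encoded by the sender first. The function name encode_header is hypothetical, and the receiver is assumed to know out of band which headers carry Base64.

```python
import base64


def encode_header(key, value):
    """Serialize one header line (without the trailing LF) as UTF-8 bytes."""
    if isinstance(value, (bytes, bytearray)):
        # Binary data: Base64 yields plain ASCII with no LF and no ":".
        value = base64.b64encode(bytes(value)).decode("ascii")
    if "\n" in key or ":" in key:
        raise ValueError("header key must not contain LF or ':'")
    if "\n" in value:
        raise ValueError("header value must not contain LF")
    return f"{key}:{value}".encode("utf-8")


print(encode_header("user", "Jos\u00e9"))
print(encode_header("blob", b"\x00\n\xff"))
```

The infrastructure only ever sees Unicode text; the Base64 string is opaque to it, which matches the point that it is not expected to decode the binary.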
Cheers,
Lionel
--
Indeed.
In fact, both parties could have several size restrictions:
- maximum headers size (bytes in the headers part of the frame)
- maximum number of header keys
- maximum length of any given header value
- maximum body size
- ...
Is it worth opening a new thread to discuss these?
BTW, a server can tell the client "sorry, your message is too big" via
an ERROR frame. The client cannot send such an error to the server...
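
The restrictions listed above could be checked along these lines. This is a Python sketch only: the FrameLimits class, its field names and all default values are invented for illustration and do not come from any spec or broker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameLimits:
    # All defaults are arbitrary example values.
    max_headers_bytes: int = 8192
    max_header_keys: int = 64
    max_value_bytes: int = 1024
    max_body_bytes: int = 1_048_576


def check_frame(headers, body, limits=FrameLimits()):
    """Return the list of violated limits (empty if the frame is fine)."""
    problems = []
    # Size of the headers part as it would appear on the wire in UTF-8.
    headers_bytes = sum(len(f"{k}:{v}\n".encode("utf-8")) for k, v in headers)
    if headers_bytes > limits.max_headers_bytes:
        problems.append("headers part too large")
    if len({k for k, _ in headers}) > limits.max_header_keys:
        problems.append("too many header keys")
    if any(len(v.encode("utf-8")) > limits.max_value_bytes for _, v in headers):
        problems.append("header value too long")
    if len(body) > limits.max_body_bytes:
        problems.append("body too large")
    return problems


print(check_frame([("destination", "/queue/a")], b"hello"))  # []
```

A server could put the returned messages into an ERROR frame; a client running the same check has no equivalent way to report the result.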
Cheers,
Lionel