Headers Specification

Lionel Cons

Jun 15, 2010, 5:56:40 AM

to stomp...@googlegroups.com

Here are some points regarding frame headers not covered in the 1.0 spec.

(1) Which characters can be used in a header key?

(2) Which characters can be used in a header value?

(3) Which character encoding can be used to encode the headers?

(4) Is "foo:bar" the same as "foo: bar"?

(5) Can the same key appear twice in the headers part of a given frame?

Also, headers are used for multiple purposes:
- STOMP itself: content-length, ack, receipt...
- technology-specific: http://stomp.codehaus.org/Stomp+JMS
- broker-specific: activemq.exclusive or ack-timeout (ocamlmq)
- user-supplied message headers

It would be nice to have a naming convention to avoid name clashes.

Cheers,

Lionel

Dejan Bosanac

Jun 15, 2010, 8:29:12 AM

to stomp...@googlegroups.com

Hi,

I think we should provide a BNF grammar for the spec, something like what Hiram did for the current spec:

http://fisheye6.atlassian.com/browse/activemq/stomp/trunk/webgen/src/stomp10/specification.page?r=HEAD

Cheers
--
Dejan Bosanac - http://twitter.com/dejanb

Open Source Integration - http://fusesource.com/
ActiveMQ in Action - http://www.manning.com/snyder/
Blog - http://www.nighttale.net

Hiram Chirino

Jun 15, 2010, 8:45:09 AM

to stomp...@googlegroups.com

Speaking of the actual specification document for 1.1, I'd like to
propose that we maintain it as a Markdown document in a GitHub project,
perhaps with an auxiliary website generated with webgen.

--
Regards,
Hiram

Blog: http://hiramchirino.com

Open Source SOA
http://fusesource.com/

Dejan Bosanac

Jun 15, 2010, 9:04:27 AM

to stomp...@googlegroups.com

+1

Cheers

brianm

Jun 15, 2010, 11:14:01 AM

to stomp-spec

Sure, why not :-)

http://github.com/stomp/stomp-spec

I'll get 1.0 in later today (unless someone beats me to it and sends a
pull request) so we can build on it incrementally.

-Brian


brianm

Jun 15, 2010, 11:19:31 AM

to stomp-spec

Proposed answers inline:

On Jun 15, 3:56 am, Lionel Cons <lionel.c...@cern.ch> wrote:
> Here are some points regarding frame headers not covered in the 1.0 spec.
>
> (1) Which characters can be used in a header key?

ASCII (but not control chars)

>
> (2) Which characters can be used in a header value?

ASCII (but not control chars)

>
> (3) Which character encoding can be used to encode the headers?

ASCII :-)

>
> (4) Is "foo:bar" the same as "foo: bar"

I would say yes, though I guess it is unclear :-(

>
> (5) Can the same key appear twice in the headers part of a given frame?

Yes, though this may be a semantic error depending on the header, and
I think this is probably wrong.

Multiple destination headers are an interesting thought experiment, but
I think they add more conceptual overhead than the nifty hacks they would
allow are worth. For 1.1, I'll propose that "specification headers (i.e.,
destination, etc.) must appear only once in a message" or some such
verbiage.

-brian

Hiram Chirino

Jun 15, 2010, 11:23:56 AM

to stomp...@googlegroups.com

Here's a version of the 1.0 spec in Markdown (mostly; just strip off the top):

https://svn.apache.org/repos/asf/activemq/stomp/trunk/webgen/src/stomp10/specification.page

brianm

Jun 15, 2010, 11:28:31 AM

to stomp-spec


Awesome, imported!

Hiram Chirino

Jun 15, 2010, 11:33:12 AM

to stomp...@googlegroups.com

On Tue, Jun 15, 2010 at 11:19 AM, brianm <bri...@skife.org> wrote:
> Proposed answers inline:
>
> On Jun 15, 3:56 am, Lionel Cons <lionel.c...@cern.ch> wrote:
>> Here are some points regarding frame headers not covered in the 1.0 spec.
>>
>> (1) Which characters can be used in a header key?
>
> ASCII (but not control chars)
>
>>
>> (2) Which characters can be used in a header value?
>
> ASCII (but not control chars)
>
>>
>> (3) Which character encoding can be used to encode the headers?
>
> ASCII :-)
>

Got any objections to this BNF?

CHAR = <any US-ASCII character (octets 0 - 127)>
header = header-name ":" header-value
header-name = 1*<any CHAR except LF or ":">
header-value = 1*<any CHAR except LF>

>>
>> (4) Is "foo:bar" the same as "foo: bar"
>
> I would say yes, though I guess it is unclear :-(
>

For the sake of consistency, it generally needs to be no. The only people
that kind of whitespace laxness helps are folks typing STOMP frames by
hand into a telnet session with a STOMP server, and I don't think that
should be too common.

>>
>> (5) Can the same key appear twice in the headers part of a given frame?
>
> Yes, though this may be a semantic error depending on the header, and
> I think this is probably wrong.
>
> Multiple destination headers is an interesting thought experiment, and
> I think adds more conceptual overhead for the nifty hacks it would
> allow. For 1.1, I'll propose that "specification headers (ie,
> destination, etc) must appear only once in a message" or some such
> verbage.
>

I posted an earlier topic related to this at:
http://groups.google.com/group/stomp-spec/browse_thread/thread/3e8715770a63effa#
I'd love to get your comments on that.


James Casey

Jun 15, 2010, 11:34:37 AM

to stomp...@googlegroups.com

On 15 June 2010 17:19, brianm <bri...@skife.org> wrote:
> Proposed answers inline:
>
> On Jun 15, 3:56 am, Lionel Cons <lionel.c...@cern.ch> wrote:
>> Here are some points regarding frame headers not covered in the 1.0 spec.
>>
>> (1) Which characters can be used in a header key?
>
> ASCII (but not control chars)
>
>>
>> (2) Which characters can be used in a header value?
>
> ASCII (but not control chars)
>
>>
>> (3) Which character encoding can be used to encode the headers?
>
> ASCII :-)
>

I think this is too limiting. We encounter many cases where non-ASCII
data needs to be attached. For example, the name of a person encoded in
an X509 DN, parsed by stomp+ssl, becomes the JMSXUserId; that is one case
we see right now when using ActiveMQ.

I think UTF-8 is more appropriate for the value (though perhaps not
necessary for the header key).
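One property that makes UTF-8 a natural fit for header values: ASCII characters encode to themselves, and every byte of a multi-byte UTF-8 sequence is 0x80 or above. An encoded non-ASCII character therefore can never produce the LF (0x0A) or ':' (0x3A) bytes that delimit a header. A small Python check of that property; the helper name and the sample DN are hypothetical.

```python
def utf8_is_delimiter_safe(text: str) -> bool:
    """Return True if no non-ASCII character in `text` encodes to a byte
    below 0x80, i.e. to a byte that could be confused with LF (0x0A),
    ':' (0x3A) or any other ASCII delimiter."""
    for ch in text:
        if ord(ch) > 0x7F:
            # Every byte of a multi-byte UTF-8 sequence has its high bit set.
            if any(byte < 0x80 for byte in ch.encode("utf-8")):
                return False
    return True


# Hypothetical DN containing accented characters.
dn = "CN=Jürgen Müller,O=CERN"
wire = dn.encode("utf-8")
print(wire)
print(utf8_is_delimiter_safe(dn))
```

Note that this says nothing about ASCII control characters themselves: an LF typed into the value still encodes to an LF byte, so the grammar would still need to exclude it.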

cheers,

James.

James Casey

Jun 15, 2010, 11:40:37 AM

to stomp...@googlegroups.com

On 15 June 2010 17:34, James Casey <james...@gmail.com> wrote:
> I think this is too limiting - the number of cases where we encounter
> non-ASCII data that needs to be attached is quite large e.g the name
> of a person encoded in an X509 DN parsed by stomp+ssl that becomes
> the JMSXUserId is one case we see right now when using activemq.
>
> I think UTF-8 is more appropriate for the value (perhaps not necessary
> in the header key).
>

Of course, if we limit it to ASCII, it would also be OK to support the
same MIME header encoding mechanism as HTTP instead:
http://www.ietf.org/rfc/rfc2047.txt.
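Python's standard `email.header` module implements RFC 2047 encoded-words, so the ASCII-only alternative can be sketched as a round trip. The function names are hypothetical, and whether the library picks the Q or B encoding for a given value is its own choice; the point is only that the wire form is pure ASCII with no LF and decodes back to the original text.

```python
from email.header import Header, decode_header, make_header


def encode_value_rfc2047(value: str) -> str:
    """Encode a Unicode header value as RFC 2047 encoded-word(s).
    maxlinelen=0 disables line folding, so no LF is inserted."""
    return Header(value, "utf-8").encode(maxlinelen=0)


def decode_value_rfc2047(wire: str) -> str:
    """Decode RFC 2047 encoded-word(s) back to a Unicode string."""
    return str(make_header(decode_header(wire)))


dn = "CN=Jürgen Müller,O=CERN"  # hypothetical DN
wire = encode_value_rfc2047(dn)
print(wire)
print(decode_value_rfc2047(wire))
```

The trade-off compared with raw UTF-8 is that every consumer, including the broker when evaluating selectors, would have to decode encoded-words before looking at the value.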

cheers,

James.

Hiram Chirino

Jun 15, 2010, 11:44:05 AM

to stomp...@googlegroups.com

In general you're going to have to encode binary data anyway, since I
think it's important for the value NOT to contain the header
terminator char '\n'.


Lionel Cons

Jun 15, 2010, 12:03:13 PM

to stomp...@googlegroups.com

brianm <bri...@skife.org> writes:
> > (1) Which characters can be used in a header key?
>
> ASCII (but not control chars)

I wonder if we really need header keys like ` or {\,

> > (2) Which characters can be used in a header value?
>
> ASCII (but not control chars)

IMHO, this is not enough.

Today, we use X.509 certificates for client-broker authentication and
they end up in the header (to track who connected to the broker). Some
of our DNs already contain accented characters.

I think that, for values at least, we really need more than ASCII. I
would suggest to allow any Unicode string (without a newline).

In fact, even for the header keys, we could allow Unicode strings too
(without newline or colon). Since STOMP headers also carry the message
headers, users may want to use meaningful names in their own
language...

> > (3) Which character encoding can be used to encode the headers?
>
> ASCII :-)

If we agree that Unicode is needed, UTF-8 would seem natural.

> > (4) Is "foo:bar" the same as "foo: bar"
>
> I would say yes, though I guess it is unclear :-(

If yes then how do you encode a value which is a string starting with
the space character?

Another look at the problem: what is the advantage of allowing spaces
after the colon?
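The question can be made concrete with two hypothetical parsing rules, sketched in Python: a strict rule that keeps everything after the first colon verbatim, and a lax rule that strips spaces after the colon so that "foo:bar" and "foo: bar" are equal. Under the lax rule, a value that starts with a space cannot survive a round trip.

```python
from typing import Tuple


def parse_strict(line: str) -> Tuple[str, str]:
    """Split on the first ':' and keep the value exactly as sent."""
    name, _, value = line.partition(":")
    return name, value


def parse_lax(line: str) -> Tuple[str, str]:
    """Split on the first ':' and drop spaces that follow it."""
    name, _, value = line.partition(":")
    return name, value.lstrip(" ")


print(parse_strict("foo: bar"))  # the value keeps its leading space
print(parse_lax("foo: bar"))
print(parse_lax("foo:bar"))
```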

> > (5) Can the same key appear twice in the headers part of a given frame?
>
> Yes, though this may be a semantic error depending on the header, and
> I think this is probably wrong.

Same question as above: what would be the advantage of allowing dups?

I can see problems (e.g. software handling them differently from what
is expected) but no advantages.
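The "software handling them differently" problem is easy to reproduce. Below is a hypothetical Python sketch of three policies a receiver might apply to repeated keys: keep the first value, keep the last, or reject the frame. The first two silently disagree on where the same message goes.

```python
from typing import Dict, List, Tuple

Header = Tuple[str, str]


def first_wins(headers: List[Header]) -> Dict[str, str]:
    result: Dict[str, str] = {}
    for name, value in headers:
        result.setdefault(name, value)  # keep only the first occurrence
    return result


def last_wins(headers: List[Header]) -> Dict[str, str]:
    return dict(headers)  # later pairs overwrite earlier ones


def reject_duplicates(headers: List[Header]) -> Dict[str, str]:
    result: Dict[str, str] = {}
    for name, value in headers:
        if name in result:
            raise ValueError(f"duplicate header: {name}")
        result[name] = value
    return result


frame = [("destination", "/queue/a"), ("destination", "/queue/b")]
print(first_wins(frame)["destination"])
print(last_wins(frame)["destination"])
```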

Cheers,

Lionel

Hiram Chirino

Jun 15, 2010, 12:07:05 PM

to stomp...@googlegroups.com

On Tue, Jun 15, 2010 at 12:03 PM, Lionel Cons <lione...@cern.ch> wrote:
>
> Same question as above: what would be the advantage of allowing dups?
>
> I can see problems (e.g. software handling them differently from what
> is expected) but no advantages.
>

see http://groups.google.com/group/stomp-spec/browse_thread/thread/3e8715770a63effa#

Lionel Cons

Jun 15, 2010, 12:12:41 PM

to stomp...@googlegroups.com

Hiram Chirino <hi...@hiramchirino.com> writes:
> In general your going to have to encode binary data anyways since I
> think it's important for the value to NOT contain the header
> terminator char '\n'.

This brings back my initial question: which characters can be used in
a header value?

If \n is allowed, it must be encoded/escaped somehow.

If \n is not allowed, we do not have the problem.
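For the "encoded/escaped somehow" branch, here is a minimal Python sketch assuming a hypothetical backslash escape scheme: `\n` for LF, `\c` for colon and `\\` for backslash. The scheme and function names are illustrative only, not something agreed in this thread. Escaping works character by character, so there is no ordering problem, and unknown escapes are rejected rather than guessed.

```python
# Hypothetical escape scheme for header keys and values.
_ESCAPES = {"\\": "\\\\", "\n": "\\n", ":": "\\c"}
_UNESCAPES = {"\\": "\\", "n": "\n", "c": ":"}


def escape(value: str) -> str:
    """Replace LF, ':' and backslash with two-character escapes."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape(value: str) -> str:
    """Reverse escape(); raise ValueError on an unknown escape."""
    out = []
    chars = iter(value)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)
        if nxt is None or nxt not in _UNESCAPES:
            raise ValueError(f"invalid escape sequence: \\{nxt or ''}")
        out.append(_UNESCAPES[nxt])
    return "".join(out)


original = "line1\nline2:a\\b"
print(escape(original))
print(unescape(escape(original)) == original)
```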

Cheers,

Lionel

Hiram Chirino

Jun 15, 2010, 12:16:09 PM

to stomp...@googlegroups.com

I hope something like this makes it crystal clear:

CHAR = <any US-ASCII character (octets 0 - 127)>

LF = <US-ASCII LF, linefeed (octet 10)>

header = header-name ":" header-value
header-name = 1*<any CHAR except LF or ":">
header-value = 1*<any CHAR except LF>
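A minimal Python sketch of a parser for one header line under this grammar (the function name is hypothetical). Because header-name excludes ':' but header-value excludes only LF, the line is split on the first colon and any later colons belong to the value; the 1* repetitions mean both parts must be non-empty.

```python
from typing import Tuple


def parse_header(line: str) -> Tuple[str, str]:
    """Parse one header line (without its trailing LF) per the grammar:

        header       = header-name ":" header-value
        header-name  = 1*<any CHAR except LF or ":">
        header-value = 1*<any CHAR except LF>
    """
    if not line.isascii():
        raise ValueError("header contains a non-US-ASCII character")
    if "\n" in line:
        raise ValueError("header contains LF")
    # header-name cannot contain ':', so the first ':' is the separator.
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError("missing ':' separator")
    if not name or not value:
        raise ValueError("header-name and header-value must be non-empty")
    return name, value


print(parse_header("destination:/queue/a"))
print(parse_header("foo:a:b"))
```

Note that this grammar, read literally, also makes "foo: bar" carry the value " bar" with its leading space.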


James Casey

Jun 15, 2010, 1:11:31 PM

to stomp-spec

That tells me the character set used, but not the character encoding
to be used for characters outside of the US-ASCII range.

For the purposes of interoperability with other transport protocols
used in the same broker, I think we need to specify the character
encoding to be used when a broker has header values that contain
non-US-ASCII characters. In particular, since JMS allows UTF-16 in any
header (i.e., a Java String), things will break badly for a STOMP
consumer reading data sent by a JMS producer. This works right now
because most implementations actually do support UTF-8/UTF-16 out of
the box for header values.
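The gap between a character set and a character encoding is easy to see in Python: the same header value produces different bytes under different encodings, a receiver that assumes the wrong one decodes garbage without any error, and UTF-16 in particular emits NUL bytes, which STOMP 1.0 uses to terminate a frame. The sample value is hypothetical.

```python
name = "Müller"  # the same character sequence in every case

# The same text has a different byte representation in each encoding.
wire_forms = {enc: name.encode(enc) for enc in ("utf-8", "latin-1", "utf-16-be")}
for enc, data in wire_forms.items():
    print(f"{enc:10} {data!r}")

# A consumer that assumes the wrong encoding silently gets mojibake.
misread = wire_forms["utf-8"].decode("latin-1")
print(misread)

# UTF-16 output contains NUL bytes, and NUL is the STOMP frame terminator.
print(b"\x00" in wire_forms["utf-16-be"])
```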

cheers,

James.

Hiram Chirino

Jun 15, 2010, 2:55:09 PM

to stomp...@googlegroups.com

I think the encoding can vary depending on the header field. I'm not
sure UTF-8 is good enough as a general solution, since it does not
escape the control chars.

Lionel Cons

Jun 16, 2010, 5:13:26 AM

to stomp...@googlegroups.com

Hiram Chirino <hi...@hiramchirino.com> writes:
> I think the encoding can vary depending on the header field. Not sure
> if UTF-8 is good enough as a general solution as it does not escape
> the control chars.

IMHO, we have to be pragmatic here.

The big difference between headers and body is that headers are used
and seen by the messaging infrastructure. A good example is selectors.

In the header, users usually want to put text/string (I leave aside
the question of typed values since it has a separate thread). This
probably covers 99% of the use cases and we should have good support
for this. Obviously, ASCII is too restrictive. We could specify that
headers can only contain Unicode text (= sequence of Unicode
characters) and that we always use UTF-8 on the wire.
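A minimal Python sketch of the sending side of this proposal (Unicode text in, UTF-8 on the wire). Because UTF-8 does not escape control characters, LF still has to be excluded by an explicit rule, and ':' is excluded from the name. The helper name is hypothetical, and rejecting rather than escaping is one possible choice.

```python
def encode_header(name: str, value: str) -> bytes:
    """Encode one header line as UTF-8 (hypothetical helper)."""
    if not name or not value:
        raise ValueError("header name and value must be non-empty")
    # UTF-8 passes LF through unchanged, so it must be rejected explicitly.
    if "\n" in name or "\n" in value:
        raise ValueError("LF is not allowed in a header")
    if ":" in name:
        raise ValueError("':' is not allowed in a header name")
    return f"{name}:{value}\n".encode("utf-8")


print(encode_header("JMSXUserId", "CN=Jürgen Müller"))
```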

If some users want binary data in the header, they can simply Base64
encode it. I doubt that they want the messaging infrastructure to
decode and use this binary data.
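The Base64 route needs nothing beyond the standard library. In the sketch below (function names hypothetical), `base64.b64encode` emits only A-Z, a-z, 0-9, '+', '/' and '=' with no line breaks, so the result can never contain LF or ':' and fits even the ASCII-only grammar.

```python
import base64


def binary_to_header_value(data: bytes) -> str:
    # b64encode (unlike encodebytes) never inserts newlines.
    return base64.b64encode(data).decode("ascii")


def header_value_to_binary(value: str) -> bytes:
    # validate=True rejects characters outside the Base64 alphabet.
    return base64.b64decode(value, validate=True)


data = bytes(range(256))  # every possible byte value, including LF and NUL
value = binary_to_header_value(data)
print(value[:20])
```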

Cheers,

Lionel

Dejan Bosanac

Jun 16, 2010, 5:36:32 AM

to stomp...@googlegroups.com

Speaking of selectors, if we want to support them at the spec level, strings in headers are not enough. We'd need to define some value semantics there (numeric comparison, for instance), so that expressions like

color = 'blue' AND weight > 2500

could be parsed.
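Why plain strings fall short: if every header value is a string, "weight > 2500" turns into a lexicographic comparison. A small Python sketch with hypothetical header values (this only illustrates the comparison, it is not a selector engine):

```python
# Hypothetical header values as they arrive in a STOMP frame: all strings.
headers = {"color": "blue", "weight": "900"}

# Evaluating "color = 'blue' AND weight > 2500" on raw strings compares
# "900" and "2500" lexicographically, and '9' > '2', so this is True.
as_strings = headers["color"] == "blue" and headers["weight"] > "2500"

# With numeric semantics for weight, the expression is False, as intended.
as_numbers = headers["color"] == "blue" and int(headers["weight"]) > 2500

print(as_strings, as_numbers)
```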

Cheers

Hiram Chirino

Jun 16, 2010, 10:08:12 AM

to stomp...@googlegroups.com

Yep. Also, I think it should be noted in the spec that server
implementations may choose to place size restrictions on the headers.


Hiram Chirino

Jun 16, 2010, 10:08:50 AM

to stomp...@googlegroups.com

Yeah, that's another ball of wax, which I think should be bundled
together with a JMS mapping section in the doc.


Lionel Cons

Jun 16, 2010, 10:19:45 AM

to stomp...@googlegroups.com

Hiram Chirino <hi...@hiramchirino.com> writes:
> Yep.. also I think it should be noted in the spec that server
> implementations may choose to place size restrictions on the headers.

Indeed.

In fact, both parties could have several size restrictions:
- maximum headers size (bytes in the headers part of the frame)
- maximum number of header keys
- maximum length of any given header value
- maximum body size
- ...
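A hypothetical Python sketch of how a receiver might check the first four limits before accepting a frame; on the server side, the returned description could become the text of an ERROR frame. The limit names and default values are illustrative assumptions, header size is counted in UTF-8 bytes, and value length in characters.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FrameLimits:
    # Hypothetical defaults; real values would be implementation-defined.
    max_header_bytes: int = 8192
    max_header_count: int = 64
    max_value_length: int = 1024
    max_body_bytes: int = 1 << 20


def check_limits(headers: List[Tuple[str, str]], body: bytes,
                 limits: FrameLimits) -> Optional[str]:
    """Return a description of the first violated limit, or None."""
    header_bytes = sum(len(f"{k}:{v}\n".encode("utf-8")) for k, v in headers)
    if header_bytes > limits.max_header_bytes:
        return f"headers too large ({header_bytes} bytes)"
    if len(headers) > limits.max_header_count:
        return f"too many headers ({len(headers)})"
    for key, value in headers:
        if len(value) > limits.max_value_length:
            return f"header {key!r} value too long"
    if len(body) > limits.max_body_bytes:
        return f"body too large ({len(body)} bytes)"
    return None


print(check_limits([("destination", "/queue/a")], b"hello", FrameLimits()))
```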

Is it worth opening a new thread to discuss these?

BTW, a server can tell the client "sorry, your message is too big" via
an ERROR frame. The client cannot send such an error to the server...