I have written one using high-level Tcl code, but speed is an issue.
The libraries currently on the OSC site are mostly unsuitable for me,
because they do not separate the UDP transaction from the packing and
unpacking, and instead use callbacks (whereas I want to use the Tcl event
loop). In addition, because of the way the libraries are structured I
found it extremely difficult to wrap them with SWIG; basically my C
and SWIG skills/understanding are not good enough.
Dave
Looking at the equivalent wrapper for Python, SimpleOSC, it looks like
OSC is just a specific UDP payload. So, for a simple integration into
the Tcl event loop, I'd suggest using TclUDP: sending is just
[puts -nonewline], receiving is fileevents and [read].
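A minimal sketch of both directions, assuming the tcludp package; the
port number and variable names are made up for illustration:

package require udp                      ;# tcludp

# sending: the OSC packet is just the datagram payload
set out [udp_open]
fconfigure $out -remote [list 127.0.0.1 7770] -translation binary
puts -nonewline $out $oscPacket
flush $out

# receiving: plug into the Tcl event loop with fileevent
proc onPacket {chan} {
    set packet [read $chan]              ;# one datagram per readable event
    # hand $packet to the unpacking code here
}
set in [udp_open 7770]
fconfigure $in -translation binary -blocking 0
fileevent $in readable [list onPacket $in]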
I have not yet looked at the actual payload format, but my guess is that
[binary format/scan] could readily do the same as the Python version.
Please say whether you need help with that.
-Alex
Sorry, maybe I need to explain further. SimpleOSC (according to
http://pypi.python.org/pypi/SimpleOSC/0.2.3) is a 'simple layer on top
of the already existant OSC implementation to make life easier to
those who don't have understanding of sockets', which is not what I
need.
I have already written the Tcl library, which, as you pointed out,
consists of 2 pieces, namely the actual UDP transmission, and the
packing and unpacking of various types of data into an OSC-conformant
binary packet.
Most external libraries (written in C or C++) do not (as far as I can
see) separate the two steps from an API point of view, i.e. they provide
a send API (which is actually a wrap-and-send interface) and a method
of registering a callback to receive already unwrapped data. This I
find especially frustrating because, in general, the value passed to the
callback is something along the lines of a pointer to a struct
containing (many) pointers to individual values.
The library I have written is separated, i.e. there is a wrap function
taking a Tcl list and returning a binary OSC-conformant 'object', and
an unwrap function that takes an OSC 'object' and returns a list. This
fits much better with the Tcl event loop (in which I do use the UDP
package you mentioned).
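Roughly, the shape of the two calls is as follows (the names are
illustrative, not the actual procs in my library):

set packet   [osc::wrap $argList]    ;# Tcl list -> OSC-conformant binary string
set argList2 [osc::unwrap $packet]   ;# OSC binary string -> Tcl list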
There are only 2 problems from my point of view:
1) Speed
2) IEEE conformance for doubles (which I personally can live with for
the moment)
To solve the speed problem I was hoping that someone out there had
already implemented such a library, or wrapped one of them with SWIG so
that it is usable from Tcl.
Dave
OK but "one of them" is a bit underspecified, don't you think ?
If you don't say what kind of abstraction you want on top, which one
of "these libraries" would you prefer to see wrapped ?
By the way, I like the idea of a generic packer-unpacker like you did.
Do you have more details about where the overheads are ? Is it just a
question of time granularity (many small packets in short sequence),
or is there a more concentrated bottleneck ? Concatenations ?
Conversions ?
-Alex
> By the way, I like the idea of a generic packer-unpacker like you did.
> Do you have more details about where the overheads are ? Is it just a
> question of time granularity (many small packets in short sequence),
> or is there a more concentrated bottleneck ? Concatenations ?
> Conversions ?
>
I think it is too many small conversions within one packet, and too
fiddly. For example, converting a 32-bit int (in my code anyway) uses
a binary format/binary scan as appropriate. If done in C with access
to the underlying structs, a 4-byte copy would do the job. Then there
are things like finding the end of a received string. Because it is a
binary string, you cannot just search for \0; it also has to be on a
4-byte boundary, etc. Of course, there could be bottlenecks within my
own code; I have been away from Tcl for many years; the last version I
used in anger did not even have binary scan and binary format....
Dave
The one that most closely matches what I have done in terms of features
is Liblo.
http://opensoundcontrol.org/implementation/liblo-lightweight-osc-api
It is also the one I tried to wrap using SWIG, and was defeated by. My
(pre-ANSI) C skills are too rusty.
Hmmm, I think you are making false assumptions regarding the internal
conversions in Tcl. Basically they are lazy, which means not done
unless necessary. For example, if you extract an integer with [binary
scan i], assuming a little-endian machine, then the result is a
Tcl_Obj of integer type, essentially containing exactly the same
4-byte word with a bit of surrounding glue, but no expensive
integer/string conversion is done until you explicitly ask for the
string value.
Also, you seem to imply you're calling one [binary] per scalar. Is
there a reason why you're not extracting the whole structure in a
single [binary] call (with a pattern of more than one scalar type
specifier) ?
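For instance, something along these lines pulls several fields out in
one call (this particular field layout is invented, just for
illustration):

# one scan, three fields: a big-endian int32, a big-endian float32
# (the R specifier needs Tcl 8.5) and an 8-byte chunk
binary scan $chunk {I R a8} count gain tag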
> Then there
> things like finding the end of a received string. Because it is binary
> string, you cannot just search for \0, it also has
> to be on a 4 byte boundary etc. Of course, there could be bottlenecks
> within my own code; I have been
> away from TCL for many years; the last version I used in anger did not
> even have binary scan and binary format....
Could you post your code just in case ?
-Alex
I will look at my code again. I know that if one unpacks a sequence
like string int32 string int32, you cannot predict where the first
int32 will start until you have found the length of the string and
worked out what the null padding will be at the end of the first
string. I could not see a way round this, but maybe.....
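At least the padded size of a string is easy to compute once you know
its raw length (a sketch; this helper is not in the posted code):

# bytes an OSC string occupies on the wire: content plus 1-4 nulls,
# rounded up to a 32-bit boundary
proc oscStringSize {len} {
    expr {($len + 4) & ~3}
}
# oscStringSize 3 -> 4 ; oscStringSize 4 -> 8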
>
> > Then there
> > things like finding the end of a received string. Because it is binary
> > string, you cannot just search for \0, it also has
> > to be on a 4 byte boundary etc. Of course, there could be bottlenecks
> > within my own code; I have been
> > away from TCL for many years; the last version I used in anger did not
> > even have binary scan and binary format....
>
> Could you post your code just in case ?
>
> -Alex
I will put the code on a website, and then post the link to it. I have
just done a major test, by replacing the OSC/UDP code with code doing
the same job over TCP. This was beneficial, in so far as I found an old
debug statement in the code, with an 'after 10' in it.
Original code: 115 seconds
Fixed code: 100 seconds
TCP code: 85 seconds
(1,761 transactions)
Dave
Gosh. Who's still designing such things in 2008 ?
Seriously, can you give a quick link to the precise documentation of
this part ?
-Alex
Original spec dates to 2002...
http://opensoundcontrol.org/spec-1_0-examples#OSCstrings
I was confused when I spoke in one of my first postings about having
to be careful about \0 inside an OSC string; OSC strings cannot
contain \0. I have not re-read my code yet, but if I remember
correctly, I took the road of converting binary to hex before scanning
the string (hopefully I had a good reason at the time); probably did
it because it is easier to read/think in hex while designing/debugging.
Dave
The code is now up on http://www.wonaze.net/osc-tcl/
I had a read through my code again, and I can see that a re-write that
sticks to binary (rather than working in hex) would probably speed it
up.
Dave
> I had a read through my code again, and I can see that a re-write that
> sticks to binary (rather than working in hex) would probably speed it
> up.
>
> Dave
What about enhancing binary scan with a "zero terminated string" format/object?
binary scan $buffer IzI..
uwe
Hi Uwe,
Such an enhancement would only be useful in an OSC context if it
could follow the OSC rule that there will be between 1 and 4 bytes of
\0s at the end of the string to pad it to a 32-bit boundary.
Unfortunately, one of the other OSC objects, the blob, would still
require additional passes, because it is specified as [32-bit len][X
bytes of data][0-3 bytes of padding to bring it to a boundary]. The
two most obvious implementations of a 'z' specifier (consume one null
byte / consume all null bytes, which might chomp a byte out of the
following 32-bit int) will not work with the OSC spec.
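For reference, the space a blob takes on the wire works out as follows
(a sketch; oscBlobSize is not part of my library):

# 4-byte length word, then the data, then 0-3 nulls up to the next
# 32-bit boundary
proc oscBlobSize {len} {
    expr {4 + (($len + 3) & ~3)}
}
# oscBlobSize 10 -> 16 ; oscBlobSize 4 -> 8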
Dave
While I admit that the OSC payload format is not really [binary]-friendly,
there may be an acceptable middle ground, taking advantage of the
32-bit-aligned scheme:
- first pass: scan the whole packet as I*, returning a list of
integers.
- indexing pass: scan this list one variable-length thing at a time,
looking for null padding with

  if {!($value & 0xFF)} {
      # the string ends in this word;
      # a few more tests with 0xFF00, 0xFF0000 and 0xFF000000
      # give the precise length
  }

  and also keeping track of blob lengths. While this scanning takes
  place, you're building up a new formatString made of i (integers), f
  (floats), aNUMBER (strings), xNUMBER (padding).
- final pass: one single [binary] with the built formatString (and
var list) swallows it all in one gulp.
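For that last step, a rough sketch of what I have in mind (the format
string, variable names and $packet are invented, just to show the
single-call idea):

# assuming the indexing pass produced something like:
set fmt  {a12 I R a8}
set vars {path count gain tag}
binary scan $packet $fmt {*}$vars   ;# {*} needs 8.5; use eval in 8.4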
Since I have no real OSC cases to play with, I have not yet written
the code, but if you send me a large sample of payload, we could go
into a nice timing race. Just tell me.
-Alex
Hi Alex,
That sounds like a good approach. I would suggest that you use the
code I put in the link on the sender side. That way you can control
the payload and test the individual cases, i.e. you can keep using the
old library on the one side, and start writing the new unpack routines
on the other side.
I will sit down and build a bulk-sender, in such a way that lots of UDP
packets are pre-calculated and then fired off as quickly as possible.
I will not be able to do anything for the next 24 hours; we have a
company meeting to which I have to travel.
There is an OSC spec issue that you need to be aware of. In one place,
the writer got himself confused and said a string is terminated with
0-3 nulls, while in another place the example he shows uses 1 to 4
nulls. The 1 to 4 nulls is the correct spec for strings, and 0-3 is
correct for blobs. He forgot that the string has to have a terminating
null (whereas the blob has a length (32-bit int) to help it).
Dave
OK. To optimize both our schedules, I would appreciate a raw binary
file to decode, instead of having to set up whatever contraption makes
sense in that strange OSC world. In case you wondered, I'm more
interested in helping you exploit 100% of Tcl's abilities and playing
with the technical challenge, than in learning much about OSC ;-)
> There is a OSC spec issue that you need to be aware of. In one place,
> the writer got himself confused, and said a string is terminated with
> 0-3 nulls, and in another place, the example he shows, uses 1 to 4
> nulls. The 1 to 4 nulls is the correct spec for strings, and 0-3
> correct for blobs. He forgot that the string has to have a termination
> null (whereas the blob has a length (32bit int) to help it).
OK, I had somewhat grokked that, thanks.
-Alex
A binary file containing one UDP packet or many? If many, what would
you like as the separator between the packets ?
Dave
Many, to be realistic.
For this I usually do the following simple text/binary alternation:

fconfigure stdout -translation binary
foreach packet $l {
    puts [string length $packet]
    puts -nonewline $packet
}

which is simply read back with

fconfigure stdin -translation binary
while {1} {
    if {[gets stdin len]<0} break
    set packet [read stdin $len]
    ...
}
-Alex
Initially, I have put sampledata.bin on the website, as well as a new
test program, tcl4c.tcl, which produces, when decoded (via tcl4a.tcl):
path=/alpha/beta/gamma , types={iiissiiTsFNIssfisbssmssts} ,
values=[32767 32768 32769 omega gamma 32768 0 alpha beta alpha
12345.6787109 1 a 001133557799bbddff00 a alpha 195051776 beta alpha
-3759595228763401487 beta]
The only non-standard feature is that the blob is displayed in hex
(001133557799bbddff00).
I really have to prepare my report for tomorrow....
Dave
OK, two things:
(1) The spec is not buggy; it says "followed by a null, followed by
0-3 additional null", which in my book is really 1-4 ;-)
(2) The code below implements an OSC decoder which avoids any
pointwise [binary]. It is slightly different from my initial
suggestion in that it does not build a typestring for [binary];
instead it extracts both the I* and R* lists (integer and float) once
and for all, and then peeks into them when needed.
Please try it out and tell me if it's faster than yours.
You'll notice a few discrepancies from the spec, due to my laziness
tonight:
- 64-bit ints are rendered as pairs of integers
- non-ifbs types are treated as aliases of ifbs (except for the
constants TFNI)
-Alex
----------------------------------------------------------------------------------------
proc osc_decode_bin buf {
    # Pre-scan the whole packet once as big-endian ints and once as
    # big-endian floats; osc_decode then peeks into whichever list it needs.
    binary scan $buf I* l
    binary scan $buf R* m
    osc_decode $l $m $buf
}

proc osc_blob {vpos l b} {
    upvar $vpos pos
    set len [lindex $l $pos]
    set out [string range $b [expr {4*($pos+1)}] [expr {4*($pos+1)+$len-1}]]
    set pos [expr {$pos+1+(($len+3)/4)}]
    return $out
}

proc osc_string {vpos l b} {
    upvar $vpos pos
    # find the 32-bit word containing the terminating null(s)
    set n 0
    while {1} {
        set x [lindex $l [expr {$pos+$n}]]
        if {!($x&0xFF)} break
        incr n
    }
    # work out how many of that word's bytes are padding
    if {$x&0xFF00} {
        set pad 1
    } elseif {$x&0xFF0000} {
        set pad 2
    } elseif {$x&0xFF000000} {
        set pad 3
    } else {
        set pad 4
    }
    set out [string range $b [expr {4*$pos}] [expr {4*($pos+$n+1)-1-$pad}]]
    set pos [expr {$pos+$n+1}]
    return $out
}

proc osc_wide {vpos l} {
    # 64-bit items are rendered as a pair of 32-bit integers (see above)
    upvar $vpos pos
    set out [lrange $l $pos [expr {$pos+1}]]
    incr pos 2
    return $out
}

proc hex b {
    binary scan $b H* x
    return $x
}

proc osc_decode {l m b} {
    set pos 0
    set len [llength $l]
    set path [osc_string pos $l $b]
    if {$path=="#bundle"} {
        set t [osc_wide pos $l]
        set out [list bundle $t]
        while {$pos<$len} {
            # each bundle element: a 32-bit byte count, then that many bytes
            set n [expr {[lindex $l $pos]/4}]
            set l2 [lrange $l [expr {$pos+1}] [expr {$pos+$n}]]
            set m2 [lrange $m [expr {$pos+1}] [expr {$pos+$n}]]
            set b2 [string range $b [expr {4*($pos+1)}] [expr {4*($pos+$n)+3}]]
            lappend out [osc_decode $l2 $m2 $b2]
            set pos [expr {$pos+$n+1}]
        }
        return $out
    }
    set types [osc_string pos $l $b]
    if {![regexp {^,(.*)$} $types pipo types]} {
        error "Typestring not starting with comma: $types"
    }
    set out [list message $path]
    foreach t [split $types ""] {
        switch -exact -- $t {
            i - c - r - m {lappend out $t [lindex $l $pos]; incr pos}
            f             {lappend out $t [lindex $m $pos]; incr pos}
            s - S         {lappend out $t [osc_string pos $l $b]}
            b             {lappend out $t [hex [osc_blob pos $l $b]]}
            h - t - d     {lappend out $t [osc_wide pos $l]}
            T             {lappend out $t True}
            F             {lappend out $t False}
            N             {lappend out $t Nil}
            I             {lappend out $t Infinitum}
            default       {error "Unsupported type tag '$t'"}
        }
    }
    return $out
}
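To try it out, a minimal driver could look like this (a sketch,
assuming the packet dump uses the length-prefixed layout described
earlier):

set f [open sampledata.bin r]
fconfigure $f -translation binary
while {[gets $f len] >= 0} {
    set packet [read $f $len]
    puts [osc_decode_bin $packet]
}
close $f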
The first speed tests indicate that the new code written by Alex is
about 10 times faster. I am now doing some cleanup work, so that the
'wide' items are decoded properly and the code will work with 8.4
(8.4 does not have binary scan R). The float values are suspect
anyway, since they depend on the Tcl implementation using IEEE 754.
Dave
Hmm... Building on Alex's approach, decoding the 64-bit items properly,
according to the spec, turns out to be even faster. I will post
complete code later on.
The trick is to add:
binary scan $buf W* w1
binary scan [string range $buf 4 end] W* w2
to osc_decode_bin, and then to return an item from w1 or w2 as needed.
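Something like this for the lookup (a sketch; the proc name and the
parity arithmetic are mine, just to show the idea):

proc osc_wide64 {vpos w1 w2} {
    upvar $vpos pos
    if {$pos % 2 == 0} {
        # the 64-bit value starts on an 8-byte boundary: it sits in w1
        set out [lindex $w1 [expr {$pos / 2}]]
    } else {
        # it starts 4 bytes into an 8-byte word: it sits in w2
        set out [lindex $w2 [expr {($pos - 1) / 2}]]
    }
    incr pos 2
    return $out
}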
Dave
Yes, I had thought of that too, but in my heart it was slightly less
elegant than the following:
proc pair2wide {x y} {expr {($x<<32)|($y&0xFFFFFFFF)}}
(actually I'm not sure whether internally it builds a Wide or a Big,
but for your calling program that shouldn't be a concern)
Have not done much timing analysis though. Maybe you can give it a
try ?
-Alex
On May 14, 10:42 pm, "dave.joub...@googlemail.com"
<dave.joub...@googlemail.com> wrote:
> Hmm...Building on Alex's approach, decoding the 64bit items properly
> is even faster by doing it according to the spec. I will post complete
> code later on.
> The trick is to add:
> binary scan $buf W* w1
> binary scan [string range $buf 4 end] W* w2
> to osc_decode_bin, and then returning an item from w1 or w2 as needed.
I may have to backtrack, and try something like pair2wide, or binary
scan [string range $b [expr {4*$pos}] blah blah blah. This is because
the way you decode bundles will effectively require swapping around
the odd and the even wides when the bundle is not on a 64-bit boundary.
If there is no time penalty for handling the wides without
pre-processing them, then I will post-process them rather than
pre-process, mainly because it would make the bundle code a bit easier
for someone else to read.
Dave
pair2wide is slower, because you have to promote the first argument to
a wide (otherwise you get the wrong answer). I have to re-run the test
between pre and post.
Dave
This does seem to be the best compromise...
proc newOsc_wide {vpos b} {
    upvar $vpos pos
    set n [expr {$pos << 2}]
    binary scan [string range $b $n [expr {$n+7}]] W out
    incr pos 2
    return $out
}
It also leads to a good solution for the 64bit floats.
Dave
> Can you elaborate ? Give an example ?
Sure (if I understand your concerns; sorry, I have been out of touch
and have not done any more work on this).
I am trying to return proper values instead of a pair of ints.
This is where you made the suggestion of proc pair2wide {x y} {expr
{($x<<32)|($y&0xFFFFFFFF)}}.
The issue with this is that the first quantity has to be expanded to a
64-bit quantity before the shift, and that the technique is not
applicable (as far as I can see) to floats. The current code now
looks like this (i.e. it works directly with the buffer rather than the
ints):
.
.
h - t {lappend out $t [newOsc_wide pos $b]}
d     {lappend out $t [newOsc_wideFloat pos $b]}
.
.

proc newOsc_wide {vpos b} {
    upvar $vpos pos
    set n [expr {$pos << 2}]
    binary scan [string range $b $n [expr {$n+7}]] W out
    incr pos 2
    return $out
}

proc newOsc_wideFloat {vpos b} {
    upvar $vpos pos
    set n [expr {$pos << 2}]
    # note: 'd' is the machine's native double layout; on a little-endian
    # host a big-endian OSC double would need 'Q' (Tcl 8.5+)
    binary scan [string range $b $n [expr {$n+7}]] d out
    incr pos 2
    return $out
}
However, my tests are not done under extreme benchmarking conditions,
and someone else may get a different result, or have a better
suggestion (although the code is already 10 times faster than my first
attempt). Is [string range ....] more than twice as slow as
[lindex ....] ? Is there a technique for building a 64-bit float out of
2 32-bit ints or floats that is faster than the binary scan + string
range combination ?
Background:
My overall concern: how to handle 64-bit quantities correctly rather
than returning 2 32-bit quantities the way your original osc_wide
worked.
My first approach was to do a 64-bit prescan the way you were handling
the 32-bit prescan. This has two main issues, related to code complexity
and therefore to readability and speed.
1) Pre-scanning for 64-bit quantities means you need to scan twice for
the 64-bit ints and twice for the doubles:
binary scan $buf W* w1
binary scan $buf d* d1
binary scan [string range $buf 4 end] W* w2
binary scan [string range $buf 4 end] d* d2
so you end up with 6 scans instead of two.
2) Pre-scanning for 64-bit quantities means you need to compensate
when you handle bundles, because the bundle boundary (32-bit) has a 50/50
chance of not matching the 64-bit boundary. One needs to test and then
call the recursive routine with the w1 and w2 substrings switched and
the d1 and d2 substrings switched (50% of the time).
My speed tests indicate that there is no real benefit to the 64-bit scan,
mainly because there are not going to be very many 64-bit quantities.
This approach was abandoned, as per my previous posting.
Dave
Sorry, I have been unclear. Of course pair2wide is not meant for
floats, it is just for ints !
Now I'll rephrase my question: do you have an example of a pair of
ints for which pair2wide fails ?
-Alex
OK, further testing shows binary scan+string range is about 25% slower
than lindex+bit-twiddling....
proc wide1 {vpos l} {
    upvar $vpos pos
    set a [lindex $l $pos]
    set b [lindex $l [expr {$pos+1}]]
    return [expr {(wide($a)<<32)|($b&0xFFFFFFFF)}]
}

proc wide2 {vpos b} {
    upvar $vpos pos
    set n [expr {$pos<<2}]
    binary scan [string range $b $n [expr {$n+7}]] W out
    return $out
}

set pos 0
set testLim 50000
set a 65536
set b $a
set trylist [list $a $b]
set buff [binary format II $a $b]

set t0 [clock clicks]
for {set c 0} {$c < $testLim} {incr c} {
    set x [wide1 pos $trylist]
}
set t1 [clock clicks]
for {set c 0} {$c < $testLim} {incr c} {
    set x [wide2 pos $buff]
}
set t2 [clock clicks]

puts stdout "in list:  [expr {($t1 - $t0)/1000.0}]"
puts stdout "bin scan: [expr {($t2 - $t1)/1000.0}]"
puts stdout "ratio:    [expr {1.0*($t2 - $t1)/($t1 - $t0)}]"
set a 65536
set b $a
puts stdout [expr {($a<<32)|($b&0xFFFFFFFF)}]
65536
puts stdout [expr {(wide($a)<<32)|($b&0xFFFFFFFF)}]
281474976776192
Dave
OK, I understand, this was a pre-8.5 constraint, that's why I couldn't
reproduce :-)
Indeed in 8.5 there is a revolution: integers no longer overflow and
are promoted automatically !
% info patchlevel
8.5.0
% puts stdout [expr {($a<<32)|($b&0xFFFFFFFF)}]
281474976776192
So I think your solution of adding wide() in the expression is
excellent, since it works in all versions !
Now, what are your timing results regarding this ? Isn't the pair2wide
solution (with your wide() improvement) faster than the various
[binary] tricks ?
-Alex
> proc wide1 {vpos l} {
> upvar $vpos pos
> set a [lindex $l $pos]
> set b [lindex $l [expr {$pos+1}]]
> return [expr {(wide($a)<<32)|($b&0xFFFFFFFF)}]}
Win a few more cycles: remove the local vars, [expr] understands [].
Also, return [expr] -> expr (I know the byte compiler wipes out the
difference nowadays, but since you're also playing with old
versions...)

proc wide1 {vpos l} {
    upvar $vpos pos
    expr {(wide([lindex $l $pos])<<32) | ([lindex $l [expr {$pos+1}]]&0xFFFFFFFF)}
}
-Alex
Hmmm. Did that provoke a big debate ?
>
> Now, what are your timing results regarding this ? Isn't the pair2wide
> solution (with your wide() improvement) faster than the various
> [binary] tricks ?
>
> -Alex
Yes, it is, according to a specific test about two postings up. Probably
the fastest of all (in theory anyway) would be to do the additional 4
64-bit scans as well. I might explore that again.
Dave
It is not (according to my primitive benchmark) worth doing a 64-bit
pre-scan for integers; one only gets about a 2 or 3% speed increase in
the decode routine, and you pay some of that back for the additional 2
prescans.
It is maybe worth doing a 64-bit pre-scan for 64-bit floats. The
pre-scan method is about 30% faster. But if you have zero 64-bit floats
in the message, then you pay up front anyway for the additional 2
pre-scans and the additional 'if' and expr % and expr / in the bundle
recursion routine.
I might re-factor the bundle code anyway, since it is 'inside-out'
compared to the way I use the library at the moment.
Dave
The pure Tcl approach is fast enough, and I would not put it aside for
a C library. I now just have to refactor the code so that it fits better
with the app I have written using it. My code expects something like:

proc handleMsg {oscMsg decodeCallback} {
    set iTimeStamp [::osctime::nowToOscTime]
    if message {
        set splitStruct [::oscconv::unpackOSCpath $oscMsg]
        if callback {
            $decodeCallback $iTimeStamp $splitStruct
        } else {
            just print
        }
        return
    }
    if bundle {
        for all bundles:
            get bundletime
            compare bundletime $iTimeStamp
            if OK to execute bundle
                handleMsg $oscMsg $decodeCallback
    }
    return
}
I.e., the toplevel never expects to see bundles, only messages, and I
currently expect the types separately rather than interleaved with the
values. Also, currently the code loop that handles bundles sits in the
main namespace rather than the OSC library namespace.
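Something along these lines would turn the decoder's interleaved
output into what the app expects (a sketch; the proc name and details
are illustrative, not the final code):

proc splitTypesValues {decoded} {
    # decoded looks like: message /some/path type1 value1 type2 value2 ...
    set types  ""
    set values {}
    foreach {t v} [lrange $decoded 2 end] {
        append types $t
        lappend values $v
    }
    # path, concatenated type tags, bare values
    list [lindex $decoded 1] $types $values
}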
As far as the 64bit floats are concerned, I will do them the simple
way for now.
So, lots of fiddly bits, and possible changes to my app. I will update
this thread when I am done.
Dave
The latest code, including the new unpack routines that Alex made a
major contribution to, is now available at http://www.wonaze.net/osc-tcl/.
It now runs nicely in 'show me the packet' mode and in callback mode. I
will be making an announcement, so that Tclers can now start talking
to other apps, as found on http://en.wikipedia.org/wiki/OpenSound_Control
Dave