Re: Interplatform (interprocess, interlanguage) communication

Robert Klemme

Feb 3, 2012, 4:44:54 PM

On 02/03/2012 08:52 PM, Stefan Ram wrote:
> »X« below is another language than Java, for example,
> VBA, C#, or C.
>
> When an X process and a Java process have to exchange
> information on the same computer, what possibilities are
> there? The Java process should act as a client, sending
> commands to the X process and also wants to read answers
> from the X process. So, the X process is a kind of server.
>
> My criteria are: reliability and it should not be extremely
> slow (say exchanging a string should not take more than
> about 10 ms). The main criterion is reliability.
>
> »Reliability« means little risk of creating problems, little
> risk of failure at run-time. (It might help when the client
> [=Java process] can reset the communication to a known and
> sane start state in case of problems detected at run-time.)
>
> The host OS is Windows, but a portable solution won't hurt.
>
> A list of possibilities I am aware of now:
>
> Pipes
>
> I have no experience with this. I heard one can establish
> a new process »proc« with »exec« and then use
>
> BufferedWriter out = new BufferedWriter(new
> OutputStreamWriter(proc.getOutputStream()));
> BufferedReader in = new BufferedReader(new
> InputStreamReader(proc.getInputStream()));

A pipe is just 1:1 communication and only in one direction.

> Files
>
> One process writes to the end of a file, the other reads
> from the end of the file? - I never tried this, don't know
> if it is guaranteed to work that one process can detect and
> read, whether the other has just appended something to a file.

You can, but what do you do with the ever increasing file? This is not
reliable since the filesystem will fill up at some point.
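For what it's worth, the "you can" part might look like this in Java. This is a minimal sketch, assuming a single writer that only ever appends UTF-8 text; FileTail and poll() are hypothetical names. It remembers the byte offset it has already consumed and reads only what was appended since; it does nothing about the ever-growing file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper (not from the thread): returns whatever was appended
// to a file since the previous poll.
class FileTail {
    private final Path file;
    private long offset; // bytes already consumed

    FileTail(Path file) {
        this.file = file;
    }

    /** Returns the newly appended text (possibly empty) and remembers how far it has read. */
    String poll() throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (length < offset) {
                offset = 0; // the file shrank (truncated or replaced): start over
            }
            // Assumes each batch of new data fits in one array (< 2 GB).
            byte[] chunk = new byte[(int) (length - offset)];
            raf.seek(offset);
            raf.readFully(chunk);
            offset = length;
            return new String(chunk, StandardCharsets.UTF_8);
        }
    }
}
```

A poll can land in the middle of a line (or of a multi-byte character), so a real reader would keep the incomplete remainder and only act on complete lines.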

> What if the processes run very long and the files get too
> large? But OTOH this is very transparent, which makes it easy
> to debug, since one can open the files and directly inspect
> them, or even append commands manually with »copy con file«.
>
> Sockets
>
> This is slightly less transparent than files, but has the
> advantage that it becomes very easy to have the two
> processes running on different computers later, if this
> should ever be required. Debugging should be possible
> by a man-in-the-middle proxy that prints all information
> it sees or by connecting to the server with a terminal.

You can also use a packet sniffer (Wireshark, for example). If you
use a standard protocol, the tool will typically be able to decode it
for you.

> JNI
>
> JNI might be used to access code written in C or
> ABI-compatible languages. This should be fast, but I heard
> that it is error prone to write JNI code and needs some
> learning (code less maintainable)?

That would be a clumsy approach IMHO.

I'd pick a higher level protocol such as

- SOAP (XML based, ubiquitous)
- CORBA (a little out of fashion but quite efficient in terms of network
transport)

Advantage: you can focus on the definition of the API and need not take care
of all the nitty-gritty details. The choice should also depend on availability
for language X, of course.

Kind regards

robert

Arne Vajhøj

Feb 3, 2012, 5:50:55 PM

On 2/3/2012 2:52 PM, Stefan Ram wrote:
> »X« below is another language than Java, for example,
> VBA, C#, or C.
>
> When an X process and a Java process have to exchange
> information on the same computer, what possibilities are
> there? The Java process should act as a client, sending
> commands to the X process and also wants to read answers
> from the X process. So, the X process is a kind of server.
>
> My criteria are: reliability and it should not be extremely
> slow (say exchanging a string should not take more than
> about 10 ms). The main criterion is reliability.
>
> »Reliability« means little risk of creating problems, little
> risk of failure at run-time. (It might help when the client
> [=Java process] can reset the communication to a known and
> sane start state in case of problems detected at run-time.)
>
> The host OS is Windows, but a portable solution won't hurt.
>
> A list of possibilities I am aware of now:
>
> Pipes
>
> I have no experience with this. I heard one can establish
> a new process »proc« with »exec« and then use
>
> BufferedWriter out = new BufferedWriter(new
> OutputStreamWriter(proc.getOutputStream()));
> BufferedReader in = new BufferedReader(new
> InputStreamReader(proc.getInputStream()));

That would require the client to start the server.

That does not look like a good solution.

> Files
>
> One process writes to the end of a file, the other reads
> from the end of the file? - I never tried this, don't know
> if it is guaranteed to work that one process can detect and
> read, whether the other has just appended something to a file.
>

> What if the processes run very long and the files get too
> large? But OTOH this is very transparent, which makes it easy
> to debug, since one can open the files and directly inspect
> them, or even append commands manually with »copy con file«.

It should work, but it will be slow.

> Sockets
>
> This is slightly less transparent than files, but has the
> advantage that it becomes very easy to have the two
> processes running on different computers later, if this
> should ever be required. Debugging should be possible
> by a man-in-the-middle proxy that prints all information
> it sees or by connecting to the server with a terminal.

That would be my choice.
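A minimal sketch of the socket route from the Java (client) side, assuming a hypothetical protocol of one UTF-8 line per command and one line per reply; LineClient is a made-up name. The read timeout keeps a hung server from blocking the client forever, and any I/O error closes the connection so that the next request reconnects, which is one simple way to get back to a known, sane start state.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical client for a one-line-request / one-line-reply protocol over local TCP.
class LineClient implements Closeable {
    private final InetSocketAddress address;
    private final int timeoutMillis;
    private Socket socket; // null means "not connected"
    private BufferedReader in;
    private Writer out;

    LineClient(InetSocketAddress address, int timeoutMillis) {
        this.address = address;
        this.timeoutMillis = timeoutMillis;
    }

    /** Sends one command and waits at most timeoutMillis for the reply line. */
    String request(String command) throws IOException {
        if (socket == null) {
            connect();
        }
        try {
            out.write(command + "\n");
            out.flush();
            String reply = in.readLine();
            if (reply == null) {
                throw new EOFException("server closed the connection");
            }
            return reply;
        } catch (IOException e) {
            try {
                close(); // reset: the next request starts on a fresh connection
            } catch (IOException suppressed) {
                e.addSuppressed(suppressed);
            }
            throw e;
        }
    }

    private void connect() throws IOException {
        Socket s = new Socket();
        try {
            s.connect(address, timeoutMillis);
            s.setSoTimeout(timeoutMillis); // a hung server gives SocketTimeoutException, not a hang
            in = new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
            out = new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            s.close();
            throw e;
        }
        socket = s;
    }

    @Override
    public void close() throws IOException {
        if (socket != null) {
            Socket s = socket;
            socket = null;
            s.close();
        }
    }
}
```

The X side would ideally bind its listening socket to 127.0.0.1 only, so it is not reachable from other machines until that is actually wanted.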

> JNI
>
> JNI might be used to access code written in C or
> ABI-compatible languages. This should be fast, but I heard
> that it is error prone to write JNI code and needs some
> learning (code less maintainable)?

JNI would mean single process.

It does fit with your problem description.

JNI is a bit tricky, but it is not more difficult than
many other things. But since Java programmers very rarely
use JNI, most Java programmers never learn JNI properly,
with the expected result. You could learn JNI if you need to.

Arne

Arne Vajhøj

Feb 3, 2012, 5:56:36 PM

That type of pipe is bidirectional.

And Windows named pipes are bidirectional as well.
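On the Java side, the two directions show up as two streams of the Process object. A minimal sketch, assuming the server reads one command per line from its stdin and answers with one line on its stdout; PipeClient, ask() and "x-server.exe" are hypothetical. The easy mistakes are forgetting to flush after each command and leaving the child's stderr unread until its pipe fills up and stalls the child; the sketch sends stderr to the console instead.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a Java client talking to a child process over its stdin/stdout pipes.
class PipeClient {
    /** Starts the server; its stderr goes to our console so that pipe can never fill up. */
    static Process startServer(String... command) throws IOException {
        return new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
    }

    /** Writes one command line, flushes, and reads one reply line (null if the child exited). */
    static String ask(BufferedWriter toServer, BufferedReader fromServer, String command) throws IOException {
        toServer.write(command);
        toServer.newLine(); // platform line separator; the server must accept it
        toServer.flush();   // without this the command may sit in the buffer while both sides wait
        return fromServer.readLine();
    }

    public static void main(String[] args) throws IOException {
        // "x-server.exe" is a hypothetical placeholder for the X program.
        Process proc = startServer("x-server.exe");
        BufferedWriter out = new BufferedWriter(new OutputStreamWriter(proc.getOutputStream(), StandardCharsets.UTF_8));
        BufferedReader in = new BufferedReader(new InputStreamReader(proc.getInputStream(), StandardCharsets.UTF_8));
        System.out.println(ask(out, in, "PING"));
        out.close(); // closing the child's stdin signals end of input
        proc.destroy();
    }
}
```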

>> Files
>>
>> One process writes to the end of a file, the other reads
>> from the end of the file? - I never tried this, don't know
>> if it is guaranteed to work that one process can detect and
>> read, whether the other has just appended something to a file.
>
> You can, but what do you do with the ever increasing file? This is not
> reliable since the filesystem will fill up at some point.

It would be possible to switch over to a new file and
delete the old file if he really wanted to go this route.

> I'd pick a higher level protocol such as
>
> - SOAP (XML based, ubiquitous)
> - CORBA (a little out of fashion but quite efficient in terms of network
> transport)
>
> Advantage: you can focus on definition of the API and need not take care
> of all the nitty-gritty details. Choice should also depend on the availability
> for language X, of course.

They will use sockets as transport.

But if the X language has a good SOAP toolkit, then it would
certainly make things a lot easier.

Arne

Arved Sandstrom

Feb 3, 2012, 6:13:05 PM

On 12-02-03 03:52 PM, Stefan Ram wrote:
> »X« below is another language than Java, for example,
> VBA, C#, or C.
>
> When an X process and a Java process have to exchange
> information on the same computer, what possibilities are
> there? The Java process should act as a client, sending
> commands to the X process and also wants to read answers
> from the X process. So, the X process is a kind of server.
>
> My criteria are: reliability and it should not be extremely
> slow (say exchanging a string should not take more than
> about 10 ms). The main criterion is reliability.

[ SNIP ]

>
> Files
>
> One process writes to the end of a file, the other reads
> from the end of the file? - I never tried this, don't know
> if it is guaranteed to work that one process can detect and
> read, whether the other has just appended something to a file.
>

> What if the processes run very long and the files get too
> large? But OTOH this is very transparent, which makes it easy
> to debug, since one can open the files and directly inspect
> them, or even append commands manually with »copy con file«.

[ SNIP ]

A logical subset of files for IPC is database tables.

AHS
--
...wherever the people are well informed they can be trusted with their
own government...
-- Thomas Jefferson, 1789

Jeff Higgins

Feb 3, 2012, 6:46:09 PM

On 02/03/2012 02:52 PM, Stefan Ram wrote:
> »X« below is another language than Java, for example,
> VBA, C#, or C.
>
> When an X process and a Java process have to exchange
> information on the same computer, what possibilities are
> there? The Java process should act as a client, sending
> commands to the X process and also wants to read answers
> from the X process. So, the X process is a kind of server.
>
> My criteria are: reliability and it should not be extremely
> slow (say exchanging a string should not take more than
> about 10 ms). The main criterion is reliability.
>

> »Reliability« means little risk of creating problems, little
> risk of failure at run-time. (It might help when the client
> [=Java process] can reset the communication to a known and
> sane start state in case of problems detected at run-time.)
>
> The host OS is Windows, but a portable solution won't hurt.
>

For Windows platform:
<http://msdn.microsoft.com/en-us/library/windows/desktop/aa365574%28v=vs.85%29.aspx>
Prune for Java/X support, prune again for your choice of protocol.

snip

Leif Roar Moldskred

Feb 4, 2012, 3:24:34 AM

Stefan Ram <r...@zedat.fu-berlin.de> wrote:

> My criteria are: reliability and it should not be extremely
> slow (say exchanging a string should not take more than
> about 10 ms). The main criterion is reliability.
>
> »Reliability« means little risk of creating problems, little
> risk of failure at run-time. (It might help when the client
> [=Java process] can reset the communication to a known and
> sane start state in case of problems detected at run-time.)

Other options:

* Use a message broker such as CORBA or MQ. Perhaps the cleanest
solution code-wise, but requires more infrastructure and gives you a
more complicated installation.

* Communicate through a shared database. Fiddly, but can be convenient
if the programs share a database anyway.

* REST -- have an HTTP servlet running in one program and make RESTful
calls to it from the other. (Roughly the same as a SOAP approach,
but if the communication is mostly command and control messages, it
might be more convenient.)

* JNI wrapper around shared memory / memory mapped files. Very fiddly,
very system specific, but potentially very high performance.
(There's a discussion on using memory mapped files with java here:
http://tinyurl.com/6oa3wej )

* The presence of trigger / lock files in a directory. Quick and easy,
but limited.

* Use a shared web resource / whiteboard. Quick and convenient if your
programs do HTTP calls anyway, but requires the shared web resource
to be in place and there's performance and latency to take into
account.
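The trigger/lock file bullet can double as a cross-process test-and-set: Files.createFile checks for existence and creates the file as one atomic step, throwing FileAlreadyExistsException if the file is already there. A minimal sketch; TriggerFile and the file name used with it are hypothetical. The "limited" part shows up quickly: if the holder crashes, the file stays behind and someone has to clean it up.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: a lock/trigger file used as a cross-process "test and set" flag.
class TriggerFile {
    /** Atomically creates the flag file; returns false if some process already created it. */
    static boolean tryAcquire(Path flag) throws IOException {
        try {
            Files.createFile(flag); // existence check + creation happen as one atomic step
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    /** Removes the flag so the next tryAcquire can succeed. */
    static void release(Path flag) throws IOException {
        Files.deleteIfExists(flag);
    }
}
```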

--
Leif Roar Moldskred

Jan Burse

Feb 4, 2012, 7:57:34 AM

I would add to the list:

Shared Memory

Stefan Ram wrote:

> »X« below is another language than Java, for example,
> VBA, C#, or C.
>
> When an X process and a Java process have to exchange
> information on the same computer, what possibilities are
> there? The Java process should act as a client, sending
> commands to the X process and also wants to read answers
> from the X process. So, the X process is a kind of server.
>

> My criteria are: reliability and it should not be extremely
> slow (say exchanging a string should not take more than
> about 10 ms). The main criterion is reliability.
>
> »Reliability« means little risk of creating problems, little
> risk of failure at run-time. (It might help when the client
> [=Java process] can reset the communication to a known and
> sane start state in case of problems detected at run-time.)
>

> The host OS is Windows, but a portable solution won't hurt.
>

> A list of possibilities I am aware of now:
>
> Pipes
>
> I have no experience with this. I heard one can establish
> a new process »proc« with »exec« and then use
>
> BufferedWriter out = new BufferedWriter(new
> OutputStreamWriter(proc.getOutputStream()));
> BufferedReader in = new BufferedReader(new
> InputStreamReader(proc.getInputStream()));
>

> Files
>
> One process writes to the end of a file, the other reads
> from the end of the file? - I never tried this, don't know
> if it is guaranteed to work that one process can detect and
> read, whether the other has just appended something to a file.
>
> What if the processes run very long and the files get too
> large? But OTOH this is very transparent, which makes it easy
> to debug, since one can open the files and directly inspect
> them, or even append commands manually with »copy con file«.
>

> Sockets
>
> This is slightly less transparent than files, but has the
> advantage that it becomes very easy to have the two
> processes running on different computers later, if this
> should ever be required. Debugging should be possible
> by a man-in-the-middle proxy that prints all information
> it sees or by connecting to the server with a terminal.
>

Robert Klemme

Feb 4, 2012, 7:59:29 AM

Well, those are actually two pipes, aren't they? Or it's a socketpair,
depending on platform. Also, this approach only works if the Java
process always starts the other process. Alternatively, the other
process could start the Java process this way, and then we can read from
System.in and write to System.out.
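The alternative where the X program starts the JVM and talks to it through the child's stdin/stdout could look roughly like this on the Java side. A minimal sketch; the command set (ECHO, QUIT) and the class name StdioServer are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: Java as the child process, one command per line on stdin,
// one reply per line on stdout.
class StdioServer {
    /** Handles commands until "QUIT" or end of input. The command names are made up. */
    static void serve(BufferedReader in, PrintWriter out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            if (line.equals("QUIT")) {
                break;
            } else if (line.startsWith("ECHO ")) {
                out.println("OK " + line.substring(5));
            } else {
                out.println("ERR unknown command");
            }
            out.flush(); // the parent is blocked waiting for this reply
        }
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        PrintWriter out = new PrintWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
        serve(in, out);
        out.flush();
    }
}
```

Nothing else in that JVM may print to System.out (logging included), since every stray line would be read by the parent as a reply.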

> And Windows named pipes are bidirectional as well.

Oh, I didn't know that. Learn something new every day. Thanks!

>>> Files
>>>
>>> One process writes to the end of a file, the other reads
>>> from the end of the file? - I never tried this, don't know
>>> if it is guaranteed to work that one process can detect and
>>> read, whether the other has just appended something to a file.
>>
>> You can, but what do you do with the ever increasing file? This is not
>> reliable since the filesystem will fill up at some point.
>
> It would be possible to switchover to a new file and
> delete the old file if he really wanted to go this route.

Well, yes, but that soon gets nasty because of file locking etc.

>> I'd pick a higher level protocol such as
>>
>> - SOAP (XML based, ubiquitous)
>> - CORBA (a little out of fashion but quite efficient in terms of network
>> transport)
>>
>> Advantage: you can focus on definition of the API and need not take care
>> of all the nifty details. Choice should also depend on the availability
>> for language X, of course.
>
> They will use socket as transport.
>
> But if the X language has a good SOAP toolkit, then it would
> certainly make things a lot easier.

Exactly.

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Jeffrey H. Coffield

Feb 4, 2012, 11:50:01 AM

On 02/03/2012 11:52 AM, Stefan Ram wrote:

> Sockets
>
> This is slightly less transparent than files, but has the
> advantage that it becomes very easy to have the two
> processes running on different computers later, if this
> should ever be required. Debugging should be possible
> by a man-in-the-middle proxy that prints all information
> it sees or by connecting to the server with a terminal.

SOAP has been mentioned, but I would also look at REST. An HTTP POST
with an XML response, although less powerful, has a wider range of
support. Using port 80/443 to get to a server also greatly simplifies
firewall issues when the systems are remote.
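A rough sketch of the REST-style variant from the Java client side, using only the JDK: HttpURLConnection for the POST and, purely as a stand-in for the X server, the JDK's built-in com.sun.net.httpserver.HttpServer. The /command path and the plain-text "OK ..." reply are hypothetical; a real service might exchange XML or JSON instead. InputStream.readAllBytes() assumes Java 9 or later.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: plain-text command in, plain-text reply out, over HTTP POST.
class RestSketch {
    /** POSTs a command and returns the reply body. */
    static String post(URL url, String command) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        conn.setRequestProperty("Content-Type", "text/plain; charset=UTF-8");
        try (OutputStream body = conn.getOutputStream()) {
            body.write(command.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        if (status != 200) {
            throw new IOException("server answered HTTP " + status);
        }
        try (InputStream reply = conn.getInputStream()) {
            return new String(reply.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Stand-in for the X side: answers every POST to /command with "OK " + body. */
    static HttpServer startStandIn() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/command", exchange -> {
            String request = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            byte[] reply = ("OK " + request).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        return server;
    }
}
```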

>
> JNI
>
> JNI might be used to access code written in C or
> ABI-compatible languages. This should be fast, but I heard
> that it is error prone to write JNI code and needs some
> learning (code less maintainable)?
>

The biggest drawback to JNI (I feel) is that it opens up all the
disadvantages of C in a Java environment. It is difficult (for me) at
times to determine exactly where an error actually is as I use C only
when forced to.

Jeff Coffield
www.digitalsynergyinc.com

markspace

Feb 4, 2012, 1:55:39 PM

On 2/4/2012 4:57 AM, Jan Burse wrote:
> I would add to the list:
>
> Shared Memory
>

What Java API do you use for that?

Jan Burse

Feb 4, 2012, 2:24:36 PM

markspace wrote:

One solution would be to port MemoryFile from
Android to Java SE. The API of MemoryFile can be
seen here:

http://developer.android.com/reference/android/os/MemoryFile.html

You can also find the source code of the classes.
But suggesting the above has more to do with my
obsession for memory files (just joking).

But the following Stack Overflow entry lists 5 (five)
alternative ways to deal with shared memory in Java:

http://stackoverflow.com/questions/1491519/any-concept-of-shared-memory-in-java

Bye

Jan Burse

Feb 4, 2012, 2:29:20 PM

Jan Burse wrote:

> But the following stack overflow entry lists 5 (five)
> alternative ways to deal with shared memory in Java:
>
> http://stackoverflow.com/questions/1491519/any-concept-of-shared-memory-in-java
>

Oops, 4 solutions and 1 finger-wagging.

Roedy Green

Feb 4, 2012, 3:50:15 PM

On 3 Feb 2012 19:52:08 GMT, r...@zedat.fu-berlin.de (Stefan Ram) wrote,
quoted or indirectly quoted someone who said :

> When an X process and a Java process have to exchange
> information on the same computer, what possibilities are
> there?

TCP/IP socket.

both talk to SQL database, presume data cached.

both read and write same file on SSD

JNI
--
Roedy Green Canadian Mind Products
http://mindprod.com
One of the most useful comments you can put in a program is
"If you change this, remember to change 'XXX' too".

Roedy Green

Feb 4, 2012, 4:22:02 PM

On Sat, 04 Feb 2012 12:50:15 -0800, Roedy Green
<see_w...@mindprod.com.invalid> wrote, quoted or indirectly quoted
someone who said :

>both read and write same file on SSD

Let's say you used a simple RandomAccessFile. How could you implement
a busy lock field in the file to indicate the file was busy being
updated? or busy being read? In RAM you have test and set locks to
check a value and set the value in one atomic operation. How could
you simulate that without test and set hardware on the SSD? You can't
very well share a RAM lock between separate jobs.

Jan Burse

Feb 4, 2012, 4:33:23 PM

Roedy Green wrote:

> Let's say you used a simple RandomAccessFile. How could you implement
> a busy lock field in the file to indicate the file was busy being
> updated? or busy being read? In RAM you have test and set locks to
> check a value and set the value in one atomic operation. How could
> you simulate that without test and set hardware on the SSD? You can't
> very well share a RAM lock between separate jobs.

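What do you want, a write lock or a read lock?
Here is a write lock:

Obtain the lock (blocks until no other process holds a lock on the file):
RandomAccessFile raf = new RandomAccessFile(file, "rw");
FileOutputStream fo = new FileOutputStream(raf.getFD());
fo.getChannel().lock(0, Long.MAX_VALUE, false); // false = exclusive

Release the lock (closing the stream closes its channel, which releases the lock):
fo.close();
raf.close();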

Maybe it can be done even simpler, but the above
works for me over process / jvm boundaries. Can
be also used to synchronize jvm with non-jvm code.

I use similar code to obtain a read lock, via a
FileInputStream and the lock() method's third
argument (shared) = true. Currently it also seems to work on
Android, but I have not yet tested it thoroughly...

Bye

(*)
http://docs.oracle.com/javase/1.4.2/docs/api/java/nio/channels/FileChannel.html#lock%28long,%20long,%20boolean%29

Arne Vajhøj

Feb 4, 2012, 6:29:11 PM

On 2/4/2012 11:50 AM, Jeffrey H. Coffield wrote:
> On 02/03/2012 11:52 AM, Stefan Ram wrote:
>> Sockets
>>
>> This is slightly less transparent than files, but has the
>> advantage that it becomes very easy to have the two
>> processes running on different computers later, if this
>> should ever be required. Debugging should be possible
>> by a man-in-the-middle proxy that prints all information
>> it sees or by connecting to the server with a terminal.
>
> SOAP has been mentioned, but I would also look at REST. An http post
> with an XML response although less powerful, has a wider range of
> support. Using port 80/443 to get to a server also greatly simplifies
> firewall issues when the systems are remote.

That comes from the HTTP transport rather than from the choice between
RPC-style SOAP and RESTful POX.

Arne

Arne Vajhøj

Feb 4, 2012, 6:35:27 PM

The Java Process supports in and out.

Whether the OS does it via single bidirectional or two unidirectional
does not change the Java code.

Thinking about it, two sounds more likely, as Java also needs to keep
err and out separate, which would be a lot easier with two.

> Also, this approach only works if the Java
> process always starts the other process.

Yep.

>>>> One process writes to the end of a file, the other reads
>>>> from the end of the file? - I never tried this, don't know
>>>> if it is guaranteed to work that one process can detect and
>>>> read, whether the other has just appended something to a file.
>>>
>>> You can, but what do you do with the ever increasing file? This is not
>>> reliable since the filesystem will fill up at some point.
>>
>> It would be possible to switchover to a new file and
>> delete the old file if he really wanted to go this route.
>
> Well, yes, but that soon gets nasty because of file locking etc.

Some coding required.

Arne

Arne Vajhøj

Feb 4, 2012, 6:37:00 PM

java.nio.MappedByteBuffer or some JNI I would guess.
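A minimal sketch of the MappedByteBuffer route, assuming both sides map the same file (the X side would use its own platform API for that, e.g. CreateFileMapping and MapViewOfFile on Windows). The layout (length in the first four bytes, UTF-8 text after it) and the class name SharedMailbox are hypothetical. There is no synchronization at all here: a reader can see a half-written message, so a real design would add something like a sequence number or a lock.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: a memory-mapped file used as a shared-memory mailbox.
// Made-up layout: bytes 0-3 hold the message length, the UTF-8 text starts at byte 4.
class SharedMailbox {
    static final int SIZE = 4096;

    /** Maps the first SIZE bytes of the file, creating or growing it if needed. */
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, SIZE); // stays valid after close
        }
    }

    static void write(MappedByteBuffer shared, String message) {
        byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > SIZE - 4) {
            throw new IllegalArgumentException("message too long");
        }
        ByteBuffer view = shared.duplicate(); // own position, same underlying memory
        view.position(4);
        view.put(bytes);
        shared.putInt(0, bytes.length); // length written last; this alone does not make concurrent reads safe
    }

    static String read(MappedByteBuffer shared) {
        int length = shared.getInt(0);
        if (length < 0 || length > SIZE - 4) {
            throw new IllegalStateException("corrupt length " + length);
        }
        byte[] bytes = new byte[length];
        ByteBuffer view = shared.duplicate();
        view.position(4);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```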

Arne

Arne Vajhøj

Feb 4, 2012, 6:39:40 PM

On 2/4/2012 3:50 PM, Roedy Green wrote:
> On 3 Feb 2012 19:52:08 GMT, r...@zedat.fu-berlin.de (Stefan Ram) wrote,
> quoted or indirectly quoted someone who said :
>
>> When an X process and a Java process have to exchange
>> information on the same computer, what possibilities are
>> there?
>
> TCP/IP socket.
>
> both talk to SQL database, presume data cached.
>
> both read and write same file on SSD
>
> JNI

If you had bothered to read the entire post, you might have
been able to avoid repeating those already listed in the
post.

Arne

Arne Vajhøj

Feb 4, 2012, 6:42:40 PM

On 2/4/2012 4:22 PM, Roedy Green wrote:
> On Sat, 04 Feb 2012 12:50:15 -0800, Roedy Green
> <see_w...@mindprod.com.invalid> wrote, quoted or indirectly quoted
> someone who said :
>> both read and write same file on SSD
>
> Let's say you used a simple RandomAccessFile. How could you implement
> a busy lock field in the file to indicate the file was busy being
> updated? or busy being read? In RAM you have test and set locks to
> check a value and set the value in one atomic operation. How could
> you simulate that without test and set hardware on the SSD?

java.nio.channels.FileLock with the caveats about what the OS
supports.

> You can't
> very well share a RAM lock between separate jobs.

You can in most OS. It is just not well supported in Java.
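A minimal sketch of the FileLock approach as a non-blocking "test and set": tryLock() either acquires the lock or returns null at once if another process holds it. LockedUpdate and tryWithLock are hypothetical names, and the OS caveat applies: on some systems file locks are only advisory, so every participant has to use them. Within one JVM, a second overlapping lock attempt throws OverlappingFileLockException instead of returning null, so this coordinates processes, not threads.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: run an action only if no other process holds the lock file.
class LockedUpdate {
    /** Runs action under an exclusive lock; returns false at once if another process holds it. */
    static boolean tryWithLock(Path lockFile, Runnable action) throws IOException {
        try (FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock = ch.tryLock(); // exclusive, whole file; null if held elsewhere
            if (lock == null) {
                return false;
            }
            try {
                action.run();
                return true;
            } finally {
                lock.release();
            }
        }
    }
}
```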

Arne

BGB

Feb 6, 2012, 3:48:40 AM

On 2/3/2012 12:52 PM, Stefan Ram wrote:
> »X« below is another language than Java, for example,
> VBA, C#, or C.
>

I am mostly a C developer, so I am writing more from my perspective...

> When an X process and a Java process have to exchange
> information on the same computer, what possibilities are
> there? The Java process should act as a client, sending
> commands to the X process and also wants to read answers
> from the X process. So, the X process is a kind of server.
>
> My criteria are: reliability and it should not be extremely
> slow (say exchanging a string should not take more than
> about 10 ms). The main criterion is reliability.
>
> »Reliability« means little risk of creating problems, little
> risk of failure at run-time. (It might help when the client
> [=Java process] can reset the communication to a known and
> sane start state in case of problems detected at run-time.)
>
> The host OS is Windows, but a portable solution won't hurt.
>
> A list of possibilities I am aware of now:
>
> Pipes
>
> I have no experience with this. I heard one can establish
> a new process »proc« with »exec« and then use
>
> BufferedWriter out = new BufferedWriter(new
> OutputStreamWriter(proc.getOutputStream()));
> BufferedReader in = new BufferedReader(new
> InputStreamReader(proc.getInputStream()));
>

no real comment, as I don't have much experience using pipes on Windows.

> Files

>
> One process writes to the end of a file, the other reads
> from the end of the file? - I never tried this, don't know
> if it is guaranteed to work that one process can detect and
> read, whether the other has just appended something to a file.
>

> What if the processes run very long and the files get too
> large? But OTOH this is very transparent, which makes it easy
> to debug, since one can open the files and directly inspect
> them, or even append commands manually with »copy con file«.
>

IME, I have often seen synchronization issues in these cases. sometimes
the OS will refuse to let multiple programs access the same file at the
same time, but sometimes it does work (I think depending on how the file
is opened and which flags are given and similar).

if just naively using "fopen()" or similar (in C), IME/IIRC, the OS will
typically only allow a single version of the file to be open at once
(not necessarily as limiting as it may seem).

in scenarios where it has worked (multiple versions can be opened), it
often seems like the OS is "lazy": one process will see an out-of-date
version of the file data (the data will often be out-of-date until the
writer closes the file or similar).

I never really felt all that inclined to look into the how/why/when
aspects of all this.

a partial exception is when using shared-memory, which tends to stay
up-to-date.

these issues don't seem to really pop up so much if one passes data in
an "open file, write, close file" or "open file, read, close file"
strategy (then the file is always seen up-to-date, and typically the
chance of clash remains fairly small).

this strategy is arguably not very efficient, but it is fairly simple
and tends to work "well enough" for many use cases (particularly passing
"globs of data once in a great while", or when operating at
"user-interaction" time-frames, such as the file is reloaded, say,
because the user just saved to it).
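A minimal sketch of the "open file, write, close file" strategy in Java, with one addition that is not part of the post: the writer writes to a temporary file in the same directory and then renames it over the target, so a reader never sees a half-written file. WholeFileExchange is a hypothetical name, and whether ATOMIC_MOVE may replace an existing target is implementation-specific (it throws AtomicMoveNotSupportedException if an atomic move is not possible at all).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: whole-file message passing with write-to-temp-then-rename.
class WholeFileExchange {
    /** Writes text to a temp file next to target, then renames it over target in one step. */
    static void publish(Path target, String text) throws IOException {
        Path tmp = Files.createTempFile(target.toAbsolutePath().getParent(), "tmp-", ".part");
        try {
            Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }

    /** Reads the whole file in one go ("open file, read, close file"). */
    static String fetch(Path target) throws IOException {
        return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
    }
}
```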

if done well, this can be used to implement things like a "magic
notepad", whereby data edited/saved in Notepad is automatically
reflected in the running app (say, by polling+"stat()", then processing
the file if it has changed).

conceptually, the latency should only really be limited by polling rate
(although granted polling isn't free, and a process bogging down the
system by polling a file in a tight loop isn't necessarily desirable
either).
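The polling idea might be sketched like this: remember the last-seen modification time and size, and report a change when either differs. ChangePoller is a hypothetical name; the first call always reports a change, and checking the size as well helps when timestamps are coarse. A caller would sleep between polls rather than spin in a tight loop; java.nio.file.WatchService is the JDK's event-based alternative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical sketch: cheap change detection for a file another program saves to.
class ChangePoller {
    private final Path file;
    private FileTime lastTime; // null until the first call
    private long lastSize = -1;

    ChangePoller(Path file) {
        this.file = file;
    }

    /** Returns true if the modification time or size differs from the previous call. */
    boolean changed() throws IOException {
        FileTime time = Files.getLastModifiedTime(file);
        long size = Files.size(file);
        boolean changed = !time.equals(lastTime) || size != lastSize;
        lastTime = time;
        lastSize = size;
        return changed;
    }
}
```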

another advantage of files is that they are more amenable to "makeshift
options" than some of the other strategies (one doesn't really need to
care what apps are thrown in the mix, so long as they can read/write the
files in question).

> Sockets
>
> This is slightly less transparent than files, but has the
> advantage that it becomes very easy to have the two
> processes running on different computers later, if this
> should ever be required. Debugging should be possible
> by a man-in-the-middle proxy that prints all information
> it sees or by connecting to the server with a terminal.
>

I have used sockets for IPC before fairly well.

a minor issue with TCP for IPC, though, is that sometimes the buffering
does something very annoying:
TCP may hold back a small write, rather than sending it immediately, until
more data has been written or earlier data has been acknowledged (Nagle's
algorithm), which can add a noticeable delay. one can disable this (the
TCP_NODELAY option), but unbuffered sockets can be evil on a network if
used naively, such as writing an individual byte or datum at a time rather
than sending the entire message in a single write, since an unbuffered
socket may send a separate packet for *every* write to the socket.

TCP works fairly well for transmitting lots of small messages (and apart
from the potential buffering issue has very little latency).
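In Java, the Nagle behavior is switched off with Socket.setTcpNoDelay(true). A minimal sketch, assuming newline-terminated text messages; LowLatencySocket is a hypothetical name. The point is the pairing: disable the delay once after connecting, then build the complete message and hand it to the socket in a single write, so that disabling Nagle does not turn into one tiny packet per byte.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.net.SocketException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: low-latency small messages over TCP.
class LowLatencySocket {
    /** Call once after connecting: small writes go out immediately instead of being held back. */
    static void configure(Socket socket) throws SocketException {
        socket.setTcpNoDelay(true);
    }

    /** Encodes the complete message first, then writes it in a single call and flushes. */
    static void sendMessage(OutputStream out, String message) throws IOException {
        byte[] frame = (message + "\n").getBytes(StandardCharsets.UTF_8);
        out.write(frame);
        out.flush();
    }
}
```

On the Java side, a "stuck" message is often simply an unflushed BufferedWriter rather than TCP itself, so the flush matters as much as the socket option.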

UDP also has some merit, but it brings the big annoying hassle of having
to pack one's messages into UDP datagrams. on the other hand, UDP is much
more resistant against stalls, which can easily become an issue for TCP
sockets going over the wider internet; however, UDP is unreliable and
unordered, which also needs to be taken into account.

> JNI
>
> JNI might be used to access code written in C or
> ABI-compatible languages. This should be fast, but I heard
> that it is error prone to write JNI code and needs some
> learning (code less maintainable)?
>

JNI can work, but is also annoying in some ways.

if one simply wants to call functions or pass data or messages to/from C
code, it works fairly well. JNI is, however, not readily capable of IPC
AFAIK. it also may result in some level of "physical coupling" between
code in the languages in question (may or may not be desirable, probably
depends on the task, often it is preferable IME to avoid coupling where
possible, even often within code within the same language).

it is also not necessarily all that much more convenient than options
such as sockets (likely depends a lot on the task though, for many, it
may just be easier to write a message parser/dispatcher for whatever
comes over the socket).

jebblue

Feb 7, 2012, 1:11:23 PM

On Fri, 03 Feb 2012 19:52:08 +0000, Stefan Ram wrote:

> »X« below is another language than Java, for example,
> VBA, C#, or C.
>

> When an X process and a Java process have to exchange information on
> the same computer, what possibilities are there? The Java process
> should act as a client, sending commands to the X process and also
> wants to read answers from the X process. So, the X process is a kind
> of server.
>
> My criteria are: reliability and it should not be extremely slow (say
> exchanging a string should not take more than about 10 ms). The main
> criterion is reliability.
>

> Sockets
>
> This is slightly less transparent than files, but has the advantage
> that it becomes very easy to have the two processes running on
> different computers later, if this should ever be required. Debugging
> should be possible by a man-in-the-middle proxy that prints all
> information it sees or by connecting to the server with a terminal.
>

I recommend using sockets.

--
// This is my opinion.

BGB

Feb 7, 2012, 6:38:31 PM

in general, I agree (sockets generally make the most sense), although
there are cases where file-based communication can make sense, though
probably not in the form described in the OP.

another issue (besides how to pass messages), is what sort of form to
pass messages in.

usually, in my case, if storing data in files, I tend to prefer
ASCII-based formats.

usually, for passing messages over sockets, I have used "compact"
specialized binary formats, typically serialized data from some other
form (such as XML nodes or S-Expressions). although "magic byte value"
based message formats are initially simpler, they tend to be harder to
expand later (whereas encoding/decoding some more generic form, though
initially more effort, can turn out to be easier to maintain and extend
later).

note: this does not mean SOAP or CORBA or some other "standardized"
messaging system, rather just that one initially builds and processes
the messages in some form that is more high-level than spitting out
bytes, and processing everything via a loop and a big "switch()" or
similar (although this can be an initially fairly simple option, so has
some merit due to ease of implementation).

the main reason for picking a binary message-serialization format (for
something like S-Expressions or XML nodes), would be mostly if there is
a chance that the serialized data will go over the internet, and a
textual format can be a bit bulkier (and thus slower to transmit over a
slower connection), as well as typically being slower to decode (a
sanely designed message format can be much more quickly unpacked than a
textual format can be parsed).

sending text over sockets may have merits as well, and is generally
preferable for "open" protocols.

or such...

Arved Sandstrom

Feb 7, 2012, 7:26:23 PM

On 12-02-07 07:38 PM, BGB wrote:
> On 2/7/2012 11:11 AM, jebblue wrote:

[ SNIP ]

I've done a fair bit with sockets myself, including recently, in fact
including on a current gig. Some of the message formats have been
designed by others, some by me. A few of them are specialized industry
standards, some are very custom and bespoke.

A few of the formats have been binary: fixed-length blocks of data with
fields at various offsets. Works well enough if it suits the data.

A bunch of others have been text and line-oriented: a fixed number of
lines of data in known order, so that line 10 is always the data for a
particular field.
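Here is a sketch of the first style (fixed-length binary blocks with fields at known offsets) using java.nio.ByteBuffer. The 16-byte layout is made up for illustration. The byte order is pinned explicitly because ByteBuffer defaults to big-endian, while a C struct written on x86 or C#'s BinaryWriter would be little-endian. Whichever order is chosen has to be part of the format definition.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical 16-byte record with fields at fixed offsets:
 * id (int @0), type (short @4), flags (short @6), timestamp (long @8), all big-endian.
 */
final class FixedRecord {
    static final int SIZE = 16;
    static final int OFF_ID = 0, OFF_TYPE = 4, OFF_FLAGS = 6, OFF_TIME = 8;

    final int id;
    final short type;
    final short flags;
    final long timestamp;

    FixedRecord(int id, short type, short flags, long timestamp) {
        this.id = id;
        this.type = type;
        this.flags = flags;
        this.timestamp = timestamp;
    }

    byte[] encode() {
        ByteBuffer b = ByteBuffer.allocate(SIZE).order(ByteOrder.BIG_ENDIAN);
        // absolute puts: each field goes to its documented offset regardless of write order
        b.putInt(OFF_ID, id).putShort(OFF_TYPE, type).putShort(OFF_FLAGS, flags).putLong(OFF_TIME, timestamp);
        return b.array();
    }

    static FixedRecord decode(byte[] block) {
        if (block.length != SIZE) throw new IllegalArgumentException("expected " + SIZE + " bytes, got " + block.length);
        ByteBuffer b = ByteBuffer.wrap(block).order(ByteOrder.BIG_ENDIAN);
        return new FixedRecord(b.getInt(OFF_ID), b.getShort(OFF_TYPE), b.getShort(OFF_FLAGS), b.getLong(OFF_TIME));
    }
}
```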

Other things to consider: JAXB, JSON etc. Minimum coding fuss at the
endpoints if that's what's appropriate for constructing message payloads.

I like text-based protocols, for some simple situations, that behave
like SMTP or POP. But it obviously depends on what you expect your
client and server to do, it's just another approach to be aware of.

One of the big things in designing one's own messaging is error
handling. People generally do just fine with the happy path, but ignore
comprehensive error handling, or get wrapped around the axle trying to
do it.

A lot of situations admit of more than one approach.

Arne Vajhøj

Feb 7, 2012, 8:24:40 PM

On 2/7/2012 6:38 PM, BGB wrote:
> On 2/7/2012 11:11 AM, jebblue wrote:
>> On Fri, 03 Feb 2012 19:52:08 +0000, Stefan Ram wrote:
>>> »X« below is another language than Java, for example,
>>> VBA, C#, or C.
>>>
>>> When an X process and a Java process have to exchange information on
>>> the same computer, what possibilities are there? The Java process
>>> should act as a client, sending commands to the X process and also
>>> wants to read answers from the X process. So, the X process is a kind
>>> of server.
>>>
>>> My criteria are: reliability and it should not be extremely slow (say
>>> exchanging a string should not take more than about 10 ms). The main
>>> criterion is reliability.
>>>
>>
>>> Sockets
>>>
>>> This is slightly less transparent than files, but has the advantage
>>> that it becomes very easy to have the two processes running on
>>> different computers later, if this should ever be required. Debugging
>>> should be possible by a man-in-the-middle proxy that prints all
>>> information it sees or by connecting to the server with a terminal.
>>>
>>
>> I recommend using sockets.
>
> in general, I agree (sockets generally make the most sense),

> another issue (besides how to pass messages), is what sort of form to
> pass messages in.
>
> usually, in my case, if storing data in files, I tend to prefer
> ASCII-based formats.
>
> usually, for passing messages over sockets, I have used "compact"
> specialized binary formats, typically serialized data from some other
> form (such as XML nodes or S-Expressions). although "magic byte value"
> based message formats are initially simpler, they tend to be harder to
> expand later (whereas encoding/decoding some more generic form, though
> initially more effort, can turn out to be easier to maintain and extend
> later).

If you want compact and text go for JSON.

Arne

Martin Gregorie

Feb 7, 2012, 8:31:24 PM

On Tue, 07 Feb 2012 16:38:31 -0700, BGB wrote:

> in general, I agree (sockets generally make the most sense), although
> there are cases where file-based communications can make sense, although
> probably not in the form as described in the OP.
>

Yes, for small amounts of data or message passing between processes I
tend to like sockets - as others have said, the fact that they are
agnostic about the location of the communicating processes is often very
useful.

> usually, for passing messages over sockets, I have used "compact"
> specialized binary formats,
>

Yep. ASN.1 has to be about the most compact way of encoding structured,
multi-field messages with XML occupying the other end of the scale.

That said, for short list-of-fields messages I often use a CSV string
preceded by an unsigned binary byte value containing the string length:
this type of message is easy to transfer, even if the connection
fragments it during transmission, and because the payload is printable
text, it's also convenient for troubleshooting.
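Here is a minimal sketch of that length-prefixed CSV message. It assumes UTF-8 text and naive CSV: fields are joined with commas, with no quoting or escaping. The single unsigned length byte caps a message at 255 bytes. DataInputStream.readFully() keeps reading until the whole payload has arrived, which is what makes fragmentation on the connection harmless.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class CsvFrame {
    /** Writes one message: an unsigned length byte (0..255), then the CSV text in UTF-8. */
    static void write(DataOutputStream out, List<String> fields) throws IOException {
        byte[] payload = String.join(",", fields).getBytes(StandardCharsets.UTF_8);
        if (payload.length > 255) throw new IllegalArgumentException("too long for a 1-byte length: " + payload.length);
        out.writeByte(payload.length);
        out.write(payload);
        out.flush();
    }

    /** Reads one message, blocking until every payload byte has arrived. */
    static List<String> read(DataInputStream in) throws IOException {
        int length = in.readUnsignedByte(); // EOFException if the stream ends between messages
        byte[] payload = new byte[length];
        in.readFully(payload);              // loops over short reads; EOFException if truncated
        return Arrays.asList(new String(payload, StandardCharsets.UTF_8).split(",", -1)); // -1 keeps empty trailing fields
    }
}
```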

--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |

BGB

Feb 8, 2012, 2:55:53 AM

On 2/7/2012 6:31 PM, Martin Gregorie wrote:
> On Tue, 07 Feb 2012 16:38:31 -0700, BGB wrote:
>
>> in general, I agree (sockets generally make the most sense), although
>> there are cases where file-based communications can make sense, although
>> probably not in the form as described in the OP.
>>
> Yes, for small amounts of data or message passing between processes I
> tend to like sockets - as others have said, the fact that they are
> agnostic about the location of the communicating processes is often very
> useful.
>

yep.

>> usually, for passing messages over sockets, I have used "compact"
>> specialized binary formats,
>>
> Yep. ASN.1 has to be about the most compact way of encoding structured,
> multi-field messages with XML occupying the other end of the scale.
>

I disagree partly WRT ASN.1:
a disadvantage of ASN.1 is that a lot of times it tends to use
fixed-width integer encodings (and often sends structures in a
"reasonably raw" form), whereas one can shave more bytes using a
variable-length-integer scheme (why encode an integer in 4 bytes if you
only need 1 byte in a given case?). it is also possible to shave more
bytes if one makes the format use an adaptive/context-sensitive encoding
scheme and maybe a variant of Huffman coding or similar (and possibly
encode integer values using a similar scheme to that used in Deflate).
it is in fact not particularly difficult to outperform ASN.1 in these
regards.

granted, yes, custom Huffman-based data encodings are probably not "the
norm" for network protocols (though some programs, such as the Quake 3
engine, have used Huffman-compressed network protocols).

there is also "arithmetic coding" and "range coding", but with these it
is a lot harder to make the codec be acceptably fast (whereas there are
some tricks to allow optimizing Huffman codecs).

in cases where I have used XML, I have typically used a custom binary
XML variant, which can greatly reduce the overhead vs textual XML. in
terms of saving bytes, my encoding can be more compact than WBXML or
XML+Deflate, but is arguably more "esoteric", and as-is doesn't make use
of schemas (it is instead a basic adaptive coding, and is vaguely
similar to an LZ-Markov coding, attempting to exploit repeating patterns
in tag-structure and similar via prediction, but like most adaptive
codings initially transmits the data in a less dense form as it needs to
build up a new context for each message). the coding in question doesn't
use Huffman coding (for sake of simplicity, and because I don't always
particularly need "maximum compactness"), but a Huffman-based variant
could be created if needed.

there is also EXI, but I don't know how my encoding compares (EXI
probably does better though, given that IIRC it uses binary universal
codes and schemas).

for something else of mine I am using S-Expression based messages
(currently between components within the same process), and had
considered using a vaguely similar binary coding if/when I get around to it.

> That said, for short, list of fields messages I often use a CSV string
> preceded by an unsigned binary byte value containing the string length:
> this type of message is both easy to transfer, even if the connection
> wants to fragment it during transmission, and by having a printable text
> payload, it's also convenient for troubleshooting.
>

yes, this is possible.

another possibility would be a TLV encoding (say, doing something
similar to the Matroska MKV file format).

say, the integer values are encoded something like (range, encoding):
0-127           0xxxxxxx
128-16383       10xxxxxx xxxxxxxx
16384-2097151   110xxxxx xxxxxxxx xxxxxxxx
2097152-...     ...

likewise, one can get a signed variant by folding the sign into the LSB,
forming a pattern like: 0, -1, 1, -2, 2, ...

then, one defines tags as:
{
VLI tag;
VLI length;
byte data[length];
}

where tags can hold either data or messages (and, the smallest tag size
needs 2 bytes, or 3 bytes if one has 1 byte of payload for the tag).
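Here is a sketch of the VLI scheme, the sign folding, and the {tag, length, data} record described above. To stay short, it stops at the 4-byte form (values up to 2^28 - 1). The class and method names are hypothetical. A tag with 1 byte of payload comes out as 3 bytes, which matches the estimate above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Leading 1 bits of the first byte = number of extra bytes (sketch limited to 4-byte forms). */
final class Vli {
    static void write(ByteArrayOutputStream out, int v) {
        if (v < 0) throw new IllegalArgumentException("negative: " + v);
        if (v < 0x80) {                    // 0xxxxxxx
            out.write(v);
        } else if (v < 0x4000) {           // 10xxxxxx xxxxxxxx
            out.write(0x80 | (v >>> 8));
            out.write(v & 0xFF);
        } else if (v < 0x200000) {         // 110xxxxx xxxxxxxx xxxxxxxx
            out.write(0xC0 | (v >>> 16));
            out.write((v >>> 8) & 0xFF);
            out.write(v & 0xFF);
        } else if (v < 0x10000000) {       // 1110xxxx + 3 bytes
            out.write(0xE0 | (v >>> 24));
            out.write((v >>> 16) & 0xFF);
            out.write((v >>> 8) & 0xFF);
            out.write(v & 0xFF);
        } else {
            throw new IllegalArgumentException("too large for this sketch: " + v);
        }
    }

    static int read(ByteBuffer in) {
        int first = in.get() & 0xFF;
        int extra = 0;                     // count leading 1 bits
        while (extra < 4 && (first & (0x80 >>> extra)) != 0) extra++;
        if (extra > 3) throw new IllegalArgumentException("VLI form longer than 4 bytes");
        int v = first & (0x7F >>> extra);  // payload bits left in the first byte
        for (int i = 0; i < extra; i++) v = (v << 8) | (in.get() & 0xFF);
        return v;
    }

    /** Sign folded into the LSB: 0, -1, 1, -2, 2, ... map to 0, 1, 2, 3, 4, ... */
    static int zigzag(int n) { return (n << 1) ^ (n >> 31); }
    static int unzigzag(int z) { return (z >>> 1) ^ -(z & 1); }

    static final class Tag {
        final int tag;
        final byte[] data;
        Tag(int tag, byte[] data) { this.tag = tag; this.data = data; }
    }

    /** One { VLI tag; VLI length; byte data[length]; } record. */
    static void writeTag(ByteArrayOutputStream out, int tag, byte[] data) {
        write(out, tag);
        write(out, data.length);
        out.write(data, 0, data.length);
    }

    static Tag readTag(ByteBuffer in) {
        int tag = read(in);
        byte[] data = new byte[read(in)];
        in.get(data);                      // BufferUnderflowException if the record is truncated
        return new Tag(tag, data);
    }

    static byte[] bytes(int v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(out, v);
        return out.toByteArray();
    }
}
```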

if the length is optional (presence depends on tag), one can reduce the
typical tag size to 1 byte. likewise, tags can be combined with an
MTF/MRU scheme such that any recently used tags have a small value (and
can thus be encoded in a single byte). (many of my formats define tags
inline, rather than relying on some large hard-coded tag-list).

more bytes can be saved if more of the message structure is known, say
that not only does the tag encode a particular tag-type, but also may
carry information about what follows after it (various combinations of
attributes, and if it contains sub-tags and what they might be, ...).

if a new tag is defined, it is added to the MRU, but if not used
frequently may move "backwards" (towards higher index numbers) or
eventually be forgotten (falls off the end of the list).
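Here is one way the MTF/MRU tag list might look, as a sketch. The encoder and decoder each keep an identical list and apply the same move-to-front update for every tag, so the small indices stay in sync without a table ever being transmitted. The capacity, the use of -1 to mean "new tag, name follows inline", and the class name are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical move-to-front list of tag names, kept in lockstep by encoder and decoder. */
final class TagMru {
    private final List<String> names = new ArrayList<>();
    private final int capacity;

    TagMru(int capacity) { this.capacity = capacity; }

    /** Encoder: current index (small if recently used), or -1 if the name must be sent inline. */
    int encode(String name) {
        int index = names.indexOf(name);
        touch(index, name);
        return index;
    }

    /** Decoder: resolve an index read from the stream, applying the same update as the encoder. */
    String decode(int index) {
        String name = names.get(index);
        touch(index, name);
        return name;
    }

    /** Decoder: a tag that arrived inline (the encoder returned -1 for it). */
    void define(String name) { touch(-1, name); }

    private void touch(int index, String name) {
        if (index >= 0) names.remove(index);  // remove(int): by position
        names.add(0, name);                   // most recently used goes to the front
        if (names.size() > capacity) names.remove(names.size() - 1); // rarely used tags fall off the end
    }
}
```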

note that some hard-coded tag-numbers will be needed for basic control
purposes (encoding new/unfamiliar tags, ...).

a Huffman-based variant could be similar, just one may encode integers
differently. an example scheme is to use a prefix value (Huffman coded)
and a suffix bit pattern (similar to Deflate). a simpler (but less
compact) scheme was used in JPEG, and IIRC I had before "compromised"
between them by having the Huffman table be stored using Rice codes.

example (prefix range, value range, suffix bits):
0-15    0-15        0
16-23   16-31       1
24-31   32-63       2
32-39   64-127      3
40-47   128-255     4
48-55   256-511     5
56-63   512-1023    6
64-71   1024-2047   7
72-79   2048-4095   8
80-87   4096-8191   9
...
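In the table, after the 16 direct prefixes, each group of 8 prefix symbols covers twice the value range of the previous group and carries one more suffix bit. Here is a sketch of that mapping. It handles non-negative ints only, and it leaves out the Huffman coding of the prefix symbols themselves. The class name is hypothetical.

```java
/** Maps a non-negative value to (prefix symbol, suffix bit count, suffix bits) and back. */
final class PrefixCode {
    static int suffixBits(int value) {
        if (value < 16) return 0;
        return (31 - Integer.numberOfLeadingZeros(value)) - 3; // floor(log2(value)) - 3
    }

    static int prefix(int value) {
        if (value < 16) return value;
        int bits = suffixBits(value);
        int offset = value - (1 << (bits + 3));                // position within this size class
        return 16 + (bits - 1) * 8 + (offset >>> bits);        // 8 prefixes per size class
    }

    static int suffix(int value) {
        return value & ((1 << suffixBits(value)) - 1);         // low bits, sent raw after the prefix
    }

    static int decode(int prefix, int suffix) {
        if (prefix < 16) return prefix;
        int bits = (prefix - 16) / 8 + 1;
        return (1 << (bits + 3)) + (((prefix - 16) % 8) << bits) + suffix;
    }
}
```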

also note that a nifty thing (also used in Deflate) is to compress the
Huffman table itself using Huffman coding.

likewise, one can save a few bytes if the encoder is smart enough to
recognize when tags encode numeric data (mostly specific to XML, with
S-Expressions or similar one knows when they are dealing with numeric data).

likewise, one can encode floats as a pair of integer values (although
floats present a few of their own complexities). one can also devise
special encodings for things like numeric vectors, quaternions, ... if
needed as well.
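Here is a sketch of the float-as-two-integers idea, assuming IEEE 754 doubles. The value is split into an integer mantissa and a power-of-two exponent. Trailing zero bits are then stripped, so simple values such as 1.5 become small integers (3, -1), which a variable-length integer coding can pack into few bytes. This sketch rejects NaN and infinity, and -0.0 collapses to 0. These are the kinds of complexities a real format would have to decide on.

```java
/** Splits a finite double into integers (mantissa, exponent) with value == mantissa * 2^exponent. */
final class FloatPair {
    static long[] split(double d) {
        if (Double.isNaN(d) || Double.isInfinite(d)) throw new IllegalArgumentException("not finite: " + d);
        if (d == 0.0) return new long[] {0, 0};           // -0.0 also lands here and loses its sign
        int exp = Math.getExponent(d) - 52;                 // shift so the significand becomes an integer
        long mant = (long) Math.scalb(d, -exp);             // exact: at most 53 significant bits
        while ((mant & 1) == 0) {                           // strip trailing zero bits: 1.5 -> (3, -1)
            mant >>= 1;
            exp++;
        }
        return new long[] {mant, exp};
    }

    static double join(long mant, long exp) {
        return Math.scalb((double) mant, (int) exp);        // exact for any pair produced by split()
    }
}
```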

likewise, either an LZ77 or LZ-Markov scheme can be used for encoding
strings (an example would be to use a fixed-size rotating window like
in Deflate, and essentially using the same basic encoding for strings,
albeit likely with the use of an "End-Of-String" marker).

say (range, meaning):
0-255: literal byte values
258: End Of String
259-321: LZ77 Run (encodes length, followed by window offset).

String encoding would be used, say, for encoding both literal text, and
also for escaping things like tag and attribute names.

...

the main variability is mostly in terms of the type of payload being
transmitted:
be it XML-based, S-Expression based, or potentially object-based
(similar to either JSON, or a sort of "heap pickling" style system).

for most structured data, it shouldn't be needed to change the
"fundamentals" too much. the main difference is between tree-structured
and heap-like / graph-structured data, as graph-structured data is often
better sent as a flat list of objects with a certain entry being a "root
node" than as a tree (this can be accomplished either by building a
list, or using an algorithm to detect and break-up cycles when needed).

granted, for most use-cases something like this is likely to be overkill.

or such...

BGB

Feb 8, 2012, 3:41:52 AM

well, text need not be all that limiting.
if one has XML or free-form S-Expressions (in their true sense, like in
Lisp or Scheme, not the mutilated/watered-down Rivest ones), then one
can do a fair amount with text.

IME, there are many tradeoffs (regarding ease of use, ...) between XML
and S-Exps, and neither seems "clearly better" (as far as
representations go, I find S-Exps easier to work with, but namespaces
and attributes in XML can make it more flexible, as one can more easily
throw new tags or attributes at the problem with less chance of breaking
existing code).

an example is this:
<foo> <bar value="3"/> </foo>
and:
(foo (bar 3))

now, consider one wants to add a new field to 'foo' (say 'ln').
<foo ln="15"> <bar value="3"/> </foo>
and:
(foo 15 (bar 3))

a difference here is that existing code will probably not even notice
the new XML attribute, whereas the positional nature of most
S-Expressions makes the latter far more likely to break something (and
there is no good way to "annotate" an S-Exp, whereas with XML it is
fairly solidly defined that one can simply add new attributes).
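To make the attribute argument concrete, here is a sketch of an "old" reader built on the JDK's DOM API (javax.xml.parsers). It only looks for <bar value=...>, so the added ln="15" attribute passes through unnoticed. A reader that takes the second element of (foo ...) positionally would instead now see 15 where it expects (bar 3). The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

final class FooReader {
    /** "Old" reader: knows only <foo><bar value="..."/></foo>; attributes it never asks for are ignored. */
    static int readBarValue(String xml) {
        try {
            Element foo = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Element bar = (Element) foo.getElementsByTagName("bar").item(0);
            if (bar == null) throw new IllegalArgumentException("no <bar> element");
            return Integer.parseInt(bar.getAttribute("value"));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("unreadable message", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readBarValue("<foo> <bar value=\"3\"/> </foo>"));          // 3
        System.out.println(readBarValue("<foo ln=\"15\"> <bar value=\"3\"/> </foo>")); // still 3
    }
}
```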

note: my main way of working with XML is typically via DOM-style
interfaces (if I am using it, it is typically because I am directly
working with the data structure, and not as the result of some dumb-ass
"data binding" crud...).

typically, the "internal representation" and "concrete serialization"
are different:
I may use a textual XML serialization, or just as easily, I could use a
binary format;
likewise for S-Exps (actually, I probably far more often represent
S-Exps as a binary format of one form or another than I use them in a
form externally serialized as text).

all hail the mighty DOM-node or CONS-cell...

> One of the big things in designing one's own messaging is error
> handling. People generally do just fine with the happy path, but ignore
> comprehensive error handling, or get wrapped around the axle trying to
> do it.
>

yeah, but this applies to programming in general, so message-passing is
likely nothing special here. one issue maybe special to sockets though
is the matter of whether or not the whole message has been received,
often resulting in some annoying code to basically read messages from
the socket and not decode them until the entire message has been received.

> A lot of situations admit of more than one approach.
>

agreed.

it is like me and file-formats.
often I just use ASCII text (simple, easy, editable in Notepad or
similar, ...).

I make plenty of use of simple line-oriented text formats as well.

other times, I might use more advanced binary formats, or maybe even
employ the use of "data compression" techniques (such as Huffman
coding), so a lot depends.

Arved Sandstrom

Feb 8, 2012, 6:19:05 AM

If you got the impression from me that text is all that limiting, you
may have misunderstood something I said. :-)

[ SNIP ]

> note: my main way of working with XML is typically via DOM-style
> interfaces (if I am using it, it is typically because I am directly
> working with the data structure, and not as the result of some dumb-ass
> "data binding" crud...).

I haven't been able to completely avoid using the DOM, but I loathe the
API. If I'm using XML at all, and JAXB suits, I'll use JAXB. More
generally I'll use SAX or StAX.

I almost never encounter a situation where DOM is called for, simply
because no random access to the document is called for. When I send XML
back and forth as a payload, the entire thing is meant to be used, and
it makes sense to do the immediate and complete conversion into real
information rather than storing it into an opaque and kludgy DOM
representation.

For a lot of situations, not just message passing between endpoints, I
have backed away from XML anyway. For configuration files I have gotten
newly enthused by .properties files, because so often they fit the bill
much better than XML configuration files. And I mentioned JSON
previously, I prefer that to XML in many situations now.

[ SNIP ]

>> One of the big things in designing one's own messaging is error
>> handling. People generally do just fine with the happy path, but ignore
>> comprehensive error handling, or get wrapped around the axle trying to
>> do it.
>
> yeah, but this applies to programming in general, so message-passing is
> likely nothing special here.

That's true, but it's maybe a bit more of an art form with messages.
Your message producer may be Java and produce beautiful exceptions in
your carefully designed exception hierarchy, but your clients may very
well not be Java at all, in which case you may end up with an error
message sub-protocol that borrows ideas from HTTP status codes.

A lot of Java programmers these days maybe have never really dealt with
return codes, because we sort of tell them not to use them in Java, but
in the case of implementation-neutral status codes (including ones for
errors) that's really the design mindset that you need to be in: status
codes.
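Here is a sketch of what such a status-code sub-protocol could look like on the Java side. It assumes a hypothetical reply line of the form "<3-digit code> <text>" with HTTP-style classes (2xx success, 4xx bad request, 5xx peer failure). The Java client converts failures back into exceptions at its own boundary, while a non-Java peer only ever deals with the numbers.

```java
import java.io.IOException;

/** A reply line of the hypothetical form "<3-digit code> <text>", e.g. "404 unknown command: frob". */
final class Reply {
    final int code;
    final String text;

    private Reply(int code, String text) {
        this.code = code;
        this.text = text;
    }

    static String format(int code, String text) {
        return code + " " + text;
    }

    static Reply parse(String line) {
        if (line == null || line.length() < 3) throw new IllegalArgumentException("malformed reply: " + line);
        int code;
        try {
            code = Integer.parseInt(line.substring(0, 3));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("malformed reply: " + line, e);
        }
        if (code < 100 || code > 599) throw new IllegalArgumentException("status out of range: " + code);
        return new Reply(code, line.length() > 4 ? line.substring(4) : "");
    }

    // Classify by first digit, as HTTP does, so an unfamiliar code in a known class is still handled.
    boolean isSuccess()     { return code / 100 == 2; }
    boolean isClientError() { return code / 100 == 4; } // the request was wrong: don't resend it unchanged
    boolean isServerError() { return code / 100 == 5; } // the peer failed: retrying or resetting may help

    /** Bridge back into Java at the boundary: failure replies become exceptions. */
    Reply orThrow() throws IOException {
        if (!isSuccess()) throw new IOException("peer replied " + code + " " + text);
        return this;
    }
}
```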

> one issue maybe special to sockets though
> is the matter of whether or not the whole message has been received,
> often resulting in some annoying code to basically read messages from
> the socket and not decode them until the entire message has been received.

There is that. Although I find that once you've worked through one or
two socket implementations that you tend to devise some pretty re-usable
code for handling the incomplete message situations.
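Here is one reusable shape for the incomplete-message problem, as a sketch. It accumulates whatever bytes each read() returns and hands out only complete frames. The framing assumed here (a 4-byte big-endian length, then the payload) and the 1 MiB sanity limit are illustrative choices that nobody in the thread specified.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Collects arbitrary chunks from a stream and returns only complete frames (4-byte length + payload). */
final class FrameAccumulator {
    private static final int MAX_FRAME = 1 << 20; // a corrupt length must not trigger a huge allocation
    private byte[] buf = new byte[256];
    private int size = 0;

    /** Feed whatever one read() returned; returns zero or more complete payloads, in arrival order. */
    List<byte[]> feed(byte[] chunk, int off, int len) {
        if (size + len > buf.length) buf = Arrays.copyOf(buf, Math.max(size + len, buf.length * 2));
        System.arraycopy(chunk, off, buf, size, len);
        size += len;

        List<byte[]> frames = new ArrayList<>();
        int pos = 0;
        while (size - pos >= 4) {
            int length = ByteBuffer.wrap(buf, pos, 4).getInt();
            if (length < 0 || length > MAX_FRAME) throw new IllegalStateException("bad frame length " + length);
            if (size - pos - 4 < length) break; // payload incomplete: keep the bytes and wait for more
            frames.add(Arrays.copyOfRange(buf, pos + 4, pos + 4 + length));
            pos += 4 + length;
        }
        System.arraycopy(buf, pos, buf, 0, size - pos); // shift the unconsumed tail to the front
        size -= pos;
        return frames;
    }

    /** Sender side: prefix a payload with its length. */
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length).putInt(payload.length).put(payload).array();
    }
}
```

With a blocking socket, one would call feed(buf, 0, n) after each in.read(buf) that returns n > 0. Nothing in the class depends on where the bytes came from.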
[ SNIP ]

BGB

Feb 8, 2012, 2:07:00 PM

ok.

it came off that you were implying that text only really worked well for
simple protocols, like SMTP, POP, HTTP, ...

>> note: my main way of working with XML is typically via DOM-style
>> interfaces (if I am using it, it is typically because I am directly
>> working with the data structure, and not as the result of some dumb-ass
>> "data binding" crud...).
>
> I haven't been able to completely avoid using the DOM, but I loathe the
> API. If I'm using XML at all, and JAXB suits, I'll use JAXB. More
> generally I'll use SAX or StAX.
>

I have rarely done things for which SAX has made sense...
usually in cases where SAX would make sense, I end up using
line-oriented text formats instead (because there is often little
obvious reason for why XML syntax would make much sense).

> I almost never encounter a situation where DOM is called for, simply
> because no random access to the document is called for. When I send XML
> back and forth as a payload, the entire thing is meant to be used, and
> it makes sense to do the immediate and complete conversion into real
> information rather than storing it into an opaque and kludgy DOM
> representation.
>

often, I use it for things like compiler ASTs, where it competes some
against S-Expressions (they are produced by the main parser, worked on,
and then later converted into bytecode or similar).

typically, one works by walking the tree, and potentially
rebuilding/rewriting a new tree in the process, or maybe adding
annotations to the existing tree.

a recent case where I did consider using XML as a message-passing
protocol, I ended up opting for S-Expressions (or, more properly,
Lisp-style lists) instead, mostly because they are a lot easier to build
and process, and much less painful than working with a DOM-style API
(and also because S-Expressions tend to perform better and use less
memory in my case as well...).

typically, the messages are tree-structured data of some sort (in the
recent example, it was being used for scene-graph delta messages, which
basically update the status of various objects in the scene, as well as
passing other events for things "going on", like sound-effects being
heard, updates to camera location and status, ...).

it is also desirable to keep the serialized representation small, since
a lot may be going on (in real time), and it would be annoying (say, to
players) if the connection got needlessly bogged down sending lots of
overly verbose update messages (more so if one has stuff like
network-synchronized rag-dolls or similar, where a ragdoll may send
position updates for nearly every bone for every frame).

say:
(bonedelta 499 (bone 0 (org ...) (rot ...)) (bone 1 (org ...) (rot ...))
...)
(bonedelta 515 ...)
...

hence, it may make some sense to employ a compressed binary format.
I also personally dislike schemas or similar concepts, as they tend to
make things brittle (both the transmitter and receiver need a correct
and up-to-date schema, creating a higher risk of version issues), and
typically don't really compress all that much better (and are
potentially worse) than what a decent adaptive coding can do.

("on the wire", S-Exps and XML are not all that drastically different,
the main practical differences are more in terms of how one may work
with them in-program).

granted, yes, text+deflate also works OK if one is feeling lazy (since
IME Deflate will typically reduce textual XML or S-Exps to around
10%-25% their original size, vs say a 5%-10% one might get with a
specialized binary format).
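The lazy text+deflate route needs only java.util.zip. Here is a sketch that uses made-up, bonedelta-style messages as sample data. The ratio it prints depends entirely on the input, so it does not measure the percentages quoted above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class TextDeflate {
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) out.write(chunk, 0, deflater.deflate(chunk));
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    static byte[] decompress(byte[] input) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                    throw new DataFormatException("truncated or incomplete deflate stream");
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        StringBuilder sb = new StringBuilder();
        for (int bone = 0; bone < 50; bone++) {
            sb.append("(bonedelta 499 (bone ").append(bone).append(" (org 0.0 1.5 2.25) (rot 0.0 0.0 0.0 1.0)))\n");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println("raw " + raw.length + " bytes, deflated " + packed.length + " bytes");
        System.out.println("round trip ok: " + Arrays.equals(raw, decompress(packed)));
    }
}
```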

there is also the tradeoff of designing a binary format to be standalone
(say, including its own Huffman compressor), or to be used in
combination with deflate (at which point one tries to design the format
to instead produce data which deflate can utilize efficiently).

in the latter option, there is the secondary concern of external deflate
(assuming that the data will probably be sent in a compressed channel or
stored in a ZIP file or similar), or using deflate internally (like in
PNG or similar).

there are many tradeoffs...

> For a lot of situations, not just message passing between endpoints, I
> have backed away from XML anyway. For configuration files I have gotten
> newly enthused by .properties files, because so often they fit the bill
> much better than XML configuration files. And I mentioned JSON
> previously, I prefer that to XML in many situations now.
>

I typically use line-oriented text formats for most of these purposes...

never really did understand why someone would use XML for things like
configuration files (it neither makes them easier to process, nor does
it help anything with users trying to edit them).

as-is, my configuration format consists of "console commands", which may
in turn set "cvars" or issue key-binding commands, ...

for another (more serious) system, I am using a format which is
partially a hybrid of INI and REG files (it is for a registry-like
hierarchical database). I have on/off considered switching to a binary
database format, but never got around to it.

some amount of other data is stored in formats similar to the Quake map
format, or other special-purpose text formats.

> [ SNIP ]
>
>>> One of the big things in designing one's own messaging is error
>>> handling. People generally do just fine with the happy path, but ignore
>>> comprehensive error handling, or get wrapped around the axle trying to
>>> do it.
>>
>> yeah, but this applies to programming in general, so message-passing is
>> likely nothing special here.
>
> That's true, but it's maybe a bit more of an art form with messages.
> Your message producer may be Java and produce beautiful exceptions in
> your carefully designed exception hierarchy, but your clients may very
> well not be Java at all, in which case you may end up with an error
> message sub-protocol that borrows ideas from HTTP status codes.
>
> A lot of Java programmers these days maybe have never really dealt with
> return codes, because we sort of tell them not to use them in Java, but
> in the case of implementation-neutral status codes (including ones for
> errors) that's really the design mindset that you need to be in: status
> codes.
>

granted, I am actually primarily a C and C++ programmer, but
message-passing isn't particularly language-specific. granted, yes, the
lack of "standard" exceptions is an annoyance in C, where typically one
either needs to not use exceptions, or end up using non-portable
exception mechanisms, and there is no particularly good way to "build
one's own", although some people have done some fairly "creative"
things with macros...

>> one issue maybe special to sockets though
>> is the matter of whether or not the whole message has been received,
>> often resulting in some annoying code to basically read messages from
>> the socket and not decode them until the entire message has been received.
>
> There is that. Although I find that once you've worked through one or
> two socket implementations that you tend to devise some pretty re-usable
> code for handling the incomplete message situations.
> [ SNIP ]
>

yep.

one can always tag messages and then prefix them with a length.

{ tag, length, data[length] }
message is then not processed until entire data region is received.
typically, this is plenty sufficient.

likewise, a PPP/HDLC style system (message start/end codes) could also
be used.

depending on other factors, one can also do things like in JPEG or MPEG,
and use a special escape-code for messages and control-codes.

this can allow a top-level message format like:
{ escape-code, tag [ length, data[length] ... ] }

typically, in such cases (I have seen) there have been ways to escape
the escape-code, usually for cases where the escape code appeared
by-chance in the data. this in-turn adds the annoyance of typically
having to escape any escape-codes in the payload data.

some others have partly worked around the above by making the escape
code fairly long (32 or 48 bits or more) and very unlikely to appear by
chance, and likely involving "sanity checks" to try to rule out false
positives.

say: { escape-magic, tag, length, data[length], checksum }
with the assumption that chance is very unlikely to lead to all of:
an escape magic, a valid tag value, a sane length, and a valid checksum.

depending, the escape-magic and tag can be the same value.

for example:
the byte 0x7E is magic;
7E,00 escapes 7E (or maybe 7E,7E)
7E,01 Start Of Message (followed by message data)
7E,02 End Of Message (maybe, followed by checksum)
others: reserved for link-control messages.

then one can pass encoded messages over the link.
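Here is a sketch of the 0x7E scheme. 7E 00 stands for a literal 7E byte, 7E 01 starts a message, and 7E 02 ends it. Other control codes are reserved for link control, and this sketch simply ignores them. The decoder is fed one byte at a time, discards line noise before a start marker, and resynchronizes on any new start marker. The optional checksum is left out. The class names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Byte-stuffed framing: 7E 00 = literal 7E, 7E 01 = start of message, 7E 02 = end of message. */
final class EscapeFraming {
    static final int ESC = 0x7E, LITERAL = 0x00, SOM = 0x01, EOM = 0x02;

    static byte[] encode(byte[] message) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(message.length + 8);
        out.write(ESC);
        out.write(SOM);
        for (byte b : message) {
            out.write(b);                                 // writes the low 8 bits
            if ((b & 0xFF) == ESC) out.write(LITERAL);    // a payload 7E goes out as 7E 00
        }
        out.write(ESC);
        out.write(EOM);
        return out.toByteArray();
    }

    /** Incremental decoder: feed bytes as they arrive; complete messages accumulate in messages. */
    static final class Decoder {
        final List<byte[]> messages = new ArrayList<>();
        private final ByteArrayOutputStream current = new ByteArrayOutputStream();
        private boolean inMessage = false;
        private boolean afterEsc = false;

        void feed(int b) {
            b &= 0xFF;
            if (afterEsc) {
                afterEsc = false;
                switch (b) {
                    case LITERAL:
                        if (inMessage) current.write(ESC);
                        break;
                    case SOM:                             // also resynchronizes after garbage or a lost EOM
                        current.reset();
                        inMessage = true;
                        break;
                    case EOM:
                        if (inMessage) messages.add(current.toByteArray());
                        inMessage = false;
                        break;
                    default:                              // reserved for link-control messages; ignored here
                        break;
                }
            } else if (b == ESC) {
                afterEsc = true;
            } else if (inMessage) {
                current.write(b);
            }
        }
    }
}
```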

typically, I have not tried parsing incomplete messages, as trying to
make a message decoder deal gracefully with truncated data is a bit more
of a hassle.

depending on other factors (say, if one is using Huffman), then one can
also use special markers to transmit the Huffman tables and other things.

say:
7E,03: Stream Reset (possibly followed by a stream/protocol ID magic)
7E,04-07: Huffman Tables 0-3
7E,08: End Of Huffman Table
...

Lew

Feb 8, 2012, 5:02:22 PM

BGB wrote:
> ...

> an example is this:
> <foo> <bar value="3"/> </foo>
> and:
> (foo (bar 3))
>
> now, consider one wants to add a new field to 'foo' (say 'ln').
> <foo ln="15"> <bar value="3"/> </foo>
> and:
> (foo 15 (bar 3))
>
> a difference here is that existing code will probably not even notice
> the new XML attribute, whereas the positional nature of most

Ahem. You mean other than failing schema validation?

> S-Expressions makes the latter far more likely to break something (and

More likely than failing schema validation was for that well-designed XML-based
application?

> there is no good way to "annotate" an S-Exp, whereas with XML it is
> fairly solidly defined that one can simply add new attributes).

Attributes in XML are not annotation (with or without quotes). That role is filled by the actual 'annotation' element
http://www.w3schools.com/schema/el_annotation.asp

> note: my main way of working with XML is typically via DOM-style
> interfaces (if I am using it, it is typically because I am directly
> working with the data structure, and not as the result of some dumb-ass
> "data binding" crud...).

Sorry, "dumb-ass 'data-binding' crud"?

Why the extreme pejoratives? I would not say that there's anything wrong with
XML data-binding /per se/, although as with document-oriented approaches it
can be done very badly.

> typically, the "internal representation" and "concrete serialization"
> are different:

I don't understand what you mean here. You cite these terms in quotes as though
they are a standard terminology for some specific things, but use them in their
ordinary meaning. The internal representation of what? The serialization
("concrete" or otherwise) of what? I don't mean to be obtuse here, but I am not
grokking the referents.

> I may use a textual XML serialization, or just as easily, I could use a
> binary format;
> likewise for S-Exps (actually, I probably far more often represent
> S-Exps as a binary format of one form or another than I use them in a
> form externally serialized as text).
>
> all hail the mighty DOM-node or CONS-cell...

WTF?

--
Lew

BGB

Feb 8, 2012, 8:49:05 PM

On 2/8/2012 3:02 PM, Lew wrote:
> BGB wrote:
>> ...
>> an example is this:
>> <foo> <bar value="3"/> </foo>
>> and:
>> (foo (bar 3))
>>
>> now, consider one wants to add a new field to 'foo' (say 'ln').
>> <foo ln="15"> <bar value="3"/> </foo>
>> and:
>> (foo 15 (bar 3))
>>
>> a difference here is that existing code will probably not even notice
>> the new XML attribute, whereas the positional nature of most
>
> Ahem. You mean other than failing schema validation?
>

many of us don't use schemas with our XML.

I think the issue is that one particular technology, XML, is used in
significantly different ways by different people and for different reasons.

many people use XML for data-binding, and many other people who use it
couldn't care less about data-binding.

some people may use XML for similar purposes to how people using Lisp
would use lists (never-mind if this is kind of awkward, it does work).

like, doing Lisp type stuff in Java using DOM-nodes in place of
cons-based lists... +1 now that Java also (sort of) has closures.

>> S-Expressions makes the latter far more likely to break something (and
>
> More likely than failing schema validation was for that well-designed XML-based
> application?
>

as noted, many people neither use schemas nor any sort of schema
validation. in many use-cases, schemas are overly constraining to the
ability of using XML to represent free-form data, or using them
otherwise would offer little particular advantage.

say, if one is using XML for compiler ASTs or similar (say, the XML is
used to represent a just-parsed glob of source-code), do they really
need any sort of schema?

http://en.wikipedia.org/wiki/Abstract_syntax_tree

>> there is no good way to "annotate" an S-Exp, whereas with XML it is
>> fairly solidly defined that one can simply add new attributes).
>
> Attributes in XML are not annotation (with or without quotes). That role is filled by the actual 'annotation' element
> http://www.w3schools.com/schema/el_annotation.asp
>

they can be used for annotating the nodes in many sane use cases...

a lot depends on how one is using the XML in a given context.

>> note: my main way of working with XML is typically via DOM-style
>> interfaces (if I am using it, it is typically because I am directly
>> working with the data structure, and not as the result of some dumb-ass
>> "data binding" crud...).
>
> Sorry, "dumb-ass 'data-binding' crud"?
>
> Why the extreme pejoratives? I would not say that there's anything wrong with
> XML data-binding /per se/, although as with document-oriented approaches it
> can be done very badly.
>

yeah, this may have been stated overly strongly.

personally, IMO, data-binding is probably one of the worst and
technically most pointless ways of using XML (as it leads to similarly
ill-designed technologies such as SOAP...).

not that data-binding is necessarily pointless in itself, but doing
it via overly verbose, namespace-ridden XML is probably one of the worst
ways of doing it (vs either specialized file formats, or binary
data-binding formats, which IMO should also not be used for data
interchange).

admittedly, I also partly dislike traditional ways of using data-binding
because they often tie things which are theoretically internal to the app,
namely the structural data representation (via classes/...), to things
which should theoretically be isolated from the internal data
representation: file formats.

or, IOW: a file-format (or protocol/...) should express the data in
itself, and not express how it is physically represented within the
application.

likewise, data going into or coming out of a piece of code should
ideally be documented and defined in a form separate from the component
in question.

otherwise, data-binding is not that much different than a more modern
variant of writing raw structures and arrays to files.

>> typically, the "internal representation" and "concrete serialization"
>> are different:
>
> I don't understand what you mean here. You cite these terms in quotes as though
> they are a standard terminology for some specific things, but use them in their
> ordinary meaning. The internal representation of what? The serialization
> ("concrete" or otherwise) of what? I don't mean to be obtuse here, but I am not
> grokking the referents.
>

the internal representation of the data within the application code.

if a file format reflects which objects or classes exist, what sorts of
members they contain, ..., then it is essentially exposing details which
should not be visible, should not be relied upon for data interchange,
and are often not even relevant.

ideally, any data represented externally should be defined in terms of
its semantics: something will be present if it is relevant to the
meaning of the data. the serialization will then be defined in terms of
expressing the structure and semantics of the data, which may bear very
little resemblance to the classes/arrays/whatever used to represent the
data internally within the application.

similarly, file formats should be abstracted from the application code
as much as is reasonably possible, with a "concrete" specification for
the file format or data representation written separately instead.
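A minimal sketch of that separation, assuming a hypothetical Entity class: its in-memory layout (a packed float array, plus a cache id that is meaningless outside the app) is deliberately different from the external format, which is specified purely as named fields with string values. The names Entity, EntityFormat, classname and origin are illustrative only, not taken from any real format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// The external format is specified on its own terms: named fields with
// string values. Nothing in it mirrors Entity's in-memory field layout.
class EntityFormat {
    static Map<String, String> toFields(Entity e) {
        Map<String, String> out = new LinkedHashMap<>();
        out.put("classname", e.className);
        out.put("origin", e.origin[0] + " " + e.origin[1] + " " + e.origin[2]);
        return out;                        // renderCacheId is never written
    }

    static Entity fromFields(Map<String, String> fields) {
        Entity e = new Entity();
        e.className = fields.get("classname");
        String origin = fields.get("origin");
        if (origin != null) {              // an absent field keeps its default
            String[] parts = origin.trim().split("\\s+");
            for (int i = 0; i < 3 && i < parts.length; i++) {
                e.origin[i] = Float.parseFloat(parts[i]);
            }
        }
        return e;
    }

    public static void main(String[] args) {
        Entity door = new Entity();
        door.className = "func_door";
        door.origin[2] = 64;
        System.out.println(toFields(door));
    }
}

// Hypothetical internal representation: the position is packed into an
// array, and the cache id means nothing outside this process.
class Entity {
    String className;
    float[] origin = new float[3];
    int renderCacheId;
}
```

If the class later changes (say, origin becomes three separate fields), only toFields/fromFields need updating; the external format stays the same.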

both XML and S-Expressions can be used as structured ways of
representing semantics, rather than as ways of representing the contents
of a given data object.

>> I may use a textual XML serialization, or just as easily, I could use a
>> binary format;
>> likewise for S-Exps (actually, I probably far more often represent
>> S-Exps as a binary format of one form or another than I use them in a
>> form externally serialized as text).
>>
>> all hail the mighty DOM-node or CONS-cell...
>
> WTF?
>

DOM nodes can be very powerful (and are probably a much better way of
using XML than using it as some sort of data-binding thing).

cons-cells are pairs of dynamically-typed values, typically called "car"
and "cdr" and used to implement lists and similar (and are the main
building block of "everything" in languages like Lisp and Scheme, well,
along with "symbols" and "fixnums" and similar).

http://en.wikipedia.org/wiki/Cons_cell

they can also be implemented in C, C++, and Java without too much
trouble, and can be a fairly useful way of building various sorts of
data structures (although, sadly, they aren't nearly as efficient in
Java as they could be, but OTOH it is also sort of a pain to build a
dynamic type-system in C, so it probably evens out...).

then one can proceed to build logic based mostly on building and
processing lists.
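A minimal cons cell in Java might look like the following sketch: car and cdr are plain Object fields (so the typing is dynamic, checked via casts at run time), and null stands in for the empty list. The class and method names are hypothetical, not from any particular library.

```java
import java.util.function.Function;

// A minimal cons cell: a pair of dynamically-typed values (Object),
// with null standing in for the empty list ("nil").
final class Cons {
    final Object car;
    final Object cdr;

    Cons(Object car, Object cdr) {
        this.car = car;
        this.cdr = cdr;
    }

    // Builds a proper list (a b c ...) out of nested pairs.
    static Cons list(Object... items) {
        Cons result = null;
        for (int i = items.length - 1; i >= 0; i--) {
            result = new Cons(items[i], result);
        }
        return result;
    }

    static int length(Cons xs) {
        int n = 0;
        for (Object p = xs; p instanceof Cons; p = ((Cons) p).cdr) {
            n++;
        }
        return n;
    }

    // Lisp-style map: applies f to each element and conses up a new list.
    static Cons map(Function<Object, Object> f, Cons xs) {
        return xs == null ? null : new Cons(f.apply(xs.car), map(f, (Cons) xs.cdr));
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("(");
        Object p = this;
        while (p instanceof Cons) {
            Cons c = (Cons) p;
            sb.append(c.car);              // nested lists print recursively
            p = c.cdr;
            if (p != null) {
                sb.append(' ');
            }
        }
        if (p != null) {
            sb.append(". ").append(p);     // improper (dotted) tail
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Cons xs = list(1, 2, 3);
        System.out.println(xs + " has length " + length(xs));
        System.out.println(map(x -> (Integer) x * 2, xs));
        // Nesting lists gives a tree, e.g. a tiny syntax tree:
        System.out.println(list("if", list("<", "a", "b"), "a", "b"));
    }
}
```

For example, Cons.map(x -> (Integer) x * 2, Cons.list(1, 2, 3)) prints as (2 4 6).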

or, conceptually, they can be regarded as a type of linked-list-based
container; however, the ways they are traditionally used differ
significantly from traditional ways of using containers (they are
typically used to build tree structures, rather than to store a
collection of items).

it may be worthwhile to look up information regarding Lisp and Scheme
and similar; not that there is necessarily much reason to actually use
the languages, but there are some ideas and ways of doing things which
map fairly nicely onto other, more common, languages.

Arne Vajhøj

Feb 8, 2012, 9:14:31 PM

On 2/8/2012 8:49 PM, BGB wrote:
> as noted, many people neither use schemas nor any sort of schema
> validation. in many use-cases, schemas are overly constraining to the
> ability of using XML to represent free-form data, or using them
> otherwise would offer little particular advantage.

xsd:any does provide some flexibility in schemas.

> say, if one is using XML for compiler ASTs or similar (say, the XML is
> used to represent a just-parsed glob of source-code), do they really
> need any sort of schema?

I would expect syntax trees to follow certain rules and not be free
form.

Arne

Arne Vajhøj

Feb 8, 2012, 9:16:11 PM

On 2/8/2012 2:07 PM, BGB wrote:
> On 2/8/2012 4:19 AM, Arved Sandstrom wrote:
>> On 12-02-08 04:41 AM, BGB wrote:
>>> note: my main way of working with XML is typically via DOM-style
>>> interfaces (if I am using it, it is typically because I am directly
>>> working with the data structure, and not as the result of some dumb-ass
>>> "data binding" crud...).
>>
>> I haven't been able to completely avoid using the DOM, but I loathe the
>> API. If I'm using XML at all, and JAXB suits, I'll use JAXB. More
>> generally I'll use SAX or StAX.
>>
>
> I have rarely done things for which SAX has made sense...
> usually in cases where SAX would make sense, I end up using
> line-oriented text formats instead (because there is often little
> obvious reason for why XML syntax would make much sense).

Non-flat structure and validation come to mind.

Arne

BGB

Feb 8, 2012, 9:50:19 PM

fair enough.

often, one can implement non-flat structures with line-oriented formats,
for example:
...
groupDef {
...
groupDef {
itemDef {
...
}
...
}
...
}

a lot of the time this may be combined with cosmetic indentation, but that
does not change the fact that it is a line-oriented format; for example, writing:
groupDef
{
...
}

could very well break the parser.
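A sketch of such a parser, assuming every block opener is written as "name {" on one line and every closer as a lone "}": it reads one line at a time, checks the tokens, and keeps a stack of open blocks. The Block/LineFormat names and the sample input are hypothetical. It also shows why the brace-on-its-own-line variant breaks: by the time the lone "{" arrives, "groupDef" has already been consumed as an ordinary content line.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class LineFormat {
    // Parses one line at a time; indentation is cosmetic and trimmed away.
    static Block parse(String text) {
        Block root = new Block("<root>");
        Deque<Block> open = new ArrayDeque<>();
        open.push(root);
        int lineNo = 0;
        for (String raw : text.split("\n")) {
            lineNo++;
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] tokens = line.split("\\s+");
            if (line.equals("}")) {
                if (open.size() == 1) {
                    throw new IllegalArgumentException("line " + lineNo + ": unmatched '}'");
                }
                open.pop();
            } else if (tokens.length == 2 && tokens[1].equals("{")) {
                Block b = new Block(tokens[0]);     // "name {" opens a block
                open.peek().children.add(b);
                open.push(b);
            } else if (line.equals("{")) {
                // The name was on the previous line and was already taken
                // as a content line, so there is nothing left to open.
                throw new IllegalArgumentException("line " + lineNo + ": '{' without a name");
            } else {
                open.peek().lines.add(line);        // ordinary content line
            }
        }
        if (open.size() != 1) {
            throw new IllegalArgumentException("unclosed block");
        }
        return root;
    }

    public static void main(String[] args) {
        Block root = parse("groupDef {\n  groupDef {\n    itemDef {\n      x 1\n    }\n  }\n}\n");
        System.out.println(root.children.get(0).children.get(0).children.get(0).name);
    }
}

// Hypothetical tree node: a named block with content lines and sub-blocks.
class Block {
    final String name;
    final List<String> lines = new ArrayList<>();
    final List<Block> children = new ArrayList<>();

    Block(String name) {
        this.name = name;
    }
}
```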

typically, I have not used validation:
if there is anything to validate, this logic will typically be placed in
the code that parses the text.

a lot of the time, code operates under the assumption that nearly anything
which can reasonably be done is de facto valid (the code is, however,
ideally written so as not to do anything compromising).

granted, I don't typically deal a whole lot with anything "security
critical", or where there is much need to worry about "trust" or
"authorization" or similar (or where privacy or money or similar is
involved...). maybe if security were more of a concern, then added
layers of pedantry and validation would make a lot more sense.

in my typical use-cases, the theoretical worst case would probably be if
a 3rd party could somehow break the app, get control of the user's OS
or similar, and cause damage, but again, modern Windows is itself partly
designed to try to defend against this (running applications by default
with constrained privileges, ...).

Lew

Feb 8, 2012, 11:07:54 PM

In one breath we're singing the praises of binary formats, in the next we
complain that XML isn't sufficiently flexible.

"Do they really need any sort of schema?" with XML is usually a "yes".

But only if you're interested in clear, unambiguous, readily-parsable and
maintainable XML document formats.

People often excoriate the supposed verbosity of XML as though it were the only
criterion to measure utility.

There is no inherent advantage of a LISP/list-like format over any other, nor vice versa; it's all accordin'. If the convention is agreeable to all parties,
it will work. If all projects were one-off and isolated from the larger world,
we'd never need to adhere to a standard. If we don't mind inventing our own
tools for anything, we'd never have to adopt a standard with extensive tools
support.

Where are the *real* costs of a software system?

--
Lew

BGB

Feb 8, 2012, 11:10:13 PM

On 2/8/2012 7:14 PM, Arne Vajhøj wrote:
> On 2/8/2012 8:49 PM, BGB wrote:
>> as noted, many people neither use schemas nor any sort of schema
>> validation. in many use-cases, schemas are overly constraining to the
>> ability of using XML to represent free-form data, or using them
>> otherwise would offer little particular advantage.
>
> xsd:any do provide some flexibility in schemas.
>

yep, but one can wonder what the gain is of using a schema if one is
just going to use "xsd:any"?...

it is also unclear how well EXI behaves in this case (admittedly, I
have not personally looked into EXI in depth, as I only briefly skimmed
the spec a long time ago).

>> say, if one is using XML for compiler ASTs or similar (say, the XML is
>> used to represent a just-parsed glob of source-code), do they really
>> need any sort of schema?
>
> I would expect syntax trees to follow certain rules and not be free
> form.
>

well, there are some rules, but the question is more if a schema or the
use of validation would offer much advantage to make using it worth the
bother?...

the other possibility would be to make the next compiler stage, upon
seeing invalid data, give an error message essentially like "what the
hell is this?..." and halt compilation (typically this is what happens
if the compiler logic encounters a node type it doesn't know how to do
anything with in a situation where a known node-type is expected, or if
some required node is absent or similar).

so, one can have a schema to validate, say, that one's "if" node looks like:
<if>
<cond> expr </cond>
<then> statement </then>
<else> statement </else>
</if>
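For comparison, here is a sketch of the schema route using the standard javax.xml.validation API. The XSD is an assumption written for this example: it requires <cond> followed by <then>, treats <else> as optional, and leaves the contents as plain strings. The class name IfSchema is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class IfSchema {
    // Assumed schema: <cond>, then <then>, then an optional <else>.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='if'><xs:complexType><xs:sequence>"
        + "<xs:element name='cond' type='xs:string'/>"
        + "<xs:element name='then' type='xs:string'/>"
        + "<xs:element name='else' type='xs:string' minOccurs='0'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema itself is invalid", e);
        }
    }

    // True if the document satisfies the schema "contract".
    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;                  // malformed, or breaks the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<if><cond>a</cond><then>b</then></if>"));   // true
        System.out.println(isValid("<if><then>b</then></if>"));                 // false
    }
}
```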

but, OTOH, if getting back a null node when looking for "cond" or
"then" causes an internal-error message to be displayed, the effect is the
same. even if the code just ungracefully tries to use the null and
causes the program to crash, it is probably still not a huge loss (apart
from the annoyance that is a crash-prone compiler...).
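The schema-less alternative, sketched with the standard DOM API: a (hypothetical) next compiler stage looks up the children it needs and raises an internal error if a required one is missing, which has much the same effect as a failed validation. The compileIf step here only echoes the pieces it found; real code generation is out of scope.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class IfCompiler {
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Returns the first child element with the given tag, or null.
    static Element child(Element parent, String tag) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && c.getNodeName().equals(tag)) {
                return (Element) c;
            }
        }
        return null;
    }

    // Like child(), but a missing node is reported as an internal error.
    static Element require(Element parent, String tag) {
        Element e = child(parent, tag);
        if (e == null) {
            throw new IllegalStateException(
                    "internal error: <" + parent.getTagName() + "> lacks <" + tag + ">");
        }
        return e;
    }

    // Hypothetical compile step: checks the shape, then echoes the parts.
    static String compileIf(Element ifNode) {
        Element cond = require(ifNode, "cond");
        Element then = require(ifNode, "then");
        Element els = child(ifNode, "else");          // optional
        return "if " + cond.getTextContent() + " then " + then.getTextContent()
                + (els != null ? " else " + els.getTextContent() : "");
    }

    public static void main(String[] args) {
        System.out.println(compileIf(parse("<if><cond>a</cond><then>b</then></if>")));
        try {
            compileIf(parse("<if><then>b</then></if>"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());  // internal error: <if> lacks <cond>
        }
    }
}
```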

I think the original point, though, was more about XML vs S-Expressions
in this case:
XML makes it easier to just stuff in new tags or contents for
existing tag types, if this makes sense (it doesn't necessarily break
existing code or structures, and actually, protocols like XMPP make use
of this property fairly directly). for S-Exps, which are often
essentially positional, this is much less nice, and will often require
more node types to deal with the presence or absence of certain features
(whereas with XML one can use different logic based on whether or not
certain attributes or tags are present or absent).

granted, it does still leave the possibility that one could structure
things more loosely (with S-Exps), say, rather than:
( if /cond/ /then/ /else/ )
one has:
( if (cond /cond/ ) (then /then/ ) (else /else/ ) )
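A sketch of the looser, tagged form: with sub-forms like (cond ...) and (then ...), a reader can look fields up by their head symbol, so their order does not matter and unknown tags are simply skipped. The tiny reader below is an assumption made for illustration. It only handles atoms and parenthesized lists (no strings, quoting, numbers or error recovery), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TaggedSExp {
    // Minimal reader: atoms and parenthesized lists only.
    // "(a (b c))" -> [a, [b, c]] as nested Lists of Strings.
    static Object read(String text) {
        String spaced = text.replace("(", " ( ").replace(")", " ) ").trim();
        List<String> toks = new ArrayList<>();
        for (String t : spaced.split("\\s+")) {
            toks.add(t);
        }
        int[] pos = {0};
        return readForm(toks, pos);
    }

    private static Object readForm(List<String> toks, int[] pos) {
        String t = toks.get(pos[0]++);
        if (!t.equals("(")) {
            return t;                                 // an atom
        }
        List<Object> list = new ArrayList<>();
        while (!toks.get(pos[0]).equals(")")) {
            list.add(readForm(toks, pos));
        }
        pos[0]++;                                     // consume ")"
        return list;
    }

    // Finds the sub-form whose head is `tag`, e.g. (cond x), and returns
    // its argument, or null if absent. Position does not matter and
    // unknown tags are skipped, so older readers tolerate new tags.
    static Object field(List<?> form, String tag) {
        for (int i = 1; i < form.size(); i++) {
            Object item = form.get(i);
            if (item instanceof List) {
                List<?> sub = (List<?>) item;
                if (sub.size() > 1 && tag.equals(sub.get(0))) {
                    return sub.get(1);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<?> form = (List<?>) read("(if (else z) (cond x) (then y) (note hi))");
        System.out.println(field(form, "cond") + " " + field(form, "then") + " " + field(form, "else"));
    }
}
```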

so, gaining a little more flexibility at the cost of a little more
verbosity, which is possibly a reasonable point one could argue (my
client/server frame-delta protocol works more like this, typically using
marker tags before everything in place of lots of fixed argument lists,
although fixed-lists are used in many places as well).

trivia: the frame-delta protocol was originally intended to be
XML-based, but I switched to S-Expressions at the last minute (just
prior to actually implementing it), mostly on the grounds that S-Exps
would be less effort (and I didn't feel like jerking off with the
added awkwardness that using XML would have brought at the moment).

a funny irony would be if someone were to devise some sort of schema
system and use it to try to validate their S-Expressions.

it is still an open question as to which is ultimately "better", as each
has strengths, and it mostly seems to boil down to a tradeoff between
flexibility and ease-of-use.

BGB

Feb 9, 2012, 1:29:45 AM

On 2/8/2012 9:07 PM, Lew wrote:
> On Wednesday, February 8, 2012 6:14:31 PM UTC-8, Arne Vajhøj wrote:
>> On 2/8/2012 8:49 PM, BGB wrote:
>>> as noted, many people neither use schemas nor any sort of schema
>>> validation. in many use-cases, schemas are overly constraining to the
>>> ability of using XML to represent free-form data, or using them
>>> otherwise would offer little particular advantage.
>>
>> xsd:any do provide some flexibility in schemas.
>>
>>> say, if one is using XML for compiler ASTs or similar (say, the XML is
>>> used to represent a just-parsed glob of source-code), do they really
>>> need any sort of schema?
>>
>> I would expect syntax trees to follow certain rules and not be free
>> form.
>
> In one breath we're singing the praises of binary formats, in the next we
> complain that XML isn't sufficiently flexible.
>

it is not like one can't have both:
a format which is at the same time a compressed binary format and
can also retain the full flexibility of representing free-form XML
semantics, ideally without a major drop in compactness (such a drop does
happen with WBXML, and IIRC should also happen with EXI as soon as one
starts encoding nodes which lie outside the schema).

this is partly why I was advocating a sort of pattern-building adaptive
format: it can build the functional analogue of a schema as it encodes
the data, and likewise does not depend on a schema to properly decode
the document. it is mostly a matter of having the format predict when it
doesn't need to specify tag and attribute names (it is otherwise similar
to a traditional data-compressor).

this is functionally similar to the sliding window used in deflate
and LZMA (7zip) and similar (in contrast to codebook-based data
compressors). functionally, though, it would have a little more in common
with LZW+MTF than with LZ77.
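To illustrate the "predict the tag name" idea, here is a sketch of just the move-to-front part. Encoder and decoder keep identical recency lists of tag names, so a name seen before is sent as a small index and only a new name is sent literally. This is a generic MTF sketch under those assumptions, not the SBXE format or any published encoding; a real coder would go on to entropy-code the indices. TagMtf and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Move-to-front coding of tag names. Encoder and decoder each keep a
// recency list and update it identically, so they stay in sync.
class TagMtf {
    private final LinkedList<String> recent = new LinkedList<>();

    // Known name -> Integer position in the recency list; new name -> the
    // literal String (which both sides then add to the front).
    Object encode(String name) {
        int i = recent.indexOf(name);
        if (i >= 0) {
            recent.remove(i);
            recent.addFirst(name);
            return i;
        }
        recent.addFirst(name);
        return name;
    }

    String decode(Object token) {
        String name = (token instanceof Integer)
                ? recent.remove((int) (Integer) token)
                : (String) token;
        recent.addFirst(name);
        return name;
    }

    static List<Object> encodeAll(List<String> names) {
        TagMtf enc = new TagMtf();
        List<Object> out = new ArrayList<>();
        for (String n : names) {
            out.add(enc.encode(n));
        }
        return out;
    }

    static List<String> decodeAll(List<Object> tokens) {
        TagMtf dec = new TagMtf();
        List<String> out = new ArrayList<>();
        for (Object t : tokens) {
            out.add(dec.decode(t));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("brush", "face", "face", "face", "brush");
        List<Object> coded = encodeAll(tags);
        System.out.println(coded);                          // [brush, face, 0, 0, 1]
        System.out.println(decodeAll(coded).equals(tags));  // true
    }
}
```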

granted, potentially a binary format could incorporate both support for
schemas and the use of adaptive compression.

is XML really the text, or is it actually the structure?
I had operated under the premise that it was the data-structure (tags,
attributes, namespaces, ...), which allows for pretty much anything
which can faithfully encode the structure (without imposing too many
arbitrary restrictions).

> "Do they really need any sort of schema?" with XML is usually a "yes".
>
> But only if you're interested in clear, unambiguous, readily-parsable and
> maintainable XML document formats.
>

fair enough. I have mostly been using it "internally", and as noted, for
some of my file formats I had used a custom binary-coded variant, which I
called SBXE (roughly similar to WBXML, but generally more compact and
supporting more features, such as namespaces and similar).
it didn't make use of schemas, and worked by simply encoding the tag
structure into the file and using basic contextual modeling strategies.

it also compared favorably with XML+GZ in my tests (which, IIRC, was also
generally smaller than WBXML). other, more remote, options would be
XML+BZip2 or XML+LZMA.
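A quick way to reproduce an XML+GZ baseline using only java.util.zip.GZIPOutputStream: the sketch below measures the compressed size of a string. The class name GzSize is hypothetical, and the sample document in main() is made up for illustration, so no particular compression ratio should be read into it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class GzSize {
    // Returns the gzip-compressed size in bytes of the given text (UTF-8).
    static int gzippedSize(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size();   // closing the stream above flushed the trailer
    }

    public static void main(String[] args) {
        // Hypothetical sample: many near-identical <face> elements.
        StringBuilder xml = new StringBuilder("<brush>");
        for (int i = 0; i < 200; i++) {
            xml.append("<face plane=\"1 0 0 ").append(i)
               .append("\" texture=\"brick/mybrick\"/>");
        }
        xml.append("</brush>");
        int raw = xml.toString().getBytes(StandardCharsets.UTF_8).length;
        System.out.println("raw=" + raw + " gz=" + gzippedSize(xml.toString()));
    }
}
```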

I had considered the possibility of a more "advanced" format (with more
advanced predictive modeling), but didn't bother (at the time I couldn't
see much point in trying to shave off more bytes, as it was already
working fairly well).

> People often excoriate the supposed verbosity of XML as though it were the only
> criterion to measure utility.
>

well, a lot depends...

for disk files, really, who cares?...
for a link where a several kB message might only take maybe 250-500ms
and is at typical "user-interaction" speeds (say, part of a generic "web
app"), likewise, who cares?...

it may matter a little more in a 3D interactive world where everything
going on in the visible scene has to get through at a 10Hz or 24Hz
clock-tick, and if the connection bogs down the user will be rather
annoyed (as their game world has essentially stalled).

one may have to make do with about 16-24kB/s (or maybe less) to better
ensure a good user experience (there is no guarantee that the user has a
perfect internet connection either).

so, some sort of compression may be needed in this case.
(yes, XML+GZ would probably be sufficient).
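The arithmetic behind that budget, as a small sketch (assuming 1 kB = 1000 bytes and ignoring protocol overhead): dividing the link rate by the tick rate gives the bytes available per tick. TickBudget is a hypothetical name.

```java
class TickBudget {
    // Bytes available per network tick, given a link budget in bytes/second
    // and a tick rate in Hz (integer division rounds down).
    static int bytesPerTick(int bytesPerSecond, int tickHz) {
        if (tickHz <= 0) {
            throw new IllegalArgumentException("tickHz must be positive");
        }
        return bytesPerSecond / tickHz;
    }

    public static void main(String[] args) {
        for (int kbps : new int[] {16, 24}) {        // assumes 1 kB = 1000 bytes
            for (int hz : new int[] {10, 24}) {
                System.out.println(kbps + " kB/s at " + hz + " Hz -> "
                        + bytesPerTick(kbps * 1000, hz) + " bytes/tick");
            }
        }
    }
}
```

At 16 kB/s and 24 Hz that is only about 666 bytes per tick, and even 24 kB/s at 10 Hz gives 2400, so a message of several kB would not fit in a single tick without compression.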

if it were dial-up, probably no one would even consider using XML for
the network protocol in a 3D game.

> There is no inherent advantage of a LISP/list-like format over any other, nor vice versa; it's all accordin'. If the convention is agreeable to all parties,
> it will work. If all projects were one-off and isolated from the larger world,
> we'd never need to adhere to a standard. If we don't mind inventing our own
> tools for anything, we'd never have to adopt a standard with extensive tools
> support.
>

it is possible, it all depends.

a swaying factor in my last choice was the effort tradeoff of writing
the code (because working with DOM is kind of a pain...). IIRC, I may
have also been worrying about performance (mostly passing around lots of
numeric data as ASCII strings, ...).

but, I may eventually need to throw together a basic encoding scheme for
this case (a binary encoder for list-based data), that or just reuse an
existing data serializer of mine (mostly intended for generic data
serialization, which supports lists). it lacks any sort of prediction or
context modeling though, and is used in my stuff mostly as a container
format for bytecode for my VM and similar.

> Where are the *real* costs of a software system?
>

who knows?...

probably delivering the best reasonable user experience?...

for a game:
reasonably good graphics;
reasonably good performance (ideally, consistently over 30fps);
hopefully good gameplay, plot, story, ...

well, that and "getting everything done" (this is the hard one).

Arved Sandstrom

Feb 9, 2012, 5:24:15 AM

[ SNIP ]

No need for the braces; if you're going to use those, all you gain over
the XML is terseness.

Consider line-oriented files/messages like .properties files: these can
describe hierarchical structures perfectly well if you've got an
understood key=value syntax, specifically with a hierarchy-supporting
syntax for the keys. Easy to read and edit, easy to parse.
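To illustrate the hierarchy-supporting key syntax, here is a sketch that loads .properties text with java.util.Properties and splits the dotted keys into a tree. The sample keys follow the log4j 1.x configuration style; the PropTree class is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical tree view of a .properties file: "a.b.c=v" becomes a -> b -> c.
class PropTree {
    final Map<String, PropTree> children = new TreeMap<>();
    String value;   // null for purely structural nodes such as "log4j"

    static PropTree load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        PropTree root = new PropTree();
        for (String key : props.stringPropertyNames()) {
            PropTree node = root;
            for (String part : key.split("\\.")) {
                node = node.children.computeIfAbsent(part, k -> new PropTree());
            }
            node.value = props.getProperty(key);
        }
        return root;
    }

    // Follows a dotted path; returns null if any segment is missing.
    PropTree get(String path) {
        PropTree node = this;
        for (String part : path.split("\\.")) {
            node = node.children.get(part);
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    public static void main(String[] args) {
        PropTree t = load("log4j.rootLogger=DEBUG, A1\n"
                + "log4j.appender.A1=org.apache.log4j.ConsoleAppender\n"
                + "log4j.appender.A1.layout=org.apache.log4j.PatternLayout\n");
        System.out.println(t.get("log4j.appender.A1").value);
        System.out.println(t.get("log4j.appender.A1").children.keySet());   // [layout]
    }
}
```

Note that a node such as log4j.appender.A1 can carry both a value and children, which is how the flat key=value lines end up describing a hierarchy.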

As an example take a look at log4j .properties and XML configuration
files. All you gain with the XML is the ability to validate against a
log4j DTD.

> a lot of times, code operates under the assumption that nearly anything
> which can be reasonably done is valid de-facto (the code is written,
> however, to ideally not do anything compromising).
>
> granted, typically I don't deal a whole lot with anything "security
> critical" or where there is much need to worry about "trust" or
> "authorization" or similar (or if privacy or money or similar was
> involved...). maybe if security were more of a concern, then added
> layers of pedantics and validation would make a lot more sense.
>
> in my typical use-cases, the theoretical worst case would probably be if
> a 3rd party could somehow break the app and get control of the users' OS
> or similar and cause damage, but again, modern Windows is itself partly
> designed to try to defend against this (running applications by default
> with constrained privileges, ...).
>

This is a narrow view of application security. Unless you're writing toy
apps, one would expect that your apps are doing *something*, and that
something includes access to databases or files or other resources.
Furthermore, if your app is used by anyone other than yourself, another
asset is in play, and that's your personal, team's or business's
reputation.

Privacy-sensitive data, or financial data, doesn't have to be involved,
and you don't need the actions of a malicious third party, in order to
have an application security problem. If your code is such that it
corrupts any persistent data, say, or is seriously under-performant
under load, or intermittently breaks and the app has to be re-started,
you've managed to trample all over the Integrity [1] and Availability
security attributes of CAI (Confidentiality, Availability,
Integrity)...all without the help of any malicious external threats.

Do you think your users care who or what mangled part of the
organizational data, or who or what is responsible for 20 percent
downtime? Some of your stakeholders will, sure, when culprits are being
sought, but most of your users will just care about proper function.

All application security starts with good coding. That's why so much of
standards like the Java Secure Coding Guidelines, or OWASP
Development/Code Review/Testing guides, have to do with good coding. And
I don't believe you can really relax your standards with some apps and
have high standards in others.

AHS

1. Strictly speaking not an integrity violation if you can detect the
unintended data corruption, ideally know what caused it, and even better
repair it, but in practice once the damage is done you often
*effectively* can't easily recover; the effort of detecting and fixing
is itself punitive.

BGB

Feb 9, 2012, 11:15:17 AM

well, if the format is still line-oriented, one can still parse the
files using a loop, getting and splitting strings, and checking the
first token of each line.

parsing XML is a little more involved, since:
items may be split across lines, or multiple items may exist on the same
line;
one can no longer use whitespace or commas as the primary delimiter;
...

granted, yes, one can use SAX or similar, but alas...

one can wonder though, what really would be the gain of using XML syntax
in many such cases, vs the typical "relative niceness" of a line
oriented format.

like, say I have a format which looks like:
{
"classname" "func_door"
"angle" "-1"
...
{
[ 1 0 0 16 ] brick/mybrick [ 0 1 0 0 ] [ 0 0 1 0 ]
[ -1 0 0 16 ] brick/mybrick [ 0 1 0 0 ] [ 0 0 1 0 ]
[ 0 1 0 16 ] brick/mybrick [ 1 0 0 0 ] [ 0 0 1 0 ]
[ 0 -1 0 16 ] brick/mybrick [ 1 0 0 0 ] [ 0 0 1 0 ]
[ 0 0 1 16 ] brick/mybrick [ 1 0 0 0 ] [ 0 1 0 0 ]
[ 0 0 -1 16 ] brick/mybrick [ 1 0 0 0 ] [ 0 1 0 0 ]
}
}

would it really look much better as:
<entity>
<field var="classname" value="func_door"/>
<field var="angle" value="-1"/>
...
<brush>
<face plane="1 0 0 16" texture="brick/mybrick" sdir="0 1 0 0" tdir="0 0 1 0"/>
...
</brush>
</entity>

even though the parser is more generic and everything is better labeled,
is it really an improvement WRT, say, readability?...
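As a concrete point of comparison, a face line from the line-oriented version can be handled with a plain whitespace split and a cursor, with no XML parser involved. The sketch below (class and field names are hypothetical, following the attribute names of the XML variant) parses one line and re-emits it as the corresponding <face .../> element.

```java
// Hypothetical reader for one brush-face line of the form
//   [ a b c d ] texture [ s0 s1 s2 s3 ] [ t0 t1 t2 t3 ]
// Field names (plane, texture, sdir, tdir) follow the XML variant.
class Face {
    final double[] plane;
    final String texture;
    final double[] sdir;
    final double[] tdir;

    Face(double[] plane, String texture, double[] sdir, double[] tdir) {
        this.plane = plane;
        this.texture = texture;
        this.sdir = sdir;
        this.tdir = tdir;
    }

    static Face parse(String line) {
        String[] tok = line.trim().split("\\s+");
        int[] pos = {0};                       // cursor into the token array
        double[] plane = vec(tok, pos);
        String texture = tok[pos[0]++];
        double[] sdir = vec(tok, pos);
        double[] tdir = vec(tok, pos);
        return new Face(plane, texture, sdir, tdir);
    }

    // Reads "[ n n n n ]" starting at pos, advancing pos past the "]".
    private static double[] vec(String[] tok, int[] pos) {
        if (!tok[pos[0]++].equals("[")) {
            throw new IllegalArgumentException("expected '['");
        }
        double[] v = new double[4];
        for (int i = 0; i < 4; i++) {
            v[i] = Double.parseDouble(tok[pos[0]++]);
        }
        if (!tok[pos[0]++].equals("]")) {
            throw new IllegalArgumentException("expected ']'");
        }
        return v;
    }

    String toXml() {
        return "<face plane=\"" + fmt(plane) + "\" texture=\"" + texture
                + "\" sdir=\"" + fmt(sdir) + "\" tdir=\"" + fmt(tdir) + "\"/>";
    }

    // Prints whole numbers without a trailing ".0".
    private static String fmt(double[] v) {
        StringBuilder sb = new StringBuilder();
        for (double d : v) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(d == Math.rint(d) ? Long.toString((long) d) : Double.toString(d));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Face f = parse("[ 1 0 0 16 ] brick/mybrick [ 0 1 0 0 ] [ 0 0 1 0 ]");
        System.out.println(f.toXml());
    }
}
```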

> Consider line-oriented files/messages like .properties files: these can
> describe hierarchical structures perfectly well if you've got an
> understood key=value syntax, specifically with a hierarchy-supporting
> syntax for the keys. Easy to read and edit, easy to parse.
>

yes, but this defeats your own prior point, namely indirectly asserting
that line-oriented == flat-structure.

point is, one can have hierarchical line-oriented files.

> As an example take a look at log4j .properties and XML configuration
> files. All you gain with the XML is the ability to validate against a
> log4j DTD.
>
>> a lot of times, code operates under the assumption that nearly anything
>> which can be reasonably done is valid de-facto (the code is written,
>> however, to ideally not do anything compromising).
>>
>> granted, typically I don't deal a whole lot with anything "security
>> critical" or where there is much need to worry about "trust" or
>> "authorization" or similar (or if privacy or money or similar was
>> involved...). maybe if security were more of a concern, then added
>> layers of pedantics and validation would make a lot more sense.
>>
>> in my typical use-cases, the theoretical worst case would probably be if
>> a 3rd party could somehow break the app and get control of the users' OS
>> or similar and cause damage, but again, modern Windows is itself partly
>> designed to try to defend against this (running applications by default
>> with constrained privileges, ...).
>>
> This is a narrow view of application security. Unless you're writing toy
> apps, one would expect that your apps are doing *something*, and that
> something includes access to databases or files or other resources.
> Furthermore, if your app is used by anyone other than yourself, another
> asset is in play, and that's your personal, team's or business's
> reputation.
>

"someone steals the user's save-games!", that would be scary, or not
really...

most of the files in a game are generic resource data, but stealing them
is of little concern, and damaging them is more likely to be an
annoyance than an actual threat "oh crap, I might have to reinstall...".

> Privacy-sensitive data, or financial data, doesn't have to be involved,
> and you don't need the actions of a malicious third party, in order to
> have an application security problem. If your code is such that it
> corrupts any persistent data, say, or is seriously under-performant
> under load, or intermittently breaks and the app has to be re-started,
> you've managed to trample all over the Integrity [1] and Availability
> security attributes of CAI (Confidentiality, Availability,
> Integrity)...all without the help of any malicious external threats.
>

typically, crashes are more an annoyance than a major threat.

consider Skyrim: the damn thing can't usually keep going for more than 1
or 2 hours before crashing-to-desktop or similar.

of course, not everyone aspires towards Bethesda levels of stability.

> Do you think your users care who or what mangled part of the
> organizational data, or who or what is responsible for 20 percent
> downtime? Some of your stakeholders will, sure, when culprits are being
> sought, but most of your users will just care about proper function.
>

only likely matters if it is some sort of server-based or business type app.

ok, a game-server crashing could be a bit annoying if one were making
something like an MMORPG or something (like WoW...).

in my case, I am not:
the online play would likely be more for things like user-run deathmatch
servers and similar.

> All application security starts with good coding. That's why so much of
> standards like the Java Secure Coding Guidelines, or OWASP
> Development/Code Review/Testing guides, have to do with good coding. And
> I don't believe you can really relax your standards with some apps and
> have high standards in another.
>

it is more a matter of productivity:
focus on security, code-quality, ... in places where it is important;
otherwise, whatever one can mash together which basically works is
arguably good enough.

granted, it is not like there aren't some things I care about, like I
prefer clean and nice code over a tangled mess, but ultimately this may
be secondary to the greater concern, "get it done" (as, what good is
good code if the product can never get out the door and on the market?).

it is like with art:
some people can be perfectionist, and worry about tiny details which
hardly anyone would ever notice;
other people can try to make something "good enough" and hope users
don't notice or care about any little graphical imperfections.

> AHS
>
> 1. Strictly speaking not an integrity violation if you can detect the
> unintended data corruption, ideally know what caused it, and even better
> repair it, but in practice once the damage is done you often
> *effectively* can't easily recover; the effort of detecting and fixing
> is itself punitive.

potentially, but it depends on the relative costs.

if the worst case is forcing a reinstall, this is much less of an issue
than, say, if it breaks their savegames, which is much less of an issue
than if any "actually important" data is involved (compromises users'
privacy or security, causes damage to their computer, ...).

say, one doesn't want to have their app be a vector for virus delivery,
as this can give a bad reputation.

but, alas...

Lew

Feb 9, 2012, 12:40:59 PM

BGB wrote:
> Lew wrote:

> > Arne Vajhøj wrote:
> >> BGB wrote:
> >>> as noted, many people neither use schemas nor any sort of schema
> >>> validation. in many use-cases, schemas are overly constraining to the
> >>> ability of using XML to represent free-form data, or using them
> >>> otherwise would offer little particular advantage.
> >>
> >> xsd:any do provide some flexibility in schemas.
> >>
> >>> say, if one is using XML for compiler ASTs or similar (say, the XML is
> >>> used to represent a just-parsed glob of source-code), do they really
> >>> need any sort of schema?
> >>
> >> I would expect syntax trees to follow certain rules and not be free
> >> form.
> >
> > In one breath we're singing the praises of binary formats, in the next we
> > complain that XML isn't sufficiently flexible.
> >
>
> it is not like one can't have both:

XML is much easier to modify and maintain when flexibility is a requirement.

> have a format which is at the same time is a compressed binary format,
> and can also retain the full flexibility of representing free-form XML
> semantics, ideally without a major drop in compactness (this happens
> with WBXML, and IIRC should also happen with EXI about as soon as one
> starts encoding nodes which lie outside the schema).
>
> this is partly why I was advocating a sort of pattern-building adaptive
> format: it can build the functional analogue of a schema as it encodes

That rather defeats the purpose of having a schema.

A schema is a contract that the various processes or other stakeholders use to
guarantee correctness of the XML and guide processing. If you develop it /ad
hoc/ you lose that contract.

> the data, and likewise does not depend on a schema to properly decode
> the document. it is mostly a matter of having the format predict when it
> doesn't need to specify tag and attribute names (it is otherwise similar
> to a traditional data-compressor).

I'm sure that's very clever, but it defeats the purpose of XML schema.

> this is functionally similar to the sliding-window as used in deflate
> and LZMA (7zip) and similar (in contrast to codebook-based data
> compressors). functionally, it would have a little more in common with
> LZW+MTF than with LZ77 though.

... and now you're off on some weird tangential topic.

> granted, potentially a binary format could incorporate both support for
> schemas and the use of adaptive compression.
>
>
> is XML really the text, or is it actually the structure?

Huh?

> I had operated under the premise that it was the data-structure (tags,
> attributes, namespaces, ...), which allows for pretty much anything
> which can faithfully encode the structure (without imposing too many
> arbitrary restrictions).

Huh?

XML is a formal specification for structured documents that is devoid of
semantics.

> > "Do they really need any sort of schema?" with XML is usually a "yes".
> >
> > But only if you're interested in clear, unambiguous, readily-parsable and
> > maintainable XML document formats.
> >
>
> fair enough, I have mostly been using it "internally", and as noted, for
> some of my file-formats, I had used a custom binary coded variant
> (roughly similar to WBXML, but generally more compact and supporting
> more features, such as namespaces and similar, which I had called SBXE).
> it didn't make use of schemas, and worked by simply encoding the tag
> structure into the file, and using basic contextual modeling strategies.

Bully. Good on ye.

> it also compared favorably with XML+GZ in my tests (which IIRC was also
> generally smaller than WBXML). remotely possible would also be XML+BZip2
> or XML+LZMA.

Compared "favorably" according to what criteria?

> I had considered the possibility of a more "advanced" format (with more
> advanced predictive modeling), but didn't bother (couldn't see much
> point at the time of trying to shave off more bytes at the time, as it
> was already working fairly well).

Huh?

> > People often excoriate the supposed verbosity of XML as though it were the only
> > criterion to measure utility.
> >
>
> well, a lot depends...
>
> for disk files, really, who cares?...
> for a link where a several kB message might only take maybe 250-500ms
> and is at typical "user-interaction" speeds (say, part of a generic "web
> app"), likewise, who cares?...
>
>
> it may matter a little more in a 3D interactive world where everything
> going on in the visible scene has to get through at a 10Hz or 24Hz
> clock-tick, and if the connection bogs down the user will be rather
> annoyed (as their game world has essentially stalled).

And that's a use case for XML how, exactly?

Saying "XML is bad because it doesn't keep bananas ripe" would be equally
relevant.

> one may have to make do with about 16-24kB/s (or maybe less) to better
> ensure a good user experience (nothing says that the user has a
> perfect internet connection either).
>
> so, some sort of compression may be needed in this case.
> (yes, XML+GZ would probably be sufficient).

Back in the universe where we're discussing XML's suitability, please.

> if it were dial-up, probably no one would even consider using XML for
> the network protocol in a 3D game.

Oh, you're talking about inter-node communication in a distributed game. Thanks
for finally making that clear. XML would be just fine as a transmission protocol for such a thing. I'm not saying ideal, but just fine.

If you're talking about network protocols you certainly are not talking about
frame-by-frame transmission of data with reply at 10 Hz, no matter what the
protocol, so your entire argument against XML for such a thing is moot.

> > There is no inherent advantage of a LISP/list-like format over any other, nor vice versa; it's all accordin'. If the convention is agreeable to all parties,
> > it will work. If all projects were one-off and isolated from the larger world,
> > we'd never need to adhere to a standard. If we don't mind inventing our own
> > tools for anything, we'd never have to adopt a standard with extensive tools
> > support.
> >
>
> it is possible, it all depends.
>
> a swaying factor in my last choice was the effort tradeoff of writing
> the code (because working with DOM is kind of a pain...). IIRC, I may

Huh? again. There's very little effort in writing XML code, whether DOM, JAXB,
SAX or StAX, given the wide availability of libraries to do so.

> have also been worrying about performance (mostly passing around lots of
> numeric data as ASCII strings, ...).

Based on what measurements?

> but, I may eventually need to throw together a basic encoding scheme for
> this case (a binary encoder for list-based data), that or just reuse an
> existing data serializer of mine (mostly intended for generic data
> serialization, which supports lists). it lacks any sort of prediction or
> context modeling though, and is used in my stuff mostly as a container
> format for bytecode for my VM and similar.
>
>
> > Where are the *real* costs of a software system?
> >
>
> who knows?...

Anyone who thinks about it realistically.

> probably delivering the best reasonable user experience?...

That's not a cost, that's a goal.

> for a game:
> reasonably good graphics;
> reasonably good performance (ideally, consistently over 30fps);
> hopefully good gameplay, plot, story, ...
>
> well, that and "getting everything done" (this is the hard one).

Those aren't costs. Those are goals.

Clear conclusions require clear reasoning on actual facts with relevance.

--
Lew

Arved Sandstrom

Feb 9, 2012, 5:58:24 PM

On 12-02-09 12:15 PM, BGB wrote:
> On 2/9/2012 3:24 AM, Arved Sandstrom wrote:

[ SNIP ]

>
>> Consider line-oriented files/messages like .properties files: these can
>> describe hierarchical structures perfectly well if you've got an
>> understood key=value syntax, specifically with a hierarchy-supporting
>> syntax for the keys. Easy to read and edit, easy to parse.
>
> yes, but this defeats your own prior point, namely indirectly asserting
> that line-oriented == flat-structure.

Minor quibble, I didn't make such a point, not even indirectly. You may
be confusing me with Arne.

> point is, one can have hierarchical line-oriented files.

[ SNIP ]

Yes.
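Arved's point that a .properties file can carry a hierarchy through its key syntax can be sketched like this. It assumes "." as the level separator, which is a convention chosen for illustration and not something .properties itself defines. `PropsTree` is a hypothetical helper; only `java.util.Properties` is a JDK API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

class PropsTree {
    /** Folds dotted keys such as "server.port" into nested, sorted maps. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> toTree(Properties props) {
        Map<String, Object> root = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            String[] parts = key.split("\\.");
            Map<String, Object> node = root;
            // One nested map per key segment except the last, which holds the value.
            // Assumes no key is both a leaf and a prefix of another key ("a=1" and "a.b=2").
            for (int i = 0; i < parts.length - 1; i++) {
                node = (Map<String, Object>) node.computeIfAbsent(
                        parts[i], k -> new TreeMap<String, Object>());
            }
            node.put(parts[parts.length - 1], props.getProperty(key));
        }
        return root;
    }

    static Map<String, Object> parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return toTree(props);
    }

    public static void main(String[] args) {
        System.out.println(parse("server.host=localhost\nserver.port=8080\nlog.level=debug\n"));
        // prints {log={level=debug}, server={host=localhost, port=8080}}
    }
}
```

The parsing stays one line per entry. The tree exists only in how the keys are interpreted.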

AHS

BGB

Feb 9, 2012, 6:15:27 PM

On 2/9/2012 3:58 PM, Arved Sandstrom wrote:
> On 12-02-09 12:15 PM, BGB wrote:
>> On 2/9/2012 3:24 AM, Arved Sandstrom wrote:
> [ SNIP ]
>>
>>> Consider line-oriented files/messages like .properties files: these can
>>> describe hierarchical structures perfectly well if you've got an
>>> understood key=value syntax, specifically with a hierarchy-supporting
>>> syntax for the keys. Easy to read and edit, easy to parse.
>>
>> yes, but this defeats your own prior point, namely indirectly asserting
>> that line-oriented == flat-structure.
>
> Minor quibble, I didn't make such a point, not even indirectly. You may
> be confusing me with Arne.
>

ok. both names started with 'Ar', so I guess I didn't notice the change...

Arne Vajhøj

Feb 9, 2012, 6:45:17 PM

But then the parser becomes more complex than using the
built-in XML parser.

> typically, I have not used validation:
> if there is anything to validate, typically this logic will be placed in
> the logic to parse the text.

But you get it for free with XML - you just need to enable
validation.
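Here is a minimal sketch of what "just enable validation" can look like with the JDK's `javax.xml.validation` API. The schema and the `<move>` message are made up for illustration, and `XsdCheck` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdCheck {
    // Hypothetical schema: a <move> element with two required integer attributes.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "  <xs:element name='move'>"
        + "    <xs:complexType>"
        + "      <xs:attribute name='x' type='xs:int' use='required'/>"
        + "      <xs:attribute name='y' type='xs:int' use='required'/>"
        + "    </xs:complexType>"
        + "  </xs:element>"
        + "</xs:schema>";

    /** True if xml is well-formed and conforms to XSD, false otherwise. */
    static boolean isValid(String xml) {
        try {
            // Compiled per call for brevity; a real program would build the Schema once.
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed document or schema violation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<move x='1' y='2'/>"));   // true
        System.out.println(isValid("<move x='one' y='2'/>")); // false: x is not an int
    }
}
```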

Arne

Arne Vajhøj

Feb 9, 2012, 6:46:46 PM

On 2/9/2012 5:24 AM, Arved Sandstrom wrote:
> As an example take a look at log4j .properties and XML configuration
> files. All you gain with the XML is the ability to validate against a
> log4j DTD.

And problems appending to an existing file ...

Arne

Arne Vajhøj

Feb 9, 2012, 6:48:50 PM

????

No one in their right mind would parse XML manually.

You can pick between lots of nice XML APIs (many of them
shipping with Java) that will handle all that.
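As one example of the XML APIs that ship with the JDK, here is a sketch using StAX (`javax.xml.stream`). The `<scene>`/`<entity>` vocabulary and the `StaxDemo` class are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxDemo {
    /** Returns the "name" attribute of every <entity> element, in document order. */
    static List<String> entityNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (r.hasNext()) {
                    // Pull the next event; only start tags named "entity" matter here.
                    if (r.next() == XMLStreamConstants.START_ELEMENT
                            && r.getLocalName().equals("entity")) {
                        names.add(r.getAttributeValue(null, "name")); // null: any namespace
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(entityNames(
                "<scene><entity name='tree'/><group><entity name='rock'/></group></scene>"));
        // prints [tree, rock]
    }
}
```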

Arne

Arne Vajhøj

Feb 9, 2012, 6:50:32 PM

On 2/9/2012 5:58 PM, Arved Sandstrom wrote:
> On 12-02-09 12:15 PM, BGB wrote:
>> On 2/9/2012 3:24 AM, Arved Sandstrom wrote:
> [ SNIP ]
>>
>>> Consider line-oriented files/messages like .properties files: these can
>>> describe hierarchical structures perfectly well if you've got an
>>> understood key=value syntax, specifically with a hierarchy-supporting
>>> syntax for the keys. Easy to read and edit, easy to parse.
>>
>> yes, but this defeats your own prior point, namely indirectly asserting
>> that line-oriented == flat-structure.
>
> Minor quibble, I didn't make such a point, not even indirectly. You may
> be confusing me with Arne.
>
>> point is, one can have hierarchical line-oriented files.
> [ SNIP ]
>
> Yes.

You can have non-flat structures other than XML, but parsing
quickly becomes very complex.

Arne

Arne Vajhøj

Feb 9, 2012, 6:54:25 PM

On 2/8/2012 11:10 PM, BGB wrote:
> On 2/8/2012 7:14 PM, Arne Vajhøj wrote:
>> On 2/8/2012 8:49 PM, BGB wrote:
>>> as noted, many people neither use schemas nor any sort of schema
>>> validation. in many use-cases, schemas are overly constraining to the
>>> ability of using XML to represent free-form data, or using them
>>> otherwise would offer little particular advantage.
>>
>> xsd:any does provide some flexibility in schemas.
>>
>
> yep, but one can wonder what is the gain of using a schema if one is
> just going to use "xsd:any"?...

You still have some structure.

> it is also a mystery how well EXI behaves in this case (admittedly, I
> have not personally looked into EXI in-depth, as I only briefly skimmed
> over the spec a long time ago).

No idea. But I would assume EXI supports what is valid XML and XSD.

>>> say, if one is using XML for compiler ASTs or similar (say, the XML is
>>> used to represent a just-parsed glob of source-code), do they really
>>> need any sort of schema?
>>
>> I would expect syntax trees to follow certain rules and not be free
>> form.
>>
>
> well, there are some rules, but the question is more if a schema or the
> use of validation would offer much advantage to make using it worth the
> bother?...

Enforcing correctness of data is usually a good idea.

Arne

BGB

Feb 9, 2012, 7:02:45 PM

there is no schema in use in this case, however...

hence, why the format would ideally need to be adaptive:
so one doesn't need a schema for it to work correctly, and also so the
use of free-form data will not hinder compression.

>> the data, and likewise does not depend on a schema to properly decode
>> the document. it is mostly a matter of having the format predict when it
>> doesn't need to specify tag and attribute names (it is otherwise similar
>> to a traditional data-compressor).
>
> I'm sure that's very clever, but it defeats the purpose of XML schema.
>

which is not being used in this case to begin with...

>> this is functionally similar to the sliding-window as used in deflate
>> and LZMA (7zip) and similar (in contrast to codebook-based data
>> compressors). functionally, it would have a little more in common with
>> LZW+MTF than with LZ77 though.
>
> ... and now you're off on some weird tangential topic.
>

data compression...
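The move-to-front half of "LZW+MTF" can be illustrated on a stream of tag names. This is a hypothetical sketch of the general idea and not BGB's SBXE format. Tag names that were used recently come out as small numbers, and a new name is sent once as a literal. Codes are kept as strings only to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class TagMtf {
    /**
     * Move-to-front coding of a tag-name sequence: a name seen before is replaced by
     * its position in a recency list (recently used tags get small numbers); a new
     * name is written as the escape "-1" followed by the literal name.
     */
    static List<String> encode(List<String> tags) {
        LinkedList<String> recent = new LinkedList<>();
        List<String> out = new ArrayList<>();
        for (String tag : tags) {
            int idx = recent.indexOf(tag);
            if (idx >= 0) {
                out.add(Integer.toString(idx));
                recent.remove(idx); // remove by index; re-added at the front below
            } else {
                out.add("-1");
                out.add(tag);
            }
            recent.addFirst(tag);
        }
        return out;
    }

    /** Inverse of encode: replays the same recency-list updates, so no schema is needed. */
    static List<String> decode(List<String> codes) {
        LinkedList<String> recent = new LinkedList<>();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < codes.size(); i++) {
            int idx = Integer.parseInt(codes.get(i));
            String tag = (idx < 0) ? codes.get(++i) : recent.remove(idx);
            recent.addFirst(tag);
            out.add(tag);
        }
        return out;
    }
}
```

For the tag sequence a, b, a, a, b this yields -1 a -1 b 1 0 1. Both ends adapt to the data as it arrives, which is the "no schema required" property described above.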

>> granted, potentially a binary format could incorporate both support for
>> schemas and the use of adaptive compression.
>>
>>
>> is XML really the text, or is it actually the structure?
>
> Huh?
>

as I see it, XML exists at 2 levels:
as a textual syntax;
as a semantic (a tree of tags with attributes and so on) structure,
which can be expressed via the textual syntax.

conceptually, the semantic structure of XML is more or less equivalent
to a tree of DOM nodes.

>> I had operated under the premise that it was the data-structure (tags,
>> attributes, namespaces, ...), which allows for pretty much anything
>> which can faithfully encode the structure (without imposing too many
>> arbitrary restrictions).
>
> Huh?
>
> XML is a formal specification for structured documents that is devoid of
> semantics.
>

the existence of the tags and attributes is the semantics...

>>> "Do they really need any sort of schema?" with XML is usually a "yes".
>>>
>>> But only if you're interested in clear, unambiguous, readily-parsable and
>>> maintainable XML document formats.
>>>
>>
>> fair enough, I have mostly been using it "internally", and as noted, for
>> some of my file-formats, I had used a custom binary coded variant
>> (roughly similar to WBXML, but generally more compact and supporting
>> more features, such as namespaces and similar, which I had called SBXE).
>> it didn't make use of schemas, and worked by simply encoding the tag
>> structure into the file, and using basic contextual modeling strategies.
>
> Bully. Good on ye.
>
>> it also compared favorably with XML+GZ in my tests (which IIRC was also
>> generally smaller than WBXML). remotely possible would also be XML+BZip2
>> or XML+LZMA.
>
> Compared "favorably" according to what criteria?
>

smaller output size.

I was testing each scenario, and comparing the output sizes.
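A size comparison of this kind (raw XML versus XML+GZ) takes only a few lines with `java.util.zip`. For scale, the 16-24 kB/s budget mentioned in this thread works out to about 1.6-2.4 kB per update at 10 Hz. The message shape below is invented, and real ratios depend entirely on the actual data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class GzSize {
    /** Gzips data in memory and returns the compressed bytes. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray(); // read after close(), so the gzip trailer is included
    }

    public static void main(String[] args) {
        // Hypothetical scene update: 200 entities with repetitive tag/attribute names.
        StringBuilder sb = new StringBuilder("<update>");
        for (int i = 0; i < 200; i++) {
            sb.append("<entity id='").append(i).append("' x='").append(i * 1.5)
              .append("' y='0.0' z='").append(-i).append("'/>");
        }
        sb.append("</update>");
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println("xml: " + raw.length + " bytes, xml+gz: " + gzip(raw).length + " bytes");
    }
}
```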

>> I had considered the possibility of a more "advanced" format (with more
>> advanced predictive modeling), but didn't bother (at the time I couldn't
>> see much point in trying to shave off more bytes, as it was already
>> working fairly well).
>
> Huh?
>
>>> People often excoriate the supposed verbosity of XML as though it were the only
>>> criterion to measure utility.
>>>
>>
>> well, a lot depends...
>>
>> for disk files, really, who cares?...
>> for a link where a several kB message might only take maybe 250-500ms
>> and is at typical "user-interaction" speeds (say, part of a generic "web
>> app"), likewise, who cares?...
>>
>>
>> it may matter a little more in a 3D interactive world where everything
>> going on in the visible scene has to get through at a 10Hz or 24Hz
>> clock-tick, and if the connection bogs down the user will be rather
>> annoyed (as their game world has essentially stalled).
>
> And that's a use case for XML how, exactly?
>
> Saying "XML is bad because it doesn't keep bananas ripe" would be equally
> relevant.
>

because one can use XML as the client/server messaging protocol, say, in
place of "well, I am going to send message tags as raw bytes and have
each followed by some values...".

>> one may have to make do with about 16-24kB/s (or maybe less) to better
>> ensure a good user experience (nothing says that the user has a
>> perfect internet connection either).
>>
>> so, some sort of compression may be needed in this case.
>> (yes, XML+GZ would probably be sufficient).
>
> Back in the universe where we're discussing XML's suitability, please.
>
>> if it were dial-up, probably no one would even consider using XML for
>> the network protocol in a 3D game.
>
> Oh, you're talking about inter-node communication in a distributed game. Thanks
> for finally making that clear. XML would be just fine as a transmission protocol for such a thing. I'm not saying ideal, but just fine.
>
> If you're talking about network protocols you certainly are not talking about
> frame-by-frame transmission of data with reply at 10 Hz, no matter what the
> protocol, so your entire argument against XML for such a thing is moot.
>

both ends transmit concurrently and asynchronously, but one needs to get
the messages through at roughly 10Hz for things to remain playable
(otherwise, real-time interactivity starts to fall apart).

so, the server sends a 10Hz stream of updates to the client;
the client sends a 10Hz stream of movement impulses back to the server;
...

granted, ping time is a bit of an issue, as what the player is trying
to do and what is going on at the server's end will invariably drift
apart somewhat.

typically, things like linear extrapolation and similar are used to try
to make up for sub-optimal ping (ideally, one tries to hide the results
of the ping time, where possible).
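Here is a sketch of linear extrapolation (often called dead reckoning). It assumes each server update carries a timestamp and a 2D position, and the class and field names are illustrative only. Real engines usually also cap how far ahead they extrapolate and blend toward new snapshots as they arrive. This sketch leaves both out.

```java
class Extrapolate {
    /** One authoritative snapshot of an entity's position received from the server. */
    static final class Snapshot {
        final double time; // seconds
        final double x, y;

        Snapshot(double time, double x, double y) {
            this.time = time;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Estimates the position at 'now' from the two most recent snapshots,
     * assuming constant velocity since the last one.
     */
    static double[] positionAt(Snapshot prev, Snapshot last, double now) {
        double dt = last.time - prev.time;
        if (dt <= 0) {
            return new double[] {last.x, last.y}; // no usable velocity: hold last position
        }
        double vx = (last.x - prev.x) / dt;
        double vy = (last.y - prev.y) / dt;
        double ahead = now - last.time; // time elapsed since the newest snapshot
        return new double[] {last.x + vx * ahead, last.y + vy * ahead};
    }
}
```

With snapshots at t=0.0 (0, 0) and t=0.5 (1, 2), the estimate at t=0.75 is (1.5, 3.0).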

but, it does all work, as evidenced by the prevalence of online gaming
and similar.

I doubt anyone uses ping/pong or request/response based protocols for
this, as the ping times over the internet would likely render something
like this unusable (one would probably need to be on a LAN or something...).

>>> There is no inherent advantage of a LISP/list-like format over any other, nor vice versa; it's all accordin'. If the convention is agreeable to all parties,
>>> it will work. If all projects were one-off and isolated from the larger world,
>>> we'd never need to adhere to a standard. If we don't mind inventing our own
>>> tools for anything, we'd never have to adopt a standard with extensive tools
>>> support.
>>>
>>
>> it is possible, it all depends.
>>
>> a swaying factor in my last choice was the effort tradeoff of writing
>> the code (because working with DOM is kind of a pain...). IIRC, I may
>
> Huh? again. There's very little effort in writing XML code, whether DOM, JAXB,
> SAX or StAX, given the wide availability of libraries to do so.
>

well, the issue is that one needs a method call every time they want to
fetch an attribute's value or look up a node, which is a little more
painful than it could be.

it isn't like major pain or anything, but it does tend to result in
slightly longer and more awkward code.

with lists, traditionally there are operations like:
"cadr", "caddr", "cadddr", ..., "caadr", ..., and so on, which make it a
bit easier (and more compact) to reference particular items within a
list (since the operations essentially encode where to fetch the item from).

OTOH, with DOM one might end up with a chain of several statements to
access an item, and yet more statements if one is checking for null, so
it is just a little more awkward and verbose to work with, but granted
it is not like the difference is all that huge (IIRC, the performance
concern may well have been a bigger factor).
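Here is the difference described above, sketched in Java. On one side are list accessors named after Lisp's cadr/caddr. On the other is a DOM lookup that checks for null at each step. Both helpers and the `<scene>`/`<entity>`/`<pos>` layout are hypothetical. The DOM calls themselves are standard `org.w3c.dom`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ListVsDom {
    // Lisp-style accessors over a list representation.
    static Object cadr(List<?> l)  { return l.get(1); } // second element
    static Object caddr(List<?> l) { return l.get(2); } // third element

    /** DOM: the "x" attribute of <pos> inside the first <entity>, or null if absent. */
    static String entityX(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Element entity = firstChild(root, "entity");
            if (entity == null) return null;
            Element pos = firstChild(entity, "pos");
            if (pos == null) return null;
            return pos.hasAttribute("x") ? pos.getAttribute("x") : null;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** First direct child element with the given tag name, or null. */
    static Element firstChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && ((Element) n).getTagName().equals(name)) {
                return (Element) n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The same data as a nested list: (entity (pos 1.5 0 -2))
        List<Object> entity = Arrays.asList("entity", Arrays.asList("pos", 1.5, 0, -2));
        System.out.println(cadr((List<?>) cadr(entity))); // 1.5
        System.out.println(entityX("<scene><entity><pos x='1.5' y='0' z='-2'/></entity></scene>"));
    }
}
```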

>> have also been worrying about performance (mostly passing around lots of
>> numeric data as ASCII strings, ...).
>
> Based on what measurements?
>

this case was based mostly on speculation that if I am creating piles of
new strings to pass numbers around, and I am passing a scene-graph
update on a 10Hz basis, most of which will become garbage immediately
afterwards, then creating all of those strings could get a little
expensive (mostly causing the garbage collector to start "doing its
thing" and reduce performance and similar).

I chose another option (namely lists), which allowed passing
the numbers without allocating any memory on the heap.
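Java has no direct counterpart to BGB's list representation. One way to send numbers without building a String for each value is to write them as binary into a reused `java.nio.ByteBuffer`. This technique is swapped in for illustration and is not what BGB's code does. The record layout (a count, then id/x/y/z per entity) and the 64 KiB buffer size are assumptions.

```java
import java.nio.ByteBuffer;

class ScenePacker {
    // One buffer reused for every update, so packing allocates nothing per message.
    // Assumed to be large enough; too many entities would throw BufferOverflowException.
    private final ByteBuffer buf = ByteBuffer.allocate(64 * 1024);

    /**
     * Packs (id, x, y, z) records as raw binary values. xyz holds three floats per id.
     * Returns the shared buffer flipped for reading; consume it before the next call.
     */
    ByteBuffer pack(int[] ids, float[] xyz) {
        buf.clear();
        buf.putInt(ids.length);
        for (int i = 0; i < ids.length; i++) {
            buf.putInt(ids[i]);
            buf.putFloat(xyz[3 * i]);
            buf.putFloat(xyz[3 * i + 1]);
            buf.putFloat(xyz[3 * i + 2]);
        }
        buf.flip();
        return buf;
    }
}
```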

>> but, I may eventually need to throw together a basic encoding scheme for
>> this case (a binary encoder for list-based data), that or just reuse an
>> existing data serializer of mine (mostly intended for generic data
>> serialization, which supports lists). it lacks any sort of prediction or
>> context modeling though, and is used in my stuff mostly as a container
>> format for bytecode for my VM and similar.
>>
>>
>>> Where are the *real* costs of a software system?
>>>
>>
>> who knows?...
>
> Anyone who thinks about it realistically.
>
>> probably delivering the best reasonable user experience?...
>
> That's not a cost, that's a goal.
>
>> for a game:
>> reasonably good graphics;
>> reasonably good performance (ideally, consistently over 30fps);
>> hopefully good gameplay, plot, story, ...
>>
>> well, that and "getting everything done" (this is the hard one).
>
> Those aren't costs. Those are goals.
>

not effectively achieving a goal is a cost...

> Clear conclusions require clear reasoning on actual facts with relevance.
>

dunno.

a lot of time one works based on "the feel of the code" or "the feel of
the problem" or similar (if one "feels" that an option will lead to
suck, it often does lead to suck). one doesn't necessarily know what the
reasoning is, one can just follow along where it leads (it can be almost
like that of a physical sensation or similar, like "what does it feel
like the code wants to do here?").

also, estimating things based on past experiences and known behaviors
and "rules of thumb" and so on.

if one knows what something does, one can make an educated guess for
what it will do in a given situation.

everything else becomes mostly likelihoods and probabilities (like, how
likely is a good outcome, vs a sucky outcome, ...).

or such...

BGB

Feb 9, 2012, 11:40:18 PM

IMO: not particularly...

tree-structured line-oriented text formats are not that much more
complicated to parse than flat-list line-oriented text formats (the only
obvious difference is that rather than a single loop, one typically has
multiple loops).

it all still basically amounts to:
read a line;
split the string;
do something with the split strings.
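The "read a line, split, do something" loop can be extended to a tree. This sketch assumes a made-up format in which a line ending in "{" opens a nested block and a line containing only "}" closes it. Recursion gives each nesting level its own loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class BlockParser {
    /** A node: its line split into words, plus any nested child nodes. */
    static final class Node {
        final List<String> words;
        final List<Node> children = new ArrayList<>();

        Node(List<String> words) {
            this.words = words;
        }
    }

    static List<Node> parse(String text) {
        Iterator<String> lines = Arrays.asList(text.split("\n")).iterator();
        return parseBlock(lines);
    }

    /** Reads lines until a closing "}" or end of input; a line ending in "{" recurses. */
    private static List<Node> parseBlock(Iterator<String> lines) {
        List<Node> nodes = new ArrayList<>();
        while (lines.hasNext()) {
            String line = lines.next().trim();                      // read a line
            if (line.isEmpty()) continue;
            if (line.equals("}")) break;                            // end of this block
            boolean opens = line.endsWith("{");
            if (opens) line = line.substring(0, line.length() - 1).trim();
            Node node = new Node(Arrays.asList(line.split("\\s+"))); // split the string
            if (opens) node.children.addAll(parseBlock(lines));     // loop for the children
            nodes.add(node);                                        // do something with it
        }
        return nodes;
    }
}
```

For input such as "entity {", "pos 1 2 3", "model tree.obj", "}", "light 0 10 0", this produces two top-level nodes, and the first one has two children.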

the main hassle IMO is when one needs to go from line-oriented to
token-oriented formats (say, writing a parser for something like
S-Expressions or XML). then one will typically need to buffer in the
whole file and read out a token at a time.
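Here is a sketch of the token-at-a-time approach for S-expressions. The whole input is held in memory, `nextToken()` returns "(", ")" or an atom, and `read()` builds nested lists. To keep it short it leaves out strings, quoting and comments. The class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SexprReader {
    private final String src; // whole input buffered in memory
    private int pos = 0;

    SexprReader(String src) {
        this.src = src;
    }

    /** Returns the next token: "(", ")", or an atom; null at end of input. */
    String nextToken() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
        if (pos >= src.length()) return null;
        char c = src.charAt(pos);
        if (c == '(' || c == ')') {
            pos++;
            return String.valueOf(c);
        }
        int start = pos;
        while (pos < src.length()) {
            char d = src.charAt(pos);
            if (Character.isWhitespace(d) || d == '(' || d == ')') break;
            pos++;
        }
        return src.substring(start, pos);
    }

    /** Reads one expression: an atom (String) or a List of expressions. */
    Object read() {
        String tok = nextToken();
        if (tok == null) throw new IllegalStateException("unexpected end of input");
        if (tok.equals(")")) throw new IllegalStateException("unexpected ')'");
        if (!tok.equals("(")) return tok;
        List<Object> list = new ArrayList<>();
        while (true) {
            int mark = pos;
            String t = nextToken();
            if (t == null) throw new IllegalStateException("missing ')'");
            if (t.equals(")")) return list;
            pos = mark; // push the token back, then read a full sub-expression
            list.add(read());
        }
    }
}
```

For example, `new SexprReader("(define (sq x) (* x x))").read()` yields the nested list [define, [sq, x], [*, x, x]].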

the next big complexity jump is when one starts dealing with things like
operator precedence, say, when parsing typical programming-language
style syntax (say, Java- or JavaScript-like).

after that, is probably context-sensitive declarations (namely, like in
C or C++), at which point parsing starts becoming a good deal more of a
PITA (writing a C++ parser is probably "non-trivial", since many cases
are ambiguous and one can often only know the correct way to parse
something by matching it against prior declarations).

personally, I like hand-written recursive-descent parsing, as it is
fairly straightforward and doesn't depend on external tools.
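This small hand-written recursive-descent parser shows how operator precedence falls out of the grammar. There is one method per precedence level, and each looser level calls the tighter one. It evaluates integer + - * / with parentheses. To keep it short, whitespace is simply stripped and unary minus is not supported.

```java
class ExprParser {
    private final String s;
    private int pos = 0;

    private ExprParser(String s) {
        this.s = s.replace(" ", "");
    }

    static long eval(String text) {
        ExprParser p = new ExprParser(text);
        long v = p.expr();
        if (p.pos != p.s.length()) throw new IllegalArgumentException("trailing input at " + p.pos);
        return v;
    }

    // expr := term (('+' | '-') term)*      lowest precedence, left-associative
    private long expr() {
        long v = term();
        while (pos < s.length() && (s.charAt(pos) == '+' || s.charAt(pos) == '-')) {
            char op = s.charAt(pos++);
            long rhs = term();
            v = (op == '+') ? v + rhs : v - rhs;
        }
        return v;
    }

    // term := factor (('*' | '/') factor)*   binds tighter than + and -
    private long term() {
        long v = factor();
        while (pos < s.length() && (s.charAt(pos) == '*' || s.charAt(pos) == '/')) {
            char op = s.charAt(pos++);
            long rhs = factor();
            v = (op == '*') ? v * rhs : v / rhs;
        }
        return v;
    }

    // factor := number | '(' expr ')'
    private long factor() {
        if (pos < s.length() && s.charAt(pos) == '(') {
            pos++;
            long v = expr();
            if (pos >= s.length() || s.charAt(pos) != ')') {
                throw new IllegalArgumentException("expected ')' at " + pos);
            }
            pos++;
            return v;
        }
        int start = pos;
        while (pos < s.length() && Character.isDigit(s.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("expected number at " + pos);
        return Long.parseLong(s.substring(start, pos));
    }
}
```

`eval("2+3*4")` gives 14 and `eval("(2+3)*4")` gives 20. Because the loops fold to the left, `eval("10-4-3")` gives 3.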

BGB

Feb 9, 2012, 11:46:18 PM

depends on which language one is using at the time...

if one is using Java, then XML parsing is basically free.
if one is using C, then it is either "write some code to do it", or
suffer with a 3rd party library dependency (one might validly choose to
write the code themselves in this case).

I don't expect it is all that uncommon for a person to switch between
several different languages, and maybe deal with the strengths and
weaknesses of whichever language they are using at the time.

Lew

Feb 10, 2012, 11:51:10 AM

BGB wrote:

> Arne Vajhøj wrote:
> > ????
> >
> > No one in their right mind would parse XML manually.
> >
> > You can pick between lots of nice XML API's (many of them
> > shipping with Java) that will handle all that.
> >
>
> depends on which language one is using at the time...
>
> if one is using Java, then XML parsing is basically free.

This /is/ a Java newsgroup, as you might have noticed.

> if one is using C, then it is either "write some code to do it", or
> suffer with a 3rd party [sic] library dependency (one might validly choose to
> write the code themselves in this case).

"Suffer"? The XML parsers for C are well-established, very reliable, and no
cause for suffering. Using a pejorative is not the same as establishing a
point.

There is nothing wrong with the third-party libraries, and the choice to
roll your own for C is rarely valid. You seem to suffer from NIH syndrome.

> I don't expect it is all that uncommon for a person to switch between
> several different languages, and maybe deal with the strengths and
> weaknesses of whichever language they are using at the time.

Not usually in the same program. Your expectation lacks relevance here.

--
Lew

BGB

Feb 10, 2012, 12:25:34 PM

On 2/9/2012 4:54 PM, Arne Vajhøj wrote:
> On 2/8/2012 11:10 PM, BGB wrote:
>> On 2/8/2012 7:14 PM, Arne Vajhøj wrote:
>>> On 2/8/2012 8:49 PM, BGB wrote:
>>>> as noted, many people neither use schemas nor any sort of schema
>>>> validation. in many use-cases, schemas are overly constraining to the
>>>> ability of using XML to represent free-form data, or using them
>>>> otherwise would offer little particular advantage.
>>>
>>> xsd:any does provide some flexibility in schemas.
>>>
>>
>> yep, but one can wonder what is the gain of using a schema if one is
>> just going to use "xsd:any"?...
>
> You still have some structure.
>

probably.

>> it is also a mystery how well EXI behaves in this case (admittedly, I
>> have not personally looked into EXI in-depth, as I only briefly skimmed
>> over the spec a long time ago).
>
> No idea. But I would assume EXI supports what is valid XML and XSD.
>

yes, it is just that, IIRC, EXI uses the schema to know how to
efficiently encode structures (values are directly coded), and falls
back to a more naive strategy (describing the encoded tags) if the
schema doesn't cover a given case.

admittedly, I am less certain here: from skimming over the spec, I am
not entirely sure how EXI works (I would have to invest a bit more time
in reading it).

note: even in the worst case, the output will still likely be tiny vs
textual XML.

more skimming... sudden mystery: if the format is a bitstream, why are
they apparently using a byte-aligned scheme for storing integers?...
(the cost here is that one has to then re-align with the next byte
boundary, potentially wasting on average several bits).
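For reference, my reading of EXI 1.0 is that it encodes unsigned integers as a sequence of octets, each carrying 7 value bits, least significant group first, with the high bit set when another octet follows. Please check the spec before relying on that. The sketch writes this style of encoding to a plain byte array. It does not model EXI's bit-packed stream or its alignment options. Values 0-127 take one octet, and 300 takes two (0xAC 0x02).

```java
import java.io.ByteArrayOutputStream;

class VarUInt {
    /** Writes a non-negative value 7 bits per octet, least significant group first;
     *  the high bit of each octet is set when more octets follow. */
    static byte[] encode(long value) {
        if (value < 0) throw new IllegalArgumentException("unsigned values only");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        do {
            int group = (int) (value & 0x7F);
            value >>>= 7;
            out.write(value != 0 ? (group | 0x80) : group); // continuation bit if more remain
        } while (value != 0);
        return out.toByteArray();
    }

    /** Reads one value back, stopping at the first octet whose high bit is clear. */
    static long decode(byte[] bytes) {
        long value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break; // last octet
            shift += 7;
        }
        return value;
    }
}
```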

>>>> say, if one is using XML for compiler ASTs or similar (say, the XML is
>>>> used to represent a just-parsed glob of source-code), do they really
>>>> need any sort of schema?
>>>
>>> I would expect syntax trees to follow certain rules and not be free
>>> form.
>>>
>>
>> well, there are some rules, but the question is more if a schema or the
>> use of validation would offer much advantage to make using it worth the
>> bother?...
>
> Enforcing correctness of data is usually a good idea.
>

potentially, but checking against schemas isn't free.
depending on the application, it could be hard to justify spending the
extra clock cycles (except maybe for debugging purposes or similar).

an issue with ASTs is that they come in several forms:
giant, like in the output of a C compiler, where many tasks tend towards
"expensive" (it may take easily anywhere from 250ms-1500ms to shove all
this stuff through the various compiler stages);
small, like in a script-language VM, where typically it is desirable
that compile times still be fairly fast, since a major strength of
scripting languages is trying to keep "eval" and similar fairly close to
free.

granted, one could debate the sanity of using XML for ASTs in the first
place, but this started originally as a historical accident in my case
(I was writing an interpreter, and it was what I had on-hand, actually:
I partly hacked an existing XML-RPC implementation into being a script
interpreter...). however, it doesn't seem to actually hurt performance
too badly (ironically, in my C compiler, much more time goes into the
preprocessor and tokenizer, which are far more efficient and more highly
optimized).

side note: the C compiler doesn't use a standard DOM, but rather a
highly specialized, but still DOM-like, system (and may still dump ASTs
as text-form XML for debugging reasons). it involves, among other
things, optimizations for numerical data (attributes may store numeric
data directly, vs needing to use a string) and large hash-tables and
chaining for look-ups, as well as specialized operations to reduce typing.

my current scripting VM, however, internally uses lists/s-expressions
(note: they are neither AST compatible, nor will C code work effectively
on my scripting VM). this was due to a later rewrite "switching over" (I
was also reusing a lot of parts from a prior Scheme interpreter of mine
for this one).

but, anyways, I am more left thinking schema-checking would probably
make sense more when either some sort of security is a concern, or maybe
when sending data "over the wire" between multiple parties.

inserting a schema check between one's parser and one's bytecode emitter
doesn't seem nearly as compelling.

I guess, if a person really wanted, they could write a schema for the
ASTs, but it is not clear how useful it would be to do so (since,
generally, apart from someone mucking around with the compiler
internals, there is little direct reason to know or care what is going
on in there...).

or such...

BGB

Feb 10, 2012, 12:43:33 PM

On 2/10/2012 9:51 AM, Lew wrote:
> BGB wrote:
>> Arne Vajhøj wrote:
>>> ????
>>>
>>> No one in their right mind would parse XML manually.
>>>
>>> You can pick between lots of nice XML API's (many of them
>>> shipping with Java) that will handle all that.
>>>
>>
>> depends on which language one is using at the time...
>>
>> if one is using Java, then XML parsing is basically free.
>
> This /is/ a Java newsgroup, as you might have noticed.
>

yes, but this thread is also about cross-language message passing, one
may have to face the issue that, at least one end, will not be using Java.

this means, of course, that both ends will need to be able to deal with
both sending and receiving the data.

>> if one is using C, then it is either "write some code to do it", or
>> suffer with a 3rd party [sic] library dependency (one might validly choose to
>> write the code themselves in this case).
>
> "Suffer"? The XML parsers for C are well-established, very reliable, and no
> cause for suffering. Using a pejorative is not the same as establishing a
> point.
>
> There is nothing wrong with the third-party libraries, and the choice to
> roll your own for C is rarely valid. You seem to suffer from NIH syndrome.
>

they introduce porting hassles:
does one bundle "libxml" with their app on Windows;
do they use MSXML and then deal with having to switch over to "libxml"
when building on Linux?
...

often, writing one's own code to do something may be the fastest and
easiest option.

writing code to do something can also be a fun and entertaining
experience (giving oneself stuff to do, and then doing it, ...), and
also give ideas/experience which could be useful for other things.

granted, there is also the goal of getting things done in a timely
manner, so it is a tradeoff.

but, anyways, it is like asking a person never to write their own JPEG
loader/saver, or their own scripting-language compiler. yes, maybe a
person doesn't technically need to, but they may forsake potentially
valuable learning experiences (or the claim to having the skills to do so).

>> I don't expect it is all that uncommon for a person to switch between
>> several different languages, and maybe deal with the strengths and
>> weaknesses of whichever language they are using at the time.
>
> Not usually in the same program. Your expectation lacks relevance here.
>

so, a program written in a mix of 5 programming languages is
probably rare, then?...

but, anyways, whether or not it is within the same program was not the
issue:
it could be in multiple cooperating programs which share data, or in
different components (which merely share APIs or similar).

Lew

Feb 10, 2012, 4:15:29 PM

BGB wrote:
> yes, but this thread is also about cross-language message passing, one
> may have to face the issue that, at least one end, will not be using Java.
>
> this means, of course, that both ends will need to be able to deal with
> both sending and receiving the data.

This is the use case for which XML with schema excels. It is very nearly ideal
for the purpose. XML is semantically void with respect to the problem domain,
schemas provide a reliable contract for interpretation of the messages, they
provide a convenient human-readable format to ensure agreement by all
stakeholders, they drive the easy-to-use tools for XML-based message passing,
and such easy-to-use tools are abundantly available for every major platform
and computer language.

Your comments about different libraries' availability make an asset sound like
a problem. It's a *good* thing that there are so many libraries available. XML
itself provides the compatibility.

--
Lew

BGB

Feb 10, 2012, 4:50:43 PM

On 2/10/2012 2:15 PM, Lew wrote:
> BGB wrote:
>> yes, but this thread is also about cross-language message passing, one
>> may have to face the issue that, at least one end, will not be using Java.
>>
>> this means, of course, that both ends will need to be able to deal with
>> both sending and receiving the data.
>
> This is the use case for which XML with schema excels. It is very nearly ideal
> for the purpose. XML is semantically void with respect to the problem domain,
> schemas provide a reliable contract for interpretation of the messages, they
> provide a convenient human-readable format to ensure agreement by all
> stakeholders, they drive the easy-to-use tools for XML-based message passing,
> and such easy-to-use tools are abundantly available for every major platform
> and computer language.
>

yes, but it is the agreement on particular formats (say, that both
parties will use XML and have the contents laid out a particular way),
rather than the use of either schemas or validation, which allows for
said compatibility.

it is like claiming that people need to depend on standardized
dictionaries (and some sort of automatic word-use and grammar checker)
to be able to carry on a conversation, rather than, say, the
dictionaries existing as a means of recording agreed-upon word-use patterns.

or, like those people who go and claim that "math is reality" rather
than "math is a formalized system which can be used to describe
reality", and so on.

> Your comments about different libraries' availability make an asset sound like
> a problem. It's a *good* thing that there are so many libraries available. XML
> itself provides the compatibility.
>

yep.

it is likewise for many common file-formats:
large numbers of people use them, write code to read and write them, ...
so, people go and write down how the file format works, such that others
can write things which can read and write the files.

luckily for everyone, most people can agree to use PNG and JPEG and so
on as well...

Lew

Feb 10, 2012, 5:32:23 PM

On Friday, February 10, 2012 1:50:43 PM UTC-8, BGB wrote:
> On 2/10/2012 2:15 PM, Lew wrote:
> > BGB wrote:
> >> yes, but this thread is also about cross-language message passing, one
> >> may have to face the issue that, at least one end, will not be using Java.
> >>
> >> this means, of course, that both ends will need to be able to deal with
> >> both sending and receiving the data.
> >
> > This is the use case for which XML with schema excels. It is very nearly ideal
> > for the purpose. XML is semantically void with respect to the problem domain,
> > schemas provide a reliable contract for interpretation of the messages, they
> > provide a convenient human-readable format to ensure agreement by all
> > stakeholders, they drive the easy-to-use tools for XML-based message passing,
> > and such easy-to-use tools are abundantly available for every major platform
> > and computer language.
> >
>
> yes, but it is the agreement on particular formats (say, that both
> parties will use XML and have the contents laid out a particular way),
> rather than the use of either schemas or validation, which allows for
> said compatibility.

Sure, and schemas give a simple, readable, clear and unambiguous means to
communicate the proposal and reach an agreement.

You might as well say that it's the intent of the carpenter that makes the
furniture, not the saw. This does not make the saw any less useful or valuable.

> it is like claiming that people need to depend on standardized
> dictionaries (and some sort of automatic word-use and grammar checker)
> to be able to carry on a conversation, rather than, say, the
> dictionaries existing as a means of recording agreed-upon word-use patterns.

No, it's nothing like that.

It is like having a dictionary to record the agreement. Following your logic,
we'd claim that a dictionary isn't useful because all it does is record an
agreement in a structured, easily-followed and standard manner.

> or, like those people who go and claim that "math is reality" rather
> than "math is a formalized system which can be used to describe
> reality", and so on.

Huh? To make a math joke, you really are off on a tangent with that one.

What have you got against math people? Oh, and by the way, math is reality.

> > Your comments about different libraries' availability make an asset sound like
> > a problem. It's a *good* thing that there are so many libraries available. XML
> > itself provides the compatibility.
> >
>
> yep.
>
>
> it is likewise for many common file-formats:
> large numbers of people use them, write code to read and write them, ...
> so, people go and write down how the file format works, such that others
> can write things which can read and write the files.
>
> luckily for everyone, most people can agree to use PNG and JPEG and so
> on as well...

But by your logic, PNG and JPEG are not useful because all we have to do is
invent our own format and agree to use it and re-invent all the nifty (and
often free) useful tools that only work on standard formats like PNG and JPEG,
thus throwing away all the human-centuries of engineering and wisdom that went
into those standards simply because we believe we're more clever than anyone
else and can exist in a vacuum and don't need all those steenkeen' free, useful
tools.

--
Lew

BGB

Feb 10, 2012, 7:10:14 PM2/10/12

On 2/10/2012 3:32 PM, Lew wrote:
> On Friday, February 10, 2012 1:50:43 PM UTC-8, BGB wrote:
>> On 2/10/2012 2:15 PM, Lew wrote:
>>> BGB wrote:
>>>> yes, but this thread is also about cross-language message passing, one
>>>> may have to face the issue that, at least one end, will not be using Java.
>>>>
>>>> this means, of course, that both ends will need to be able to deal with
>>>> both sending and receiving the data.
>>>
>>> This is the use case for which XML with schema excels. It is very nearly ideal
>>> for the purpose. XML is semantically void with respect to the problem domain,
>>> schemas provide a reliable contract for interpretation of the messages, they
>>> provide a convenient human-readable format to ensure agreement by all
>>> stakeholders, they drive the easy-to-use tools for XML-based message passing,
>>> and such easy-to-use tools are abundantly available for every major platform
>>> and computer language.
>>>
>>
>> yes, but it is the agreement on particular formats (say, that both
>> parties will use XML and have the contents laid out a particular way),
>> rather than the use of either schemas or validation, which allows for
>> said compatibility.
>
> Sure, and schemas give a simple, readable, clear and unambiguous means to
> communicate the proposal and reach an agreement.
>
> You might as well say that it's the intent of the carpenter that makes the
> furniture, not the saw. This does not make the saw any less useful or valuable.
>

a saw is actually physically needed for the work to be done.

a more accurate example would likely be:
does the carpenter need a CNC milling machine?

the carpenter could just saw at the wood, and make something.
and he could draw up a diagram or make a blueprint or similar if he wanted.

but, demanding that a schema be used is about like asking that he write
the CNC program, and have the machine do it.

>> it is like claiming that people need to depend on standardized
>> dictionaries (and some sort of automatic word-use and grammar checker)
>> to be able to carry on a conversation, rather than, say, the
>> dictionaries existing as a means of recording agreed-upon word-use patterns.
>
> No, it's nothing like that.
>
> It is like having a dictionary to record the agreement. Following your logic,
> we'd claim that a dictionary isn't useful because all it does is record an
> agreement in a structured, easily-followed and standard manner.
>

I was not saying dictionaries are not useful, only that one can carry on
a conversation without invoking one at every instant to validate what
one is saying.

a written specification for a file format will serve a similar purpose.
an XML schema could be considered as a narrower machine-readable subset
of a file-format specification. although there are cases where validating
against the schema is useful, that is not likely true in every case.
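As a rough illustration of validating only where it helps, here is a minimal Java sketch that checks a single message against an XML schema using the JDK's standard javax.xml.validation API. The schema and the `<command name="...">` message shape are made up for the example; nothing in the thread specifies them.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaCheck {
    // Hypothetical message contract: a <command> element with a required
    // "name" attribute and plain-text content.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='command'><xs:complexType><xs:simpleContent>"
        + "<xs:extension base='xs:string'>"
        + "<xs:attribute name='name' type='xs:string' use='required'/>"
        + "</xs:extension>"
        + "</xs:simpleContent></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true if the message conforms to the schema. The Schema is compiled
    // on every call for brevity; real code would compile it once and reuse it.
    static boolean isValid(String xml) throws IOException, SAXException {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
        Validator v = schema.newValidator();
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            v.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<command name='ping'>hello</command>")); // true
        System.out.println(isValid("<command>hello</command>"));             // false: no name attribute
    }
}
```

The schema here plays the role of the written agreement; whether to run the check on every message or only in testing is a separate decision.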

>> or, like those people who go and claim that "math is reality" rather
>> than "math is a formalized system which can be used to describe
>> reality", and so on.
>
> Huh? To make a math joke, you really are off on a tangent with that one.
>
> What have you got against math people? Oh, and by the way, math is reality.
>

grr, those people annoy me, especially for their whole "the theory is
too pure to be used for anything actually useful" thing (of believing
that physical reality is somehow inferior to "mathematical perfection"
or whatever...).

at least software does something, and has slightly less occurrence of
people going on endlessly about "perfection" and whatever else (or
getting all condescending and nit-picky about something being "not
sufficiently perfect enough", bleh...).

also, my reality happens to be made mostly out of matter, and "stuff".

matter is obvious enough: one can see it, one can eat it, ...
secondarily: software is "real enough", because one can run it, and one
can copy it around via drives or over the internet, ...

but, where is the "math": it is seemingly nowhere to be found, and seems
mostly just to boil down to people messing around with symbolic
notations and describing the behavior of systems otherwise made out of
matter.

IMO, it makes about as much sense as those people who believe reality is
made out of emotions, or perceptions, or morals, or is actually a huge
pile of laws and words, or whatever else.

(decided against writing a bunch of arguments for how each apparently
fails as a good basis for observable reality).

rather each is by some means built on top of reality:
emotions and perception being a byproduct of the brain (itself made out
of matter...);
morals being (probably) a byproduct of large-scale cost/benefit
tradeoffs (bad behavior -> bad results, and is a place where emotions
and economics seem to converge, ...);
and laws and words are a byproduct of language use and peoples' attempts
to organize things.

likewise, math would seem to be a byproduct of the analysis and
description of physical and mechanical systems.

not that all this stuff doesn't matter, just reality is (probably) not
made out of it.

also note: it is possible to believe in a reality made out of matter,
and also believe in religious stuff and similar as well (because, as I
see it, the belief that they necessarily conflict is probably also flawed).

( could go into the matter of "matter + religion + morals + rational
self-interest + free market + ...", but, I have probably been going off
on enough of a tangent already... )

>>> Your comments about different libraries' availability make an asset sound like
>>> a problem. It's a *good* thing that there are so many libraries available. XML
>>> itself provides the compatibility.
>>>
>>
>> yep.
>>
>>
>> it is likewise for many common file-formats:
>> large numbers of people use them, write code to read and write them, ...
>> so, people go and write down how the file format works, such that others
>> can write things which can read and write the files.
>>
>> luckily for everyone, most people can agree to use PNG and JPEG and so
>> on as well...
>
> But by your logic, PNG and JPEG are not useful because all we have to do is
> invent our own format and agree to use it and re-invent all the nifty (and
> often free) useful tools that only work on standard formats like PNG and JPEG,
> thus throwing away all the human-centuries of engineering and wisdom that went
> into those standards simply because we believe we're more clever than anyone
> else and can exist in a vacuum and don't need all those steenkeen' free, useful
> tools.
>

this is missing the point. to write one's own code is not the same as to
forsake using an existing standardized file format.

I do use a lot of standardized formats, just I often feel little need to
use others' implementations of those formats.

for example, I have my own implementations of PNG, JPEG, Deflate, ...
granted, I didn't really "need" to do so, but often to use a library
means either creating an annoying external dependency issue, or needing
to drag around the library, when often one can get by just writing a
much smaller and more narrowly focused piece of code to deal with it.
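For comparison, in Java the dependency trade-off looks somewhat different, because PNG and JPEG codecs ship with the JDK in javax.imageio. A minimal sketch of a PNG round trip through an in-memory buffer, using only ImageIO; the class and method names are illustrative, not taken from any poster's code.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

public class PngRoundTrip {
    // Encodes an image as PNG into a byte array using the JDK's built-in writer.
    static byte[] toPng(BufferedImage img) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(img, "png", out)) {
            throw new IOException("no PNG writer available");
        }
        return out.toByteArray();
    }

    // Decodes any format ImageIO recognizes (PNG, JPEG, GIF, BMP, ...).
    static BufferedImage fromBytes(byte[] data) throws IOException {
        BufferedImage img = ImageIO.read(new ByteArrayInputStream(data));
        if (img == null) {
            throw new IOException("unrecognized image format");
        }
        return img;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage img = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        img.setRGB(1, 1, 0xFF0000); // one red pixel
        BufferedImage back = fromBytes(toPng(img));
        System.out.printf("%06X%n", back.getRGB(1, 1) & 0xFFFFFF);
    }
}
```

This does not settle the argument for C or C++, where no such codec comes with the language runtime.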

Arved Sandstrom

Feb 10, 2012, 9:08:40 PM2/10/12

On 12-02-10 08:10 PM, BGB wrote:
[ SNIP ]

>
> this is missing the point. to write one's own code is not the same as to
> forsake using an existing standardized file format.
>
> I do use a lot of standardized formats, just I often feel little need to
> use others' implementations of those formats.
>
> for example, I have my own implementations of PNG, JPEG, Deflate, ...
> granted, I didn't really "need" to do so, but often to use a library
> means either creating an annoying external dependency issue, or needing
> to drag around the library, when often one can get by just writing a
> much smaller and more narrowly focused piece of code to deal with it.
>

Apart from a situation where you are genuinely resource-constrained and
need to slim down the library in question [1], I don't see those factors
as justifying the effort. "External dependency"? You've already got one
- you depend on the file format specification. So would you rather spend
the (usually substantial) time understanding the spec and implementing
the format, or have other folks do it for you?

And "drag around the library"? Who are you kidding? Look at the size of
libtiff libraries on a typical Linux or Unix system, and then look at
the supported API: you think the library is bloated? You think the
effort is justified to understand the TIFF spec well enough to pick out
just the bits you need, so you can build your own library? Or look at
the Javadoc API for iText 5.1.3: http://api.itextpdf.com/itext/. You
think the 1.6 MB size of the core iText JAR is so indefensible that it's
worth your time to understand the PDF spec well enough to write your own
library for just the bits you need?

It's possible a few times in your career to adopt a new file format so
early that nobody else has a decent library for it. Or the only decent
ones are commercial, as another possibility. This is quite rare, though.

AHS

1. Possible, I suppose, if someone is asking you to do miracles with a
dinky low-end microcontroller.

BGB

Feb 11, 2012, 2:49:15 AM2/11/12

On 2/10/2012 7:08 PM, Arved Sandstrom wrote:
> On 12-02-10 08:10 PM, BGB wrote:
> [ SNIP ]
>>
>> this is missing the point. to write one's own code is not the same as to
>> forsake using an existing standardized file format.
>>
>> I do use a lot of standardized formats, just I often feel little need to
>> use others' implementations of those formats.
>>
>> for example, I have my own implementations of PNG, JPEG, Deflate, ...
>> granted, I didn't really "need" to do so, but often to use a library
>> means either creating an annoying external dependency issue, or needing
>> to drag around the library, when often one can get by just writing a
>> much smaller and more narrowly focused piece of code to deal with it.
>>
> Apart from a situation where you are genuinely resource-constrained and
> need to slim down the library in question [1], I don't see those factors
> as justifying the effort. "External dependency"? You've already got one
> - you depend on the file format specification. So would you rather spend
> the (usually substantial) time understanding the spec and implementing
> the format, or have other folks do it for you?
>

it depends some...

but, anyways, depending on the format is not a dependency, since the
code doesn't care about the format spec. maybe the programmer does when
they implement it, but this doesn't matter for the program.

what is a dependency is whether or not the library exists on the user's
system. if one needs a library, and it is not there already, well then,
the app isn't going to work (hence why one would end up having to bundle
such libraries with the app, ...).
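In Java, the "is the library actually on the user's system?" question can at least be checked at run time instead of failing on first use. A minimal sketch, assuming a hypothetical optional class `com.example.fastcodec.Codec`; if it cannot be loaded, the program falls back to its own built-in path.

```java
public class OptionalDependency {
    // Returns true if a class with the given fully qualified name can be loaded.
    // initialize=false avoids running the class's static initializers.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, OptionalDependency.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "com.example.fastcodec.Codec" is a hypothetical third-party class name.
        String codec = isPresent("com.example.fastcodec.Codec")
            ? "third-party codec" : "built-in fallback";
        System.out.println("using " + codec);
    }
}
```

This only detects the dependency; it does not remove the need to bundle or install it when the fallback is not good enough.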

the main issue is also copy/pasting around a bunch of extra source-code,
and dealing with making sure it all builds, some of which may have
annoying legal terms if used this way: worse if it is GPL (though GPL is
generally annoying all around in these regards). some other libraries
have requirements that one mention the library and its authors in the
credits, ...

in the case of JPEG, it was probably more effort to skim through the
spec than to write the code to load/save the format (mostly because the
JPEG spec is overly long-winded, and most of its relevant contents could
probably be boiled down to a few pages).

the only real difficult part of PNG is Deflate (yes, also handled by
zlib or similar, if one wants to worry about it).
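On the Java side at least, Deflate is already in the standard library: java.util.zip.Deflater and Inflater are backed by zlib. A minimal round-trip sketch follows. It produces the zlib-wrapped stream (2-byte header, Adler-32 trailer) that PNG uses for its image data, but it is not a PNG encoder, which would also need scanline filtering, chunking, and CRCs.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateRoundTrip {
    // Compresses to a zlib stream (nowrap=false, the Deflater default).
    static byte[] compress(byte[] input) {
        Deflater def = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            def.setInput(input);
            def.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!def.finished()) {
                int n = def.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            def.end(); // release native zlib state
        }
    }

    // Decompresses a complete zlib stream; fails on truncated input.
    static byte[] decompress(byte[] input) throws DataFormatException {
        Inflater inf = new Inflater();
        try {
            inf.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && !inf.finished() && (inf.needsInput() || inf.needsDictionary())) {
                    // Input ran out before the end of the stream, or a preset
                    // dictionary is required; either way we cannot continue.
                    throw new DataFormatException("truncated stream or preset dictionary required");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inf.end();
        }
    }
}
```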

it might be a little easier to "sell" someone on using all of these
libraries if they were all aggregated into a single library (much like
"libavcodec" in the case of audio/video codecs).

> And "drag around the library"? Who are you kidding? Look at the size of
> libtiff libraries on a typical Linux or Unix system, and then look at
> the supported API: you think the library is bloated? You think the
> effort is justified to understand the TIFF spec well enough to pick out
> just the bits you need, so you can build your own library? Or look at
> the Javadoc API for iText 5.1.3: http://api.itextpdf.com/itext/. You
> think the 1.6 MB size of the core iText JAR is so indefensible that it's
> worth your time to understand the PDF spec well enough to write your own
> library for just the bits you need?
>

TIFF: not sure why someone would want TIFF support in the first place,
so no real comment here. apparently it is mostly for people who want
48 bit color depth or something.

I have not considered PDF loading or saving (not terribly relevant in my
case).

an LWO loader might be nice, given I haven't gotten around to writing
one yet (but, the observant may notice: even if a 3rd party LWO loader
was used, it probably wouldn't load into the mesh-format my engine
already uses, making it essentially pointless). not that it really matters:
if I really cared much about LWO, I probably would have had a loader for
it already.

> It's possible a few times in your career to adopt a new file format so
> early that nobody else has a decent library for it. Or the only decent
> ones are commercial, as another possibility. This is quite rare, though.
>

or, one might develop their own file-formats as well, without being
chained to the cult of "does a library already exist for that?..."

another power of writing one's own code is that there is control over
what is done and why. with a 3rd party library, one may be stuck with
whatever way *they* chose to do something, impeding one's own freedom to
do it differently and to try out alternate possibilities.

more so, writing code is fairly cheap.

but, anyways, most of the stuff where people are worrying about writing
code oneself, is typically in regards to trivia.

what is there really to gain from doing all of the hard parts of app
development, by actually writing the app, but then spending inordinate
time worrying about not re-implementing functionality which exists in
libraries?

probably, if a typical programmer can go read a spec for a file format,
throw something together, and have everything working ok in maybe a few
hours or so, what really is the problem? it could very well end up being
more time and effort working out differences between the library's API
and however the app does things internally.

it may even be the case that using the library would end up with one
writing more code than just doing it oneself more directly...

but, whatever, people can try to micro-optimize their productivity or
whatever if they want (ultimately, so long as one does stuff and gets
stuff done, it is probably good enough regardless of whether or not it
is the "most efficient" regarding programmer-time or whatever...).

doesn't hurt programmers too much, given it gives something to do,
especially if one is being paid by the hour, or by the kloc (arguably,
it is a win-win situation, either way the employer gets code, and the
employee gets money).

then one is all on the job, "keeping it real" and "doing their thing"
and similar.

> AHS
>
> 1. Possible, I suppose, if someone is asking you to do miracles with a
> dinky low-end microcontroller.

mostly it is about writing 3D engines for desktop PCs which work on both
Windows and Linux (though Windows is the much higher priority).

Arved Sandstrom

Feb 11, 2012, 1:04:27 PM2/11/12

On 12-02-11 03:49 AM, BGB wrote:
> On 2/10/2012 7:08 PM, Arved Sandstrom wrote:
>> On 12-02-10 08:10 PM, BGB wrote:
>> [ SNIP ]
>>>
>>> this is missing the point. to write one's own code is not the same as to
>>> forsake using an existing standardized file format.
>>>
>>> I do use a lot of standardized formats, just I often feel little need to
>>> use others' implementations of those formats.
>>>
>>> for example, I have my own implementations of PNG, JPEG, Deflate, ...
>>> granted, I didn't really "need" to do so, but often to use a library
>>> means either creating an annoying external dependency issue, or needing
>>> to drag around the library, when often one can get by just writing a
>>> much smaller and more narrowly focused piece of code to deal with it.
>>>
>> Apart from a situation where you are genuinely resource-constrained and
>> need to slim down the library in question [1], I don't see those factors
>> as justifying the effort. "External dependency"? You've already got one
>> - you depend on the file format specification. So would you rather spend
>> the (usually substantial) time understanding the spec and implementing
>> the format, or have other folks do it for you?
>
> it depends some...
>
> but, anyways, depending on the format is not a dependency, since the
> code doesn't care about the format spec. maybe the programmer does when
> they implement it, but this doesn't matter for the program.

There are different types of dependencies. Which ones matter more? If
you are contractually bound to implement a given specification, I
guarantee you that, except for the most trivial specs, you will spend
more time understanding the requirements than you will coding them up.
I'd call that a real dependency.

But I get that you mean only compile/link implementation dependencies. OK.

> what is a dependency is whether or not the library exists on the user's
> system. if one needs a library, and it is not there already, well then,
> the app isn't going to work (hence why one would end up having to bundle
> such libraries with the app, ...).

Sure. Or at a higher level if it's a managed or interpreted program,
does the user have the runtime or interpreter at all, let alone a
correct version. Your most elegant and compact program might be a Python
or Ruby or Windows Powershell script, but if the target user can't run
it, what's the point?

This is universal though. Like in the examples above, does the target
user have the right interpreters? If running C# or Java, do they have a
sufficiently recent runtime? Are the right versions of framework
libraries present? For C or C++ similar: what libraries exist? Do you
provide them yourself, or link them in? Do you go the GNU build route
and support only configure scripts and building from source? For Java or
C#, if using 3rd party libraries, how do you handle that? For
build/install mechanisms that support downloading of dependencies, like
Perl CPAN or Maven/Ivy or whatever, you have to configure all that.
Maybe you spend quality time configuring up an NSIS installer for
Windows, or a Mac OS X .pkg for use by Installer.

In the big scheme of things you've got enough effort devoted to all this
that I don't myself see how making use of a good 3rd party library
should be questioned...*for the reasons you are thinking of*. I can
certainly think of good reasons why a team would, and should, want to
debate the selection of a _given_ 3rd party library, but not because you
think it'll overly complicate your deployments.

> the main issue is also copy/pasting around a bunch of extra source-code,
> and dealing with making sure it all builds, some of which may have
> annoying legal terms if used this way: worse if it is GPL (though GPL is
> generally annoying all around in these regards). some other libraries
> have requirements that one mention the library and its authors in the
> credits, ...

Copying/pasting? !!! Annoying legal terms? !!! Mentioning folks in
credits? !!!

To borrow from Monday Night Football, "C'mon Man!" OK, granted, legal
requirements attached to candidate 3rd party libraries can be a blocker
(or a difficulty) when dealing with commercial software, but overall
this is quibbling. These are reasons you manufacture when you want to
roll your own code and won't be dissuaded.

> in the case of JPEG, it was probably more effort to skim through the
> spec than to write the code to load/save the format (mostly because the
> JPEG spec is overly long-winded, and most of its relevant contents could
> probably be boiled down to a few pages).

Who cares about the quality of the spec? They are what they are. You
have to deal with them as is. Most of the W3C specs are way more turgid
and confusing than the image file format specs. Point being, you had to
read some of the spec - at *some point* - in order to load/save a legal
version of a JPEG file. That's my point: *someone* has to read and
understand as much of a spec as is needed to accomplish Task X, and why
would you want to do that if someone else did it for you?

I just glanced at the JPEG/JFIF and JPEG/EXIF file format specs, and I
gotta tell you, if you think that either of those are overly long-winded
then you haven't read very many specs. And "relevant contents...boiled
down to a few pages"??? Relevant to whom? You? There are other people
who use these file formats, and they may be interested in supporting
most or all of the spec. It sounds to me like you know that *your*
JPEGs are a consistent small slice of the spec, and you want to write
code that only supports the BGB JPEG subset.

Good luck to the maintainers of your code after you leave.

[ SNIP ]

>> And "drag around the library"? Who are you kidding? Look at the size of
>> libtiff libraries on a typical Linux or Unix system, and then look at
>> the supported API: you think the library is bloated? You think the
>> effort is justified to understand the TIFF spec well enough to pick out
>> just the bits you need, so you can build your own library? Or look at
>> the Javadoc API for iText 5.1.3: http://api.itextpdf.com/itext/. You
>> think the 1.6 MB size of the core iText JAR is so indefensible that it's
>> worth your time to understand the PDF spec well enough to write your own
>> library for just the bits you need?
>
> TIFF: not sure why someone would want TIFF support in the first place,
> so no real comment here. apparently it is mostly for people who want
> 48 bit color depth or something.

Bit parochial, aren't we? TIFF usage is quite huge actually, in scads of
domains...but maybe not in your little niche. And it's got nothing to do
with 48 bit colour depth.

I happen to encounter TIFF a great deal, and it's almost always B/W or
grayscale when *I* do. But certain advantages of TIFF also carry over to
colour.

> I have not considered PDF loading or saving (not terribly relevant in my
> case).

Maybe it's not. JPEG support isn't particularly relevant to me. Point
being, if you did have to support programmatic creation or editing or
reading/display of PDFs, would you roll your own code? In 2012? That
would be insane.

[ SNIP ]

>> It's possible a few times in your career to adopt a new file format so
>> early that nobody else has a decent library for it. Or the only decent
>> ones are commercial, as another possibility. This is quite rare, though.
>
> or, one might develop their own file-formats as well, without being
> chained to the cult of "does a library already exist for that?..."

I have no intrinsic problem with that first bit. I like well-designed
file formats, and I've concocted a few of my own. A custom file format
can be the best thing to do as part of a solution.

I don't quite see how the first statement leads to the second, the one
about cults. I fully agree that if someone picked an unsuitable file
format simply because it had a library to accompany it, that that would
be questionable. But you're advancing a stronger argument, that even
when you've selected a suitable file format that does have a suitable
library, that you'd often prefer to dispense with the library, and write
your own code.

> another power of writing one's own code is that there is control over
> what is done and why. with a 3rd party library, one may be stuck with
> whatever way *they* chose to do something, impeding one's own freedom to
> do it differently and to try out alternate possibilities.

Is that a problem with libjpeg, say? You are free to use what source you
like from that codebase, make whatever changes you like, and
redistribute commercially without paying royalties. All you have to do
is assume blame and credit the original authors. Big deal.

Unless you think that their codebase is utter garbage then why not just
modify it?

> more so, writing code is fairly cheap.

It is? How much do you get paid? You work for a company? What's their
overhead for keeping you on the books? Are you involved in services
work? What's the cost to the client of spending unnecessary extra time
in coding? Or do you produce product? What's the cost to the end user of
unnecessary extra coding time?

Do you subject the code to testing? Do you write developer and end user
documentation for it? Does *someone*?

It's new code, a brand new burden for maintenance programmers. Do you
factor in the cost of their time, down the road?

Coding ain't cheap. Not even relatively.

> but, anyways, most of the stuff where people are worrying about writing
> code oneself, is typically in regards to trivia.

??? I don't get that.

> what is there really to gain from doing all of the hard parts of app
> development, by actually writing the app, but then spending inordinate
> time worrying about not re-implementing functionality which exists in
> libraries?
>
> probably, if a typical programmer can go read a spec for a file format,
> throw something together, and have everything working ok in maybe a few
> hours or so, what really is the problem? it could very well end up being
> more time and effort working out differences between the library's API
> and however the app does things internally.

What kind of file format are we talking about here? It's got to be
pretty twinky if someone can read the spec for it, *and* "throw
something together", *and* have it working "OK", all in a few hours.

What's "OK", anyways? You certainly didn't allow for a whole bunch of
time there to write comprehensive unit tests for the "something".

> it may even be the case that using the library would end up with one
> writing more code than just doing it oneself more directly...

Let's be real, one can always identify trivial cases where that's true,
but it's not a solid argument for the general case.

> but, whatever, people can try to micro-optimize their productivity or
> whatever if they want (ultimately, so long as one does stuff and gets
> stuff done, it is probably good enough regardless of whether or not it
> is the "most efficient" regarding programmer-time or whatever...).

You've mentioned "good enough" a few times in your posting history. It's
a venerable engineering concept, and can be carried over (_has been_
carried over) to software development.

Let's be clear: "good enough" in software development means [1] that the
product has sufficient benefits, has *no* critical problems, the
benefits sufficiently outweigh the problems, and further improvement is
more harmful than helpful.

In other words, "good enough" means that anything you consider doing has
an inadequate return on investment of time and money.

This is a pretty high bar, actually. It doesn't mean what most
developers seem to think it means. And I'm not convinced that your
development philosophy falls in line with "good enough".

> doesn't hurt programmers too much, given it gives something to do,
> especially if one is being paid by the hour, or by the kloc (arguably,
> it is a win-win situation, either way the employer gets code, and the
> employee gets money).

Doesn't help the client/consumer much, does it? We're all professional
developers here, aren't we? Don't they still matter?

> then one is all on the job, "keeping it real" and "doing their thing"
> and similar.
>
>> AHS
>>
>> 1. Possible, I suppose, if someone is asking you to do miracles with a
>> dinky low-end microcontroller.
>
> mostly it is about writing 3D engines for desktop PCs which work on both
> Windows and Linux (though Windows is the much higher priority).

Desktop PCs: d'you think you're hurting for resources on a typical
desktop PC these days?

AHS

1. See http://www.satisfice.com/articles/good_enough_quality.pdf

Arne Vajhøj

Feb 11, 2012, 2:45:42 PM2/11/12

On 2/10/2012 12:25 PM, BGB wrote:
> On 2/9/2012 4:54 PM, Arne Vajhøj wrote:
>> On 2/8/2012 11:10 PM, BGB wrote:
>>> On 2/8/2012 7:14 PM, Arne Vajhøj wrote:
>>>> On 2/8/2012 8:49 PM, BGB wrote:
>>>>> say, if one is using XML for compiler ASTs or similar (say, the XML is
>>>>> used to represent a just-parsed glob of source-code), do they really
>>>>> need any sort of schema?
>>>>
>>>> I would expect syntax trees to follow certain rules and not be free
>>>> form.
>>>>
>>>
>>> well, there are some rules, but the question is more if a schema or the
>>> use of validation would offer much advantage to make using it worth the
>>> bother?...
>>
>> Enforcing correctness of data is usually a good idea.
>>
>
> potentially, but checking against schemas isn't free.
> depending on the application, it could be hard to justify spending the
> extra clock cycles (except maybe for debugging purposes or similar).

One of the points is that you can validate during integration testing
and whenever you encounter a problem, but keep validation turned off otherwise.

And besides I would assume the big XML parser libraries to have
optimized the validation quite a bit.
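A minimal sketch of the "validate during integration testing, skip it otherwise" idea, using only the JDK's javax.xml.validation API. The system property name xml.validate, the toy schema and the class name are assumptions for illustration, not anything taken from the thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class OptionalValidation {
    // Hypothetical switch: run integration tests with -Dxml.validate=true to enable checks.
    static final boolean VALIDATE = Boolean.getBoolean("xml.validate");

    // Toy schema (an assumption): a <node> element with a required "op" attribute.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='node'><xs:complexType>"
        + "<xs:attribute name='op' type='xs:string' use='required'/>"
        + "</xs:complexType></xs:element></xs:schema>";

    // Compile once; a Schema object is immutable and safe to share between threads.
    static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return sf.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("bad schema", e);
        }
    }

    /** Returns true if the document conforms to SCHEMA. */
    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // schema violation or malformed XML
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Validates only when the switch is on; otherwise the input is trusted. */
    static void check(String xml) {
        if (VALIDATE && !isValid(xml)) {
            throw new IllegalArgumentException("XML does not match schema");
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<node op='add'/>")); // true
        System.out.println(isValid("<node/>"));          // false: "op" is missing
        check("<node/>"); // silent unless -Dxml.validate=true
    }
}
```

How much validation costs in a given application is something to measure rather than assume; the switch just makes it cheap to turn on when needed.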

Arne

Arne Vajhøj

Feb 11, 2012, 2:47:53 PM

????

But that's not how you work with XML.

You load it into a DOM and use XPath to pull
out what you need.
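A minimal sketch of the DOM-plus-XPath approach using only the standard Java library (javax.xml.parsers and javax.xml.xpath). The sample AST-like document and the select helper are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomXPathDemo {
    /** Parses xml into a DOM and returns every value matched by expr, in document order. */
    static List<String> select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                values.add(hits.item(i).getTextContent()); // attribute value or element text
            }
            return values;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("cannot evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        String ast = "<expr><node op='add'><ref name='x'/></node><node op='mul'/></expr>";
        System.out.println(select(ast, "//node/@op"));  // [add, mul]
        System.out.println(select(ast, "//ref/@name")); // [x]
    }
}
```

No line splitting or hand-written tokenizer is involved: the parser handles quoting, entities and nesting, and the XPath expression names what to pull out.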

> personally, I like hand-written recursive descent parsing, as it is
> fairly straightforward and doesn't depend on external tools.

You need to depend on some tools.

If you use Java then you depend on standard Java library.

Standard Java library contains several XML parsers.

No extra dependency for Java.

Arne

Arne Vajhøj

Feb 11, 2012, 2:49:40 PM

But given that there is little code reuse between Java and C
in the first place, C not having a builtin XML parser
is no reason not to use the builtin one in Java.

And if we talk C then look at LIBXML2 - it works pretty well.

Arne

Arne Vajhøj

Feb 11, 2012, 2:52:35 PM

On 2/10/2012 12:43 PM, BGB wrote:
> On 2/10/2012 9:51 AM, Lew wrote:
>> BGB wrote:
>>> if one is using C, then it is either "write some code to do it", or
>>> suffer with a 3rd party [sic] library dependency (one might validly
>>> choose to
>>> write the code themselves in this case).
>>
>> "Suffer"? The XML parsers for C are well-established, very reliable,
>> and no
>> cause for suffering. Using a pejorative is not the same as establishing a
>> point.
>>
>> There is nothing wrong with the third-party libraries, and the choice to
>> roll your own for C is rarely valid. You seem to suffer from NIH
>> syndrome.
>>
>
> they introduce porting hassles:
> does one bundle "libxml" with their app on Windows;
> do they use MSXML and then deal with having to switch over to "libxml"
> when building on Linux?

LIBXML2 works fine on Windows, so you can use it on both platforms.

> but, anyways, it is like asking a person never to write their own JPEG
> loader/saver, or their own scripting-language compiler. yes, maybe a
> person doesn't technically need to, but they may forsake potentially
> valuable learning experiences (or the claim to having the skills to do so).

I think you should very clearly distinguish between when you talk about
learning and programming production code.

The goals are just so different.

Arne

Arne Vajhøj

Feb 11, 2012, 2:55:38 PM

Having multiple libraries for the same language, each available on all
platforms, is a good thing.

But he was talking about multiple libraries for the same language,
one for each platform, which is bad.

The good thing is that the particular example mentioned in
a previous post does not apply, as LIBXML2 is available for
both platforms.

Arne

Lew

Feb 11, 2012, 3:06:17 PM

Arne Vajhøj wrote:
> BGB wrote:

>> it all still basically amounts to:
>> read a line;
>> split the string;
>> do something with the split strings.
>
> ????
>
> But that's not how you work with XML.
>
> You load it into a DOM and use XPath to pull
> out what you need.

Or you scan it with SAX or StAX and deal with XML-parsing events, or
you run the schema through JAXB and let it generate all the parsing
classes for you, or you use one of the many other standard libraries
in Java (or if not Java, in your favorite platform) that are available
for free.
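A minimal sketch of the StAX (pull-parsing) option Lew mentions, using javax.xml.stream from the standard library. Unlike DOM, no tree is built, so memory use stays flat for large inputs. The elementNames helper and the sample input are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxDemo {
    /** Streams through xml and collects the local name of each start element, in order. */
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (r.hasNext()) {
                    if (r.next() == XMLStreamConstants.START_ELEMENT) {
                        names.add(r.getLocalName()); // r.getAttributeValue(null, "op") would read an attribute
                    }
                }
            } finally {
                r.close(); // frees parser resources; does not close the underlying Reader
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<expr><node op='add'/><node op='mul'/></expr>")); // [expr, node, node]
    }
}
```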

>> personally, I like hand-written recursive descent parsing, as it is
>> fairly straightforward and doesn't depend on external tools.

The number-theorist can never pass a calculus exam because he's only
just reinvented calculus by the end of the hour.

> You need to depend on some tools.
>
> If you use Java then you depend on standard Java library.
>
> Standard Java library contains several XML parsers.
>
> No extra dependency for Java.

The thing is, BGB, that your macho programming style is impractical
and not very justifiable. It's wonderful that you have reinvented the
programming world and all, and are so clever and knowledgeable, but
most of us programmers work in a workaday pragmatic environment where
best practices really do save the day. That means use the standards,
and the abundant tools that support them, and give up our egos that
make us feel that superman heroics are the only available path. I
sincerely hope that readers of this thread can understand the manifest
shortcomings of the approaches that you've espoused here. (And that
they can disentangle themselves from your fetching but irrelevant
analogies.) To get the job done, to get it done right, and to minimize both error and development time, use the standards.

XML is just fine for just about every purpose to which it's put.
That's why it's popular now. People who cavil about "bandwidth" and
"10 Hz network messages" are tossing us red-herring sashimi. You
aren't going to get 10+ Hz message exchanges over the WAN. For
realistic message rates, XML suits beautifully. I speak from
experience with many, many projects that used every conceivable
message format from binary to CSV to custom to XML to protocol buffers
to JSON, and XML has distinct advantages. Its purported disadvantages
of bulk and bandwidth turn out to be non-issues in practice. Really.
That's real.

So, dear future readers, stick with what's known to be true by people
who actually do this work, not by some armchair theorist in a darkened
room who thinks that he has to do everything by hand and wants the
rest of us to follow his suboptimal strategy.

--
Lew

Lew

Feb 11, 2012, 3:14:16 PM

On Saturday, February 11, 2012 11:45:42 AM UTC-8, Arne Vajhøj wrote:
> On 2/10/2012 12:25 PM, BGB wrote:
> > On 2/9/2012 4:54 PM, Arne Vajhøj wrote:
> >> On 2/8/2012 11:10 PM, BGB wrote:
> >>> On 2/8/2012 7:14 PM, Arne Vajhøj wrote:
> >>>> On 2/8/2012 8:49 PM, BGB wrote:
> >>>>> say, if one is using XML for compiler ASTs or similar (say, the XML is
> >>>>> used to represent a just-parsed glob of source-code), do they really
> >>>>> need any sort of schema?
> >>>>
> >>>> I would expect syntax trees to follow certain rules and not be free
> >>>> form.
> >>>>
> >>>
> >>> well, there are some rules, but the question is more if a schema or the
> >>> use of validation would offer much advantage to make using it worth the
> >>> bother?...
> >>
> >> Enforcing correctness of data is usually a good idea.
> >>
> >
> > potentially, but checking against schemas isn't free.

Oh, yeah, micro-optimize that last $0.0000001 of performance.

Great thinking.

Checking against schemas isn't so expensive, either. You spout this drivel,
BGB, about "isn't free", but where are your numbers? Show us reality, dude -
exactly how "not free" is schema validation, under what loads, on what
platforms? Hm?

I thought not.

> > depending on the application, it could be hard to justify spending the
> > extra clock cycles (except maybe for debugging purposes or similar).
>

How many "extra clock cycles", and does it cost less than the damage your
development techniques cause?

> One of the points is that you can validate during integration testing
> and whenever you encounter a problem, but keep validation turned off otherwise.
>
> And besides I would assume the big XML parser libraries to have
> optimized the validation quite a bit.

Given that BGB is just spewing dream talk with zero or less than zero facts,
evidence or measurement behind it, it's pretty safe to dismiss his
"conclusions".

or such ...

--
Lew

Arne Vajhøj

Feb 11, 2012, 3:18:35 PM

On 2/11/2012 3:06 PM, Lew wrote:
> Arne Vajhøj wrote:
>> BGB wrote:
>>> it all still basically amounts to:
>>> read a line;
>>> split the string;
>>> do something with the split strings.
>>
>> ????
>>
>> But that's not how you work with XML.
>>
>> You load it into a DOM and use XPath to pull
>> out what you need.
>
> Or you scan it with SAX or StAX and deal with XML-parsing events, or
> you run the schema through JAXB and let it generate all the parsing
> classes for you, or you use one of the many other standard libraries
> in Java (or if not Java, in your favorite platform) that are available
> for free.

If the size of the XML file allows it, then DOM and XPath
usually give the shortest and most readable code.

>> You need to depend on some tools.
>>
>> If you use Java then you depend on standard Java library.
>>
>> Standard Java library contains several XML parsers.
>>
>> No extra dependency for Java.
>
> The thing is, BGB, that your macho programming style is impractical
> and not very justifiable. It's wonderful that you have reinvented the
> programming world and all, and are so clever and knowledgeable, but
> most of us programmers work in a workaday pragmatic environment where
> best practices really do save the day. That means use the standards,
> and the abundant tools that support them, and give up our egos that
> make us feel that superman heroics are the only available path. I
> sincerely hope that readers of this thread can understand the manifest
> shortcomings of the approaches that you've espoused here. (And that
> they can disentangle themselves from your fetching but irrelevant
> analogies.) To get the job done, to get it done right, and to minimize both error and development time, use the standards.

Bad day??

> XML is just fine for just about every purpose to which it's put.
> That's why it's popular now. People who cavil about "bandwidth" and
> "10 Hz network messages" are tossing us red-herring sashimi. You
> aren't going to get 10+ Hz message exchanges over the WAN. For
> realistic message rates, XML suits beautifully. I speak from
> experience with many, many projects that used every conceivable
> message format from binary to CSV to custom to XML to protocol buffers
> to JSON, and XML has distinct advantages. Its purported disadvantages
> of bulk and bandwidth turn out to be non-issues in practice. Really.
> That's real.

????

The industry standard for smartphone apps and AJAX web apps
is JSON because the bandwidth actually matters.

Arne

Arne Vajhøj

Feb 11, 2012, 3:20:32 PM

In science you dismiss hypotheses by proving them wrong,
not by noting the lack of proof.

Arne

BGB

Feb 11, 2012, 10:06:37 PM

On 2/11/2012 12:52 PM, Arne Vajhøj wrote:
> On 2/10/2012 12:43 PM, BGB wrote:
>> On 2/10/2012 9:51 AM, Lew wrote:
>>> BGB wrote:
>>>> if one is using C, then it is either "write some code to do it", or
>>>> suffer with a 3rd party [sic] library dependency (one might validly
>>>> choose to
>>>> write the code themselves in this case).
>>>
>>> "Suffer"? The XML parsers for C are well-established, very reliable,
>>> and no
>>> cause for suffering. Using a pejorative is not the same as
>>> establishing a
>>> point.
>>>
>>> There is nothing wrong with the third-party libraries, and the choice to
>>> roll your own for C is rarely valid. You seem to suffer from NIH
>>> syndrome.
>>>
>>
>> they introduce porting hassles:
>> does one bundle "libxml" with their app on Windows;
>> do they use MSXML and then deal with having to switch over to "libxml"
>> when building on Linux?
>
> LIBXML2 works fine on Windows, so you can use it on both platforms.
>

yeah, it is an option.
however, it is not a standard library on Windows (in certain cases, one
may need to provide for it, or expect anyone who wants to build from
source to provide for it, ...).

>> but, anyways, it is like asking a person never to write their own JPEG
>> loader/saver, or their own scripting-language compiler. yes, maybe a
>> person doesn't technically need to, but they may forsake potentially
>> valuable learning experiences (or the claim to having the skills to do
>> so).
>
> I think you should very clearly distinguish between when you talk about
> learning and programming production code.
>
> The goals are just so different.
>

in my case, both often end up being the same code.

one may end up doing something initially as a learning activity, but if
one does so, and the code works fairly well, why write the same code
again?...

granted, being a programmer working for a corporation or something, vs
being an independent game developer, could also be a factor.

Arne Vajhøj

Feb 11, 2012, 10:41:42 PM

C is not standard on Windows either.

You need to get some things.

>>> but, anyways, it is like asking a person never to write their own JPEG
>>> loader/saver, or their own scripting-language compiler. yes, maybe a
>>> person doesn't technically need to, but they may forsake potentially
>>> valuable learning experiences (or the claim to having the skills to do
>>> so).
>>
>> I think you should very clearly distinguish between when you talk about
>> learning and programming production code.
>>
>> The goals are just so different.
>>
>
> in my case, both often end up being the same code.
>
> one may end up doing something initially as a learning activity, but if
> one does so, and the code works fairly well, why write the same code
> again?...

Because what you learn the most from and what is most cost efficient for
the company may very well be two different things.

Arne

Lew

Feb 11, 2012, 11:18:45 PM

BGB wrote:
> Arne Vajhøj wrote:

> > LIBXML2 works fine on Windows, so you can use it on both platforms.
> >
>
> yeah, it is an option.
> however, it is not a standard library on Windows (in certain cases, one

First you spend half a newsgroup thread decrying standards and
proudly boasting how you flout them, now you suddenly denigrate a
library for not being standard?

And yes, libxml2 is, too, standard, as such things go.

> may need to provide for it, or expect anyone who wants to build from
> source to provide for it, ...).

"provide for it"?

> >> but, anyways, it is like asking a person never to write their own JPEG
> >> loader/saver, or their own scripting-language compiler. yes, maybe a
> >> person doesn't technically need to, but they may forsake potentially
> >> valuable learning experiences (or the claim to having the skills to do
> >> so).
> >
> > I think you should very clearly distinguish between when you talk about
> > learning and programming production code.
> >
> > The goals are just so different.
> >
>
> in my case, both often end up being the same code.

So you don't write code for someone else?

> one may end up doing something initially as a learning activity, but if
> one does so, and the code works fairly well, why write the same code
> again?...

Asked and answered upthread. Don't cycle.

> granted, being a programmer working for a corporation or something, vs
> being an independent game developer, could also be a factor.

If no one but you ever looks at your code or has to maintain it,
you can be as idiosyncratic and antisocial as you like. I know
people who've lived alone so long they cannot maintain a civil
discourse in public. Their habits don't bother them when they're
home alone, but that doesn't make them optimal.

Most of your arguments sound like apologetics for undisciplined,
egocentric programming with little connection to facts or the real
world of the workaday programmer, or the costs thereof. Heck, you
haven't even answered the question as to what the costs of software
development are.

--
Lew

BGB

Feb 12, 2012, 12:20:32 AM

yeah...

and anyways, I am not about "making conclusions" or "decreeing how
things should be done" or anything, rather, my view is there may be a
time and place for everything (and whatever is or is not the case can be
decided on a case-by-case basis or similar, based on whatever may apply
in the particular case in question, and whichever options may be cheaper
or more expensive, and similar).

IMHO, the idea that a person "should" always do things the same way in
every situation is itself arguably questionable. likewise for any
belief that something is universally required or universally
prohibited, ...

[ decided to leave out most of the rest of what I wrote. ]

basically, it all amounted to the frustration that there is little point
in trying to "prove" something which ultimately amounts to little more
than "hair splitting over a few percentage points...".

the thing is... textual XML is kind of bulky, but doing damn near
anything to it (like running it through deflate) will significantly
reduce its size (say, to around 10-25% of its original size). one can
outperform this with specialized formats, but at this point it is
worrying about a few percentage points +/-.
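A minimal sketch of "running XML through deflate" with java.util.zip.Deflater and Inflater from the standard library, including a round trip back to the original bytes. The generated document is synthetic and very regular, so it will likely compress better than real data; no particular ratio is claimed here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateXml {
    /** Compresses data with zlib-wrapped Deflate at the given level (0-9, or -1 for the default). */
    static byte[] deflate(byte[] data, int level) {
        Deflater d = new Deflater(level);
        try {
            d.setInput(data);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!d.finished()) {
                out.write(buf, 0, d.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            d.end(); // release native zlib memory promptly
        }
    }

    /** Reverses deflate(); throws if the input is corrupt or truncated. */
    static byte[] inflate(byte[] data) {
        Inflater inf = new Inflater();
        try {
            inf.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && (inf.needsInput() || inf.needsDictionary())) {
                    throw new IllegalArgumentException("truncated input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt input", e);
        } finally {
            inf.end();
        }
    }

    public static void main(String[] args) {
        // Synthetic, very regular XML; real documents will usually compress less well.
        StringBuilder sb = new StringBuilder("<ast>");
        for (int i = 0; i < 200; i++) {
            sb.append("<node op=\"add\" line=\"").append(i).append("\"><ref name=\"x\"/></node>");
        }
        byte[] xml = sb.append("</ast>").toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = deflate(xml, Deflater.DEFAULT_COMPRESSION);
        System.out.printf("%d -> %d bytes (%.1f%% of original)%n",
                xml.length, packed.length, 100.0 * packed.length / xml.length);
        System.out.println(java.util.Arrays.equals(inflate(packed), xml)); // true
    }
}
```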

what is the point of "proving" something which is ultimately of a fairly
limited significance and scope?...

maybe one can try to "prove" that people "should" actually give a crap.

or, for that matter, finding a particular claim to disprove (say, that X
is always true or always false). this is rarely the case with data
compression, as it is typically more about averages, and likewise, there
are cases for which the data may actually get bigger (about the only
real "absolute" in data compression is something commonly known as the
"Shannon limit").

secondarily is the "law of diminishing returns" (itself a natural result
of the Shannon limit), where essentially the compressibility of a piece
of data will form a sort of curve, and any (lossless) algorithms will
fall somewhere along this curve, and typically with a fairly consistent
ordering (say, for example, LZMA tends to compress better than BZip2
which tends to compress better than Deflate/GZip).

one can look at how each algorithm works internally, or experiment with
how they can use the basic parts to build other things or achieve
interesting results (and note mostly that the parts themselves tend to
fall along these sorts of curves, reducing "compression" mostly to a
matter of "going mix and match" with various parts and making
cost/benefit tradeoffs between particular combinations of parts).

note that going further along the curve tends to become increasingly
costly, hence why tradeoffs need to be made.
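The standard Java library only ships Deflate (java.util.zip), so it cannot compare LZMA or BZip2 directly. A minimal sketch of the same size-versus-cost tradeoff within one algorithm is to vary the Deflate level. The sample input is an assumption, and single-shot timings are rough.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

public class LevelTradeoff {
    /** Returns the size of data after zlib-wrapped Deflate at the given level (0-9). */
    static int compressedSize(byte[] data, int level) {
        Deflater d = new Deflater(level);
        try {
            d.setInput(data);
            d.finish();
            byte[] buf = new byte[8192];
            int total = 0;
            while (!d.finished()) {
                total += d.deflate(buf); // only the size is needed, so the bytes are discarded
            }
            return total;
        } finally {
            d.end();
        }
    }

    public static void main(String[] args) {
        // Semi-regular sample input (an assumption); results depend heavily on real data.
        Random rnd = new Random(42);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20_000; i++) {
            sb.append("<n v=\"").append(rnd.nextInt(1000)).append("\"/>");
        }
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);
        for (int level : new int[] {1, 6, 9}) {
            long t0 = System.nanoTime();
            int size = compressedSize(data, level);
            long micros = (System.nanoTime() - t0) / 1_000;
            // Single-shot timings include JIT warm-up noise; treat them as rough.
            System.out.printf("level %d: %d -> %d bytes, %d us%n", level, data.length, size, micros);
        }
    }
}
```

Level 0 (stored, no compression) always comes out slightly larger than the input, which is one end of the curve; higher levels spend more CPU for what is typically a shrinking size gain.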

but, ultimately, how much something is relevant will itself tend to
depend somewhat on context.

BGB

Feb 12, 2012, 1:03:13 AM

yeah.

and, in fact 10Hz asynchronous throughput is possible... (this is
typically how most online gaming works).

one can't reasonably get 10Hz synchronous (IOW: request/response), due
to ping times, but this is a different matter.

I am not claiming one can do request/response at 10Hz, as this probably
would be impossible over a WAN (at reasonable ping times).

online gaming generally isn't based on request/response though, so the
problems of not being able to get responses at this rate don't really
matter too much (the user may still notice the results of bad ping
times, namely stuff being out of place or seemingly teleporting around
or similar, but this is a different issue).
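A minimal sketch of the asynchronous pattern described above: a sender emits numbered state updates on a fixed 10 Hz schedule and never waits for a reply, so its rate does not depend on ping. A list stands in for the network connection (an assumption for illustration); a real client would write each update to a socket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class TickSender {
    /**
     * Sends one numbered state update every periodMs for about runMs,
     * never waiting for a reply. The list stands in for the network.
     */
    static List<String> run(long periodMs, long runMs) {
        List<String> wire = new CopyOnWriteArrayList<>();
        AtomicInteger seq = new AtomicInteger();
        ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor();
        ticker.scheduleAtFixedRate(() -> wire.add("update " + seq.getAndIncrement()),
                0, periodMs, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(runMs);
            ticker.shutdown(); // by default this also cancels the periodic task
            ticker.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            ticker.shutdownNow();
        }
        return new ArrayList<>(wire);
    }

    public static void main(String[] args) {
        // 10 Hz = one update per 100 ms; since nothing waits on a response,
        // the send rate does not depend on the round-trip time.
        List<String> sent = run(100, 1_000);
        System.out.println(sent.size() + " updates sent in about 1 s");
    }
}
```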

TCP has a throughput limit due to ping, where assuming unlimited
bandwidth but a 200ms ping, one will still be limited to somewhere
around 320kB/s or so (assuming a 64kB window).

however, as is, there are typically bandwidth constraints as well (but,
they are a little more subject to fudging).

the main goal is mostly to send everything at a bandwidth low enough
that the connection doesn't risk getting backlogged (depends on internet
connection, but 16-32 kB/s seems reasonably safe, as internet radio
often tends to operate in this range, say, 192kbps is 24kB/s).
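The numbers in this post can be checked with a few lines of arithmetic. A minimal sketch, assuming a 64 KiB window (roughly the largest TCP can advertise without the window-scale option) and the 200 ms ping from the post; the method names are illustrative.

```java
public class WanBudget {
    /** Upper bound on one TCP connection's throughput: window / RTT, in bytes per second. */
    static long windowLimitedBytesPerSec(long windowBytes, long rttMs) {
        return windowBytes * 1000 / rttMs;
    }

    /** Best case for strict request/response with one request outstanding: one exchange per RTT. */
    static double maxSyncRequestsPerSec(long rttMs) {
        return 1000.0 / rttMs;
    }

    /** Converts kbit/s to kB/s (8 bits per byte, decimal k). */
    static double kbpsToKBps(double kbps) {
        return kbps / 8.0;
    }

    public static void main(String[] args) {
        long rttMs = 200; // the 200 ms ping from the post
        System.out.println(windowLimitedBytesPerSec(64 * 1024, rttMs) / 1024 + " KiB/s"); // 320 KiB/s
        System.out.println(maxSyncRequestsPerSec(rttMs) + " requests/s");               // 5.0 requests/s
        System.out.println(kbpsToKBps(192) + " kB/s");                                  // 24.0 kB/s
    }
}
```

At the 10 Hz update rate discussed above, a 24 kB/s budget leaves about 2.4 kB per update, and at 200 ms ping a strict request/response loop tops out at 5 exchanges per second, below the 10 Hz target.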

stalls are also a potential risk as well.

or such...

BGB

Feb 12, 2012, 2:46:07 AM

probably, but it is a question of how many things have to be worried
about as a part of getting it built (for someone wanting to rebuild from
source). if a program depends on a big pile of 3rd party libraries, it
may be harder to get rebuilt than if it doesn't.

it is arguably bad enough requiring that a particular C compiler be
installed (such as MSVC / Windows SDK), and that the program has to be
built in a certain way.

expecting the person to go download a bunch of libraries, get them
built, and put them all in the library and include paths, well, this is
adding a bit more to the cost.

this particular cost is a bit lower on Linux though, since 3rd party
libraries are more commonly available and are handled more gracefully
(nearly everything gets installed to "/usr/lib" and "/usr/include" and
similar).

alternatively, one could be like:
this app needs to be built in Cygwin.

the tradeoff though is that Cygwin has its own annoyances (needing to
have their DLL with compiled binaries, and the tendency for it to always
have a console window pop up for the app if it wasn't launched from a
console).

MinGW is a little nicer than Cygwin regarding the above, however it
doesn't come by default with a large pile of 3rd party libraries (so, it
has the same basic issue here as MSVC).

it is not that I haven't used any 3rd party libraries though, as a few
have been used, but essentially copied into the project. a few past
libraries were used, but later dropped since I had re-implemented their
functionality in smaller forms.

this avoids needing them as external dependencies, since then they are
built along with the application (then they are internal dependencies).

yeah, even within the same program, the matter of "what is allowed to
use and depend on what" can become its own issue (if one doesn't pay
attention to these internal dependencies, they may come around and bite).

>>>> but, anyways, it is like asking a person never to write their own JPEG
>>>> loader/saver, or their own scripting-language compiler. yes, maybe a
>>>> person doesn't technically need to, but they may forsake potentially
>>>> valuable learning experiences (or the claim to having the skills to do
>>>> so).
>>>
>>> I think you should very clearly distinguish between when you talk about
>>> learning and programming production code.
>>>
>>> The goals are just so different.
>>>
>>
>> in my case, both often end up being the same code.
>>
>> one may end up doing something initially as a learning activity, but if
>> one does so, and the code works fairly well, why write the same code
>> again?...
>
> Because what you learn the most from and what is most cost efficient for
> the company may very well be two different things.
>

well, it is possible.

often it ends up with a cycle where something is implemented once (or
maybe a few times), and very often if something similar is needed later,
code is reused via "copy/paste/edit" magic.

in my case though, admittedly I am not actually employed as a
programmer, but am more of a college student + independent game
developer (mostly working on a 3D FPS style game). like, one has to
"face the impossible" and so on (and, with luck, getting something on
the market and getting enough money to live on, and trying to make newer
and better stuff, ...).

admittedly, a person who was a notable influence for me was John Carmack,
who was the lead programmer for id Software, and was well known for the
Doom and Quake series games. I learned a fair amount from his code
(since he tends to release it all under the GPL).

other influences include Linus Torvalds (who created Linux) and Notch
(Markus Persson, best known for creating Minecraft). the Linux
kernel was a notable influence on me regarding things like program
architecture and similar.

in my case this also means dealing with all the art, sound, music,
... as well, so one tries to gain skills both as a programmer and as an
artist. I consider myself to be much more of a programmer than an artist
though (I started out programming, and the art can't do much without the
code).

there is still a lot of areas for learning and experimentation though.

or such...

BGB

Feb 12, 2012, 3:36:34 AM

On 2/11/2012 9:18 PM, Lew wrote:
> BGB wrote:
>> Arne Vajhøj wrote:
>>> LIBXML2 works fine on Windows, so you can use it on both platforms.
>>>
>>
>> yeah, it is an option.
>> however, it is not a standard library on Windows (in certain cases, one
>
> First you spend half a newsgroup thread decrying standards and
> proudly boasting how you flout them, now you suddenly denigrate a
> library for not being standard?
>
> And yes, libxml2 is, too, standard, as such things go.
>

standard, as-in, comes bundled with the OS and compiler.

libxml2 comes on Linux, but not on Windows with the "Windows SDK" or
similar (AFAIK it does come with Cygwin though).

safely, one can use, on Windows:
the ANSI C runtime (more or less C89/C90);
any Win32 API provided stuff (Winsock, GDI, OpenGL, ...);
...

on Linux, one has:
the ANSI C runtime (C99);
POSIX;
X11 and OpenGL;
a big pile of FOSS libraries.

so, what is common:
the C runtime;
OpenGL.

and, what needs to be provided by OS specific shims:
BSD Sockets / Winsock;
GDI / GLX;
getting user-input (keyboard / mouse);
stuff for low-level memory management, threads, managing dynamic libraries;
...

most other things tend to be kept either internal or are optional.
I don't think architecturally it is all that unusual (take, for example,
Doom 3 and Mozilla, both of which appear to have a similar architecture).

>> may need to provide for it, or expect anyone who wants to build from
>> source to provide for it, ...).
>
> "provide for it"?
>

have to make sure it is built, installed, and on the OS's library and
include paths (like, mandating a particular build configuration).

otherwise, the library needs to be copied into the application's build
tree, and built along with the application.

>>>> but, anyways, it is like asking a person never to write their own JPEG
>>>> loader/saver, or their own scripting-language compiler. yes, maybe a
>>>> person doesn't technically need to, but they may forsake potentially
>>>> valuable learning experiences (or the claim to having the skills to do
>>>> so).
>>>
>>> I think you should very clearly distinguish between when you talk about
>>> learning and programming production code.
>>>
>>> The goals are just so different.
>>>
>>
>> in my case, both often end up being the same code.
>
> So you don't write code for someone else?
>

like a contractor or something?... nope.

I write software for myself, and hopefully to try to get money more
directly (selling directly to end users).

>> one may end up doing something initially as a learning activity, but if
>> one does so, and the code works fairly well, why write the same code
>> again?...
>
> Asked and answered upthread. Don't cycle.
>

with a thread this long, how is one supposed to remember all of what
they have said already?

>> granted, being a programmer working for a corporation or something, vs
>> being an independent game developer, could also be a factor.
>
> If no one but you ever looks at your code or has to maintain it,
> you can be as idiosyncratic and antisocial as you like. I know
> people who've lived alone so long they cannot maintain a civil
> discourse in public. Their habits don't bother them when they're
> home alone, but that doesn't make them optimal.
>
> Most of your arguments sound like apologetics for undisciplined,
> egocentric programming with little connection to facts or the real
> world of the workaday programmer, or the costs thereof. Heck, you
> haven't even answered the question as to what the costs of software
> development are.
>

costs of software development?...
besides the obvious stuff, like writing the program and making it work?...

I don't personally believe my strategy to be making it all that much
harder (or expensive) for myself.

but, anyways, discipline itself is a matter of cost/benefit tradeoffs,
like "what do I get out of it?" and similar. the general idea is that
people act in their self-interest to try to maximize their own gain. I
don't think my behaviors are inconsistent with this ideal (except being
lazy at times, where time spent not doing stuff is time wasted).

for example, there is the larger society, but each person is themselves,
and the people and things around them can be valued in terms of how they
may be of benefit to oneself and similar, but, with the side effect that
often whatever is most benefit to oneself may potentially be of benefit
to others as well (note: I don't currently personally believe altruism
actually exists, but rather it is likely a sort of misconception
regarding what really is in one's best interests).

similarly, the relative value of an action could be evaluated primarily
in terms of its most likely costs and benefits (to oneself, although not
necessarily immediate, for example a more immediate gain may cost more
later on, ...).

discipline can itself cost in terms of time or effort, whereas hacking
together something in the name of expedience can get it into working
order quicker.

or such...

Roedy Green

Feb 12, 2012, 7:30:04 AM

On Sat, 04 Feb 2012 22:33:23 +0100, Jan Burse <janb...@fastmail.fm>
wrote, quoted or indirectly quoted someone who said :

>Obtain the lock:
> raf = new RandomAccessFile(file, "rw");
>
> fo = new FileOutputStream(raf.getFD());
> fo.getChannel().lock(0, Long.MAX_VALUE, false);
>
>Release the lock:
> fo.close();

That is quite a bit of overhead. It would work fine for low traffic. I
was hoping for something that would let you leave the file open in
both jobs. e.g. a reserved block or blocks in the file that the
intercommunicators read and wrote to lock. I suppose you could send
messages to a task that owned the file (a primitive database), but
then you might as well send messages directly to each other.
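A minimal sketch of the "leave the file open and lock a reserved block" idea, using FileChannel.lock(position, size, shared) from java.nio. Each process keeps its channel open and locks only one reserved byte around each access. The file layout (lock byte at offset 0, length-prefixed UTF-8 message at offset 16) and the class name are assumptions for illustration. Caveats: file locks are held on behalf of the whole JVM, so a second overlapping lock from the same JVM throws OverlappingFileLockException. Whether a lock is advisory or mandatory is platform-dependent (typically mandatory on Windows), which is why the lock byte is kept outside the data area. Noticing that a new message has arrived still requires polling or some other signal.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** A one-slot mailbox file that both processes keep open; only a reserved byte is locked. */
public class RegionLockMailbox implements AutoCloseable {
    // Hypothetical layout: byte 0 is the lock byte; a 4-byte length plus UTF-8 text starts at 16.
    static final long LOCK_POS = 0, LOCK_LEN = 1, DATA_POS = 16;

    private final FileChannel ch; // opened once, kept open between messages

    public RegionLockMailbox(Path file) throws IOException {
        ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
    }

    /** Replaces the slot's contents while holding an exclusive lock on the lock byte. */
    public void write(String msg) throws IOException {
        byte[] text = msg.getBytes(StandardCharsets.UTF_8);
        try (FileLock lock = ch.lock(LOCK_POS, LOCK_LEN, false)) { // blocks while the other side holds it
            ByteBuffer buf = ByteBuffer.allocate(4 + text.length);
            buf.putInt(text.length).put(text).flip();
            for (long pos = DATA_POS; buf.hasRemaining(); ) {
                pos += ch.write(buf, pos); // positional write; the channel position is untouched
            }
            ch.force(false); // optional: flush to the storage device before unlocking
        } // lock released here, channel stays open
    }

    /** Reads the slot under a shared lock (some OSes silently make it exclusive). */
    public String read() throws IOException {
        try (FileLock lock = ch.lock(LOCK_POS, LOCK_LEN, true)) {
            ByteBuffer len = readFully(4, DATA_POS);
            return StandardCharsets.UTF_8.decode(readFully(len.getInt(), DATA_POS + 4)).toString();
        }
    }

    private ByteBuffer readFully(int size, long pos) throws IOException {
        ByteBuffer dst = ByteBuffer.allocate(size);
        while (dst.hasRemaining()) {
            int n = ch.read(dst, pos);
            if (n < 0) throw new EOFException("slot is empty or truncated");
            pos += n;
        }
        dst.flip();
        return dst;
    }

    @Override
    public void close() throws IOException {
        ch.close(); // also releases any locks still held through this channel
    }
}
```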

--
Roedy Green Canadian Mind Products
http://mindprod.com
One of the most useful comments you can put in a program is
"If you change this, remember to change ‘XXX’ too".

Arne Vajhøj

Feb 12, 2012, 9:23:17 AM

...

> but, ultimately, how much something is relevant will itself tend to
> depend somewhat on context.

The fact that there are exceptions to most rules should not lead to
a perception that rules do not matter.

You should strive to go by the rules and only very reluctantly go
for the exception if it is really needed.

Arne

Arne Vajhøj

Feb 12, 2012, 9:27:13 AM

On 2/12/2012 1:03 AM, BGB wrote:
> On 2/11/2012 1:18 PM, Arne Vajhøj wrote:
>> On 2/11/2012 3:06 PM, Lew wrote:
>>> XML is just fine for just about every purpose to which it's put.
>>> That's why it's popular now. People who cavil about "bandwidth" and
>>> "10 Hz network messages" are tossing us red-herring sashimi. You
>>> aren't going to get 10+ Hz message exchanges over the WAN. For
>>> realistic message rates, XML suits beautifully. I speak from
>>> experience with many, many projects that used every conceivable
>>> message format from binary to CSV to custom to XML to protocol buffers
>>> to JSON, and XML has distinct advantages. Its purported disadvantages
>>> of bulk and bandwidth turn out to be non-issues in practice. Really.
>>> That's real.
>>
>> ????
>>
>> The industry standard for smartphone apps and AJAX web apps
>> is JSON because the bandwidth actually matters.
>
> yeah.

But please note that you do not invent your own JSON parser
either - you use something already done. In Java there are json.org,
gson etc.

> and, in fact 10Hz asynchronous throughput is possible... (this is
> typically how most online gaming works).
>
> one can't reasonably get 10Hz synchronous (IOW: request/response), due
> to ping times, but this is a different matter.
>
> I am not claiming one can do request/response at 10Hz, as this probably
> would be impossible over a WAN (at reasonable ping times).
>
> online gaming generally isn't based on request/response though, so the
> problems of not being able to get responses at this rate don't really
> matter too much (the user may still notice the results of bad ping
> times, namely stuff being out of place or seemingly teleporting around
> or similar, but this is a different issue).
>
> TCP has a throughput limit due to ping, where assuming unlimited
> bandwidth but a 200ms ping, one will still be limited to somewhere
> around 320kB/s or so (assuming a 64kB window).
>
> however, as is, there are typically bandwidth constraints as well (but,
> they are a little more subject to fudging).
>
> the main goal is mostly to send everything at a bandwidth low enough
> that the connection doesn't risk getting backlogged (depends on internet
> connection, but 16-32 kB/s seems reasonably safe, as internet radio
> often tends to operate in this range, say, 192kbps is 24kB/s).
>
> stalls are also a potential risk as well.

I cannot follow your way of thinking. With multiple interactions
in parallel there is no strict correlation between latency and
throughput.

Arne

Arne Vajhøj

Feb 12, 2012, 9:29:21 AM

You could (and probably should) check in the libs used with
your source code!

No problems getting anything.

Arne

Arne Vajhøj

Feb 12, 2012, 9:31:02 AM

On 2/12/2012 2:46 AM, BGB wrote:

That is all fine.

But many of your conclusions do not fit with a more traditional
developer job.

Arne

Martin Gregorie

Feb 12, 2012, 11:02:33 AM

On Sat, 11 Feb 2012 22:41:42 -0500, Arne Vajhøj wrote:

>
> C is not standard on Windows either.
>
> You need to get some things.
>

Yes, so if you're intending to write code that ports easily between
Windows and *nix/POSIX (and in my case, OS-9), you end up writing a
compatibility library for each target OS. This is mainly a collection of
functions that are standard in one of the other target OSen and absent on
its target system. A good example is the command line parser getopt(),
which is absent from OS-9 and (IIRC) Windows libraries.

--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |

Arne Vajhøj

Feb 12, 2012, 11:16:03 AM

Doesn't getopt exist in some GNU lib that you can get for all
platforms?

Arne

BGB

Feb 12, 2012, 1:33:51 PM

yes, this is what is often done.

one ends up with essentially a pile of code intended to wrap various
APIs for various OS's, such that internally the app can use a more
consistent API.

there is also SDL, which on one hand wraps a lot of this stuff, but OTOH
is a 3rd party library which carries the usual issues. a way to make
this work though is to treat SDL as if it were a pseudo-OS (its wrappers
are wrapped in much the same way).

lots of other apps I have looked at seem to contain similar wrapper layers.
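to make the wrapper-layer idea concrete, here is a minimal sketch of the pattern, written in Java for consistency with the rest of the thread. note that Java's own standard library already hides most of these differences (File.pathSeparator, for example), so this only illustrates the shape of such a layer: one interface the app codes against, one implementation per OS, chosen once at startup. the Platform interface, both implementations, and the "myapp" config paths are all hypothetical.

```java
import java.util.Locale;

// Hypothetical wrapper layer: the rest of the app only talks to this interface.
interface Platform {
    String name();
    String pathSeparator();                  // separator used in PATH-style lists
    String defaultConfigDir(String home);    // where the app keeps its settings
}

final class WindowsPlatform implements Platform {
    public String name() { return "windows"; }
    public String pathSeparator() { return ";"; }
    public String defaultConfigDir(String home) { return home + "\\AppData\\Roaming\\myapp"; }
}

final class UnixPlatform implements Platform {
    public String name() { return "unix"; }
    public String pathSeparator() { return ":"; }
    public String defaultConfigDir(String home) { return home + "/.myapp"; }
}

final class Platforms {
    // Pick the implementation once, from an os.name-style string.
    static Platform forOsName(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        return os.startsWith("windows") ? new WindowsPlatform() : new UnixPlatform();
    }

    static Platform current() {
        return forOsName(System.getProperty("os.name", ""));
    }

    public static void main(String[] args) {
        Platform p = current();
        System.out.println(p.name() + " uses '" + p.pathSeparator() + "' in PATH lists");
    }
}
```

the point of the design is that OS checks happen in exactly one place (forOsName), so porting to a new target means adding one more implementation rather than touching call sites.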

some people go further, and try to write wrappers to hide the
differences between Direct3D and OpenGL, but I don't personally go that
far (I just use OpenGL and regard it as "good enough").

however, due to secondary reasons (mostly making things more consistent,
like having a more consistent API for dealing with things like
shader/material effects, lighting, ...), a lot of OpenGL has ended up
being wrapped (in an admittedly often ad-hoc manner).

I have noted that Doom3 tends to wrap OpenGL far more significantly
(though, it would probably be going off on a bit of a tangent to
describe Doom3's renderer here).

BGB

Feb 12, 2012, 2:13:05 PM

possible.

others may go for an "all is allowed in programming, so long as it works
ok and gets the job done" mindset. whether or not rules are followed may
in turn depend on an evaluation of whether or not the rules work in
one's favor.

so, on one hand: well, I can follow this rule, and get certain desirable
effects.

or, it may also work out as: this rule is stupid and inconvenient, I am
not going to bother following it.

or maybe: the existing rule is stupid/inconvenient/..., so I am going to
make up my own rules and follow them instead.

this does not necessarily mean making a standard of non-standard, as
some piece of standardized technology (formally, or de-facto, it really
doesn't matter) may itself carry desirable benefits.

as noted, PNGs and JPEGs are an example of this:
they allow compatibility with existing applications which use these
formats, etc, ...

so, although one could devise their own graphics format (I have done so
before), using it may turn out to be so incredibly inconvenient for
everyone involved that using it is ultimately not worth the bother.

likewise, in the everyday world, breaking laws may lead in turn to the
police breaking down one's door, and breaking moral and ethical rules
may lead to various other consequences (do bad things and bad things may
follow in turn).

so, all this doesn't give a person license to do "whatever they want, whenever
they want", because the rules of cost/benefit will prevent this (too
many costs in these cases, defeating the benefits).

likewise, making a standard of non-standard, though not inherently bad,
would likely end up being overly costly (in terms of use or maintenance
or whatever else).

but, I am not going to try to list all of the costs and benefits one
might encounter or how one may weight them, as there are too many and
how much each may apply in a given situation is itself prone to vary.

Joshua Cranmer

Feb 12, 2012, 2:52:40 PM

On 2/12/2012 2:36 AM, BGB wrote:
> safely, one can use, on Windows:
> the ANSI C runtime (more or less C89/C90);
> any Win32 API provided stuff (Winsock, GDI, OpenGL, ...);
> ...

I call BS on this, having worked on a major open-source project that
works on the major platforms of Windows, Mac OS X, and Linux (and also
Android, and I think Solaris and *BSD are still reasonably
well-supported, although the Haiku and OS/2 ports are now thoroughly
dead). What libraries does this? A small list:
* libpng
* libjpg
* libogg + related
* cairo
* thebes
* sqlite
* freetype
* libbz2
* libjar
* zlib

And all of these are still used on Windows; there are even more that are
used only on Linux or Mac OS X.

Which application is this? Mozilla.

It's not that hard to use other libraries on Windows.

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

BGB

Feb 12, 2012, 3:33:03 PM

On 2/12/2012 7:27 AM, Arne Vajhøj wrote:
> On 2/12/2012 1:03 AM, BGB wrote:
>> On 2/11/2012 1:18 PM, Arne Vajhøj wrote:
>>> On 2/11/2012 3:06 PM, Lew wrote:
>>>> XML is just fine for just about every purpose to which it's put.
>>>> That's why it's popular now. People who cavil about "bandwidth" and
>>>> "10 Hz network messages" are tossing us red-herring sashimi. You
>>>> aren't going to get 10+ Hz message exchanges over the WAN. For
>>>> realistic message rates, XML suits beautifully. I speak from
>>>> experience with many, many projects that used every conceivable
>>>> message format from binary to CSV to custom to XML to protocol buffers
>>>> to JSON, and XML has distinct advantages. Its purported disadvantages
>>>> of bulk and bandwidth turn out to be non-issues in practice. Really.
>>>> That's real.
>>>
>>> ????
>>>
>>> The industry standard for smartphone apps and AJAX web apps
>>> are JSON because the bandwidth actually matters.
>>
>> yeah.
>
> But please note that you do not invent your own JSON parser
> either - you use something already done. In Java there are json.org,
> gson etc..
>

in Java, yes, one may use the libraries.

on the C end, one may choose to throw one together, or use a JavaScript
VM if one is available, ...

it all depends.

>> and, in fact 10Hz asynchronous throughput is possible... (this is
>> typically how most online gaming works).
>>
>> one can't reasonably get 10Hz synchronous (IOW: request/response), due
>> to ping times, but this is a different matter.
>>
>> I am not claiming one can do request/response at 10Hz, as this probably
>> would be impossible over a WAN (at reasonable ping times).
>>
>> online gaming generally isn't based on request/response though, so the
>> problems of not being able to get responses at this rate don't really
>> matter too much (the user may still notice the results of bad ping
>> times, namely stuff being out of place or seemingly teleporting around
>> or similar, but this is a different issue).
>>
>> TCP has a throughput limit due to ping, where assuming unlimited
>> bandwidth but a 200ms ping, one will still be limited to somewhere
>> around 320kB/s or so (assuming a 64kB window).
>>
>> however, as is, there are typically bandwidth constraints as well (but,
>> they are a little more subject to fudging).
>>
>> the main goal is mostly to send everything at a bandwidth low enough
>> that the connection doesn't risk getting backlogged (depends on internet
>> connection, but 16-32 kB/s seems reasonably safe, as internet radio
>> often tends to operate in this range, say, 192kbps is 24kB/s).
>>
>> stalls are also a potential risk as well.
>
> I can not follow you way of thinking. With multiple interactions
> in parallel there are no strict correlation between latency and
> throughput.
>

there is a rough correlation though.

for the part about TCP, this was related to how TCP worked (in its
traditional form), namely the existence of a 64kB maximum window size.

apparently, this is out of date, as RFC 1323 defines a window scale
option, enabled by default on Windows Vista and newer, which allows a
TCP window larger than 64kB.

http://tools.ietf.org/html/rfc1323
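the window limit is simple arithmetic: at most one window of unacknowledged data can be in flight per round trip, so a single connection's throughput is bounded by roughly window / RTT. a small Java sketch (class and method names are illustrative; it ignores packet loss, slow start, and the sender's own congestion window) that reproduces the ~320 KiB/s figure for a 64 KiB window and a 200 ms ping, and shows how the RFC 1323 scale shift raises the ceiling:

```java
// Rough upper bound on single-connection TCP throughput: at most one full
// receive window can be outstanding per round trip, so rate <= window / RTT.
final class WindowLimit {
    static final int MAX_UNSCALED_WINDOW = 65_535;   // largest value of the 16-bit window field

    static double maxBytesPerSecond(long windowBytes, double rttSeconds) {
        if (rttSeconds <= 0) throw new IllegalArgumentException("RTT must be positive");
        return windowBytes / rttSeconds;
    }

    // RFC 1323 window scaling: the advertised 16-bit value is shifted left
    // by a scale factor (0..14) negotiated when the connection is set up.
    static long scaledWindow(int advertised, int scaleShift) {
        if (scaleShift < 0 || scaleShift > 14) throw new IllegalArgumentException("shift must be 0..14");
        return ((long) advertised) << scaleShift;
    }

    public static void main(String[] args) {
        double classic = maxBytesPerSecond(MAX_UNSCALED_WINDOW, 0.200);
        System.out.printf("64 KiB window, 200 ms RTT: %.0f bytes/s (~%.0f KiB/s)%n",
                classic, classic / 1024);
        long scaled = scaledWindow(MAX_UNSCALED_WINDOW, 4);  // roughly a 1 MiB window
        System.out.printf("scale shift 4, 200 ms RTT: %.0f bytes/s%n",
                maxBytesPerSecond(scaled, 0.200));
    }
}
```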

for the part about moderating kB/s, this has a lot more to do with a
user's internet connection.

say, hypothetically, a user has dial-up.

now, what if the data being sent does not fit over dial-up (one is
trying to send 10kB/s, but a 56k modem can only handle ~6.5kB/s or so)?
well, then, the connection will backlog (the connection will send at the
rate it can send, and anything else will have to wait).

similar limits may exist over the internet, but in a less direct form:
consider that the internet is prone to occasionally drop a packet here or there.

so, say a stream is going over the internet and a (single) packet drops;
what happens?
well, all the data up to the dropped packet reaches the other end, and the
other end may send a packet back indicating the point received;
the sender will start resending data from that point;
the receiver will start receiving data again.

this results in essentially a ping-time delay in which no data can be sent.

if the sender is sending messages at a fixed rate, what happens?
well then, the messages will pile up, waiting to be sent;
after transmission resumes, several updates' worth of data need to be sent;
if all of the updates fit within the bandwidth of the connection
(end-to-end), then there may be no obvious stall (updates can all be
sent at full speed);
if enough data backlogs so as to exceed the bandwidth available,
then it has to wait to be sent, and if the sender just keeps naively
sending updates, then essentially one gets a stall (and the data being
received by the receiver will start becoming progressively more
out-of-date).
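the two cases above can be seen in a toy queue model. this is a sketch under simplifying assumptions (fixed 100 ms ticks, a fixed per-tick link capacity, and a stall modeled as a few ticks in which nothing drains); the class name and all numbers are illustrative. with 1600-byte updates at 10 Hz (16 kB/s) over a link that moves 2400 bytes per tick (24 kB/s), a 200 ms stall produces a backlog that drains again within a few ticks; at 2600 bytes per tick, slightly more than the link can carry, the backlog never drains and the received data falls progressively further behind.

```java
import java.util.Arrays;

// Toy model of the stall: a sender produces a fixed-size update every tick,
// the link drains at most linkBytesPerTick, and during a stall (e.g. one
// round trip spent recovering a lost packet) nothing drains at all.
final class BacklogSim {
    /** Returns the number of queued bytes after each tick. */
    static long[] simulate(int ticks, long updateBytes, long linkBytesPerTick,
                           int stallStart, int stallTicks) {
        long[] backlog = new long[ticks];
        long queued = 0;
        for (int t = 0; t < ticks; t++) {
            queued += updateBytes;                               // a new update is produced
            boolean stalled = t >= stallStart && t < stallStart + stallTicks;
            long drained = stalled ? 0 : Math.min(queued, linkBytesPerTick);
            queued -= drained;
            backlog[t] = queued;
        }
        return backlog;
    }

    public static void main(String[] args) {
        // 100 ms ticks: 1600 B/tick = 16 kB/s of updates, 2400 B/tick = 24 kB/s link,
        // with a 2-tick (200 ms) stall starting at tick 5.
        long[] headroom = simulate(20, 1600, 2400, 5, 2);
        // Same stall, but the sender wants 2600 B/tick, more than the link carries.
        long[] overload = simulate(20, 2600, 2400, 5, 2);
        System.out.println("with headroom: " + Arrays.toString(headroom));
        System.out.println("overloaded:    " + Arrays.toString(overload));
    }
}
```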

these properties can be observed with things like internet radio and
video streaming (if the connection is fast enough, playback happens in
real-time without obvious stalls or re-buffering, even though the rate
at which the data comes over the internet is often very irregular).

the same applies to internet telephony.

if one tries to operate within a fixed-bandwidth window, similar to
internet radio, most minor stalls can be glossed over (this limit being
a bit lower than the end-to-end transfer rate of the connection). going
lower is better, since the lower one goes, the more room for error
there is.

the main issue is, namely, that the data being sent has to be able to
fit within these bandwidth limits (hence, why data compression is highly
desirable in this case).
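the post doesn't say how the sender holds itself to such a budget; one common technique (swapped in here, not necessarily what any particular game does) is a token bucket: byte "tokens" accumulate at the budget rate up to a small burst allowance, and an update is only sent if enough tokens are available, otherwise it is dropped, merged into the next update, or delayed. a minimal sketch, assuming the caller passes in monotonic time in nanoseconds (e.g. from System.nanoTime()); the 24 kB/s rate and 4 kB burst are illustrative:

```java
// Token bucket: tokens (bytes) accumulate at ratePerSec, capped at burstBytes;
// a message may be sent only if enough tokens are available right now.
final class TokenBucket {
    private final double ratePerSec;
    private final double burstBytes;
    private double tokens;
    private long lastNanos;

    TokenBucket(double ratePerSec, double burstBytes, long nowNanos) {
        this.ratePerSec = ratePerSec;
        this.burstBytes = burstBytes;
        this.tokens = burstBytes;                 // start with a full bucket
        this.lastNanos = nowNanos;
    }

    /** Returns true (and spends the tokens) if a message of this size fits now. */
    boolean tryConsume(int bytes, long nowNanos) {
        double elapsedSec = (nowNanos - lastNanos) / 1e9;
        lastNanos = nowNanos;
        tokens = Math.min(burstBytes, tokens + elapsedSec * ratePerSec);
        if (tokens >= bytes) {
            tokens -= bytes;
            return true;
        }
        return false;                             // caller drops, merges or delays the update
    }

    public static void main(String[] args) {
        // 24 kB/s budget with a 4 kB burst allowance.
        TokenBucket bucket = new TokenBucket(24_000, 4_000, 0L);
        long now = 0;
        int sent = 0;
        for (int i = 0; i < 100; i++) {           // 10 s of 10 Hz, 3 kB updates (30 kB/s wanted)
            if (bucket.tryConsume(3_000, now)) sent++;
            now += 100_000_000L;                  // 100 ms between updates
        }
        System.out.println("sent " + sent + " of 100 updates");
    }
}
```

refusing (rather than queueing) over-budget updates is what keeps the backlog from building up in the first place; compression, as mentioned above, just makes each update cost fewer tokens.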

an online game basically amounts to a bidirectional stream between the
client and server, with the server sending out a stream of updates
(typically, everything going on in the immediate view of the client),
and the client sends a stream of their attempted actions (in response to
what they see on screen).

if everything is working well, then the delays and irregularities of
their internet connection are mostly hidden, and to them it all seems
like they are interacting with the world in real-time (usually there is
a lot of trickery here as well, mostly based around linear extrapolation
and so on).
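the linear extrapolation mentioned here (often called "dead reckoning") can be sketched as: take the last position and velocity the server reported, and draw the entity where it would be after the time elapsed since then. this is a minimal 2D sketch; the Snapshot fields and the 0.5 s extrapolation cap are assumptions for illustration, not taken from any particular engine.

```java
// Client-side linear extrapolation ("dead reckoning"): between server
// updates, draw an entity where its last known position and velocity
// say it should be by now.
final class DeadReckoning {
    static final class Snapshot {
        final double x, y;      // last position reported by the server
        final double vx, vy;    // last velocity reported by the server (units/s)
        final double timeSec;   // time at which that state was valid
        Snapshot(double x, double y, double vx, double vy, double timeSec) {
            this.x = x; this.y = y; this.vx = vx; this.vy = vy; this.timeSec = timeSec;
        }
    }

    // Hypothetical cap: past this, a straight-line guess is more likely to be
    // wrong than helpful, so the entity is held in place until the next update.
    static final double MAX_EXTRAPOLATION_SEC = 0.5;

    static double[] predict(Snapshot s, double nowSec) {
        double dt = Math.max(0.0, Math.min(nowSec - s.timeSec, MAX_EXTRAPOLATION_SEC));
        return new double[] { s.x + s.vx * dt, s.y + s.vy * dt };
    }

    public static void main(String[] args) {
        Snapshot last = new Snapshot(10.0, 5.0, 2.0, -1.0, 100.0);
        for (int i = 0; i <= 3; i++) {
            double now = 100.0 + i * 0.1;        // render frames after the last update
            double[] p = predict(last, now);
            System.out.printf("t=%.1f  x=%.2f y=%.2f%n", now, p[0], p[1]);
        }
    }
}
```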

side note: each end may transmit time-stamps as part of their updates,
and the other end may transmit the last-received timestamps, partly so
that the timing delays can be estimated and partially compensated for.
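one way to use the echoed timestamps: if each end echoes back the other's own timestamp, the original sender can subtract it from its current clock to get a round-trip sample without needing the two clocks to be synchronized. a minimal sketch, assuming a hypothetical message format that also reports how long the peer held the timestamp before echoing it; the smoothing gain of 1/8 is the one TCP uses for its smoothed RTT (RFC 6298), and halving the RTT for one-way delay assumes a symmetric path.

```java
// Round-trip estimate from echoed timestamps, smoothed with an exponentially
// weighted moving average (gain 1/8, as TCP uses for its SRTT).
final class RttEstimator {
    private static final double ALPHA = 0.125;
    private double srttMs = -1;                  // smoothed RTT; -1 means no sample yet

    /**
     * @param nowMs      local time when the echo arrived
     * @param echoedMs   our own earlier timestamp, echoed back by the peer
     * @param peerHoldMs how long the peer held it before replying (hypothetical field)
     */
    void onEcho(long nowMs, long echoedMs, long peerHoldMs) {
        double sample = Math.max(0L, nowMs - echoedMs - peerHoldMs);
        srttMs = srttMs < 0 ? sample : (1 - ALPHA) * srttMs + ALPHA * sample;
    }

    double smoothedRttMs() { return srttMs; }

    /** Rough one-way delay, assuming the path is symmetric. */
    double oneWayDelayMs() { return srttMs / 2; }

    public static void main(String[] args) {
        RttEstimator est = new RttEstimator();
        est.onEcho(1_200, 1_000, 0);             // 200 ms sample
        est.onEcho(2_300, 2_000, 20);            // 280 ms sample, 20 ms held by the peer
        System.out.printf("srtt=%.1f ms, one-way ~%.1f ms%n",
                est.smoothedRttMs(), est.oneWayDelayMs());
    }
}
```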

a similar concept, used by players themselves, is "leading", where a
person will take aim at a moving enemy, estimate the
speed of the projectile and where the enemy will be at the time, and aim
and fire at that location instead (then the enemy will essentially "run
into" the traveling projectile). note that if a player always aims at
where the enemy is "right now", very often they will miss (as by the
time the projectile reaches the destination, the enemy has already moved
out of the way).

so, the game does similar in an attempt to hide the "travel time" that
is the internet.
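the "leading" calculation can be done exactly under simple assumptions (2D, target moving in a straight line at constant velocity, projectile at constant speed, no gravity): find the earliest time t > 0 at which the projectile, traveling speed * t, can reach the target's position d + v * t. squaring both sides gives a quadratic, (v·v - s²)t² + 2(d·v)t + d·d = 0. a minimal sketch; the class and method names are illustrative.

```java
// Intercept ("leading") for a constant-velocity target: solve
// |d + v t| = s t for the smallest positive t, i.e.
// (v.v - s^2) t^2 + 2 (d.v) t + d.d = 0.
final class Lead {
    /** Returns {aimX, aimY, timeToHit}, or null if the projectile can never catch up. */
    static double[] intercept(double sx, double sy,    // shooter position
                              double tx, double ty,    // target position now
                              double vx, double vy,    // target velocity
                              double speed) {          // projectile speed
        double dx = tx - sx, dy = ty - sy;
        double a = vx * vx + vy * vy - speed * speed;
        double b = 2 * (dx * vx + dy * vy);
        double c = dx * dx + dy * dy;
        double t;
        if (Math.abs(a) < 1e-9) {                      // target exactly as fast: equation is linear
            if (b >= 0) return null;
            t = -c / b;
        } else {
            double disc = b * b - 4 * a * c;
            if (disc < 0) return null;                 // no real solution: can't catch it
            double r = Math.sqrt(disc);
            t = smallestPositive((-b - r) / (2 * a), (-b + r) / (2 * a));
            if (Double.isNaN(t)) return null;
        }
        // Aim where the target will be when the projectile arrives.
        return new double[] { tx + vx * t, ty + vy * t, t };
    }

    private static double smallestPositive(double p, double q) {
        if (p > 0 && q > 0) return Math.min(p, q);
        if (p > 0) return p;
        if (q > 0) return q;
        return Double.NaN;
    }

    public static void main(String[] args) {
        // Target 10 units away, crossing sideways at 3 units/s; projectile at 5 units/s.
        double[] hit = intercept(0, 0, 10, 0, 0, 3, 5);
        System.out.printf("aim at (%.2f, %.2f), hit after %.2f s%n", hit[0], hit[1], hit[2]);
    }
}
```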

or such...

Arne Vajhøj

Feb 12, 2012, 3:50:08 PM

There are plenty of libs for C as well.

30 seconds of googling found:

http://www.digip.org/jansson/
http://live.gnome.org/JsonGlib
http://sourceforge.net/projects/mjson/
http://oss.metaparadigm.com/json-c/

Arne

BGB

Feb 12, 2012, 4:34:10 PM

fair enough.

I might look into it, although personally I don't use JSON for this at
the moment, and if it were needed in my case, my script VM can already
parse JSON (given the language used is a superset of JavaScript
anyways), although this is potentially a less efficient strategy than a
dedicated parser.

say, in my case, it would be a tradeoff between using someone
else's library, passing the JSON through "eval", or writing some logic
to make the VM parse the JSON directly (probably just copy/paste/edit
some of the existing parser code). then one could wonder secondarily:
what form would the JSON be parsed into? "there is a library for that"
is not always the least-effort option.

either way, one still might want to save more bytes, say, by running it
through deflate or similar. if one wants libraries, Java has it built
in, and in C-land there is zlib.
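for the Java side, the built-in option is java.util.zip (Deflater/Inflater, which wrap zlib). a minimal round-trip sketch; the repetitive JSON payload is made up for illustration, and how much it shrinks depends entirely on the data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Deflate a message before sending it, and inflate it on the other end.
final class DeflateDemo {
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();                       // no more input will follow
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();                          // release native zlib memory
        }
    }

    static byte[] decompress(byte[] input) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or dictionary-compressed input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < 50; i++) {               // made-up, repetitive update payload
            if (i > 0) sb.append(',');
            sb.append("{\"id\":").append(i).append(",\"x\":0.0,\"y\":0.0}");
        }
        sb.append(']');
        byte[] json = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(json);
        System.out.println(json.length + " bytes of JSON -> " + packed.length + " bytes deflated");
        String back = new String(decompress(packed), StandardCharsets.UTF_8);
        System.out.println("round trip ok: " + back.equals(sb.toString()));
    }
}
```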

in my case, I also have a deflate codec which is stored as a single big
source file, mostly as this makes it a little more convenient (in
several ways) than using zlib. JPEG was similar, as originally I used
libjpeg, but reimplemented JPEG as a "single big source file" to be a
generally more convenient option (copy/paste the source file and go).

I don't claim a person would "always" want to do this, but as I see it,
it is still a potentially valid option. one could maybe go further, and
put Deflate, JPEG, PNG, and several other formats, all in a single big
file, at the drawback of the file becoming overly large.

similarly, the above probably would be fairly pointless in Java, both
because this stuff exists in the standard library, and also because this
would also result in a single giant class as well.

the cheapest option for one person may well turn out to be a more
expensive option for another person, say if one of the options
amounts to "implement the functionality from the ground up" rather than
"copy-paste a few bits from over there and hack something together", or
even maybe just "add a few lines in a function over here and add a new
function or method over there which redirects the call to the first
function".

all of this stuff can be fairly relative, and there are rarely "cut and
dried" answers to problems.

as for the delay issue, found this article on part of the topic:
http://en.wikipedia.org/wiki/Lag_%28online_gaming%29

also maybe relevant:
http://en.wikipedia.org/wiki/Internet_streaming

BGB

Feb 12, 2012, 4:43:54 PM

yes, but it is also worth noting that Mozilla does the whole "Mozilla
build" thingy on Windows, and they are essentially bundling many of
the needed libraries and tools with the application as a part of the
build system.

as noted:
it is not saying that one *can't* use 3rd party libraries, but one may
need to make special provisions for them.

Mozilla does this.

not everyone may want to do this, as it is a reasonably heavy-weight
solution to the problem (but still better than "hey random person, go
download and build all of these libraries yourself", which is what some
applications have gone and done).

Martin Gregorie

Feb 12, 2012, 5:46:45 PM

On Sun, 12 Feb 2012 11:16:03 -0500, Arne Vajhøj wrote:

>
> Doesn't getopt exist in some GNU lib that you can get for all platforms?
>

Pass. I know it was missing from Borland C on DOS and Windows and wasn't
published for OS-9. I ended up extending a PD version that was originally
published in the '68 MicroJournal for OS-9 and porting it to the other
two platforms.

I had a good (to me) reason for doing that. At the time I was more
familiar with OS-9 than Windows or Unix (Linux didn't exist at the time)
and I had got used to the OS-9 command line parser's ability to handle a
mix of options and arguments in any order rather than the straitjacket of
Unix's rigid options before arguments rule. My extension basically just
added the -x=value notation (also used by OS-9) to the standard Unix
-xvalue and -x value notations. I've since rewritten it for Java, adding
long option names (--xxxx and --xxxx=val) - something I've not gotten
round to adding to the C version.
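a parser with the behavior described above can be sketched in a few dozen lines of Java. this is not Martin's code: it is a minimal, hypothetical sketch in which options and plain arguments may be mixed in any order, short options accept -x=value, -xvalue and -x value, and long options accept --name=value and --name value. the caller says which option names take a value; treating "--" as "end of options" is an added assumption borrowed from Unix getopt, and bundled flags such as -vq are rejected rather than expanded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical getopt-like parser: options and arguments in any order.
final class OptParser {
    final Map<String, String> options = new LinkedHashMap<>(); // flags map to ""
    final List<String> arguments = new ArrayList<>();

    /** @param valued names (without dashes) of the options that take a value */
    static OptParser parse(String[] args, Set<String> valued) {
        OptParser p = new OptParser();
        boolean onlyArgs = false;
        for (int i = 0; i < args.length; i++) {
            String a = args[i];
            if (onlyArgs || a.equals("-") || !a.startsWith("-")) {
                p.arguments.add(a);              // plain argument, in any position
                continue;
            }
            if (a.equals("--")) {
                onlyArgs = true;                 // everything after "--" is an argument
                continue;
            }
            String name;
            String value = null;
            if (a.startsWith("--")) {            // --name or --name=value
                String body = a.substring(2);
                int eq = body.indexOf('=');
                name = eq >= 0 ? body.substring(0, eq) : body;
                if (eq >= 0) value = body.substring(eq + 1);
            } else {                             // -x, -x=value (OS-9 style) or -xvalue
                name = a.substring(1, 2);
                String rest = a.substring(2);
                if (rest.startsWith("=")) value = rest.substring(1);
                else if (!rest.isEmpty()) value = rest;
            }
            if (valued.contains(name)) {
                if (value == null) {             // -x value / --name value
                    if (i + 1 >= args.length) {
                        throw new IllegalArgumentException("option " + a + " needs a value");
                    }
                    value = args[++i];
                }
                p.options.put(name, value);
            } else if (value != null) {
                throw new IllegalArgumentException("option " + name + " takes no value");
            } else {
                p.options.put(name, "");
            }
        }
        return p;
    }

    public static void main(String[] args) {
        OptParser p = parse(
                new String[] {"in.txt", "-v", "-o=out.txt", "-n5", "--mode", "fast", "extra"},
                Set.of("o", "n", "mode"));
        System.out.println(p.options + " " + p.arguments);
    }
}
```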