If you want the biggest, boldest approach and don't care about
overhead, use CORBA.
If you want a simpler approach but with performance and implementation
difficulties, use SOAP or XML-RPC.
If you want the ultimate in simplicity and are willing to forsake
multi-language support, use Dopy, Pyro, or Twisted Spread.
Comments? Specifically, what are the relative advantages and
disadvantages of the last three mentioned products?
-- robin
What overhead would this be? From what I see of omniORB,
there isn't really that much. Also, CORBA is the most complete;
e.g., it allows callbacks and passing around object references,
which the others don't have.
I've wanted to do something in CORBA for years. I've
never gotten there. One problem is that I'm used to Python,
where I don't need to describe the interface beforehand.
CORBA wants that IDL, and a change in the object's interface
must be reflected in the IDL. That just seems tedious to
me now.
I also do nearly everything in Python, so don't need the
ability for different languages to interoperate. I just pass
around Python objects.
I keep hoping that one of the component systems (like
in GNOME or KDE) takes off, so that "scripting" a la
COM takes off in unix systems, but I've been hoping
for that for the last 5 years.
> If you want a simpler approach but with performance and
> implementation difficulties, use SOAP or XML-RPC.
I wouldn't call SOAP simple, and I've had problems with
interoperability between various packages. If I needed to pass
simple data around, I would use XML-RPC.
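To make "passing simple data around" concrete, here is a minimal sketch using the XML-RPC modules in the Python standard library. The function names `add` and `describe` and the sample record are made up for illustration, and the server runs in a background thread only so that both ends fit in one script.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """Hypothetical remote procedure: add two integers."""
    return a + b


def describe(record):
    """Hypothetical remote procedure taking a dict (an XML-RPC struct)."""
    return "%s has %d items" % (record["name"], len(record["items"]))


# Port 0 lets the OS pick a free port; logRequests=False keeps output quiet.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add)
server.register_function(describe)
port = server.server_address[1]

# Serve in a background thread so client and server fit in one script.
threading.Thread(target=server.serve_forever, daemon=True).start()

# The proxy turns attribute access into remote calls; only simple types
# (ints, strings, lists, dicts and so on) can cross the wire.
proxy = ServerProxy("http://127.0.0.1:%d/" % port)
total = proxy.add(2, 3)
summary = proxy.describe({"name": "basket", "items": [1, 2, 3]})
print(total, summary)

server.shutdown()
server.server_close()
```

Note that nothing like an object reference is passed here: the dict is copied by value, which is the limitation the CORBA comparison above is about.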
> If you want the ultimate in simplicity and are willing to forsake
> multi-language support, use Dopy, Pyro, or Twisted Spread.
One reason I'm looking at Twisted is because it handles other
interfaces as well. I need to talk to SQL databases, SOAP and
XML-RPC servers, straight HTTP, and spawned off external
processes.
There are also older interfaces, like PVM and MPI from the
high-performance computing world, and Linda, and more,
but I haven't looked into them for years.
Andrew
da...@dalkescientific.com
I write this as a SOAP specialist and advocate.
.
.
.
--
Cameron Laird <Cam...@Lairds.com>
Business: http://www.Phaseit.net
Personal: http://phaseit.net/claird/home.html
I'm not sure about XML-RPC, SOAP, or Dopy, but I know Twisted Spread can
pass references around like this (and you can tell it just -how- you want
them passed), and I think Pyro can too.
> I've wanted to do something in CORBA for years. I've
> never gotten there. One problem is that I'm used to Python,
> where I don't need to describe the interface beforehand.
> CORBA wants that IDL, and a change in the object's interface
> must be reflected in the IDL. That just seems tedious to
> me now.
>
Right. I think this is at least part of the overhead Robin was talking
about. Several of the other schemes don't require this.
> I also do nearly everything in Python, so don't need the
> ability for different languages to interoperate. I just pass
> around Python objects.
>
> I keep hoping that one of the component systems (like
> in GNOME or KDE) takes off, so that "scripting" a la
> COM takes off in unix systems, but I've been hoping
> for that for the last 5 years.
That'd be nice.
>
> > If you want a simpler approach but with performance and
> > implementation difficulties, use SOAP or XML-RPC.
>
> I wouldn't put SOAP as simple, and I've had problems with
> interoperability between various packages. If I needed to pass
> simple data around, I would use XML-RPC.
Personally, I think SOAP is worth ignoring. ;) XML-RPC is what I would
choose if I were going for that sort of solution, too.
>
> > If you want the ultimate in simplicity and are willing to forsake
> > multi-language support, use Dopy, Pyro, or Twisted Spread.
>
> One reason I'm looking at Twisted is because it handles other
> interfaces as well. I need to talk to SQL databases, SOAP and
> XML-RPC servers, straight HTTP, and spawned off external
> processes.
This is definitely a benefit. (Hooray, integration).
From what you've said here and in your original post, I think Spread will
probably be a pretty good fit for you.
HTH,
Jp
--
"The problem is, of course, that not only is economics bankrupt but it has
always been nothing more than politics in disguise ... economics is a form
of brain damage." -- Hazel Henderson
> If you want the biggest, boldest approach and don't care about
> overhead, use CORBA.
I can't agree with that analysis: CORBA has, of all your alternatives,
the least network bandwidth requirements.
Regards,
Martin
You might want to consider looking at OSE as well. It isn't pure Python
but then that can be a good thing depending on what you are doing.
The web site for OSE is:
Have a look through the "Python Manual" link on the web site.
Tim
mar...@v.loewis.de (Martin v. Löwis) wrote:
> I can't agree with that analysis: CORBA has, of all your alternatives,
> the least network bandwidth requirements.
I was referring to overhead like:
a) writing the interface definition and dealing with ID strings
b) more code
c) size of the "library"
I am quite sure the other implementations of DC are lighter in these
areas, though for some that may not be an issue. I was not considering
network bandwidth, but am glad to hear CORBA is efficient in that
respect.
-- robin
> SOAP and such are just concessions to commercial
> misunderstandings about what business needs.
I would like to know more about what you mean by this.
Jp Calderone <exa...@intarweb.us> wrote:
> Personally, I think SOAP is worth ignoring. ;) XML-RPC is what I would
> choose if I were going for that sort of solution, too.
I'd like more details here too. Why ignore SOAP?
-- robin
> mar...@v.loewis.de (Martin v. Löwis) wrote:
> > I can't agree with that analysis: CORBA has, of all your alternatives,
> > the least network bandwidth requirements.
>
> I was referring to overhead like:
> a) writing the interface definition and dealing with ID strings
It is true that you have to write interface definitions. However,
considering that you are doing distributed computing, I can't really
see this as "overhead". Overhead compared to what?
What are ID strings, and why do you need them in CORBA?
> b) more code
Code written by yourself? Compared to what? A CORBA client in Python
is really short. A CORBA server is larger, but then, the servers
for other distributed computing infrastructures are also larger.
> c) size of the "library"
That depends on the implementation you use. For a client, it is
certainly true that XML-RPC libraries are significantly (factor 10)
smaller than the Fnorb libraries.
Regards,
Martin
SOAP's s'posed to be the "Simple Object Access Protocol".
Its defining document begins, "SOAP is a lightweight
protocol ..."
It's a bad sign that it's fiction from the start. SOAP
isn't lightweight or simple, and it doesn't particularly
access objects.
SOAP is an RPC implementation. I'm fine with RPC, and I
like SOAP--'hope I get more jobs to do it during the next
year. However, I think commercial experience has
demonstrated adequately that RPC isn't safe in the hands of
the programming fraternity at large. It's something
mediocre programmers do wrong.
Businesses *think* they want their development crews to
standardize on an RPC, and XML is a good thing, isn't it?,
but they're wrong. RPC across organizational boundaries
turns out to be somewhere between difficult and a disaster.
Businesses that are happy with SOAP are actually using it
as a messaging service for asynchronous transmission of
XMLified documents with business content.
I repeat: for a mixture of correct and incorrect reasons,
XML, RPC, and so on are believed to be good things for
business. People conclude that SOAP must be a super-
technology, solving whole layers of issues at once. It's
not. It's OK, and, with enough support from Microsoft,
IBM, Oracle, and a few others, it certainly can dominate.
In truth, though, it answers the wrong question.
Python's own Paul Prescod has plenty to say about SOAP's
technical flaws. Check out
<URL: http://mail.python.org/pipermail/xml-sig/2002-February/007183.html >
and other references available through <URL: http://prescod.com >.
Certainly. You can pass around the actual Pyro proxy objects, without
bothering about object location, ID, etc, or pass around the object's UID.
Pyro also allows callbacks, although you have to do a little bit of extra
coding to enable them.
>>I've wanted to do something in CORBA for years. I've
>>never gotten there. One problem is that I'm used to Python,
>>where I don't need to describe the interface beforehand.
>>CORBA wants that IDL, and a change in the object's interface
>> must be reflected in the IDL. That just seems tedious to
>>me now.
>>
>
>
> Right. I think this is at least part of the overhead Robin was talking
> about. Several of the other schemes don't require this.
>
>
>>I also do nearly everything in Python, so don't need the
>>ability for different languages to interoperate. I just pass
>>around Python objects.
I'd say: use Pyro (but I'm biased, of course ;=)
You won't have to specify an interface other than your regular
Python class, and it's designed for a Python-only environment.
--Irmen de Jong.
>I'll do this in an abbreviated form.
Thank you. That was very informative. At least it confirms my
suspicions.
I suppose now I'll be looking closer at Dopy, Pyro, and Twisted
Spread, so anything that will help me distinguish between them is
welcome.
-- robin
>I am attempting to summarise the distributed computing implementations
>available to Python programmers. My conclusions so far are as follows:
>
>If you want the biggest, boldest approach and don't care about
>overhead, use CORBA.
You might want to read the slides to the presentation I gave at the UK
Python conference last week. The title is "CORBA? Isn't that
Obsolete?". You can get it here:
http://www.grisby.org/presentations/accu2003.pdf
Cheers,
Duncan.
--
-- Duncan Grisby --
-- dun...@grisby.org --
-- http://www.grisby.org --
>> c) size of the "library"
>
>That depends on the implementation you use. For a client, it is
>certainly true that XML-RPC libraries are significantly (factor 10)
>smaller than the Fnorb libraries.
It's also worth pointing out that, although Fnorb and omniORB are both
a megabyte or two in size, that's still significantly smaller than
Python itself, so size is not an issue for the vast majority of
applications written in Python.
True, and Mike Olson did some pretty thorough analysis on this. See:
http://www-106.ibm.com/developerworks/library/ws-pyth9/
--Uche
http://uche.ogbuji.net
I'm interested to find out why the Python server/client pair is so
abominably slow when going across our lightly loaded 100/10 Mbit/s
Ethernet.
I modified the time-client.py script to allow setting of the server name
using an environ script.
When using the time-client on the same machine I see
C:\Python\tmp\test_servers>time-client.py
Connecting to ('localhost', 8080)
Time to connect: 0.040000
Sending a long string to the server
Time to send a string of 21000 chars, 0.000000
Recieving a long stirng from the server
Time to receive a string of 22000 chars, 0.000000
Sending lots of ints to the server
Time to send 5000 ints, 32.297000 (0.006459 per call)
On a machine on the same local net I see
R:\Python\tmp\test_servers>time-client.py
Connecting to ('192.168.0.3', 8080)
Time to connect: 0.000000
Sending a long string to the server
Time to send a string of 21000 chars, 0.000000
Recieving a long stirng from the server
Time to receive a string of 1455 chars, 0.000000
Sending lots of ints to the server
Time to send 5000 ints, 1007.937000 (0.201587 per call)
What am I missing that causes such painfully slow connections?
--
Robin Becker
>True, and Mike Olson did some pretty thorough analysis on this. See:
>
>http://www-106.ibm.com/developerworks/library/ws-pyth9/
Interesting article. Unfortunately, the first two timings of the CORBA
client aren't timing what Mike thinks they are. The first time, for
"connecting to server" doesn't actually connect to the server at all.
It just creates an object reference for it.
The second time, for "send string" _is_ timing sending a string, but
that is also when the TCP connection to the server is made. A
significant portion of the time is spent setting up the connection,
not transferring the string. If I modify the client to do the first
call twice, I get
Connecting to server
Time to connect to server, 0.000386
Sending a long string to the server
Time to send a string of 21000 chars, 0.002099
Sending a long string to the server
Time to send a string of 21000 chars, 0.001034
Recieving a long stirng from the server
Time to receive a string of 22000 chars, 0.001088
Sending lots of ints to the server
Time to send 5000 ints, 0.921309 (0.000184 per call)
So you see that the first call takes about twice as long as the second
one. Another interesting thing if you care about raw speed is that
CORBA strings are not allowed to have embedded nulls in them, and
undergo code set conversion, and checking these things slows them
down. Using a sequence of octets, which doesn't have any checks or
conversions brings the time down to 0.000827 and 0.000839 for sending
and receiving respectively.
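The measurement pitfall described above can be reproduced with plain sockets. This is only a sketch under assumptions: `LazyClient` is a made-up stand-in for an object reference that opens its TCP connection on first use, and the echo server is there purely so the script runs on its own. It is not the benchmark from the article.

```python
import socket
import threading
import time


def echo_server(listener):
    """Accept one connection and echo back whatever arrives."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            conn.sendall(data)


class LazyClient:
    """Hypothetical client that connects on first use, not on creation."""

    def __init__(self, address):
        self.address = address
        self.sock = None  # creating the "reference" opens no connection

    def call(self, payload):
        if self.sock is None:  # the first call pays for connection setup
            self.sock = socket.create_connection(self.address)
        self.sock.sendall(payload)
        received = b""
        while len(received) < len(payload):
            chunk = self.sock.recv(65536)
            if not chunk:
                raise ConnectionError("server closed the connection")
            received += chunk
        return received

    def close(self):
        if self.sock is not None:
            self.sock.close()
            self.sock = None


def timed(label, func, *args):
    """Run func(*args), print the elapsed wall-clock time, return the result."""
    start = time.perf_counter()
    result = func(*args)
    print("%s: %.6f s" % (label, time.perf_counter() - start))
    return result


listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=echo_server, args=(listener,), daemon=True).start()

payload = b"x" * 21000
client = timed("create reference", LazyClient, listener.getsockname())
connected_before = client.sock is not None
first = timed("first call (connect + send)", client.call, payload)
second = timed("second call (send only)", client.call, payload)
client.close()
listener.close()
```

Only the second call measures the transfer on its own; the absolute numbers depend on the machine, so none are quoted here.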
For comparison, the raw socket client has these times on my machine:
Connecting to server
Time to connect to server, 0.001273
Sending a long string to the server
Time to send a string of 21000 chars, 0.000988
Recieving a long stirng from the server
Time to receive a string of 22000 chars, 0.000873
Sending lots of ints to the server
Time to send 5000 ints, 10.290595 (0.002058 per call)
Lies, damn lies, and statistics...
> Interesting article. Unfortunately, the first two timings of the CORBA
> client aren't timing what Mike thinks they are. The first time, for
> "connecting to server" doesn't actually connect to the server at all.
> It just creates an object reference for it.
>
> The second time, for "send string" _is_ timing sending a string, but
> that is also when the TCP connection to the server is made. A
> significant portion of the time is spent setting up the connection,
> not transferring the string. If I modify the client to do the first
> call twice, I get
>
> Connecting to server
> Time to connect to server, 0.000386
>
> Sending a long string to the server
> Time to send a string of 21000 chars, 0.002099
>
> Sending a long string to the server
> Time to send a string of 21000 chars, 0.001034
>
> Recieving a long stirng from the server
> Time to receive a string of 22000 chars, 0.001088
>
> Sending lots of ints to the server
> Time to send 5000 ints, 0.921309 (0.000184 per call)
>
To add some more to the mix, I benchmarked Pyro (3.2) :
Connecting to server
Time to connect to server, 0.026004
Sending a long string to the server
Time to send a string of 21000 chars, 0.003536
Recieving a long stirng from the server
Time to receive a string of 22000 chars, 0.003387
Sending lots of ints to the server
Time to send 5000 ints, 10.057741 (0.002012 per call)
I find this very fast for a pure Python solution....
I also measured the message sizes with tcpdump as mentioned in the article:
Actual message size sending 1,000 characters: 1390
Actual message size sending 100 integers: 27698 (with CORBA amongst the
smallest)
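For a rough feel of why encodings differ in size, the sketch below compares the serialized payload of the same values as a pickle and as an XML-RPC request. This is an illustration only, not a reproduction of the tcpdump figures above: it measures the encoded payload alone, ignores HTTP and TCP framing, and encodes the 100 integers as a single list rather than as 100 separate calls. The method name `send` is arbitrary.

```python
import pickle
import xmlrpc.client


def payload_sizes(value):
    """Return (pickle bytes, XML-RPC request bytes) for one argument."""
    pickled = pickle.dumps(value)
    xml = xmlrpc.client.dumps((value,), methodname="send").encode("utf-8")
    return len(pickled), len(xml)


text_sizes = payload_sizes("x" * 1000)        # 1,000 characters
int_sizes = payload_sizes(list(range(100)))   # 100 integers in one list

print("1000 chars: pickle=%d bytes, xml-rpc=%d bytes" % text_sizes)
print("100 ints:   pickle=%d bytes, xml-rpc=%d bytes" % int_sizes)
```

The gap is widest for the integers, because XML-RPC wraps every value in its own pair of tags.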
> Lies, damn lies, and statistics...
Amen.
--Irmen de Jong
--
Robin Becker
Yep that was on the same machine. Running the Pyro server on a different
machine on my 100 Mbit lan gives:
Connecting to server
Time to connect to server, 0.024740
Sending a long string to the server
Time to send a string of 21000 chars, 0.003866
Recieving a long stirng from the server
Time to receive a string of 22000 chars, 0.003579
Sending lots of ints to the server
Time to send 5000 ints, 4.586772 (0.000917 per call)
So you see, it's even faster when the server is running on another machine.
This can be explained because the pickling/unpickling is split across two
CPUs, when you run client+server on a single CPU they are fighting for cycles.
--Irmen de Jong
--
Robin Becker
No, Pyro reuses socket connections for method calls.
--Irmen
I just tried the original server.py/time-client.py pair again this time
on a FreeBSD system
freeBSD-->localhost 500 secs
freeBSD-->192.168.0.3 50 secs (a local win32 machine).
I guess I'm just not doing what I think I'm doing.
--
Robin Becker
> sigh :(
> for real distributed computing I wouldn't want to hold the sockets open.
Sorry if I missed that requirement. I was just responding to Duncan's
CORBA timings...
> I just tried the original server.py/time-client.py pair again this time
> on a freeBSD system
>
> freeBSD-->localhost 500 secs
> freeBSD-->192.168.0.3 50 secs (a local win32 machine).
>
>
> I guess I'm just not doing what I think I'm doing.
I really can't explain these numbers either. Localhost is 10 times slower
than over the network?
Unless your client & server both struggle for CPU cycles on the same machine
at the same time...
--Irmen
This isn't to say that Irmen's time and size data isn't interesting and
worth knowing. But just to point out that the article wasn't simply
snubbing Pyro for no reason.
Yours, David...
--
mertz@ _/_/_/_/_/_/_/ THIS MESSAGE WAS BROUGHT TO YOU BY:_/_/_/_/ v i
gnosis _/_/ Postmodern Enterprises _/_/ s r
.cx _/_/ MAKERS OF CHAOS.... _/_/ i u
_/_/_/_/_/ LOOK FOR IT IN A NEIGHBORHOOD NEAR YOU_/_/_/_/_/ g s
>Robin Becker wrote:
>
>> sigh :(
>> for real distributed computing I wouldn't want to hold the sockets open.
>
>Sorry if I missed that requirement. I was just responding to Duncan's
>CORBA timings...
>
>> I just tried the original server.py/time-client.py pair again this time
>> on a freeBSD system
>>
>> freeBSD-->localhost 500 secs
>> freeBSD-->192.168.0.3 50 secs (a local win32 machine).
>>
>>
>> I guess I'm just not doing what I think I'm doing.
>
>I really can't explain these numbers either? Localhost is 10 times slower
>than over the network???
IIRC, there was some discussion some time ago about this being a
FreeBSD-specific pessimization that didn't affect Linux, and that someone was
fixing it? But I thought FreeBSD had a good networking reputation, so this needs
double-checking, for sure. Maybe it was for an old version?
>
>Unless your client & server both struggle for CPU cycles on the same machine
>at the same time...
Or the struggle involves unnecessary scheduling waits of some kind?
Regards,
Bengt Richter
Given the above, perhaps it's not DNS at all, but some other things that
get in the way. Do OSes impose limits on number of ports or
connections/second etc?
>Regards,
>Bengt Richter
--
Robin Becker
Seems that the easiest way to get insight into the network traffic you're
generating is to install Ethereal or some similar software and simply look
to see what's going over the wire.
regards
--
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/pwp/
Did you miss PyCon DC 2003? Would you come to PyCon DC 2004?
>sigh :(
>for real distributed computing I wouldn't want to hold the sockets open.
Why ever not!? Most protocols people use for distributed computing do
hold sockets open. As long as connections are seen as a cache, and can
be closed if necessary, it's a good thing. TCP start-up overhead is
huge, especially if the network has high latency.
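The difference between opening a connection per call and reusing one can be sketched with plain sockets. Everything here is an assumption made for illustration: the 4-byte integer echo protocol, the helper names, and the call count of 200. On localhost the gap is small; it grows with network latency, so no timings are quoted.

```python
import socket
import struct
import threading
import time

accepted = []  # one entry per TCP connection the server accepts


def recv_exact(sock, n):
    """Read exactly n bytes, or return b"" if the peer closed the socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return b""
        buf += chunk
    return buf


def handle(conn):
    """Echo 4-byte integers until the client closes the connection."""
    with conn:
        while True:
            data = recv_exact(conn, 4)
            if not data:
                break
            conn.sendall(data)


def serve(listener):
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:  # listener was closed
            break
        accepted.append(1)
        threading.Thread(target=handle, args=(conn,), daemon=True).start()


def call(sock, value):
    """One request/response round trip carrying a single integer."""
    sock.sendall(struct.pack("!i", value))
    return struct.unpack("!i", recv_exact(sock, 4))[0]


def connection_per_call(address, count):
    results = []
    for i in range(count):
        with socket.create_connection(address) as sock:  # TCP setup each time
            results.append(call(sock, i))
    return results


def persistent_connection(address, count):
    with socket.create_connection(address) as sock:  # TCP setup once
        return [call(sock, i) for i in range(count)]


listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(128)
threading.Thread(target=serve, args=(listener,), daemon=True).start()
address = listener.getsockname()

COUNT = 200
start = time.perf_counter()
per_call = connection_per_call(address, COUNT)
per_call_time = time.perf_counter() - start
connections_per_call = len(accepted)

start = time.perf_counter()
persistent = persistent_connection(address, COUNT)
persistent_time = time.perf_counter() - start
connections_persistent = len(accepted) - connections_per_call

print("new connection per call: %d connections, %.4f s"
      % (connections_per_call, per_call_time))
print("one reused connection:   %d connection,  %.4f s"
      % (connections_persistent, persistent_time))
```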
> Well after thinking about it I guess you're right. I guess I'm harking
> back to times when only small numbers of file handles could be held and
> thinking that sockets might be subject to the same kind of limits.
You are right that file descriptors are a limited resource, and any
carefully designed distributed-computing library needs to take this
into account. *However*, if taken into account, you can make good use
of this limited resource by nearly exhausting it:
Allow your library to consume a fixed number of file descriptors,
as many as possible while still leaving enough for local file I/O. Then the
library should implement some collection mechanism for unused socket
connections, and shut down anything that has not been used for a
while. The protocol needs to allow shut-downs from either side, as
both partners may experience file descriptor exhaustion.
In CORBA, you need to keep open all connections from which you expect
a response. So if your allocated socket descriptor pool is exhausted,
you first close those connections that have no outstanding requests.
If that is still insufficient, you also close the socket on which you
have been waiting for a response longest, and tell the application
that this connection has timed out.
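The eviction policy just described can be sketched without real sockets. This is a toy model under assumptions, not how any particular ORB implements it: `Connection` stands in for a socket, `ConnectionCache` and its method names are invented, and a logical counter replaces wall-clock time.

```python
import itertools


class Connection:
    """Stand-in for a socket; a real library would wrap socket.socket."""

    def __init__(self, peer):
        self.peer = peer
        self.open = True
        self.pending = 0           # requests still awaiting a response
        self.last_used = 0         # logical time of the most recent use
        self.waiting_since = None  # logical time the oldest wait began
        self.timed_out = False

    def close(self):
        self.open = False


class ConnectionCache:
    """Keep at most max_open connections, evicting as described above."""

    def __init__(self, max_open):
        self.max_open = max_open
        self.connections = {}
        self.clock = itertools.count(1)

    def get(self, peer):
        conn = self.connections.get(peer)
        if conn is None:
            if len(self.connections) >= self.max_open:
                self._evict_one()
            conn = Connection(peer)
            self.connections[peer] = conn
        conn.last_used = next(self.clock)
        return conn

    def send_request(self, peer):
        conn = self.get(peer)
        if conn.pending == 0:
            conn.waiting_since = conn.last_used
        conn.pending += 1
        return conn

    def receive_response(self, conn):
        conn.pending -= 1
        if conn.pending == 0:
            conn.waiting_since = None

    def _evict_one(self):
        idle = [c for c in self.connections.values() if c.pending == 0]
        if idle:
            # First choice: a connection with no outstanding requests.
            victim = min(idle, key=lambda c: c.last_used)
        else:
            # Last resort: the connection that has waited longest for a
            # response; the application is told it timed out.
            victim = min(self.connections.values(),
                         key=lambda c: c.waiting_since)
            victim.timed_out = True
        victim.close()
        del self.connections[victim.peer]


cache = ConnectionCache(max_open=2)
a = cache.send_request("a")  # a has an outstanding request
b = cache.get("b")           # b is open but idle
c = cache.get("c")           # cache full: idle b is closed, a is kept
cache.send_request("c")      # now both a and c await responses
d = cache.get("d")           # nothing idle: a has waited longest, so it goes
```

A real implementation would also need to handle shut-downs initiated by the other side, as noted above.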
Regards,
Martin
> so how do master slave implementations handle thousands of slaves? If
> the master is really a 'master' and not a SETI type server it would seem
> that these resource limits might play a part. How do GRID systems work
> this?
In a compute-intensive application, it is sensible to close the TCP
connection after you have communicated the job. Keeping the connection
alive matters mainly for avoiding TCP connection setup; that cost is
negligible if you spend several minutes or more in computation.
I don't actually know how grid computing protocols work, but I believe
they are not concerned about networking performance, as they use HTTP
and SOAP to communicate jobs.
Regards,
Martin