Distributed Maglev

ecin

Nov 3, 2010, 4:52:20 AM
to MagLev Discussion
I've been trying to follow the directions in the GemStone System
Administration Guide
(http://seaside.gemstone.com/docs/GS64-SysAdminGuide-2.3.pdf) to get a
distributed MagLev system up and running, i.e. to connect to a Stone
from a remote computer.

server☁:> cd $MAGLEV_HOME
server☁:> rake maglev:start
server☁:> rake netldi:start

At this point, I expect to be able to connect to the default Maglev
stone using a Network Resource String (NRS):

remote☁:> rake netldi:start
remote☁:> maglev-ruby --stone \!@<server☁_ip>\!maglev -e "puts 'yay'"

... but this errors out with:

-------
Error connecting to stone:
ERROR 4136, The connection to the Stone Repository monitor was
refused:
,
RUN can't be used prior to logging in.
... after error, skipping input source down to %
SKIPPED above method/doit due to error preceeding it
-------

Similarly, I can't seem to login from within a topaz prompt:

-------
server☁:> export GEMSTONE=$MAGLEV_HOME/gemstone
server☁:> cd $GEMSTONE
server☁:> ./bin/topaz
topaz > set username 'DataCurator'
topaz > set password swordfish
topaz > set gemstone !@<server☁_ip>!maglev
topaz > login
ERROR 4035, The GemStone session has lost its connection to the Stone
Repository monitor., recv(3,0x1023673e8,128,0) failed with
errno=54,ECONNRESET, Connection reset by peer, stncall SocketRecv
failed
-------

Am I missing something?

Monty Williams

Nov 3, 2010, 8:41:35 PM
to MagLev Discussion
Hi Ecin,

This might be simple to solve, but it could also be difficult to
divine which logfile has the real error message. It may be one in your
home directory.

It'll make life easier to start both netldis in debug mode. Then the
netldi logs may give you an idea of where things went wrong. Try this
on both machines.

export GEMSTONE=$MAGLEV_HOME/gemstone
$GEMSTONE/bin/startnetldi -d -g -a $USER

You should get a message showing the location of the netldi log.
startnetldi[Info]: GemStone version '3.0.0'
startnetldi[Info]: Starting GemStone network server "gs64ldi".
startnetldi[Info]: GEMSTONE is: "/congo1/users/monty/MagLev/
MagLev-24591.Linux-x86_64/gemstone".
startnetldi[Info]: Log file is '/congo1/users/monty/MagLev/
MagLev-24479.Linux-x86_64/log/gs64ldi.log'.
startnetldi[Info]: GemStone server 'gs64ldi' has been started.

There is an old tech tip (GSS-0037) on this page:
http://support.gemstone.com/gemstone_s/learning_center/tips/
It doesn't cover GemStone/S 64 Bit, and the page manager logic was
different when the tip was written, but it may still help explain the
contents and sequence of the netldi logs.

If it turns out to be non-obvious, we'll need to look at some of the
logfiles.

Note: our intent was not to support remote logins on this version, but
it looks like we forgot to disable them. We might as well understand
your problem in case others hit it, too. We do a lot of remote login
testing here, and this is not a known problem. We should probably just
write a script to make it simpler.

-- Monty

Monty Williams

Nov 3, 2010, 8:52:06 PM
to MagLev Discussion
Just a thought... The default directory for netldi logs is
/opt/gemstone/log. When starting programs from $GEMSTONE/bin, it could
fail to find that directory because of other missing environment
variables.
You could always create a soft link from /opt/gemstone/log to
$MAGLEV_HOME/log to get past that problem.
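
The soft link could be created along these lines. This is only a
sketch: link_log_dir is a made-up helper name, not something shipped
with GemStone or MagLev, and creating anything under /opt will usually
need root.

```shell
#!/bin/sh
# Hypothetical helper (not part of GemStone or MagLev): make LINK a symbolic
# link that points at TARGET, so programs writing to LINK end up in TARGET.
link_log_dir() {
    target=$1
    link=$2
    # Refuse to touch an existing real directory or file at the link location.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "refusing to replace existing path: $link" >&2
        return 1
    fi
    # Make sure the target directory and the link's parent directory exist.
    mkdir -p "$target" "$(dirname "$link")" || return 1
    # -n: if LINK is already a symlink to a directory, replace the link itself
    # instead of creating a new link inside that directory.
    ln -sfn "$target" "$link"
}

# Intended use for the case discussed here (may need sudo):
#   link_log_dir "$MAGLEV_HOME/log" /opt/gemstone/log
```

The helper refuses to overwrite a real /opt/gemstone/log directory, so
an existing GemStone install on the same machine is left alone.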

-- Monty

ecin

Nov 3, 2010, 10:39:34 PM
to MagLev Discussion
All right, logging done.

If I follow the tech tip you mentioned, Monty, it seems I can't get to
step 5.

gs64ldi.log snippet on server:

==========
Attempting accept...
...succeeded accepting client from Snow.local, connection = 2
0: --- 11/03/10 22:09:40.895 AST ---

Finished reading client request:
Client is a rpc gem or a linked application.
'!#encrypted:ecin#server!maglev'
0: --- 11/03/10 22:09:40.895 AST ---

Reply to client started:
'SUCCESS 59664'
0: --- 11/03/10 22:09:40.895 AST ---

Done writing reply to client.
0: --- 11/03/10 22:09:40.915 AST ---

Disposed. elapsed time = 0
==========

So far so good. On the client's netldi logs:

==========
Finished reading client request:
Client is a rpc application.
'!@encrypted:ecin!gemnetobject'
0: --- 11/03/10 19:25:24.988 PDT ---

Sucessful fork; Child's Pid: 1440 command is:
'/home/ecin/Gemstone-24566.Linux-x86_64/sys/gemnetobject TCP
47519 30'
0 : --- 11/03/10 19:25:25.016 PDT ---

Now reading reply from child
0: --- 11/03/10 19:25:25.016 PDT ---

Reply to client started:
'SUCCESS 36479'
0: --- 11/03/10 19:25:25.016 PDT ---

Done writing reply to client.
0: --- 11/03/10 19:25:25.016 PDT ---

Disposed. elapsed time = 1
==========

However, the log for the child process that was just forked shows the
error (timestamps are different, yes, but it's the same error every
time):

==========
--- 11/03/10 22:19:23.645 AST ---
recv(6,0x1000c16bf,1,0) failed with errno=54,ECONNRESET, Connection
reset by peer
-----------------------------------------------------
GemStone: Error Fatal
The GemStone session has lost its connection to the Stone Repository
monitor., recv(3,0x1023673e8,128,0) failed with errno=54,ECONNRESET,
Connection reset by peer, stncall SocketRecv failed
Error Category: 231169 [GemStone] Number: 4035 Arg Count: 1 Context :
20
Arg 1: 20

[Info]: Logging out at 11/03/10 22:19:23 AST


*****************************************************
****** GemStone Abnormal Shutdown at 11/03/10 22:19:23 AST
*****************************************************
-----------------------------------------------------
GemStone: Error Fatal
The GemStone session has lost its connection to the Stone Repository
monitor., recv(3,0x1023673e8,128,0) failed with errno=54,ECONNRESET,
Connection reset by peer, stncall SocketRecv failed
Error Category: 231169 [GemStone] Number: 4035 Arg Count: 1 Context :
20
Arg 1: 20
==========

The only other thing of note: if I try to connect using the NRS from a
topaz instance running on the server, I get:

==========
stone oobSocket read failed 11
==========

This shows up in the child process log, just before it complains about
the GemStone session being lost.

Interestingly, the above error only occurs when trying to test with
topaz; it seems Maglev can't figure out how to use the network. From
maglev.log:

==========
--- 11/03/10 22:19:35.099 AST ---
Could not get the network address of the Gem, session 5.
Reason: No network error. ,
Net
==========

Hope we can figure this out. :)

Monty Williams

Nov 4, 2010, 1:15:56 AM
to MagLev Discussion
Ecin,

The problem is in that last maglev.log message. The stone process is
unable to get the network info for the gem process, which probably
means a call to gethostbyaddr_r() or getpeername() by the stone is
failing. Can you ping the stone host from the gem host by name and by
IP? And vice versa?

I'd try it using IP addresses, or try using fully qualified domain
names.
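
The name-resolution side of that check can be scripted roughly as
below. This is only a sketch: resolves and check_peers are made-up
helper names, and it assumes either getent (Linux) or dscacheutil
(Mac OS X) is available. It only checks that each name resolves
locally, so it complements the ping test rather than replacing it, and
it should be run on both the stone host and the gem host.

```shell
#!/bin/sh
# Hypothetical helper: succeed if NAME resolves to an address on this machine.
resolves() {
    if command -v getent >/dev/null 2>&1; then
        # Linux: getent exits non-zero when the lookup fails.
        getent hosts "$1" >/dev/null
    else
        # Mac OS X has no getent; dscacheutil prints nothing for unknown
        # names, so treat empty output as a failed lookup.
        [ -n "$(dscacheutil -q host -a name "$1" 2>/dev/null)" ]
    fi
}

# Report each peer name and return non-zero if any of them fails to resolve.
check_peers() {
    rc=0
    for peer in "$@"; do
        if resolves "$peer"; then
            echo "ok:   $peer"
        else
            echo "FAIL: $peer"
            rc=1
        fi
    done
    return "$rc"
}

# Example (host names are placeholders for your own machines):
#   check_peers stone-host.example.com gem-host.example.com
```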

-- Monty

ecin

Nov 4, 2010, 12:02:55 PM
to MagLev Discussion
Hey Monty,

I don't think that's the problem: running maglev-ruby on the same
machine as the Stone still fails. BTW, I'm using RVM (not sure whether
that changes anything vs. installing from the GitHub repo):

server☁:> rvm maglev
server☁:> maglev start
server☁:> ruby --stone \!@192.168.1.120\!maglev -e "puts 'yay'"
Error connecting to stone:
ERROR 4136, The connection to the Stone Repository monitor was
refused:
,
RUN can't be used prior to logging in.
... after error, skipping input source down to %
server☁:> ruby --stone \!@Snow.local\!maglev -e "puts 'yay'"
Error connecting to stone:
ERROR 4136, The connection to the Stone Repository monitor was
refused:
,
RUN can't be used prior to logging in.
... after error, skipping input source down to %

Obviously, I can ping the same machine I'm on. Using 127.0.0.1 or
localhost works fine though (perhaps because they're defined in
/etc/hosts?):

server☁:> ruby --stone \!@127.0.0.1\!maglev -e "puts 'yay'"
yay
server☁:> ruby --stone \!@localhost\!maglev -e "puts 'yay'"
yay

Topaz repeats itself as well:

topaz> set gemstone !@Snow.local!maglev
topaz> login
--- 11/04/10 11:59:12.947 AST ---
stone oobSocket read failed 11
recv(5,0x1000a36bf,1,0) failed with errno=54,ECONNRESET, Connection
reset by peer
ERROR 4035, The GemStone session has lost its connection to the Stone
Repository monitor., recv(3,0x1620c13e8,128,0) failed with
errno=54,ECONNRESET, Connection reset by peer, stncall SocketRecv
failed

So, the problem affecting maglev-ruby doesn't seem to be the same as
the one affecting topaz. *sigh* I'll keep investigating.

Peter McLain

Nov 4, 2010, 12:32:00 PM
to maglev-d...@googlegroups.com

On Nov 4, 2010, at 9:02 AM, ecin wrote:

> server☁:> rvm maglev
> server☁:> maglev start
> server☁:> ruby --stone \!@192.168.1.120\!maglev -e "puts 'yay'"

The --stone option to maglev-ruby does not understand the network syntax. It is used for finding the configuration file to pass to the stone (among other things). So, in your case, the maglev-ruby script is looking for a file named $MAGLEV_HOME/etc/conf.d/!@192.168.1.120!maglev.conf, but doesn't find one...
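
To illustrate, here is a rough sketch of that lookup. It only mirrors the behaviour as described in this message and is not the actual maglev-ruby source; stone_conf_path is a made-up name and /opt/maglev is a placeholder for $MAGLEV_HOME.

```shell
#!/bin/sh
# Sketch only: builds the config file path the way this message describes it.
# stone_conf_path is a hypothetical name, not a function in maglev-ruby.
stone_conf_path() {
    stone_name=$1
    maglev_home=${2:-$MAGLEV_HOME}
    printf '%s\n' "$maglev_home/etc/conf.d/$stone_name.conf"
}

# A plain stone name maps to a sensible file name:
stone_conf_path maglev /opt/maglev
# An NRS is used verbatim, so the resulting file is unlikely to exist:
stone_conf_path '!@192.168.1.120!maglev' /opt/maglev
```

The second call prints a path ending in `!@192.168.1.120!maglev.conf`, which is the file the script fails to find.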

We'll need to update the options to maglev-ruby, and perhaps tweak a few other things to make it work out of the box. Until then, you'll need to manually set up the netldi processes and fire up the remote VM by hand.

I've opened a trac ticket for this, including the need to document how to get things running: https://magtrac.gemstone.com/ticket/816

--
Peter McLain
pmc...@vmware.com




ecin

Nov 4, 2010, 2:17:02 PM
to MagLev Discussion


On Nov 4, 12:32 pm, Peter McLain <pmcl...@vmware.com> wrote:
> On Nov 4, 2010, at 9:02 AM, ecin wrote:
>
> > server☁:> rvm maglev
> > server☁:> maglev start
> > server☁:> ruby --stone \!@192.168.1.120\!maglev -e "puts 'yay'"
>
>   The --stone option to maglev-ruby does not understand the network syntax.  It is used for finding the configuration file to pass to the stone (among other things).  So, in your case, the maglev-ruby script is looking for a file named $MAGLEV_HOME/etc/conf.d/!@192.168.1.120!maglev.conf, but doesn't find one...

server☁:> maglev-ruby --stone \!@127.0.0.1\!maglev -e "puts 'yay'"

The above does work though. Are you sure it doesn't understand the
network syntax?

>   We'll need to update the options to maglev-ruby, and perhaps tweak a few other things to make it work out of the box.  Until then, you'll need to manually set up the netldi processes and fire up the remote VM by hand.
>
> I've opened a trac ticket for this, including the need to document how to get things running: https://magtrac.gemstone.com/ticket/816

Awesome. :)

> --  
> Peter McLain
> pmcl...@vmware.com

ecin

Nov 10, 2010, 3:30:41 PM
to MagLev Discussion
Hey guys,

Here's a set of instructions to reproduce my setup. This is what I
would expect to work, and may point out any bad assumptions I've made.

# System
# Snow Leopard 10.6.4
# rvm 1.0.21
# maglev 24566

# Docs and instructions at
# http://seaside.gemstone.com/docs/GS64-SysAdminGuide-2.3.pdf

# Steps
# Install latest maglev on rvm, 24566 as of Nov. 10, 2010
rvm install maglev
rvm maglev
export GEMSTONE=$MAGLEV_HOME/gemstone

# Make sure the stone is running
# Should exit with status 0
$GEMSTONE/bin/waitstone \!@127.0.0.1\!maglev -1

# At this point, we should be able to connect with topaz
$GEMSTONE/bin/topaz
topaz> set gemstone maglev
topaz> set username DataCurator
topaz> set password swordfish
topaz> login
successful login
topaz 1> exit
Logging out session 1.

# Now let's try to connect topaz to the Maglev stone (VM?)
# using a Network Resource String, i.e. !@<ip>!<stone_name>

# Start netldi in guest mode
$GEMSTONE/bin/startnetldi -g -a `whoami`

# Let's try with topaz again, using 127.0.0.1 as our server
$GEMSTONE/bin/topaz
topaz> set gemstone !@127.0.0.1!maglev
topaz> set username DataCurator
topaz> set password swordfish
topaz> login
successful login
topaz 1> exit
Logging out session 1.

# Let's substitute 127.0.0.1 with our hostname
$GEMSTONE/bin/topaz
topaz> set gemstone !@Snow.local!maglev
topaz> set username DataCurator
topaz> set password swordfish
topaz> login
ERROR 4035, The GemStone session has lost its connection to the
Stone Repository monitor., recv(3,0x1023673f0,128,0) failed with
errno=54,ECONNRESET, Connection reset by peer, stncall SocketRecv
failed
topaz> exit

# D'oh. Finally, let's try with our actual IP
$GEMSTONE/bin/topaz
topaz> set gemstone !@192.168.1.120!maglev
topaz> set username DataCurator
topaz> set password swordfish
topaz> login
ERROR 4035, The GemStone session has lost its connection to the
Stone Repository monitor., recv(3,0x1023673f0,128,0) failed with
errno=54,ECONNRESET, Connection reset by peer, stncall SocketRecv
failed
topaz> exit

Monty Williams

Nov 10, 2010, 4:30:31 PM
to maglev-d...@googlegroups.com
Hi Ecin,

Thanks for the details. Basically it looks like 127.0.0.1 works but using your hostname (Snow.local) or actual IP address (192.168.1.120) fails in the local case. I would have expected at least one of the latter to work. It looks broken to me. It could be we missed something on the Mac, but I can't rule out Linux either.

We're on our way to RubyConf so it may take a few days to get back to you on this.

After we figure this out, you may be able to make use of the environment variable GEMSTONE_NRS_ALL to set a default.

-- Monty


ecin

Nov 10, 2010, 5:34:05 PM
to MagLev Discussion
A bit preliminary here, but I think I found the problem: Gemstone
doesn't like dealing with IPs.

In /etc/hosts:
# 192.168.1.120 is the server's IP
192.168.1.120 Snow

Now topaz works with Snow as the hostname:
topaz> set gemstone !@Snow!maglev
topaz> set username DataCurator
topaz> set password swordfish
topaz> login
successful login

Found the tip at http://forum.world.st/Migrating-from-2-3-1-to-2-4-4-td2401101.html#a2401101

So far I've only tested with topaz running on the same machine as the
stone. Gonna try separate machines next!
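
For anyone repeating this on several machines, the /etc/hosts change
can be made repeatable with something like the sketch below.
add_host_entry is a made-up helper name; it assumes a hosts file in the
usual "IP name [aliases...]" format and compares names exactly (case
sensitive). Editing the real /etc/hosts needs root.

```shell
#!/bin/sh
# Hypothetical helper: append "IP NAME" to a hosts file unless NAME is
# already listed as a hostname or alias on a non-comment line.
add_host_entry() {
    ip=$1
    name=$2
    hosts_file=${3:-/etc/hosts}
    if awk -v n="$name" '
        /^[[:space:]]*#/ { next }          # skip whole-line comments
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break       # stop at a trailing comment
                if ($i == n) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "$hosts_file"; then
        echo "$name already present in $hosts_file"
    else
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Example for the setup in this thread (run as root):
#   add_host_entry 192.168.1.120 Snow
```

Running it a second time leaves the file unchanged, so it is safe to
put in a setup script.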

ecin

Nov 10, 2010, 7:26:34 PM
to MagLev Discussion
All right, so it works... sort of.

I can connect a topaz instance with the RPC library:

remote:> echo "192.168.1.120 Snow" >> /etc/hosts

remote:> $GEMSTONE/bin/topaz
topaz> set gemstone maglev
topaz> set username DataCurator
topaz> set password swordfish
topaz> set gemnetid !@Snow!gemnetobject
topaz> login
successful login

Using topaz with the linked library fails:

remote:> $GEMSTONE/bin/topaz -l
topaz> set gemstone !@Snow!maglev
topaz> set username DataCurator
topaz> set password swordfish
topaz> login
ERROR 4141, Stone couldn't start its PageServer and/or SharedPageCache
on this machine.
See Page Manager's log for more information.
, Unable to establish connection to cache page server on remote host:
Snow Remote cache creation failed.,
topaz> exit

And from the Page Manager's log:

PageServer net connection error:
Nonblocking connect(192.168.1.120,port=53712) failed to complete.
--- 11/10/10 20:19:03.431 AST ---

pageMgrCreateRemoteSharedCache: RDbfFinishCreateCacheServer failure,

--- 11/10/10 20:19:03.431 AST ---

The connection to the cache pgsvr on host Snow was lost.
All gems on this remote cache will now be stopped.
Warning: PageSetLostOt didn't find session 5
Warning: PageSetLostOt didn't find session 5

So, definitely progress. There seems to be a whole set of reasons why
the PageServer could have trouble launching. I'll look into those
next.