Multiple users on ExoGENI nodes

Josh Smift

Jan 8, 2014, 3:22:55 PM
to GENI Users
We have four slices that we share among four GPO infra users, and when we
create slivers in those slices, all four of us get accounts there, except
that sometimes we don't on ExoGENI racks. For example, on the VM at
129.7.98.9 on the Houston EG rack, the asydne01 account didn't get created
for some reason. Out of the twenty slivers, one account (and only ever one)
was missing from the resulting VM on eleven of them (asydne01 on ten of
them, chaos on one). Any way to tell why?

-Josh (j...@bbn.com)
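
One way to build a per-VM tally like the one above is to collect the account list from each VM (for instance from `getent passwd` output) and diff it against the slice's user list. The following is a minimal Python sketch. It assumes the usernames on each VM have already been gathered into a dict, and it does not show the collection step. The helper names and the VM names in the demo are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Set


def missing_accounts(expected: Iterable[str],
                     observed: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For each VM, list the expected usernames that are absent from it.

    VMs on which every expected account exists are left out of the result.
    """
    wanted = set(expected)
    report: Dict[str, List[str]] = {}
    for vm, present in observed.items():
        gone = sorted(wanted - set(present))
        if gone:
            report[vm] = gone
    return report


def tally(report: Dict[str, List[str]]) -> Dict[str, int]:
    """Count how many VMs are missing each username."""
    counts: Dict[str, int] = {}
    for names in report.values():
        for name in names:
            counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical observations; "vm-a" and "vm-b" are placeholder names.
    users = ["jbs", "chaos", "tupty", "asydne01"]
    observed = {
        "vm-a": {"jbs", "chaos", "tupty"},
        "vm-b": {"jbs", "chaos", "tupty", "asydne01"},
    }
    report = missing_accounts(users, observed)
    print(report)          # {'vm-a': ['asydne01']}
    print(tally(report))   # {'asydne01': 1}
```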

Paul Ruth

Jan 8, 2014, 10:03:12 PM
to geni-...@googlegroups.com
I have a couple of thoughts about possible causes.  Is this at all repeatable? I ask because I will be at BBN tomorrow through Saturday for the Winter Camp.   Maybe I will have time Friday to sit down with you and debug this.

Paul



--
GENI Users is a community supported mailing list, so please help by responding to questions you know the answer to.

If this is your first time posting a question to this list, please review http://groups.geni.net/geni/wiki/GENIExperimenter/CommunityMailingList
---
You received this message because you are subscribed to the Google Groups "GENI Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geni-users+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Josh Smift

Jan 9, 2014, 11:42:32 AM
to geni-...@googlegroups.com
PR> I have a couple of thoughts about possible causes. Is this at all
PR> repeatable? I ask because I will be at BBN tomorrow through Saturday
PR> for the Winter Camp. Maybe I will have time Friday to sit down with
PR> you and debug this.

Well, it happened several times when we created those slices, so it's
probably pretty repeatable, but I haven't tried since then. :^) Lemme see
if I can reproduce it at least sometimes, and I'll letcha know.

-Josh (j...@bbn.com)

Josh Smift

Jan 10, 2014, 11:44:37 AM
to geni-...@googlegroups.com
PR> I have a couple of thoughts about possible causes. Is this at all
PR> repeatable? I ask because I will be at BBN tomorrow through Saturday
PR> for the Winter Camp. Maybe I will have time Friday to sit down with
PR> you and debug this.

JBS> Well, it happened several times when we created those slices, so it's
JBS> probably pretty repeatable, but I haven't tried since then. :^) Lemme
JBS> see if I can reproduce it at least sometimes, and I'll letcha know.

I tried just now with our four slivers on the UFL EG rack, and one of us
didn't get an account on any of the resulting four VMs (the same one on
all four). Dunno if I just got lucky, or if it'll reproduce every time,
but still, that's something.

Let me know if you need any other info to look around on your own, or if
you want to meet up some time today to look at it together, or what. Thanks!

-Josh (j...@bbn.com)

Paul Ruth

Jan 10, 2014, 1:06:13 PM
to geni-...@googlegroups.com
Is the user who didn't get the account named "tupty"?

Paul

Josh Smift

Jan 10, 2014, 1:16:45 PM
to geni-...@googlegroups.com
PR> Is the user who didn't get the account named "tupty"?

Nope; he is one of the four of us (jbs (me) and chaos are the other two),
but he *did* get an account, on all of the VMs (as did chaos).

-Josh (j...@bbn.com)

Paul Ruth

Jan 13, 2014, 12:10:30 PM
to geni-...@googlegroups.com
Josh,

I think I found a race condition that could be the cause of your missing user accounts in the VM. I pushed a fix to the ufl rack. Please try your slice several times at ufl and let me know if the problem still exists.

Let me know if you see missing accounts again on ufl. Also, let me know if/when you think the fix has worked, so that I can push it to the other racks.

thanks,
Paul

Josh Smift

Jan 14, 2014, 1:19:14 AM
to geni-...@googlegroups.com
PR> I think I found a race condition that could be the cause of your
PR> missing user accounts in the VM. I pushed a fix to the ufl rack.
PR> Please try your slice several times at ufl and let me know if the
PR> problem still exists.

I tried it just now, and all of our accounts got created properly the
first time. The second time, one of the VMs never came up -- sliverstatus
had it as "configuring" even after ten or fifteen minutes. The third and
fourth time, everything worked fine, and all our accounts got created
properly. The fifth time, Chaos's account (username 'chaos') didn't get
created. The sixth time, everything worked.

So, just that once; but there was that once. Around 01:00 if you want to
check some logs. I can also try some more during normal working hours
tomorrow if you'd like.

-Josh (j...@bbn.com)

Paul Ruth

Jan 14, 2014, 9:26:04 AM
to geni-...@googlegroups.com
Hmmmm...  That must not have been the problem.   

I can see in the exogeni logs that the exogeni handler that creates and configures the VM fails to ssh to the VM to create the one user (but succeeds for the others). This happens after the handler has already tested that it can ping and ssh to the VM. Can you think of any reason this particular image or boot script would accept ssh connections at first and then temporarily reject them?

The solution might be for us to add retries in more places in the exogeni handler, so that it waits until the VM is ready to accept connections again.

Paul
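
The retry idea above amounts to wrapping each SSH-dependent step in a loop that waits and then tries again. This thread does not show the actual ExoGENI handler code, so what follows is only a generic Python sketch of retry with exponential backoff. `with_retries` and its parameters are hypothetical names, not part of any ExoGENI API.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T],
                 attempts: int = 5,
                 base_delay: float = 2.0,
                 retry_on: Tuple[Type[BaseException], ...] = (OSError,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `action`, retrying on the listed exceptions with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between tries and
    re-raises the last exception once `attempts` tries have all failed.
    OSError covers ConnectionRefusedError, TimeoutError and socket.timeout.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of tries: surface the real error
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")


if __name__ == "__main__":
    # Demo: a hypothetical action that is refused twice, then succeeds.
    state = {"calls": 0}

    def flaky() -> str:
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionRefusedError("sshd not ready yet")
        return "connected"

    print(with_retries(flaky, base_delay=0.1))  # connected
```

Wrapped this way, a step such as `with_retries(lambda: create_account(vm_ip, "chaos"))` would ride out a brief window in which the VM refuses connections. Here `create_account` is a hypothetical stand-in for whatever the handler does over SSH. Doubling the delay keeps the total wait bounded: with the defaults, the sleeps before the fifth and final try add up to 2 + 4 + 8 + 16 = 30 seconds.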

Josh Smift

Jan 14, 2014, 10:12:29 AM
to geni-...@googlegroups.com
PR> I can see in the exogeni logs that the exogeni handler that creates
PR> and configures the VM fails to ssh to the VM to create the one user
PR> (but succeeds for the others). This happens after the handler tests to
PR> see if it can ping and ssh to the VM. Can you think of any reason this
PR> particular image or boot script would allow ssh connections then,
PR> temporarily, reject ssh connections?

Not really. :^( One question: Is it clear whether it's rejecting the
connection, or timing out?

-Josh (j...@bbn.com)
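
The refused-versus-timeout question can be answered from the client side by looking at which error a connection attempt raises. An active rejection (a TCP RST, usually because nothing is listening on the port yet) shows up immediately as a refused connection. A silently dropped connection only shows up once the client's timeout expires. Below is a minimal Python sketch. `probe_tcp` is a hypothetical helper, and port 22 assumes sshd is on its default port.

```python
import socket


def probe_tcp(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify one TCP connection attempt to host:port.

    "open"     -> handshake completed (something is listening)
    "refused"  -> host actively rejected it (RST; nothing listening yet)
    "timeout"  -> no answer within `timeout` seconds (host down, packets dropped)
    "error: …" -> anything else, e.g. no route or unresolvable name
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:  # an alias of TimeoutError on Python 3.10+
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Running this in a loop against a VM while it boots would show whether it flips from "open" back to "refused" (sshd restarting) or to "timeout" (network or host stall). This only tests the TCP layer. If sshd accepts the connection but then closes it or rejects authentication, the probe still reports "open", so telling those cases apart would need the handler's own SSH error message.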

Paul Ruth

Jan 16, 2014, 9:11:46 PM
to geni-...@googlegroups.com
Which image are you using for the slices with this problem?   I just ran into a similar issue with Niky and Divya using the gimi ubuntu image.   I am wondering if it is related.   

Paul

Josh Smift

Jan 17, 2014, 10:14:32 AM
to geni-...@googlegroups.com
PR> Which image are you using for the slices with this problem? I just ran
PR> into a similar issue with Niky and Divya using the gimi ubuntu image.
PR> I am wondering if it is related.

I'm using

<disk_image name="http://emmy9.casa.umass.edu/Disk_Images/ExoGENI/Ubuntu1204/ubuntu1204.xml" version="0beb69c8fbe65af0134ac8fe38b22e8f9b3c254c" />

which is Ubuntu, but I think not GIMI. But there could be something about
the Ubuntu startup process that's not working well with the ExoGENI VM handler.

-Josh (j...@bbn.com)
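
To confirm which image a slice actually uses, you can parse the descriptor URL and version out of the `disk_image` element above. The sketch below goes one step further and assumes that `version` is the SHA-1 hash of the image descriptor XML found at `name`. That is an assumption not stated in this thread, so check it against the ExoGENI image documentation. The local file name in the comment is a hypothetical downloaded copy of the descriptor.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def parse_disk_image(fragment: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (name, version) from a single <disk_image .../> element."""
    elem = ET.fromstring(fragment)
    return elem.get("name"), elem.get("version")


def sha1_of_file(path: str) -> str:
    """Hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    snippet = ('<disk_image name="http://emmy9.casa.umass.edu/Disk_Images/'
               'ExoGENI/Ubuntu1204/ubuntu1204.xml" '
               'version="0beb69c8fbe65af0134ac8fe38b22e8f9b3c254c" />')
    name, version = parse_disk_image(snippet)
    print(name)
    print(version)
    # ASSUMPTION: version is the SHA-1 of the descriptor file at `name`.
    # "ubuntu1204.xml" would be a hypothetical local copy of that file:
    # print(sha1_of_file("ubuntu1204.xml") == version)
```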

Paul Ruth

Jan 17, 2014, 10:19:29 AM
to geni-...@googlegroups.com
Yeah, that is what I am thinking.  I'm fairly certain that both of these VMs are based off of the same original ubuntu image.  

Likely yours is the original and the gimi image is based off of yours.  Divya and/or Niky, can you confirm this?   

Paul

Josh Smift

Jan 17, 2014, 10:20:54 AM
to geni-...@googlegroups.com
PR> Yeah, that is what I am thinking. I'm fairly certain that both of these
PR> VMs are based off of the same original ubuntu image.
PR>
PR> Likely yours is the original and the gimi image is based off of yours.
PR> Divya and/or Niky, can you confirm this?

They might also have a common ancestor -- I think Jeanne Ohren created the
one I'm using, and I wouldn't be at all surprised if she created the GIMI
one too.

In any case, I liked your earlier suggestion about making the ExoGENI
handler more resilient to transient failures. Is that more like super
simple and just needs an outage to deploy, or more like complicated and a
pain? :^)

-Josh (j...@bbn.com)

Niky Riga

Jan 17, 2014, 10:23:10 AM
to geni-...@googlegroups.com
I believe that the image Josh is using is the original Ubuntu that
Jeanne created a while back and probably the GIMI image was based on
that, and my image was based on GIMI.

Divya should probably know more.

--niky

Divyashri Bhat

unread,
Jan 17, 2014, 11:40:47 AM1/17/14
to geni-...@googlegroups.com

Yes, I have been using the Ubuntu image that was listed by Jeanne in the image registry.

Divyashri Bhat

Feb 4, 2014, 10:31:35 AM
to geni-...@googlegroups.com
Hi Paul,

I am trying to build an ExoGENI Ubuntu image with OVS and have been having issues with the image not booting. I am using the same base Ubuntu image as above. Does the issue here still persist?
--
Regards,
Divyashri Bhat
Graduate Student
University of Massachusetts, Amherst