SSH authentication down... again
Pedro Morais  
From: Pedro Morais <morais.pe...@gmail.com>
Date: Thu, 23 Sep 2010 03:31:35 -0700 (PDT)
Local: Thurs, Sep 23 2010 6:31 am
Subject: SSH authentication down... again
Hi,

Once again we are unable to push our repos using SSH (we're getting
permission denied; user morais).

I'm tired of sending emails to support, so this time my rant will be
public.
A code hosting service needs to just work. bitbucket doesn't. It seems
every week there's a new, minor-but-blocking-getting-work-done,
problem.

Jesper, what is being done to improve the level of service?

Thanks,
Pedro


Doug Hellmann  
From: Doug Hellmann <doug.hellm...@gmail.com>
Date: Thu, 23 Sep 2010 07:21:15 -0400
Local: Thurs, Sep 23 2010 7:21 am
Subject: Re: [Bitbucket] SSH authentication down... again

On Sep 23, 2010, at 6:31 AM, Pedro Morais wrote:

> Hi,

> Once again we are unable to push our repos using SSH (we're getting
> permission denied; user morais).

I'm having the same issue (user dhellmann).

> I'm tired of sending emails to support, so this time my rant will be
> public.
> A code hosting service needs to just work. bitbucket doesn't. It seems
> every week there's a new, minor-but-blocking-getting-work-done,
> problem.

Indeed, this is getting old.

Doug


jespern  
From: jespern <jno...@gmail.com>
Date: Thu, 23 Sep 2010 04:39:11 -0700 (PDT)
Local: Thurs, Sep 23 2010 7:39 am
Subject: Re: SSH authentication down... again
On Sep 23, 9:21 pm, Doug Hellmann <doug.hellm...@gmail.com> wrote:

> On Sep 23, 2010, at 6:31 AM, Pedro Morais wrote:

> > Hi,

> > Once again we are unable to push our repos using SSH (we're getting
> > permission denied; user morais).

> I'm having the same issue (user dhellmann).

It should be OK again now.

> > I'm tired of sending emails to support, so this time my rant will be
> > public.
> > A code hosting service needs to just work. bitbucket doesn't. It seems
> > every week there's a new, minor-but-blocking-getting-work-done,
> > problem.

> Indeed, this is getting old.

Yes, it is. For us as well.

Long story:

When we migrated to the new hardware, some things changed on the backend.
On EC2, we had to scale out the repositories across several instances,
as the I/O throughput on a single instance wasn't enough to serve
everyone at decent speeds. For this reason, every load balancer needed
to have a copy of authorized_keys on it, and we did this by using a
"fanout" exchange in our queue (RabbitMQ.) Every time a key was added
(or removed), every load balancer would be told to update the file
accordingly.
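To make the fanout behaviour concrete, here is a minimal in-memory model of it. It is not RabbitMQ or any client library's API; the class name `FanoutExchange` and the host names `lb-1` to `lb-3` are hypothetical. The point it shows is that a fanout exchange hands every bound queue its own copy of each message, so every machine runs the same key-file update.

```python
# Hypothetical in-memory model of a RabbitMQ-style "fanout" exchange:
# every bound queue receives its own copy of each published message.
from collections import deque


class FanoutExchange:
    def __init__(self):
        self.queues = {}  # queue name -> deque of pending messages

    def bind(self, name):
        self.queues[name] = deque()

    def publish(self, message):
        # Fanout ignores routing keys and copies the message to every queue.
        for queue in self.queues.values():
            queue.append(message)


exchange = FanoutExchange()
for host in ("lb-1", "lb-2", "lb-3"):  # hypothetical load-balancer names
    exchange.bind(host)

exchange.publish("add key for alice")

# Every load balancer now holds its own copy of the same job.
pending = {host: list(q) for host, q in exchange.queues.items()}
print(pending)
```

With one local authorized_keys per load balancer this is exactly what you want. Once the file lives on shared storage, the same design turns one key change into three concurrent writers of a single file.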

On the new hardware, we have the luxury of actual physical hardware,
and thus, we decided on getting a monster fileserver (with redundancy
of course), and serve up everything to the frontends via NFS. We used
the old code from the load balancers to handle the key files just the
same.

What first happened was that every frontend (3) would simultaneously
receive a job going "hey, update the key file", and by doing so,
something would get jumbled every 25 keys or so, and somehow the file
would get truncated completely and we'd end up with a file on the NFS
mount with 1 or 2 keys in it, and everyone would be in trouble.
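A common way to avoid readers ever seeing a truncated key file is to write the complete new contents to a temporary file and rename it over the old one. The sketch below assumes a POSIX filesystem; the helper name `write_keys_atomically` is hypothetical. On NFS the rename is generally atomic on the server, but client-side caching can delay when other frontends see the change. Note that this only prevents partial files: if two writers still run concurrently, the last rename wins and the other writer's change is lost, so a single writer is still needed.

```python
import os
import tempfile


def write_keys_atomically(path, keys):
    """Replace a key file in one step so readers never see a half-written file.

    Hypothetical helper: writes the complete new contents to a temporary file
    in the same directory, fsyncs it, then renames it over `path`.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file with mode 0600 (owner read/write only).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".keys-")
    try:
        with os.fdopen(fd, "w") as tmp:
            for key in keys:
                tmp.write(key + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace() swaps the new file in atomically on POSIX filesystems:
        # a reader sees either the old file or the new one, never a truncated one.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        os.unlink(tmp_path)
        raise
```

Usage would be along the lines of `write_keys_atomically("/srv/keys/authorized_keys", keys)`, where the path and the `keys` list are placeholders. The temporary file is created in the same directory because a rename is only atomic within one filesystem.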

We handled this in the simplest way: disable the job on every box but
one. This worked fine for a while. Then, however, one of the frontends
died, and had to be rebooted. When it came back up, it faithfully
started the key-job again, and now we had 2 boxes manipulating the
keyfile, and the problem from the first time around came back, and we
had the keyfile wiped again.

After the migration to the new hardware, we focused mainly on
improving service in general, taking advantage of some of the luxuries
we couldn't have on EC2. One of them was to improve the way the
keyfile was handled. We got rid of the "fanout" exchange and switched
it over to a "round robin" class of job, so one server would pick up the
key job, manipulate the file, and every frontend would see the change.
This is what's supposed to be the final solution to this.
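For contrast with fanout, here is a minimal model of one shared work queue with several competing consumers. By default, RabbitMQ dispatches messages from a queue to its consumers in round-robin order. The sketch uses `itertools.cycle` in place of the broker, and the host names and job strings are hypothetical.

```python
# Hypothetical in-memory model of one shared work queue with several
# competing consumers: each job goes to exactly one frontend, in turn.
from itertools import cycle

frontends = ["fe-1", "fe-2", "fe-3"]  # hypothetical host names
next_consumer = cycle(frontends)

jobs = ["add key alice", "add key bob", "remove key carol"]
assignments = [(job, next(next_consumer)) for job in jobs]
for job, host in assignments:
    print(f"{host} handles {job!r}")
```

Each job is now processed once instead of three times. However, consecutive jobs still land on different frontends, so two key changes arriving close together can be processed at the same moment on two machines.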

Right now, I'm not sure what wiped the file, but that is indeed what
happened again. I'm tired of having to deal with repercussions of the
migration--after all, we did this to improve general service, not
degrade it.

So what are we doing to improve the level of service? Well, for
starters, we've started to thoroughly document every disruption of
service we experience, and write a PIR (Post Incident Response).
These are visible to our new (managed) data center operators, where we
have a respectable SLA. Over the last incident, for example, they were
quick to restore the key file while we were asleep (not having 24h
coverage sucks, but at least they call us.)

As when we were on EC2, a lot of man hours are being spent on keeping
the service running, and while it hasn't been *very* evident yet, at
least to the outside, we *are* spending less time pulling out hair,
and we generally get better sleep. The plan is that this will lead to
more quality time to develop much-needed and much-wanted features for
everyone to use, and a lot less time having to keep the boat
afloat, so to speak.

We're very aware of how we look to the outside, and I personally,
more than anything, want to make Bitbucket a kick-ass product. I can't
emphasize enough how much the community support has helped us since
day 1, and I urge everyone to not despair. Changes are coming, a team
totaling 8 people has been hired (they're all starting mid-October),
and things will get better.

Hang in there with us. You'll see.

Jesper


Jeff Squyres  
From: Jeff Squyres <jsquy...@gmail.com>
Date: Thu, 23 Sep 2010 04:23:08 -0700 (PDT)
Local: Thurs, Sep 23 2010 7:23 am
Subject: Re: SSH authentication down... again
+1 -- same issue here (again).  ssh authentication seems to have been
broken often lately.  :-(

This issue has also been reported here:

    http://bitbucket.org/jespern/bitbucket/issue/2194/cant-update-and-pus...

On Sep 23, 7:21 am, Doug Hellmann <doug.hellm...@gmail.com> wrote:


Doug Hellmann  
From: Doug Hellmann <doug.hellm...@gmail.com>
Date: Thu, 23 Sep 2010 08:14:52 -0400
Local: Thurs, Sep 23 2010 8:14 am
Subject: Re: [Bitbucket] Re: SSH authentication down... again

On Sep 23, 2010, at 7:39 AM, jespern wrote:

> On Sep 23, 9:21 pm, Doug Hellmann <doug.hellm...@gmail.com> wrote:
>> On Sep 23, 2010, at 6:31 AM, Pedro Morais wrote:

>>> Hi,

>>> Once again we are unable to push our repos using SSH (we're getting
>>> permission denied; user morais).

>> I'm having the same issue (user dhellmann).

> It should be OK again now.

It is working for me now.  Thanks!

File-locking on NFS is notoriously unreliable.

> We handled this in the simplest way: Disable the job on every box, but
> one. This worked fine for a while. Then, however, one of the frontends
> died, and had to be rebooted. When it came back up, it faithfully
> started the key-job again, and now we had 2 boxes manipulating the
> keyfile, and the problem from the first time around came back, and we
> had the keyfile wiped again.

> After the migration to the new hardware, we focused mainly on
> improving service in general, taking advantage of some of the luxuries
> we couldn't before on EC2. One of them was to improve the way the
> keyfile was handled. We got rid of the "fanout" exchange and switched
> it over to "round robin" class job, so one server would pick up the
> key job, manipulate the file, and every frontend would see the change.
> This is what's supposed to be the final solution to this.

That does seem like it would be more reliable.

> Right now, I'm not sure what wiped the file, but that is indeed what
> happened again. I'm tired of having to deal with repercussions of the
> migration--after all, we did this to improve general service, not
> degrade it.

Perhaps 2 key updates came at the same time, and were distributed to 2 different servers to process?  It seems like a fail-over system, instead of round-robin, would be a more appropriate architecture here.  Pass all requests to the same server, unless it stops responding, then pick another one and give all of the requests to it, etc.  
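The fail-over idea can be sketched as a dispatcher that sends every job to one active worker and only moves to the next worker when the current one fails. This is a minimal illustration under stated assumptions: the workers are hypothetical callables that raise `ConnectionError` when their host is unreachable, and `FailoverDispatcher` and `make_worker` are illustrative names. A real deployment would also need to handle a primary that is slow rather than dead, which can otherwise leave two active writers.

```python
class FailoverDispatcher:
    """Send every job to one active worker; move to the next only on failure.

    Hypothetical sketch: workers are callables that raise ConnectionError
    when their host is unreachable.
    """

    def __init__(self, workers):
        self.workers = list(workers)
        self.active = 0  # index of the worker currently receiving all jobs

    def dispatch(self, job):
        for _ in range(len(self.workers)):
            worker = self.workers[self.active]
            try:
                return worker(job)
            except ConnectionError:
                # Current worker is down: fail over to the next one and retry.
                self.active = (self.active + 1) % len(self.workers)
        raise RuntimeError("no worker available for job: %r" % (job,))


def make_worker(name, up=True):
    # Hypothetical stand-in for a frontend that applies a key-file job.
    def worker(job):
        if not up:
            raise ConnectionError(name + " is down")
        return name
    return worker


dispatcher = FailoverDispatcher(
    [make_worker("fe-1", up=False), make_worker("fe-2"), make_worker("fe-3")]
)
print([dispatcher.dispatch(job) for job in ["add alice", "add bob"]])
# -> ['fe-2', 'fe-2']
```

Unlike round-robin, once `fe-1` is found to be down, every later job goes to `fe-2` until it fails in turn. That keeps a single writer for the key file in normal operation.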

I appreciate your openness about the behind-the-scenes processes and issues, Jesper.  It's the primary reason I haven't walked away already.  Service *has* become much more stable on your new hosting service.  It sounds like this ssh key issue is a little tricky, and I hope you can get it worked out soon.

Doug


Azrul Rahim  
From: Azrul Rahim <write...@azrul.com>
Date: Thu, 23 Sep 2010 20:21:51 +0800
Local: Thurs, Sep 23 2010 8:21 am
Subject: Re: [Bitbucket] Re: SSH authentication down... again
I signed up for Bitbucket specifically so that I don't have to know
about any of this.

On Sep 23, 2010, at 7:39 PM, jespern <jno...@gmail.com> wrote:

