Rename of symlink (as opposed to rm && ln)


Dominic Scheirlinck

Mar 5, 2012, 10:52:41 PM
to capis...@googlegroups.com
Hi,

I was wondering if anyone knows why Capistrano doesn't use an atomic rename(2) operation (say, a mv) to replace the current symlink. As it is, it seems to remove the symlink and, if successful doing so, create the new one in the same place. I'm worried that this could leave a gap of a fair few cycles where the symlink doesn't exist at all, and requests fail. In other words, it seems to use the shell for two consecutive operations that should really be a single as-quick-as-possible operation; preferably one where the current symlink is always readable and exists.

Regards,
Dominic

Lee Hambley

Mar 6, 2012, 2:37:25 AM
to capis...@googlegroups.com
> I was wondering if anyone knows why Capistrano doesn't use an atomic rename(2) operation (say, a mv) to replace the current symlink.

I believe it has to do with the possibility that on a newly deployed server the symlink might not exist in the first place. So a `mv` would fail.

If this is really a reproducible problem for you (it's never been a problem anyone has reported in 5 years of Capistrano being mainstream), you can easily override `deploy:create_symlink` to meet your requirements. The original implementation, as I'm sure you saw, is a handful of lines of very easy-to-read code.

- Lee


Dominic Scheirlinck

Mar 6, 2012, 5:17:52 AM
to capis...@googlegroups.com
Hi Lee,

On Tue, Mar 6, 2012 at 8:37 PM, Lee Hambley <lee.h...@gmail.com> wrote:
> I believe it has to do with the possibility that on a newly deployed server
> the symlink might not exist in the first place. So a `mv` would fail.

The destination symlink location not existing? That wouldn't cause a
mv to fail, would it? Or the one which is being created by
deploy:create_symlink before it's moved into place? That possibility
doesn't make sense to me either.

To be clear, I don't mean mv the old symlink out of the way. I mean mv
the new one over the top - reads to the location should fall either
side of the rename, so none of them will fail. (Well, actually it
varies by *nix flavour - OS X, for example, broke this specification
right up until Lion: http://www.weirdnet.nl/apple/rename.html)
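
A minimal sketch of the "mv the new one over the top" idea, assuming GNU
coreutils. The helper name atomic_symlink and the temporary-name scheme are
illustrative only and are not part of Capistrano. The -T option matters here:
without it, mv would follow the existing link into the directory it points
at and drop the temporary link inside that directory instead of replacing
the link itself. Other platforms spell this option differently or lack it.

```shell
#!/bin/sh
# atomic_symlink TARGET LINK
# Hypothetical helper: point LINK at TARGET without a window in which LINK
# does not exist. Assumes GNU ln and GNU mv (for -n and -T).
atomic_symlink() {
    target=$1
    link=$2
    tmp="$link.tmp.$$"    # temporary name in the same directory as LINK

    # Create the new link under the temporary name. -f and -n replace a
    # stale temporary link left by an earlier failed run instead of
    # descending into whatever it points at.
    ln -sfn "$target" "$tmp" || return 1

    # Rename the temporary link over the live one. This is a single
    # rename(2) call, and it also works when LINK does not exist yet.
    mv -T "$tmp" "$link" || { rm -f "$tmp"; return 1; }
}
```

Usage would look something like `atomic_symlink /path/to/releases/NEW
/path/to/current`, where both paths are placeholders.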

> If this is really a reproducible problem for you (it's never been a problem
> anyone has reported in 5 years of Capistrano being mainstream)

It's a race condition, so it's not easily reproducible under
production conditions. This one is pretty small too: you're unlikely
to see it "in the wild", unless your request rate is phenomenal, or
you're not deploying a website. Then again, in testing, it's extremely
reproducible, so I can't help but think it must be causing the odd
404. Especially consider that this bug is likely to be under-reported
by its nature: your next deploy would probably go perfectly, and you'd
have to be watching the logs very closely (even after several hundred
successful deploys) to be able to correlate one or two failed requests
back to a deployment problem.

So, to test: first, I set up a simple loop performing the symlink swap
(using the shell, just like Capistrano does).

    mkdir a b; echo -n a > a/foo; echo -n b > b/foo; ln -s a current
    while true; do
        rm current && ln -s a current
        rm current && ln -s b current
    done

Then I tried to repeatedly read the value in current/foo in another
thread (https://gist.github.com/1985290). At full bore reading, this
fails to get through even a single deployment: the python loop is
always managing to open the file faster than the rm && ln can
complete; it throws an IO error on the first attempt.
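
For comparison, a rough shell sketch of the same experiment with the swap
done by rename instead of rm && ln. This is not the Python reader from the
gist; the iteration count, the writer.done sentinel file and the counters
are assumptions made for illustration, and mv -T assumes GNU coreutils.
On a system where rename(2) is atomic, the reader should not see any
failed reads.

```shell
#!/bin/sh
# Count failed reads of current/foo while a background writer keeps
# swapping the link via rename. Directory layout follows the test above.
work=$(mktemp -d) || exit 1
cd "$work" || exit 1
mkdir a b
printf a > a/foo
printf b > b/foo
ln -s a current

# Writer: flip the link between a and b, then leave a sentinel file.
(
    i=0
    while [ "$i" -lt 200 ]; do
        ln -s b current.tmp && mv -T current.tmp current
        ln -s a current.tmp && mv -T current.tmp current
        i=$((i + 1))
    done
    touch writer.done
) &

# Reader: read through the link until the writer has finished.
reads=0
fails=0
while [ ! -e writer.done ]; do
    cat current/foo >/dev/null 2>&1 || fails=$((fails + 1))
    reads=$((reads + 1))
done
wait

echo "reads=$reads fails=$fails"
```

Replacing the two writer lines with the rm && ln pair from the loop above
turns this back into the racy version, which makes the two behaviours easy
to compare on the same machine.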

The symlink is only unreadable for a few thousandths of a second; my
read loop was doing about 11 kHz. But if you're aiming for something
like C10K (or not serving web at all, but using Capistrano as a
generic deployment tool), then it begins to be more likely that you'll
see problems. And this isn't even considering some of the weirder
stuff that could happen: an interrupt causing execution jitter between
the rm and ln, or the filesystem being marked as read-only between
them, etc.

> you can easily override `deploy:create_symlink` to meet your requirements

Thanks, yeah, I will be. I was only asking to see if there was a
definite purpose or rationale behind it, or whether it had just always
been like that.


Dom
