I understand that many people are not really interested in binary
packages and are much happier compiling everything from source, but for
me this is not a viable option, especially with large ports such as kde,
gnome, openoffice, etc.
For the last couple of days I have been following the pointyhat build
statistics provided at
http://pointyhat.freebsd.org/errorlogs/packagestats.html
I was trying to understand how often I should expect updated packages
for i386-8-stable to hit the ftp sites. As seen on that page, the building
process started on Dec 3rd but had not been completed yet. I've been
tracking a rough completion status by following the "not yet built" number;
however, that number has been standing at 889 for almost 24 hours now, while
the status of the building process is still reported as running. (As I am
writing this, I just noticed that it is now marked as Not Running, but the
same applies to the i386-7 build, which has been stuck at 867 Not Yet Built
for roughly the same amount of time.)
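For anyone else watching the page by hand, a stall like this can be measured from a few noted readings. Below is a minimal sketch in Python, assuming the readings are written down manually as (time, not-yet-built count) pairs. The function name, the timestamps, and every count other than 889 are made up for illustration, and nothing here reads the actual stats page.

```python
from datetime import datetime, timedelta

def stalled_for(samples):
    """Return how long the most recent count has stayed unchanged.

    samples: list of (datetime, not_yet_built) pairs in chronological
    order, noted by hand (a hypothetical format, not a pointyhat one).
    """
    if not samples:
        return timedelta(0)
    last_time, last_count = samples[-1]
    since = last_time
    # Walk backwards while the count equals the latest reading.
    for when, count in reversed(samples):
        if count != last_count:
            break
        since = when
    return last_time - since

# Placeholder timestamps; only the value 889 comes from the page.
t0 = datetime(2000, 1, 1)
readings = [
    (t0, 1200),
    (t0 + timedelta(hours=6), 889),
    (t0 + timedelta(hours=18), 889),
    (t0 + timedelta(hours=29), 889),
]
print(stalled_for(readings))
```

The result is only as precise as the manual readings: the stall is counted from the first reading that showed the current number, not from when the build actually stopped progressing.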
As I was watching the building stats for i386-8-stable, a build started
for amd64-8 that was completed in about 24 hours (I am monitoring this page
manually so times are not exact but are a fair estimate). Why is there
such a large difference between the build times on amd64 and i386? Are the
i386 machines really that underpowered?
Next I found this page, which keeps track of the upload status of
packages to the various ftp sites:
http://portsmon.freebsd.org/portsuploadstatus.py
If the statistics on that page are correct then it seems to me that
there is a lot of inefficiency in the build and upload process. Some
sites are rarely updated and some pointyhat build runs are never uploaded.
Take a look, for example, at the status of the amd64-7 and amd64-8
packages: updated packages were compiled in the last few days (7) and a
couple of hours ago (8), but this is not reflected on the ftp sites.
Again, I realize that for the majority binary packages are not a high
priority and that the main purpose of pointyhat is to check for errors
in the ports tree. I am also certainly no expert in this and I am
learning as I go.
It's possible that I am misinterpreting the data I am seeing. I simply
want to make sense of all this and understand how it all fits together (and
save some compiling time...).
Oren
Brave man :-) That's one I set up.
> As seen on that page, the building process started on Dec 3rd but had
> not been completed yet.
Apparently the build for www/p5-Gtk2-WebKit is now hanging on all buildenvs.
Pav has already marked it as broken on amd64.
The hung build will continue to run until a reaper process kills it off
(or one of us portmgrs does it manually). I'd like to see the error log,
so I'm going to let it run for now. The reaper timeout is, IIRC, 24 hours.
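Roughly, the reaper idea looks like the following. This is a minimal sketch in Python, not the actual pointyhat code: the 24-hour limit is the IIRC figure above, and the function name and the pid-to-start-time bookkeeping are assumptions made for illustration.

```python
import time

# Assumed limit, taken from the "IIRC 24 hours" figure above.
REAP_AFTER_SECONDS = 24 * 60 * 60

def builds_to_reap(running, now=None, limit=REAP_AFTER_SECONDS):
    """Return the pids of builds that have run longer than the limit.

    running: dict mapping pid -> start time in seconds since the epoch
    (a hypothetical bookkeeping format, not pointyhat's).
    """
    if now is None:
        now = time.time()
    return sorted(pid for pid, started in running.items()
                  if now - started > limit)

# One build started 100000 s ago, another 50000 s ago.
print(builds_to_reap({101: 0, 102: 50000}, now=100000))
```

A caller would then terminate the returned pids, for example with `os.kill`. The sketch only selects them, so that the policy can be checked without killing anything.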
> Why is there such a large difference between the build times on amd64
> and i386? Are the i386 machines really that underpowered?
Two data points: one, it looks like Pav having marked www/p5-Gtk2-WebKit
as broken had already been taken into account for the amd64 build, so it
didn't have that problem. And two, some of our i386 machines are indeed
underpowered. We've added several new, more modern, ones this year that
were donated to us: these are dual 2.4 or 2.8GHz machines, mostly with
2G of RAM. (One of my background tasks is to try to characterize
performance on the nodes with various setups; my intuition is that 4G
would allow us to raise throughput, but I need to make a 'use case' for
that before I go ask for funding.)
fwiw, I continually look for new ways to scrounge more package building
nodes (I seem to have inherited the task of looking after them).
> Next I found this page which keeps track of the upload status of
> packages to the various ftp sites
> http://portsmon.freebsd.org/portsuploadstatus.py
That's mine too :-)
> If the statistics on that page are correct then it seems to me that
> there is a lot of inefficiency in the build and upload process.
With 11 active buildenvs, we have saturated the amount of data that the
sites can upload. We've discussed the matter before but no one has
come up with a solution. We try not to upload different package sets
at the same time, as a workaround.
> Some sites are rarely updated
Not all of the sites carry all of the buildenvs, and some of those that
do can run days behind.
Also, I don't have up-to-date contact information for the various sites.
If anyone has that, please let me know.
> and some pointyhat build runs are never uploaded.
Hmm, they should be. I'll forward this on to pav. (The way we have the
work divided up is that pav does amd64; erwin does i386; I do sparc64 and
the nascent ia64; and various portmgrs, including miwi, do the *-exp runs
which are intentionally not uploaded, but do constitute a load on both
pointyhat and the nodes.)
> I simply want to make sense of all this and understand how it all fits
> together
I've been trying to understand it for several years, so don't worry :-)
And I'm one of the people "in charge".
Longer explanation:
pointyhat throughput depends on a lot of factors, some of which I am in
the early stages of understanding.
- if a node hangs (but only in certain ways), the dispatch scheduler
can get into a state where it still tries to schedule builds on that
node, over and over. This causes an overall slowdown in build
dispatch. I'm not exactly sure of the root cause of the hangs, but
one of them is likely to be swap exhaustion which leads to sshd being
killed. (The most recent -current fixes this). Since the failures
are statistical, they are hard to catch. I have added some error
logging code to try to figure this out. As for the scheduler, there
is some missing functionality there. The code is complex so it's not
trivial to fix.
- pointyhat itself is a very heavily loaded machine. The most recent
problems we have been chasing are a) disk space exhaustion, and b)
disk controller saturation. For the former, we keep finding things
to evict. OTOH, with 16 buildenvs (counting the *-exp ones) there is
only so much we can do. When space is low, the rate of builds slows
down significantly, for reasons I do not understand yet. For the latter,
there are two processes that busy the controller: 1) compression of
saved logfiles, and 2) the ZFS backup process. I think I may have an
idea of how to fix 1); I will have to learn more about the way ZFS is
set up on pointyhat to fix 2).
- pointyhat can get into situations where NFS timeouts from NFS-mounted
filesystems (such as /home) crash the system. I don't know much about
this. Once that happens, we have to restart all the builds. Sometimes
it can take a little while for one of us to notice the crash.
- the scheduler has a bug where it occasionally crashes. I am actively
investigating this and have added a bunch of debug code to catch it
in the act. Again, when this happens, all the builds have to be
restarted. This was happening a lot in the first few days of December,
but seems to have settled down now.
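One way to picture the missing scheduler functionality from the first item: stop offering work to a node after it has failed several dispatches in a row, and retry it only after a cool-down. Below is a minimal sketch in Python under those assumptions. The class, the thresholds, and the node names are hypothetical, and this is not how the real dispatch code is structured.

```python
class NodePool:
    """Track dispatch failures and skip nodes that look hung."""

    def __init__(self, nodes, max_failures=3, cooldown=600):
        self.failures = {n: 0 for n in nodes}    # consecutive failures
        self.retry_at = {n: 0.0 for n in nodes}  # earliest time to retry
        self.max_failures = max_failures
        self.cooldown = cooldown                 # seconds

    def available(self, now):
        """Nodes that may be offered a build at time `now`."""
        return [n for n in self.failures if now >= self.retry_at[n]]

    def report(self, node, ok, now):
        """Record the outcome of one dispatch attempt."""
        if ok:
            self.failures[node] = 0
            return
        self.failures[node] += 1
        if self.failures[node] >= self.max_failures:
            # Quarantine the node instead of retrying it over and over.
            self.retry_at[node] = now + self.cooldown
            self.failures[node] = 0

pool = NodePool(["node-a", "node-b"], max_failures=2, cooldown=100)
pool.report("node-a", ok=False, now=0)
pool.report("node-a", ok=False, now=1)
print(pool.available(now=2))    # node-a is skipped during the cool-down
print(pool.available(now=101))  # node-a is offered work again
```

The cool-down keeps a node that recovers on its own from being lost for good, at the cost of one wasted dispatch per cool-down period if it is still hung.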
The code that runs pointyhat is hundreds of lines of sh, awk, perl, and
python, and quite complex. Although these days I understand most of it
in a static sense, I'm still learning about its dynamic characteristics.
But now you know the contents of (part of) my todo list.
mcl
Good luck and thank you very much for a thorough and informative reply.
How much do you need? RAM isn't that expensive; I think direct funding can
be organized without all that overhead.
--
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
> > and some pointyhat build runs are never uploaded.
>
> Hmm, they should be. I'll forward this on to pav. (The way we have the
> work divided up is that pav does amd64; erwin does i386; I do sparc64 and
Yes, I have decided not to upload some of the recent amd64 builds.
The reasoning is that it's better to have a complete packageset on the
mirrors with older software, than a new incomplete set that lacks
popular software like GNOME or KDE.
I.e., people can still add GNOME, albeit the previous GNOME release,
instead of being unable to add GNOME at all.
Hope this makes sense.
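The rule of thumb can be sketched like this. It is a minimal illustration in Python, assuming a made-up list of "must-have" packages. The package names and the function are hypothetical, and the real decision is made by hand rather than by a script.

```python
# Hypothetical names standing in for "popular software like GNOME or KDE".
MUST_HAVE = frozenset({"gnome2", "kde4"})

def should_upload(new_set, must_have=MUST_HAVE):
    """Upload a new package set only if no must-have package is missing."""
    return must_have.issubset(new_set)

print(should_upload({"gnome2", "kde4", "vim"}))  # complete: replace the old set
print(should_upload({"kde4", "vim"}))            # GNOME missing: keep the old set
```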
--
Pav Lucistnik <p...@oook.cz>
<p...@FreeBSD.org>
MIPS: Meaningless Information Provided by Salesmen
[...]
> Two data points: one, it looks like Pav having marked www/p5-Gtk2-WebKit
> as broken had already been taken into account for the amd64 build, so it
> didn't have that problem. And two, some of our i386 machines are indeed
> underpowered. We've added several new, more modern, ones this year that
> were donated to us: these are dual 2.4 or 2.8GHz machines, mostly with
> 2G of RAM. (One of my background tasks is to try to characterize
> performance on the nodes with various setups; my intuition is that 4G
> would allow us to raise throughput, but I need to make a 'use case' for
> that before I go ask for funding.)
>
> fwiw, I continually look for new ways to scrounge more package building
> nodes (I seem to have inherited the task of looking after them).
What is the policy for package building nodes? I mean, is it possible to
use machines not owned directly by FreeBSD.org?
For example, I have a spare machine in our rack which I can lend for some
period (until some production machine goes down and needs to be replaced
by this spare machine), or maybe I can set up some older unused machine
(IBM x336).
Is deploying a new node an easy task, or is it something special that is
not worth doing for a relatively short period of time?
Miroslav Lachman
I was wondering about this myself. I have a dual quad system that is
almost completely idle most of the time and would love to see it used
for something helpful to the project. I can guarantee access to it for
at least a year if that helps.
Jonathan
That would be a "remote node". It's possible, but it's not for
everybody.
First, it will generate an obscene amount of network traffic, both ways.
Second, you need to fully surrender it and give us root on it.
Also, remote console access, or at least a power toggle, would be good.
Then Mark can borg it. :)
--
Pav Lucistnik <p...@oook.cz>
<p...@FreeBSD.org>
Eat when you are hungry, sleep when you are tired. Chase butterflies
when you want some fun.
Can't you just take a jail? Just wondering ...
> Can't you just take a jail? Just wondering ...
No, unfortunately, a jail won't do. The box must be fully dedicated.
--
Pav Lucistnik <p...@oook.cz>
<p...@FreeBSD.org>
See file. Click file. Get file.