nextboot loader diff

msm...@mass.dis.org

May 9, 2002, 15:00:30
> I've finally learned enough forth to put together a diff to implement some
> nextboot functionality in the loader.
>
> Basically, the loader peeks into the first line of /boot/nextboot.conf to
> see if nextboot_enable="YES" is there. If it is, it reads the entire
> config, then rewrites the first line to nextboot_enable="NO"

Don't do this. Put the variables directly into loader.conf. There's no
need for another file.

--
To announce that there must be no criticism of the president,
or that we are to stand by the president, right or wrong, is not
only unpatriotic and servile, but is morally treasonable to
the American public. - Theodore Roosevelt

To Unsubscribe: send mail to majo...@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message

gor...@gnf.org

May 9, 2002, 15:03:44
On Thu, 9 May 2002, Michael Smith wrote:

> > I've finally learned enough forth to put together a diff to implement some
> > nextboot functionality in the loader.
> >
> > Basically, the loader peeks into the first line of /boot/nextboot.conf to
> > see if nextboot_enable="YES" is there. If it is, it reads the entire
> > config, then rewrites the first line to nextboot_enable="NO"
>
> Don't do this. Put the variables directly into loader.conf. There's no
> need for another file.

I think it's a bad idea to try to rewrite /boot/loader.conf. I purposely
went for /boot/nextboot.conf so that if something went wrong with the
rewrite, it (hopefully) wouldn't hose the user's settings.

-gordon

msm...@mass.dis.org

May 9, 2002, 15:13:41
> > > I've finally learned enough forth to put together a diff to implement some
> > > nextboot functionality in the loader.
> > >
> > > Basically, the loader peeks into the first line of /boot/nextboot.conf to
> > > see if nextboot_enable="YES" is there. If it is, it reads the entire
> > > config, then rewrites the first line to nextboot_enable="NO"
> >
> > Don't do this. Put the variables directly into loader.conf. There's no
> > need for another file.
>
> I think it's a bad idea to try to rewrite /boot/loader.conf. I purposely
> went for /boot/nextboot.conf so that if something went wrong with the
> rewrite, it (hopefully) wouldn't hose the user's settings.

You're fooling yourself if you think that just because you're rewriting a
different file, "something going wrong" isn't going to hose the user
anyway.

You probably want to overwrite with "TRY" rather than "NO", too, since
userland needs something to key off to know that this is a 'next' boot.

Obviously, "TRY" then gets overwritten with "NO" on the next boot, but
the new kernel is not booted (this is the 'recovery' boot).

I still think you're not thinking the processes associated with this
feature through carefully enough.

= Mike
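
A minimal sketch of the three-state flag suggested above, in Python rather than the loader's Forth. The `TRANSITIONS` table, the helper names, and the padding of "NO" to three characters are assumptions made for illustration; nothing here is taken from the actual loader code.

```python
from typing import Tuple

# Hypothetical three-state flag: YES -> TRY -> NO.
# Values are padded to the same length so a file could be overwritten in place.
TRANSITIONS = {
    "YES": ("TRY", True),   # first boot after arming: use the new kernel
    "TRY": ("NO ", False),  # following boot: recovery boot, old kernel
    "NO": ("NO ", False),   # steady state: nothing to do
}


def next_state(current: str) -> Tuple[str, bool]:
    """Return (value to write back, whether to boot the new kernel)."""
    return TRANSITIONS[current.strip()]


def is_next_boot(current: str) -> bool:
    """What userland could key off: TRY means this boot used the new settings."""
    return current.strip() == "TRY"
```

With this shape, userland only needs to read the flag once it is running: "TRY" tells it that the boot in progress is the 'next' boot.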

gor...@gnf.org

May 9, 2002, 15:47:20
On Thu, 9 May 2002, Michael Smith wrote:

> > I've finally learned enough forth to put together a diff to implement some
> > nextboot functionality in the loader.
> >
> > Basically, the loader peeks into the first line of /boot/nextboot.conf to
> > see if nextboot_enable="YES" is there. If it is, it reads the entire
> > config, then rewrites the first line to nextboot_enable="NO"
>
> Don't do this. Put the variables directly into loader.conf. There's no
> need for another file.

There needs to be another file regardless. How else am I going to know
which are supposed to be used on the next boot and which are normal values
that the user has put in /boot/loader.conf?

-gordon

gor...@gnf.org

May 9, 2002, 16:10:46
On Thu, 9 May 2002, Michael Smith wrote:

> You're fooling yourself if you think that just because you're rewriting a
> different file, "something going wrong" isn't going to hose the user
> anyway.

True, but if I only hose /boot/nextboot.conf (which is going to be deleted
when the machine enters multi-user anyway), I can contain any damage done.

> You probably want to overwrite with "TRY" rather than "NO", too, since
> userland needs something to key off to know that this is a 'next' boot.
>
> Obviously, "TRY" then gets overwritten with "NO" on the next boot, but
> the new kernel is not booted (this is the 'recovery' boot).

This doesn't really have any hooks into userland. It's for loader options.
I may not be understanding what you are trying to illustrate here.

> I still think you're not thinking the processes associated with this
> feature through carefully enough.

Very possible. This was just a first cut of the feature and I'll be the
first to admit that it's not pretty. I don't know forth so I was happy
to get as far as I did.

There isn't a notion of a recovery boot in this implementation. It either
tries the new options (specified in /boot/nextboot.conf) or it doesn't and
sticks to the defaults.

-gordon
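
A rough sketch of the one-shot behaviour described in this message, written in Python instead of Forth. It assumes the flag is on the first line of the file and that the replacement value is padded to the same byte length; the function name `consume_nextboot` and the simple `key="value"` parsing are hypothetical and not taken from the submitted diff.

```python
ENABLED = b'nextboot_enable="YES"'
DISABLED = b'nextboot_enable="NO" '  # trailing space keeps the length equal


def consume_nextboot(path):
    """Return the one-shot settings if armed, disarming the file in place."""
    with open(path, "r+b") as f:
        first = f.readline().rstrip(b"\n")
        if first != ENABLED:
            return None  # not armed: stick to the defaults
        rest = f.read().decode().splitlines()
        f.seek(0)
        f.write(DISABLED)  # same byte count, so nothing after it moves
    settings = {}
    for line in rest:
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip().strip('"')
    return settings
```

The second call on the same file returns `None`, which is the "either it tries the new options or it doesn't" behaviour: there is no recovery state in between.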

tlam...@mindspring.com

May 9, 2002, 18:02:50
Michael Smith wrote:
> I still think you're not thinking the processes associated with this
> feature through carefully enough.

Listen to Mike; he is the loader guru.

I don't know how the file I/O is done for the "YES/NO" change,
since I have to have a couple of browsers open to read FORTH
code. 8-(.

However, the prototype I worked on at ClickArray with James
(both guys who eventually donated code in that area were
ClickArray folks) assumed that it would rewrite the file with
equal length contents, as being the only safe way to do a
write from the FORTH code.

The way it was planned to work out was to have a file that had
a line listing root devices, and then rotor through the list,
rewriting it, as part of the boot process, e.g.:

"device1 device1 device1 device2 device2 device2"
->
"device1 device1 device2 device2 device2 device1"
...

The existence or non-existence of the file was the "yes" or
"no".

This is slightly different than the code that Archie, Julian,
and Doug worked out before I started at Whistle (the original
"nextboot" code), but it has the same properties.

Unfortunately, it wasn't really possible (no room, no write
code in the boot2, no room for write code in the boot2!) to
keep the list in the boot sector.

Rewriting a file, in any case, almost forces the boot code,
along with the file, onto its own partition.
Otherwise, you get screwed when one partition fails and
the other does not (it's a criss-cross) and you go to update.
8-(.

So I definitely agree with Mike here.

Maybe you could ask Archie or Ambrisko to clarify the feature
you're trying to replace, and then ask Mike about the code
needed to do that?

-- Terry
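
The rotor idea above can be sketched as follows, in Python for readability. This assumes a single-space-separated list on the first line of the file; the helper names `rotate_boot_list` and `rewrite_in_place` are made up for illustration and are not the ClickArray or Whistle code.

```python
def rotate_boot_list(line: str) -> str:
    """Move the first entry to the end of the list."""
    entries = line.split()
    return " ".join(entries[1:] + entries[:1])


def rewrite_in_place(path, new_line: str) -> None:
    """Overwrite the first line without growing or shrinking the file."""
    with open(path, "r+b") as f:
        old = f.readline().rstrip(b"\n")
        data = new_line.encode()
        if len(data) != len(old):
            raise ValueError("replacement must have the same length")
        f.seek(0)
        f.write(data)
```

Because rotation only reorders the same entries, the rewritten line always has the same length as the one it replaces, which is the property the equal-length write depends on.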

tlam...@mindspring.com

May 9, 2002, 18:10:46
Gordon Tetlow wrote:
> > I still think you're not thinking the processes associated with this
> > feature through carefully enough.
>
> Very possible. This was just a first cut of the feature and I'll be the
> first to admit that it's not pretty. I don't know forth so I was happy
> to get as far as I did.
>
> There isn't a notion of a recovery boot in this implementation. It either
> tries the new options (specified in /boot/nextboot.conf) or it doesn't and
> sticks to the defaults.

You need to listen to Mike.

The primary reason this code was originally written was to permit
field upgrades by having a "ping-pong" boot.

If the boot of the newly installed system failed a number of times
in a row, then it fell back to using the old "root" device, which
contained the previous revision.

The point of the exercise is to not turn a working machine into
a doorstop with a field upgrade that fails, or happens to be a
bad and/or hacked load.

What problem are you trying to solve? If it isn't the one that
the code was originally intended to solve, then it must be some
other problem that we just aren't seeing?


Thanks,
-- Terry

mi...@freebsd.org

May 9, 2002, 18:15:09
Terry Lambert [tlam...@mindspring.com] wrote :

> Michael Smith wrote:
> > I still think you're not thinking the processes associated with this
> > feature through carefully enough.
>
> Listen to Mike; he is the loader guru.
>
> I don't know how the file I/O is done for the "YES/NO" change,
> since I have to have a couple of browsers open to read FORTH
> code. 8-(.
>
> However, the prototype I worked on at ClickArray with James
> (both guys who eventually donated code in that area were
> ClickArray folks) assumed that it would rewrite the file with
> equal length contents, as being the only safe way to do a
> write from the FORTH code.

This is the same code.

[ ... failover code design ... ]

> Maybe you could ask Archie or Ambrisko to clarify the feature
> you're trying to replace, and then ask Mike about the code
> needed to do that?

Gordon is working on rewriting nextboot(8) so that it works again. I
shouldn't have to tell you to RTFM, Terry. ;P

--
Jonathan Mini <mi...@freebsd.org>
http://www.haikugeek.com

"He who is not aware of his ignorance will be only misled by his knowledge."
-- Richard Whatley

jul...@elischer.org

May 10, 2002, 12:19:26
Gordon Tetlow wrote:
>
> On Thu, 9 May 2002, Michael Smith wrote:
>
> > > I've finally learned enough forth to put together a diff to implement some
> > > nextboot functionality in the loader.
> > >
> > > Basically, the loader peeks into the first line of /boot/nextboot.conf to
> > > see if nextboot_enable="YES" is there. If it is, it reads the entire
> > > config, then rewrites the first line to nextboot_enable="NO"
> >
> > Don't do this. Put the variables directly into loader.conf. There's no
> > need for another file.
>
> There needs to be another file regardless. How else am I going to know
> which are supposed to be used on the next boot and which are normal values
> that the user has put in /boot/loader.conf?

I wrote the original 'nextboot' to use block 1 (usually unused)
to avoid, under all circumstances, writing into the filesystem.

Also, part of the weakness of the current system is that it presumes you know
which IS the root filesystem. The original nextboot took, as part of the
information it loaded from block 1 (assuming it checked out as a boot-spec
block), the partition to use as the root. If the root partition is totally hosed
you may not be able to READ /boot/{anything}. The original nextboot
was really a local hack to fix a local problem, but I was thinking of making it
more 'acceptable' to the world as a whole by making it look for a DOS partition
of some type, and {length 1,location 1} before loading information.

I deliberately kept this information outside the filesystem.
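
As an illustration only, checking that a raw block "checks out as a boot-spec block" before trusting it might look like the sketch below. The block layout, the `MAGIC` value, and the function name are all invented for this example; the original nextboot format is not described in this thread.

```python
import struct

BLOCK_SIZE = 512
MAGIC = 0x4E424F54  # made-up value, for illustration only


def parse_boot_spec(block: bytes):
    """Return the root spec stored in the block, or None if it is not ours."""
    if len(block) != BLOCK_SIZE:
        return None
    # Hypothetical layout: 4-byte magic, 2-byte length, then the spec text.
    magic, length = struct.unpack_from("<IH", block, 0)
    if magic != MAGIC or length > BLOCK_SIZE - 6:
        return None  # block 1 is used for something else: ignore it
    return block[6:6 + length].decode("ascii", errors="replace")
```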

>
> -gordon

--
+------------------------------------+ ______ _ __
| __--_|\ Julian Elischer | \ U \/ / hard at work in
| / \ jul...@elischer.org +------>x USA \ a very strange
| ( OZ ) \___ ___ | country !
+- X_.---._/ presently in San Francisco \_/ \\
v

tlam...@mindspring.com

May 10, 2002, 13:03:43
Julian Elischer wrote:
> I wrote the original 'nextboot' to use block 1 (usually unused)
> to avoid, under all circumstances, writing into the filesystem.
>
> Also, part of the weakness of the current system is that it presumes you know
> which IS the root filesystem. The original nextboot took, as part of the
> information it loaded from block 1 (assuming it checked out as a boot-spec
> block), the partition to use as the root. If the root partition is totally hosed
> you may not be able to READ /boot/{anything}. The original nextboot
> was really a local hack to fix a local problem, but I was thinking of making it
> more 'acceptable' to the world as a whole by making it look for a DOS partition
> of some type, and {length 1,location 1} before loading information.
>
> I deliberately kept this information outside the filesystem.

If you always write the blocks back to where you got them,
there's no real difference for a small file. Adjacent disk
blocks are adjacent disk blocks, and it doesn't matter how
they are located (hard coded constant value, or math on
hard coded constant values).

In general, what you want is a read-only boot FS that has
only the boot code and the loader stuff in it. The FORTH
fwrite code (if Jon Mini and James really did use my code
for part of it) ignores readability/writeability not enforced
by hardware.

The *only* thing you want to be writing is those adjacent
blocks, which are associated with a file more as a convenience
than anything else.

The "nextboot" command itself needs to blow the file contents
directly; unfortunately, now that we do not have two devices
that reference the disk (one for mounting, one for blowing file
contents directly, as in this case), this means mounting the
boot FS read/write, blowing the file contents, and then mounting
it read-only again.

The (length l, location l) idea is the same thing LILO does; it
is much better to simply compute the location off constant FS
information, which is going to be there (and constant) anyway.

-- Terry

j...@freebsd.org

May 10, 2002, 14:15:43

On 10-May-2002 Julian Elischer wrote:
> You also had to have:
> 1/ a way of setting the boot specification list from the running system.
> 2/ a simple and unlikely-to-break method of ensuring that if the boot did NOT
> succeed, it did something DIFFERENT next time.
> 3/ the ability to read the specification information regardless of the state
> of the first filesystem (e.g. completely trashed).

If / is trashed, you can't load a kernel or loader from it anyways.

> 4/ The ability to specify a filesystem on another planet^H^H^H^H^H^Hdisk.

Something you've missed in this version of nextboot is:

5/ work on more than just i386

> My decisions were:
> A) make boot0 do the actual load of the spec from block1 immediately after it
> had read block0. All the registers were set up correctly to read the next
> block.
> block1 is almost always unused, and if it was used it wouldn't have the
> correct magic numbers and would be ignored.

This only works on i386.

--

John Baldwin <j...@FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/

tlam...@mindspring.com

May 10, 2002, 16:43:37
John Baldwin wrote:
> On 10-May-2002 Julian Elischer wrote:
> > You also had to have:

[ ... ]

> > 4/ The ability to specify a filesystem on another planet^H^H^H^H^H^Hdisk.
>
> Something you've missed in this version of nextboot is:
>
> 5/ work on more than just i386

[ ... ]

> This only works on i386.

Now is when I point out that the original nextboot predates the ELF
format conversion, as well as the new FORTH based loader code...
which predates running on anything other than i386 anyway (unless
you count my Motorola Powerstack port, or Vogel's SPARC port,
back before the 4.4-Lite integration).

8-) 8-) 8-).

-- Terry

gor...@gnf.org

May 10, 2002, 17:53:17
Is there anything that is wrong with the conceptual implementation of the
nextboot loader code that I've submitted? It definitely needs a code
cleanup on the forth side (which I'm not qualified to do), but if there
are no other objections, I'd really like to see this code committed.

-gordon

tlam...@mindspring.com

May 10, 2002, 18:10:03
Gordon Tetlow wrote:
> Is there anything that is wrong with the conceptual implementation of the
> nextboot loader code that I've submitted? It definitely needs a code
> cleanup on the forth side (which I'm not qualified to do), but if there
> are no other objections, I'd really like to see this code committed.

There should be a list, so that in a brown-out or whatever, you
don't end up toggling back to the previous version accidentally.

You should only ever rewrite the contents of a single file, and
it shouldn't be an important file.

The existence/non-existence of the single file should be enough
to trigger/suppress the nextboot behaviour.

Don't assume that the nextboot file will be on the same disk and/or
partition as the boot and other config file code.

--

Together, these things will allow the new code to solve the same
problem that the old code solved on the InterJet.

-- Terry

gor...@gnf.org

May 10, 2002, 21:53:43
On Fri, 10 May 2002, Terry Lambert wrote:

> Gordon Tetlow wrote:
> > Is there anything that is wrong with the conceptual implementation of the
> > nextboot loader code that I've submitted? It definitely needs a code
> > cleanup on the forth side (which I'm not qualified to do), but if there
> > are no other objections, I'd really like to see this code committed.
>
> There should be a list, so that in a brown-out or whatever, you
> don't end up toggling back to the previous version accidentally.

This is not something that is meant for you to massage which root
partition you are going to boot up off of.

> You should only ever rewrite the contents of a single file, and
> it shouldn't be an important file.

Yes, that's exactly what my patch does.

> The existence/non-existence of the single file should be enough
> to trigger/suppress the nextboot behaviour.

I can't unlink files in the loader, so the presence of such a file
wouldn't help.

> Don't assume that the nextboot file will be on the same disk and/or
> partition as the boot and other config file code.

Well, I'm assuming it's on the root partition. It would be kinda silly for
it to be anywhere else.

> Together, these things will allow the new code to solve the same
> problem that the old code solved on the InterJet.

I've never heard nor seen the old code. I don't know what it did, and I
don't particularly care. I did this because I thought the way Wes Peters
did his implementation was rather hackish (not saying mine is any better
=) and suboptimal if the machine doesn't make it to multi-user. Please
refer to the commit logs from earlier this month if you don't know of the
commit I'm referring to.

-gordon

tlam...@mindspring.com

May 11, 2002, 03:58:03
Gordon Tetlow wrote:

[ ... ]

You *did* ask for comments...


> > There should be a list, so that in a brown-out or whatever, you
> > don't end up toggling back to the previous version accidentally.
>
> This is not something that is meant for you to massage which root
> partition you are going to boot up off of.

I don't understand what it does, then. The original Whistle code
was intended to attempt to boot 3 times from one partition, and
then 3 times from another.

If a boot was successful, then in the last rc file before the getty's
were started, it reset the list to 3 times the current root and 3
times the alternate root.

That way, on each success, the counter was reset, so in general, a
given root was sticky.

When the failure occurred, then the alternate root was the one whose
rc files ran, and it became the sticky one.

Worst case, you could power cycle a box three times quickly to force
a switch back to an older version.
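
The try-three-times-then-fall-back behaviour described above can be modelled roughly as below. This is a Python sketch under stated assumptions: the list is consumed by rotating the head entry to the tail on each boot attempt, and the function names are hypothetical rather than taken from the Whistle code.

```python
from typing import List, Tuple


def make_boot_list(first: str, second: str, tries: int = 3) -> List[str]:
    """Build a list such as [first] * 3 + [second] * 3."""
    return [first] * tries + [second] * tries


def pick_root(boot_list: List[str]) -> Tuple[str, List[str]]:
    """At boot: take the head entry and rotate it to the tail."""
    root = boot_list[0]
    return root, boot_list[1:] + [root]


def on_successful_boot(current: str, alternate: str, tries: int = 3) -> List[str]:
    """Run from the last rc step: make the current root sticky again."""
    return make_boot_list(current, alternate, tries)
```

If the boot never reaches the rc step that calls `on_successful_boot`, the list keeps rotating, so after three failed attempts the alternate root is the one that gets tried.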

The general failure case is not an indefinite hang, but a reset before
the rc file runs. This is particularly true when you have a hardware
watchdog, where the first thing that happens is the watchdog is set.

Note that images are tested before they are shipped, so the worst
case failure is "out of memory" or some other installation failure
related problem, and not a kernel problem, anyway.

I've personally had to solve this same problem several times now.


> > You should only ever rewrite the contents of a single file, and
> > it shouldn't be an important file.
>
> Yes, that's exactly what my patch does.

I don't understand the "YES"/"NO" thing, then. There is one byte
difference in the file length, which I don't think can be properly
accounted, if you do the "YES"/"NO" thing.


> > The existence/non-existence of the single file should be enough
> > to trigger/suppress the nextboot behaviour.
>
> I can't unlink files in the loader, so the presence of such a file
> wouldn't help.

The file is the nextboot.conf file. And unlinking it is not something
which you want to do, actually. I think we are misunderstanding each
other's intent here.


> > Don't assume that the nextboot file will be on the same disk and/or
> > partition as the boot and other config file code.
>
> Well, I'm assuming it's on the root partition. It would be kinda silly for
> it to be anywhere else.

Not really. Consider that if I switch root partitions, then, by
definition, I switch nextboot files.

Basically, the InterJet was laid out:

boot code (including nextboot list)
/ #1 <- version X of the system (read only)
/ #2 <- version Y of the system (read only)
swap
/var <- log files and /tmp
/data <- user data (config, user files, etc.)

The fstab's on #1 and #2 were opposite, so that you could mount and
overwrite the contents with a new release of the software.

An upgrade was:

mount opposite "root"
unpack new system image onto opposite root
set up opposite root fstab
sync
unmount
nextboot "opposite opposite opposite this this this"
reboot

Each revision had data management upgrade/downgrade scripts; these
were written to /data, so that opposite versions could downgrade.


> > Together, these things will allow the new code to solve the same
> > problem that the old code solved on the InterJet.
>
> I've never heard nor seen the old code. I don't know what it did, and I
> don't particularly care. I did this because I thought the way Wes Peters
> did his implementation was rather hackish (not saying mine is any better
> =) and suboptimal if the machine doesn't make it to multi-user. Please
> refer to the commit logs from earlier this month if you don't know of the
> commit I'm referring to.

I do. He committed some, but not all, of the code that Jon Mini
and James wrote (Jon says some of it was based on code I wrote).
The design I did at ClickArray was based on the Whistle design
from when I worked at Whistle with Julian and Archie.

The ClickArray code, if it was intended to solve the same problem as
the code it was supposedly derived from, is for solving the remote
upgrade problem, with no local removable media that can be used to
recover from a catastrophic failure (the only recovery from such a
failure is a fallback to a working previous revision, per the InterJet).

The code you are talking about seems limited to replacing only the
kernel. Frankly, that's recoverable via the serial console, if
you put the "-p" in the right file in /.

This isn't really sufficient for any embedded system that needs to
get at netstat, ps, or other data which involves examination of
kernel structures, which may change between kernel versions. You
pretty much have to have two system images to solve that problem,
or you'll find yourself incredibly screwed, when the web UI, the
CLI, SNMP, and the front panel LCD all start reporting random bogus
data. 8-(.

I'm not trying to dump on your code; I'm just saying that it's
not solving the problem that the original code was added to be
able to solve, and that the original nextboot itself was intended
to resolve.

You asked for comments ...those are mine.

-- Terry

j...@freebsd.org

May 11, 2002, 11:23:38

On 11-May-2002 Terry Lambert wrote:
> I don't understand the "YES"/"NO" thing, then. There is one byte
> difference in the file length, which I don't think can be properly
> accounted, if you do the "YES"/"NO" thing.

He could make it NES for all it matters.

Terry, please see that Gordon isn't trying to reimplement Whistle's nextboot.

Personally, I'm tired of missing the window in the loader to boot a test
kernel, so what I want is to do 'nextboot foo -s' to boot /boot/foo/kernel
into single user mode on the next boot and fall back to /boot/kernel/kernel
on the next boot after that.
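
For illustration, the one-shot file that a command like 'nextboot foo -s' would need to leave behind could be generated along these lines. Only `nextboot_enable` comes from this thread; the `kernel` and `kernel_options` variable names and the `nextboot_conf` helper are assumptions made for this Python sketch.

```python
def nextboot_conf(kernel: str, options: str = "") -> str:
    """Render a one-shot config for the given kernel directory name."""
    # The enable flag goes first so the loader only has to peek at line one.
    lines = ['nextboot_enable="YES"', f'kernel="{kernel}"']
    if options:
        lines.append(f'kernel_options="{options}"')
    return "\n".join(lines) + "\n"
```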

>> > Don't assume that the nextboot file will be on the same disk and/or
>> > partition as the boot and other config file code.
>>
>> Well, I'm assuming it's on the root partition. It would be kinda silly for
>> it to be anywhere else.
>
> Not really. Consider that if I switch root partitions, then, by
> definition, I switch nextboot files.
>
> Basically, the InterJet was laid out:

Repeat after me: This nextboot != InterJet nextboot.

--

John Baldwin <j...@FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/


tlam...@mindspring.com

May 11, 2002, 12:35:32
John Baldwin wrote:
> Terry, please see that Gordon isn't trying to reimplement Whistle's nextboot.
>
> Personally, I'm tired of missing the window in the loader to boot a test
> kernel, so what I want is to do 'nextboot foo -s' to boot /boot/foo/kernel
> into single user mode on the next boot and fall back to /boot/kernel/kernel
> on the next boot after that.

[ ... ]

> Repeat after me: This nextboot != InterJet nextboot.

Repeat after Gordon:

Gordon Tetlow writes:
] Is there anything that is wrong with the conceptual implementation of the
] nextboot loader code that I've submitted? It definitely needs a code
] cleanup on the forth side (which I'm not qualified to do), but if there
] are no other objections, I'd really like to see this code committed.

He solicited comments/objections. I've commented/objected.

-- Terry

gor...@gnf.org

May 11, 2002, 23:21:37
On Sat, 11 May 2002, Terry Lambert wrote:

> > This is not something that is meant for you to massage which root
> > partition you are going to boot up off of.
>
> I don't understand what it does, then. The original Whistle code
> was intended to attempt to boot 3 times from one partition, and
> then 3 times from another.

I was thinking different kernel/kernel flags not different root
partitions. You could probably work something up to make it do different
root partitions, but this was sufficient for my needs.

[snip]

> I don't understand the "YES"/"NO" thing, then. There is one byte
> difference in the file length, which I don't think can be properly
> accounted, if you do the "YES"/"NO" thing.

Well, it's actually "YES"/"NO"<space> but the loader is smart enough to
ignore spaces.

[snip]

> The code you are talking about seems limited to replacing only the
> kernel. Frankly, that's recoverable via the serial console, if
> you put the "-p" in the right file in /.

Exactly. That was all this was meant to do. Look at it as a first
implementation. If you would like to take the patch I submitted and do
some more work to have the same functionality as InterJet's code, be my
guest. There would be a lot of work, the first piece being an unbuffered
string searching function in Forth.

> This isn't really sufficient for any embedded system that needs to

This wasn't for embedded systems, this was for developer convenience.

> I'm not trying to dump on your code; I'm just saying that it's
> not solving the problem that the original code was added to be
> able to solve, and that the original nextboot itself was intended
> to resolve.

Yup, I know that.

Let's lay this discussion to rest and see what we need to do to get the
code committed and used.

-gordon
