
Defrag in linux? - Newbie question


mac

Sep 27, 2001, 9:30:07 PM
Do I need to defrag the HD in Linux? If yes, how?

Mac

john

Sep 27, 2001, 9:28:20 PM
mac wrote:

> Do I need to defrag the HD in Linux? If yes, how?
>
> Mac
>

No need to in Linux... that is a relic of the FAT hack that passes for a file system

Yvan Loranger

Sep 27, 2001, 6:21:25 AM
mac wrote:
> Do I need to defrag the HD in Linux? If yes, how?

No! The ext2 filesystem is resistant to fragmentation.

--
Merci...........Yvan Why don't people understand when
I say my automobile has 100 Megametres on it?

Soenke Petersen

Sep 28, 2001, 2:18:13 AM
mac wrote:
>
> Do I need to defrag the HD in Linux? If yes, how?
>
> Mac

The command to check and repair a linux filesystem is

fsck

Are you looking for this?

Adrian 'Dagurashibanipal' von Bidder

Sep 28, 2001, 2:29:02 AM
Behold! For mac declaimed:

> Do I need to defrag the HD in Linux? If yes, how?

There is an ext2defrag (or e2defrag?) tool. But basically, you don't need
to defrag, since the file system code is much more clever than the FAT one
and tries to discourage fragmentation. You'll basically get low
fragmentation (2-5% of the files or so), and it will remain roughly constant
except when the partition gets very full and you do lots of file
creations/deletions with extremely odd file sizes. I mean: you could
probably force fragmentation up if you tried, but it'll never happen in
everyday use.

-- vbi

philo

Sep 28, 2001, 3:11:14 AM
actually...
upon bootup
your system will fsck if needed...
you may see this every so often


Skylar Thompson

Sep 27, 2001, 9:13:58 PM
On Thu, 27 Sep 2001 21:30:07 -0400, mac <a...@aaa.com> wrote:
>Do I need to defrag the HD in Linux? If yes, how?

Nope. Ext2 is designed not to be affected by fragmentation until it is almost full. Fragmentation only
happens on filesystems that were designed for floppy disks but are deployed on devices billions of times
larger than their expected maximum size, like Microsloth's beloved FAT32.

--
-- Skylar Thompson (sky...@attglobal.net)

P(4.2.2) + "Skylar DXLIX" DMPo L:36 DL:2500' A++ R+++ Sp w:Stormbringer
A(JLE)*/P*/Z/J64/Ad L/O H+ D+ c f-/f PV+ s TT- d++/d+ P++ M/M+
C- S++ I+/I++ So B+ ac GHB++ SQ++ RQ+ V+ F:JLE F: Possessors strong again

cbbr...@acm.org

Sep 28, 2001, 7:20:37 AM
Soenke Petersen <soenke....@bln.sbs.de> writes:
> mac wrote:
> >
> > Do I need to defrag the HD in Linux? If yes, how?
>
> The command to check and repair a linux filesystem is
> fsck
> Are you looking for this?

That doesn't _reorganize_ data on the filesystem, so it wouldn't make
much sense for that to be the case.
--
(reverse (concatenate 'string "moc.enworbbc@" "enworbbc"))
http://www.cbbrowne.com/info/defrag.html
Minds, like parachutes, only function when they are open.

cbbr...@acm.org

Sep 28, 2001, 7:29:24 AM
sky...@utumno.attglobal.net (Skylar Thompson) writes:
> On Thu, 27 Sep 2001 21:30:07 -0400, mac <a...@aaa.com> wrote:
> >Do I need to defrag the HD in Linux? If yes, how?

> Nope. Ext2 is designed not to be affected by fragmentation until it
> is almost full. Fragmentation only happens on filesystems designed
> for floppy disks that are implemented on billions of times larger
> than its maximum expected size, like Microsloth's beloved FAT32.

Definitely not quite true:

- DBAs have to run defragmenters ("reorganizations") on Oracle
databases;

- VMS was noted for requiring defragging on its filesystems.

It's fair to say that ext2 isn't dramatically affected by
fragmentation until it is very full, and _any_ filesystem will hit
some pathological conditions when it gets really full.

There is a "defrag" tool for ext2, but it is seldom used.
--
(concatenate 'string "cbbrowne" "@cbbrowne.com")
http://www.cbbrowne.com/info/defrag.html
Rules of the Evil Overlord #212. "I will not send out battalions
composed wholly of robots or skeletons against heroes who have qualms
about killing living beings. <http://www.eviloverlord.com/>

Tony Lawrence

Sep 28, 2001, 7:53:35 AM


I just started reading Moshe Bar's "Linux File Systems". On page 68, he says
(about ext2 fs):

"However, contiguous file system block allocation will eventually end up with
file blocks fragmented across the file system, and file system access will
eventualy revert back to a random nature".

--
Tony Lawrence (to...@aplawrence.com)
SCO/Linux articles, help, book reviews, tests,
job listings and more : http://www.pcunix.com

Adrian 'Dagurashibanipal' von Bidder

Sep 28, 2001, 9:05:30 AM
Behold! For Tony Lawrence declaimed:

> Adrian 'Dagurashibanipal' von Bidder wrote:
>>
>> Behold! For mac declaimed:
>>
>> > Do I need to defrag the HD in Linux? If yes, how?
>>
>> There is an ext2defrag (or e2defrag?) tool. But basically, you don't
>> need to defrag since the file system code is much more clever than the
>> FAT one and tries to discourage fragmentation. You'll basically get a
>> low fragmentation (2-5% of the files or so), but this will remain
>> constant except when the partition gets very full and you do log of
>> file creations/deletions with extremely funny file sizes. I mean: you
>> could probably force fragmentation up if you tried, but it'll never
>> happen in everyday use.
>
>
> I just started reading Moshe Bar's "Linux File Systems". On page 68, he
> says (about ext2 fs):
>
> "However, contiguous file system block allocation will eventually end up
> with file blocks fragmented across the file system, and file system
> access will eventualy revert back to a random nature".

Agreed. The question is, when is 'eventually'. If the fs is, at one time,
quite full and you delete random files (randomly placed, that is), you
will of course get high fragmentation. Every filesystem is affected by
this kind of fragmentation (ok, there's bound to be exceptions, but anyway);
the question is: how bad, and how soon.

-- vbi

Robert Heller

Sep 28, 2001, 6:44:13 AM
mac <a...@aaa.com>,
In a message on Thu, 27 Sep 2001 21:30:07 -0400, wrote :

m> Do I need to defrag the HD in Linux? If yes, how?

No. Fragmentation is only a problem with the (bogus) FAT file system
(used by MS-DOS/MS-Windows).

m>
m> Mac
m>



--
\/
Robert Heller ||InterNet: hel...@cs.umass.edu
http://vis-www.cs.umass.edu/~heller || hel...@deepsoft.com
http://www.deepsoft.com /\FidoNet: 1:321/153

Otto Wyss

Sep 28, 2001, 4:40:58 PM
> Do I need to defrag the HD in Linux? If yes, how?
>
Well, first run the fsck program ("fsck -fnv /dev/...") and see how
fragmented your partitions really are. I just did it on mine and
discovered

/dev/hda3 2.4%
/dev/hda7 18.8%
/dev/hda8 24.5%

This shows clearly that, depending on the kind of usage, significant
fragmentation is possible. On hda7 and hda8 there are always lots of
files created and deleted, so it's no wonder they are rather heavily
fragmented.

I'm just now running defrag on hda8 to see what the outcome is. So far I
can only say that defragmenting takes an awful lot of time. I'd say don't
use defrag below 10%.
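
If you want to script that check rather than eyeball it, something like the
small Python sketch below would do. The device name is only an example, the
filesystem should be unmounted (or mounted read-only) and you need root, and
the exact wording of the e2fsck summary line is assumed here rather than
guaranteed across e2fsprogs versions.

import re
import subprocess

def noncontig_percent(device):
    """Run a read-only e2fsck pass and scrape the '% non-contiguous' figure."""
    out = subprocess.run(["e2fsck", "-fnv", device],
                         capture_output=True, text=True).stdout
    match = re.search(r"\(([\d.]+)% non-contiguous\)", out)
    return float(match.group(1)) if match else None

# Example (hypothetical device):
print(noncontig_percent("/dev/hda7"))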

O. Wyss

Lew Pitcher

Sep 28, 2001, 7:17:25 PM
mac wrote:
>
> Do I need to defrag the HD in Linux? If yes, how?
>

Short answer: No.

Long answer: see below

In a single-user, single-tasking OS, it's best to keep all blocks for a
file together, because _most_ of the disk accesses over a given period
of time will be against a single file. In this scenario, the read-write
heads of your HD advance sequentially through the hard disk. In the same
sort of system, if your file is fragmented, the read-write heads jump
all over the place, adding seek time to the hard disk access time.

In a multi-user, multi-tasking, multi-threaded OS, many files are being
accessed at any time, and, if left unregulated, the disk read-write
heads would jump all over the place all the time. Even with
'defragmented' files, there would be as much seek-time delay as there
would be with a single-user single-tasking OS and fragmented files.

Fortunately, multi-user, multi-tasking, multi-threaded OSs are usually
built smarter than that. Since file access is multiplexed from the point
of view of the device (multiple file accesses from multiple, unrelated
processes, with no order imposed on the sequence of blocks requested),
the device driver reorders the requests into something sensible for the
device (i.e. the elevator algorithm).

In other words, fragmentation is a concern when one (and only one)
process accesses data from one (and only one) file. When more than one
file is involved, the disk addresses being requested are 'fragmented'
wrt the sequence in which the driver has to service them, and thus it
doesn't matter to the device driver whether or not a file was
fragmented.

To illustrate:

I have two programs executing simultaneously, each reading two different files.

The files are organized sequentially (unfragmented) on disk...
[1.1][1.2][1.3][2.1][2.2][2.3][3.1][3.2][3.3][4.1][4.2][4.3][4.4]

Program 1 reads file 1, block 1
                file 1, block 2
                file 2, block 1
                file 2, block 2
                file 2, block 3
                file 1, block 3

Program 2 reads file 3, block 1
                file 4, block 1
                file 3, block 2
                file 4, block 2
                file 3, block 3
                file 4, block 4

The OS scheduler causes the programs to be scheduled and executed such
that the device driver receives requests
        file 3, block 1
        file 1, block 1
        file 4, block 1
        file 1, block 2
        file 3, block 2
        file 2, block 1
        file 4, block 2
        file 2, block 2
        file 3, block 3
        file 2, block 3
        file 4, block 4
        file 1, block 3

Graphically, this looks like...

[1.1][1.2][1.3][2.1][2.2][2.3][3.1][3.2][3.3][4.1][4.2][4.3][4.4]
}------------------------------>[3.1]
[1.1]<-----------------------'
`-------------------------------------->[4.1]
[1.2]<---------------------------------'
`----------------------->[3.2]
[2.1]<-------------'
`---------------------------->[4.2]
[2.2]<-----------------------'
`------------->[3.3]
[2.3]<-------'
`---------------------------->[4.4]
[1.3]<------------------------------------------'

As you can see, the accesses are already 'fragmented' and we haven't
even reached the disk yet. I have to stress this: the above situation is
_no different_ from an MSDOS single-file access against a fragmented
file.

So, how do we minimize the effect seen above? If you are MSDOS, you
reorder the blocks on disk to match the (presumed) order in which they
will be requested. OTOH, if you are Linux, you reorder the _requests_
into a regular sequence that minimizes disk access. You also buffer most
of the data in memory, and you only write dirty blocks. In other words,
you minimize the effect of 'disk file fragmentation' as part of the
other optimizations you perform on the _access requests_ before you
execute them.

Now, this is not to say that 'disk file fragmentation' is a good thing.
It's just that 'disk file fragmentation' doesn't have the *impact* here
that it would have in MSDOS-based systems. The performance difference
between a 'disk file fragmented' Linux file system and a 'disk file
unfragmented' Linux file system is minimal to none, where the same
performance difference under MSDOS would be huge.

Under the right circumstances, fragmentation is a neutral thing, neither
bad nor good. As to defragging a Linux filesystem (ext2fs), there are
tools available, but (because of the design of the system) these tools
are rarely (if ever) needed or used. That's the impact of designing up
front the multi-processing/multi-tasking multi-user capacity of the OS
into its facilities, rather than tacking multi-processing/multi-tasking
multi-user support on to an inherently single-processing/single-tasking
single-user system.
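
To put some (made-up) numbers behind this: the little Python toy below
replays the interleaved request stream from the illustration against a
contiguous layout and a deliberately scattered one, with and without a
crude per-batch "elevator" sort. The block addresses and the batch size
are invented purely for illustration; it is a sketch of the idea, not
actual kernel code.

def head_travel(blocks):
    """Total distance the head moves servicing blocks in the given order."""
    pos, travel = 0, 0
    for b in blocks:
        travel += abs(b - pos)
        pos = b
    return travel

# Disk addresses for each (file, block) pair under two layouts.
contiguous = {                        # [1.1][1.2][1.3][2.1][2.2][2.3]...
    (1, 1): 0,  (1, 2): 1,  (1, 3): 2,
    (2, 1): 3,  (2, 2): 4,  (2, 3): 5,
    (3, 1): 6,  (3, 2): 7,  (3, 3): 8,
    (4, 1): 9,  (4, 2): 10, (4, 3): 11, (4, 4): 12,
}
# The same blocks scattered around the disk ("fragmented").
scattered = dict(zip(contiguous,
                     [5, 40, 11, 33, 2, 27, 19, 0, 36, 8, 24, 14, 31]))

# The interleaved request stream the device driver sees (from the example).
requests = [(3, 1), (1, 1), (4, 1), (1, 2), (3, 2), (2, 1),
            (4, 2), (2, 2), (3, 3), (2, 3), (4, 4), (1, 3)]

for name, layout in (("contiguous", contiguous), ("fragmented", scattered)):
    arrival = [layout[r] for r in requests]
    # Crude elevator: sort each batch of 4 outstanding requests by address.
    elevator = [a for i in range(0, len(arrival), 4)
                  for a in sorted(arrival[i:i + 4])]
    print(name, "- arrival order:", head_travel(arrival),
          " elevator order:", head_travel(elevator))

Run it and the head travel drops noticeably once the requests are sorted,
for both layouts; whatever gap remains between the contiguous and the
scattered layout is what the on-disk arrangement still costs after the
reordering.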

--
Lew Pitcher

Master Codewright and JOAT-in-training
Registered Linux User #112576

Juha Siltala

Sep 29, 2001, 2:27:31 AM

This is great Lew! I've been faced with this question many times but have
only been able to bumble something like 'it's no problem with linux, don't
worry about it.'

You should submit this to Linux Gazette, where this kind of information in
such easily accessible language is always greatly valued!
--

\Juha

john

Sep 29, 2001, 2:36:51 AM
Juha Siltala wrote:

>> design of the system) these tools are rarely (if ever) needed or used.
>> That's the impact
>> of designing up front the multi-processing/multi-tasking multi-user
>> capacity of the OS into
>> it's facilities, rather than tacking multi-processing/multi-tasking
>> multi-user support on
>> to an inherently single-processing/single-tasking single-user system.
>>
>> --
>> Lew Pitcher
>>
>> Master Codewright and JOAT-in-training
>> Registered Linux User #112576
>


Yes, Lew, but you can't argue with the fact that the FAT file system was a
hack from the start and poorly designed

Paul Colquhoun

Sep 29, 2001, 3:40:01 AM

There are a few other factors as well.

Linux/Unix also does "readahead". When you open a file and start reading
it, the OS will generally read *more* of the file than you ask for, so
the next reads will come from the file buffers.

Also file systems like UFS & EXT2 have some anti-fragmentation features.

Each filesystem (disk partition) is divided internally into sections
called "cylinder groups". The name comes from the layout of old disk
drives, but it doesn't correspond to modern drives, especially RAID
units, etc.

When a new directory is created, it is placed in the least-full cylinder
group. This spreads the usage out over the full partition.

New files are allocated to the same group as their directory (unless it
is full, etc).

Spreading the files out over the partitions does slow down access
slightly (the heads have to cross more tracks, on average), but
individual blocks of a file are likely to be close together. Combined
with readahead (and the disk access re-ordering Lew described above) this
more than makes up for the increase in seek times.

Even a badly fragmented file is normally still located in one cylinder
group, so defragmenting it would not be a large improvement. Only large
files are generally split between two (or more) groups, but (assuming
sequential writes) each group will have a large, sequential piece of
the file, and only the jumps between the pieces will be affected.

Large filesystems can have quite a few cylinder groups. I have a 9.8 Gb
filesystem with 76 groups, each group is 128 Mb.
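
A very stripped-down Python sketch of that allocation policy (illustration
only, with made-up sizes; the real ext2/UFS allocators are considerably
more involved): new directories land in the least-full group, and ordinary
files follow their directory's group unless it is full.

class CylinderGroup:
    def __init__(self, gid, capacity_blocks):
        self.gid = gid
        self.capacity = capacity_blocks
        self.used = 0

    def free(self):
        return self.capacity - self.used

def group_for_new_directory(groups):
    """A new directory goes to the group with the most free space."""
    return max(groups, key=CylinderGroup.free)

def allocate_file(groups, parent_group, blocks_needed):
    """A file follows its directory's group, spilling over only when full."""
    target = parent_group
    if target.free() < blocks_needed:
        target = group_for_new_directory(groups)  # fall back to emptiest group
    target.used += blocks_needed
    return target

# Example: 76 groups of 128 MB (32768 blocks of 4 KB each), roughly the
# 9.8 GB filesystem mentioned above.
groups = [CylinderGroup(g, capacity_blocks=32768) for g in range(76)]
home = group_for_new_directory(groups)
for size in (300, 12000, 50):
    where = allocate_file(groups, home, size)
    print("file of", size, "blocks went to group", where.gid)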


--
Reverend Paul Colquhoun, ULC. http://andor.dropbear.id.au/~paulcol
Asking for technical help in newsgroups? Read this first:
http://www.tuxedo.org/~esr/faqs/smart-questions.html

Adrian 'Dagurashibanipal' von Bidder

Sep 29, 2001, 3:43:10 AM
Behold! For Juha Siltala declaimed:
[...]
>> Long answer: see below
[long answer snipped]


> This is great Lew! I've been faced with this question many times but
> have only been able to bumble something like 'it's no problem with
> linux, don't worry about it.'
>
> You should submit this to Linux Gazette, where this kind of information
> in such easily accessable language is always greatly valued!

Seconded. This should definitely be Officially Archived.

-- vbi

Otto Wyss

Sep 29, 2001, 5:05:50 AM
> /dev/hda7 18.8%
> /dev/hda8 24.5%
>
Applying defrag without any problems gives

/dev/hda7 1.3%
/dev/hda8 3.3%

Since I have no benchmark figures I can only say it feels a little more
responsive, but this might just be imagination.

O. Wyss

Jean-David Beyer

Sep 29, 2001, 9:41:24 AM
Another thing to consider is the re-ordering the disk driver does. I do
not know about Linux, but the last version of UNIX I looked at used
something like an elevator algorithm to reschedule the reads (and
writes). So if the driver got a batch of requests, apparently random,
from the applications, it would re-order them to reduce seek times.
(Assume the whole disk is blocks, numbered 0 to Max - 1, in order from
the edge to the center, for example.)

Say the following blocks were requested:

1, 101, 233, 234, 997, 51, 235, 102, ...

Then, if logically possible, it would actually read (assuming nothing in
the buffers and no predictions yet to warrant read-ahead)
1, 51, 101, 102, 233, 234, 235, 997.

Furthermore, once it read 101 and 102, it would infer that 103 might be
needed next and probably read the whole track where 102 was located at
the same time. Similarly when it read 234.

This could be quite interesting to watch in the old days when you could
see the position of the heads on the disk drives. On a lightly loaded
machine, the heads seemed to move around at random, but if 20 people
were compiling large suites of programs at the same time, the heads
moved smoothly from the outside edge to the axis and back most of the
time. Recall that moving the heads is the slowest action a machine can
do, taking over a millisecond, sometimes MUCH MORE than a millisecond.
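
For what it's worth, here is a toy Python rendering of those two ideas
(sorting one batch of outstanding requests into ascending order, then
prefetching a few blocks once a sequential pattern shows up), using the
block numbers from the example above. The batch and the read-ahead window
are invented for illustration; a real driver works with tracks and request
queues, not Python lists.

def elevator_order(batch):
    """Service one batch of outstanding requests in ascending block order."""
    return sorted(batch)

def with_readahead(ordered, window=3):
    """After two consecutive blocks, speculatively queue the next few."""
    satisfied, schedule = set(), []
    for b in ordered:
        if b in satisfied:
            continue                      # already covered by a prefetch
        schedule.append(b)
        satisfied.add(b)
        if b - 1 in satisfied:            # looks sequential: read ahead
            ahead = [b + i for i in range(1, window + 1)]
            schedule.extend(a for a in ahead if a not in satisfied)
            satisfied.update(ahead)
    return schedule

batch = [1, 101, 233, 234, 997, 51, 235, 102]
ordered = elevator_order(batch)
print("elevator order: ", ordered)
print("with read-ahead:", with_readahead(ordered))
# The separate request for block 235 never generates its own seek: it is
# absorbed by the read-ahead issued right after 233 and 234 were seen.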

--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ Registered Machine 73926.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 9:30am up 9 days, 16:27, 3 users, load average: 3.05, 3.05, 3.04

John Thompson

Sep 29, 2001, 9:06:45 AM
john wrote:

> Yes lew but you cant agrue the fact that the fat file system was a hack
> from the start and poorly designed

Consider what FAT was designed for: 360k floppy disks. For that
it was fine, but over the years it's been cobbled up to work with
HD devices many orders of magnitude larger than a 360k floppy
disk. It shouldn't be any surprise that it has some problems in
that environment. In fact, one might argue that it is impressive
that it works at all on modern HD devices...

--


-John (John.T...@attglobal.net)

cbbr...@acm.org

Sep 29, 2001, 11:12:33 AM
Lew Pitcher <lpit...@sympatico.ca> writes:
> Long answer: see below

_Excellent_ answer.

I would be glad to add that (suitably attributed, naturally) to the
URL below...

Note that Art Kagel made similar comments a couple of years ago, albeit
not with the outline of how the system might reorder block requests to
improve the situation...
--
(concatenate 'string "cbbrowne" "@acm.org")
http://www.cbbrowne.com/info/defrag.html
"We come to bury DOS, not to praise it."
-- Paul Vojta <vo...@math.berkeley.edu>, paraphrasing a quote of
Shakespeare

cbbr...@acm.org

Sep 29, 2001, 11:12:34 AM
john <fa...@phoney.com> writes:
> Juha Siltala wrote:
> >> design of the system) these tools are rarely (if ever) needed or
> >> used. That's the impact of designing up front the
> >> multi-processing/multi-tasking multi-user capacity of the OS into
> >> it's facilities, rather than tacking
> >> multi-processing/multi-tasking multi-user support on to an
> >> inherently single-processing/single-tasking single-user system.

> Yes lew but you cant agrue the fact that the fat file system was a


> hack from the start and poorly designed

It was largely a replica of the CP/M file system, which in turn largely
replicated some DEC file system (whose name I can't think of offhand);
all of these were designed to cope with small floppy disks.

When CP/M was "king," there was no such thing as a 1GB hard drive;
that was the sort of quantity of storage that the biggest "big iron"
mainframe shops probably didn't have because it would be vastly too
expensive.

FAT is fairly acceptable as a storage scheme for small filesystems
fundamentally involving small numbers of files; the problem is that it
suffers when the number of files grows to stupendous numbers as
happens with a Windows install.

I have no objection to using FAT for stuff like floppies, ZIP disks,
and stuff like "SmartMedia" memory cards for digital cameras and MP3
players. It's simple to implement, and its behaviour is pretty well
understood, which is good for these sorts of applications.


--
(concatenate 'string "cbbrowne" "@acm.org")

http://www.ntlug.org/~cbbrowne/rdbms.html

ANT...@zimage.com

Sep 29, 2001, 5:36:07 PM
Good stuff! Someone should put this on /. ;).

> Short answer: No.

> Long answer: see below

> To illustrate:

> Graphically, this looks like...


--
"Individually, ants are stupid. Together, they're brilliant."
--unknown
--
If you are replying to Ant's news post by e-mail, then please kindly
remove ANT in the e-mail addresses listed below. Note the CaSe!
----------------------------------------------------------------------
/\___/\ E-mail: phi...@earthlink.netANT or phi...@apu.eduANT
/ /\ /\ \
| |. .| | The Ant Farm: http://antfarm.home.dhs.org
\ _ /
( ) ICQ UIN: 2223658. Resume: http://ptp-resume.home.dhs.org

Robert Heller

Sep 29, 2001, 8:24:43 PM
john <fa...@phoney.com>,
In a message on Sat, 29 Sep 2001 02:36:51 -0400, wrote :

j> Juha Siltala wrote:
j>
j> >> design of the system) these tools are rarely (if ever) needed or used.
j> >> That's the impact
j> >> of designing up front the multi-processing/multi-tasking multi-user
j> >> capacity of the OS into
j> >> it's facilities, rather than tacking multi-processing/multi-tasking
j> >> multi-user support on
j> >> to an inherently single-processing/single-tasking single-user system.
j> >>
j> >> --
j> >> Lew Pitcher
j> >>
j> >> Master Codewright and JOAT-in-training
j> >> Registered Linux User #112576
j> >
j>
j>
j> Yes lew but you cant agrue the fact that the fat file system was a hack
j> from the start and poorly designed
j>

The FAT file system was designed for *floppy* disks. It is a simple,
*low-overhead* file system. Fragmentation is never a serious issue
with a floppy disk -- floppies don't have enough capacity for
fragmentation to matter. And defragging a floppy is trivial: 'format
B:', 'copy a:*.* b:*.*' is quite effective -- having two floppy drives
of the (or close to the) same capacity is more than cost effective (and
was common in the 'early days' before hard disks were the rule).
Microsoft's 'Blunder #1': not designing a proper file system for the
hard drive and instead just 'extending' the floppy disk file system for
use with hard drives. Yes, the *early* hard drives were small (5meg),
but Microsoft seems to have had no clue as to 'future' technology. Which
is strange, since Mr. Bill is a Trekkie -- he even showed up at a
stockholder meeting in a Star Fleet uniform once.

cbbr...@acm.org

Sep 29, 2001, 9:05:16 PM
Robert Heller <hel...@deepsoft.com> writes:
> Yes the *early* hard drives were small (5meg), but Microsoft seems
> to have no clue as to 'future' technology. Which is strange, since
> Mr. Bill is a Trekie -- he even showed up at a stockholder meeting
> in a Star Fleet uniform once.

Actually, if Mr Bill bases his notions of 'future technology' on Star
Trek, he gets the Hollywood-ized guesses of what might happen someday,
based on what some Hollywood guy saw in a PC store 3 years ago.

A wiser place to go would be to visit MIT, CMU, Stanford, ETH,
UC-Berkeley, and such, actually talking with professorial types that
might know a bit about what's "bleeding edge" in CS right now that
might conceivably be able to be commercialized ten years from now.

If the "forward thinking" comes from Star Trek, then it would make
_perfect sense_ that they have no clue as to 'future technology.'


--
(concatenate 'string "cbbrowne" "@acm.org")

http://www.cbbrowne.com/info/advocacy.html
"Heavy music didn't start in Seattle. It started in Katy, Texas with
King's X" -- Jeff Ament/Pearl Jam

D. Stussy

Sep 29, 2001, 9:21:26 PM

However, he can't really take credit for this. The concept was published in
several books regarding operating systems in the 1980's (if not earlier, such
as in the 1960's by IBM in their hard disk manuals). The topic of disk access
is usually covered in college/university operating systems classes, so most
professionals (at least those with a college degree) already know this stuff.

cbbr...@acm.org

Sep 29, 2001, 9:34:15 PM

He may not be able to take credit for inventing the ideas, but he can
_certainly_ take credit for having written a clear news posting on the
subject.
--
(concatenate 'string "cbbrowne" "@ntlug.org")
http://www.ntlug.org/~cbbrowne/sgml.html
"How can you dream the impossible dream when you can't get any sleep?"
-- Sam Robb

John Thompson

Sep 29, 2001, 10:22:46 PM
Robert Heller wrote:

> The FAT file system was designed for *floppy* disks. It is a simple,
> *low-overhead* file system. Fragmentation is never a serious issue
> with a floppy disk -- floppies don't have enough capacity for
> fragmentation to matter. And defraging a floppy is trivial: 'format
> B:', 'copy a:*.* b:*.*' is quite effective -- having two floppy drives
> of the (or close to the) same capacity is more then cost effective (and
> was common in the 'early days' before hard disks where the rule).
> Microsoft's 'Blunder #1': not designing a proper file system for the
> hard drive and instead just 'extending' the floppy disk file system for
> use with hard drives. Yes the *early* hard drives were small (5meg),
> but Microsoft seems to have no clue as to 'future' technology. Which
> is strange, since Mr. Bill is a Trekie -- he even showed up at a
> stockholder meeting in a Star Fleet uniform once.

It should be noted that Microsoft did design a decent,
fragmentation-resistant filesystem for the joint IBM/Microsoft
OS/2 project: HPFS (High Performance File System). HPFS has held
up quite well over the years (even though OS/2 itself hasn't, for
reasons largely unrelated to its technical merits). Why
Microsoft didn't use HPFS with Windows 9x and up is beyond me.

--


-John (John.T...@attglobal.net)

Juha Siltala

Sep 30, 2001, 4:07:53 AM

True. However, a novel, easily accessible everyman's introduction to the
subject is needed. If I write a nice article about Plato's thoughts, I will
sign it as usual and no one will say I'm stealing Plato's work.
--

\Juha

Robert Heller

Sep 30, 2001, 10:24:50 AM
"D. Stussy" <kd6...@bde-arc.ampr.org>,
In a message on Sun, 30 Sep 2001 01:21:26 GMT, wrote :

"S> On Sat, 29 Sep 2001, Adrian 'Dagurashibanipal' von Bidder wrote:
"S> >Behold! For Juha Siltala declaimed:
"S> >[...]
"S> >>> Long answer: see below
"S> >[long answer snipped]
"S> >
"S> >> This is great Lew! I've been faced with this question many times but
"S> >> have only been able to bumble something like 'it's no problem with
"S> >> linux, don't worry about it.'
"S> >>
"S> >> You should submit this to Linux Gazette, where this kind of information
"S> >> in such easily accessable language is always greatly valued!
"S> >
"S> >Seconded. This should definitely be Officially Archived.
"S>
"S> However, he can't really take credit for this. The concept was published in
"S> several books regarding operating systems in the 1980's (if not earlier, such
"S> as in the 1960's by IBM in their hard disk manuals). The topic of disk access
"S> is usually covered in college/university operating systems classes, so most
"S> professionals (at least those with a college degree) already know this stuff.
"S>
"S>

Except the people at Microsoft???

Robert Heller

Sep 30, 2001, 10:24:48 AM
cbbr...@acm.org,
In a message on Sun, 30 Sep 2001 01:05:16 GMT, wrote :

c> Robert Heller <hel...@deepsoft.com> writes:
c> > Yes the *early* hard drives were small (5meg), but Microsoft seems
c> > to have no clue as to 'future' technology. Which is strange, since
c> > Mr. Bill is a Trekie -- he even showed up at a stockholder meeting
c> > in a Star Fleet uniform once.
c>
c> Actually, if Mr Bill bases his notions of 'future technology' on Star
c> Trek, he gets the Hollywood-ized guesses of what might happen someday
c> based on what someone Hollywood guy saw in a PC store 3 years ago.

Whatever one can say about the 'fantasy world' of Hollywood, the idea of
a disk drive larger than 30meg (and/or more than 1024 cylinders), or of
something happening after Dec. 31, 1999, should at least be considered
as a remote possibility...

c>
c> A wiser place to go would be to visit MIT, CMU, Stanford, ETH,
c> UC-Berkeley, and such, actually talking with professorial types that
c> might know a bit about what's "bleeding edge" in CS right now that
c> might conceivably be able to be commercialized ten years from now.
c>
c> If the "forward thinking" comes from Star Trek, then it would make
c> _perfect sense_ that they have no clue as to 'future technology.'
c> --
c> (concatenate 'string "cbbrowne" "@acm.org")
c> http://www.cbbrowne.com/info/advocacy.html
c> "Heavy music didn't start in Seattle. It started in Katy, Texas with
c> King's X" -- Jeff Ament/Pearl Jam
c>

Charles Sullivan

Sep 30, 2001, 11:24:30 AM
In article <3BB50505...@sympatico.ca>, "Lew Pitcher"
<lpit...@sympatico.ca> wrote:

> mac wrote:
>>
>> Do I need to defrag the HD in Linux? If yes, how?
>>
>>
> Short answer: No.
>
> Long answer: see below

<remainder deleted for brevity>

Great answer Lew! However it's been expounded (without explanation)
numerous times in the various Linux newsgroups that the ext2 file system
itself is designed to resist fragmentation, and in fact the percent
fragmentation displayed when fsck is run at boot time does almost always
appear small.

Can you say something about this? For example, is the displayed percent
fragmentation real, or somehow a "virtual" fragmentation?

cbbr...@acm.org

Sep 30, 2001, 2:57:33 PM
"Charles Sullivan" <cwsu...@triad.rr.com> writes:
> Great answer Lew! However it's been expounded (without explanation)
> numerous times in the various Linux newsgroups that the ext2 file system
> itself is designed to resist fragmentation, and in fact the percent
> fragmentation displayed when fsck is run at boot time does almost always
> appear small.

> Can you say something about this? For example is the displayed
> percent fragmentation real or somehow a "virtual" fragmentation.

Personally, I don't entirely believe the "designed to resist
fragmentation" line.

It is mentioned in tremendously general terms in
<http://e2fsprogs.sourceforge.net/ext2intro.html>
Design and Implementation of the Second Extended Filesystem

In particular, the reference indicates that EXT (the predecessor of
EXT2) had the problem that "as the filesystem was used, the lists
became unsorted and the filesystem became fragmented."

As a result Xia and ext2 were produced; Xia basically extending Minix
FS functionality, but ext2 taking a more ambitious tack.

The one place that suggests "resistance to fragmentation" is this:

"When writing data to a file, Ext2fs preallocates up to 8 adjacent
blocks when allocating a new block. Preallocation hit rates around
75% even on very full filesystems. This preallocation achieves good
write performances under heavy load. It also allows contiguous blocks
to be allocated to files, thus it speeds up the future sequential
reads."

That addresses the notion that ext2 does some "pre-planning" that
allows it to be _resistant_ to fragmentation. It _doesn't_ indicate
anything that would imply that fragmentation would be reduced on the
fly in any other manner.

The above means that until the FS gets "full," it has some ability to
resist fragmentation. But once it _gets full_, it's not clear that
there's any further resistance to be made...
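
As a rough sketch of what that preallocation buys (not the real ext2
allocator, just the quoted idea in a few lines of Python; the real window
size and bookkeeping differ): when a file needs one more block, take the
first free block at or after the goal and reserve a handful of adjacent
free blocks behind it, so later appends stay contiguous until a neighbour
is already taken.

def allocate_with_prealloc(free, goal, prealloc=8):
    """free[i] is True when block i is free; goal is where we'd like to land.
    Returns the allocated block plus the blocks reserved for future growth."""
    n = len(free)
    for offset in range(n):                    # first free block at/after goal
        b = (goal + offset) % n
        if free[b]:
            free[b] = False
            reserved = []
            for r in range(b + 1, min(b + prealloc, n)):
                if not free[r]:
                    break                      # stop at a block already in use
                free[r] = False
                reserved.append(r)
            return b, reserved
    raise OSError("filesystem full")

# 32-block toy bitmap with a few blocks already in use.
bitmap = [True] * 32
for used in (3, 4, 12):
    bitmap[used] = False

print(allocate_with_prealloc(bitmap, goal=10))  # lands on 10, reserves only 11
print(allocate_with_prealloc(bitmap, goal=10))  # 10-12 now taken; lands on 13
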
--
(concatenate 'string "aa454" "@freenet.carleton.ca")
http://www.ntlug.org/~cbbrowne/lsf.html
"If we believe in data structures, we must believe in independent
(hence simultaneous) processing. For why else would we collect items
within a structure? Why do we tolerate languages that give us the one
without the other?" -- Alan Perlis

Charles Sullivan

Sep 30, 2001, 11:30:22 PM
In article <xWJt7.400$4l5.2...@news20.bellglobal.com>, "cbbrowne"
<cbbr...@acm.org> wrote:

At some point in the development of MS-DOS, an attempt was made to at
least defer the onset of VFAT fragmentation by writing files into the
largest stretch of unused clusters, which, until the top of the disk
was reached, meant only starting files in the space immediately
above the highest used cluster. I haven't looked at this in years and
don't know what Windows is doing.

From your description it sounds like ext2 is not in itself much better
in this regard, so that improvement in Linux comes about through the
methods described by Lew. But in that case it would seem like a VFAT
file system would be no worse than an ext2 file system.

Lew Pitcher

Sep 30, 2001, 3:40:31 AM

Guilty as charged. I've simply condensed a couple of books (Andy
Tanenbaum's "Operating Systems: Design and Implementation" and Don
Knuth's "The Art of Computer Programming") into something that will fit
on one page.

OTOH, the question keeps coming up, and although my little document
doesn't go into the details of how Linux specifically handles disk
management (i.e. source program names and statement numbers), the
one-pager certainly makes the issues clearer.

Lew Pitcher

Sep 30, 2001, 6:19:22 PM
cbbr...@acm.org wrote:
>
> Lew Pitcher <lpit...@sympatico.ca> writes:
> > Long answer: see below
>
> _Excellent_ answer.
>
> I would be glad to add that (suitably attributed, naturally) to the
> URL below...

Please, be my guest. I would be most honoured to have my humble efforts
added to the global knowledgebase <vbg>.

> Note that Art Kagel made similar comments some time a couple of years
> ago, albeit not with the outline of how the system might reorder block
> requests to improve the situation...
> --
> (concatenate 'string "cbbrowne" "@acm.org")
> http://www.cbbrowne.com/info/defrag.html
> "We come to bury DOS, not to praise it."
> -- Paul Vojta <vo...@math.berkeley.edu>, paraphrasing a quote of
> Shakespeare

--

Yvan Loranger

Oct 1, 2001, 6:16:31 PM
Charles Sullivan wrote:
> In article <xWJt7.400$4l5.2...@news20.bellglobal.com>, "cbbrowne"
> <cbbr...@acm.org> wrote:
>>Personally, I don't entirely believe the "designed to resist
>>fragmentation" line.
>>
>>It is mentioned in tremendously general terms in
>><http://e2fsprogs.sourceforge.net/ext2intro.html> Design and
>>Implementation of the Second Extended Filesystem
>>
>>In particular, the reference indicates that EXT (the predecessor of
>>EXT2) had the problem that "as the filesystem was used, the lists became
>>unsorted and the filesystem became fragmented."
>>
>>As a result Xia and ext2 were produced; Xia basically extending Minix FS
>>functionality, but ext2 taking a more ambitious tack.
>>
>>The one place that suggests "resistance to fragmentation" is thus:
>>
>>"When writing data to a file, Ext2fs preallocates up to 8 adjacent
>>blocks when allocating a new block. Preallocation hit rates around 75%
>>even on very full filesystems. This preallocation achieves good write
>>performances under heavy load. It also allows contiguous blocks to be
>>allocated to files, thus it speeds up the future sequential reads."

The conclusion from this last sentence is that file fragmentation *does*
matter!

>>That addresses the notion that ext2 does some "pre-planning" that allows
>>it to be _resistant_ to fragmentation. It _doesn't_ indicate anything
>>that would imply that fragmentation would be reduced on the fly in any
>>other manner.
>>
>>The above means that until the FS gets "full," it has some ability to
>>resist fragmentation. But once it _gets full_, it's not clear that
>>there's any further resistance to be made...
>
> At some point in the development of MS-DOS, the attempt was made to at
> least defer the onset of VFAT fragmentation by writing files into the
> largest stretch of unused clusters, which until the top of the disk
> memory was reached, meant only starting files in the space immediately
> above the highest used cluster. I haven't looked at this in years and
> don't know what Windows is doing.

That scheme would unfortunately tend to fragment free space. I remember
the somewhat similar concept of adding/extending a file not at the first
available cluster [picture ongoing deletions too] but at the first
available cluster beyond the last file added/extended, in the name of
increased contiguity.

> From your description it sounds like ext2 is not in itself much better
> in this regard, so that improvement in Linux comes about through the
> methods described by Lew. But in that case it would seem like a VFAT
> file system would be no worse than an ext2 file system.

--
Merci...........Yvan Why don't people understand when
I say my automobile has 100 Megametres on it?

cbbr...@acm.org

Oct 1, 2001, 7:40:54 PM
bq...@freenet.carleton.ca (Yvan Loranger) writes:
> Charles Sullivan wrote:
> > In article <xWJt7.400$4l5.2...@news20.bellglobal.com>, "cbbrowne"
> > <cbbr...@acm.org> wrote:
> >>Personally, I don't entirely believe the "designed to resist
> >>fragmentation" line.
> >>
> >>It is mentioned in tremendously general terms in
> >><http://e2fsprogs.sourceforge.net/ext2intro.html> Design and
> >>Implementation of the Second Extended Filesystem
> >>
> >>In particular, the reference indicates that EXT (the predecessor of
> >>EXT2) had the problem that "as the filesystem was used, the lists became
> >>unsorted and the filesystem became fragmented."
> >>
> >>As a result Xia and ext2 were produced; Xia basically extending Minix FS
> >>functionality, but ext2 taking a more ambitious tack.
> >>
> >>The one place that suggests "resistance to fragmentation" is thus:
> >>
> >>"When writing data to a file, Ext2fs preallocates up to 8 adjacent
> >>blocks when allocating a new block. Preallocation hit rates around 75%
> >>even on very full filesystems. This preallocation achieves good write
> >>performances under heavy load. It also allows contiguous blocks to be
> >>allocated to files, thus it speeds up the future sequential reads."

> The conclusion to this last sentence is that file fragmentation
> *does* matter!

Yes, fragmentation _does_ matter. And there seems to be some
"resistance" falling out of this.

What's not clear is what degree of "resistance" to fragmentation this
strategy provides.


--
(concatenate 'string "cbbrowne" "@acm.org")

http://www.cbbrowne.com/info/wp.html
Rules of the Evil Overlord #172. "I will allow guards to operate under
a flexible work schedule. That way if one is feeling sleepy, he can
call for a replacement, punch out, take a nap, and come back refreshed
and alert to finish out his shift. <http://www.eviloverlord.com/>

Charles Sullivan

Oct 1, 2001, 9:46:29 PM
In article <aa7u7.4967$Uf2.8...@news20.bellglobal.com>, "cbbrowne"
<cbbr...@acm.org> wrote:

From the percent fragmentation displayed when fsck is run, it seems to
be pretty good, at least for large partitions on my system. But maybe
that's because the total used space is not that high. I remember seeing
the percent fragmentation climb somewhat for my 15 Meg /boot partition
in a previous installation of Linux when I was continually trying out
newer kernels.

cbbr...@acm.org

Oct 1, 2001, 10:19:55 PM

Well, the case where fragmentation is likely to proliferate as well as
cause problems is when the amount of used space _does_ get way high.

The thing I'd be curious to hear is what happens if you take a
filesystem, fill it pretty much chock full, thus ending the ability to
ensure that blocks can cluster near one another, and _then_ chop out
some files.

The interesting result would be if it tended to get less fragmented
over time; that would be a good sign of "resistance to fragmentation."

It seems somewhat more likely that once fragmented, it would have a
hard time improving on this.


--
(concatenate 'string "cbbrowne" "@ntlug.org")

http://www.cbbrowne.com/info/nonrdbms.html Every program is a part of
some other program and rarely fits. -- Alan Perlis

Floyd Davidson

Oct 2, 2001, 12:54:10 AM
cbbr...@acm.org wrote:
>"Charles Sullivan" <cwsu...@triad.rr.com> writes:
>> In article <aa7u7.4967$Uf2.8...@news20.bellglobal.com>, "cbbrowne"
>> <cbbr...@acm.org> wrote:
>> > Yes, fragmentation _does_ matter. And there seems to be some
>> > "resistance" falling out of this.
>> >
>> > What's not clear is what degree of "resistance" to fragmentation this
>> > strategy provides.
>
>> From the percent fragmentation displayed when fsck is run, it seems
>> to be pretty good, at least for large partitions on my system. But
>> maybe that's because the total used space is not that high. I
>> remember seeing the percent fragmentation climb somewhat for my 15
>> Meg /boot partition in a previous installation of Linux when I was
>> continually trying out newer kernels.
>
>Well, the case where fragmentation is likely to proliferate as well as
>cause problems is when the amount of used space _does_ get way high.
>
>The thing I'd be curious to hear is what happens if you take a
>filesystem, fill it pretty much chock full, thus ending the ability to
>ensure that blocks can cluster near one another, and _then_ chop out
>some files.

Then (assuming significant space was freed up) you are once
again using a filesystem that is not close to being full, and
new data being written will not be fragmented. But existing
files that were fragmented when the filesystem was full will
remain fragmented until they are re-written.

However, it is highly debatable that fragmented filesystems are
detrimental to system performance in any significant way.

1) It won't be detrimental if it is read only one time.

2) It won't be detrimental if it is read from disk cache or
buffer rather than from the disk itself.

>The interesting result would be if it tended to get a less fragmented
>over time; that would be a good sign of "resistance to fragmentation."

That is not resistance to fragmentation though. The ext2
filesystem resists fragmentation by selecting adjacent blocks in
groups to write to. But it does not dynamically re-write
fragmented files to "correct" fragmentation.

>It seems somewhat more likely that once fragmented, it would have a
>hard time improving on this.

I suspect that it is hard enough to do that it is not done,
because the benefit is so small. The effects of fragmentation
on a multiuser/multitasking system that orders reads and writes
and has automatic disk read buffering and sector caching are just
not significant enough to make cpu cycles worth wasting on
"correcting" a fragmented filesystem. It would be a net loss.

--
Floyd L. Davidson <http://www.ptialaska.net/~floyd>
Ukpeagvik (Barrow, Alaska) fl...@barrow.com

Yvan Loranger

Oct 2, 2001, 11:33:11 AM

Philosophically speaking, degree of fragmentation is rather meaningless,
as a filesystem might achieve good performance on a highly-fragmented
disk or poor performance on a lightly-fragmented disk [esp. if what
little frag exists is really pathological; like the most-used files
being spread all over, etc.]. What we need is specific benchmarking.

Mats Wichmann

Oct 2, 2001, 1:28:51 PM
On Sun, 30 Sep 2001 18:57:33 GMT, cbbr...@acm.org wrote:


:The one place that suggests "resistance to fragmentation" is thus:


:
:"When writing data to a file, Ext2fs preallocates up to 8 adjacent
:blocks when allocating a new block. Preallocation hit rates around
:75% even on very full filesystems. This preallocation achieves good
:write performances under heavy load. It also allows contiguous blocks
:to be allocated to files, thus it speeds up the future sequential
:reads."
:
:That addresses the notion that ext2 does some "pre-planning" that
:allows it to be _resistant_ to fragmentation. It _doesn't_ indicate
:anything that would imply that fragmentation would be reduced on the
:fly in any other manner.
:
:The above means that until the FS gets "full," it has some ability to
:resist fragmentation. But once it _gets full_, it's not clear that
:there's any further resistance to be made...

That's why ext2, as well as ufs (to use one of its many names),
reserves some space. It's often billed as being "reserved for the
super-user" but really the main purpose is to keep enough free space
to keep the allocator working sensibly.

Other extent-based filesystems have had a "defragger" process that
coalesces files which were not originally able to be allocated
sequentially. SGI's efs (and I believe xfs) come to mind... I'm sure
there are others.

Traditional UNIX defragmentation scheme: back up to tape, make a new
empty filesystem, restore. It's a bit of a pain but will certainly
solve those rare bad fragmentation problems.

Mats Wichmann

Yvan Loranger

Oct 2, 2001, 2:41:35 PM
Lew Pitcher wrote:
> mac wrote:
>
>>Do I need to defrag the HD in Linux? If yes, how?
>
> Short answer: No.
>
> Long answer: see below
>
> In a single-user, single-tasking OS, it's best to keep all blocks for a
> file together,
> design of the system) these tools are rarely (if ever) needed or used.
> That's the impact

> of designing up front the multi-processing/multi-tasking multi-user
> capacity of the OS into
> it's facilities, rather than tacking multi-processing/multi-tasking
> multi-user support on

> to an inherently single-processing/single-tasking single-user system.

Penalties due to file fragmentation *have* been overstated, but they
still do exist. You cannot deny that it pays to store files on as few
cylinders as possible. And let's not underestimate the effect of
read-ahead caching - rather ineffective if the next record is 20 tracks
further. You cannot "buffer most of the data in the memory", 10% [1] is
already difficult to achieve; whatever you can buffer will greatly help
but not on the first read of a file.

Plus, performance is not the only criterion, wear & tear on the head
assembly is also important; the fewer cyls visited the better. I'd
rather replace disks every 7 years instead of every 5, for example.

I leave you with a question: why is a swap partition more efficient than a
swapfile? Partition size [of the swap partition & of the partition
containing the swapfile] and block size are surely determinant, but what
about usage pattern and internal organization?

[1] I'm basing this on a half-filled 1 GB partition (rather generous,
seems many people install a lot more), 128 MB ram (half-used for
diskcache) and counting programs as well as data (they have to be read too).

Robert Heller

Oct 2, 2001, 3:32:23 PM
bq...@freenet.carleton.ca (Yvan Loranger),
In a message on Tue, 02 Oct 2001 14:41:35 -0400, wrote :

YL> I leave on a question Why is a swap partition more efficient than a
YL> swapfile? Partition size [of the swap partition & of the partition
YL> containing the swapfile], block size are surely determinant but what
YL> about usage pattern and internal organization?

A swap partition allows the swapper *direct* access to a *contiguous*
chunk of the disk for swapping. A swap file entails the added overhead
of the file system, plus there is no certainty that the swap file is
contiguous.

YL>
YL> --
YL> Merci...........Yvan Why don´t people understand when
YL> I say my automobile has 100 Megametres on it?
YL>
YL>

Steve Jorgensen

Oct 2, 2001, 4:13:20 PM
On Sun, 30 Sep 2001 03:40:31 -0400, Lew Pitcher
<lpit...@sympatico.ca> wrote:

>"D. Stussy" wrote:
>>
>> On Sat, 29 Sep 2001, Adrian 'Dagurashibanipal' von Bidder wrote:
>> >Behold! For Juha Siltala declaimed:
>> >[...]
>> >>> Long answer: see below
>> >[long answer snipped]
>> >
>> >> This is great Lew! I've been faced with this question many times but
>> >> have only been able to bumble something like 'it's no problem with
>> >> linux, don't worry about it.'
>> >>
>> >> You should submit this to Linux Gazette, where this kind of information
>> >> in such easily accessable language is always greatly valued!
>> >
>> >Seconded. This should definitely be Officially Archived.
>>
>> However, he can't really take credit for this. The concept was published in
>> several books regarding operating systems in the 1980's (if not earlier, such
>> as in the 1960's by IBM in their hard disk manuals). The topic of disk access
>> is usually covered in college/university operating systems classes, so most
>> professionals (at least those with a college degree) already know this stuff.
>
>Guilty as charged. I've simply condensed a couple of books (Andy
>Tanenbaum's "Operating Systems: Design and Implementation" and Don
>Knuth's "The Art of Computer Programming") into something that will fit
>on one page.

I believe that's called research writing, and it certainly has value.
Restating known information for a new audience is precisely what most
magazine articles do, and I don't see anyone apologizing for not being
the original source of their information.

>OTOH, the question keeps coming up, and although my little document
>doesn't go into the details of how Linux specifically handles disk
>management (i.e. source program names and statement numbers), the
>one-pager certainly makes the issues clearer.
>

Yup.

Floyd Davidson

Oct 2, 2001, 4:41:15 PM
bq...@freenet.carleton.ca (Yvan Loranger) wrote:
>further. You cannot "buffer most of the data in the memory",
>10% [1]

Isn't it more realistic to expect a much higher percentage,
given that system performance is going to be degraded most by
repeatedly accessing a slow-to-read disk file? With Linux that
file will of course be read from memory every time except the
first, unless the entire file is larger than the RAM available
for disk caching (unlikely). Hence the most common effect of
disk caching (even at 10%) is to eliminate the biggest
performance hits of all. (For example, repeatedly reading
the data segment from /bin/bash.)

But of course that does leave those first time reads:

>is already difficult to achieve; whatever you can buffer will
>greatly help but not on the first read of a file.

The read ahead buffering of blocks will help this situation
greatly. The first block read called for will fetch 16 blocks,
and most likely at least the next 6 (since ext2 writes are
grouped 7 blocks at a time) will be from the disk buffer rather
than directly from the disk itself. If the file is not
fragmented at all, the entire 16 blocks fetched on the first
read will result in 1 read from disk and 15 from the buffer.

>Plus, performance is not the only criterion, wear & tear on the
>head assembly is also important; the fewer cyls visited the
>better. I'd rather replace disks every 7 years instead of every
>5, for example.

Interesting point!

>I leave on a question Why is a swap partition more efficient
>than a swapfile? Partition size [of the swap partition & of the
>partition containing the swapfile], block size are surely
>determinant but what about usage pattern and internal
>organization?

Does it make any real difference??? If the system starts
swapping it is a performance disaster. Swap is there to prevent
a system crash and allow recovery from that disaster. The fact
that recovery takes a little longer is insignificant, because in
either case it is a red flag being waved which says "Don't Do
That!" on one side and "Or else buy more RAM!" on the other.

Speeding up swapping is not productive!

>[1] I'm basing this on a half-filled 1 GB partition (rather
>generous, seems many people install a lot more), 128 MB ram
>(half-used for diskcache) and counting programs as well as data
>(they have to be read too).


--

Jean-David Beyer

Oct 2, 2001, 5:31:47 PM
Floyd Davidson wrote (in part):

> >I leave on a question Why is a swap partition more efficient
> >than a swapfile? Partition size [of the swap partition & of the
> >partition containing the swapfile], block size are surely
> >determinant but what about usage pattern and internal
> >organization?
>
> Does it make any real difference??? If the system starts
> swapping it is a performance disaster. Swap is there to prevent
> a system crash and allow recovery from that disaster. The fact
> that recovery takes a little longer is insignificant, because in
> either case it is a red flag being waved which says "Don't Do
> That!" on one side and "Or else buy more RAM!" on the other.
>

I am not sure this is correct.

I notice Linux likes to use as much space as possible for buffers and
cache. I seldom use the virtual terminals, so after a while, it will
swap out the 6 /sbin/mingetty processes. Likewise, I seldom use nfs from
my other machine, so it swaps out those 4 nfsd processes as well (I
diddled the startup in /etc/rc.d/init.d so it starts only 4 instead
of 8 of these). It does this long before any serious paging takes
place. Of course, with 512 Megabytes of RAM, my machine seldom engages
in paging. In fact, there is nothing recorded in my swap space at the
moment at all, since I booted this machine.

Occasional paging is not a big deal (you may be thinking of thrashing).
I prefer to use swap partitions so it can take place bypassing part of
the Linux file system for slightly more efficiency. If I was really out
of hard drive space, I guess I would just add another hard drive or
clean out some unnecessary files.

--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ Registered Machine 73926.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 5:25pm up 13 days, 22 min, 3 users, load average: 3.14, 3.14, 3.07

Floyd Davidson

Oct 2, 2001, 7:51:00 PM
Jean-David Beyer <jdb...@exit109.com> wrote:
>Floyd Davidson wrote (in part):
>
>> >I leave on a question Why is a swap partition more efficient
>> >than a swapfile? Partition size [of the swap partition & of the
>> >partition containing the swapfile], block size are surely
>> >determinant but what about usage pattern and internal
>> >organization?
>>
>> Does it make any real difference??? If the system starts
>> swapping it is a performance disaster. Swap is there to prevent
>> a system crash and allow recovery from that disaster. The fact
>> that recovery takes a little longer is insignificant, because in
>> either case it is a red flag being waved which says "Don't Do
>> That!" on one side and "Or else buy more RAM!" on the other.
>>
>I am not sure this is correct.

Only as far as it goes... and I didn't go as far as you did below! :-)

>I notice Linux likes to use as much space as possible for
>buffers and cache. I seldom use the virtual terminals, so after
>a while, it will swap out the 6 /sbin/mingetty
>processes. Likewise, I seldom use nfs from my other machine, so
>it swaps out those 4 (I diddled the startup in
>/etc/rc.d/init.d) so it starts only 4 instead of 8 of these)
>nfsd processes as well. It does this long before any serious
>paging takes place. Of course, with 512 Megabytes of RAM, my
>machine seldom engages in paging. In fact, there is nothing
>recorded in my swap space at the moment at all, since I booted
>this machine.

Still, your point that swap does get used to page out inactive
processes is correct. That does free up RAM for use as cache and
buffer.

All I question is whether the speed at which such actions happen
matters, relative to the difference between using a
partition vs. using a file for swap. I don't think the
difference in speed is going to be noticeable. It might take
dozens of milliseconds for a program to swap back in, and 2 or 3
milliseconds might be saved by using a partition rather than a
file??

>Occasional paging is not a big deal (you may be thinking of thrashing).

I was citing the case where programs are using more memory than
is available in RAM, and virtual memory comes into use. An
example would be the other day I was traveling, and with a
laptop that has 64Mb of RAM I sucked up 330 some email messages
at one time into GNUS. A job that my desktop, with enough RAM,
would have done in less than 3 minutes, took more than half an
hour! The useful information that I got out of it was that
about 40Mb of swap was being used by XEmacs, putting the total
swap used at 64Mb. There was about 8Mb of RAM being used for
cache and buffering. It appears to me that purchasing another
64Mb of RAM for my laptop would be worth the price! I don't see
that doing something to make swap work faster would be worth
much though. I suppose a newer, faster disk might allow that
half an hour to be cut down to only 20 minutes? Big deal, eh?

>I prefer to use swap partitions so it can take place bypassing
>part of the Linux file system for slightly more efficiency. If
>I was really out of hard drive space, I guess I would just add
>another hard drive or clean out some unnecessary files.

I prefer to allocate 2 or 3 partitions for swap, and I don't skimp
at all (for example, that laptop has something like 368Mb of swap
available). On a laptop the multiple partitions are probably not
of much use, but on a desktop it does allow a number of odd things
to be done if ever there is a need for a blank 100Mb partition.
Just run swapoff on it, format it as needed, and there it is.
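
Something like this is all it takes (the device name and mount point
below are made up -- and make sure nothing is paged out to that
partition first):

   swapoff /dev/hda9              # stop swapping to it
   mke2fs /dev/hda9               # put a throwaway ext2 filesystem on it
   mkdir -p /mnt/scratch
   mount /dev/hda9 /mnt/scratch   # ...use it...

   umount /mnt/scratch            # and later give it back to the swapper
   mkswap /dev/hda9
   swapon /dev/hda9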

Basically, disk space has been exceedingly cheap for some time
now, so I treat it as such.

Lee Sau Dan

unread,
Sep 30, 2001, 12:31:42 PM9/30/01
to
>>>>> "Otto" == Otto Wyss <otto...@bluewin.ch> writes:

>> /dev/hda7 18.8% /dev/hda8 24.5%
>>
Otto> Applying defrag without any problems gives

Otto> /dev/hda7 1.3% /dev/hda8 3.3%

Otto> Since I have no benchmark figures I can only say it feels a
Otto> little more responsive but this might as well just be
Otto> imagination.

*If* defragmenting results in no notable difference, why care about
it? If you really care about it, then you ought to do some controlled
experiments to convince yourself that things are improved.

That's why Linux users seldom have to care about it. Many new converts
who used to be "advanced" Windows users may be tempted to run
"defrag" from time to time, not knowing that fragmentation is simply
NOT a problem for ext2. (Much like the disk cache doesn't need to be
deliberately set up like one did with "smartdrv" back in the Win3.1
days. Moreover, Linux's buffer cache grows and shrinks according to
the amount of free/idle RAM. That's much better than a fixed size
cache offered by "smartdrv".)

BTW, your fragmentation figures were abnormally high. I seldom see
figures higher than 10%, and even more seldom higher than 15%. Perhaps
your disks are really too full. Frequent file creation/deletion
shouldn't be a major cause. That often happens to /var, and my /var
volumes seldom get such a high fragmentation figure. Perhaps my /var
volumes are not very full.
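
For reference, that non-contiguous figure is what e2fsck prints at the
end of a full check; roughly like this (the device name is only an
example, and the filesystem should be unmounted first):

   umount /dev/hda7
   e2fsck -f /dev/hda7      # force a full check of the clean filesystem
   # ...
   # /dev/hda7: 3456/123456 files (18.8% non-contiguous), 234567/456789 blocks

(the numbers in that last line are illustrative, not from a real run)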


--
Lee Sau Dan 李守敦(Big5) ~{@nJX6X~}(HZ)
.----------------------------------------------------------------------------.
| e-mail: sd...@eti.hku.hk http://www.csis.hku.hk/~sdlee |
`----------------------------------------------------------------------------'

cbbr...@acm.org

unread,
Oct 2, 2001, 11:56:03 PM10/2/01
to
bq...@freenet.carleton.ca (Yvan Loranger) writes:
> I leave on a question Why is a swap partition more efficient than a
> swapfile? Partition size [of the swap partition & of the partition
> containing the swapfile], block size are surely determinant but what
> about usage pattern and internal organization?

The thing is, the usage pattern for swap is _totally_ dependent on
what applications are doing.

It may be predictable, but only if you know what program is doing
what work.

Contrast that with being able to assume _some_ structure to FS-based
accesses:

- If a program has read a portion of a file, it is reasonable to
expect it may read the sequence that follows. The FS code in ext2
makes this assumption.

Not true for swap space.

- If a program pulls some data from the metadata (e.g. - directory
structures), it is similarly reasonable to consider pulling the rest
of the metadata into memory.

Not true for swap space.

None of the readahead that is appropriate for a FS is of any use with
swap, because the patterns _aren't_ fixed.
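
A crude way to convince yourself (the file name and sizes are made up,
and this is nowhere near a rigorous benchmark -- remount the filesystem
between steps, or use a file larger than RAM, so the page cache doesn't
hide the difference):

   dd if=/dev/zero of=/tmp/bigfile bs=4k count=32768   # 128MB scratch file

   # sequential: readahead streams blocks in before they're asked for
   time dd if=/tmp/bigfile of=/dev/null bs=4k

   # random 4k reads of the same file: no pattern for readahead to exploit
   time bash -c 'for i in `seq 1 2000`; do
       dd if=/tmp/bigfile of=/dev/null bs=4k count=1 skip=$((RANDOM % 32768)) 2>/dev/null
   done'
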
--
(reverse (concatenate 'string "moc.enworbbc@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/nonrdbms.html
Time is an illusion,
Lunchtime doubly so.

John Thompson

unread,
Oct 2, 2001, 6:18:52 PM10/2/01
to
Mats Wichmann wrote:

> Traditional UNIX defragmentation scheme: back up to tape, make a new
> empty filesystem, restore. It's a bit of a pain but will certainly
> solve those rare bad fragmentation problems.

Or, if you have the HD space to spare, create an empty filesystem
sufficient to hold the old filesystem, mount it and from the root
of the old filesystem run "tar cf - . | (cd /newfilesystem; tar xvf -)".
Faster than a restore from tape, but it is still prudent to
maintain current backups.
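
Spelled out a little more (partition and mount point names are just
examples; do it with the source filesystem quiet, e.g. in single-user
mode):

   mke2fs /dev/hdb1                  # fresh, empty filesystem on the spare space
   mkdir -p /newfilesystem
   mount /dev/hdb1 /newfilesystem
   cd /usr/local                     # root of the old, fragmented filesystem
   tar cf - . | (cd /newfilesystem; tar xvf -)
   # verify the copy, then swap the two in /etc/fstab (or mkfs the old
   # partition and copy back) -- and keep those backups current anyway.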

--


-John (John.T...@attglobal.net)

Michael Lee Yohe

unread,
Oct 3, 2001, 6:52:41 AM10/3/01
to
>>Do I need to defrag the HD in Linux? If yes, how?
>
> No need to in linux....that is a relic of the fat hack called a file system

Wouldn't we all hope so?

No matter what filesystem you use, as it fills up its ability to
properly lay out files degrades and fragmentation becomes a problem.

/usr/local: 147/304608 files (36.7% non-contiguous), 224931/608454 blocks

36.7% of the files are fragmented - which is quite reasonable
considering 147 files are consuming 860M (pretty large files).
Nonetheless, fragmentation can be a problem even with ext2.

--

Michael Lee Yohe (myohe+...@redhat.com)
Software Developer, Engineering Services
Red Hat, Inc.

QUIPd 0.12:
-> I shall return.
-> - Douglas MacArthur, General (1880-1964)

Michael Lee Yohe

unread,
Oct 3, 2001, 6:59:10 AM10/3/01
to
> *If* defragmenting results in no notable difference, why care about
> it? If you really care about it, then you ought to do some controlled
> experiments to convince yourself that things are improved.


Because in the grand scheme of performance, many clock cycles can be
wasted simply attempting to read data and write data to disk. There are
two forms of fragmentation - data fragmentation (what the thread is
currently talking about) and free space fragmentation (which is just as
important).

When your system is requesting information about a file scattered
throughout the platters of the physical media, clock cycles are wasted
gathering that information. When your system is requesting space to
store a large file, clock cycles are wasted attempting to lay down the
bits on the physical media.

For people who are constantly working with the i/o subsystem this is
critically important for performance.


> That's why Linux users seldom have to care about it. May new converts


Your statement sums it up right here. There are people who "don't
care". And there are people, like me, who check tire pressure regularly
to ensure they get the maximum bang for the buck in terms of gas.
Tweaking should never be criticized - it's the pastime of those who
want to sleep better knowing they're not letting something go to waste.
--

Michael Lee Yohe (myohe+...@redhat.com)
Software Developer, Engineering Services
Red Hat, Inc.

QUIPd 0.12:
-> Whenever I hear anyone arguing for slavery, I feel a strong
-> impulse to see it tried on him personally.
-> - Abraham Lincoln (1809-1865)

Jean-David Beyer

unread,
Oct 3, 2001, 8:22:05 AM10/3/01
to
Michael Lee Yohe wrote:
>
> >>Do I need to defrag the HD in Linux? If yes, how?
> >
> > No need to in linux....that is a relic of the fat hack called a file system
>
> Wouldn't we all hope so?
>
> No matter what filesystem you use, as it fills up - it's ability to
> properly lay out files becomes deterred and fragmentation becomes a problem.
>
> /usr/local: 147/304608 files (36.7% non-contiguous), 224931/608454 blocks
>
> 36.7% of the files are fragmented - which is quite reasonable
> considering 147 files are consuming 860M (pretty large files).
> Nonetheless, fragmentation can be a problem even with ext2.

How much of a problem? It is very tricky to measure. If you are running
a desktop, with all the processes in the process table idle except one,
and that one is a test program to see about reading two identical files,
one on a heavily fragmented file system and one on a completely
defragmented file system, you could surely tell the difference,
especially if the test program was reading a very large sequential file.

But if you were running a multi-user system where there were many users
doing all sorts of things, but especially lots of large compilations,
loading statically linked programs, a dbms doing lots of random access
to data files, but using many different indices, etc., I doubt if you
would see the difference.

Has anyone done measurements of heavily loaded ext2 systems to see how
much fragmentation really matters in heavily loaded systems?
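
Even a crude single-user data point takes a little care. Something like
this, perhaps (all names are made up; remounting is just a cheap way to
empty the cache between runs):

   # a large file on the heavily fragmented filesystem...
   umount /dev/hda7; mount /dev/hda7 /mnt/frag
   time cat /mnt/frag/bigfile > /dev/null

   # ...and an identical copy on a freshly made filesystem
   umount /dev/hdb1; mount /dev/hdb1 /mnt/fresh
   time cat /mnt/fresh/bigfile > /dev/null

But that still tells you nothing about the loaded multi-user case.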

--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ Registered Machine 73926.
/( )\ Shrewsbury, New Jersey http://counter.li.org

^^-^^ 8:15am up 13 days, 15:12, 3 users, load average: 3.00, 3.01, 3.00

Yvan Loranger

unread,
Oct 2, 2001, 6:16:15 PM10/2/01
to
Robert Heller wrote:
> bq...@freenet.carleton.ca (Yvan Loranger),
> In a message on Tue, 02 Oct 2001 14:41:35 -0400, wrote :
>
> YL> I leave on a question Why is a swap partition more efficient than a
> YL> swapfile? Partition size [of the swap partition & of the partition
> YL> containing the swapfile], block size are surely determinant but what
> YL> about usage pattern and internal organization?
>
> A swap partition allows the swapper *direct* access to a *contiguous*
> chunk of the disk for swapping. A swap file entails the added overhead
> of the file system, plus there is no certainty that the swap file is
> contiguous.

This implies that file fragmentation does matter to some degree. [see
original post]

I love UseNet. I actually meant that as a rhetorical question, but I'm
happily surprised to receive input. I'll be clearer next time :)

A real question this time: are the pages of a swapped-out program
contiguous in the swap partition? (Remembering that a program may be
paged out in batches.) Does it matter? I believe it would.

--

Merci...........Yvan Why don´t people understand when

Yvan Loranger

unread,
Oct 2, 2001, 6:16:23 PM10/2/01
to
cbbr...@acm.org wrote:
> bq...@freenet.carleton.ca (Yvan Loranger) writes:
>
>>I leave on a question Why is a swap partition more efficient than a
>>swapfile? Partition size [of the swap partition & of the partition
>>containing the swapfile], block size are surely determinant but what
>>about usage pattern and internal organization?
>>
>
> The thing is, the usage pattern for swap is _totally_ dependent on
> what applications are doing.
>
> It may be predictable, but only if you know what program is doing
> what work.
>
> Contrast that with being able to assume _some_ structure to FS-based
> accesses:
>
> - If a program has read a portion of a file, it is reasonable to
> expect it may read the sequence that follows. The FS code in ext2
> makes this assumption.
>
> Not true for swap space.
>
> - If a program pulls some data from the metadata (e.g. - directory
> structures), it is similarly reasonable to consider pulling the rest
> of the metadata into memory.
>
> Not true for swap space.
>
> None of the readahead that is appropriate for a FS is of any use with
> swap, because the patterns _aren't_ fixed.
>

I don't entirely agree; if some of a program's pages get paged back in,
isn't it likely that, as the user progresses in her work, some more
pages will soon be pulled into the working set?

Yvan Loranger

unread,
Oct 2, 2001, 6:16:28 PM10/2/01
to
Floyd Davidson wrote:
> bq...@freenet.carleton.ca (Yvan Loranger) wrote:
>
>>further. You cannot "buffer most of the data in the memory",
>>10% [1]
>
> Isn't it more realistic to expect a much higher percentage,
> given that system performance is going to be degraded most by
> repeatedly accessing a slow to read disk file, and with Linux
> that of course will be read from memory every time except the
> first unless the entire file is larger than the RAM available
> for disk caching (unlikely). Hence the most common effect of
> disk caching (even at 10%) is to eliminate the biggest
> performance hits of all. (For example, repeatedly reading
> the data segment from /bin/bash.)

You're right, it's more realistic to count files used that day rather
than all files on disk [I won't use a spreadsheet today, for example].
It's still difficult to predict what will be used over the next 24 hrs
but yes, performance does hinge largely on a small portion of the
bytes on one's disk. Guilty as charged.

> But of course that does leave those first time reads:
>
>>is already difficult to achieve; whatever you can buffer will
>>greatly help but not on the first read of a file.
>
> The read ahead buffering of blocks will help this situation
> greatly. The first block read called for will fetch 16 blocks,
> and most likely at least the next 6 (since ext2 writes are
> grouped 7 blocks at a time) will be from the disk buffer rather
> than directly from the disk itself. If the file is not
> fragmented at all, the entire 16 blocks fetched on the first
> read will result in 1 read from disk and 15 from the buffer.
>

>>I leave on a question Why is a swap partition more efficient
>>than a swapfile? Partition size [of the swap partition & of the
>>partition containing the swapfile], block size are surely
>>determinant but what about usage pattern and internal
>>organization?
>
> Does it make any real difference??? If the system starts
> swapping it is a performance disaster. Swap is there to prevent
> a system crash and allow recovery from that disaster. The fact
> that recovery takes a little longer is insignificant, because in
> either case it is a red flag being waved which says "Don't Do
> That!" on one side and "Or else buy more RAM!" on the other.

Agreed. However, there will always be the editor who wants to work on
that huge bitmapped image or video clip or...

> Faster up swapping is not productive!

--

Tony Lawrence

unread,
Oct 3, 2001, 8:50:53 AM10/3/01
to
Jean-David Beyer wrote:

> How much of a problem? It is very tricky to measure. If you are running
> a desktop, with all the processes in the process table idle except one,
> and that one is a test program to see about reading two identical files,
> one on a heavily fragmented file system and one on a completely
> defragmented file system, you could surely tell the difference,
> especially if the test program was reading a very large sequential file.
>
> But if you were running a multi-user system where there were many users
> doing all sorts of things, but especially lots of large compilations,
> loading statically linked programs, a dbms doing lots of random access
> to data files, but using many different indices, etc., I doubt if you
> would see the difference.
>

Yes. Measuring performance on a multi-user system is very complicated,
because different users access different disk blocks, effectively
producing fragmentation from the viewpoint of the fs driver asked to go
get those blocks.

So if (simplified, multiply by 50 users or whatever)

Bob wants disk block 37550
Mary wants 37654
and Gene needs 28756,

all at the same (relative) time, that's the same as Paul, sitting alone
at the machine, reading a fragmented file.

In both cases, good drivers and good hardware (scsi) are going to
rearrange the requests to be as efficient as possible.

OTOH, if this machine is a database server, where client requests go
through a central program, fragmentation *might* matter more, depending
on the db server software- all else being equally confused as above.


As to defraggers, I just don't trust 'em. Easier to just wipe out the
fs and restore from backups.
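
For ext2 that's the classic dump/restore cycle; roughly (device and
file names are made up, and the dump file has to live on some other
filesystem):

   dump -0f /backups/usr.dump /usr    # level-0 dump of the filesystem
   umount /usr
   mke2fs /dev/hda5                   # recreate it empty
   mount /dev/hda5 /usr
   cd /usr && restore -rf /backups/usr.dump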

--
Tony Lawrence
SCO/Linux Support Tips, How-To's, Tests and more: http://pcunix.com

Michael Lee Yohe

unread,
Oct 3, 2001, 8:51:27 AM10/3/01
to
> YL> I leave on a question Why is a swap partition more efficient than a
> YL> swapfile? Partition size [of the swap partition & of the partition
> YL> containing the swapfile], block size are surely determinant but what
> YL> about usage pattern and internal organization?
>
> A swap partition allows the swapper *direct* access to a *contiguous*
> chunk of the disk for swapping. A swap file entails the added overhead
> of the file system, plus there is no certainty that the swap file is
> contiguous.

A swap partition is more efficient than a swap file simply because a
kernel layer has been removed. As such, all the problems of a
filesystem are eliminated because there is no looking up, tracking,
seeking, locking, etc. while attempting to page data to disk or retrieve
data from disk.

Instead of:   kernel -> swap -> ext2 driver -> swapfile
you just get: kernel -> swap -> swap partition
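
Setting either one up is trivial; the difference is purely in the path
the pages take (the names below are made up):

   # swap file: lives inside an ext2 filesystem
   dd if=/dev/zero of=/swapfile bs=1024 count=131072   # 128MB
   mkswap /swapfile
   swapon /swapfile

   # swap partition: pages go straight to the block device
   mkswap /dev/hda9
   swapon /dev/hda9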

--

Michael Lee Yohe (myohe+...@redhat.com)
Software Developer, Engineering Services
Red Hat, Inc.

QUIPd 0.12:
-> The O-O languages give you more of course - prettier syntax,
-> derived types and so on - but conceptually they provide little
-> extra.
-> - Rob Pike

cbbr...@acm.org

unread,
Oct 3, 2001, 9:20:35 AM10/3/01
to
bq...@freenet.carleton.ca (Yvan Loranger) writes:

> cbbr...@acm.org wrote:
> > None of the readahead that is appropriate for a FS is of any use with
> > swap, because the patterns _aren't_ fixed.

> I don't entirely agree; if some of a program's pages get paged back
> in, isn't it likely that, as the user progresses in her work, some
> more pages will soon be pulled into the working set?

Certainly it's possible that some data will be pulled back in, even
probable.

But if the kernel is ignorant of which data that is likely to be, this
doesn't provide us any policy to give to the kernel to "pre-fetch"
data.

With a filesystem, it may be reasonable to pull in the whole directory
structure once you start reading it; it may be reasonable to pull in
subsequent parts of a file once you start reading it.

Both of those things involve a reasonably clear sequence. The kernel
knows what a directory structure is, and that it represents a logical
grouping of data, and the same is true for a file.

The kernel _doesn't_ know anything about the structure of the data
being swapped in and out.

If this were Multics, where virtual memory was being visibly managed
in the form of segments, there _would_ be some structure to it, and
some readahead might become meaningful. (I'm speculating a bit here;
I don't know Multics segments well enough to be certain.) But this is
Linux; virtual memory is just a big unstructured bag of bytes.
--
(reverse (concatenate 'string "gro.mca@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/sap.html
"Windows: The ``Big O'' of operating systems."

aflinsch

unread,
Oct 3, 2001, 10:44:11 AM10/3/01
to
My system is entirely Reiserfs, with the exception of /boot (ext2),
and a vfat partition for Windows.
I recently defragged /boot (went from 65% non-contig to 3% -- I was
doing a lot of kernel recompiles...), and there seemed to be a slight
improvement in the boot process -- not much, it just seemed to be a bit
quicker.

Anyway -- the real question is --
Does anyone know of a way to defrag a vfat partition from linux?

The reason for asking is that the system tends to stay under linux,
and is only used in Windows rarely -- mostly for a work related app
which creates and deletes lots of files, and the infrequent tarball
backup of a linux directory, both of which tend to frag up the disk
quite a bit. What I would like to do is to be able to schedule a cron
script to kick off a defrag of the vfat partition, which would be a
whole lot more convenient than waiting for the defrag to run under
Windows.

John W Gintell

unread,
Oct 3, 2001, 11:11:08 AM10/3/01
to
cbbr...@acm.org wrote:
>
> bq...@freenet.carleton.ca (Yvan Loranger) writes:
> > cbbr...@acm.org wrote:
> > > None of the readahead that is appropriate for a FS is of any use with
> > > swap, because the patterns _aren't_ fixed.
>
> > I don't entirely agree; if some of a program's pages get paged back
> > in, isn't it likely that, as the user progresses in her work, some
> > more pages will soon be pulled into the working set?
>
> Certainly it's possible that some data will be pulled back in, even
> probable.
>
> But if the kernel is ignorant of which data that is likely to be, this
> doesn't provide us any policy to give to the kernel to "pre-fetch"
> data.
>
> With a filesystem, it may be reasonable to pull in the whole directory
> structure once you start reading it; it may be reasonable to pull in
> subsequent parts of a file once you start reading it.
>
> Both of those things involve a reasonably clear sequence. The kernel
> knows what a directory structure is, and that it represents a logical
> grouping of data, and the same is true for a file.
>
> The kernel _doesn't_ know anything about the structure of the data
> being swapped in and out.
>
> If this were Multics, where virtual memory was being visibly managed
> in the form of segments, there _would_ be some structure to it, and
> some read ahead might become meaningful. (I'm speculating a bit here;

> I don't know Multics segments well enough to be certain.) But this is
> Linux; virtual memory is just a big unstructured bag of bytes.
> --

If true sequential access can be detected, prefetching can sometimes pay
off well.

Early Multics had a concept (not ever implemented to my recollection) of
page batching where forward adjacent pages could be read into memory;
there was a page_batching_size attribute for each segment to control how
much if any to read.

Some modern disk caches detect if there is sequential access to a file
and start pre-fetching the next record(s) where the number of records
increases in subsequent reads.

Charles Sullivan

unread,
Oct 3, 2001, 11:57:30 AM10/3/01
to
In article <3BBB243B...@att.net>, "aflinsch" <avfl...@att.net>
wrote:

I don't know of a program to do this from Linux other than a
copy/delete-all/copy-back (which might screw up Windows if there are any
fixed-position files).
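
The brute-force version of that, for a vfat partition holding nothing
but data files (everything below is illustrative, and I wouldn't try it
on a partition Windows boots from):

   mount -t vfat /dev/hda1 /mnt/win
   mkdir -p /tmp/winstash
   cp -a /mnt/win/. /tmp/winstash/   # copy everything off
   rm -rf /mnt/win/*                 # empty it (top-level dotfiles need separate handling)
   cp -a /tmp/winstash/. /mnt/win/   # copy back; files mostly land contiguously
   rm -rf /tmp/winstash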

But if you are tempted to do this, there is another potential problem:
Linux to date has a bug wherein using 'cp -p' to copy a file from Linux to
a vfat partition doesn't accurately reproduce the date/time when Linux is
configured for a dual timezone, i.e., Standard/Daylight times. The time
looks correct when viewed from Linux but Daylight times will be off by an
hour when checked under DOS/Windows. It's a minor bug, but one which
might cause problems with file currency under some circumstances.

(I reported this bug via Red Hat's bugzilla as long ago as kernel v 2.0.36
and more recently for kernel v 2.2.x to the Linux maintainer, but it's
apparently not considered an important enough problem to fix - at least it
hasn't been fixed in kernel 2.4.3-12.)

Regards,
Charles Sullivan

Michael Lee Yohe

unread,
Oct 3, 2001, 12:15:58 PM10/3/01
to
> As to defraggers, I just don't trust 'em. Easier to just wipe out the
> fs and restore from backups.

It goes to convenience.. time.. reliability of tapes.. :)

I think the basic agreement reached throughout this thread is that we
should all install 80GB drives in our machines with large partitions so
that ext2 (or the Linux filesystem of choice, for that matter) has
plenty of space to arrange the files in the fashion it so desires.

--

Michael Lee Yohe (myohe+...@redhat.com)
Software Developer, Engineering Services
Red Hat, Inc.

QUIPd 0.12:
-> I don't trust anything I can't grep.
-> - Simon

Alan T. Bowler

unread,
Oct 3, 2001, 12:54:57 PM10/3/01
to

Even on Multics this was a problem. I talked to someone
administering a Multics system, who was complaining about
poor performance on the systems because so many of the
programs were just doing sequential processing of their
data files (using the usual language specific read/write
calls). Since these just translated into increasing
memory address, and page faults system would keep paging
in new pages. However since it did NOT know that this was
sequential access, and the just processed data pages were the
least recently used, the system was biased to keeping those
pages in memory. I.e. it kept trying to use memory for pages
that were never going to be used again. The effect was to
transform a smooth series of program requests into heavy
bursts of paging activity. The IO systems would be relatively
idle for a period and then be swamped. (Which is saying
something because one of the strengths of the Gcos/Multics
hardware model is high capacity/low-overhead I/O).

The guy was a Multics fan, and did not see this as
a problem with Multics itself, but rather with the
failure of the program designers. He was leaning on
them to reorganize and not do sequential processing.
The conversation did not go into what the programs were
actually doing, so I don't know if this was the right or
wrong approach. Sometimes sequential processing is the
"best" approach, sometimes it is not, and certainly there
are a large number of application programs that use either
direct access methods or sequential processing when the
other would be better for the application (but not necessarily
for the hosting environment).

cbbr...@acm.org

unread,
Oct 3, 2001, 1:17:58 PM10/3/01
to

The most relevant thing in recent development "memory" would be the
RVM (Reliable Virtual Memory) subsystem that's part of the Coda
distributed filesystem.

I'd be curious to know if they've done any strategizing to deal with
this sort of thing. Coda's goals are pretty different, so I doubt
it...
--
(concatenate 'string "cbbrowne" "@acm.org")
http://www.cbbrowne.com/info/internet.html
Roses are red,
Violets are blue,
I'm schizophrenic...
And I am too.

Tony Lawrence

unread,
Oct 3, 2001, 2:33:30 PM10/3/01
to
Michael Lee Yohe wrote:
>
> > As to defraggers, I just don't trust 'em. Easier to just wipe out the
> > fs and restore from backups.
>
> It goes to convenience.. time.. reliability of tapes.. :)

Tapes? Only if I have to. Up to 9 gig or so, I use dvd-ram:
http://pcunix.com/Reviews/dvdram.html

>
> I think the basic agreement reached throughout this thread is that we
> should all install 80GB drives in our machines with large partitions so
> that ext2 (or the Linux filesystem of choice, for that matter) has
> plenty of space to arrange the files in the fashion it so desires.


Which reminds me.. I have to add another drive to this box :-)

Yvan Loranger

unread,
Oct 3, 2001, 3:10:36 PM10/3/01
to
cbbr...@acm.org wrote:
> bq...@freenet.carleton.ca (Yvan Loranger) writes:
>
>>cbbr...@acm.org wrote:
>>
>>>None of the readahead that is appropriate for a FS is of any use with
>>>swap, because the patterns _aren't_ fixed.
>>>
>
>>I don't entirely agree; if some of a program's pages get paged back
>>in, isn't it likely that, as the user progresses in her work, some
>>more pages will soon be pulled into the working set?
>>
>
> Certainly it's possible that some data will be pulled back in, even
> probable.

You contradict this below, 2 times. A program's pages *can* offer
structure, especially if they are about to be paged back in. So
readahead could be beneficial (offhand I'd say especially at track
boundaries).

> But if the kernel is ignorant of which data that is likely to be, this
> doesn't provide us any policy to give to the kernel to "pre-fetch"
> data.
>
> With a filesystem, it may be reasonable to pull in the whole directory
> structure once you start reading it; it may be reasonable to pull in
> subsequent parts of a file once you start reading it.
>
> Both of those things involve a reasonably clear sequence. The kernel
> knows what a directory structure is, and that it represents a logical
> grouping of data, and the same is true for a file.
>
> The kernel _doesn't_ know anything about the structure of the data
> being swapped in and out.
>
> If this were Multics, where virtual memory was being visibly managed
> in the form of segments, there _would_ be some structure to it, and
> some readahead might become meaningful. (I'm speculating a bit here;
> I don't know Multics segments well enough to be certain.) But this is
> Linux; virtual memory is just a big unstructured bag of bytes.

How unfortunate that it's just a bag of bytes, if it really is that.
Otherwise, again, readahead could be beneficial.

[I've got to quit using rhetorical questions!]

Yvan Loranger

unread,
Oct 3, 2001, 3:10:42 PM10/3/01
to
John W Gintell wrote:

> cbbr...@acm.org wrote:
>>Certainly it's possible that some data will be pulled back in, even
>>probable.
>>
>>But if the kernel is ignorant of which data that is likely to be, this
>>doesn't provide us any policy to give to the kernel to "pre-fetch"
>>data.
>>
>>With a filesystem, it may be reasonable to pull in the whole directory
>>structure once you start reading it; it may be reasonable to pull in
>>subsequent parts of a file once you start reading it.
>>
>>Both of those things involve a reasonably clear sequence. The kernel
>>knows what a directory structure is, and that it represents a logical
>>grouping of data, and the same is true for a file.
>>
>>The kernel _doesn't_ know anything about the structure of the data
>>being swapped in and out.
>>
>>If this were Multics, where virtual memory was being visibly managed
>>in the form of segments, there _would_ be some structure to it, and
>>some read ahead might become meaningful. (I'm speculating a bit here;
>>I don't know Multics segments well enough to be certain.) But this is
>>Linux; virtual memory is just a big unstructured bag of bytes.
>
> If true sequential access can be detected, prefetching can sometimes pay
> off well.
>
> Early Multics had a concept (not ever implemented to my recollection) of
> page batching where forward adjacent pages could be read into memory;
> there was a page_batching_size attribute for each segment to control how
> much if any to read.
>
> Some modern disk caches detect if there is sequential access to a file
> and start pre-fetching the next record(s) where the number of records
> increases in subsequent reads.

A good idea but AFAIK disk caches don't intervene between memory &
virtual mem/swap.

Anne & Lynn Wheeler

unread,
Oct 3, 2001, 9:25:15 PM10/3/01
to
bq...@freenet.carleton.ca (Yvan Loranger) writes:
>
> You contradict this below, 2 times. A program's pages *can* offer
> structure, especially if they are about to be paged back in. So
> readahead could be beneficial, (offhand I'd say especially at track
> boundaries).

MVS & VM introduced the concept of "big pages" around 1980
... initially targeted for 3380s. When time came for certain page
replace operations ... a collection of a task's pages were selected,
equivalent to a track's worth and written out as a "big page". On a
page fault, if a page was a member of a "big page", the "big page" was
brought in as a unit. There were various rules about what was
considered candidates for membership in big page ... and various kinds
of trimming and segregation went on to try and cluster meaningful
members of big pages. At that time, a 3380 track held 10 4k-byte pages
... and so the implementation moved from doing nominal 4k-byte page
I/O operations to 40k-byte page I/O operations.

... aka virtual page access patterns & virtual page reference bits
prior to page-out ... provided "structure" for how "big pages" were
formed and therefor provided the "structure" for how individual
virtual pages were brought back in.

part of the motivation was the significant increase in 3380 data
transfer (compared to prior disks) w/o any compareable increase in arm
access and rotational delay. in effect, doing "big transfers, while it
may increase real storage requirements ... was otherwise utilizing a
resource (disk transfer) that was significantly underutilized.

some past refs on this subject:
http://www.garlic.com/~lynn/93.html#31 Big I/O or Kicking the Mainframe out the Door
http://www.garlic.com/~lynn/95.html#8 3330 Disk Drives
http://www.garlic.com/~lynn/95.html#10 Virtual Memory (A return to the past?)
http://www.garlic.com/~lynn/98.html#46 The god old days(???)
http://www.garlic.com/~lynn/99.html#4 IBM S/360
http://www.garlic.com/~lynn/99.html#6 3330 Disk Drives

--
Anne & Lynn Wheeler | ly...@garlic.com - http://www.garlic.com/~lynn/

cbbr...@acm.org

unread,
Oct 3, 2001, 11:05:25 PM10/3/01
to
bq...@freenet.carleton.ca (Yvan Loranger) writes:
> cbbr...@acm.org wrote:
> > bq...@freenet.carleton.ca (Yvan Loranger) writes:
> >>cbbr...@acm.org wrote:

> >>>None of the readahead that is appropriate for a FS is of any use
> >>>with swap, because the patterns _aren't_ fixed.

> >>I don't entirely agree; if some of a program's pages get paged
> >>back in, isn't it likely that, as the user progresses in her work,
> >>some more pages will soon be pulled into the working set?

> > Certainly it's possible that some data will be pulled back in,
> > even probable.

> You contradict this below, 2 times. A program's pages *can* offer
> structure, especially if they are about to be paged back in. So
> readahead could be beneficial, (offhand I'd say especially at track
> boundaries).

It would be surprising if there were not some structure to swap pages;
the problem is that it's not structure that the kernel can expect to
recognize the way it is aware of filesystem structure.

I do indeed give two counterexamples, but they're not related to
swapping, but rather to the structure that sits in filesystem code.

> > But if the kernel is ignorant of which data that is likely to be, this
> > doesn't provide us any policy to give to the kernel to "pre-fetch"
> > data.
> >
> > With a filesystem, it may be reasonable to pull in the whole directory
> > structure once you start reading it; it may be reasonable to pull in
> > subsequent parts of a file once you start reading it.
> >
> > Both of those things involve a reasonably clear sequence. The kernel
> > knows what a directory structure is, and that it represents a logical
> > grouping of data, and the same is true for a file.
> >
> > The kernel _doesn't_ know anything about the structure of the data
> > being swapped in and out.
> >
> > If this were Multics, where virtual memory was being visibly managed
> > in the form of segments, there _would_ be some structure to it, and
> > some readahead might become meaningful. (I'm speculating a bit here;
> > I don't know Multics segments well enough to be certain.) But this is
> > Linux; virtual memory is just a big unstructured bag of bytes.

> How unfortunate that it's just a bag of bytes, if it really is
> that. Otherwise, again, readahead could be beneficial.

C'est la vie.
--
(concatenate 'string "aa454" "@freenet.carleton.ca")
http://www.cbbrowne.com/info/nonrdbms.html
What's another word for synonym?

Mats Wichmann

unread,
Oct 4, 2001, 12:30:16 PM10/4/01
to
On Tue, 02 Oct 2001 18:16:15 -0400, bq...@freenet.carleton.ca (Yvan
Loranger) wrote:

Think of the virtual memory system as being composed of two things:

1. RAM
2. Backing store for stuff that isn't in RAM now, but can be brought
(back) in if needed

The backing store actually consists of two pieces itself:
1. The "swap partition" (hard to stamp out old names!) for data
2. The disk file for code

There's no need to page out the code pages, you just toss 'em, since
they're still available in the disk file.
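
You can see the split in /proc (output trimmed; addresses, inode
numbers and annotations are made up for illustration):

   $ cat /proc/$$/maps
   08048000-080a0000 r-xp 00000000 03:05 34567  /bin/bash   <- text: read-only, file-backed, just dropped
   080a0000-080a8000 rw-p 00058000 03:05 34567  /bin/bash   <- data: dirty pages must go to swap
   40200000-40280000 rw-p 00000000 00:00 0                  <- anonymous (heap): swap-backed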

Mats Wichmann

Michael Lee Yohe

unread,
Oct 4, 2001, 5:10:19 PM10/4/01
to

> Tapes? Only if I have to. Up to 9 gig or so, I use dvd-ram:
> http://pcunix.com/Reviews/dvdram.html


DVD-RAM is up to 9 gigs? That's pretty sweet. Last time I read about
the "main-stream" DVD-RAM drives I think they were sitting steady at
4.7GB. Are you using UDF in the kernel?

Nonetheless, I'm still waiting to see who wins the DVD[+-]RW battle. I sincerely hope it's the one where I can burn and then play back on regular DVD players. :) The Apple commercial was an instant win with me!

--

Michael Lee Yohe (myohe+...@redhat.com)
Software Developer, Engineering Services
Red Hat, Inc.

QUIPd 0.12:
-> I don't know exactly what democracy is. But we need more of it.
-> - Chinese Student, protests in Tianamen Square, Beijing, 1989

Tony Lawrence

unread,
Oct 4, 2001, 8:19:39 PM10/4/01
to
Michael Lee Yohe wrote:
>
> > Tapes? Only if I have to. Up to 9 gig or so, I use dvd-ram:
> > http://pcunix.com/Reviews/dvdram.html
>
> DVD-RAM is up to 9 gigs?

4.7 GB. With typical compression, about 9 GB of backup.

>That's pretty sweet. Last time I read about
> the "main-stream" DVD-RAM drives I think they were sitting steady at
> 4.7GB. Are you using UDF in the kernel?

No, I'm just using it to backup- not as a filesystem. I use the
Microlite product: http://pcunix.com/Reviews/backupedgess.html

>
> Nonetheless, I'm still waiting to see who wins the DVD[+-]RW battle. I sincerely hope it's the one where I burn and then can playback on regular DVD players. :) The Apple commercial was an instant win with me!


DVD is a mess, no doubt. But as my only interest is backup, I don't
really care about the other stuff.

Christopher Stacy

unread,
Oct 5, 2001, 2:36:19 AM10/5/01
to
ITS had a feature where programs could dynamically give
advice to the system about sequential "page ahead/behind".

John W Gintell

unread,
Oct 5, 2001, 10:19:55 AM10/5/01
to
Christopher Stacy wrote:
>
> ITS had a feature where programs could dynamically give
> advice to the system about sequential "page ahead/behind".

One of the problems with systems where programs give such advice is that
the advice is often wrong - or was right once but because of changes in
conditions about the data or usage, the advice is no longer valid. In
the early days of Fortran (late 50's (yes, there were computers back
then)), there was a FREQUENCY statement that the programmer used
to tell the compiler approximately how many times a DO loop was to be
executed or what the predicted likelihood of branching on an IF
statement was. IBM had an optimizing compiler that used this information
for register optimization and other things. Some experiments were performed
where the compiler was changed to ignore the FREQUENCY statement and
then some performance tests were run on large collections of programs.
As I recall the overall results were the same - some programs ran slower
and some ran faster.

Yvan Loranger

unread,
Oct 5, 2001, 2:42:49 PM10/5/01
to

Afraid not; the easiest counter-example is a program's variables, another is
relocation of program addresses, another is self-modifying programs [if
anybody is still doing that].

Christopher Stacy

unread,
Oct 6, 2001, 4:11:26 AM10/6/01
to
>>>>> On Fri, 05 Oct 2001 14:19:55 GMT, John W Gintell ("John") writes:

John> Christopher Stacy wrote:
>>
>> ITS had a feature where programs could dynamically give
>> advice to the system about sequential "page ahead/behind".

John> One of the problems with systems where programs give such advice is that
John> the advice is often wrong - or was right once but because of changes in
John> conditions about the data or usage, the advice is no longer valid.

That's interesting, but the FORTRAN FREQUENCY problem that you described
is about a compile-time decision in the face of unknown data.

The ITS feature that I was describing allows the running application
to dynamically provide the operating system with better hints about
the locality of reference.

The program already has to know about the data usage and is explicitly
allocating and mapping the data pages. It certainly knows what the
data structure is, how it's accessing it and how much it is willing
to bring into core at once. I don't see how it can change unless
the application is changed.

The access pattern depends on the application, the operating system
is unable to predict it (Multics always makes the wrong assumption
for this kind of application, and thrashes to death), so where would
you propose putting the control?

JD

unread,
Oct 6, 2001, 11:20:27 AM10/6/01
to

"John W Gintell" <gin...@shore.net> wrote in message news:3BBDC191...@shore.net...

> Christopher Stacy wrote:
> >
> > ITS had a feature where programs could dynamically give
> > advice to the system about sequential "page ahead/behind".
>
> One of the problems with systems where programs give such advice is that
> the advice is often wrong - or was right once but because of changes in
> conditions about the data or usage, the advice is no longer valid.
>
I put allowances in FreeBSD for paging ahead (and possibly behind). It
at least used to have extension of sync reads, and through filesystems
could use the same adaptive code for async read-aheads. Some of the
FreeBSD behavior was dynamic, and didn't require an explicit hint. Some
of the behavior was controlled by user-enabled hints.

John


Mats Wichmann

unread,
Oct 6, 2001, 2:16:59 PM10/6/01
to
On Fri, 05 Oct 2001 14:42:49 -0400, bq...@freenet.carleton.ca (Yvan
Loranger) wrote:


: > There's no need to page out the code pages, you just toss 'em, since


: > they're still available in the disk file.
:
:Afraid not, easiest counter-example is program's variables, another is
:relocation of program addresses, another is self-modifying programs [if
:anybody is still doing that].

Variables aren't in text, they're in data. Data pages /do/ need to be
paged out to someplace they can be restored from. The relocation
issue is resolved (for shared libs, mainly) by using
position-independent code, where jumps and branches are relative, not
absolute. VM handles the rest: the relocations are resolved by the
static linker (ld) and the VM system loads the code at the right
virtual address to make it work. And self-modifying code...well, some
folks were still doing it recently, but as code ("text") segments
have become read-only, it's done by building code on the /stack/.
That's been the source of some interesting exploits, too, so most
systems now either turn off by default, or have an option to turn off,
the ability to execute code from the stack segment. Two large-scale
commercial apps, SAS and Oracle, used to both build code on the fly on
the stack; I suspect they no longer do so for this reason.

Mats Wichmann

David E. Fox

unread,
Nov 5, 2001, 9:03:20 PM11/5/01
to
On Wed, 03 Oct 2001 05:52:41 -0500, Michael Lee Yohe <myohe+...@redhat.com>
wrote:

>/usr/local: 147/304608 files (36.7% non-contiguous), 224931/608454 blocks
>
>36.7% of the files are fragmented - which is quite reasonable
>considering 147 files are consuming 860M (pretty large files).

Since you mention large files, isn't the fact that ext2 uses block
pointers if the file is over 12 blocks long factored into the
fragmentation estimate? If the file is over 12 blocks long, then you
have another block located elsewhere on the disk that contains block
entries for the rest of the file; and if the file is really big, you
have a block that contains block entries that point to blocks (double
indirection) and maybe triple indirection if the files are *really*
large.

I'm not clear on where these 'pointer blocks' are stored - are they
really part of the filesystem (one would think) or are enough of them
allocated in each block group? Either way, one would think that as soon
as your file grows beyond 12 blocks, you have a possibility for fragmentation,
right?
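
One way to poke at it yourself (device and file names are made up; the
path is relative to that filesystem's root, and debugfs opens the
device read-only unless told otherwise):

   debugfs -R "stat /some/bigfile" /dev/hda7
   # the BLOCKS: line lists the data blocks along with the (IND)/(DIND)
   # indirect blocks, so you can see where the pointer blocks ended up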


>Michael Lee Yohe (myohe+...@redhat.com)
>Software Developer, Engineering Services
>Red Hat, Inc.


--
------------------------------------------------------------------------
David E. Fox Thanks for letting me
df...@tsoft.com change magnetic patterns
df...@m206-157.dsl.tsoft.com on your hard disk.
-----------------------------------------------------------------------
