[PATCH][2/2] SquashFS


Phillip Lougher

Mar 14, 2005, 12:53:58 PM
to Andrew Morton, Greg KH, linux-...@vger.kernel.org
patch2

Andrew Morton

Mar 14, 2005, 8:10:33 PM
to Phillip Lougher, gr...@kroah.com, linux-...@vger.kernel.org
Phillip Lougher <phi...@lougher.demon.co.uk> wrote:
>
>

Please don't send multiple patches with the same Subject:. Choose nice,
meaningful Subject:s for each patch. And include the relevant changelog
details within the email for each patch rather than in patch 1/N. See
http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt and
http://linux.yyz.us/patch-format.html.


> @@ -0,0 +1,439 @@

[lots of comments from patch 1/2 are applicable here]

> +#define SQUASHFS_MAX_FILE_SIZE ((long long) 1 << \
> + (SQUASHFS_MAX_FILE_SIZE_LOG - 1))

1LL would suit here. Or a cast to loff_t.
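[A sketch of the suggested fix. SQUASHFS_MAX_FILE_SIZE_LOG's real value
isn't shown in this thread; 32 below is a hypothetical stand-in for
illustration only.]

```c
/* Hypothetical value for illustration only -- the real definition is
 * in the quoted patch, not in this thread. */
#define SQUASHFS_MAX_FILE_SIZE_LOG 32

/* The suggested spelling: a 64-bit constant without the cast. */
#define SQUASHFS_MAX_FILE_SIZE \
        (1LL << (SQUASHFS_MAX_FILE_SIZE_LOG - 1))
```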

> +typedef unsigned int squashfs_block;
> +typedef long long squashfs_inode;

squashfs_block_t and squashfs_inode_t, please. If one must use typedefs...

> +typedef struct squashfs_super_block {
> + unsigned int s_magic;
> + unsigned int inodes;
> + unsigned int bytes_used;
> + unsigned int uid_start;
> + unsigned int guid_start;
> + unsigned int inode_table_start;
> + unsigned int directory_table_start;
> + unsigned int s_major:16;
> + unsigned int s_minor:16;
> + unsigned int block_size_1:16;
> + unsigned int block_log:16;
> + unsigned int flags:8;
> + unsigned int no_uids:8;
> + unsigned int no_guids:8;
> + unsigned int mkfs_time /* time of filesystem creation */;
> + squashfs_inode root_inode;
> + unsigned int block_size;
> + unsigned int fragments;
> + unsigned int fragment_table_start;
> +} __attribute__ ((packed)) squashfs_super_block;

Whoa. Tons of bitfields in this file. Are these on-disk data structures?
If so, that's a problem for portability between architectures and possibly
compiler versions. It also introduces locking complexity.

If they're in-core data structures then the bitfields are probably slower
than using `int', as well.

> +typedef struct {
> + unsigned int inode_type:4;
> + unsigned int mode:12; /* protection */
> + unsigned int uid:8; /* index into uid table */
> + unsigned int guid:8; /* index into guid table */
> +} __attribute__ ((packed)) squashfs_base_inode_header;

See, if one CPU is modifying `inode_type' while another CPU is modifying
`mode', this struct can get trashed.

> +/*
> + * macros to convert each packed bitfield structure from little endian to big
> + * endian and vice versa. These are needed when creating or using a filesystem
> + * on a machine with different byte ordering to the target architecture.
> + *
> + */

hmm, OK.. Tell us more?

> + * bitfields and different bitfield placing conventions on differing
> + * architectures
> + */
> +
> +#include <asm/byteorder.h>
> +
> +#ifdef __BIG_ENDIAN
> + /* convert from little endian to big endian */
> +#define SQUASHFS_SWAP(value, p, pos, tbits) _SQUASHFS_SWAP(value, p, pos, \
> + tbits, b_pos)
> +#else
> + /* convert from big endian to little endian */
> +#define SQUASHFS_SWAP(value, p, pos, tbits) _SQUASHFS_SWAP(value, p, pos, \
> + tbits, 64 - tbits - b_pos)
> +#endif
> +
> +#define _SQUASHFS_SWAP(value, p, pos, tbits, SHIFT) {\
> + int bits;\
> + int b_pos = pos % 8;\
> + unsigned long long val = 0;\
> + unsigned char *s = (unsigned char *)p + (pos / 8);\
> + unsigned char *d = ((unsigned char *) &val) + 7;\
> + for(bits = 0; bits < (tbits + b_pos); bits += 8) \
> + *d-- = *s++;\
> + value = (val >> (SHIFT))/* & ((1 << tbits) - 1)*/;\
> +}

Can the standard leXX_to_cpu() helpers not be used here?

> +#include <linux/squashfs_fs.h>
> +
> +typedef struct {
> + unsigned int block;
> + int length;
> + unsigned int next_index;
> + char *data;
> + } squashfs_cache;

Whitespace inconsistency (column 1 for the closing brace is standard)

--- linux-2.6.11.3/init/do_mounts_rd.c 2005-03-13 06:44:30.000000000 +0000
+++ linux-2.6.11.3-squashfs/init/do_mounts_rd.c 2005-03-14 00:53:28.092559728 +0000

Your changelog didn't mention that squashfs interacts with the boot
process. That's the sort of thing which is nice to tell people about.

> +SQUASHFS FILESYSTEM
> +P: Phillip Lougher
> +M: phi...@lougher.demon.co.uk
> +W: http://squashfs.sourceforge.net
> +L: squashf...@lists.sourceforge.net
> +S: Maintained
> +

Lots of little comments, but I have no fundamental problems with the
patches as long as the bitfield issue is shown to be a non-issue.

Please respin the patches and unless someone else sees a showstopper I'll
merge them into -mm for further testing and review, thanks.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Phillip Lougher

unread,
Mar 14, 2005, 9:32:12 PM3/14/05
to Andrew Morton, gr...@kroah.com, linux-...@vger.kernel.org

On Tuesday, March 15, 2005, at 01:06 am, Andrew Morton wrote:

> Phillip Lougher <phi...@lougher.demon.co.uk> wrote:
>
>> @@ -0,0 +1,439 @@
>
> [lots of comments from patch 1/2 are applicable here]
>

OK. Noted :-)

>> +#define SQUASHFS_MAX_FILE_SIZE ((long long) 1 << \
>> + (SQUASHFS_MAX_FILE_SIZE_LOG - 1))
>
> 1LL would suit here. Or a cast to loff_t.
>

OK

>> +typedef unsigned int squashfs_block;
>> +typedef long long squashfs_inode;
>
> squashfs_block_t and squashfs_inode_t, please. If one must use
> typedefs...
>

OK

> Whoa. Tons of bitfields in this file. Are these on-disk data structures?
> If so, that's a problem for portability between architectures and possibly
> compiler versions. It also introduces locking complexity.

They look pretty nasty, but are quite harmless really...

The structures represent on-disk structures. Squashfs tries to cram as much
information into as small an area as possible on disk, which is why they're
using bitfields.

The structures are read into memory from disk into the bit-field structure,
and the information is immediately transferred to more sane 'int' structures
inside the inode or into private Squashfs data, and all reads/writes take
place from there. No writes are made into the bit fields; they're only used
to temporarily 'parse' the packed data on disk.

I've done a lot of checking to ensure portability across architectures and
against different compiler versions. Gcc uniformly uses two representations
for 'packed structures', one for little endian architectures and one for
big endian architectures. Little endian bitfield structures are packed
low-byte high-byte order, allocating bitfields from low bit to high bit in
ints. Big endian structures are packed high-byte low-byte order, allocating
bitfields from high bit to low bit in ints (this incidentally generates
structures in the bit/byte order specified in the C source). The filling is
done this way on different endian architectures as it allows the most
efficient bit-field access code to be generated for each endian
architecture.

I've checked compatibility against Intel 32 and 64 bit architectures,
PPC 32/64 bit, ARM, MIPS and SPARC. I've used compilers from 2.91.x
up to 3.4...

>> +typedef struct {
>> + unsigned int inode_type:4;
>> + unsigned int mode:12; /* protection */
>> + unsigned int uid:8; /* index into uid table */
>> + unsigned int guid:8; /* index into guid table */
>> +} __attribute__ ((packed)) squashfs_base_inode_header;
>
> See, if one CPU is modifying `inode_type' while another CPU is
> modifying
> `mode', this struct can get trashed.

I agree. This is why the structures are never written to. Bit fields are
slow; I move the data out as soon as possible.

>
>> +/*
>> + * macros to convert each packed bitfield structure from little
>> endian to big
>> + * endian and vice versa. These are needed when creating or using a
>> filesystem
>> + * on a machine with different byte ordering to the target
>> architecture.
>> + *
>> + */
>
> hmm, OK.. Tell us more?
>

As mentioned previously, there are two packed bit-field representations,
one for big endian machines and one for little endian machines. For
efficiency in embedded systems, Squashfs writes little endian filesystems
(with little endian bit-field structures) for little endian targets, and
big endian filesystems for big endian targets. However, to allow non-native
endian filesystems (i.e. where the host is little endian but the target is
big endian) to be mounted, Squashfs will swap the filesystem on a different
endian machine.

Squashfs at filesystem mount time determines if the filesystem is swapped
with respect to the host architecture. If it is, then the packed bit-field
structures read off disk are in the wrong endianness. Immediately after
reading off disk, the structures are converted to the correct endianness
for the architecture, and are then processed as normal.

Due to the different bit-field filling rules between big endian and little
endian machines, bit fields are in different places within the structure
for each architecture. This means that when converting the endianness of a
structure, the structure has to be converted as a whole. For each bit field
the macros are given the 'logical' position of the bit field and use that
to find the bit field in the non-native structure using the non-native
structure filling rules.

No, unfortunately the standard leXX_to_cpu() helpers can't be used here.
The above hopefully describes why.

The swap macro is IMHO quite concise and efficient; the same macro is used
to swap from little endian to big endian, and from big endian to little
endian. The only difference is the _SQUASHFS_SWAP value, which either
counts down from 64 bits to 0 (for high-bit low-bit filling order on big
endian machines), or counts up from 0 to 64 (for low-bit high-bit filling
order on little endian machines). For efficiency this value is determined
at compile time.

I believe doing the work another way would make the code more difficult to
understand and less efficient?
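[As an editor's illustration, not code from the patch: the low-bit-to-high-bit
filling rule described above can be expressed as a plain extraction function.
A big endian host would use the mirror-image rule, which is exactly the
difference the two SQUASHFS_SWAP variants encode in their SHIFT argument.]

```c
/* Illustrative only: read a tbits-wide field that starts pos bits into
 * a buffer packed with little endian filling rules (bitfields allocated
 * from low bit to high bit within each byte). */
static unsigned long long get_le_bits(const unsigned char *p, int pos,
                                      int tbits)
{
        unsigned long long v = 0;
        int i;

        for (i = 0; i < tbits; i++) {
                int bit = pos + i;

                /* bit i of the field is bit (pos + i) of the buffer */
                v |= (unsigned long long)((p[bit / 8] >> (bit % 8)) & 1) << i;
        }
        return v;
}
```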


>> +#include <linux/squashfs_fs.h>
>> +
>> +typedef struct {
>> + unsigned int block;
>> + int length;
>> + unsigned int next_index;
>> + char *data;
>> + } squashfs_cache;
>
> Whitespace inconsistency (column 1 for the closing brace is standard)
>
> --- linux-2.6.11.3/init/do_mounts_rd.c 2005-03-13 06:44:30.000000000 +0000
> +++ linux-2.6.11.3-squashfs/init/do_mounts_rd.c 2005-03-14 00:53:28.092559728 +0000
>
> Your changelog didn't mention that squashfs interacts with the boot
> process. That's the sort of thing which is nice to tell people about.
>

Ok.

>> +SQUASHFS FILESYSTEM
>> +P: Phillip Lougher
>> +M: phi...@lougher.demon.co.uk
>> +W: http://squashfs.sourceforge.net
>> +L: squashf...@lists.sourceforge.net
>> +S: Maintained
>> +
>
> Lots of little comments, but I have no fundamental problems with the
> patches as long as the bitfield issue is shown to be a non-issue.
>
> Please respin the patches and unless someone else sees a showstopper
> I'll
> merge them into -mm for further testing and review, thanks.
>
>

Thanks

Phillip

Andrew Morton

Mar 14, 2005, 10:03:46 PM
to Phillip Lougher, gr...@kroah.com, linux-...@vger.kernel.org
Phillip Lougher <phi...@lougher.demon.co.uk> wrote:
>
> [ on-disk bitfields ]

>
> I've checked compatibility against Intel 32 and 64 bit architectures,
> PPC 32/64 bit, ARM, MIPS and SPARC. I've used compilers from 2.91.x
> up to 3.4...

hm, OK. I remain a bit skeptical but it sounds like you're the expert. I
guess if things later explode it will be pretty obvious, and the filesystem
will need rework.

One thing which I assume we don't know at this stage is whether all 27
architectures work as expected - you can bet ia64 does it differently ;)

How does one test that? Create a filesystem-in-a-file via mksquashfs, then
transfer that to a different box, then try and mount and use it, I assume?

When you upissue these patches, please include in the changelog pointers to
the relevant userspace support tools - mksquashfs, fsck.squashfs, etc. I
guess http://squashfs.sourceforge.net/ will suit.

Also, this filesystem seems to do the same thing as cramfs. We'd need to
understand in some detail what advantages squashfs has over cramfs to
justify merging it. Again, that is something which is appropriate to the
changelog for patch 1/1.

Matt Mackall

Mar 14, 2005, 10:17:11 PM
to Phillip Lougher, Andrew Morton, Greg KH, linux-...@vger.kernel.org
On Mon, Mar 14, 2005 at 04:30:33PM +0000, Phillip Lougher wrote:

> +config SQUASHFS_1_0_COMPATIBILITY
> + bool "Include support for mounting SquashFS 1.x filesystems"

How common are these? It would be nice not to bring in legacy code.

> +#define SERROR(s, args...) do { \
> + if (!silent) \
> + printk(KERN_ERR "SQUASHFS error: "s, ## args);\
> + } while(0)

Why would we ever want to be silent about something of KERN_ERR
severity? Isn't that a better job for klogd?

> +#define SQUASHFS_MAGIC 0x73717368
> +#define SQUASHFS_MAGIC_SWAP 0x68737173

Again, what's the story here? Is this purely endian conversion or do
filesystems of both endian persuasions exist? If the latter, let's not
keep that legacy. Pick an order, and use endian conversion functions
unconditionally everywhere.

> +#define SQUASHFS_COMPRESSED_SIZE_BLOCK(B) (((B) & \
> + ~SQUASHFS_COMPRESSED_BIT_BLOCK) ? (B) & \
> + ~SQUASHFS_COMPRESSED_BIT_BLOCK : SQUASHFS_COMPRESSED_BIT_BLOCK)

Shortening all these macro names would be nice..

> +typedef unsigned int squashfs_block;
> +typedef long long squashfs_inode;

Eh? Seems we can have many more inodes than blocks? What sorts of
volume limits do we have here?

> + unsigned int s_major:16;
> + unsigned int s_minor:16;

What's going on here? s_minor's not big enough for modern minor
numbers.

> +typedef struct {
> + unsigned int index:27;
> + unsigned int start_block:29;
> + unsigned char size;

Eep. Not sure how bit-fields handle crossing word boundaries, would be
surprised if this were very portable.

> + * macros to convert each packed bitfield structure from little endian to big
> + * endian and vice versa. These are needed when creating or using a filesystem
> + * on a machine with different byte ordering to the target architecture.
> + *
> + */

> +
> +#define SQUASHFS_SWAP_SUPER_BLOCK(s, d) {\
> + SQUASHFS_MEMSET(s, d, sizeof(squashfs_super_block));\
> + SQUASHFS_SWAP((s)->s_magic, d, 0, 32);\
> + SQUASHFS_SWAP((s)->inodes, d, 32, 32);\
> + SQUASHFS_SWAP((s)->bytes_used, d, 64, 32);\
> + SQUASHFS_SWAP((s)->uid_start, d, 96, 32);\
> + SQUASHFS_SWAP((s)->guid_start, d, 128, 32);\
> + SQUASHFS_SWAP((s)->inode_table_start, d, 160, 32);\
> + SQUASHFS_SWAP((s)->directory_table_start, d, 192, 32);\
> + SQUASHFS_SWAP((s)->s_major, d, 224, 16);\
> + SQUASHFS_SWAP((s)->s_minor, d, 240, 16);\
> + SQUASHFS_SWAP((s)->block_size_1, d, 256, 16);\
> + SQUASHFS_SWAP((s)->block_log, d, 272, 16);\
> + SQUASHFS_SWAP((s)->flags, d, 288, 8);\
> + SQUASHFS_SWAP((s)->no_uids, d, 296, 8);\
> + SQUASHFS_SWAP((s)->no_guids, d, 304, 8);\
> + SQUASHFS_SWAP((s)->mkfs_time, d, 312, 32);\
> + SQUASHFS_SWAP((s)->root_inode, d, 344, 64);\
> + SQUASHFS_SWAP((s)->block_size, d, 408, 32);\
> + SQUASHFS_SWAP((s)->fragments, d, 440, 32);\
> + SQUASHFS_SWAP((s)->fragment_table_start, d, 472, 32);\
> +}

Are those positions in bits? If you're going to go to the trouble of
swapping the whole thing, I think it'd be easier to just unpack and
endian-convert the thing so that we didn't have the overhead of bitfields
and unpacking except at read/write time. Something like:

void pack(void *src, void *dest, pack_table_t *e);
void unpack(void *src, void *dest, pack_table_t *e);
size_t pack_size(pack_table_t);

where e is an array containing basically the info you have in the
above macros for each element: offset into unpacked structure,
starting bit in packed structure, and packed bits.
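[A minimal sketch of that suggestion might look like the following. The
names (pack_entry, unpack_fields) and the unsigned-int-only destination
fields are assumptions for illustration, not anything from the patch.]

```c
#include <stddef.h>

/* Hypothetical pack-table entry: where the field lives in the unpacked
 * struct, where its bits start in the packed on-disk image, and how
 * wide it is. */
struct pack_entry {
        size_t dst_off;         /* offsetof() into the unpacked struct */
        int src_bit;            /* starting bit in the packed image */
        int nbits;              /* field width in bits */
};

/* Walk the table, pulling each little-endian-packed field out of src
 * and storing it as a plain unsigned int in dst. */
static void unpack_fields(const unsigned char *src, void *dst,
                          const struct pack_entry *tbl, int n)
{
        int i, b;

        for (i = 0; i < n; i++) {
                unsigned int v = 0;

                for (b = 0; b < tbl[i].nbits; b++) {
                        int bit = tbl[i].src_bit + b;

                        v |= ((src[bit / 8] >> (bit % 8)) & 1u) << b;
                }
                *(unsigned int *)((char *)dst + tbl[i].dst_off) = v;
        }
}
```

The packing direction (pack) would be the symmetric loop; pack_size() would
just sum the nbits column and round up to bytes.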

--
Mathematics is the supreme nostalgia of our time.

Greg KH

Mar 15, 2005, 12:47:51 AM
to Phillip Lougher, Andrew Morton, linux-...@vger.kernel.org
On Mon, Mar 14, 2005 at 04:30:33PM +0000, Phillip Lougher wrote:
> +typedef unsigned int squashfs_block;
> +typedef long long squashfs_inode;

Try using u32 and u64 instead.

> +typedef unsigned int squashfs_uid;

Why is this a typedef?

> +
> +typedef struct squashfs_super_block {

Don't typedef structures, it's not the kernel way.

thanks,

greg k-h

Paulo Marques

Mar 15, 2005, 1:30:53 PM
to Andrew Morton, Phillip Lougher, gr...@kroah.com, linux-...@vger.kernel.org
Andrew Morton wrote:
> [...]

> Also, this filesystem seems to do the same thing as cramfs. We'd need to
> understand in some detail what advantages squashfs has over cramfs to
> justify merging it. Again, that is something which is appropriate to the
> changelog for patch 1/1.

Well, probably Phillip can answer this better than me, but the main
differences that affect end users (and that is why we are using SquashFS
right now) are:
                      CRAMFS    SquashFS

Max File Size         16Mb      4Gb
Max Filesystem Size   256Mb     4Gb?
UID/GID               8 bits    32 bits
Block Size            4K        default 64k

Probably the block size is most responsible for this, but the compression
ratio achieved by SquashFS is much higher than that achieved with cramfs.

I just wanted to say one thing on behalf of SquashFS. We've been using
SquashFS in production on a POS system we sell, and we have currently
more than 1200 of these in use. There was never a problem reported that
involved SquashFS.

Although the workload patterns of these systems are probably very
similar (so the quantity doesn't really matter much), it is a real world
test of the filesystem, nevertheless.

--
Paulo Marques - www.grupopie.com

All that is necessary for the triumph of evil is that good men do nothing.
Edmund Burke (1729 - 1797)

Phillip Lougher

Mar 15, 2005, 1:44:15 PM
to Andrew Morton, gr...@kroah.com, linux-...@vger.kernel.org
Andrew Morton wrote:
> Phillip Lougher <phi...@lougher.demon.co.uk> wrote:
>
>>[ on-disk bitfields ]
>>
>>I've checked compatibility against Intel 32 and 64 bit architectures,
>> PPC 32/64 bit, ARM, MIPS and SPARC. I've used compilers from 2.91.x
>> up to 3.4...
>
>
> hm, OK. I remain a bit skeptical but it sounds like you're the expert. I
> guess if things later explode it will be pretty obvious, and the filesystem
> will need rework.
>
> One thing which I assume we don't know at this stage is whether all 27
> architectures work as expected - you can bet ia64 does it differently ;)
>
> How does one test that? Create a filesystem-in-a-file via mksquashfs, then
> transfer that to a different box, then try and mount and use it, I assume?
>

Yes, slow and laborious, but it works...

> When you upissue these patches, please include in the changelog pointers to
> the relevant userspace support tools - mksquashfs, fsck.squashfs, etc. I
> guess http://squashfs.sourceforge.net/ will suit.
>

OK.

> Also, this filesystem seems to do the same thing as cramfs. We'd need to
> understand in some detail what advantages squashfs has over cramfs to
> justify merging it. Again, that is something which is appropriate to the
> changelog for patch 1/1.
>

OK. Squashfs has much better compression and is much faster than
cramfs, which is why many embedded systems that used cramfs have moved
over to squashfs. Additionally squashfs is used in liveCDs (where
cramfs can't be used because of its max 256MB size limit), where it is
slowly taking over from cloop, again because it compresses better and is
faster.

Both these groups have been asking for squashfs to be in the mainline
kernel.

I can put the above rationale and a pointer to some performance
statistics in the changelog, will that be sufficient?

Phillip

Phillip Lougher

Mar 15, 2005, 7:42:26 PM
to Matt Mackall, Andrew Morton, Greg KH, linux-...@vger.kernel.org
Matt Mackall wrote:
> On Mon, Mar 14, 2005 at 04:30:33PM +0000, Phillip Lougher wrote:
>
>
>>+config SQUASHFS_1_0_COMPATIBILITY
>>+ bool "Include support for mounting SquashFS 1.x filesystems"
>
>
> How common are these? It would be nice not to bring in legacy code.
>

Squashfs 1.x filesystems were the previous file format. Embedded
systems tend to be conservative, and so there are quite a few systems
out there using 1.x filesystems. I've also heard of quite a few cases
where Squashfs is used as an archival filesystem, and so there's
probably quite a few 1.x filesystems around for this reason.

One issue which I'm aware of here is deciding what getting squashfs
support into the kernel is meant to answer. I'm asking for it to be put
into the kernel because developers out there are asking me to put it in
the kernel - because they don't want to continually (re)patch their kernels.

If I drop too much support from the kernel patch, then the kernel
squashfs support will not be adequate, and the developers will still
have to patch their kernels with my third-party patches.

Before I submitted this patch I factored out the Squashfs 1.x code into
a separate file only built if this option is selected. Obviously this
reduces the built kernel size (by 6K - 8K depending on architecture),
but doesn't address the issue of "legacy" code in the kernel.

If people don't want support for 1.x filesystems in the patch, then I
will drop it... Opinions?

>>+#define SERROR(s, args...) do { \
>>+ if (!silent) \
>>+ printk(KERN_ERR "SQUASHFS error: "s, ## args);\
>>+ } while(0)
>
>
> Why would we ever want to be silent about something of KERN_ERR
> severity? Isn't that a better job for klogd?
>

Silent is a parameter passed into the superblock read routine at mount
time. It appears to be intended to ensure the filesystem is silent
about failed mounts, which is what I use it for.

The macro is only used by the superblock read routine, and so I'll
replace it with direct printks.

>
>>+#define SQUASHFS_MAGIC 0x73717368
>>+#define SQUASHFS_MAGIC_SWAP 0x68737173
>
>
> Again, what's the story here? Is this purely endian conversion or do
> filesystems of both endian persuasions exist? If the latter, let's not
> keep that legacy. Pick an order, and use endian conversion functions
> unconditionally everywhere.

This is _certainly_ not legacy code. Squashfs deliberately supports
filesystems of both endian persuasions for efficiency in embedded
systems. Swapping data structures is an unnecessary overhead which can
be avoided if the filesystem is in the native byte order - embedded
systems often need all the performance optimisations possible,
especially in the filesystem to reduce initial 'turn-on' start up delay.

Picking an order will impose unnecessary overhead on the losing
architecture. When Linux was almost exclusively running on little
endian machines, having little endian only filesystems probably didn't
matter (but still not nice in my view), however, Linux now runs on lots
of different architectures. In the embedded market the PowerPC (big
endian) makes up a large percentage of the machines running Linux.

In short SquashFS will always be a dual endian filesystem.

Incidentally, cramfs is also a dual endian filesystem (not by design, but
by virtue of the fact it writes filesystems in the host byte order).
No-one seems to be complaining there.
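[For concreteness, the mount-time detection that the two magics quoted
earlier make possible can be sketched like this. This is an editor's
sketch, not the patch's actual superblock code; the function name and
return convention are illustrative.]

```c
#define SQUASHFS_MAGIC          0x73717368
#define SQUASHFS_MAGIC_SWAP     0x68737173

/* A native-order filesystem presents SQUASHFS_MAGIC; an opposite-endian
 * one presents the byte-swapped value; anything else is not squashfs.
 * Returns 1 if recognised, setting *swapped accordingly. */
static int squashfs_check_magic(unsigned int s_magic, int *swapped)
{
        if (s_magic == SQUASHFS_MAGIC) {
                *swapped = 0;
                return 1;
        }
        if (s_magic == SQUASHFS_MAGIC_SWAP) {
                *swapped = 1;
                return 1;
        }
        return 0;
}
```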

>
>
>>+#define SQUASHFS_COMPRESSED_SIZE_BLOCK(B) (((B) & \
>>+ ~SQUASHFS_COMPRESSED_BIT_BLOCK) ? (B) & \
>>+ ~SQUASHFS_COMPRESSED_BIT_BLOCK : SQUASHFS_COMPRESSED_BIT_BLOCK)
>
>
> Shortening all these macro names would be nice..
>
>
>>+typedef unsigned int squashfs_block;
>>+typedef long long squashfs_inode;
>
>
> Eh? Seems we can have many more inodes than blocks? What sorts of
> volume limits do we have here?

For efficiency Squashfs encodes the location of inode data on disk
within the inode number, this means the inode can be directly read
without an intermediate inode to disk block lookup. Because SquashFS
compresses metadata the inode data location consists of a tuple: the
location of the compressed block the inode is within, and the offset
within the uncompressed block of the inode data itself.

The filesystem can be 4GB in size which requires 32 bits for the block
location. An uncompressed metadata block is 8KB, which requires 13 bits
for the block offset. A Squashfs inode is consequently 45 bits in size.
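[The tuple described above can be sketched as a pair of pack/unpack
macros. The names and the exact shift are the editor's illustration of
the 32+13 bit layout just described, not the patch's own definitions.]

```c
typedef long long squashfs_inode_t;

/* Illustration of the described layout: 32 bits of compressed-block
 * start, 13 bits of offset within the 8K uncompressed metadata block,
 * giving a 45-bit inode that fits comfortably in a long long. */
#define SQFS_MKINODE(block, offset) \
        (((squashfs_inode_t)(block) << 13) | ((offset) & 0x1fff))
#define SQFS_INODE_BLOCK(i)     ((unsigned int)((i) >> 13))
#define SQFS_INODE_OFFSET(i)    ((unsigned int)((i) & 0x1fff))
```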


>
>
>>+ unsigned int s_major:16;
>>+ unsigned int s_minor:16;
>
>
> What's going on here? s_minor's not big enough for modern minor
> numbers.
>

What is the modern size then?

>
>>+typedef struct {
>>+ unsigned int index:27;
>>+ unsigned int start_block:29;
>>+ unsigned char size;
>
>
> Eep. Not sure how bit-fields handle crossing word boundaries, would be
> surprised if this were very portable.

It is. Please see earlier reply on the same subject to Andrew Morton.

As mentioned in the previous reply to Andrew Morton, the macros _are_
simply endian converting and unpacking the data at disk read-off time,
once this is performed there is no further bit field overhead.

Andrew Morton

Mar 15, 2005, 8:01:01 PM
to Phillip Lougher, m...@selenic.com, gr...@kroah.com, linux-...@vger.kernel.org
Phillip Lougher <phi...@lougher.demon.co.uk> wrote:
>
> >>+ unsigned int s_major:16;
> >>+ unsigned int s_minor:16;
> >
> >
> > What's going on here? s_minor's not big enough for modern minor
> > numbers.
> >
>
> What is the modern size then?

10 bits of major, 20 bits of minor.

As this is an on-disk thing, you're kinda stuck. A number of filesystems
have this problem. We used tricks in the inode to support it in ext2 and
ext3.

Matt Mackall

Mar 15, 2005, 8:09:14 PM
to Phillip Lougher, Andrew Morton, Greg KH, linux-...@vger.kernel.org
On Tue, Mar 15, 2005 at 11:25:07PM +0000, Phillip Lougher wrote:
> Matt Mackall wrote:
> >
> >>+config SQUASHFS_1_0_COMPATIBILITY
> >>+ bool "Include support for mounting SquashFS 1.x filesystems"
> >
> >How common are these? It would be nice not to bring in legacy code.
>
> Squashfs 1.x filesystems were the previous file format. Embedded
> systems tend to be conservative, and so there are quite a few systems
> out there using 1.x filesystems. I've also heard of quite a few cases
> where Squashfs is used as an archival filesystem, and so there's
> probably quite a few 1.x fileystems around for this reason.
>
> One issue which I'm aware of here is deciding what getting squashfs
> support into the kernel is meant to answer. I'm asking for it to be put
> into the kernel because developers out there are asking me to put it in
> the kernel - because they don't want to continually (re)patch their kernels.

My suggestion would be to break out the 1.x code into a separate patch
and encourage everyone to convert to 2.x. No one has ever created a
1.x fs with the expectation it'll work on an unpatched kernel, so they
don't lose anything. And no one should be creating such any more, right?

> >>+ unsigned int s_major:16;
> >>+ unsigned int s_minor:16;
> >
> >What's going on here? s_minor's not big enough for modern minor
> >numbers.
>
> What is the modern size then?

Minors are 22 bits, majors are 10. May grow to 32 each at some point.

--
Mathematics is the supreme nostalgia of our time.

Matt Mackall

Mar 15, 2005, 11:21:00 PM
to Phillip Lougher, Andrew Morton, Greg KH, linux-...@vger.kernel.org
On Tue, Mar 15, 2005 at 05:04:32PM -0800, Matt Mackall wrote:
> On Tue, Mar 15, 2005 at 11:25:07PM +0000, Phillip Lougher wrote:
> > >>+ unsigned int s_major:16;
> > >>+ unsigned int s_minor:16;
> > >
> > >What's going on here? s_minor's not big enough for modern minor
> > >numbers.
> >
> > What is the modern size then?
>
> Minors are 22 bits, majors are 10. May grow to 32 each at some point.

Both akpm and I remembered wrong, fyi. It's 12 major bits, 20 minor.
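[For reference, the 12/20 split Matt quotes corresponds to the kernel's
32-bit internal dev_t layout, roughly as in linux/kdev_t.h -- sketched
from memory here, not quoted verbatim.]

```c
/* Kernel-internal dev_t split: 12 bits of major, 20 bits of minor. */
#define MINORBITS       20
#define MINORMASK       ((1U << MINORBITS) - 1)

#define MAJOR(dev)      ((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev)      ((unsigned int)((dev) & MINORMASK))
#define MKDEV(ma, mi)   (((ma) << MINORBITS) | (mi))
```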

Pavel Machek

Mar 21, 2005, 5:20:47 AM
to Paulo Marques, Andrew Morton, Phillip Lougher, gr...@kroah.com, linux-...@vger.kernel.org
Hi!

> >Also, this filesystem seems to do the same thing as cramfs. We'd need to
> >understand in some detail what advantages squashfs has over cramfs to
> >justify merging it. Again, that is something which is appropriate to the
> >changelog for patch 1/1.
>
> Well, probably Phillip can answer this better than me, but the main
> differences that affect end users (and that is why we are using SquashFS
> right now) are:
> CRAMFS SquashFS
>
> Max File Size 16Mb 4Gb
> Max Filesystem Size 256Mb 4Gb?

So we are replacing severely-limited cramfs with also-limited
squashfs... For live DVDs etc 4Gb filesystem size limit will hurt for
sure, and 4Gb file size limit will hurt, too. Can those be fixed?

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

Phillip Lougher

Mar 21, 2005, 12:37:18 PM
to Pavel Machek, Paulo Marques, Andrew Morton, gr...@kroah.com, linux-...@vger.kernel.org
Pavel Machek wrote:
> Hi!
>
>
>>>Also, this filesystem seems to do the same thing as cramfs. We'd need to
>>>understand in some detail what advantages squashfs has over cramfs to
>>>justify merging it. Again, that is something which is appropriate to the
>>>changelog for patch 1/1.
>>
>>Well, probably Phillip can answer this better than me, but the main
>>differences that affect end users (and that is why we are using SquashFS
>>right now) are:
>> CRAMFS SquashFS
>>
>>Max File Size 16Mb 4Gb
>>Max Filesystem Size 256Mb 4Gb?
>
>
> So we are replacing severely-limited cramfs with also-limited
> squashfs...

I think that's rather unfair, Squashfs is significantly better than
cramfs. The main aim of Squashfs has been to achieve the best
compression (using zlib of course) of any filesystem under Linux - which
it does, while also being the fastest. Moving beyond the 4Gb limit has
been a goal, but it has been a secondary goal. For most applications
4Gb compressed (this equates to 8Gb or more of uncompressed data in most
usual cases) is ok.

> For live DVDs etc 4Gb filesystem size limit will hurt for
> sure, and 4Gb file size limit will hurt, too. Can those be fixed?

Almost everything can be fixed given enough time and money.
Unfortunately for Squashfs, I don't have much of either. I'm not paid
to work on Squashfs and so it has to be done in my free time. I'm hoping
to get greater than 4Gb support this year, it all depends on how much
free time I get.

Phillip

> Pavel

Mws

Mar 21, 2005, 1:10:21 PM
to Pavel Machek, linux-...@vger.kernel.org, Andrew Morton
hi everybody, hi pavel

>On Monday 21 March 2005 11:14, you wrote:
> Hi!
>
> > >Also, this filesystem seems to do the same thing as cramfs. We'd need to
> > >understand in some detail what advantages squashfs has over cramfs to
> > >justify merging it. Again, that is something which is appropriate to the
> > >changelog for patch 1/1.
> >
> > Well, probably Phillip can answer this better than me, but the main
> > differences that affect end users (and that is why we are using SquashFS
> > right now) are:
> > CRAMFS SquashFS
> >
> > Max File Size 16Mb 4Gb
> > Max Filesystem Size 256Mb 4Gb?
>
> So we are replacing severely-limited cramfs with also-limited
> squashfs... For live DVDs etc 4Gb filesystem size limit will hurt for
> sure, and 4Gb file size limit will hurt, too. Can those be fixed?
>
> Pavel

no - squashfs _is_ indeed an advantage for embedded systems in
comparison to cramfs. why does everybody think about huge systems
with tons of RAM, CPU power, whatever - there are also small embedded
systems which have really small resources.

some notes maybe parts are OT - but imho it must be said someday

- reviewing the code is absolutely ok.
- adding comments helps the coder and also the users to understand
_how_ kernel coding is to be meant

- but why can't people just stop to blame every really good thing?

in this case it means:
of course cramfs and squashfs are two different solutions for saving data
in embedded environments like set-top boxes, PDAs, etc., but there is
a need for having inventions such as higher compression rates or more data
security.

in other cases that means:
of course there are finished network drivers from SysKonnect/Marvell/Yukon
for the GBit network interfaces. they were also sent to the ml, but nearly
the same thing happened to them: review of the code, criticism, and
rejection of their code.

this all ends up in unsupported hardware - or - someday supported hardware because
somebody is in real need of those features and just publishes the patches online like a
DIY patchset for different kernel versions.

Hasn't it been the aim of Linux to run on different architectures, support lots of filesystems,
partition types, network adapters, bus systems, whatever?

but if a contribution from the outside is not taken "as is" and maybe fixed up - which
should take nearly the same time as analysing and commenting on the code - it ends up
in having less supported hardware.

imho if a hardware company does indeed provide us with open-source drivers, we should take
these things as a gift, not as a "doesn't follow the coding guide" intrusion which has to be
defeated.

ready to take your comments :)

regards
marcel


Willy Tarreau

Mar 22, 2005, 12:54:02 AM3/22/05
to Pavel Machek, Phillip Lougher, Paulo Marques, Andrew Morton, gr...@kroah.com, linux-...@vger.kernel.org

Hi Pavel,

On Mon, Mar 21, 2005 at 08:00:44PM +0100, Pavel Machek wrote:

> Perhaps squashfs is good enough improvement over cramfs... But I'd
> like those 4Gb limits to go away.

Well, squashfs is an *excellent* filesystem with very high compression ratios
and high speed on slow I/O devices such as CDs. I now use it to store my root
FS in an initrd, and frankly, having a fully functional OS in an image as small
as 7 MB is "a good enough improvement over cramfs".

If the 4 GB limit goes away one day, I hope it will not increase overall
image size significantly, because *this* would then become a regression.
Perhaps it would simply need to be a different version and different format
(eg: squashfs v3) just as we had ext, then ext2, or jffs then jffs2, etc...

Cheers,
Willy

Stefan Smietanowski

Mar 22, 2005, 12:49:43 AM3/22/05
to Phillip Lougher, Pavel Machek, Paulo Marques, Andrew Morton, gr...@kroah.com, linux-...@vger.kernel.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi.

> I have agreed to drop V1.0 support, and yes (as explained in another
> email), breaking the 4GB limit does involve an on-disk format change.

I've only also been reading this thread with half an eye but :

Would it be possible (in some logical timeframe) to change the
filesystem's on-disk format to support larger sizes without
actually changing the rest of the code?

I don't know where the 4GB limit comes from in this case, but if you
changed the on-disk format - the format itself - then I would
think it would make the filesystem easier to swallow, and then
when it's in the kernel you can actually make it support more
than 4GB.

Then there at least wouldn't need to be a switch in the format
when it's in the kernel.

Just my thought - it just feels like it might get it included faster.

And hell, if it's not possible, just ignore what I wrote.

// Stefan
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (MingW32)

iD8DBQFCP68hBrn2kJu9P78RAoGVAJ9a2cjFAv6NW8qyd336wEK5VcJf7gCfV5Oc
gswa6cSH7o3ND+lse64LLxI=
=D8rp
-----END PGP SIGNATURE-----

Phillip Lougher

Mar 22, 2005, 12:24:41 AM3/22/05
to Pavel Machek, Paulo Marques, Andrew Morton, gr...@kroah.com, linux-...@vger.kernel.org
Pavel Machek wrote:
> Hi!

>
>
>>>Perhaps squashfs is good enough improvement over cramfs... But I'd
>>>like those 4Gb limits to go away.
>>
>>So would I. But it is a totally groundless reason to refuse kernel
>>submission because of that, Squashfs users are quite happily using it
>>with such a "terrible" limitation. I'm asking for Squashfs to be put in
>>the kernel _now_ because users are asking me to do it _now_. If it
>
>
> Putting it into kernel because users want it is... not a good
> reason. You should put it there if it is right thing to do. I believe
> you should address those endianness issues and drop V1 support. If
> breaking 4GB limit does not involve on-disk format change, it may be
> okay to merge. After code is merged, doing format changes will be
> hard...
>
> Pavel

So users don't matter anymore, now that's a terrible admission to make.
Linux wouldn't be where it is today without all those "mere" users.

I obviously think putting Squashfs into the kernel is the right thing to do.

The filesystem is endian safe and has been since the first release - it
works on big endian and little endian, and it works on every architecture
I've tried it on (Intel 32/64, PowerPC 32/64, MIPS, ARM, SPARC). The
endian code which everyone seems to have got so worked up about is there
to _make_ it endian safe. I've already explained why making Squashfs
natively support both little endian and big endian is important for
embedded systems.

I have agreed to drop V1.0 support, and yes (as explained in another
email), breaking the 4GB limit does involve an on-disk format change.

Paul Jackson

Mar 22, 2005, 12:54:02 AM3/22/05
to Phillip Lougher, ak...@osdl.org, jd...@us.ibm.com, pa...@suse.cz, pmar...@grupopie.com, gr...@kroah.com, linux-...@vger.kernel.org
It is not so much selling, in my view, as putting in context.

If one can simply explain to others what is before them, so
that they can quickly understand its purposes, scope, architecture,
limitations, alternatives, and such, then others can quickly
evaluate what it is, and whether such seems like a good idea.

It doesn't necessarily mean they buy it any quicker. Sometimes it
just means it gets shot down quicker ;). That's ok.

There's a _lot_ of stuff that flows by here ... be gentle and
helpfully informative to the poor reader ... as best you can.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <p...@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

Stefan Smietanowski

Mar 22, 2005, 2:30:46 AM3/22/05
to Mws, Pavel Machek, Phillip Lougher, Paulo Marques, Andrew Morton, gr...@kroah.com, linux-...@vger.kernel.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> what do you need e.g. reiserfs 4 for? or jfs? or xfs? doesn't ext2/3
> do the journalling job also?

Ext2 does not do journaling. Ext3 does.

>> Perhaps squashfs is good enough improvement over cramfs... But I'd
>> like those 4Gb limits to go away.
>>

> we all do - but who really cares about stupid 4Gb limits on embedded
> systems with e.g.
> 8 or 32 Mb, maybe more, of flash RAM? really nobody

Then if this filesystem is specifically targeted ONLY at embedded,
that's a reason for keeping it out-of-tree.

> if you want to have a squashfs for DVD images, e.g. not 4.7Gb but
> dual layer etc., why do you complain?
> you are maybe not even - nor will you be - a user of squashfs. but there

But if a filesystem COULD be made to work for MORE users - why not?

I'm sure that more than a few might use it in some form if such a limit
is removed - why lock ourselves into a corner where, when we do get around
to fixing it, we need a new on-disk format and then we might have a new
filesystem, squashfs2 or whatever.

> are many people out there who use
> squashfs on different platforms and want to have it integrated into the
> mainline kernel. so why are you blocking?

I think that's because people see a potential in it that has a flaw
that should be taken care of so that MORE people can use it, and
not ONLY "embedded people with 8 or 32 MB".

Seriously, no one's flaming here - I think what people want is
for a limit to be removed, and that is not in my eyes a bad thing.

// Stefan
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (MingW32)

iD8DBQFCP8cGBrn2kJu9P78RAsTnAKCfslYF0ez4Wkt5xgKs7AXXp1KlUgCgt0y/
pX+t5HtVhQ+EvIo667XaDBA=
=Q6RX
-----END PGP SIGNATURE-----

Phillip Lougher

Mar 21, 2005, 11:54:09 PM3/21/05
to Andrew Morton, Josh Boyer, pa...@suse.cz, pmar...@grupopie.com, gr...@kroah.com, linux-...@vger.kernel.org
Andrew Morton wrote:
> Josh Boyer <jd...@us.ibm.com> wrote:
>
>>This is a useful, stable, and _maintained_ filesystem and I'm a bit
>> surprised that there is this much resistance to it's inclusion.
>
>
> Although I've only been following things with half an eye, I don't think
> there's a lot of resistance. It's just that squashfs's proponents are
> being asked to explain the reasons why the kernel needs this filesystem.
> That's something into which no effort was made in the initial patch release
> (there's a lesson there).

That is my fault. When I did the patch I was concentrating on providing
code, not on "selling" the filesystem. This is probably a cultural
thing; coming from Britain, I actually thought such "strong arm" sales
tactics would be tasteless and inappropriate.

Phillip Lougher

Mar 22, 2005, 12:32:41 AM3/22/05
to Pavel Machek, Mws, linux-...@vger.kernel.org
Pavel Machek wrote:

>
> And people merging xfs/reiserfs4/etc did address problems pointed out
> in their code.
>

Where did I say I wasn't addressing the problems pointed out in the
code? All the issues I can fix, I am addressing.

> Pavel

Mws

Mar 22, 2005, 2:08:59 AM3/22/05
to Pavel Machek, Phillip Lougher, Paulo Marques, Andrew Morton, gr...@kroah.com, linux-...@vger.kernel.org
Pavel Machek wrote:
-snip-

>>>So we are replacing severely-limited cramfs with also-limited
>>>squashfs...
>>>
>>>

>>I think that's rather unfair, Squashfs is significantly better than
>>cramfs. The main aim of Squashfs has been to achieve the best
>>
>>
>

>Yes, it *is* rather unfair. Sorry about that. But having 2 different
>limited compressed filesystems in kernel does not seem good to me.


>
>
>
what do you need e.g. reiserfs 4 for? or jfs? or xfs? doesn't ext2/3
do the journalling job also?

is there really a need for cifs and samba and ncpfs and nfs v3 and nfs
v4? why?

-snip-

>Well, out-of-tree maintenance takes a lot of time, too, so by keeping
>limited code out-of-kernel we provide quite good incentive to make
>those limits go away.


>
>Perhaps squashfs is good enough improvement over cramfs... But I'd
>like those 4Gb limits to go away.

> Pavel


>
>
we all do - but who really cares about stupid 4Gb limits on embedded
systems with e.g.
8 or 32 Mb, maybe more, of flash RAM? really nobody

if you want to have a squashfs for DVD images, e.g. not 4.7Gb but
dual layer etc., why do you complain?
you are maybe not even - nor will you be - a user of squashfs. but there
are many people out there who use
squashfs on different platforms and want to have it integrated into the
mainline kernel. so why are you blocking?

did you have a look at the code? did you find a "trojan horse"?
no and no? so why are you blocking? because the coding style is not what
nowadays kernel coders have as
their coding style? if you care - fix it - otherwise give hints and other
people will do it.

regards
marcel

Pavel Machek

Mar 22, 2005, 1:50:26 AM3/22/05
to Mws, Phillip Lougher, linux-...@vger.kernel.org
Hi!

[I'm not sure if I should further feed the trolls.]

> >Yes, it *is* rather unfair. Sorry about that. But having 2 different
> >limited compressed filesystems in kernel does not seem good to me.

> what do you need e.g. reiserfs 4 for? or jfs? or xfs? doesn't ext2/3
> do the journalling job also?
> is there really a need for cifs and samba and ncpfs and nfs v3 and nfs
> v4? why?

Take a look at the debate that preceded the xfs merge. And btw reiserfs4 is
*not* merged.

And people merging xfs/reiserfs4/etc did address problems pointed out
in their code.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

Pavel Machek

Mar 22, 2005, 2:06:58 AM3/22/05
to Mws, kernel list
Hi!

> >>>>Well, probably Phillip can answer this better than me, but the main
> >>>>differences that affect end users (and that is why we are using
> >>>>SquashFS right now) are:
> >>>> CRAMFS SquashFS
> >>>>
> >>>>Max File Size 16Mb 4Gb
> >>>>Max Filesystem Size 256Mb 4Gb?
> >>>>
> >>>>

> >>>So we are replacing severely-limited cramfs with also-limited

> >>>squashfs... For live DVDs etc 4Gb filesystem size limit will hurt for
> >>>sure, and 4Gb file size limit will hurt, too. Can those be fixed?
> >>>
> >>>
> >

> >...


> >
> >
> >>but if there is a contribution from the outside - it is not taken "as is"
> >>and maybe fixed up, which
> >>should be nearly possible in the same time like analysing and commenting
> >>the code - it ends up
> >>in having less supported hardware.
> >>
> >>imho if a hardware company does indeed provide us with opensource
> >>drivers, we should take these
> >>things as a gift, not as a "not coding guide a'like" intrusion which
> >>has to be defeated.
> >

> >Remember that horse in Troy? It was a gift, too.

> of course there was a horse in Troy, but thinking like that
> nowadays is a bit incorrect - don't you agree?
>
> code is reviewed normally - that's what I said before and called a
> good thing - but there is no serious reason
> to accuse every piece of code of containing potential "trojan horses"
> and to reject it.

I should have added a smiley.

I'm not seriously suggesting that it contains a deliberate problem. But
code-style ugliness and arbitrary limits may come back and haunt us in
the future. Once code is in the kernel, it is very hard to change the
on-disk format, for example.
