find problem finding big files

jammer

Oct 1, 2012, 2:51:43 PM
I thought these would be the same but the first one finds files but the second doesn't.

$ find . -name \* -size 1M
$ find . -name \* -size 1024k

Lem Novantotto

Oct 1, 2012, 4:38:36 PM
jammer wrote:
1M means 1 *single* unit (of 1048576 bytes).
A 3-byte file does indeed use (part of) 1 single unit of 1048576 bytes.
A 1048577-byte file uses 2 units of 1048576 bytes (the whole first unit,
and one byte of the second unit).
So: with "-size 1M" you'll catch all files whose size S is: 0 < S <=
1048576 bytes

1024k means 1024 units (of 1024 bytes each).
A 3-byte file does not use 1024 of these units: it uses just (part of)
the first unit.
A 1048500-byte file does use 1024 of these units (the first 1023, and
part of the 1024th).
So: with "-size 1024k" you'll catch all files whose size S is:
1024*1023 < S <= 1024*1024 bytes

1048576B means exactly 1048576 bytes.
So: with 1048576B you'll catch all files whose size S is exactly 1048576
bytes, and nothing else.
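
The rounding described above can be sketched with plain shell arithmetic. This is only a model, assuming a find that rounds sizes up to whole units the way GNU find does; size_units is a hypothetical helper, not part of find.

```shell
# Model of the rounding described above (GNU find behaviour is assumed):
# a file of S bytes occupies ceil(S / unit) units, and "-size N<suffix>"
# is true only when that count equals N.
size_units() {  # hypothetical helper: size_units BYTES UNIT_BYTES
    echo $(( ($1 + $2 - 1) / $2 ))
}

size_units 3 1048576         # 1 unit of 1M:    would match "-size 1M"
size_units 3 1024            # 1 unit of 1k:    would not match "-size 1024k"
size_units 1048500 1024      # 1024 units of 1k: would match "-size 1024k"
size_units 1048577 1048576   # 2 units of 1M:   would not match "-size 1M"
```

Under that model "-size 1M" accepts a far wider range of sizes than "-size 1024k", which would explain why only the first command finds anything.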
--
Bye, Lem

Lem Novantotto

Oct 1, 2012, 4:43:59 PM
Lem Novantotto wrote:

> 1048576B means exactly 1048576 bytes.
> So: with 1048576B you'll catch all files whose size S is exactly 1048576
> bytes, and nothing else.

Correction:

1048576c means exactly 1048576 bytes.
So: with "-size 1048576c" you'll catch all files whose size S is exactly
1048576 bytes, and nothing else.

Kaz Kylheku

Oct 1, 2012, 4:59:56 PM
On 2012-10-01, Lem Novantotto <Le...@Hotmail.com> wrote:
> jammer wrote:
>
>> I thought these would be the same but the first one finds files but the
>> second doesn't.
>>
>> $ find . -name \* -size 1M
>> $ find . -name \* -size 1024k
>
> 1M means 1 *single* unit (of 1048576 bytes).

Another thing: maybe the OP thinks that "-name \*" matches all names?

This predicate actually means "match all directory entries whose name does not
begin with a dot".

To find files regardless of name, simply do not include a -name predicate.

Wayne

Oct 1, 2012, 11:48:05 PM
On 10/1/2012 4:59 PM, Kaz Kylheku wrote:
> On 2012-10-01, Lem Novantotto <Le...@Hotmail.com> wrote:
>> jammer wrote:
>>
>>> I thought these would be the same but the first one finds files but the
>>> second doesn't.
>>>
>>> $ find . -name \* -size 1M
>>> $ find . -name \* -size 1024k

None of "k", "K", "m", or "M" is standard, so you need to read your
find command's man page to see how it handles them.

"-size n" is true if the file size in bytes, divided by
512 and rounded up to the next integer, is n. To use bytes,
use "-size 1048576c". To find files bigger than that, use
"-size +1048576c". To find files smaller than that, use
"-size -1048576c".
>>
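
The portable forms above can be tried on small files. This is a sketch only: it uses a 1000-byte threshold instead of 1048576 to keep the files tiny, and it assumes mktemp -d is available (it is common but not required by POSIX). The file names are made up for the demo.

```shell
dir=$(mktemp -d)                    # scratch directory (assumes mktemp -d exists)
printf '%999s'  '' > "$dir/small"   # 999 bytes of padding
printf '%1000s' '' > "$dir/exact"   # 1000 bytes
printf '%1001s' '' > "$dir/big"     # 1001 bytes

exact=$(find "$dir" -type f -size 1000c)      # exactly 1000 bytes
bigger=$(find "$dir" -type f -size +1000c)    # more than 1000 bytes
smaller=$(find "$dir" -type f -size -1000c)   # fewer than 1000 bytes

# Without a suffix the unit is 512-byte blocks, rounded up,
# so all three files count as 2 blocks.
blocks=$(find "$dir" -type f -size 2 | wc -l)

echo "exact:   $exact"
echo "bigger:  $bigger"
echo "smaller: $smaller"
echo "files of 2 blocks: $blocks"
```

With the "c" suffix there is no rounding, so each of the three tests picks out exactly one file.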
>> 1M means 1 *single* unit (of 1048576 bytes).
>
> Another thing: maybe the OP thinks that "-name \*" matches all names?
>
> This predicate actually means "match all directory entries whose name does not
> begin with a dot".

No, it does match all names, including dot files. But you're right that it
should be excluded, since it does nothing.

>
> To find files regardless of name, simply do not include a -name predicate.
>

If the intent was to ignore any dot files, and to not look in hidden directories
either, then something like this will work:

find . \( -type d -name '.[!.]*' -prune \) \
    -o \( ! -type d -name '[!.]*' \) \
    -size whatever -print
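
A quick way to check what that expression selects, leaving out the -size test. This is a sketch with made-up file names in a scratch directory, assuming mktemp -d is available:

```shell
dir=$(mktemp -d)                    # scratch directory (assumes mktemp -d exists)
mkdir "$dir/.hidden" "$dir/sub"
touch "$dir/.hidden/a" "$dir/sub/b" "$dir/sub/.c" "$dir/d"

# Hidden directories are pruned; of the remaining non-directories,
# only names that do not start with a dot are printed.
found=$(find "$dir" \( -type d -name '.[!.]*' -prune \) \
    -o \( ! -type d -name '[!.]*' \) -print | LC_ALL=C sort)

echo "$found"
```

Only "d" and "sub/b" should be listed: ".hidden/a" is skipped because its directory is pruned, and "sub/.c" because its own name starts with a dot.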

--
Wayne

Janis Papanagnou

Oct 2, 2012, 3:19:12 AM
On 02.10.2012 05:48, Wayne wrote:
> On 10/1/2012 4:59 PM, Kaz Kylheku wrote:
>> On 2012-10-01, Lem Novantotto <Le...@Hotmail.com> wrote:
>>> jammer wrote:
>>>
>>>> I thought these would be the same but the first one finds files but the
>>>> second doesn't.
>>>>
>>>> $ find . -name \* -size 1M
>>>> $ find . -name \* -size 1024k
>
> Neither "k", "K", "m", or "M" are standard, so you need to read your
> find command's man page to see how it handles that.
>
> "-size n" is true if the file size in bytes, divided by
> 512 and rounded up to the next integer, is n. To use bytes,
> use "-size 1048576c". To find files bigger than that, use
> "-size +1048576c". To find files smaller than that, use
> "-size -1048576c".
>>>
>>> 1M means 1 *single* unit (of 1048576 bytes).
>>
>> Another thing: maybe the OP thinks that "-name \*" matches all names?
>>
>> This predicate actually means "match all directory entries whose name does not
>> begin with a dot".
>
> No, it does match all names, including dot files.

I find it astonishing that find's -name arguments are expanded by shell
syntax and find's directory arguments are not. (In shell a '*' does not
expand dot-files.)

Janis

Kaz Kylheku

Oct 2, 2012, 1:31:49 PM
On 2012-10-02, Wayne <nos...@all.invalid> wrote:
> On 10/1/2012 4:59 PM, Kaz Kylheku wrote:
>> Another thing: maybe the OP thinks that "-name \*" matches all names?
>>
>> This predicate actually means "match all directory entries whose name does not
>> begin with a dot".
>
> No, it does match all names, including dot files. But you're right that it
> should be excluded, since it does nothing.

Funny; you know, I had experimentally verified that claim before posting.
Or thought I had. I will have to go back to that particular terminal session
on that particular virtual machine to see what happened.

Lem Novantotto

Oct 2, 2012, 1:57:14 PM
Wayne wrote:

> No, it does match all names, including dot files.

It depends.

In *bash*, it depends:
When a pattern is used for pathname expansion, the character ``.'' at the
start of a name or immediately following a slash must be matched
explicitly, unless the shell option dotglob is set...

In *sh*, never:
...a pattern cannot match a string starting with a period unless the
first character of the pattern is a period...

In *ksh* (at least ksh93):
If FIGNORE is set, then each file name component that matches the pattern
defined by the value of FIGNORE is ignored when generating the matching
filenames. The names . and .. are also ignored. If FIGNORE is not set, the
character . at the start of each file name component will be ignored unless
the first character of the pattern corresponding to this component is the
character . itself. Note, that for other uses of pattern matching the / and .
are not treated specially.
* Matches any string, including the null string. When used for filename
expansion, if the globstar option is on, two adjacent *'s by itself will
match all files and zero or more directories and subdirectories. If followed
by a / then only directories and subdirectories will match.

Thomas 'PointedEars' Lahn

Oct 2, 2012, 7:14:04 PM
That is all very interesting regarding shells, but with regard to the
`-name' option of find(1) that is being discussed here, most certainly it is
a difference between *GNU* find(1) – which is common, but not necessary,
where *GNU* bash is primarily used – and non-GNU find(1)s (which are usually
found on "real" Unices) instead.

--
PointedEars

Twitter: @PointedEars2
Please do not Cc: me. / Bitte keine Kopien per E-Mail.

Lem Novantotto

Oct 3, 2012, 3:03:27 AM
Thomas 'PointedEars' Lahn wrote:

> That is all very interesting regarding shells, but with regard to the
> `-name' option of find

Oh, very true. Thanks for pointing it out.

I see an escaped asterisk... and I write about shell pathname expansion,
file name generation and so on.

LOL, one of my "who's stolen my brain?" moments.
What... What did you say? That I don't have "who's stolen my brain?"
moments, I just happen to have a few "my brain's back" moments? ;-p
Well, I guess you're probably right. :(

Wayne

Oct 3, 2012, 3:24:27 AM
Actually, GNU find does obey the POSIX rules on "-name \*" too, in that it
will match dot files. Here's a demo from a Linux system:
$ find --version | head -n 1
find (GNU findutils) 4.5.10
$ find -maxdepth 1 -name \* | grep '^\./\.bash'
./.bash_logout
./.bash_history
./.bash_profile
./.bashrc

As you can see, the "*" does match leading dots. I don't know of any
version of find where it doesn't.

--
Wayne

Geoff Clare

Oct 3, 2012, 8:51:47 AM
Wayne wrote:

>> That is all very interesting regarding shells, but with regard to the
>> `-name' option of find(1) that is being discussed here, most certainly it is
>> a difference between *GNU* find(1) – which is common, but not necessary,
>> where *GNU* bash is primarily used – and non-GNU find(1)s (which are usually
>> found on "real" Unices) instead.
>>
>
> Actually, GNU find does obey the POSIX rules on "-name \*" too, in that it
> will match dot files.

Only since Oct 2004. Here's the ChangeLog entry (sanitised):

2004-10-22 James Youngman <jay@xxxxxxxx>

[...]

* find/testsuite/find.gnu/name-period.xo,
find/testsuite/find.gnu/name-period.exp,
find/find.1, doc/find.texi:
The -name predicate must allow '*' to match '.foo' as demanded by
IEEE Std 1003.2-1992 Interpretation #126.

* find/pred.c:
Remove use of FNM_PERIOD for -name as demanded by IEEE Std
1003.2-1992 Interpretation #126

Historically, some of the commercial UNIX systems also got it wrong in
the mid 1990s because they misinterpreted POSIX. (This being why
Interpretation #126 was raised.) Correctly interpreting POSIX
relies on the subtle distinction between expansion operations
and matching operations.

--
Geoff Clare <net...@gclare.org.uk>


Geoff Clare

Oct 3, 2012, 8:42:57 AM
Janis Papanagnou wrote:

> I find it astonishing that find's -name arguments are expanded [...]

They are not "expanded". The -name primary performs a matching
operation. Think of it as working like a case statement in the shell.

$ case .foo in
> *) echo yes;;
> esac
yes

$ find .foo -name \*
.foo
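
The same distinction can be seen side by side in a scratch directory. This is a sketch that assumes a find following the POSIX interpretation discussed in this thread, a shell with default globbing (no dotglob or similar), and that mktemp -d is available; the file names are made up.

```shell
dir=$(mktemp -d)                    # scratch directory (assumes mktemp -d exists)
touch "$dir/.foo" "$dir/bar"

# Expansion: the shell generates file names from the pattern, and a
# leading dot has to be matched explicitly, so .foo is left out.
expanded=$(cd "$dir" && echo *)

# Matching: a given string is tested against the pattern; the leading
# dot gets no special treatment.
matched=no
case .foo in
    *) matched=yes ;;
esac

# find's -name is a matching operation too, so both files are counted.
count=$(find "$dir" -type f -name '*' | wc -l)

echo "expanded: $expanded"
echo "matched:  $matched"
echo "count:    $count"
```

A find with the older behaviour would be expected to count only one file here.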

--
Geoff Clare <net...@gclare.org.uk>

Sven Mascheck

Oct 7, 2012, 9:31:54 PM
Geoff Clare wrote:

> Historically, some of the commercial UNIX systems also got it wrong in
> the mid 1990s because they misinterpreted POSIX. (This being why
> Interpretation #126 was raised.) Correctly interpreting POSIX
> relies on the subtle distinction between expansion operations
> and matching operations.

Before POSIX, the behaviour must have had a historical origin: 7th edition
find called glob(3) and thus treated the leading dot specially, as the shell
did.

I once had a look at some implementations
(http://www.in-ulm.de/~mascheck/various/find/#match_expand)
and found the older behaviour, for instance, on

7th edition
BSD until 4.3-Tahoe (4.3-Reno switched to fnmatch(3))
OpenServer 5.x (SVR3-like) and 6.0 /bin and /bin/posix/ (but not anymore 6.0 /u95/bin/)
some more SVR4-variants

all SunOS 5.x /bin/find (but not /usr/xpg4/bin/)
busybox-1.01 (don't know when the GNU fix found its way to busybox)

Geoff Clare

Oct 8, 2012, 8:35:21 AM
Interesting. My experience at the time was that all the versions of
find I had used, before POSIX standardised it, matched dot files.
I distinctly remember being annoyed when the company I worked for
updated one of its commercial UNIX systems to a new release and the
behaviour of find changed to not match dot files. I initially
blamed POSIX for this change. But it turned out it was not POSIX's
fault; the vendor had misinterpreted POSIX.

--
Geoff Clare <net...@gclare.org.uk>
