'file' doesn't recognize PNG or report depths/sizes on image formats; should it?


Anthony Sorace

Jul 2, 2025, 1:07:37 AM
to plan9port-dev
I modified my Plan 9 file.c to report depths and image sizes for PNG files:

:; file $home/lib/web/image/memo.png
/usr/a/lib/web/image/memo.png: PNG image, depth 8, size 329x329

I went to do the same for plan9port's and found it doesn't recognize PNG at all. I assume this is a historical accident because Plan 9's didn't either when p9p got started and nobody's bothered yet. Any objection to adding it? I'm happy to convert the patch.

More specifically, are we okay with reporting the size? The existing 'file' reports depth for Plan 9 bitmaps, but I think that's it today.

I did PNG first because that's what I needed at the moment, but I'd like to add depth and size reporting to other image formats. But things like jpeg are already recognized, and reported on without the additional info. Does adding that, in the format above adding ", depth N, size NxN", seem okay?
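A minimal sketch of how the PNG case could be recognized and reported, assuming the file's first bytes are already in a buffer. This is not the patch discussed here: `pngdesc` and `be32` are hypothetical names, and "depth" is taken to be the per-sample bit depth stored in the IHDR chunk, which the thread does not specify.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read a 32-bit big-endian integer, the byte order PNG uses. */
static uint32_t
be32(const unsigned char *p)
{
	return (uint32_t)p[0]<<24 | (uint32_t)p[1]<<16 | (uint32_t)p[2]<<8 | (uint32_t)p[3];
}

/*
 * Hypothetical helper, not the code from the patch.
 * If buf holds the start of a PNG file, write a description
 * into out and return 1; otherwise return 0.
 */
int
pngdesc(const unsigned char *buf, size_t n, char *out, size_t nout)
{
	static const unsigned char sig[8] = {0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n'};

	/* signature (8) + chunk length (4) + "IHDR" (4) + width, height (8) + bit depth (1) */
	if(n < 25 || memcmp(buf, sig, 8) != 0 || memcmp(buf+12, "IHDR", 4) != 0)
		return 0;
	snprintf(out, nout, "PNG image, depth %d, size %lux%lu",
		buf[24], (unsigned long)be32(buf+16), (unsigned long)be32(buf+20));
	return 1;
}
```

Only the first 25 bytes are needed, so nothing has to be decompressed or decoded.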

Ethan Azariah

Jul 2, 2025, 5:33:41 PM
to plan9p...@googlegroups.com
On Wed, Jul 2, 2025, at 6:07 AM, Anthony Sorace wrote:
>
> I did PNG first because that's what I needed at the moment, but I'd
> like to add depth and size reporting to other image formats. But things
> like jpeg are already recognized, and reported on without the
> additional info. Does adding that, in the format above adding ", depth
> N, size NxN", seem okay?

I'm not a regular 9 user any more, but every time I use the version of file common to many POSIX systems, I struggle with the deluge of information it provides. Reporting depth and size of an image isn't a whole lot of information, but it is the start of a slippery slope. If depth and size are added, they will be used as precedent to add more information. For example, why not report the number of channels in a tiff or tga?

Moving to other format types, why not list partitions when querying a drive or drive image? This demonstrates the problem I most often have with file on POSIX systems. It results in several lines of visual chaos per drive when all I ever want to know is whether it's partitioned and/or bootable. It gets worse when the bootsector also contains a FAT parameter block because all the unnecessary details from that will be spewed out too.

Image size is relatively easy to obtain in the Plan 9 ecosystem. Run png [jpg, gif, etc.] with -9 to output Plan 9 image data, and pipe the output into a small script to extract the width, height, and depth. This is the image info script I used to use:

$ cat ii
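#!/bin/awk -f

# Skip the marker line that starts a compressed Plan 9 image.
/^compressed$/ {
	next
}
# The next record starts with the header: chan minx miny maxx maxy.
# Print the channel descriptor, (width height), and the rectangle.
{
	print $1, "(" $4-$2, $5-$3 ") (" $2, $3, $4, $5 ")"
	nextfile
}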

(It should probably print FILENAME too. I was so adept at writing `for` loops that I never got around to adding it.)
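The same extraction can be sketched in C, assuming the header layout described in image(6): an optional "compressed" line, then five blank-padded fields (channel descriptor, min x, min y, max x, max y). `p9imginfo` is a hypothetical name, and the caller is assumed to pass a NUL-terminated copy of the start of the file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse a Plan 9 image header.
 * chan must have room for at least 12 bytes.
 * Returns 1 and fills chan, w and h on success, 0 otherwise.
 */
int
p9imginfo(const char *hdr, char *chan, int *w, int *h)
{
	int minx, miny, maxx, maxy;

	/* compressed images carry an extra marker line before the header */
	if(strncmp(hdr, "compressed\n", 11) == 0)
		hdr += 11;
	if(sscanf(hdr, "%11s %d %d %d %d", chan, &minx, &miny, &maxx, &maxy) != 5)
		return 0;
	*w = maxx - minx;
	*h = maxy - miny;
	return 1;
}
```

Like the awk script, this reads only the header; the expensive part of the pipeline is the conversion that produces it.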

Anthony Sorace

Jul 2, 2025, 6:44:06 PM
to plan9port-dev
I get the argument (it's why I asked that part). But while I agree unix's 'file' is off the deep end (Exif data can really blow up reporting on jpegs, for example), "slippery slope" arguments are always a little suspect.

(It's a useful reminder anyway. I'm mostly done adding depth and resolution to my 'file' for jpeg, and along the way you have to examine the bit that says whether it's a JFIF, EXIF, ICC Color Profile, or whatever. "Well I might as well print it" definitely crossed my mind, despite the fact that I've never once cared.)
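A rough sketch of that JPEG walk, assuming the start of the file is in a buffer: each segment is a marker followed by a big-endian length, so the APPn segments that carry JFIF, Exif, or ICC data can be skipped by length until a start-of-frame segment supplies precision, height, and width. `jpeginfo` is a hypothetical name, not the code from the patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper: find the first start-of-frame (SOFn) segment
 * in a JPEG and report its sample precision, width and height.
 * Returns 1 on success, 0 if buf is not a JPEG or no frame header
 * appears within the n bytes given.
 */
int
jpeginfo(const unsigned char *buf, size_t n, int *prec, int *w, int *h)
{
	size_t i, len;
	int m;

	if(n < 4 || buf[0] != 0xFF || buf[1] != 0xD8)	/* SOI */
		return 0;
	i = 2;
	while(i + 4 <= n){
		if(buf[i] != 0xFF)
			return 0;	/* lost sync */
		m = buf[i+1];
		if(m == 0xFF){	/* fill byte before a marker */
			i++;
			continue;
		}
		if(m == 0xD9 || m == 0xDA)	/* EOI or SOS: no frame header seen */
			return 0;
		if(m == 0x01 || (m >= 0xD0 && m <= 0xD7)){	/* markers with no length field */
			i += 2;
			continue;
		}
		len = (size_t)buf[i+2]<<8 | buf[i+3];	/* includes the two length bytes */
		if(len < 2)
			return 0;
		/* SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC) */
		if(m >= 0xC0 && m <= 0xCF && m != 0xC4 && m != 0xC8 && m != 0xCC){
			if(len < 7 || i + 9 > n)
				return 0;
			*prec = buf[i+4];
			*h = buf[i+5]<<8 | buf[i+6];
			*w = buf[i+7]<<8 | buf[i+8];
			return 1;
		}
		i += 2 + len;	/* skip APPn and every other segment unread */
	}
	return 0;
}
```

Skipping by length is what keeps the APPn contents out of the output: the walk never needs to know which kind of metadata a segment holds.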

And I've done things similar to your ii script before. But it's a lot more work for the CPU. My immediate use case driving this is figuring out what sorts of scaled images I want to generate for a directory of images on the web. For one directory with only 10 PNGs in it:

:; time file $home/lib/web/image/*.png > /dev/null
0.00u 0.01s 0.02r file /usr/a/lib/web/image/jocelyn.x2y2o8t1.png ...
:; time rc -c 'for (i in $home/lib/web/image/*.png) {echo -n $i'' ''; png -9 $i >[2]/dev/null | /tmp/ii}' > /dev/null
1.62u 0.12s 1.76r rc -c for (i in $home/lib/web/image/*.png) {echo -n $i' '; png -9 $i >[2]/dev/null | /tmp/ii}

That's quite a multiplier.

I think the alternative isn't passing through the Plan 9 image format, but a dedicated "imgtype" (like doctype) command that prints out more info about the target file, with less concern for conciseness... but a lot of that command would be duplicating what's already in 'file', which doesn't feel great, either.

Depth and resolution feel like the correct upper bound of info for image types to me, but I don't feel strongly about it. I'd appreciate other opinions.

Ethan Azariah

Jul 13, 2025, 6:11:14 AM
to plan9p...@googlegroups.com
On Wed, Jul 2, 2025, at 11:43 PM, Anthony Sorace wrote:
> I get the argument (it's why I asked that part). But while I agree
> unix's 'file' is off the deep end (Exif data can really blow up
> reporting on jpegs, for example), "slippery slope" arguments are always
> a little suspect.
>
> (It's a useful reminder anyway. I'm mostly done adding depth and
> resolution to my 'file' for jpeg, and along the way you have to examine
> the bit that says whether it's a JFIF, EXIF, ICC Color Profile, or
> whatever. "Well I might as well print it" definitely crossed my mind,
> despite the fact that I've never once cared.)

Yeah, lots of little "might as well"s add up to a lot. :)

> And I've done things similar to your ii script before. But it's a lot
> more work for the CPU. My immediate use case driving this is figuring
> out what sorts of scaled images I want to generate for a directory of
> images on the web. For one directory with only 10 PNGs in it:
>
> :; time file $home/lib/web/image/*.png > /dev/null
> 0.00u 0.01s 0.02r file /usr/a/lib/web/image/jocelyn.x2y2o8t1.png ...
> :; time rc -c 'for (i in $home/lib/web/image/*.png) {echo -n $i'' '';
> png -9 $i >[2]/dev/null | /tmp/ii}' > /dev/null
> 1.62u 0.12s 1.76r rc -c for (i in $home/lib/web/image/*.png) {echo -n
> $i' '; png -9 $i >[2]/dev/null | /tmp/ii}
>
> That's quite a multiplier.

It is indeed, but...

> I think the alternative isn't passing through the Plan 9 image format,
> but a dedicated "imgtype" (like doctype) command that prints out more
> info about the target file, with less concern for conciseness... but a
> lot of that command would be duplicating what's already in 'file',
> which doesn't feel great, either.

Sometimes, a little duplication can save a lot of time. I've even heard of code duplication in Plan 9, though I forget where. (I remembered that there was duplication because at the time, I was kind-of obsessed with finding the causes of bloat.) A major cause of bloat is writing general-purpose code. I'd write a C program to do just what you want with the images. It'll be quicker than deciding what should go into imgtype. ;) I recently saw this *massive* discussion email about acme over what would be a 5-minute coding job in a simple editor without a file interface. Write your own code! :D

It doesn't take a lot of code to distinguish PNG and GIF, they both have very sensible magic numbers at the start of the file. They also have relatively consistent headers if I remember right, easier than JPEG. BMP has an overly short magic number, but you don't need BMP for the Web. I have no idea about webm and webp.
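As a minimal sketch of how little code that takes, the function below checks the PNG and GIF signatures and, for GIF, reads the logical screen size that follows the 6-byte signature in little-endian order. `sniff` is a hypothetical name; JPEG, BMP, and the other formats mentioned are left out.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: name the image format from its leading bytes.
 * For GIF, also return the logical screen width and height.
 * Returns NULL if the bytes match neither signature.
 */
const char*
sniff(const unsigned char *buf, size_t n, int *w, int *h)
{
	static const unsigned char pngsig[8] = {0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n'};

	*w = *h = 0;
	if(n >= 8 && memcmp(buf, pngsig, 8) == 0)
		return "PNG";
	if(n >= 10 && (memcmp(buf, "GIF87a", 6) == 0 || memcmp(buf, "GIF89a", 6) == 0)){
		*w = buf[6] | buf[7]<<8;	/* 16-bit little-endian */
		*h = buf[8] | buf[9]<<8;
		return "GIF";
	}
	return NULL;
}
```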

I actually gave up on my image code because I was trying to add image display to a program which already had a little file manager, text editing, a document creator, and a compiler! lol It was somebody else's old codebase which seemed simple until you saw it had got a bit messy over the years, and with all those different features, there were a lot of different little messes. I'm starting from scratch with smaller programs now. My big difference from Plan 9 is that I'm not going to make the API the UI. My equivalent to file will be able to supply a lot of data to other programs without overwhelming the user. It and many other programs will output structured data with named fields. The terminal-equivalent will default to only showing a few fields but will give the user full control.

Devon H. O'Dell

Jul 13, 2025, 7:45:42 AM
to a...@9srv.net, plan9port-dev
Oops, I forgot to finish my draft.

On Wed, Jul 2, 2025 at 6:44 PM Anthony Sorace <a...@9srv.net> wrote:
> I get the argument (it's why I asked that part). But while I agree unix's 'file' is off the deep end (Exif data can really blow up reporting on jpegs, for example), "slippery slope" arguments are always a little suspect.
>
> (It's a useful reminder anyway. I'm mostly done adding depth and resolution to my 'file' for jpeg, and along the way you have to examine the bit that says whether it's a JFIF, EXIF, ICC Color Profile, or whatever. "Well I might as well print it" definitely crossed my mind, despite the fact that I've never once cared.)
>
> And I've done things similar to your ii script before. But it's a lot more work for the CPU. My immediate use case driving this is figuring out what sorts of scaled images I want to generate for a directory of images on the web. For one directory with only 10 PNGs in it:
>
>         :; time file $home/lib/web/image/*.png > /dev/null
>         0.00u 0.01s 0.02r  file /usr/a/lib/web/image/jocelyn.x2y2o8t1.png ...
>         :; time rc -c 'for (i in $home/lib/web/image/*.png) {echo -n $i'' ''; png -9 $i >[2]/dev/null | /tmp/ii}' > /dev/null
>         1.62u 0.12s 1.76r  rc -c for (i in $home/lib/web/image/*.png) {echo -n $i' '; png -9 $i >[2]/dev/null | /tmp/ii}
>
> That's quite a multiplier.
>
> I think the alternative isn't passing through the Plan 9 image format, but a dedicated "imgtype" (like doctype) command that prints out more info about the target file, with less concern for conciseness... but a lot of that command would be duplicating what's already in 'file', which doesn't feel great, either.
>
> Depth and resolution feel like the correct upper bound of info for image types to me, but I don't feel strongly about it. I'd appreciate other opinions.

I recently had a use-case for info about a large-ish sample of images (~hundreds) while developing a simple image recognition program that's worked pretty well over tens of thousands of images over the last few months.

I needed a few pieces of information: the actual image format (the images were all sent with the "image/jpeg" MIME type, but some were PNG or webp images), along with the dimensions (I needed to scale a template based on the size of a region of the screen) and color profile (an important green color had significant variance in RGB value between sRGB and Apple's Display P3 color profiles). What's reported by "file" for each image format varies wildly, and I was bothered by how non-standard this output was. For example:

a.png: JPEG image data, Exif standard: [TIFF image data, big-endian, direntries=1, orientation=upper-left], baseline, precision 8, 1179x2556, components 3
b.png: PNG image data, 1080 x 2400, 8-bit/color RGBA, non-interlaced

I ended up using the "identify" command from ImageMagick, which is wonderfully standard in how it reports things, and minimal by default in what it reports. For the same two files:

a.png JPEG 1290x2796 1290x2796+0+0 8-bit sRGB 1.16424MiB 0.000u 0:00.000
b.png PNG 1080x2400 1080x2400+0+0 8-bit sRGB 1.01811MiB 0.000u 0:00.000

Anyway, if you were only going to add depth and resolution, I think there's not much need to annotate the dimensionality as "size" because, sans further context, any MxN output for an image is immediately obvious as that. Similarly "depth N" seems like it would be obvious as "Nbpp" or "N-bit". (To me, "precision 8" seems incredibly obtuse.) If you decided to also add anything about colorspace, I think that also doesn't particularly need annotation: if someone is unclear what "sRGB" or "Display P3" means, it's simple enough to search for. I think all three of these things make sense to print because they’re relevant to what the rendered image ought to look like, unlike EXIF data. 

I've been very happy with the output of "identify" for the last 25 years and it feels like the right level of detail to target for baseline queries about an image.

Kind regards,

--dho
