#lang magic is my implementation of the mini language used by the Unix file command. I'm aiming for compatibility with
Ian Darwin's version, found in most Linux and BSD distributions. #lang magic is a work in progress. It is missing a lot of functionality but still has enough to be useful.
For the curious, 'man magic' describes the magic language in considerable, but not exhaustive, detail. A code sample to check for Microsoft executables provides the flavor of the language:
# MS Windows executables are also valid MS-DOS executables
0 string MZ
>0x18 leshort <0x40
>>(4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
>>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
# skip the whole block below if it is not an extended executable
>0x18 leshort >0x3f
>>(0x3c.l) string PE\0\0 PE executable (MS-Windows)
>>>&0 leshort 0x14c for Intel 80386
>>>&0 leshort 0x184 for DEC Alpha
>>>&0 leshort 0x8664 for AMD64
>>(0x3c.l) string LX\0\0 LX executable (OS/2)
The code sample above compiles to one magic query. A #lang magic Racket module consists of one or more such queries. A new query starts on a line without a preceding '>'.
Every #lang magic module provides two functions: magic-query and magic-query-run-all. Both are thunks that can be passed to with-input-from-file to test a file against the queries in the module.
magic-query replicates the default behavior of the file command: it stops and returns true after the first query that matches, or returns false if no query matches. A query matches if the test on its first line succeeds. A matched query runs to completion, and it still counts as a match even if later tests in it fail. magic-query-run-all, on the other hand, always tests the file against every query in the module.
Both functions print the message for each successful test to the current output port. This is a brief summary, so consult 'man magic' for a complete explanation.
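For readers who have not used magic before, here is a rough hand-written Racket equivalent of the first few tests in the MZ sample: the string test at offset 0 and the two leshort comparisons at offset 0x18. It only illustrates what those lines check and is not the code #lang magic generates. The names leshort-at and classify-mz are made up for this example, and it assumes leshort is compared as a signed value, as in file's magic.

```racket
#lang racket/base

;; Read a signed little-endian 16-bit value at byte offset `off` of `bs`.
;; Returns #f when the data is too short to hold it.
(define (leshort-at bs off)
  (and (<= (+ off 2) (bytes-length bs))
       ;; arguments: bytes, signed? = #t, big-endian? = #f, start, end
       (integer-bytes->integer bs #t #f off (+ off 2))))

;; Hand-written stand-in for these magic lines:
;;   0      string  MZ
;;   >0x18  leshort <0x40
;;   >0x18  leshort >0x3f
;; Returns #f when the top-level test fails, otherwise a symbol
;; describing which second-level test (if any) succeeded.
(define (classify-mz bs)
  (cond
    [(not (and (>= (bytes-length bs) 2)
               (bytes=? (subbytes bs 0 2) #"MZ")))
     #f]
    [else
     (define v (leshort-at bs #x18))
     (cond [(not v) 'mz]             ; query matched, no further tests apply
           [(< v #x40) 'mz-dos]      ; plain MS-DOS branch
           [(> v #x3f) 'mz-extended] ; extended executable branch
           [else 'mz])]))
```

Note that the top-level test alone decides whether the query counts as a match; the second-level tests only pick which further messages would be printed.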
I wrote #lang magic to use in my gopher server. Gopher is a simple TCP protocol for exchanging documents. Gopher directories have simple one character flags to indicate file type. I wanted something that would be more robust than simply relying on file extensions. Here's an example of how I call into #lang magic to do that:
(require (only-in "magic/image.rkt" (magic-query image-query)))
(require (only-in "magic/gif.rkt" (magic-query gif-query)))
(require (only-in "magic/html.rkt" (magic-query html-query)))
(define (filetype path)
  (define extension (filename-extension path))
  (cond [(directory-exists? path) "1"]
        [(with-input-from-file path image-query) "I"]
        [(with-input-from-file path gif-query) "g"]
        [(with-input-from-file path html-query) "h"]
        [(is-utf8-text? path) "0"]
        [extension
         (cond [(or (bytes=? extension #"txt")
                    (bytes=? extension #"conf")
                    (bytes=? extension #"cfg")
                    (bytes=? extension #"sh")
                    (bytes=? extension #"bat")
                    (bytes=? extension #"ini")) "0"]
               [(or (bytes=? extension #"wav")
                    (bytes=? extension #"ogg")
                    (bytes=? extension #"mp3")) "s"]
               [else "9"])]
        [else "9"]))
The required .rkt files are written in #lang magic. I've kept the extension check as a fallback for now. Eventually I will add more magic to detect audio files and the other file types supported by gopher.
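Because the query functions print their messages to the current output port, a caller that only wants the boolean result may prefer to silence or capture that output. Here is a minimal sketch using racket/port. fake-query is a hypothetical stand-in for a real magic-query thunk, and quiet-match? and match-message are names invented for this example. In-memory bytes are used only to keep the sketch self-contained.

```racket
#lang racket/base
(require racket/port)

;; Hypothetical stand-in for a magic-query thunk: it reads from the
;; current input port, prints a message on a match, and returns a boolean.
(define (fake-query)
  (define head (read-bytes 4))
  (cond [(and (bytes? head) (bytes=? head #"GIF8"))
         (display "GIF image data")
         #t]
        [else #f]))

;; Run a query thunk against `bs`, discarding anything it prints.
(define (quiet-match? query bs)
  (parameterize ([current-output-port (open-output-nowhere)])
    (with-input-from-bytes bs query)))

;; Run a query thunk against `bs` and return what it printed as a string.
(define (match-message query bs)
  (with-output-to-string
    (lambda () (with-input-from-bytes bs query))))
```

With a real module, the same parameterize wrapper could go around the with-input-from-file calls in filetype, assuming the thunk writes only to the current output port as described above.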
I still have a lot of work to do. This is my first project using Scheme macros, much less Racket's language-building facilities, so I'm sure my code is far from optimal. Most of the macros in my current code need revising, and I'd like to rewrite most of them with syntax/parse. I know my lexer could be improved as well. One reason I'm making the code public now is to gather feedback and advice for improvement.
I'm currently running Racket 6.11 on Linux, so that is the only platform I've tested it on. I plan to test on Windows and a current version of Racket in the near future. Please let me know if you have problems using this on another platform. If I know it doesn't work for someone, it will push up the priority of testing on other platforms.
I couldn't have gotten as far as I have without the generous help of members of this group. Many thanks to everyone who has helped me directly or contributed in any way to official or unofficial Racket documentation. I even owe the genesis of this project to this group, this thread in particular.
I'd love to know if anyone else has a use for this. So please post here if you do! I will happily accept any suggestions, ideas, or feedback.
-- Jonathan