[ANN] Racket implementation of magic language

Jonathan Simpson

Jul 31, 2019, 9:54:34 PM
to Racket Users
#lang magic is my implementation of the mini language used by the Unix file command. I'm aiming for compatibility with Ian Darwin's version, found in most Linux and BSD distributions. #lang magic is a work in progress. It is missing a lot of functionality but still has enough to be useful.

For the curious, 'man magic' describes the magic language in considerable, but not exhaustive, detail. A code sample to check for Microsoft executables provides the flavor of the language:

# MS Windows executables are also valid MS-DOS executables
0           string  MZ
>0x18       leshort <0x40
>>(4.s*512) leshort 0x014c  COFF executable (MS-DOS, DJGPP)
>>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
# skip the whole block below if it is not an extended executable
>0x18       leshort >0x3f
>>(0x3c.l)  string  PE\0\0  PE executable (MS-Windows)
>>>&0       leshort 0x14c   for Intel 80386
>>>&0       leshort 0x184   for DEC Alpha
>>>&0       leshort 0x8664  for AMD64
>>(0x3c.l)  string  LX\0\0  LX executable (OS/2)
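
As a rough illustration of what the PE branch of this sample checks, here is a hand-written sketch in plain Racket. It is not the code that #lang magic generates; the helper names (read-le, bytes-at?, describe-pe) are made up for this example, and it assumes the indirect offset (0x3c.l) is read as an unsigned little-endian long and that &0 means "immediately after the previous match". It also returns the description as a string instead of printing it.

```racket
#lang racket/base
;; Hand-written illustration only; NOT the code #lang magic generates.
;; All helper names below are hypothetical.

;; Read an unsigned little-endian integer of `size` bytes (2 or 4 here)
;; at `offset`, or return #f if the data is too short.
(define (read-le data offset size)
  (and (<= (+ offset size) (bytes-length data))
       (integer-bytes->integer data #f #f offset (+ offset size))))

;; Does `expected` appear in `data` starting at `offset`?
(define (bytes-at? data offset expected)
  (define end (+ offset (bytes-length expected)))
  (and (<= end (bytes-length data))
       (bytes=? (subbytes data offset end) expected)))

;; Returns a description string for the PE branch, or #f if it does not apply.
(define (describe-pe data)
  (define field-18 (read-le data #x18 2))   ; value tested by ">0x18 leshort >0x3f"
  (define pe-offset (read-le data #x3c 4))  ; indirect offset "(0x3c.l)"
  (and (bytes-at? data 0 #"MZ")             ; 0 string MZ
       field-18 (> field-18 #x3f)           ; >0x18 leshort >0x3f
       pe-offset
       (bytes-at? data pe-offset #"PE\0\0") ; >>(0x3c.l) string PE\0\0
       ;; >>>&0 leshort ...: the short right after the 4-byte signature
       (let ([machine (read-le data (+ pe-offset 4) 2)])
         (string-append
          "PE executable (MS-Windows)"
          (case machine
            [(#x14c) " for Intel 80386"]
            [(#x184) " for DEC Alpha"]
            [(#x8664) " for AMD64"]
            [else ""])))))

;; A tiny synthetic header, just enough to exercise the checks above.
(define sample (make-bytes #x48 0))
(bytes-copy! sample 0 #"MZ")
(bytes-copy! sample #x18 (integer->integer-bytes #x40 2 #f #f))
(bytes-copy! sample #x3c (integer->integer-bytes #x40 4 #f #f))
(bytes-copy! sample #x40 #"PE\0\0")
(bytes-copy! sample #x44 (integer->integer-bytes #x8664 2 #f #f))

(describe-pe sample) ; => "PE executable (MS-Windows) for AMD64"
```

The magic version expresses the same checks in a handful of lines, which is the main appeal of the DSL.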


The magic sample above (the block beginning with "0 string MZ") compiles to a single magic query. A #lang magic Racket module consists of one or more such queries; a new query starts on any line without a preceding '>'. Every #lang magic module provides two functions: magic-query and magic-query-run-all. Both are thunks that can be passed to with-input-from-file to test a file against the queries in the module. magic-query replicates the default behavior of the file command: it stops and returns true after the first query that matches, or returns false if no query matches. A query matches if the test on its first line succeeds. A matched query runs to completion, and it still counts as a match even if later tests in the query fail. magic-query-run-all, on the other hand, always tests the file against every query in the module. Both functions print the message for each successful test to the current output port. This is only a brief summary, so consult 'man magic' for a complete explanation.
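
To make the difference between the two thunks concrete, here is a minimal sketch in plain Racket that mimics the behavior described above. Everything in it is a stand-in: the two queries are written by hand rather than compiled from magic, the names sketch-magic-query and sketch-magic-query-run-all are hypothetical, the ", version 89a" message is invented for the example, and the return value of the run-all variant (true if any query matched) is an assumption, since only magic-query's return value is described above.

```racket
#lang racket/base
;; Stand-in for what a #lang magic module provides; hand-written, NOT the
;; real expansion. All names and messages here are hypothetical.
(require racket/port)

;; Does `expected` appear in `data` starting at `offset`?
(define (bytes-at? data offset expected)
  (define end (+ offset (bytes-length expected)))
  (and (<= end (bytes-length data))
       (bytes=? (subbytes data offset end) expected)))

;; Each query returns #t when its FIRST test succeeds. Later tests only
;; decide which extra messages are printed; they never undo the match.
(define (mz-query data)
  (and (bytes-at? data 0 #"MZ")
       (begin (display "MZ executable (MS-DOS)") #t)))

(define (gif-query data)
  (and (bytes-at? data 0 #"GIF8")        ; first test: decides the match
       (begin
         (display "GIF image data")
         (when (bytes-at? data 4 #"9a")  ; later test: only adds a message
           (display ", version 89a"))
         #t)))

(define queries (list mz-query gif-query))

;; Like magic-query: stop at the first matching query; #f if none match.
(define (sketch-magic-query)
  (define data (port->bytes (current-input-port)))
  (for/or ([q (in-list queries)])
    (q data)))

;; Like magic-query-run-all: always try every query.
;; Returning "did anything match?" is an assumption of this sketch.
(define (sketch-magic-query-run-all)
  (define data (port->bytes (current-input-port)))
  (for/fold ([matched? #f]) ([q (in-list queries)])
    (or (q data) matched?)))

;; Both are thunks reading from the current input port, so they can be
;; handed to with-input-from-file or, as here, with-input-from-bytes.
(with-input-from-bytes #"GIF89a" sketch-magic-query)
```

With a real #lang magic module the call has the same shape, with the module's magic-query passed to with-input-from-file.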

I wrote #lang magic to use in my gopher server. Gopher is a simple TCP protocol for exchanging documents. Gopher directories have simple one character flags to indicate file type. I wanted something that would be more robust than simply relying on file extensions. Here's an example of how I call into #lang magic to do that:

(require (only-in "magic/image.rkt" (magic-query image-query)))
(require (only-in "magic/gif.rkt" (magic-query gif-query)))
(require (only-in "magic/html.rkt" (magic-query html-query)))

(define (filetype path)
  (define extension (filename-extension path))
  (cond [(directory-exists? path) "1"]
        [(with-input-from-file path image-query) "I"]
        [(with-input-from-file path gif-query) "g"]
        [(with-input-from-file path html-query) "h"]
        [(is-utf8-text? path) "0"]
        [extension
         (cond [(or (bytes=? extension #"txt")
                    (bytes=? extension #"conf")
                    (bytes=? extension #"cfg")
                    (bytes=? extension #"sh")
                    (bytes=? extension #"bat")
                    (bytes=? extension #"ini")) "0"]
               [(or (bytes=? extension #"wav")
                    (bytes=? extension #"ogg")
                    (bytes=? extension #"mp3")) "s"]
               [else "9"])]
        [else "9"]))

The require'd .rkt files are written in #lang magic. I've kept the extension check as a fallback for now. Eventually I will add additional magic to detect audio files and other file types supported by Gopher.

I still have a lot of work to do. This is my first project using Scheme macros, much less Racket's language-building facilities, so I'm sure my code is far from optimal. Most of the macros in my current code need revising, and I'd like to rewrite most of them with syntax/parse. I know my lexer could be improved as well. One reason I'm making the code public now is to gather feedback and advice for improvement.

I'm currently running Racket 6.11 on Linux, so that is the only platform I've tested it on. I plan to test on Windows and a current version of Racket in the near future. Please let me know if you have problems using this on another platform. If I know it doesn't work for someone, it will push up the priority of testing on other platforms.

I couldn't have gotten as far as I have without the generous help of members of this group. Many thanks to everyone who has helped me directly or contributed in any way to official or unofficial Racket documentation. I even owe the genesis of this project to this group, this thread in particular.

I'd love to know if anyone else has a use for this. So please post here if you do! I will happily accept any suggestions, ideas, or feedback.

-- Jonathan

Jonathan Simpson

Jul 31, 2019, 9:55:27 PM
to Racket Users
And most importantly, here is the github :)


Neil Van Dyke

Aug 1, 2019, 1:07:36 AM
to Jonathan Simpson, Racket Users
Jonathan Simpson wrote on 7/31/19 9:54 PM:
> #lang magic is my implementation of the mini language used by the Unix
> file command.

Nice.  In addition to the practical merits, and the craft, it's also an
example of a useful legacy DSL we can point to (like lex, yacc, make),
and which Racket now implements.

You might want to make API documentation for it in Scribble.  (Your
racket-users post contains helpful information that's not currently with
the code in GitHub.  Also, we're currently having trouble with
racket-users Google Groups often inexplicably not showing up in Google
searches, which is yet another reason to try to consolidate
documentation with the code in a simple way.)

Besides API documentation helping others to use some code, as well as
establishing an API of *what* as distinct from *how*... the
documentation is also a well-deserved chance to show off your work in a
different way than the code itself, and with Scribble slickness. Or
Markdown.

Jonathan Simpson

Aug 1, 2019, 9:42:51 AM
to Racket Users
Thanks for the feedback. I haven't used Scribble before, but I'll look into it. Do you feel that it would be helpful to add Scribble documentation for the magic language itself, or are you just referring to the API provided by #lang magic modules?

For the language itself, I can see documentation being helpful if I can keep it up to date with the functionality currently supported by #lang magic. I'll also need to check how the magic man pages are licensed. It would save a lot of time if I could copy from there as needed; otherwise it will be a lot of work. The long-term goal is to be able to slap a #lang magic line on top of any magic file taken from the file command's repository and have it just work. At that point #lang magic will effectively be able to share the excellent documentation provided by the file command. Of course, another goal is to add enhancements that don't break compatibility.

I'll also do some more work to improve the GitHub README as you suggested.

Thanks for your input!

-- Jonathan