unicode (or just plain byte) regular expression positions?

Tim Meehan

Jan 18, 2021, 10:53:48 PM
to Racket Users
Say that I have a strange character group that I want to find in a binary file.
I wanted to use something like this:

(define needle (list->string (map integer->char (list #xab #xcd #xef))))
(define needle-offset
  (call-with-input-file "big_binary_blob.bin"
    #:mode 'binary
    (λ (p)
      (regexp-match-positions (regexp needle) p))))

But "regexp-match-positions" returns #f, even though I know the needle is in there (I put it there myself). Is there a better way to go about this? The binary blob is about 100 MiB, if that helps.

Jon Zeppieri

Jan 18, 2021, 11:09:18 PM
to Tim Meehan, Racket Users
You're searching for a certain Unicode code point sequence (U+00AB,
U+00CD, U+00EF) in a string, but I think you're trying to search for a
byte sequence in a byte string. You can read in the file as bytes and
use a byte regexp. So:

(define needle (list->bytes (list #xab #xcd #xef)))
(define needle-offset
  (call-with-input-file "big_binary_blob.bin"
    #:mode 'binary
    (λ (p)
      (regexp-match-positions (byte-regexp needle) p))))
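
To see the difference without a 100 MiB file, here is a minimal sketch that runs both patterns against an in-memory port. The `haystack` bytes are made up for illustration (the needle is placed at byte offset 2), and the `regexp-quote` call is an extra precaution not in the code above: it only matters if a needle happens to contain bytes that are regexp metacharacters, such as `#x2e` (".").

```racket
#lang racket/base

;; Hypothetical stand-in for the file contents; the needle starts at offset 2.
(define haystack (bytes #x00 #x01 #xab #xcd #xef #x02))

;; The original needle: a string of three characters, U+00AB U+00CD U+00EF.
(define string-needle
  (list->string (map integer->char (list #xab #xcd #xef))))

;; The byte needle: three raw bytes.
(define bytes-needle (bytes #xab #xcd #xef))

;; A string regexp applied to a port matches the UTF-8 encoding of its
;; characters, which here is six bytes rather than three.
(define utf8-length
  (bytes-length (string->bytes/utf-8 string-needle)))

;; So the string regexp does not find the raw bytes.
(define string-result
  (regexp-match-positions (regexp string-needle)
                          (open-input-bytes haystack)))

;; The byte regexp matches the raw bytes. regexp-quote escapes any
;; metacharacters so the needle is always treated literally.
(define bytes-result
  (regexp-match-positions (byte-regexp (regexp-quote bytes-needle))
                          (open-input-bytes haystack)))

utf8-length    ; 6
string-result  ; #f
bytes-result   ; '((2 . 5))
```

The result is a list of (start . end) pairs counted in bytes from where the port was when matching began, so with a freshly opened file the start value is the offset of the needle in the file.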