custom input ports and specials


je...@lisp.sh

Apr 13, 2021, 1:28:16 AM
to Racket Users
I'm implementing a tokenizer as a custom input port using `make-input-port`. My thinking is that `peek-char-or-special` and `read-char-or-special` will be the primary interface to the tokenizer; port locations will also be used. In this case, the input port should emit special values as well as characters. It's not just bytes/characters.

Here's a simplified form of what I'm trying to accomplish. Imagine a language consisting of sequences (possibly empty) of ASCII letters (lowercase and uppercase). If you see any letter other than capital X, just emit it. (When I say "emit", I mean "the value that should be returned by `peek-char-or-special` and `read-char-or-special`".) If you see an X and it's the last character of the input, emit #\X. If you see a capital X followed by any character, emit that character as a symbol. That's the special value. Thus:

  a b c ==> a b c

  X ==> X

  a X b ==> a 'b
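
To pin down the intended output independently of any port machinery, here is a minimal sketch of the same rule as a plain function over a string. The name `tokens` is made up for illustration, and the sketch does no validation of the input alphabet.

```racket
#lang racket/base

;; tokens : string -> (listof (or/c char? symbol?))
;; Hypothetical reference function, not part of the port-based design:
;; a capital X followed by any character becomes a one-character symbol;
;; every other character (including a trailing X) is passed through.
(define (tokens str)
  (let loop ([cs (string->list str)])
    (cond
      [(null? cs) '()]
      [(and (char=? (car cs) #\X) (pair? (cdr cs)))
       (cons (string->symbol (string (cadr cs)))
             (loop (cddr cs)))]
      [else (cons (car cs) (loop (cdr cs)))])))
```

A function like this could serve as an oracle in tests: whatever the custom port emits for a given input should match what `tokens` returns for the same string.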

I realize that this example could easily be done with regular expressions or just straightforward processing of byte strings using `port->bytes`. But I'd like to attack this problem using custom input ports. It feels like the right thing to do. With a custom input port, I can even do validation by logging errors. For example, if I'm given a byte that is not an ASCII letter, I can log an error, advance the port by one byte, and try again.

I find the documentation for `make-input-port` rather heavy going. There are some examples there, which are a good start, but I'm still a bit lost. In the discussion of the peek and read procedures that are supplied as arguments to `make-input-port` (see `peek!` and `read!` below), I don't understand the byte strings that are being passed. It seems that these procedures are always given a mutable byte string, and the examples in the docs suggest that the byte string could/should indeed be modified. But in the input I have in mind, where peek and read might emit specials, it's unclear to me what I should stuff into the byte string. For example, if I'm peeking `Xm` (capital X and lowercase m), that should eventually get turned into `'m` (symbol whose name is "m"), so in my thinking, I'm looking at two bytes, not one. But it seems that the peek and read procedures are always (?) given a byte string of length 1.
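
For comparison, here is a rough sketch of how I currently read the `make-input-port` contract; it may well be wrong in places. The assumptions: an ordinary byte is delivered by storing it in the supplied byte string and returning the count 1; a special is delivered by returning a procedure instead, leaving the byte string untouched; and the `skip` argument to peek counts items of the custom port's stream (bytes or specials), not bytes of the underlying port. The names `token-at`, `make-x-port`, and `read-all` are made up for this sketch, and it assumes the underlying port never blocks (for example, a string port).

```racket
#lang racket/base

;; token-at : input-port nat -> (values (or/c byte? symbol? eof-object?) nat)
;; Looks `offset` bytes ahead in the underlying port without consuming
;; anything. Returns the token found there and how many underlying bytes
;; it spans (0 at end of input).
(define (token-at in offset)
  (define bs (peek-bytes 2 offset in))
  (cond
    [(eof-object? bs) (values eof 0)]
    [(and (= (bytes-length bs) 2) (= (bytes-ref bs 0) 88)) ; 88 = #\X
     (values (string->symbol (string (integer->char (bytes-ref bs 1)))) 2)]
    [else (values (bytes-ref bs 0) 1)]))

(define (make-x-port in)
  ;; Hand one token back in the shape make-input-port expects.
  (define (deliver tok bstr)
    (cond
      [(eof-object? tok) eof]
      [(symbol? tok) (lambda args tok)]   ; special: bstr is not used
      [else (bytes-set! bstr 0 tok) 1]))  ; ordinary byte: one byte stored
  (define (read! bstr)
    (define-values (tok width) (token-at in 0))
    (unless (zero? width)
      (read-bytes width in))              ; consume the underlying bytes
    (deliver tok bstr))
  (define (peek! bstr skip progress-evt)
    ;; Translate `skip` items of this port into an underlying byte offset
    ;; by walking over the tokens one at a time.
    (let loop ([k skip] [offset 0])
      (define-values (tok width) (token-at in offset))
      (if (or (zero? k) (eof-object? tok))
          (deliver tok bstr)
          (loop (sub1 k) (+ offset width)))))
  (make-input-port 'x-port read! peek! (lambda () (close-input-port in))))

;; read-all : input-port -> list
;; Drains a port with read-char-or-special, collecting what it emits.
(define (read-all p)
  (let loop ([acc '()])
    (define t (read-char-or-special p))
    (if (eof-object? t)
        (reverse acc)
        (loop (cons t acc)))))
```

Under this reading, the byte string never needs to hold both bytes of `Xm`: the two underlying bytes are consumed from the wrapped port, and the caller sees a single special. Calling `(read-all (make-x-port (open-input-string "Xaw")))` is meant to produce the symbol `a` followed by the character `w`.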

Anyway, this is perhaps all a long way of saying that I'm rather lost with my custom input port approach to the issue. Any advice would be appreciated. Maybe custom input ports are not the way to go about what I'm doing, but I'm not ready to abandon them just yet. Below you can read (ha!) the current status of where I am with this project.

Jesse

````
#lang racket/base

(require racket/match
         racket/format
         racket/port)

; Input strings are intended to be sequences of ASCII letters,
; uppercase and lowercase. Capital X followed by another letter
; should get turned into a symbol whose name is the one-character
; string consisting of that letter.
(define (make-cool-port in)
  (define (peek! bstr skip event)
    (sleep 1)
    (define bs (peek-bytes 2 skip in))
    (cond [(eof-object? bs)
           eof]
          [else
           (match (bytes->list bs)
             [(list 88 a) ; 88 = X
              (lambda args 2)]
             [_ 1])]))
  (define (read! bstr)
    (define peeked (peek! bstr 0 #f))
    (cond [(eof-object? peeked)
           eof]
          [(procedure? peeked)
           (define bs (bytes->list (read-bytes (peeked) in)))
           (define a (cadr bs))
           (lambda args (string->symbol (~a (integer->char a))))]
          [else
           (read-byte in)
           peeked]))
  (make-input-port
   'xs
   read!
   peek!
   (lambda () (close-input-port in)))) ; close the wrapped port

(define (read-it-all)
  (define t (peek-char-or-special))
  (log-error "peeked ~a" t)
  (unless (eof-object? t)
    (read-char-or-special)
    (read-it-all)))

(module+ main
  (call-with-input-string
   "Xaw"
   (lambda (in)
     (define p (make-cool-port in))
     (parameterize ([current-input-port p])
       (read-it-all)))))
````
