telling apart files ending with a newline

Shriram Krishnamurthi

Jul 29, 2020, 11:11:41 PM
to Racket Users
Suppose I have two files that are identical, except one ends in a newline and the other does not. If I use `read-line` to read the successive lines of this file, because it swallows the line separators, there is no way to tell them apart. E.g., these two strings

"a
b"

and

"a
b
"

produce the same result when read line by line with `read-line` from ports made by `open-input-string`.
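A minimal sketch that reproduces the problem, assuming `port->lines` from `racket/port` (which calls `read-line` until it gets eof) as a stand-in for a hand-written loop. The names `lines-1` and `lines-2` are illustrative only:

```racket
#lang racket/base
(require racket/port)

;; The two strings from the question: identical except for a final newline.
(define without-final-newline "a\nb")
(define with-final-newline "a\nb\n")

;; port->lines calls read-line repeatedly, and read-line discards
;; each line separator it consumes.
(define lines-1 (port->lines (open-input-string without-final-newline)))
(define lines-2 (port->lines (open-input-string with-final-newline)))

;; Both are '("a" "b"), so the final newline leaves no trace.
(equal? lines-1 lines-2) ; => #t
```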

This is unfortunate for a program SPDEGabrielle has induced me to write (-:.

Any reasonable ways to work around this that rely, as much as possible, on the OS-specific handling `read-line` already provides?

Shriram

Ryan Culpepper

Jul 30, 2020, 7:25:38 AM
to Shriram Krishnamurthi, Racket Users
If I understand the docs correctly, the OS-specific handling is in `open-input-file` (its `#:mode 'text` option), but it is not the default; the default mode is `'binary`.
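A rough sketch of what that option does, assuming a scratch file from `make-temporary-file` (`racket/file`). It writes CRLF line endings and reads them back in both file modes, using `read-line`'s `'linefeed` mode so that any surviving `\r` stays visible. `tmp` and `lines-in-mode` are hypothetical names:

```racket
#lang racket/base
(require racket/file racket/port)

;; Hypothetical scratch file holding two CRLF-terminated lines.
(define tmp (make-temporary-file))
(call-with-output-file tmp #:exists 'truncate
  (lambda (out) (write-string "a\r\nb\r\n" out)))

;; Read every line with read-line's 'linefeed mode, so a leftover
;; #\return shows up at the end of each line.
(define (lines-in-mode file-mode)
  (call-with-input-file tmp #:mode file-mode
    (lambda (in) (port->lines in #:line-mode 'linefeed))))

(define binary-lines (lines-in-mode 'binary)) ; the default mode
(define text-lines (lines-in-mode 'text))
(delete-file tmp)

binary-lines ; '("a\r" "b\r") on every platform
text-lines   ; '("a" "b") on Windows; same as binary-lines elsewhere
```

Note that `'text` only changes anything on Windows, so it does not by itself help with the final-newline question.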

Here is an alternative to read-line that preserves line endings:

  #lang racket/base

  ;; my-read-line : InputPort [Symbol] -> (U String EOF)
  ;; Like read-line, but preserves the line ending.
  ;; Fails if the port contains specials.
  (define (my-read-line in [mode 'any])
    (define rx
      (case mode
        [(any) #rx"^[^\n\r]*(?:\r\n|\n|\r|$)"]
        [(linefeed) #rx"^[^\n]*(?:\n|$)"]
        ;; ...
        [else (error 'my-read-line "unsupported mode: ~e" mode)]))
    (define m (car (regexp-match rx in))) ;; rx accepts "", can't fail
    (if (equal? m #"") eof (bytes->string/utf-8 m)))

  (require rackunit racket/port)

  (check-equal?
   (port->list my-read-line (open-input-string "abc\ndef\r\ngh\n"))
   '("abc\n" "def\r\n" "gh\n"))

  (check-equal?
   (port->list my-read-line (open-input-string "abc\ndef\r\ngh"))
   '("abc\n" "def\r\n" "gh"))

  (check-equal?
   (port->list my-read-line (open-input-string "\n\r\n\n\r\r\n"))
   '("\n" "\r\n" "\n" "\r" "\r\n"))

  (check-equal?
   (port->list (lambda (in) (my-read-line in 'linefeed))
               (open-input-string "\n\r\n\n\r\r\n"))
   '("\n" "\r\n" "\n" "\r\r\n"))
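With endings preserved, the original question becomes a check on the last line only, since every other line necessarily ends in a separator. A sketch, assuming a copy of `my-read-line` restricted to the `'any` mode; `ends-with-newline?` is a hypothetical helper, not part of any library:

```racket
#lang racket/base
(require racket/port)

;; Copy of my-read-line above, restricted to the 'any mode.
(define (my-read-line in)
  (define m (car (regexp-match #rx"^[^\n\r]*(?:\r\n|\n|\r|$)" in)))
  (if (equal? m #"") eof (bytes->string/utf-8 m)))

;; Hypothetical helper: does the input end with a line separator?
;; Only the last line returned by my-read-line can lack one.
(define (ends-with-newline? in)
  (define lines (port->list my-read-line in))
  (and (pair? lines)
       (regexp-match? #rx"[\r\n]$" (car (reverse lines)))))

(ends-with-newline? (open-input-string "a\nb"))   ; => #f
(ends-with-newline? (open-input-string "a\nb\n")) ; => #t
```

Empty input yields no lines at all, so it is treated as not ending in a newline.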

Ryan

