How to recover whitespace from syntax->string

Jeff Henrikson

Apr 8, 2021, 9:10:55 PM
to Racket Users

Racket users,

I’m trying to read a Scheme file, add some decoration around the edges, and write the expressions to a new file.  I’ve got the basic mechanism working with (read ...) and (pretty-write ...), but of course that doesn’t preserve line breaks, so now I’m trying to improve it.  It would be nice to preserve all whitespace, but I'll settle for line breaks.  The Racket docs seem to suggest it is possible:

2.6 Rendering Syntax Objects with Formatting

 (require syntax/to-string)     package: base

procedure
(syntax->string stx-list) → string?
  stx-list : (and/c syntax? stx-list?)

Builds a string with newlines and indenting according to the source locations in stx-list; the outer pair of parens are not rendered from stx-list.

However, when I evaluate:

(syntax->string (read-syntax "mystring" (open-input-string "(comment\n  \"hello world\"\n  line)")))

I get:

"comment\"hello world\"line"

which has no whitespace at all, not even the whitespace that is necessary to separate the original tokens.

I get similar behavior if I read-syntax from a file and apply syntax->string to those values.

Does anyone know how to get syntax->string to recover the original whitespace?

I'm using Racket 8.0 CS on Ubuntu 20.


Thanks in advance,


Jeff Henrikson


Laurent

Apr 9, 2021, 7:30:15 AM
to Jeff Henrikson, Racket Users
You need to enable line/character counting with `port-count-lines!`:

#lang racket
(require syntax/to-string)

(define in (open-input-string "(comment\n  \"hello world\"\n  line)"))
(port-count-lines! in)
(syntax->string (read-syntax "mystring" in))

; -> "comment\n \"hello world\"\n line"
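
To see why this matters, here is a small sketch that compares the location info recorded on the syntax object with and without line counting. The helper name `first-srcloc` is made up for illustration. Without `port-count-lines!` the line and column are `#f`, so presumably `syntax->string` has nothing to compute line breaks and indentation from.

```racket
#lang racket

;; Hypothetical helper (not part of any library): read one datum from a
;; string and return the line and column stored on the syntax object.
(define (first-srcloc str count-lines?)
  (define in (open-input-string str))
  (when count-lines? (port-count-lines! in))
  (define stx (read-syntax "mystring" in))
  (list (syntax-line stx) (syntax-column stx)))

(define src "(comment\n  \"hello world\"\n  line)")

(first-srcloc src #f) ; -> '(#f #f)  no line/column info
(first-srcloc src #t) ; -> '(1 0)    line 1, column 0
```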


Jeff Henrikson

Apr 9, 2021, 12:57:33 PM
to Laurent, Racket Users

Laurent,

Thank you very much.  It probably would have taken me a long time on my own to think of the possibility that the port was at fault.


Jeff

Jeff Henrikson

Apr 9, 2021, 7:22:31 PM
to Laurent, Racket Users
It turns out that I have more trouble with printing whitespace from syntax.  Consider this:

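(require racket)
(require syntax/to-string)

;; Write each element of xs on its own line, then read the lines back
;; as syntax objects that carry line and column information.
(define (syntax-on-lines xs)
  ;; Read syntax objects from fin until EOF, preserving their order.
  (define (iter fin ys)
    (let ((y (read-syntax "<string>" fin)))
      (if (eof-object? y)
          (reverse ys)
          (iter fin (cons y ys)))))
  (if (null? xs)
      '()
      (let* ((fout (open-output-string))
             (_ (for* ((x xs))
                  (writeln x fout)))
             (_ (close-output-port fout))
             (fin (open-input-string (get-output-string fout))))
        ;; Without line counting, the syntax objects get no line/column info.
        (port-count-lines! fin)
        (iter fin '()))))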

The basic case works:

;; ex1
(let* ((xs '("collected" "from" "separate" "lines"))
       (ys (syntax-on-lines xs)))
  (syntax->string
   #`(#,@ys)))
;; "\"collected\"\n\"from\"\n\"separate\"\n\"lines\""

But I need to decorate a number of constant items, such as:

;; ex2
(let* ((xs '("collected" "from" "separate" "lines"))
       (ys (syntax-on-lines xs)))
  (syntax->string
   #`(comment #,@ys)))
;; "comment\"collected\"\"from\"\"separate\"\"lines\""

In ex2, the loss of line breaks seems to stem from the lack of srcloc info on the first token.  However, I don't understand exactly what is needed to preserve them.  If I attach a srcloc to the first token only, using quasisyntax/loc as follows, I get my EOLs:

;; ex3
(let* ((xs '("collected" "from" "separate" "lines"))
       (ys (syntax-on-lines xs))
       (zs (cons (quasisyntax/loc (car ys) "comments") ys)))
  (syntax->string
   #`(#,@zs)))
;; "\"comments\"\"collected\"\n\"from\"\n\"separate\"\n\"lines\""

But that's quite inconvenient if I have a bunch of stuff to decorate.  If I instead apply quasisyntax/loc at the more convenient outer position, my EOL data seems to get overwritten:

;; ex4
(let* ((xs '("collected" "from" "separate" "lines"))
       (ys (syntax-on-lines xs)))
  (syntax->string (quasisyntax/loc (car ys) ("comments" #,@ys))))
;; "\"comments\"\"collected\"\"from\"\"separate\"\"lines\""

Is there a way to do something like ex4, where I can add a number of new constant tokens, but without overwriting the EOL data?

Thanks in advance,


Jeff Henrikson


jackh...@gmail.com

Apr 10, 2021, 6:48:15 AM
to Racket Users
I had to build something for this in my Resyntax project. My takeaways were:
  • There's no substitute for just reading the file. If you have a `syntax-original?` subform, you can use the srcloc information to read the exact original text that was in the source file. This not only preserves whitespace more accurately than `syntax->string`, it also preserves comments, which is something that `syntax->string` fundamentally cannot do.
  • For non-original syntax, such as that synthetic "comments" bit you're inserting programmatically, just ignore the source locations completely. They refer to the source location of where it occurs in your program-manipulating-program, which is completely unrelated to the positions of the original syntax. The `syntax->string` form doesn't handle it well when some subforms are original and some aren't, or when subforms are from different sources.
  • It's handy to have a way to communicate explicit formatting. I did this by having a special NEWLINE form that my code could shove into the syntax object before rendering it to a string, to tell the formatter where to insert line breaks.
  • If you just rely on `syntax-original?` to preserve whitespace/comments, you'll miss the whitespace and comments between original syntax objects. I dealt with this by having an `(ORIGINAL-SPLICE original-syntax ...)` form that I used to tell the formatter that not only are all of these syntax objects original, but the entire sequence is unchanged from the input program. That allows the formatter to copy the source file text from the start of the first syntax object all the way to the end of the last one, preserving any whitespace and comments in between them.
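
A rough sketch of these ideas, under some loud assumptions: this is not Resyntax's actual code, the names `original-text`, `original-splice-text`, and `render` are made up for illustration, the source text is held in a string rather than a file, and line counting is enabled so that positions and spans count characters.

```racket
#lang racket

(define source "(comment ; keep me\n  \"hello world\"\n  line)")

(define in (open-input-string source))
(port-count-lines! in)
(define stx (read-syntax "mystring" in))
(define parts (syntax->list stx)) ; comment, "hello world", line

;; Exact source text of one syntax object. syntax-position is 1-based,
;; substring is 0-based.
(define (original-text src s)
  (define start (sub1 (syntax-position s)))
  (substring src start (+ start (syntax-span s))))

;; Source text from the start of the first syntax object to the end of
;; the last one, keeping any whitespace and comments in between.
(define (original-splice-text src stxs)
  (define start (sub1 (syntax-position (first stxs))))
  (define end-stx (last stxs))
  (substring src start (+ (sub1 (syntax-position end-stx))
                          (syntax-span end-stx))))

;; Render a list of pieces: the symbol NEWLINE becomes a line break,
;; original syntax is copied from the source text, and anything else is
;; treated as synthetic and written with ~s, ignoring its srcloc.
(define (render src pieces)
  (string-append*
   (for/list ([p pieces])
     (cond [(eq? p 'NEWLINE) "\n"]
           [(and (syntax? p) (syntax-original? p)) (original-text src p)]
           [(syntax? p) (~s (syntax->datum p))]
           [else (~s p)]))))

(original-text source (second parts))
;; -> "\"hello world\""
(original-splice-text source parts)
;; -> "comment ; keep me\n  \"hello world\"\n  line"
(render source (list "comments" 'NEWLINE (second parts)))
;; -> "\"comments\"\n\"hello world\""
```

Note that the splice keeps the `; keep me` comment, which `syntax->string` cannot see because comments never make it into the syntax object.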