file->bytes with large files


Greg Rosenblatt

Jul 15, 2020, 6:05:23 PM
to Racket Users
Hi, I'm getting an error while using file->bytes to load a moderately large file:

> (time (void (file->bytes "my-7.6GB-file")))
; error reading from stream port
;   port: #<path:/Users/greg/my-7.6GB-file>
;   system error: Invalid argument; errno=22
;   context...:
;    /Applications/Racket v7.7/collects/racket/file.rkt:768:6: temp218
;    /Applications/Racket v7.7/collects/racket/private/more-scheme.rkt:336:52
;    eval-one-top
;    /Applications/Racket v7.7/share/pkgs/xrepl-lib/xrepl/xrepl.rkt:1478:0
;    /Applications/Racket v7.7/collects/racket/repl.rkt:11:26

Is there a limit to the size of files that can be used with file->bytes?

I prefer file->bytes because it seems much faster than manually reading from a port. If file->bytes is not appropriate here, can somebody recommend another fast approach?

Matthew Flatt

Jul 15, 2020, 6:32:36 PM
to Greg Rosenblatt, Racket Users
The `file->bytes` function uses the file size with `read-bytes`, and it
appears that the Mac OS `read` system call errors on requests of 2GB or
more. The right fix is for the `read` call within Racket (at the rktio
layer) to limit the size that it passes, and I'll make that change.

Meanwhile, you could work around the problem by limiting the size of an
individual request: Allocate a byte string and then use a sequence of
`read-bytes!` calls to read the file in increments. Each time, use the
number of read bytes to increment a starting position into the
destination byte string (which is the third argument to `read-bytes!`).

Matthew
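
A minimal sketch of the workaround described above, for illustration only. The name `file->bytes/chunked` and its optional `chunk-size` argument are hypothetical and not part of `racket/file`; the 1 GiB default is an assumption chosen to stay below the 2GB limit mentioned above. Each `read-bytes!` call is capped at `chunk-size` bytes, and the count it returns advances the start position into the destination byte string.

```racket
#lang racket/base

;; Hypothetical helper (not part of racket/file): read all of `path` into one
;; byte string, requesting at most `chunk-size` bytes per `read-bytes!` call.
(define (file->bytes/chunked path [chunk-size (expt 2 30)]) ; assumed 1 GiB default
  (define size (file-size path))
  (define bs (make-bytes size))
  (call-with-input-file path
    (lambda (in)
      (let loop ([pos 0])
        (cond
          [(= pos size) bs]
          [else
           ;; Never ask for more than `chunk-size` bytes at once.
           (define end (min size (+ pos chunk-size)))
           ;; `read-bytes!` returns the number of bytes stored, or eof.
           (define n (read-bytes! bs in pos end))
           (cond
             [(eof-object? n)
              ;; The file ended earlier than `file-size` reported.
              (error 'file->bytes/chunked
                     "unexpected end of file at ~a of ~a bytes" pos size)]
             ;; Advance by the count actually read, not the count requested.
             [else (loop (+ pos n))])])))))
```

Passing a small `chunk-size`, as in `(file->bytes/chunked "some-file" 4096)`, exercises the loop on small files without needing a multi-gigabyte input.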

Greg Rosenblatt

Jul 17, 2020, 6:45:46 AM
to Racket Users
Thanks.

Depending on how the increment compares to the file size, file->bytes might be up to 1.5x faster (on my machine at least) than the workaround.  But the workaround is still fast enough.

(define (file->bytes2 file-name)
  (define size (file-size file-name))
  (define bs (make-bytes size))
  (call-with-input-file
    file-name
    (lambda (in)
      (let loop ((i 0))
        (cond ((= i size) bs)
              (else (define end (+ i (min 1073741824 (- size i))))  ;; 1GB increment
                    (read-bytes! bs in i end)
                    (loop end)))))))