Memory usage of (port->lines ...) is extremely high

Hong Yang

Sep 24, 2020, 6:14:55 AM
to Racket Users
Hi Racketeers,

I'm trying to load a 600 MB log file, and I found that the program uses 3 GB of resident memory just to load the file's contents via (port->lines ...). I do have enough RAM, but it looks like something is wrong here. I do need the whole file in RAM for my use case, so (read-line ...) doesn't help me.

Any comments would be appreciated.

Here is my program:

#!/usr/bin/racket
#lang racket

; Load input as list of lines
(define (input-load-lines file-name)
  (if (file-exists? file-name)
      (let* ([input (open-input-file file-name)]
             [lines (port->lines input)])
        (close-input-port input)
        lines)
      empty))

; Racket 7.8, compiled from source (non-CS mode)
; 100M of log requires 0.5G runtime memory
; 300M of log requires 1.5G runtime memory
; 600M of log requires 3.0G runtime memory
;
; Reference
;   racket/collects/racket/port.rkt :106
;   racket/collects/racket/private/portlines.rkt :11

(define input (input-load-lines "main.log"))

; 214M (VIRT) / 101M (RSS) without opening any file
(let loop ()
  (sleep 5) ; Wait here so that I can check memory via top/ps
  (set! input empty) ; even this line does not help
  (loop))

Thanks
Hong

Hong Yang

Sep 24, 2020, 6:46:10 AM
to Racket Users
Update, with a memory dump log attached.

1. Without (set! input empty), calling (collect-garbage) doesn't help.
2. After calling both (set! input empty) and (collect-garbage), memory usage drops dramatically.

; 214M (VIRT) / 101M (RSS) without opening any file
(let loop ()
  (sleep 5)          ; Wait here so that I can check memory via top/ps
  (dump-memory-stats)
  (collect-garbage)
  (set! input empty)  ; Does not help on its own; it works only after calling (collect-garbage) explicitly.
  (loop))
memory-dump-300Maa.txt
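
The observation above (memory drops only once the reference is cleared and a collection runs) can be reproduced on a small scale. The following is a minimal sketch, not taken from the original program: the list size and string length are arbitrary, and it uses `current-memory-use`, which reports memory tracked by Racket's garbage collector and may differ from the RSS figure shown by top/ps.

```racket
#lang racket

;; Stand-in for the loaded log: many separately allocated strings.
;; The sizes here are arbitrary and only for illustration.
(define data
  (for/list ([i (in-range 200000)])
    (make-string 100 #\a)))

(collect-garbage)
(define with-data (current-memory-use))   ; bytes in use while the list is reachable

(set! data empty)    ; drop the only reference to the list
(collect-garbage)    ; explicitly request a collection
(define after-drop (current-memory-use))  ; bytes in use after the list is unreachable

(printf "with data:       ~a MB\n" (quotient with-data (* 1024 1024)))
(printf "after drop + GC: ~a MB\n" (quotient after-drop (* 1024 1024)))
```

The exact numbers depend on the Racket version and platform, so none are shown here.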

Laurent

Sep 24, 2020, 6:55:10 AM
to Hong Yang, Racket Users
Quick comment: if you don't need to load the whole file but want to parse it line by line, use `in-lines`, which is memory-efficient.
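
A minimal sketch of this approach, assuming the goal is something like counting matching lines. The helper name `count-matching-lines` and the sample log contents are made up for illustration; the loop looks at one line at a time instead of building a list of all lines.

```racket
#lang racket

;; Hypothetical helper: count the lines that contain `needle`,
;; reading the file one line at a time via in-lines.
(define (count-matching-lines file-name needle)
  (call-with-input-file file-name
    (lambda (in)
      (for/sum ([line (in-lines in)])
        (if (string-contains? line needle) 1 0)))))

;; Small temporary file so the example is self-contained.
(define tmp (make-temporary-file))
(display-lines-to-file
 '("INFO start" "ERROR disk full" "INFO done" "ERROR timeout")
 tmp
 #:exists 'truncate)

(define error-count (count-matching-lines tmp "ERROR"))
(delete-file tmp)
```

`call-with-input-file` also closes the port when the procedure returns, so no explicit `close-input-port` is needed.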

Hong Yang

Sep 24, 2020, 8:53:42 AM
to Racket Users
Thanks Laurent. I tried (in-lines ...), and yes, it's memory-efficient, but I'm still curious and concerned about why (port->lines ...) takes so much memory.

Best regards
Hong

Sam Tobin-Hochstadt

Sep 24, 2020, 8:55:48 AM
to Hong Yang, Racket Users
port->lines produces a list with all the lines in it. That list is what uses all the memory. Using in-lines avoids producing the whole list at once. 

Sam

jackh...@gmail.com

Sep 30, 2020, 1:16:34 AM
to Racket Users
I'm also guessing the jump from 600MB to 3GB is related to encodings. The file is probably UTF-8/ASCII, and Racket strings use a different encoding. I think it's one of the 32-bit encodings? So for ASCII text, that alone would be a factor-of-four increase in memory usage.
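
If that guess is right, a rough estimate is 600 MB x 4 bytes per character = 2.4 GB for the character data alone, with the remainder plausibly going to per-line string and list overhead. One possible way to keep the whole file in memory at closer to its on-disk size is to read the lines as byte strings and decode a line only when needed. A minimal sketch, assuming byte strings are acceptable for the use case; the helper name `input-load-bytes-lines` and the sample contents are hypothetical:

```racket
#lang racket

;; Hypothetical variant of the loader that keeps each line as a byte
;; string (one byte per byte of file content) instead of a character string.
(define (input-load-bytes-lines file-name)
  (if (file-exists? file-name)
      (file->bytes-lines file-name)
      empty))

;; Small temporary file so the example is self-contained.
(define tmp (make-temporary-file))
(display-lines-to-file '("alpha" "beta" "gamma") tmp #:exists 'truncate)

(define blines (input-load-bytes-lines tmp))

;; Decode an individual line to a string only when it is actually needed.
(define first-line (bytes->string/utf-8 (first blines)))

(delete-file tmp)
```

The trade-off is that string functions no longer apply directly; code would either use the `bytes-` counterparts or decode individual lines on demand.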