slowness of code run in places

69 views
Skip to first unread message

Sergey Pinaev

unread,
Nov 5, 2015, 4:04:35 AM11/5/15
to Racket Developers
hi.
i wrote a function, that does some long computation and counts time spent on it.
then i call this function N times and sum times spent on computations.
then i call this functionin N times in X separate places in parallel and again sum times spent on computations.
results are surprising. the more places we use the more summary time is.
point out: NOT time of execution, but summary time of N computations.
and before starting computations i waits till all worker places are started (starting place takes long time and i know it).

on 16 core system, running 1 place:
time: 1164
running 2 places:
time: 1739
running 10 places:
time: 8825

code:
#lang racket
(require racket/place)
(provide main
         child
)
(define (child ch)
  (let ((n (place-channel-get ch)))
    (let ((start (current-milliseconds)))
      (cond
       ((number? n)
        (place-channel-put
         ch
         (and (let loop ((res '()) (n n))
                (if (= n 0)
                    (apply + res)
                    (loop (cons n res) (sub1 n))))
              (- (current-milliseconds) start)))
        (child ch))
       ((equal? n "ping")
        (place-channel-put ch "pong")
        (child ch))))))
(define (get-avail p res)
  (cons
   (let loop ((avail '()))
     (if (null? avail)
         (loop (filter-map
                (lambda (pl)
                  (let ((r (sync/timeout 0 pl)))
                    (and r
                         (set! res (cons r res))
                         pl)))
                p))
         avail))
    res))
(define (main . argv)
  (let ((p (let loop ((n (string->number (car argv))) (r '()))
             (if (= n 0)
                 r
                 (loop (sub1 n)
                       (cons (dynamic-place
                              (string->path "placestest.scm")
                              'child)
                             r)))))
        (numbers '(325220 295205 285260
                   146030 58810 231409
                   260650 58890 299280
                   168250 57320 120210
                   226100 325260 320180
                   251680 37310 275680
                   111010 288300 183890
                   325220 295205 285260
                   146030 58810 231409
                   260650 58890 299280
                   168250 57320 120210
                   226100 325260 320180
                   251680 37310 275680
                   111010 288300 183890)))
    (for-each (lambda (pl)
                (place-channel-put pl "ping")
                (place-channel-get pl))
              p)
    (printf "all places started, run~n")
    (let ((results (let loop ((numbers numbers) (a (cons p '())))
                     (if (null? numbers)
                         (cdr a)
                         (loop (cdr numbers)
                               (let ((a (if (not (null? (car a)))
                                            a
                                            (get-avail p (cdr a)))))
                                 (and (place-channel-put (car (car a)) (car numbers))
                                      (cons (cdr (car a))
                                            (cdr a)))))))))
      (let loop ((results results))
        (if (= (length results) (length numbers))
            (printf "time: ~a~n" (apply + results))
            (loop (cdr (get-avail p results)))))
      (for-each (lambda (pl) (place-channel-put pl #f)) p))
    (for-each place-wait p)))

p.s. sorry for my bad english

Sergey Pinaev

unread,
Nov 5, 2015, 8:57:01 AM11/5/15
to Racket Developers
btw, if function does not allocate memory and just compute something - things getting better:
1 place:
time: 12763
2 places:
time: 12798
10 places:
time: 13778

modified "child":

(define (child ch)
  (let ((n (place-channel-get ch)))
    (let ((start (current-milliseconds)))
      (cond
       ((number? n)
        (place-channel-put
         ch
         (and (let loop ((res 0) (n (* n 300)))
                (if (= n 0)
                    res
                    (loop (add1 res) (sub1 n))))

WarGrey Gyoudmon Ju

unread,
Nov 5, 2015, 9:10:12 AM11/5/15
to Sergey Pinaev, Racket Developers
Hello, Sergey.
Have you read this before?

--
You received this message because you are subscribed to the Google Groups "Racket Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-dev+...@googlegroups.com.
To post to this group, send email to racke...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/racket-dev/be718313-52f7-4025-8bb4-beb1af7fb08b%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Sergey Pinaev

unread,
Nov 5, 2015, 9:20:24 AM11/5/15
to Racket Developers, drf....@gmail.com
четверг, 5 ноября 2015 г., 17:10:12 UTC+3 пользователь WarGrey Gyoudmon Ju написал:
Hello, Sergey.
Have you read this before?

this post is not about "performance of my code". this test is sintetic just to prove that there is something wrong with "places".

Matthew Flatt

unread,
Nov 6, 2015, 3:22:52 PM11/6/15
to Sergey Pinaev, Racket Developers
I'm able to replicate the effects you describe on my 4-core machine,
but I'm not convinced that it's an issue with places. The effect seems
to be a limitation of running memory-intensive processes on a single
machine.

In particular, I see the same effects if I run multiple instances of
the program at the same time. To make that comparison easier, I used a
variant of `get-avail` that doesn't busy-wait:

(define (get-avail p res)
(cons
(let loop ((avail '()))
(if (null? avail)
(loop (apply
sync
(map (lambda (pl)
(wrap-evt
pl
(lambda (v)
(set! res (cons v res))
(list pl))))
p)))
avail))
res))

That way, running the program for one place keeps just one core busy.
As long as I don't run out of cores, though, the busy-waiting loops
don't matter relative to the allocating loops, as you have observed.

In general, I would expect allocation and garbage collection to
increase contention at the OS and virtual-memory layers. To
double-check whether something in the Racket memory manager makes that
worse than it should be, I tried running a variant of your program in
both Racket and in Gambit. The program is below, and here are the
results when I run between 1 and 4 instances of the program
concurrently via a shell script:

procs Racket Gambit
(v6.3.0.3) (v4.6.6)
1 1152 1558 \ real time in msec
2 1346 1839 | as reported by one instance
3 1620 2077 |
4 2034 2493 /

That's a rough experiment, and it might be interesting to try more
runtime systems, but the results don't suggest that Racket is doing
anything especially out of line.

I also tried inspecting Racket's execution with various performance
tools, and I didn't see anything suspicious, such as excessive system
calls.

Here's the program for the above table:

; Run with `-f` for Racket

; Uncomment this for compiling with Gambit, but it doesn't seem to
; matter much:
; (declare (standard-bindings) (extended-bindings) (block))

(define numbers '(325220 295205 285260
146030 58810 231409
260650 58890 299280
168250 57320 120210
226100 325260 320180
251680 37310 275680
111010 288300 183890
325220 295205 285260
146030 58810 231409
260650 58890 299280
168250 57320 120210
226100 325260 320180
251680 37310 275680
111010 288300 183890))

(define (len v)
(if (pair? v)
1
0))

(define (add-up n)
(let loop ([repeats 10])
(if (zero? repeats)
'done
(begin
(let loop ((res '()) (n n))
(if (= n 0)
(len res)
(loop (cons n res) (- n 1))))
(loop (- repeats 1))))))

(time (for-each add-up numbers))

Sergey Pinaev

unread,
Nov 9, 2015, 9:48:16 AM11/9/15
to Racket Developers, drf....@gmail.com
I'm able to replicate the effects you describe on my 4-core machine,
but I'm not convinced that it's an issue with places. The effect seems

very strange. i rewriten test to use (process ...) and communication over stdin/stdout.
with racket 6.x - 10 processes have much slower summary time.
time racket -t processtest.scm -m -- 1
time: 37948
real    0m40.397s
time racket -t processtest.scm -m -- 10
time: 87168
real    0m12.416s

but with racket 5.1.1:
time racket -t processtest.scm -m -- 1
time: 24084
real    0m25.075s
ime racket -t processtest.scm -m -- 10
time: 24413
real    0m4.189s

really weird and i dont know what to do =(

processtest.scm:
#lang racket
(provide main)
(define (child)
  (let* ((l (read-line))
         (n (string->number l)))
    (let ((start (current-milliseconds)))
      (cond
       ((equal? l "ping")
        (printf "pong\n")
        (flush-output (current-output-port))
        (child))
       ((number? n)
        (printf
         "~s~n"
         (and (let loop ((res '()) (n (* n 10)))
                (if (= n 0)
                    (apply + res)
                    (loop (cons n res) (sub1 n))))
              (- (current-milliseconds) start)))
        (flush-output (current-output-port))
        (child))))))

(define (get-avail p res)
  (cons
   (let loop ((avail '()))
     (if (null? avail)
         (loop (apply
                sync
                (map (lambda (pl)
                       (wrap-evt
                        (first pl)
                        (lambda (v)
                          (let ((R (read-line v)))
                            (set! res (cons R res)))

                          (list pl))))
                     p)))
         avail))
   res))
(define (main-loop n)
  (let ((p (let loop ((n n) (r '()))

             (if (= n 0)
                 r
                 (loop (sub1 n)
                       (cons (process "racket -t processtest.scm -m")
                             r)))))
        (numbers '(325220 295205 285260

                   146030 58810 231409
                   260650 58890 299280
                   168250 57320 120210
                   226100 325260 320180
                   251680 37310 275680
                   111010 288300 183890
                   325220 295205 285260
                   146030 58810 231409
                   260650 58890 299280
                   168250 57320 120210
                   226100 325260 320180
                   251680 37310 275680
                   111010 288300 183890)))
    (for-each (lambda (pl)
                (fprintf (second pl) "ping~n")
                (flush-output (second pl))
                (read-line (first pl)))
              p)
    (printf "all childs started, run~n")

    (let ((results (let loop ((numbers numbers) (a (cons p '())))
                     (if (null? numbers)
                         (cdr a)
                         (loop (cdr numbers)
                               (let ((a (if (not (null? (car a)))
                                            a
                                            (get-avail p (cdr a)))))
                                 (and (fprintf (second (car (car a))) "~a~n" (car numbers))
                                      (flush-output (second (car (car a))))

                                      (cons (cdr (car a))
                                            (cdr a)))))))))
      (let loop ((results results))
        (if (= (length results) (length numbers))
            (printf "time: ~a~n" (apply + (map string->number results)))
            (loop (cdr (get-avail p results))))))
    (for-each
     (lambda (pl)
       (fprintf (second pl) "quit~n")
       (flush-output (second pl)))
     p)))
(define (main . argv)
  (if (null? argv)
      (child)
      (main-loop (string->number (car argv)))))

Reply all
Reply to author
Forward
0 new messages