Help with vector-sort

31 views
Skip to first unread message

greadey

unread,
Feb 15, 2020, 7:44:49 AM2/15/20
to Racket Users
Hi there,

I have written a programme to compute a bootstrapped mean etc.  I successfully wrote it using lists both untyped and in typed/racket.
I am interested in optimising the code and having seen that typed racket performs faster (for lists) I am interested in seeing if I get a performance increase using vectors in both typed and untyped code.
In order to compute the median value as well as certain percentiles it is necessary to sort the samples.  My specific problem is using vector-sort.  I have got round it by using vector->list and list->vector functions and using the list sort function.
If I use vector-sort typed racket keeps telling me to use "require/typed".  I got a function definition to compile  (:sort-myvector : (Vectorof Flonum) (Flonum Flonum -> Boolean) -> (Vectorof Flonum))
I'm not sure if that is exactly right but it was something similar, however it did not work.  I did add a "require/typed racket/vector" to the expression.
So, how can I use vector-sort in typed racket code, and can anyone tell me if translating vectors to lists and back again actually introduces a performance hit? 
Here is my code for reference
/
#lang typed/racket

(require math/flonum)
(require racket/vector)
(require math/statistics)
;(require "net/basic-defs.rkt")


;;This struct to hold the means and SDs for each bootstrap list
;;We don't need these structs for this
;;(struct sig-mean (mean-val sd-val) #:transparent)
;;(struct sig-median (median-val sd-val) #:transparent)
;;What we need is this!
(struct metrics ([mean-val : Real] [median-val : Real]) #:transparent)


(: orig (Vectorof Real))
(define orig (vector  8.9518e-02
                      9.9626e-02
                      5.1029e-01
                      2.2797e-01
                      7.1273e-02
                      7.4641e-01
                      6.4869e-01
                      8.1512e-01
                      6.4230e-01
                      4.2182e-03
                      5.1573e-01))

;;This function picks a random item from a list of data
(: idx : (Vectorof Real) -> Integer)
(define (idx v)
  (random (vector-length v)))

;;This is because random yields a non-negative fixnum but we need to pass
;;an Integer to list-ref.
(: pick-item : (Vectorof Real) -> Real)
(define (pick-item v)
  (vector-ref v (idx v)))

;;This function takes a bootstrap size and a list of data and produces a
;;bootstrap data set of length size.
(: booted : Integer (Vectorof Real) -> (Vectorof (Vectorof Real)))
(define (booted size v)
  (for/vector : (Vectorof (Vectorof Real))
      ([i (in-range size)])
    (for/vector : (Vectorof Real)
        ([j (in-range (vector-length v))])
      (pick-item v))))



;;This function was nicked from Rosetta Code.

(: my-median : (Vectorof Real) -> Real)
;(: my-median : (Vectorof Real) -> Real)
(define (my-median numbers)
  (define sorted-list (list->vector (sort (vector->list numbers) <)))
  (define count (vector-length numbers))
  (if (zero? count)
      0.0
      (/ (+ (vector-ref sorted-list (sub1 (floor (/ (add1 count) 2))))
            (vector-ref sorted-list (sub1 (ceiling (/ (add1 count) 2)))))
         2)))


;;This function extracts the means and the stddevs for each sample.
;;lol is a list of list(s) of samples, returns a list of structs

(: get-metrics : (Vectorof Real) -> metrics)
(define (get-metrics v)
  (metrics (mean v) (my-median v)))

(: get-all-metrics : (Vectorof (Vectorof Real)) -> (Vectorof metrics))
(define (get-all-metrics vov)
  (vector-map get-metrics vov))

;;Takes a list of structs and a field and returns a list of the values
;;of the struct's field.  So if we want the means,
;;we call (split-metrics list-of-metrics (metrics-mean-val))
#;(: split-metrics : (Vectorof metrics) (metrics -> Real) -> (Vectorof Real))
#;(define (split-metrics lofStruct field)
  (vector-map field lofStruct))


;;So now we need to get the 2.5 and the 97.5 percentiles of the means and the medians.
;;To get the error range we need to get the 2.5th and 97.5th percentiles
;;of the means and the medians.
(: percentile : (Vectorof Real) Real -> Integer)
(define (percentile lofMet percen)
  ;(exact-round (+ 1 (* (/ percen 100) (sub1 (vector-length lofMet))))))
  (exact-round (* (/ percen 100) (sub1 (vector-length lofMet)))))

;;This function computes the 95 percent confidence limits
(: 95-limits : (Vectorof Real) -> (Listof Real))
(define (95-limits vofMeans)
  (let
      ([lower (vector-ref vofMeans (percentile vofMeans 2.5))]
       [upper (vector-ref vofMeans (percentile vofMeans 97.5))])
    (list lower upper)))


(time
;;This function yields a vector of metrics structures
(define mets (get-all-metrics (booted 100000 orig)))
;We get all the means with this one
(define my-means (vector-map metrics-mean-val mets))
;Samr for medians
(define my-medians (vector-map metrics-median-val mets))
;Next two get the SDs for standard error
(define std-err-mean (stddev my-means))
(define std-err-median (stddev my-medians))
;The next two compute the percentiles - 2.5 and 97.5
(define 95-means (95-limits my-means))
(define 95-medians (95-limits my-medians))
;Need to get the mean and the median of the original data
(define the-mean (mean orig))
(define the-median (my-median orig))
;Output the results
(list "Mean is: " the-mean "std error is: " std-err-mean "limits: " "97.5: "
      (last 95-means) "2.5 " (first 95-means)
      "The median is: "
      the-median
      " std error: "
      std-err-median
      "Limits: "
      "97.5 "
      (last 95-medians)
      "2.5 "
      (first 95-medians)
))

I'd be grateful to receive constructive criticism about my code :-)

Regards
greadey

Jens Axel Søgaard

unread,
Feb 15, 2020, 8:03:44 AM2/15/20
to greadey, Racket Users
Den lør. 15. feb. 2020 kl. 13.44 skrev greadey <gre...@gmail.com>:
I have written a programme to compute a bootstrapped mean etc.  I successfully wrote it using lists both untyped and in typed/racket.
I am interested in optimising the code and having seen that typed racket performs faster (for lists) I am interested in seeing if I get a performance increase using vectors in both typed and untyped code.
In order to compute the median value as well as certain percentiles it is necessary to sort the samples.  My specific problem is using vector-sort.  I have got round it by using vector->list and list->vector functions and using the list sort function.
If I use vector-sort typed racket keeps telling me to use "require/typed".  I got a function definition to compile  (:sort-myvector : (Vectorof Flonum) (Flonum Flonum -> Boolean) -> (Vectorof Flonum))
I'm not sure if that is exactly right but it was something similar, however it did not work.  I did add a "require/typed racket/vector" to the expression.
So, how can I use vector-sort in typed racket code, and can anyone tell me if translating vectors to lists and back again actually introduces a performance hit? 

Here is an example of how to use `vector-sort` from a typed/racket program:

#lang typed/racket
(require/typed racket/vector
  [vector-sort (-> (Vectorof Any) (-> Any Any Boolean) (Vectorof Any))])

(define (compare x y)
  (if (and (real? x) (real? y))
      (< x y)
      (error 'compare "expected two real numbers, got: ~a ~a" x y)))

(vector-sort (vector 3 1 5 2) compare)
 
/Jens Axel


greadey

unread,
Feb 16, 2020, 5:10:31 AM2/16/20
to Racket Users
Thanks Jens - Nice one!  I had to define vector-sort to accept and yield Real values in order to get it to work!  My understanding of typed programming is that if you (as you did) define a function to be polymorphic, it should be polymorphic.  However, copying your code exactly and running vector sort  on a vector of reals gives  the error;
Type Checker: type mismatch
  expected: (Vectorof Any)
  given: (Vectorof Real) in: orig

Maybe one day I'll understand this one :-)
Reply all
Reply to author
Forward
0 new messages