zero cost subarray?

538 views
Skip to first unread message

Peter Brady

unread,
Apr 17, 2015, 3:48:56 PM4/17/15
to julia...@googlegroups.com
Inorder to write some differencing algorithms in a semi-dimensional agnostic manner the code I've written makes heavy use of subarrays which turn out to be rather costly. I've noticed some posts on the cost of subarrays here and that things will be better in 0.4.  Can someone comment on how much better?  Would subarray (or anything like it) be on par with simply passing an offset and stride (constant) and computing the index myself? I'm currently using the 0.3 release branch.

René Donner

unread,
Apr 17, 2015, 4:04:39 PM4/17/15
to julia...@googlegroups.com
As far as I have measured it sub in 0.4 is still not cheap, as it provides the flexibility to deal with all kinds of strides and offsets, and the view object itself thus has a certain size. See https://github.com/rened/FunctionalData.jl#efficiency for a simple analysis, where the speed is mostly dominated by the speed of the "sub-view" mechanism.

To get faster views which require strides etc you can try https://github.com/JuliaLang/ArrayViews.jl

What do you mean by semi-dim agnostic? In case you only need indexing along the last dimension (like a[:,:,i] and a[:,:,:,i]) you can use
https://github.com/rened/FunctionalData.jl#efficient-views-details
which uses normal DenseArrays and simple pointer updates internally. It can also update a view in-place, by just incrementing the pointer.

Peter Brady

unread,
Apr 17, 2015, 4:24:14 PM4/17/15
to julia...@googlegroups.com
Thanks for the links.  I'll check out ArrayViews as it looks like what I was going to do manually without wrapping it in a type.

By semi-dim agnostic I meant that the differencing algorithm itself only cares about one dimension but that dimension is different for different directions. Only a few toplevel routines actually need to know about the dimensionality of the problem. 

Sebastian Good

unread,
Apr 17, 2015, 11:30:27 PM4/17/15
to julia...@googlegroups.com
This was discussed a few weeks ago


I think the bottom line is that the current implementation *should* be 'zero-cost' once a set of planned improvements and optimizations take place. One of the key ones is a tuple overhaul.

Fair to say it can never be 'zero' cost since there is different inherent overhead depending on the type of subarray, e.g. offset, slice, re-dimension, etc. however the implementation is quite clever about allowing specialization of those.

In a common case (e.g. a constant offset or simple stride) my understanding is that the structure will be type-specialized and likely stack allocated in many cases, reducing to what you'd write by hand. At least this is what they're after.

Peter Brady

unread,
Apr 18, 2015, 5:52:35 PM4/18/15
to julia...@googlegroups.com
So I've done some experimenting I think my use case easily lends itself to a custom datatype (basically a Ptr with an offset and stride) named 'UnSafeSlice' below. The timings are pretty close to simply passing the array and offset/stride but the memory allocation is way off.

I've posted my test.jl script and I'm not sure where all the allocations are coming from.  Any help would be greatly appreciated.

Here's the output of running the script on my machine
julia> include("test.jl")
test_all (generic function with 1 method)

julia> test_all(5)
test_stride
elapsed time: 1.596122539 seconds (0 bytes allocated)
test_view
elapsed time: 9.502840329 seconds (94208000 bytes allocated, 0.38% gc time)
test_unsafe
elapsed time: 1.683452985 seconds (16303000 bytes allocated)

Here's my test.jl script
using ArrayViews
import Base: size, getindex, setindex!, ndims
type UnsafeSlice{T,N, P<:Ptr} <: AbstractArray
    p::P
    start::Int
    stride::Int
    size::NTuple{N,Int}
end
size(s::UnsafeSlice) = s.size
size(s::UnsafeSlice, i::Int) = s.size[i]
ndims{T,N}(s::UnsafeSlice{T,N}) = N
getindex(s::UnsafeSlice, i::Int) = unsafe_load(s.p, s.start+(i-1)*s.stride)
setindex!(s::UnsafeSlice, x, i::Int) = unsafe_store!(s.p, x, s.start+(i-1)*s.stride)

function UnsafeSlice(a, slicedim::Int, start=1)
    p = pointer(a)
    str = stride(a, slicedim)
    UnsafeSlice{eltype(a), ndims(a), typeof(p)}(p, start, str, size(a))
end

function update(a::UnsafeSlice, idx, off)
    for i=1:size(a, idx)
        a[i] = -10*off+i
    end
    a
end
function update(a::AbstractArray, idx, off)
    for i=1:size(a, idx)
        a[i] = -10*off+i
    end
    a
end
function update(a::Array, idx, off, stride)
    for i=1:size(a, idx)
        a[off+(i-1)*stride] = -10*off+i
    end
    a
end
function setk_UnSafe{T}(a::Array{T,3})
    us = UnsafeSlice(a, 3)
    for j=1:size(a,2),i=1:size(a,1)
        us.start = (j-1)*size(a,1)+i
        #off = sub2ind(size(a), i, j, 1)
        update(us, 3, us.start)
    end
    a
end
function setk_View{T}(a::Array{T,3})
    for j=1:size(a,2),i=1:size(a,1)
        off = sub2ind(size(a), i, j, 1)
        update(view(a, i, j, :), 3, off)
    end
    a
end
function setk_Stride{T}(a::Array{T,3})
    str = stride(a, 3)
    for j=1:size(a,2),i=1:size(a,1)
        off = (j-1)*size(a,1)+i
        update(a, 3, off, str)
    end
    a
end

function test_stride(n)
    a = zeros(Int, (320, 320, 320))
    # warmup
    setk_Stride(a);
    @time (for i=1:n; setk_Stride(a); end)
end
function test_view(n)
    a = zeros(Int, (320, 320, 320))
    # warmup
    setk_View(a);
    @time (for i=1:n; setk_View(a); end)
end
function test_unsafe(n)
    a = zeros(Int, (320, 320, 320))
    # warmup
    setk_UnSafe(a);
    @time (for i=1:n; setk_UnSafe(a); end)
end

function test_all(n)
    println("test_stride")
    test_stride(n)
    println("test_view")
    test_view(n)
    println("test_unsafe")
    test_unsafe(n)
end

Dahua Lin

unread,
Apr 18, 2015, 11:29:17 PM4/18/15
to julia...@googlegroups.com
Constructing a view would cause memory allocation. The compiler has not been able to completely optimize this out.

I have been considering introducing something like a ``local_view`` that caches the base pointer.  The concern is that such views are dangerous is passed around -- it does not guarantee that the pointer remains valid when it is used (e.g. the underlying array might have already been garbage collected, or reallocated (due to resize, etc)).

Dahua

Tim Holy

unread,
Apr 19, 2015, 6:49:20 AM4/19/15
to julia...@googlegroups.com
Sorry to be slow to chime in here, but the tuple overhaul has landed and they
are still not zero-cost:

function sumcols(A)
s = 0.0
for j = 1:size(A,2)
Aj = slice(A, :, j)
for i = 1:length(Aj)
s += Aj[i]
end
end
s
end

Even in the latest 0.4, this still allocates memory. On the other hand, while
SubArrays allocate nearly 2x more memory than ArrayViews, the speed of the two
(replacing `slice` with `view` above) is, for me, nearly identical.

--Tim

Dahua Lin

unread,
Apr 19, 2015, 7:38:35 AM4/19/15
to julia...@googlegroups.com
The latest version of ArrayViews (v0.6.0) now provides unsafe views (that maintain raw pointers instead of the parent array). See https://github.com/JuliaLang/ArrayViews.jl#view-types

You may see whether this makes your code more performant. Be careful, you should make sure that unsafe views are used only within a local scope and don't pass them around, otherwise you may possibly run into memory corruption or segfault.

Dahua

Sebastian Good

unread,
Apr 19, 2015, 12:33:55 PM4/19/15
to Tim Holy, julia...@googlegroups.com
Their size seems much decreased. I’d imagine to totally avoid allocation in this benchmark requires an optimization that really has nothing to do with subarrays per se. You’d have to do an escape analysis and see that Aj never left sumcols. Not easy in practice, since it’s passed to slice and length, and you’d have to make sure they didn’t squirrel it away or pass it on to someone else. Then you could stack allocate it, or even destructure it into a bunch of scalar mutations on the stack. After eliminating dead code, you’d end up with a no-allocation loop much like you’d write by hand. This sort of optimization seems to be quite tricky for compilers to pull off, but it’s a common pattern in numerical code. 

In Julia is such cleverness left entirely to LLVM, or are there optimization passes in Julia itself?

Tim Holy

unread,
Apr 19, 2015, 1:38:17 PM4/19/15
to Sebastian Good, julia...@googlegroups.com
It's not just escape analysis, as this (new) issue demonstrates:
https://github.com/JuliaLang/julia/issues/10899

--Tim

Peter Brady

unread,
Apr 19, 2015, 4:21:56 PM4/19/15
to julia...@googlegroups.com, seba...@palladiumconsulting.com
@Dahua, thanks for adding an unsafeview!  I appreciate how quickly this community responds.

I've added the following function to my test.jl script
function setk_unsafeview{T}(a::Array{T,3})
    for j=1:size(a,2),i=1:size(a,1)
        off = sub2ind(size(a), i, j, 1)
        update(unsafe_view(a, i, j, :), 3, off)
    end
    a
end
 But I'm not seeing the large increase in performance I was expecting.  My timings are now

julia> test_all(5);
test_stride
elapsed time: 2.156173128 seconds (0 bytes allocated)
test_view
elapsed time: 9.30964534 seconds (94208000 bytes allocated, 0.47% gc time)
test_unsafe
elapsed time: 2.169307471 seconds (16303000 bytes allocated)
test_unsafeview
elapsed time: 8.955876793 seconds (90112000 bytes allocated, 0.41% gc time)

To be fair, I am cheating a bit with my custom 'UnsafeSlice' since I make only one instance and simply update the offset on each iteration.  If I make it immutable and create a new instance on every iteration (as I do for the view and unsafeview), things slow down a little and the allocation goes south:

julia> test_all(5);
test_stride
elapsed time: 2.159909265 seconds (0 bytes allocated)
test_view
elapsed time: 9.029025282 seconds (94208000 bytes allocated, 0.43% gc time)
test_unsafe
elapsed time: 2.621667854 seconds (114606240 bytes allocated, 2.41% gc time)
test_unsafeview
elapsed time: 8.888434466 seconds (90112000 bytes allocated, 0.44% gc time)

These are all with 0.3.8-pre.  I'll try compiling master and see what happens.  I'm still confused about why allocating a single type with a pointer, 2 ints and a tuple costs so much memory though.

Peter Brady

unread,
Apr 19, 2015, 4:36:18 PM4/19/15
to julia...@googlegroups.com, seba...@palladiumconsulting.com
So I discovered the --track-allocation option and now I am really confused:

Here's my session:

$ julia --track-allocation=all
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.3.8-pre+13 (2015-04-17 18:08 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 0df962d* (2 days old release-0.3)
|__/                   |  x86_64-redhat-linux

julia> include("test.jl")
test_all (generic function with 1 method)

julia> test_unsafe(5)

And here's the relevant part of the resulting test.jl.mem file.  Note that I commented out some calls to 'size' and replaced with the appropriate hard-coded values but the resulting allocation is the same... Can anyone shed some light on this while I wait for 0.4 to compile?

        - function update(a::AbstractArray, idx, off)
  8151120     for i=1:320 #size(a, idx)
        0         a[i] = -10*off+i
        -     end
        0     a
        - end
        - 
       - function setk_UnSafe{T}(a::Array{T,3})
      760     us = UnsafeSlice(a, 3)
        0     for j=1:size(a,2),i=1:size(a,1)
  8151120         us.start = (j-1)*320+i #size(a,1)+i
        -         #off = sub2ind(size(a), i, j, 1)
        0         update(us, 3, us.start)
        -     end
        0     a
        - end
        - function test_unsafe(n)
        0     a = zeros(Int, (320, 320, 320))
        -     # warmup
        0     setk_UnSafe(a);
        0     clear_malloc_data()
        -     #@time (
        0     for i=1:n; setk_UnSafe(a); end
        - end

Sebastian Good

unread,
Apr 19, 2015, 6:39:35 PM4/19/15
to julia...@googlegroups.com, Peter Brady
—track-allocation still requires guesswork, as optimizations can move the allocation to a different place than you would expect.

Dahua Lin

unread,
Apr 19, 2015, 8:08:49 PM4/19/15
to julia...@googlegroups.com, peter...@gmail.com
My benchmark shows that element indexing has been as fast as it can be for array views (or subarrays in Julia 0.4). 

Now the problem is actually the construction of views/subarrays. To optimize the overhead of this part, the compiler may need to introduce additional optimization.

Dahua 

Sebastian Good

unread,
Apr 19, 2015, 9:15:15 PM4/19/15
to julia...@googlegroups.com, Dahua Lin, peter...@gmail.com
Optimizing the creation of many small structures during execution typically comes down to either cleverly eliminating the need to allocate them in the first place (via escape analysis, and the like) or making the first generation of the garbage collector wickedly fast. I understand both of these are being worked.

René Donner

unread,
Apr 20, 2015, 2:57:31 AM4/20/15
to julia...@googlegroups.com, peter...@gmail.com
What about something like unsafe_updateview!(view, indices...) ?

It could be used like this (pseudocode):

view = unsafe_view(data, 1, 1, :) # to construct / allocate
for i in ..., j in ...
unsafe_updateview!(view, i, j, :)
# use view
end

In the trivial case of unsafe_view(data, :, :, i) this would boil down to a single pointer update. Of course passing around these views outside of their scope is rather discouraged. I use this pattern a lot and it proved to be very handy / fast.

Peter Brady

unread,
Apr 20, 2015, 11:04:41 AM4/20/15
to René Donner, julia...@googlegroups.com
Accidentally hit reply instead of reply-all. Sorry for the double post.

Ran my script in 0.4 and got these results...

julia> test_all(5)
test_stride
elapsed time: 2.008043041 seconds (0 bytes allocated)
test_view
elapsed time: 8.871387399 seconds (42 MB allocated, 0.23% gc time in 2 pauses with 1 full sweep)
test_unsafe
elapsed time: 2.308598574 seconds (46 MB allocated, 0.68% gc time in 2 pauses with 1 full sweep)
test_unsafeview
elapsed time: 9.106651158 seconds (0 bytes allocated)

julia> test_all(10)
test_stride
elapsed time: 4.012240175 seconds (0 bytes allocated)
test_view
elapsed time: 18.085514211 seconds (85 MB allocated, 0.16% gc time in 4 pauses with 1 full sweep)
test_unsafe
elapsed time: 4.477773618 seconds (93 MB allocated, 1.12% gc time in 4 pauses with 1 full sweep)
test_unsafeview
elapsed time: 18.146163969 seconds (0 bytes allocated)

So the allocation for the new unsafeview has been reduced to zero but it has become slower than the regular view.

Perhaps the compiler optimizations that have been discussed here are occuring since the only occurence of 'unsafeview' is the argument to a function. 

Peter Brady

unread,
Apr 20, 2015, 12:35:34 PM4/20/15
to julia...@googlegroups.com
Here's the results of running --track-allocation=user --inline=no on 0.4.  Note that I also deleted all the macros which were affecting the reported line numbers.  

I have three questions base on the data below:
1.  Why is the call to size which indexes into a tuple so expensive
2.  Why is setindex! so expensive?
3.  Why is it so expensive to update the 'start' attribute of my unsafeslice?
Does anyone have any answers or any suggestions on what tools to use to find the answers?

Here's my session:


$ ~/gitrepos/julia0.4/julia --track-allocation=user --inline=no

               _
   _       _ _
(_)_     |  A fresh approach to technical computing
 
(_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _
| |_  __ _   |  Type "help()" for help.
 
| | | | | | |
/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+4385 (2015-04-20 14:52 UTC)
 _/
|\__'_|_|_|\__'_|  |  Commit 5499882 (0 days old master)
|__/                   |  x86_64-redhat-linux


julia
> include("test_alloc.jl")
test_unsafe
(generic function with 2 methods)


julia
> test_unsafe(1);


julia
>
[ptb@cyrus julia]$


And the output

 
       - using ArrayViews
       
- import Base: size, getindex, setindex!, ndims, start, stride, pointer
       
-
       
- type UnsafeSlice{T,N, P<:Ptr} <: AbstractArray
       
-     start::Int
       
-     stride::Int
       
-     size::NTuple{N,Int}
       
-     p::P
       
- end
       
-
       
- size(s::UnsafeSlice) = size(s.size)
       
-
 
7356448 size(s::UnsafeSlice, i::Int) = s.size[i]
       
-
       
- ndims{T,N}(s::UnsafeSlice{T,N}) = N
       
-
       
- getindex(s::UnsafeSlice, i::Int) = unsafe_load(s.p, s.start+(i-1)*s.stride)
       
-
1048559648 setindex!(s::UnsafeSlice, x, i::Int) = unsafe_store!(s.p, x, s.start+(i-1)*s.stride)
       
-
       
- function UnsafeSlice(a, slicedim::Int, start=1)
       
0     p = pointer(a)
       
-
       
0     str = stride(a, slicedim)
       
-
     
368     UnsafeSlice{eltype(a), ndims(a), typeof(p)}(start, str, size(a),p)
       
- end
       
-
       
- function update(a::UnsafeSlice, idx, off)
       
-
       
0     for i=1:size(a, idx)
       
-
       
0         a[i] = -10*off+i
       
-     end
       
-
       
0     a
       
- end

       
-
       
- function setk_UnSafe{T}(a::Array{T,3})

       
0     us = UnsafeSlice(a, 3)
       
-
       
0     for j=1:size(a,2),i=1:size(a,1)
       
-
 
14712896         us.start = sub2ind(size(a), i, j, 1)
       
-
       
0         update(us, 3, us.start)
       
-     end
       
-
       
0     a
       
- end
       
-
       
- function test_unsafe(n, time=true)

       
0     a = zeros(Int, (320, 320, 320))
       
-
       
-     # warmup
       
0     setk_UnSafe(a);

       
-
       
0     Profile.clear_malloc_data()
       
-
       
0     for i=1:n
       
-
       
0         setk_UnSafe(a)
       
-
       
-     end
       
-
       
0     a
       
- end
       
-

Tim Holy

unread,
Apr 20, 2015, 12:54:49 PM4/20/15
to julia...@googlegroups.com
First, you need to run it twice, see
http://docs.julialang.org/en/latest/manual/profile/#memory-allocation-analysis
and the part about clear_malloc_data.

Second, I think you have a bug:
size(s::UnsafeSlice) = size(s.size)
should presumably be
size(s::UnsafeSlice) = s.size

--Tim

Peter Brady

unread,
Apr 20, 2015, 1:05:01 PM4/20/15
to julia...@googlegroups.com
The body of the test_unsafe function is:

        - function test_unsafe(n, time=true)
       
0     a = zeros(Int, (320, 320, 320))
       
-
       
-     # warmup
       
0     setk_UnSafe(a);
       
-
       
0     Profile.clear_malloc_data()
       
-
       
0     for i=1:n
       
-
       
0         setk_UnSafe(a)
       
-
       
-     end
       
-
       
0     a
       
- end

So I clear the malloc data after running my expensive function once and then run it a second time.  There is no difference in the .mem file if I call Profile.clear_malloc_data and then call test_unsafe a second time.

You are right about the bug but that line of code is never called in these tests.  Correcting that line did not change the allocation patterns.  

Tim Holy

unread,
Apr 20, 2015, 1:14:32 PM4/20/15
to julia...@googlegroups.com
Sorry, I didn't notice you'd included the test function.

What happens if you make

UnsafeSlice{T,N, P<:Ptr} <: AbstractArray{T,N}

rather than

UnsafeSlice{T,N, P<:Ptr} <: AbstractArray

?

Also try @code_warntype on those functions.

If these don't work, can you paste in a version of your code that doesn't have
all the malloc markup?

--Tim
> > <javascript:>>:

Peter Brady

unread,
Apr 20, 2015, 1:47:04 PM4/20/15
to julia...@googlegroups.com
I tried making UnsafeSlice a subtype of Abstract{T,N} but that didn't have an impact.

The @code_warntype for update, size, and setindex! didn't raise any red flags

julia> @code_warntype setindex!(UnsafeSlice(zeros(Int, (10, 10, 10)), 3), -10, 1)
Variables:
  s::UnsafeSlice{Int64,3,Ptr{Int64}}
  x::Int64
  i::Int64

Body:
  begin  # /usr/local/runs/compact-fd/symbolic/julia/test_alloc.jl, line 19:
      GenSym(1) = (top(getfield))(s::UnsafeSlice{Int64,3,Ptr{Int64}},:p)::Ptr{Int64}
      GenSym(0) = (top(box))(Int64,(top(add_int))((top(getfield))(s::UnsafeSlice{Int64,3,Ptr{Int64}},:start)::Int64,(top(box))(Int64,(top(mul_int))((top(box))(Int64,(top(sub_int))(i::Int64,1)),(top(getfield))(s::UnsafeSlice{Int64,3,Ptr{Int64}},:stride)::Int64))))
      return (top(pointerset))(GenSym(1),x::Int64,GenSym(0))::Ptr{Int64}
  end::Ptr{Int64}

julia> @code_warntype size(UnsafeSlice(zeros(Int, (10, 10, 10)), 3), 3)
Variables:
  s::UnsafeSlice{Int64,3,Ptr{Int64}}
  i::Int64

Body:
  begin  # /usr/local/runs/compact-fd/symbolic/julia/test_alloc.jl, line 13:
      return getfield((top(getfield))(s::UnsafeSlice{Int64,3,Ptr{Int64}},:size)::Tuple{Int64,Int64,Int64},i::Int64)::Int64
  end::Int64

julia> @code_warntype update(UnsafeSlice(zeros(Int, (10, 10, 10)), 3), 3, 20)
Variables:
  a::UnsafeSlice{Int64,3,Ptr{Int64}}
  idx::Int64
  off::Int64
  #s3::Int64
  i::Int64

Body:
  begin  # /usr/local/runs/compact-fd/symbolic/julia/test_alloc.jl, line 31:
      GenSym(3) = getfield((top(getfield))(a::UnsafeSlice{Int64,3,Ptr{Int64}},:size)::Tuple{Int64,Int64,Int64},idx::Int64)::Int64
      GenSym(0) = $(Expr(:new, UnitRange{Int64}, 1, :(((top(getfield))(Intrinsics,:select_value))((top(sle_int))(1,GenSym(3))::Bool,GenSym(3),(top(box))(Int64,(top(sub_int))(1,1)))::Int64)))
      #s3 = (top(getfield))(GenSym(0),:start)::Int64
      unless (top(box))(Bool,(top(not_int))(#s3::Int64 === (top(box))(Int64,(top(add_int))((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool)) goto 1
      2: 
      GenSym(6) = #s3::Int64
      GenSym(7) = (top(box))(Int64,(top(add_int))(#s3::Int64,1))
      i = GenSym(6)
      #s3 = GenSym(7) # line 33:
      GenSym(2) = (top(box))(Int64,(top(add_int))((top(box))(Int64,(top(mul_int))(-10,off::Int64)),i::Int64))
      GenSym(5) = (top(getfield))(a::UnsafeSlice{Int64,3,Ptr{Int64}},:p)::Ptr{Int64}
      GenSym(4) = (top(box))(Int64,(top(add_int))((top(getfield))(a::UnsafeSlice{Int64,3,Ptr{Int64}},:start)::Int64,(top(box))(Int64,(top(mul_int))((top(box))(Int64,(top(sub_int))(i::Int64,1)),(top(getfield))(a::UnsafeSlice{Int64,3,Ptr{Int64}},:stride)::Int64))))
      (top(pointerset))(GenSym(5),GenSym(2),GenSym(4))::Ptr{Int64}
      3: 
      unless (top(box))(Bool,(top(not_int))((top(box))(Bool,(top(not_int))(#s3::Int64 === (top(box))(Int64,(top(add_int))((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool)))) goto 2
      1: 
      0:  # line 36:
      return a::UnsafeSlice{Int64,3,Ptr{Int64}}
  end::UnsafeSlice{Int64,3,Ptr{Int64}}
.

However, running this on the setk_Unsafe function:

julia> @code_warntype setk_UnSafe(zeros(Int, (10, 10, 10)))
Variables:
  a
::Array{Int64,3}
  us
::UnsafeSlice{T,N,P<:Ptr{T}}
 
#s1::Int64
  j
::Int64
 
#s3::Int64
  i
::Int64
 
####p#3255#3279::Ptr{Int64}
 
####str#3256#3280::Int64


Body:
 
begin  # /usr/local/runs/compact-fd/symbolic/julia/test_alloc.jl, line 40:
     
####p#3255#3279 = (top(ccall))(:jl_array_ptr,$(Expr(:call1, :(top(apply_type)), :Ptr, Int64)),$(Expr(:call1, :(top(svec)), :Any)),a::Array{Int64,3},0)::Ptr{Int64}
     
####str#3256#3280 = stride(a::Array{Int64,3},3)::Int64
      us
= ((top(apply_type))(UnsafeSlice,Int64,3,typeof(####p#3255#3279::Ptr{Int64})::Type{Ptr{Int64}})::Type{_<:UnsafeSlice{Int64,N,Ptr{Int64}}})(1,####str#3256#3280::Int64,(top(tuple))((top(arraysize))(a::Array{Int64,3},1)::Int64,(top(arraysize))(a::Array{Int64,3},2)::Int64,(top(arraysize))(a::Array{Int64,3},3)::Int64)::Tuple{Int64,Int64,Int64},####p#3255#3279::Ptr{Int64})::UnsafeSlice{T,N,P<:Ptr{T}} # line 42:
     
GenSym(4) = (top(arraysize))(a::Array{Int64,3},2)::Int64
     
GenSym(0) = $(Expr(:new, UnitRange{Int64}, 1, :(((top(getfield))(Intrinsics,:select_value))((top(sle_int))(1,GenSym(4))::Bool,GenSym(4),(top(box))(Int64,(top(sub_int))(1,1)))::Int64)))
     
#s1 = (top(getfield))(GenSym(0),:start)::Int64
     
unless (top(box))(Bool,(top(not_int))(#s1::Int64 === (top(box))(Int64,(top(add_int))((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool)) goto 1
     
2:
     
GenSym(7) = #s1::Int64
     
GenSym(8) = (top(box))(Int64,(top(add_int))(#s1::Int64,1))
      j
= GenSym(7)
     
#s1 = GenSym(8)
     
GenSym(5) = (top(arraysize))(a::Array{Int64,3},1)::Int64
     
GenSym(2) = $(Expr(:new, UnitRange{Int64}, 1, :(((top(getfield))(Intrinsics,:select_value))((top(sle_int))(1,GenSym(5))::Bool,GenSym(5),(top(box))(Int64,(top(sub_int))(1,1)))::Int64)))
     
#s3 = (top(getfield))(GenSym(2),:start)::Int64
     
unless (top(box))(Bool,(top(not_int))(#s3::Int64 === (top(box))(Int64,(top(add_int))((top(getfield))(GenSym(2),:stop)::Int64,1))::Bool)) goto 4
     
5:
     
GenSym(9) = #s3::Int64
     
GenSym(10) = (top(box))(Int64,(top(add_int))(#s3::Int64,1))
      i
= GenSym(9)
     
#s3 = GenSym(10) # line 44:
     
GenSym(11) = (top(arraysize))(a::Array{Int64,3},1)::Int64
     
GenSym(12) = (top(arraysize))(a::Array{Int64,3},2)::Int64
     
GenSym(13) = (top(arraysize))(a::Array{Int64,3},3)::Int64
     
(top(setfield!))(us::UnsafeSlice{T,N,P<:Ptr{T}},:start,(top(convert))((top(fieldtype))((top(typeof))(us::UnsafeSlice{T,N,P<:Ptr{T}})::Type{_<:UnsafeSlice{T,N,P<:Ptr{T}}},:start)::Type{_},(top(box))(Int64,(top(add_int))(i::Int64,(top(box))(Int64,(top(mul_int))(GenSym(11),(top(box))(Int64,(top(add_int))((top(box))(Int64,(top(sub_int))(j::Int64,1)),(top(box))(Int64,(top(mul_int))(GenSym(12),(top(box))(Int64,(top(sub_int))(1,1)))))))))))::Any)::Any # line 46:
      update
(us::UnsafeSlice{T,N,P<:Ptr{T}},3,(top(getfield))(us::UnsafeSlice{T,N,P<:Ptr{T}},:start)::Int64)::UnsafeSlice{T,N,P<:Ptr{T}}
     
6:
     
unless (top(box))(Bool,(top(not_int))((top(box))(Bool,(top(not_int))(#s3::Int64 === (top(box))(Int64,(top(add_int))((top(getfield))(GenSym(2),:stop)::Int64,1))::Bool)))) goto 5
     
4:
     
3:
     
unless (top(box))(Bool,(top(not_int))((top(box))(Bool,(top(not_int))(#s1::Int64 === (top(box))(Int64,(top(add_int))((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool)))) goto 2
     
1:
     
0:  # line 49:
     
return a::Array{Int64,3}
 
end::Array{Int64,3}


On my terminal the type ::UnsafeSlice{T,N<P<:Ptr{T}} is highlighted in red in this function as is the ::Any on what claims to be 'line 46'.  Is there something about my type that is tripping up the inferencer?

Here's the test_alloc.jl script ( I don't normally double space everything but it made the .mem file more readable):

using ArrayViews

import Base: size, getindex, setindex!, ndims, start, stride,
pointer


type
UnsafeSlice{T,N, P<:Ptr} <: AbstractArray{T,N}

    start
::Int
    stride
::Int
    size
::NTuple{N,Int}

    p
::P
end


size
(s::UnsafeSlice) = s.
size


size
(s::UnsafeSlice, i::Int) = s.size[i]


ndims
{T,N}(s::UnsafeSlice{T,N}) = N


getindex
(s::UnsafeSlice, i::Int) = unsafe_load(s.p, s.start+(i-1)*s.stride)


setindex
!(s::UnsafeSlice, x, i::Int) = unsafe_store!(s.p, x, s.start+(i-1)*s.stride)


function UnsafeSlice(a, slicedim::Int, start=1)

    p
= pointer(a)


    str
= stride(a, slicedim)



   
UnsafeSlice{eltype(a), ndims(a), typeof(p)}(start, str, size(a),p)
end



function update(a::UnsafeSlice, idx, off)



   
for i=1:size(a, idx)



        a
[i] = -10*off+i
   
end


    a
end


function setk_UnSafe{T}(a::Array{T,3})

    us
= UnsafeSlice(a, 3)


   
for j=1:size(a,2),i=1:size(a,1)



        us
.start = sub2ind(size(a), i, j, 1)


        update
(us, 3, us.start)
   
end


    a
end


function test_unsafe(n, time=true)

    a
= zeros(Int, (320, 320, 320))



   
# warmup
    setk_UnSafe
(a);


   
Profile.clear_malloc_data()



   
for i=1:n


        setk_UnSafe
(a)


   
end



    a
end


 Thanks for taking a look at this.

Tim Holy

unread,
Apr 20, 2015, 2:08:12 PM4/20/15
to julia...@googlegroups.com
I wonder if your color printing isn't working? Because for me @code_warntype
raised some big red flags about setk_Unsafe(a). This fixed the problems:

function UnsafeSlice{T,N}(a::Array{T,N}, slicedim::Int, start=1)
p = pointer(a)
str = stride(a, slicedim)
UnsafeSlice{T, N, Ptr{T}}(start, str, size(a),p)
end

--Tim
> :ool,GenSym(3),(top(box))(Int64,(top(sub_int))(1,1)))::Int64)))
> y )::Any # line 46:

Peter Brady

unread,
Apr 20, 2015, 2:23:48 PM4/20/15
to julia...@googlegroups.com
Thank you!

julia> include("test.jl")
test_all (generic function with 1 method)

julia> test_all(5);
test_stride
elapsed time: 1.835237919 seconds (0 bytes allocated)
test_view
elapsed time: 9.453440318 seconds (42 MB allocated, 0.10% gc time in 2 pauses with 1 full sweep)
test_unsafe
elapsed time: 1.821716706 seconds (320 bytes allocated)
test_unsafeview
elapsed time: 9.520118901 seconds (0 bytes allocated)

Can you explain why this fixes the issue?  My understanding (which is apparently wrong) is that this kind of parameterization wasn't necessary since the compiler determines the types when the function is called and compiles it for that specific combination of types.  I was trying to follow the suggestions here http://docs.julialang.org/en/release-0.3/manual/style-guide/#avoid-writing-overly-specific-types.  Are things different for constructors?

Tim Holy

unread,
Apr 21, 2015, 5:09:15 AM4/21/15
to julia...@googlegroups.com
On Monday, April 20, 2015 11:23:48 AM Peter Brady wrote:
> Can you explain why this fixes the issue? My understanding (which is
> apparently wrong) is that this kind of parameterization wasn't necessary
> since the compiler determines the types when the function is called and
> compiles it for that specific combination of types. I was trying to follow
> the suggestions
> here
> http://docs.julialang.org/en/release-0.3/manual/style-guide/#avoid-writing-> overly-specific-types. Are things different for constructors?

Not usually, but I suspect pointer may be a special case---it's not considered
very julian to use it.

I didn't really have time to figure out specifically what went wrong in your
version, but since you're interested, please do figure out the critical problem
and then file an issue.

Best,
> > :56

Peter Brady

unread,
Apr 21, 2015, 11:00:20 AM4/21/15
to julia...@googlegroups.com
That sounds like a good way to familiarize myself with the code base.  Thanks again for helping me with your debug-fu.  I have to admit that I was starting to lose faith in julia but I'm glad it turned out to be user error.

As a final note here, on 0.4, if I make my UnsafeSlice immutable the allocation drops to zero:

$ ~/gitrepos/julia0.4/julia --check-bounds=no
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+4385 (2015-04-20 14:52 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 5499882 (0 days old master)
|__/                   |  x86_64-redhat-linux

julia> workspace(); include("test.jl")
test_all (generic function with 1 method)

julia> test_all(5);
test_stride
elapsed time: 1.844340643 seconds (0 bytes allocated)
test_unsafe
elapsed time: 1.793284676 seconds (0 bytes allocated)

Note that test_stride is simply indexing into the array and test_unsafe is using the immutable UnsafeSlice.  So it turns out that for certain indexing patterns if one can cope with the use restrictions imposed by pointers it _is_ possible to have a zero cost simplified subarray!  Very nice.

Thanks again,
Peter.

Reply all
Reply to author
Forward
0 new messages