Array from a single field of a type Array

Gabor

Sep 7, 2012, 4:16:39 PM

to juli...@googlegroups.com

I tried to create an Array from a single field of an Array of a composite type:

> type T1
    a::Int
    b::Int
end

> v=T1[T1(1,2),T1(10,20),T1(100,200)]
3-element T1 Array:
T1(1,2)
T1(10,20)
T1(100,200)

> v[:].a
type Array has no field a

Obviously, it does not work as I anticipated, but it could be a useful (Fortran-like) feature.
Any opinions?
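For comparison, in current Julia (1.x, which postdates this thread) composite types are declared with `struct` instead of `type`, and the per-field copy Gabor is after can be written with broadcasting, a comprehension, or `map`. A minimal sketch reusing his `T1` example; each variant produces a new array rather than a view into `v`:

```julia
# Julia 1.x syntax: `struct` replaced the 2012 `type` keyword.
struct T1
    a::Int
    b::Int
end

v = T1[T1(1, 2), T1(10, 20), T1(100, 200)]

# Three equivalent ways to collect field `a` of every element into a new Vector{Int}.
# Each one allocates a copy; none of them aliases the data in `v`.
a1 = getfield.(v, :a)     # broadcast getfield over the array (the Symbol acts as a scalar)
a2 = [x.a for x in v]     # array comprehension
a3 = map(x -> x.a, v)     # explicit map

println(a1)  # [1, 10, 100]
```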

Stefan Karpinski

Sep 7, 2012, 4:22:34 PM

to juli...@googlegroups.com

Hmm. Seems iffy to me. I suspect at that point what you should be doing is keeping two arrays of Ints, instead of an array of Int pairs. Of course, with overloadable . notation, you could implement this easily.
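Stefan's suggestion amounts to a "struct of arrays" layout instead of an "array of structs". A minimal Julia 1.x sketch of the two layouts side by side; `a_col`, `b_col` and `record` are illustrative names, not anything from the thread:

```julia
struct T1
    a::Int
    b::Int
end

# "Array of structs": each element stores both fields side by side.
v = [T1(1, 2), T1(10, 20), T1(100, 200)]

# "Struct of arrays": one plain Vector{Int} per field, holding the same data.
a_col = [1, 10, 100]
b_col = [2, 20, 200]

# Whole-field operations work directly on a column, with no extraction step.
total_a = sum(a_col)

# Rebuilding the i-th record needs one lookup per field.
record(i) = T1(a_col[i], b_col[i])

println(total_a)    # 111
println(record(2))  # T1(10, 20)
```

The trade-off: column-wise work is cheap and copy-free, while assembling one record touches every column.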

Gabor

Sep 7, 2012, 4:45:27 PM

to juli...@googlegroups.com

The above construct is useful when one has to work with several moderately complex lists of Bragg reflections, such as:

type Refl
    h::Int
    k::Int
    l::Int
    A::Float64

    sigma::Float64
    phase::Float64
    observed::Bool

    superreflection::Bool
    ...
end

... and the number of separate arrays would quickly blow up.

What is your present suggestion for the overloadable . notation?

Stefan Karpinski

Sep 7, 2012, 4:54:20 PM

to juli...@googlegroups.com

On Fri, Sep 7, 2012 at 4:45 PM, Gabor <g...@szfki.hu> wrote:

> What is your present suggestion for the overloadable . notation?

The idea is that foo.bar would be a method call, something like getfield(foo,:bar), thereby allowing one to define any desired behavior, including having arr.bar mean map(x->x.bar,arr) when arr is an array. The big issue, however, is that we can't currently dispatch on symbol values, so that would become a runtime operation, which is lousy for performance. That's fine for using that notation to do something more sophisticated and high-level, but for plain old field access, it needs to be very fast.
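Current Julia (1.x) did make `.` overloadable through `Base.getproperty`. A minimal sketch of the `arr.bar` meaning `map(x -> x.bar, arr)` behavior described above, assuming a hypothetical wrapper type `FieldMapped` (overloading `getproperty` on `Array` itself would be type piracy):

```julia
struct Refl
    h::Int
    k::Int
    l::Int
end

# Hypothetical wrapper: dot-access on it maps the requested field over the wrapped vector.
struct FieldMapped{T}
    data::Vector{T}
end

function Base.getproperty(w::FieldMapped, name::Symbol)
    # `getfield` reaches the wrapper's own field; `w.data` would recurse into this method.
    data = getfield(w, :data)
    return map(x -> getproperty(x, name), data)
end

refls = FieldMapped([Refl(1, 0, 0), Refl(0, 2, 0), Refl(0, 0, 3)])
println(refls.k)  # [0, 2, 0]
```

Note that `refls.k` still returns a freshly allocated array; the overload only changes the syntax, not the copying.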

Gabor

Sep 7, 2012, 5:05:23 PM

to juli...@googlegroups.com

Stefan, thank you for explaining the present options.

I still think this Fortran-like feature is generally useful and should be fast;
many crystallographic codes, at least, are based on it.

Jeff Bezanson

Sep 7, 2012, 6:10:13 PM

to juli...@googlegroups.com

Performance is maybe the key here. If you want v[:].a to perform like
an array of Ints and not require copying to access, then an array of
structs is not the right thing. Maybe DataFrame would help?

If copying is not a problem then maybe defining things like

a(x::Array) = [ x[i].a for i = 1:length(x) ]

would help. A family of these functions could be generated by a macro.
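Jeff suggests generating a family of such accessors with a macro. A minimal Julia 1.x sketch of the same code-generation idea, written as an `@eval` loop over `fieldnames` rather than a dedicated macro; the `_all` suffix and the trimmed-down `Refl` type are assumptions for illustration:

```julia
struct Refl
    h::Int
    k::Int
    l::Int
    A::Float64
end

# For every field `f` of Refl, define `<f>_all(v)` that copies that field out of each element.
# Example: field `k` produces `k_all(v) = [x.k for x in v]`.
for f in fieldnames(Refl)
    fname = Symbol(f, :_all)
    @eval $fname(v::AbstractVector{Refl}) = [x.$f for x in v]
end

v = [Refl(1, 2, 3, 0.5), Refl(4, 5, 6, 1.5)]
println(k_all(v))  # [2, 5]
println(A_all(v))  # [0.5, 1.5]
```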

Or maybe we do need a StructArray{T} type that stores arrays of T
sliced into separate arrays for each field. Then, of course, accessing
a single struct element becomes the slower thing.
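A bare-bones sketch of that `StructArray{T}` idea in Julia 1.x, using a hypothetical `ColumnStore` type that keeps one vector per field in a `NamedTuple`. (The registered StructArrays.jl package implements this idea properly; the code below does not depend on it.)

```julia
struct Refl
    h::Int
    k::Int
    A::Float64
end

# Hypothetical column store: one Vector per field of T, held in a NamedTuple.
struct ColumnStore{T,C<:NamedTuple}
    cols::C
end

# Split an array of structs into one column per field (copies once, at construction).
function ColumnStore(v::AbstractVector{T}) where {T}
    names = fieldnames(T)
    cols = NamedTuple{names}(Tuple([getfield(x, n) for x in v] for n in names))
    return ColumnStore{T,typeof(cols)}(cols)
end

# Whole-column access: returns the stored Vector itself, no copy.
column(s::ColumnStore, name::Symbol) = getfield(s.cols, name)

# Single-record access: gathers one value from every column (the slower path).
Base.getindex(s::ColumnStore{T}, i::Int) where {T} = T((c[i] for c in s.cols)...)

Base.length(s::ColumnStore) = length(first(s.cols))

s = ColumnStore([Refl(1, 2, 0.5), Refl(3, 4, 1.5)])
println(column(s, :A))  # [0.5, 1.5]
println(s[2])           # Refl(3, 4, 1.5)
```

Reading a whole column hands back the stored vector, while `s[i]` has to visit every column, which is the trade-off Jeff points out.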

With symbols as type parameters (planned feature) one could define
FieldSlice{A,fieldname} that refers to one field over an array of
structs without copying.
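Symbols are allowed as type parameters in current Julia (1.x), so a `FieldSlice` along these lines can now be written directly. A minimal sketch, with the field name stored as a type parameter and indexing reading through to the parent array; the two-argument constructor is an assumption, and the data are the same numbers as Gabor's gfortran example later in the thread:

```julia
struct Refl
    h::Int
    k::Int
    l::Int
end

# The field name is a type parameter (a Symbol), so it is a compile-time constant.
struct FieldSlice{name,T,A<:AbstractVector} <: AbstractVector{T}
    parent::A
end

# Hypothetical constructor: infer the field's type from the parent's element type.
function FieldSlice(parent::AbstractVector, name::Symbol)
    T = fieldtype(eltype(parent), name)
    return FieldSlice{name,T,typeof(parent)}(parent)
end

# Minimal AbstractVector interface: size plus scalar getindex.
Base.size(s::FieldSlice) = size(s.parent)
Base.getindex(s::FieldSlice{name}, i::Int) where {name} = getfield(s.parent[i], name)

v = [Refl(1, 10, 100), Refl(2, 20, 200), Refl(3, 30, 300)]
ks = FieldSlice(v, :k)   # no copy: reads through to `v`
println(collect(ks))     # [10, 20, 30]
println(sum(ks))         # 60
```

Like a `SubArray`, a `FieldSlice` copies nothing: replacing `v[2]` is immediately visible through `ks[2]`.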

Gabor

Sep 8, 2012, 5:33:04 AM

to juli...@googlegroups.com

Jeff, many thanks for the menu of present and future solutions!

I do not know DataFrames.
Some additional explanation would be welcome.

For now, I will go with the most straightforward option,
a function plus an array comprehension, even at the cost of copying.

The not-yet-planned StructArrays would also be fine,
because in this case the speed of single-element access
is less important than the speed of array access.

The planned FieldSlice option seems to be the ideal solution.
As I understand it, it is in the same league as SubArrays,
which also avoid copies and are already very useful.

Before doing anything, it might be worth checking
how Fortran compilers handle this situation efficiently.

Stefan Karpinski

Sep 8, 2012, 11:08:50 AM

to juli...@googlegroups.com

Fortran doesn't do anything dynamically, so it's not really relevant.

Jeff Bezanson

Sep 8, 2012, 2:19:25 PM

to juli...@googlegroups.com

Can you point me to a reference on this Fortran feature? The sources I
found only talked about A(i).x, accessing a field of an element of a
struct array.

Gabor

Sep 8, 2012, 2:55:36 PM

to juli...@googlegroups.com

The source I remember is the book by Metcalf, Reid and Cohen titled "Fortran 95/2003 Explained".
I have not yet found the section but hope I will.

In the meantime you can test this code snippet using gfortran:

program field_section
    implicit none

    type :: TT
        integer :: h
        integer :: k
        integer :: l
    end type

    type(TT) :: v(3)

    v(1) = TT(1,10,100)
    v(2) = TT(2,20,200)
    v(3) = TT(3,30,300)

    print *, v(:)%k
end program field_section

It prints 10,20,30 as expected.

Gabor

Sep 8, 2012, 5:14:01 PM

to juli...@googlegroups.com

I could not get closer than Section 6.13 of the Metcalf-Reid-Cohen book,
but I also asked for help on the comp.lang.fortran newsgroup.

The answers by Richard Maine (a well-known expert) are:
https://groups.google.com/forum/#!topic/comp.lang.fortran/yao-8SLny0s
https://groups.google.com/forum/#!topic/comp.lang.fortran/V55Hk_Xa1to

Tom Short

Sep 8, 2012, 9:06:11 PM

to juli...@googlegroups.com

On Sat, Sep 8, 2012 at 5:33 AM, Gabor <g...@szfki.hu> wrote:

> Jeff, many thanks for the menu of present and future solutions !
>
> I do not know DataFrames.
> Some additional explanation would be welcome.

Gabor, DataFrames sound like a good match for your problem. DataFrames
are implemented as part of the JuliaData package:

https://github.com/HarlanH/JuliaData

They are much like R's data.frame object. DataFrames are a container
object for holding vectors of the same length in a two-dimensional
representation. DataFrames are commonly loaded from CSV files. In your
Refl example above, each "record" would correspond to a row of the
DataFrame. Columns hold vectors of each variable. Here's one way to
make a DataFrame in the format you gave above:

d = DataFrame(quote
    h = 1:5
    k = randi(5)
    l = randi(5)
    A = randn(5)
    sigma = randn(5)
    phase = randn(5)
    observed = fill(true, 5)
    superreflection = fill(false, 5)
end)

julia> d
DataFrame (5,8)
     h k l A         sigma     phase     observed superreflection
[1,] 1 2 5 0.0176655 0.0415967 -0.158557 true     false
[2,] 2 2 5 -1.41504  -1.06199  -0.80658  true     false
[3,] 3 2 5 -0.101363 -1.49359  -0.472749 true     false
[4,] 4 2 5 -1.85589  0.453289  1.52718   true     false
[5,] 5 2 5 1.24029   0.395592  -0.304406 true     false

julia> dump(d)
DataFrame 5 observations of 8 variables
h: DataVec{Int64}(5) [1,2,3,4]
k: DataVec{Int64}(5) [2,2,2,2]
l: DataVec{Int64}(5) [5,5,5,5]
A: DataVec{Float64}(5) [0.01766554024905492,-1.4150357684681656,-0.10136344773823956,-1.8558917282066767]
sigma: DataVec{Float64}(5) [0.04159672227478477,-1.061994296276411,-1.4935914619211654,0.45328879270345956]
phase: DataVec{Float64}(5) [-0.15855650538029256,-0.8065803944729208,-0.4727493159931296,1.5271844105399426]
observed: DataVec{Bool}(5) [true,true,true,true]
superreflection: DataVec{Bool}(5) [false,false,false,false]

julia> d["sigma"]
[0.04159672227478477,-1.061994296276411,-1.4935914619211654,0.45328879270345956,0.3955924466789631]

julia> d[1,:]
DataFrame (1,8)
     h k l A         sigma     phase     observed superreflection
[1,] 1 2 5 0.0176655 0.0415967 -0.158557 true     false

julia> d[1,"sigma"]
0.04159672227478477

Stefan Karpinski

Sep 8, 2012, 10:52:20 PM

to juli...@googlegroups.com

I have to admit that I hadn't realized quite how close we are to Fortran syntax. Using "type" to define composite types was pretty hotly debated about a year ago (and by hotly debated, I mean that Jeff, Viral and I couldn't agree), but it looks like we ended up picking the same keyword.

Gabor

Sep 9, 2012, 4:05:23 AM

to juli...@googlegroups.com

Tom: Many thanks for the quick intro on DataFrames!
I will surely try to put it in my active vocabulary.

Stefan: Modern Fortran (90/95/2003/2008) is much better than its reputation
in the CS community. Some similarity to its syntax is not a bad thing either.

Stefan Karpinski

Sep 9, 2012, 8:28:31 AM

to juli...@googlegroups.com

On Sun, Sep 9, 2012 at 4:05 AM, Gabor <g...@szfki.hu> wrote:

> Stefan: Modern Fortran (90/95/2003/2008) is much better than its reputation in the CS community. Some similarity to its syntax is not a bad thing either.

Fortran has come to be one of my favorite languages that I've never programmed in ;-)
