Best data structure for a "dynamic array"

631 views
Skip to first unread message

David P. Sanders

unread,
Feb 21, 2014, 9:23:56 AM2/21/14
to julia...@googlegroups.com
Hi,

I am having trouble finding the best equivalent of a list in Python, and the Pythonic idiom of creating an empty list and then adding elements one by one with .append().
For example, how would I do this for a collection of ordered pairs (Int64, Int64)? I don't seem able even to create the correct type:

julia> v = Vector{(Int64, Int64)}()
ERROR: type cannot be constructed

Then I want to do something like

push!(v, (1, 2))

Thanks,
David.

David P. Sanders

unread,
Feb 21, 2014, 9:26:45 AM2/21/14
to julia...@googlegroups.com
I found a similar question which I think gives the (or at least an) answer:

julia> v = (Int64, Int64)[]
0-element Array{(Int64,Int64),1}

julia> push!(v, (1,2))
1-element Array{(Int64,Int64),1}:
 (1,2) 

Sorry for the noise.

Mauro

unread,
Feb 21, 2014, 9:26:52 AM2/21/14
to julia...@googlegroups.com
Yes, Julia vectors are basically like python lists.

julia> v=(Int,Int)[]
0-element Array{(Int64,Int64),1}

julia> push!(v,(1,2))
1-element Array{(Int64,Int64),1}:
 (1,2)

David P. Sanders

unread,
Feb 21, 2014, 10:14:42 AM2/21/14
to julia...@googlegroups.com


El viernes, 21 de febrero de 2014 08:26:52 UTC-6, Mauro escribió:
Yes, Julia vectors are basically like python lists.

OK, great, thanks!

David.

David P. Sanders

unread,
Feb 21, 2014, 10:34:24 AM2/21/14
to julia...@googlegroups.com
If I were to use instead a Set, instead of a Vector, how would this impact performance?

Kevin Squire

unread,
Feb 21, 2014, 10:57:10 AM2/21/14
to julia...@googlegroups.com
It's hard to answer without more information.  Is this related to your other, self-answered post, where you mention adding and deleting elements?

If so, a Set is the better choice for anything but small arrays, as deleting from the middle of a vector is inefficient for large vectors. 

If insertion order is important, an OrderedSet (in the DataStructures package) might be useful. It will be slightly slower than a Set, but more efficient than deleting from an array.

But without more information, these might all be wrong (or inefficient) for your problem. :-)

Cheers, Kevin

Ivar Nesje

unread,
Feb 21, 2014, 12:21:44 PM2/21/14
to julia...@googlegroups.com
If you want performance, you will always need to test your actual use case. Kevin's remarks are the same I learned in my algorithms course, but he does not specify the size where the difference shows. On large problems there might be other effects dominating the runtime than what you think. I think I remember someone doing performance testing on random inserts and deletes on C++ STL Vector and List. The result was that vector was faster for all the problem sizes they tried, because indexing on a the list was more costly than moving all the elements in the tail of the vector.

Often it is smart to first write a version that is correct, and then use profiling to find what makes it slow. When you know what is slow, and have a correct version to test against, you can try alternative implementations to see if they are faster.

Ivar

David P. Sanders

unread,
Feb 23, 2014, 12:48:40 AM2/23/14
to julia...@googlegroups.com


El viernes, 21 de febrero de 2014 09:57:10 UTC-6, Kevin Squire escribió:
It's hard to answer without more information.  Is this related to your other, self-answered post, where you mention adding and deleting elements?

Yes, it's a simulation I'm working on, which is a particular kind of dynamics in a random environment.
I have a version in C++ but am converting it to Julia since it is so much more quick and flexible to write.
   

If so, a Set is the better choice for anything but small arrays, as deleting from the middle of a vector is inefficient for large vectors. 

If insertion order is important, an OrderedSet (in the DataStructures package) might be useful. It will be slightly slower than a Set, but more efficient than deleting from an array.

Great, thanks -- didn't know about the DataStructures package.
It's often pretty hard to find my way around the julia ecosystem!

Thanks,
David

David P. Sanders

unread,
Feb 23, 2014, 12:49:57 AM2/23/14
to julia...@googlegroups.com


El viernes, 21 de febrero de 2014 11:21:44 UTC-6, Ivar Nesje escribió:
If you want performance, you will always need to test your actual use case. Kevin's remarks are the same I learned in my algorithms course, but he does not specify the size where the difference shows. On large problems there might be other effects dominating the runtime than what you think. I think I remember someone doing performance testing on random inserts and deletes on C++ STL Vector and List. The result was that vector was faster for all the problem sizes they tried, because indexing on a the list was more costly than moving all the elements in the tail of the vector.

Often it is smart to first write a version that is correct, and then use profiling to find what makes it slow. When you know what is slow, and have a correct version to test against, you can try alternative implementations to see if they are faster.

That's a very good suggestion, thanks.
I'll get a version working and then try the various options out.

Once it's somewhat done, I'm planning to make it available for critiques...!

Thanks,
David.
Reply all
Reply to author
Forward
0 new messages