Performance on joins/preload

969 views
Skip to first unread message

Dmitry Aleksandrov

unread,
Nov 16, 2014, 6:52:16 AM11/16/14
to elixi...@googlegroups.com
Hi!

I build small test application with all (cities and other administrative points) from geonames(using database from http://www.geonames.org/ ). I have tried to build it up in 2 tables (cities 3520143, countries 248 at all). Here is a gist with Model and SQL( https://gist.github.com/liveforeverx/649d019601a1bcb89197 ) I've modeled cities, with country_id, now I'll try to get all cities with countries as association. The question is, why is the speed difference is so big(2 -> 80 ms)?

Is it SQL problem, or do I use ecto associations not right?? Or is it ok, that the difference is so big?

Why preload have much more difference as join?

José Valim

unread,
Nov 16, 2014, 7:42:33 AM11/16/14
to Dmitry Aleksandrov, elixi...@googlegroups.com
preload is going to be slow in this case due to how it works: it traverses all cities, collecting their ids, issue a sql query, then traverse all cities associating them with the country. Since you are loading all 3 million cities, it is going to make a big difference. preload is faster or about the same speed for small data sets. Maybe we should improve the docs for preload to make it clearer.

It is also very likely we could make preload faster but that applies to the other queries too.



José Valim
Skype: jv.ptec
Founder and Lead Developer

--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Eric Meadows-Jönsson

unread,
Nov 16, 2014, 7:58:18 AM11/16/14
to José Valim, Dmitry Aleksandrov, elixi...@googlegroups.com
Preload is likely slower than join on this kind of dataset where one city can only be associated with one country. But if you have dataset such as blog posts and their tags where one tag can belong too many different posts preload will likely be faster than join.

Because how joins work you will get a result set for each pair of post+tag. So if you have a post with 5 tags you will get 5 duplicated posts back from the database. But with a preload query you will only get unique records, i.e. a lot less data going over the wire.
--
Eric Meadows-Jönsson

Dmitry Aleksandrov

unread,
Nov 16, 2014, 8:08:52 AM11/16/14
to elixi...@googlegroups.com, elix...@gmail.com, jose....@plataformatec.com.br
In my case I try to load only 15 cities(which have name: Samara). But, understand now. Thanks for answer.
Reply all
Reply to author
Forward
0 new messages