Loading a huge graph

瀏覽次數:120 次
跳到第一則未讀訊息

László Török

未讀,
2012年4月12日 下午6:17:582012/4/12
收件者:clo...@googlegroups.com
Hi,

I'm trying figure out how to load a huge file that contains some 800k pair of integers (two integers per line) which represent edges of a directed graph.

So if the ith line has x and y, it means that there is an edge between x and y vertex in the graph.

The goal is to load it in an array of arrays representation, where the kth array contains all the nodes, where there is a directed edge from the kth node to those nodes.

I've attempted multiple variants of with-open reader and line-seq etc. but almost always ended up with OutMemoryException or sg VERY slow.

My latest attempt that also does not work on the large input:

(defn load-graph [input-f]
  (with-open [rdr (io/reader input-f)]
    (->> (line-seq rdr)
        (map (fn [row]
               (let [[v1str v2str] (str/split row #"\s")]
                   [ (Integer/parseInt v1str) (Integer/parseInt v2str) ]))   )
        (reduce (fn [G [v1 v2]]
                  (if-let [vs (get G v1)]
                    (update-in G [v1] #(conj % v2))
                    (assoc G v1 [v2])))  { }  ))))

I'm getting a bit frustrated as there are Python, Go implementations that load the graph in less the 5 seconds.

What am I doing wrong?

Thanks

--
László Török

David Nolen

未讀,
2012年4月12日 下午6:22:342012/4/12
收件者:clo...@googlegroups.com
How much memory do Python & Go consume when you do this? Are you giving the JVM enough memory?

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Alex Robbins

未讀,
2012年4月12日 晚上11:07:242012/4/12
收件者:clo...@googlegroups.com
Yeah, sounds like it could definitely be a memory issue. This is one
part where the JVM works a lot differently than I expected coming from
a python background.

Everybody may already know this, but the JVM only takes 64mb for the
heap by default. You'll get an out of memory error if your program
uses more than that. In contrast, python just takes all the memory it
needs. As your program gets closer to the JVM memory limit it'll spend
more and more time doing garbage collection, with less and less real
work getting done. You can pass an -Xmx flag to give java access to
more memory, which many (most?) programs do.

László Török

未讀,
2012年4月13日 凌晨4:32:242012/4/13
收件者:clo...@googlegroups.com
Thanks,

having a C++ background, there is definitely a lot to learn about the JVM.

I'm wrote a script in Python that ran in about 10-12s and used up to 320MB of memory.

I'm running a clojure repl from Emacs via jack-in.

What's the best way to adjust the heap size? swank-clojure and clojure-mode don't say.

If I spin up the repl via lein swank, I can use the LEIN_JVM_OPTS, however it affects only one of the JVMs started by lein swank, the one with jline.

Any advice? :)


2012/4/13 Alex Robbins <alexander...@gmail.com>



--
László Török

bOR_

未讀,
2012年4月13日 清晨5:44:452012/4/13
收件者:clo...@googlegroups.com
I have this option in my project.clj file, which does the trick if you are developing from emacs+swank+clojure-jack-in, and using large networks

  :jvm-opts ["-Xmx4000m"]

And yes, one of the things to do when working with the jvm is learning how to use jconsole or visualvm to see why your program is slow :-), and know how to adapt the jvm options. If I remember correctly, java 1.7 has an option for a more flexible heap size, which might be a nice default.


2012/4/13 Alex Robbins <alexander...@gmail.com>

>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your
> first post.
> To unsubscribe from this group, send email to

> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to

For more options, visit this group at
http://groups.google.com/group/clojure?hl=en



--
László Török

László Török

未讀,
2012年4月13日 清晨6:56:322012/4/13
收件者:clo...@googlegroups.com
Excellent! Thank you, that did the trick!

2012/4/13 bOR_ <boris....@gmail.com>

For more options, visit this group at
http://groups.google.com/group/clojure?hl=en



--
László Török

Robert Marianski

未讀,
2012年4月12日 晚上9:17:262012/4/12
收件者:clo...@googlegroups.com
If the jvm does have enough memory, you may want to try building up the
map using a transient.

And not sure if this is faster, (maybe it's slower), but you can spell the
function you pass to reduce more succinctly:

(fn [G [v1 v2]] (update-in G [v1] (fnil conj []) v2))

Robert

On Thu, Apr 12, 2012 at 06:22:34PM -0400, David Nolen wrote:
> How much memory do Python & Go consume when you do this? Are you giving the
> JVM enough memory?
>

> > L�szl� T�r�k

László Török

未讀,
2012年4月13日 下午2:06:412012/4/13
收件者:clo...@googlegroups.com

Thanks, I know about transients, but I'm already using mutable arrays for speed :-)


On Apr 13, 2012 8:01 PM, "Robert Marianski" <r...@marianski.com> wrote:
>
> If the jvm does have enough memory, you may want to try building up the
> map using a transient.
>
> And not sure if this is faster, (maybe it's slower), but you can spell the
> function you pass to reduce more succinctly:
>
> (fn [G [v1 v2]] (update-in G [v1] (fnil conj []) v2))
>
> Robert
>
> On Thu, Apr 12, 2012 at 06:22:34PM -0400, David Nolen wrote:
> > How much memory do Python & Go consume when you do this? Are you giving the
> > JVM enough memory?
> >

> > > László Török

回覆所有人
回覆作者
轉寄
0 則新訊息