GraphLab cannot read HDFS files by a specified file prefix?

kai

Jun 27, 2013, 2:25:09 AM
to graph...@googlegroups.com
[titan@cucstorage sdb]$ time mpiexec -n 3 pagerank --graph hdfs://172.16.12.5:9288/user/titan/data-4000000/graphlab-adj --format="adj" --saveprefix hdfs://172.16.12.5:9288/user/titan/data-4000000/output/pagerank/pagerank.log --ncpus=8

GRAPHLAB_SUBNET_ID/GRAPHLAB_SUBNET_MASK environment variables not defined.
Using default values
Subnet ID: 0.0.0.0
Subnet Mask: 0.0.0.0
Will find first IPv4 non-loopback address matching the subnet
GRAPHLAB_SUBNET_ID/GRAPHLAB_SUBNET_MASK environment variables not defined.
Using default values
Subnet ID: 0.0.0.0
Subnet Mask: 0.0.0.0
Will find first IPv4 non-loopback address matching the subnet
TCP Communication layer constructed.
TCP Communication layer constructed.
INFO:     distributed_graph.hpp(set_ingress_method:2822): Use random ingress
Loading graph in format: adj
Loading graph in format: adj
INFO:     distributed_graph.hpp(set_ingress_method:2822): Use random ingress
GRAPHLAB_SUBNET_ID/GRAPHLAB_SUBNET_MASK environment variables not defined.
Using default values
Subnet ID: 0.0.0.0
Subnet Mask: 0.0.0.0
Will find first IPv4 non-loopback address matching the subnet
TCP Communication layer constructed.
INFO:     distributed_graph.hpp(set_ingress_method:2822): Use random ingress
Loading graph in format: adj
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00002b3a2e50fbc0, pid=16132, tid=47528894618784
#
# JRE version: 6.0_31-b04
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.6-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x50fbc0]  unsigned+0xb0
#
# An error report file with more information is saved as:
# /sdb/hs_err_pid16132.log
#
# If you would like to submit a bug report, please visit:
#
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f564f748bc0, pid=5420, tid=140008671612000
#
# JRE version: 6.0_31-b04
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.6-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x50fbc0]  unsigned+0xb0
#
# An error report file with more information is saved as:
# /sdb/hs_err_pid5420.log
#
# If you would like to submit a bug report, please visit:
#
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f706a772bc0, pid=18710, tid=140120791805024
#
# JRE version: 6.0_31-b04
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.6-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x50fbc0]  unsigned+0xb0
#
# An error report file with more information is saved as:
# /sdb/hs_err_pid18710.log
#
# If you would like to submit a bug report, please visit:
#

-rw-r--r--   3 titan supergroup   38712961 2013-06-24 15:48 /user/titan/data-4000000/graphlab-adj-1790000.adj
-rw-r--r--   3 titan supergroup   38870630 2013-06-24 15:48 /user/titan/data-4000000/graphlab-adj-180000.adj
-rw-r--r--   3 titan supergroup   38958915 2013-06-24 15:48 /user/titan/data-4000000/graphlab-adj-1800000.adj
-rw-r--r--   3 titan supergroup   39038166 2013-06-24 15:48 /user/titan/data-4000000/graphlab-adj-1810000.adj
-rw-r--r--   3 titan supergroup   39437524 2013-06-24 15:48 /user/titan/data-4000000/graphlab-adj-1820000.adj
-rw-r--r--   3 titan supergroup   38646297 2013-06-24 15:48 /user/titan/data-4000000/graphlab-adj-1830000.adj
-rw-r--r--   3 titan supergroup   38594307 2013-06-24 15:48 /user/titan/data-4000000/graphlab-adj-1840000.adj
...

Haijie Gu

Jun 27, 2013, 1:03:43 PM
to graph...@googlegroups.com
Please add "env CLASSPATH=`hadoop classpath`" before pagerank: because you are loading from HDFS, the program needs to know the Hadoop classpath.

Also, there are still 3 independent instances of pagerank running. Try specifying the hostfile for mpiexec. For example:

mpiexec -n 3 -f hostfile env CLASSPATH=`hadoop classpath` ./pagerank …
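(For reference, a hostfile for MPICH is just a plain text file with one machine per line; the host names below are hypothetical placeholders:

node01
node02
node03

With the Hydra launcher an optional suffix such as "node01:2" caps the number of processes started on that host.)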


-jay

kai

Jul 4, 2013, 12:53:43 AM
to graph...@googlegroups.com
Hi,

pagerank fails when the graph is specified by an HDFS file prefix, but it runs normally when the full file path is used.

GraphLab Version is V2.2

[titan@smartstorage03 ~]$ time mpiexec --hostfile /home/titan/mpdmaster.hosts -n 6 env CLASSPATH=`hadoop classpath` /home/titan/graphlab/graphlab-v2.2/release/toolkits/graph_analytics/pagerank --graph hdfs://172.16.12.5:9288/user/titan/data-4000000/graphlab-adj --format="adj" --saveprefix hdfs://172.16.12.5:9288/user/titan/data-4000000/output/pagerank/pagerank.log --ncpus=8
GRAPHLAB_SUBNET_ID/GRAPHLAB_SUBNET_MASK environment variables not defined.
Using default values
Subnet ID: 0.0.0.0
Subnet Mask: 0.0.0.0
Will find first IPv4 non-loopback address matching the subnet
INFO:     dc.cpp(init:554): Cluster of 6 instances created.
Loading graph in format: adj
INFO:     distributed_graph.hpp(set_ingress_method:2902): Automatically determine ingress method: grid
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f17815febc0, pid=16500, tid=139738926193568
#
# JRE version: 6.0_31-b04
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.6-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x50fbc0]  unsigned+0xb0
#
# An error report file with more information is saved as:
# /home/titan/hs_err_pid16500.log
#
# If you would like to submit a bug report, please visit:
#
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f95674dfbc0, pid=26213, tid=140279654689696
#
# JRE version: 6.0_31-b04
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.6-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x50fbc0]  unsigned+0xb0
#
# An error report file with more information is saved as:
# /home/titan/hs_err_pid26213.log
#
# If you would like to submit a bug report, please visit:
#

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 6
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:5...@smartstorage03.yoyoyws.com] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:883): assert (!closed) failed
[proxy:0:5...@smartstorage03.yoyoyws.com] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:5...@smartstorage03.yoyoyws.com] main (./pm/pmiserv/pmip.c:210): demux engine error waiting for event
[proxy:0:0...@cucstorage.wocloud.com.cn] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:883): assert (!closed) failed
[proxy:0:0...@cucstorage.wocloud.com.cn] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0...@cucstorage.wocloud.com.cn] main (./pm/pmiserv/pmip.c:210): demux engine error waiting for event
[mpi...@smartstorage03.yoyoyws.com] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpi...@smartstorage03.yoyoyws.com] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpi...@smartstorage03.yoyoyws.com] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for completion
[mpi...@smartstorage03.yoyoyws.com] main (./ui/mpich/mpiexec.c:325): process manager error waiting for completion

real    0m2.349s
user    0m0.029s
sys     0m0.024s

On Friday, June 28, 2013 at 1:03:43 AM UTC+8, Jay wrote:
(Attachment: hs_err_pid16500.log)

Yucheng Low

Jul 9, 2013, 12:49:46 AM
to graph...@googlegroups.com
Hi,

Interesting. We are not emitting enough details in the HDFS error message.
Make sure that all machines can enumerate the path hdfs://172.16.12.5:9288/user/titan/data-4000000/
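For example, a quick sanity check (assuming the Hadoop client is installed and configured on every node) is to run the same listing on each machine:

hadoop fs -ls hdfs://172.16.12.5:9288/user/titan/data-4000000/

Every node should return the same set of graphlab-adj-*.adj files.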

Also, what version of Hadoop are you using?

Yucheng

Yingxia Shao

Nov 26, 2013, 8:40:05 PM
to graph...@googlegroups.com
Hi,
   I encountered the same problem. I have 60 separate files on HDFS with the same prefix.
When I run undirected_triangle_count with 10 or more instances, it fails with the following "SIGSEGV" error.

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000000541c45, pid=18741, tid=139898438657808
#
# JRE version: 7.0_05-b05
# Java VM: Java HotSpot(TM) 64-Bit Server VM (23.1-b03 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [undirected_triangle_count+0x141c45]  graphlab::cuckoo_map_pow2<unsigned long, graphlab::fixed_dense_bitset<128>, 3ul, unsigned int, boost::hash<unsigned long>, std::equal_to<unsigned long> >::do_insert(std::pair<unsigned long const, graphlab::fixed_dense_bitset<128> > const&)+0x345
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid18741.log
#
# If you would like to submit a bug report, please visit:
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
rank 7 in job 18  changping11_49111   caused collective abort of all ranks
  exit status of rank 7: killed by signal 9 

It looks like the do_insert method causes the error.

In summary,

the "mpiexec -n 9 undirected_triangle_count --graph xxx --format xxx" run successfully.
the  "mpiexec -n 10 undirected_triangle_count --graph xxx --format xxx" failed with above error.

Thanks for the help.


The following is the environment information:

Hadoop 0.20.2, no_secure

MPICH2 Info:
MPICH2 Version:         1.5
MPICH2 Release date:    Mon Oct  8 14:00:48 CDT 2012
MPICH2 Device:          ch3:nemesis
MPICH2 configure:       --enable-shared --with-pm=mpd
MPICH2 CC:      gcc    -O2
MPICH2 CXX:     c++   -O2
MPICH2 F77:     gfortran   -O2
MPICH2 FC:      gfortran   -O2

Java version:
java version "1.7.0_05"
Java(TM) SE Runtime Environment (build 1.7.0_05-b05)
Java HotSpot(TM) 64-Bit Server VM (build 23.1-b03, mixed mode)

Rong Chen

Dec 11, 2013, 10:02:05 PM
to graph...@googlegroups.com
Maybe your problem is from the graph ingress.

GraphLab uses "grid" ingress for 9 machines, uses "oblivious" ingress for 10 machines.
(you can get it from log "Automatically determine ingress method: grid")

To confirm it, you can try "mpiexec -n 9 undirected_triangle_count --graph xxx --format xxx --graph_opts ingress=oblivious".
If it fails, the above assumption is probably right.


The problem comes from loading multiple files in parallel (even when using NFS).
There is a simple patch (https://github.com/graphlab-code/graphlab/pull/103).
It only restores correctness for "oblivious" ingress, where performance is still poor,
but it also speeds up the loading phase of "grid" ingress (real parallel loading).
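(If you want to try the patch, one way to check out a GitHub pull request locally, assuming your remote points at the graphlab-code/graphlab repository, is:

git fetch origin pull/103/head:pr-103
git checkout pr-103

then rebuild the toolkits.)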


Thanks,
Rong
Institute of Parallel and Distributed Systems (IPADS)
Shanghai Jiao Tong University
http://ipads.se.sjtu.edu.cn/projects/powerlyra.html