My error in graph format


lyu...@sslab.cs.nthu.edu.tw

Apr 6, 2014, 12:15:28 PM
to stanford...@googlegroups.com
Hello, when I load my own data into GPS, I run into problems with the data format. The errors are copied into the attached txt file, which also shows my data format. I would appreciate it if someone could tell me how to solve this.
BTW, I have another unsolved question in this thread: https://groups.google.com/d/msg/stanfordgpsusers/BB6AORKMTI8/cA_x_3Tc4oIJ

Best regards,
Lyuwei
GPS_number.txt

Semih Salihoglu

Apr 6, 2014, 1:03:18 PM
to lyuwei, stanford...@googlegroups.com
Hi Lyuwei,

The problem is that some IDs are very large, for example 2642503443. This ID is larger than 2^31 - 1 (2147483647), and you shouldn't use IDs of 2^31 or above because GPS stores IDs in 4-byte integers.
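
A quick way to find such IDs before loading is to scan the input file. The following is only a minimal sketch: it assumes every whitespace-separated token on a line is an integer ID (a vertex followed by its neighbors), which may not match your exact file layout, and the function name `find_oversized_ids` is made up for illustration.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 4-byte integer can hold

def find_oversized_ids(lines, limit=INT32_MAX):
    """Return (line_number, id) pairs for IDs that are negative or above limit.

    Assumption: every whitespace-separated token on a line is an integer ID.
    """
    bad = []
    for line_no, line in enumerate(lines, start=1):
        for token in line.split():
            value = int(token)
            if value < 0 or value > limit:
                bad.append((line_no, value))
    return bad

if __name__ == "__main__":
    sample = ["1 2642503443 7", "2 1 2147483647"]
    # Only 2642503443 on line 1 is out of range; 2147483647 still fits.
    print(find_oversized_ids(sample))
```

For a real file, pass the open file object instead of a list, since iterating over it also yields one line at a time.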

Best,

semih



lyu...@sslab.cs.nthu.edu.tw

Apr 7, 2014, 1:35:39 AM
to stanford...@googlegroups.com, lyuwei, se...@stanford.edu
Thank you, Semih! Actually, my whole graph has 250 million vertices, so the IDs are correspondingly large. Is there a way for GPS to process a graph that large?

Best Regards,
Lyuwei

On Monday, April 7, 2014 at 1:03:18 AM UTC+8, Semih Salihoglu wrote:

Semih Salihoglu

Apr 7, 2014, 1:38:11 AM
to lyuwei, stanford...@googlegroups.com
250M vertices is fine as long as the IDs are less than 2^31. When an ID is 2^31 or larger, Java's Integer.parseInt throws a NumberFormatException. Did you generate your graph yourself? If so, generate it with the same algorithm but pick the IDs to be between 0 and 250M. In general, also use consecutive IDs; that reduces GPS's memory usage considerably.
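
If the graph cannot be regenerated, the existing IDs can be renumbered instead. Here is a rough sketch of one way to do that, assuming again that every whitespace-separated token on a line is an ID; `remap_ids` is a hypothetical helper, not part of GPS.

```python
def remap_ids(lines):
    """Rewrite every ID to a dense index, assigned in order of first appearance.

    Assumption: every whitespace-separated token on a line is an integer ID.
    Returns the rewritten lines and the old-to-new mapping.
    """
    mapping = {}
    remapped = []
    for line in lines:
        new_tokens = []
        for token in line.split():
            old_id = int(token)
            if old_id not in mapping:
                mapping[old_id] = len(mapping)  # next unused consecutive ID
            new_tokens.append(str(mapping[old_id]))
        remapped.append(" ".join(new_tokens))
    return remapped, mapping

if __name__ == "__main__":
    sample = ["4 1649189521 1684502353", "5 1649189521 1684502353"]
    new_lines, mapping = remap_ids(sample)
    print(new_lines)  # ['0 1 2', '3 1 2']
```

Keep the returned mapping if you need to translate results back to the original IDs. Note that the dictionary holds one entry per distinct ID, so for a graph with hundreds of millions of vertices this in-memory approach may itself need a lot of RAM.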

Best,

semih

lyu...@sslab.cs.nthu.edu.tw

Apr 15, 2014, 11:16:32 PM
to stanford...@googlegroups.com, lyuwei, se...@stanford.edu
Thank you, Semih. I have removed the illegal nodes and it now runs well. However, when I run my data on a 3-VM cluster in which every VM has 2GB of memory assigned to GPS, only 67MB of data can be loaded (anything larger runs out of heap space). In my previous experiment, the test file LiveJournal01.txt (200MB) ran fine on the same cluster. So I guess that some of my nodes are too dense and my vertex values are too long. Do you think so? I would also like to know some ways to optimize memory use.
Thank you very much for your kind help.

Best Regards,
Lyuwei 

On Monday, April 7, 2014 at 1:38:11 PM UTC+8, Semih Salihoglu wrote:

Semih Salihoglu

Apr 15, 2014, 11:44:23 PM
to lyuwei, stanford...@googlegroups.com
No, I would still guess that if you're exhausting 2GB of RAM with 67MB of data, there's still an ID issue. One of your IDs may still be large. However, this also depends on whether you have edge values and how large your vertex values are. If the edge/vertex values are small, then I would guess that there's an ID issue.

lyu...@sslab.cs.nthu.edu.tw

Apr 16, 2014, 12:23:57 AM
to stanford...@googlegroups.com, lyuwei, se...@stanford.edu
Thank you, Semih. Indeed, my vertex values are very large, as shown below. Are there any ways to reduce memory use?

Best Regards,
Lyuwei

0
1 1916883570 1344360230 1642351362 1763582395 1813080181 1198920804 1212812142 1746274673 1816011541 1682352065 1658688240 1197161814 1749964961 1644492510 1793285524 1752467960 1182389073 1266286555 1087770692 1656809190 1192329374 1854283601 1241148864 1778742953 1249159055 1676082433 1191965271 1228486722 1249193625 1275017594 1704540003 1735209835 1750405355 1653957693 1684502353 1649189521
4 1649189521 1684502353
5 1649189521 1684502353
6 1649189521 1684502353
7 1649189521 1684502353

On Wednesday, April 16, 2014 at 11:44:23 AM UTC+8, Semih Salihoglu wrote:

Semih Salihoglu

Apr 16, 2014, 12:29:50 AM
to lyuwei, stanford...@googlegroups.com
Make the IDs contiguous from 0 to numVertices - 1. So if you have 1000 vertices, they should have IDs from 0 to 999 for the best memory usage. Once you add a very large ID, the arrays in GPS get very large. GPS implicitly assumes that the IDs are contiguous from 0 to numVertices - 1.
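
To see why a single large ID matters, here is a back-of-the-envelope estimate. It is only an illustration under stated assumptions: one array indexed directly by ID with one 4-byte slot per possible ID. The thread does not describe GPS's actual data structures, and `dense_array_bytes` is a made-up helper.

```python
def dense_array_bytes(max_id, bytes_per_slot=4):
    """Bytes needed for an array with one slot per ID in 0..max_id.

    Assumption: a single ID-indexed array with fixed-size slots; the real
    layout inside GPS may differ.
    """
    return (max_id + 1) * bytes_per_slot

if __name__ == "__main__":
    # Largest ID in the sample posted above.
    sparse = dense_array_bytes(1916883570)
    # The same array if 1000 vertices were numbered 0..999.
    dense = dense_array_bytes(999)
    print(f"{sparse / 2**30:.2f} GiB vs {dense} bytes")
```

Under these assumptions, an ID near 1.9 billion alone would call for roughly 7 GiB, well beyond a 2GB heap, while contiguous IDs for 1000 vertices need about 4 KB.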