OutOfMemory when building Graph.obj with multiple transit files


Moohyoung Park

Aug 9, 2011, 4:59:23 AM
to opentripp...@googlegroups.com
Hi All,

I'm building Graph.obj for over 50 agencies, which means handling over 50 Google Transit (GTFS) files.
Previously I merged the files of each type (stops.txt, routes.txt, trips.txt and so on) into one and zipped them into a single transit file.
Recently, as recommended, I tried building with a list of multiple GTFS bundles. However, it fails with the following messages
even though I've increased the memory to 32 GB.
Do I need more configuration to handle multiple transit files? I've attached the config.xml.
Thank you in advance.

Moohyoung

-------------------------------- Following --------------------------
2011-08-09 08:28:30,999 INFO  [XmlBeanDefinitionReader.java:315] : Loading XML bean definitions from class path resource [org/opentripplanner/graph_builder/application-context.xml]
2011-08-09 08:28:32,289 INFO  [XmlBeanDefinitionReader.java:315] : Loading XML bean definitions from file [/root/iHive/gb-full.config.xml]
2011-08-09 08:28:33,295 INFO  [DefaultListableBeanFactory.java:618] : Overriding bean definition for bean 'graphBuilderTask': replacing [Generic bean: class [org.opentripplanner.graph_builder.GraphBuilderTask]; scope=; abstract=false; lazyInit=false; autowireMode=0; dependencyCheck=0; autowireCandidate=true; primary=false; factoryBeanName=null; factoryMethodName=null; initMethodName=null; destroyMethodName=null; defined in class path resource [org/opentripplanner/graph_builder/application-context.xml]] with [Generic bean: class [org.opentripplanner.graph_builder.GraphBuilderTask]; scope=; abstract=false; lazyInit=false; autowireMode=0; dependencyCheck=0; autowireCandidate=true; primary=false; factoryBeanName=null; factoryMethodName=null; initMethodName=null; destroyMethodName=null; defined in file [/root/iHive/gb-full.config.xml]]
2011-08-09 08:28:33,300 INFO  [AbstractApplicationContext.java:456] : Refreshing org.springframework.context.support.GenericApplicationContext@68916a2: startup date [Tue Aug 09 08:28:33 UTC 2011]; root of context hierarchy
2011-08-09 08:28:33,378 INFO  [DefaultListableBeanFactory.java:555] : Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@7632efa7: defining beans [org.springframework.context.annotation.internalConfigurationAnnotationProcessor,org.springframework.context.annotation.internalAutowiredAnnotationProcessor,org.springframework.context.annotation.internalRequiredAnnotationProcessor,org.springframework.context.annotation.internalCommonAnnotationProcessor,graphService,graphBuilderTask,graphBundle,gtfsBuilder,osmBuilder,transitStreetLink,optimizeTransit]; root of factory hierarchy
2011-08-09 08:28:33,514 INFO  [GtfsGraphBuilderImpl.java:191] : reading entities: org.onebusaway.gtfs.model.Agency
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2760)
at java.util.Arrays.copyOf(Arrays.java:2734)
at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
at java.util.ArrayList.addAll(ArrayList.java:474)
at org.opentripplanner.graph_builder.impl.GtfsGraphBuilderImpl.readGtfs(GtfsGraphBuilderImpl.java:203)
at org.opentripplanner.graph_builder.impl.GtfsGraphBuilderImpl.buildGraph(GtfsGraphBuilderImpl.java:101)
at org.opentripplanner.graph_builder.GraphBuilderTask.run(GraphBuilderTask.java:92)
at org.opentripplanner.graph_builder.GraphBuilderMain.main(GraphBuilderMain.java:49)
2011-08-09 08:29:49,634 INFO  [AbstractApplicationContext.java:1002] : Closing org.springframework.context.support.GenericApplicationContext@68916a2: startup date [Tue Aug 09 08:28:33 UTC 2011]; root of context hierarchy
2011-08-09 08:29:49,635 INFO  [DefaultSingletonBeanRegistry.java:422] : Destroying singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@7632efa7: defining beans [org.springframework.context.annotation.internalConfigurationAnnotationProcessor,org.springframework.context.annotation.internalAutowiredAnnotationProcessor,org.springframework.context.annotation.internalRequiredAnnotationProcessor,org.springframework.context.annotation.internalCommonAnnotationProcessor,graphService,graphBuilderTask,graphBundle,gtfsBuilder,osmBuilder,transitStreetLink,optimizeTransit]; root of factory hierarchy


gb-full.config.xml

Francisco José Peñarrubia

Aug 9, 2011, 5:08:27 AM
to opentripp...@googlegroups.com
Hi.

I don't know the size of your GTFS files, but I guess the
defaultAgencyId should be different for each GTFS.

BTW, did you test them with these tools (FeedValidator and Schedule Viewer)?
http://code.google.com/p/googletransitdatafeed/wiki/TransitFeedDistribution

Cheers.

On 09/08/2011 10:59, Moohyoung Park wrote:
> <property name="defaultAgencyId" value="NSW"/>

--
Fran Peñarrubia
Scolab
www.scolab.es

Asociación gvSIG
www.gvsig.com

Moohyoung Park

Aug 9, 2011, 9:29:24 PM
to Francisco José Peñarrubia, opentripp...@googlegroups.com
Hi Francisco, 

Thank you for replying. I've changed the defaultAgencyId, but I get the same error messages.
The GTFS files are over 360 MB in total. In addition, I've checked them with the FeedValidator.
It shows warnings, but we know what causes them.
So do I have no choice other than merging them into one transit file?
Or do I need to ask the Google team? If I find a solution, I'll share it with you.
Thank you.

Moohyoung

--
Moohyoung Park
Senior Associate
Identity Hive Pty. Ltd.
Identity for the real world.
Sent with Sparrow

Andrew Byrd

Aug 10, 2011, 12:46:28 AM
to opentripp...@googlegroups.com
Hi Moohyoung,

There should be no reason to merge the feeds.

In my experience it is possible to comfortably build a graph for all of
the New York metropolitan area, which includes ten separate feeds, in
"only" 5 to 6 GB of memory. There is nothing special to do other than
listing several GTFSBundles in the configuration XML. I have built
graphs for other large urban regions with very dense transport networks
in similar amounts of space or less.

If you are actually filling up 32GB of heap by using multiple transit
feeds from Sydney, this is clearly a problem with OTP that we would want
to resolve. The size of your transit data set (360MB zipped) surprises
me a bit - again, for comparison the entire New York metro area transit
service is around 100MB. Do you know much about the contents of the
feeds you are using? Might they contain a lot of redundant information?

Can you explain from where you are launching the graph builder, and how
you are setting the JVM heap size (the -Xmx option)?

-Andrew


Moohyoung Park

Aug 10, 2011, 2:45:01 AM
to Andrew Byrd, opentripp...@googlegroups.com
Hi Andrew,

Thank you for answering. 
I'm sorry, I misunderstood. The file size is 170 MB. The file I used before was about 30 MB.
The shapes.txt and stop_times.txt files are big.
For example, the following are the files for one agency.
$ ls -al
total 8208
-rw-r--r--   1 ??  staff          110 10 Aug 15:47 agency.txt
-rw-r--r--   1 ??  staff       10017 10 Aug 15:47 calendar.txt
-rw-r--r--   1 ??  staff         2303 10 Aug 15:47 calendar_dates.txt
-rw-r--r--   1 ??  staff      108197 10 Aug 15:47 routes.txt
-rw-r--r--   1 ??  staff  23366219 10 Aug 15:47 shapes.txt
-rw-r--r--   1 ??  staff    6351234 10 Aug 15:47 stop_times.txt
-rw-r--r--   1 ??  staff      559760 10 Aug 15:47 stops.txt
-rw-r--r--   1 ??  staff      323513 10 Aug 15:47 trips.txt

In addition, this is how I set the JVM heap size:
java -Xmx32G -jar /root/iHive/graph-builder.jar gb-full.config.xml

Thank you.

Moohyoung

--
Moohyoung Park
Senior Associate
Identity Hive Pty. Ltd.
Identity for the real world.
Sent with Sparrow

Francisco José Peñarrubia

Aug 10, 2011, 3:36:35 AM
to Moohyoung Park, opentripp...@googlegroups.com
Hi.

I think your problem comes from shapes.txt. It's too big, and I think you should pre-process the shapes. Try using a Douglas-Peucker algorithm to simplify the routes (fewer coordinates). You can use ArcGIS, gvSIG, or QGIS.
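
For anyone who prefers to script the simplification instead of using a GIS package, here is a minimal sketch of Douglas-Peucker in Python. It is only an illustration under some assumptions: the function names are made up for this example, points are treated as planar (x, y) pairs, and the tolerance is in the same units as the coordinates (so for raw lat/lon values from shapes.txt it would be in degrees, which is only a rough approximation).

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the straight line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # start and end coincide, so fall back to point-to-point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Return a simplified copy of points, always keeping both endpoints."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the interior point farthest from the start-end line
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        dist = perpendicular_distance(points[i], start, end)
        if dist > max_dist:
            max_dist, index = dist, i
    if max_dist <= tolerance:
        # Every interior point is close enough to the line: drop them all
        return [start, end]
    # Otherwise keep the farthest point and simplify both halves
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # left[-1] and right[0] are the same point
```

For example, `douglas_peucker([(0, 0), (1, 0), (2, 0), (3, 0)], 0.5)` keeps only the two endpoints, because the middle points lie on the straight line between them. Very long shapes may need an iterative version to avoid deep recursion.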

As for stop_times, you are right, it seems too big. Maybe you can define calendar and calendar_dates another way, or include only one month ahead instead of the whole year.

Hope it helps.

Fran.

Moohyoung Park

Aug 10, 2011, 10:01:18 AM
to Francisco José Peñarrubia, opentripp...@googlegroups.com
Hi,

The shapes.txt file is needed to draw the detailed route paths.
When I build Graph.obj with one merged transit file, it succeeds.
I got the following error message even after going up to the maximum memory, 48 GB.
It looks like it goes into an endless recursive loop. Is it possible to handle over 50 agencies' transit files?

Thank you.

Moohyoung

----------------------------------------------------------
2011-08-10 07:43:31,664 INFO  [GtfsGraphBuilderImpl.java:191] : reading entities: org.onebusaway.gtfs.model.Agency

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2760)
at java.util.Arrays.copyOf(Arrays.java:2734)
at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
at java.util.ArrayList.addAll(ArrayList.java:474)
at org.opentripplanner.graph_builder.impl.GtfsGraphBuilderImpl.readGtfs(GtfsGraphBuilderImpl.java:203)
at org.opentripplanner.graph_builder.impl.GtfsGraphBuilderImpl.buildGraph(GtfsGraphBuilderImpl.java:101)
at org.opentripplanner.graph_builder.GraphBuilderTask.run(GraphBuilderTask.java:92)
at org.opentripplanner.graph_builder.GraphBuilderMain.main(GraphBuilderMain.java:49)
2011-08-10 07:45:28,665 INFO  [AbstractApplicationContext.java:1002] : Closing org.springframework.context.support.GenericApplicationContext@68916a2: startup date [Wed Aug 10 07:43:31 UTC 2011]; root of context hierarchy
2011-08-10 07:45:28,665 INFO  [DefaultSingletonBeanRegistry.java:422] : Destroying singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@7632efa7: defining beans [org.springframework.context.annotation.internalConfigurationAnnotationProcessor,org.springframework.context.annotation.internalAutowiredAnnotationProcessor,org.springframework.context.annotation.internalRequiredAnnotationProcessor,org.springframework.context.annotation.internalCommonAnnotationProcessor,graphService,graphBuilderTask,graphBundle,gtfsBuilder,osmBuilder,transitStreetLink,optimizeTransit]; root of factory hierarchy



--
Moohyoung Park
Senior Associate
Identity Hive Pty. Ltd.
Identity for the real world.
Sent with Sparrow

Andrew Byrd

Aug 10, 2011, 11:09:06 AM
to opentripp...@googlegroups.com
Hi Moohyoung,

Francisco was suggesting that your shapes file might be excessively big
because the shapes are too detailed. If (for example) you have a data
point every meter, you don't need so much information to draw a map and
there are algorithms to simplify curves and reduce the size.

While this is a legitimate concern, 48 gigabytes is enormous, and even
the most detailed shapes should not use nearly that much memory.

It looks like your error is happening very early on in loading, where
agency IDs are pre-loaded to allow references between feeds. As you say,
it is possible that there is a data structure growing endlessly and
filling up all available memory.

Please clarify: are you getting this error with the single merged
feed, or with the 50 separate feeds?

It is certainly possible to handle 50 transit feeds, and there is no
reason you should have to merge the feeds by hand. If this is a problem
with OTP we would prefer to fix it rather than require elaborate
work-arounds.

Under what conditions do you have access to all this data? Are you by
any chance allowed to share your transit data for debugging purposes?

-Andrew


Moohyoung Park

Aug 10, 2011, 11:25:26 AM
to Andrew Byrd, opentripp...@googlegroups.com
Hi Andrew,

> Please clarify: are you getting this error with the single merged
> feed, or with the 50 separate feeds?
=> The error messages appear when I use the 50 separate feeds.

When I use about 20 separate feeds, it works.
When I use the single merged feed with the 50 agencies' files, it works.
As David said, it is reasonable to handle the multiple transit feeds.


Thank you.

Moohyoung
--
Moohyoung Park
Senior Associate
Identity Hive Pty. Ltd.
Identity for the real world.
Sent with Sparrow

Andrew Byrd

Aug 10, 2011, 11:29:55 AM
to opentripp...@googlegroups.com
On 08/10/2011 04:01 PM, Moohyoung Park wrote:
> I've got following error message as going up to the maximum memory, 48Gb
> memory.
> It looks to go into unlimited recursive loop. Is it possible to handle
> over 50 agencies' transit files ?

I have confirmed that there is indeed a problem where the agencies list
doubles in length for each feed added. In the New York case, despite
using many feeds we never ran into this because 2**9 is only 512, an
acceptable length for a list. However, 2**49 is certainly enough to reveal
the problem!
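
To make the arithmetic concrete, here is a small Python sketch of the growth pattern. It is not the actual OTP code (that is Java, in GtfsGraphBuilderImpl.readGtfs according to the stack trace); the function names and the 8-bytes-per-reference figure are assumptions for illustration only.

```python
def simulate_doubling(num_feeds):
    """Build a real list that doubles once per additional feed (small inputs only)."""
    agencies = ["agency"]  # one entry after the first feed
    for _ in range(num_feeds - 1):
        agencies += agencies[:]  # appending a copy of itself doubles the length
    return len(agencies)


def agency_list_length(num_feeds):
    """Closed form of the same growth: 2 ** (num_feeds - 1)."""
    return 2 ** (num_feeds - 1)


BYTES_PER_REFERENCE = 8  # assumed size of one object reference on a 64-bit JVM

for feeds in (10, 20, 50):
    length = agency_list_length(feeds)
    print(feeds, "feeds ->", length, "entries,",
          length * BYTES_PER_REFERENCE, "bytes for the references alone")
```

With 10 feeds this gives 512 entries and with 20 feeds 524,288, both harmless, while 50 feeds gives 2**49 entries, far more than any heap can hold. That would be consistent with the report that about 20 feeds worked and 50 did not.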

Thanks for the bug report, I'm looking into it.

-Andrew

Andrew Byrd

Aug 11, 2011, 1:15:57 AM
to Moohyoung Park, opentripp...@googlegroups.com
Hi Moohyoung,

Multiple-feed GTFS loading should now be fixed in the master branch on
GitHub. Let us know if it works for you!

-Andrew


Moohyoung Park

Aug 12, 2011, 12:39:54 AM
to Andrew Byrd, opentripp...@googlegroups.com
I checked out the new sources and rebuilt yesterday.
I tested it today. It works well with over 50 transit feed files.
In addition, it needs less memory: about two thirds of what the single merged file needed
(up to about 8 GB).
You have saved our project, because we don't have many computing resources.
I really appreciate your help.

Sincerely,
Moohyoung in Sydney

--
Moohyoung Park
Senior Associate
Identity Hive Pty. Ltd.
Identity for the real world.
Sent with Sparrow

On Thursday, 11 August 2011 at 3:32 PM, Moohyoung Park wrote:

I'm sorry I've misunderstood.
I will check out the changes and test it tonight.
And then I will report it to the usergroup.
I appreciate your help. 

Thank you.

Moohyoung

--
Moohyoung Park
Senior Associate
Identity Hive Pty. Ltd.
Identity for the real world.
Sent with Sparrow

On Thursday, 11 August 2011 at 3:28 PM, Andrew Byrd wrote:

Hi,

Just to be clear, I'm not asking you to modify it. It is already fixed!
I said "should be" because I cannot test it myself, because I don't have
50 feeds :) You can pull in the changes and try it when you are ready.

-Andrew


On 08/11/2011 07:26 AM, Moohyoung Park wrote:
Hi Andrew,

I'm not yet modifying it.

Karel Novotny

Aug 23, 2011, 11:55:57 AM
to opentripp...@googlegroups.com
Hi Andrew,

On Wed, 2011-08-10 at 17:09 +0200, Andrew Byrd wrote:
> Hi Moohyoung,
>

> Francisco was suggesting that your shapes file might be excessively big
> because the shapes are too detailed. If (for example) you have a data
> point every meter, you don't need so much information to draw a map and
> there are algorithms to simplify curves and reduce the size.

What is a reasonable 'resolution' of shapes.txt? I have a 2.5 MB
shapefile with 600 transportation lines and I am creating shapes.txt
from it. I wonder if I need to optimize it in any way so as not to
waste OTP memory resources.

thanks.

karel

Andrew Byrd

Aug 23, 2011, 12:18:32 PM
to opentripp...@googlegroups.com
On 08/23/2011 05:55 PM, Karel Novotny wrote:
> Hi Andrew,
>
> On Wed, 2011-08-10 at 17:09 +0200, Andrew Byrd wrote:
>> Hi Moohyoung,
>>
>> Francisco was suggesting that your shapes file might be excessively big
>> because the shapes are too detailed. If (for example) you have a data
>> point every meter, you don't need so much information to draw a map and
>> there are algorithms to simplify curves and reduce the size.
>
> What is a reasonable 'resolution' of shapes.txt? I have a 2.5 mb
> shapefile with 600 transportation lines and I am creating shapes.txt
> from it. I wonder if I need to optimize it in any way in order not to be
> wasting otp memory resources in vain.

Hi Karel - while overly detailed geometry could be an issue, your file
size sounds reasonable to me. I don't think you need to worry that you
are wasting memory or manually thin the data.

Moohyoung's case turned out to be something totally different -- a bug
that under very specific circumstances would consume thousands of times
more memory than your shapes do.

OTP should probably be able to spot overly detailed shapes and simplify
them, and a ticket on this subject has just been created in the OTP
issue tracker.

-Andrew
