I don't know the size of your GTFS files, but I would guess the
defaultAgencyId should be different for each GTFS feed.
BTW, did you test them with these tools
(FeedValidator and Schedule Viewer)?
http://code.google.com/p/googletransitdatafeed/wiki/TransitFeedDistribution
Cheers.
On 09/08/2011 10:59, Moohyoung Park wrote:
> <property name="defaultAgencyId" value="NSW"/>
--
Fran Peñarrubia
Scolab
www.scolab.es
Asociación gvSIG
www.gvsig.com
There should be no reason to merge the feeds.
In my experience it is possible to comfortably build a graph for all of
the New York metropolitan area, which includes ten separate feeds, in
"only" 5 to 6 GB of memory. There is nothing special to do other than
listing several GTFSBundles in the configuration XML. I have built
graphs for other large urban regions with very dense transport networks
in similar amounts of space or less.
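As a sketch of what multiple bundles look like in the graph-builder configuration (the bean class names here are recalled from a 2011-era Spring XML setup and may not match your OTP version exactly, and the paths and agency IDs are placeholders):

```xml
<bean id="gtfsBuilder"
      class="org.opentripplanner.graph_builder.impl.GtfsGraphBuilderImpl">
  <property name="gtfsBundles">
    <bean class="org.opentripplanner.graph_builder.model.GtfsBundles">
      <property name="bundles">
        <list>
          <bean class="org.opentripplanner.graph_builder.model.GtfsBundle">
            <property name="path" value="/data/gtfs/feed-a.zip" />
            <property name="defaultAgencyId" value="AGENCY_A" />
          </bean>
          <bean class="org.opentripplanner.graph_builder.model.GtfsBundle">
            <property name="path" value="/data/gtfs/feed-b.zip" />
            <property name="defaultAgencyId" value="AGENCY_B" />
          </bean>
        </list>
      </property>
    </bean>
  </property>
</bean>
```

Adding another feed is just one more GtfsBundle entry in the list, each with its own path and defaultAgencyId.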
If you are actually filling up 32GB of heap by using multiple transit
feeds from Sydney, this is clearly a problem with OTP that we would want
to resolve. The size of your transit data set (360MB zipped) surprises
me a bit - again, for comparison the entire New York metro area transit
service is around 100MB. Do you know much about the contents of the
feeds you are using? Might they contain a lot of redundant information?
Can you explain where you are launching the graph builder from, and how
you are setting the JVM heap size (the -Xmx option)?
-Andrew
On 08/10/2011 03:29 AM, Moohyoung Park wrote:
> Hi Francisco,
>
> Thank you for replying. I've changed the defaultAgencyId, but the
> same error messages still appear.
> The GTFS files total over 360 MB. In addition, I've checked them
> with the feedvalidator.
> It shows warnings, but we know what causes the warnings.
> So do I have no choice but to merge them into one
> transit feed?
> Or do I need to ask the Google team? If I find a solution, I'll share it
> with you.
> Thank you.
>
> Moohyoung
>
> --
> Moohyoung Park
> Senior Associate
> Identity Hive Pty. Ltd.
> Identity for the real world.
> mooh...@identityhive.com.au <mailto:mooh...@identityhive.com.au>
> Sent with Sparrow <http://www.sparrowmailapp.com/?sig>
Francisco was suggesting that your shapes file might be excessively big
because the shapes are too detailed. If (for example) you have a data
point every meter, you don't need so much information to draw a map and
there are algorithms to simplify curves and reduce the size.
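For the curious, the Douglas-Peucker simplification Francisco refers to fits in a few lines. This is a minimal, illustrative Python sketch of my own, not code from OTP or any GIS tool:

```python
import math

def _point_segment_dist(p, a, b):
    """Distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamped to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, epsilon):
    """Drop points that deviate less than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the chord between the endpoints.
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_segment_dist(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= epsilon:
        return [points[0], points[-1]]   # everything in between is noise
    # Otherwise keep that point and recurse on both halves.
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right             # avoid duplicating the split point
```

Running something like this over each shape_id's ordered points in shapes.txt, with an epsilon of a few meters in projected coordinates, typically cuts the point count dramatically without visibly changing the drawn route.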
While this is a legitimate concern, 48 gigabytes is enormous, and even
the most detailed shapes should not use nearly that much memory.
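A back-of-envelope calculation makes the point; every per-row and per-object size here is a rough assumption of mine, not a measured OTP number:

```python
# Rough, illustrative arithmetic: why even a large shapes.txt should
# not come close to tens of gigabytes of heap.
shapes_txt_bytes = 23 * 1024 * 1024     # e.g. a 23 MB shapes.txt
bytes_per_csv_row = 40                  # shape_id,lat,lon,seq -- rough guess
points = shapes_txt_bytes // bytes_per_csv_row
bytes_per_point_in_memory = 200         # generous per-object overhead guess
print(points * bytes_per_point_in_memory / 2**30)  # on the order of 0.1 GB
```

Even with generous per-point overhead, the result is a small fraction of a gigabyte, nowhere near 48 GB.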
It looks like your error is happening very early on in loading, where
agency IDs are pre-loaded to allow references between feeds. As you say,
it is possible that there is a data structure growing endlessly and
filling up all available memory.
Please clarify: are you getting this error with the single merged
feed, or with the 50 separate feeds?
It is certainly possible to handle 50 transit feeds, and there is no
reason you should have to merge the feeds by hand. If this is a problem
with OTP we would prefer to fix it rather than require elaborate
work-arounds.
Under what conditions do you have access to all this data? Are you by
any chance allowed to share your transit data for debugging purposes?
-Andrew
> On Wednesday, 10 August 2011 at 5:36 PM, Francisco José Peñarrubia wrote:
>
>> Hi.
>>
>> I think your problem comes from shapes.txt. It's too big, and I think
>> you should pre-process the shapes. Try using a Douglas-Peucker
>> algorithm to simplify the routes (fewer coordinates). You can use
>> ArcGIS, gvSIG, QGIS...
>>
>> About stop_times, you are right, it seems too big. Maybe you can
>> define calendar and calendar_dates another way, or include only 1 month
>> ahead instead of the whole year....
>>
>> Hope it helps.
>>
>> Fran.
>>
>> On 10/08/2011 8:45, Moohyoung Park wrote:
>>> Hi Andrew,
>>>
>>> Thank you for answering.
>>> I'm sorry I've misunderstood. its file size is 170MB. The size of the
>>> file used before is about 30Mb.
>>> The files, shapes.txt and stop_times.txt are big.
>>> For example, followings are files for one agency.
>>> $ ls -al
>>> total 8208
>>> -rw-r--r-- 1 ?? staff 110 10 Aug 15:47 agency.txt
>>> -rw-r--r-- 1 ?? staff 10017 10 Aug 15:47 calendar.txt
>>> -rw-r--r-- 1 ?? staff 2303 10 Aug 15:47 calendar_dates.txt
>>> -rw-r--r-- 1 ?? staff 108197 10 Aug 15:47 routes.txt
>>> -rw-r--r-- 1 ?? staff 23366219 10 Aug 15:47 shapes.txt
>>> -rw-r--r-- 1 ?? staff 6351234 10 Aug 15:47 stop_times.txt
>>> -rw-r--r-- 1 ?? staff 559760 10 Aug 15:47 stops.txt
>>> -rw-r--r-- 1 ?? staff 323513 10 Aug 15:47 trips.txt
>>>
>>> In addition, the JVM heap size setting is ;
>>> java -Xmx32G -jar /root/iHive/graph-builder.jar gb-full.config.xml
>>>
>>> Thank you.
>>>
>>> Moohyoung
>>>
>>> --
>>> Moohyoung Park
>>> Senior Associate
>>> Identity Hive Pty. Ltd.
>>> Identity for the real world.
>> www.scolab.es <http://www.scolab.es>
>>
>> Asociación gvSIG
>> www.gvsig.com <http://www.gvsig.com>
>
I have confirmed that there is indeed a problem where the agencies list
doubles in length for each feed added. In the New York case, despite
using many feeds we never ran into this because 2**9 is only 512, an
acceptable length for a list. However 2**49 is certainly enough to reveal
the problem!
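The failure mode is easy to sketch with a toy model (this illustrates the growth pattern described above, not the actual OTP code):

```python
def buggy_agency_list(num_feeds):
    """Toy model of the bug: the accumulated agencies list is
    re-appended to itself once per additional feed loaded, so its
    length doubles with every feed."""
    agencies = ["agency_0"]
    for _ in range(num_feeds - 1):
        agencies = agencies + agencies   # length doubles per feed
    return agencies

print(len(buggy_agency_list(10)))   # 2**9 == 512: tolerable for New York
# With 50 feeds the list would need 2**49 entries -- hence the filled heap.
```

With ten feeds the list stays small enough to go unnoticed; at fifty feeds the exponential growth exhausts any heap long before loading finishes.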
Thanks for the bug report, I'm looking into it.
-Andrew
Multiple-feed GTFS loading should now be fixed in the master branch on
Github. Let us know if it works for you!
-Andrew
On 08/10/2011 05:25 PM, Moohyoung Park wrote:
> Hi Andrew,
>
> Please clarify: are you getting this error with the single merged
> feed, or with the 50 separate feeds?
> => When using the 50 separate feeds, the error messages come out.
>
> When using about 20 separate feeds, it works.
> When using the single feed merged from all 50 agencies' files, it works.
> As David said, it is reasonable to handle multiple transit feeds.
>
> Thank you.
>
> Moohyoung
On Thursday, 11 August 2011 at 3:32 PM, Moohyoung Park wrote:
I'm sorry, I misunderstood.
I will check out the changes and test them tonight. Then I will report it to the user group. I appreciate your help. Thank you.

Moohyoung
On Thursday, 11 August 2011 at 3:28 PM, Andrew Byrd wrote:
Hi,
Just to be clear, I'm not asking you to modify it. It is already fixed!
I said "should be" because I cannot test it myself, because I don't have
50 feeds :) You can pull in the changes and try it when you are ready.
-Andrew
On 08/11/2011 07:26 AM, Moohyoung Park wrote:
Hi Andrew,
I haven't modified it yet.
On Wed, 2011-08-10 at 17:09 +0200, Andrew Byrd wrote:
> Hi Moohyoung,
>
> Francisco was suggesting that your shapes file might be excessively big
> because the shapes are too detailed. If (for example) you have a data
> point every meter, you don't need so much information to draw a map and
> there are algorithms to simplify curves and reduce the size.
What is a reasonable 'resolution' for shapes.txt? I have a 2.5 MB
shapefile with 600 transportation lines and I am creating shapes.txt
from it. I wonder if I need to optimize it in some way so as not to
waste OTP memory resources in vain.
thanks.
karel
Hi Karel - while overly detailed geometry could be an issue, your file
size sounds reasonable to me. I don't think you need to worry that you
are wasting memory, or to manually thin the data.
Moohyoung's case turned out to be something totally different -- a bug
that under very specific circumstances would consume thousands of times
more memory than your shapes do.
OTP should probably be able to spot overly detailed shapes and simplify
them, and a ticket on this subject has just been created in the OTP
issue tracker.
-Andrew