Errors Retrieving/Parsing CAP feeds


Angelo Torres

Dec 4, 2012, 1:07:31 PM
to cap-libra...@googlegroups.com
Hello.

I've set up the cap-library and have successfully parsed 1000+ alerts from various publishers over the past few weeks. I am hoping to get confirmation that I am using the library correctly and that the following errors need to be addressed upstream of my application. For each one I've included the relevant portion of the message that Alert Hub sends to my callback servlet (read via req.getReader()) and the exception it ultimately produces.

#1 - Alert Hub sometimes passes along an error message from a publisher's feed?

? cannot read /www1/cap_atom/web/cap/us.atom
com.sun.syndication.io.ParsingFeedException: Invalid XML: Error on line 1: Content is not allowed in prolog.
	at com.google.publicalerts.cap.feed.CapFeedParser$DocBuilder.buildDocument(CapFeedParser.java:465)
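
One way to guard against case #1 in a callback servlet is to check that the body looks like XML before handing it to the parser. The sketch below is only a coarse heuristic, and `FeedBodyCheck` and `looksLikeXml` are hypothetical names that are not part of cap-library.

```java
/** Hypothetical helper for illustration; not part of cap-library. */
final class FeedBodyCheck {
    private FeedBodyCheck() {}

    /**
     * Returns true if the first non-whitespace character of the body is '<'.
     * A leading byte-order mark is skipped. This does not prove the body is
     * well-formed XML; it only filters out plain-text error messages.
     */
    static boolean looksLikeXml(String body) {
        int i = 0;
        if (!body.isEmpty() && body.charAt(0) == '\uFEFF') {
            i = 1; // skip a byte-order mark if present
        }
        while (i < body.length() && Character.isWhitespace(body.charAt(i))) {
            i++;
        }
        return i < body.length() && body.charAt(i) == '<';
    }
}
```

With this check, a body such as "? cannot read /www1/cap_atom/web/cap/us.atom" can be logged and skipped instead of surfacing as a ParsingFeedException.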

#2 - NWS (http://alerts.weather.gov/cap/us.php?x=1) occasionally has an invalid time in the <updated> element of their Atom feed? From what I understand, the parser expects <updated> to contain an RFC 3339 formatted date and chokes when it sees '0-' after the 'T' character?

<feed xmlns:cap="urn:oasis:names:tc:emergency:cap:1.1" xmlns="http://www.w3.org/2005/Atom" xmlns:ha="http://www.alerting.net/namespace/index_1.0">
<id>http://alerts.weather.gov/cap/us.atom</id>
<logo>http://alerts.weather.gov/images/xml_logo.gif</logo>
<generator>NWS CAP Server</generator>
<updated>2012-12-04T0-2:38:00-08:00</updated>
CapException[reasons=[character content of element "updated" invalid; must be an ISO date and time]]
	at com.google.publicalerts.cap.feed.CapFeedParser.validate(CapFeedParser.java:290)
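
The malformed timestamp can be reproduced outside the library. The sketch below assumes Java 8 or later and uses `DateTimeFormatter.ISO_OFFSET_DATE_TIME` as an approximation of RFC 3339 (the two formats are close but not identical). `UpdatedTimestampCheck` is a hypothetical name, and this is not how cap-library performs its validation.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Hypothetical helper for illustration; not part of cap-library. */
final class UpdatedTimestampCheck {
    private UpdatedTimestampCheck() {}

    /** Returns true if the value parses as an ISO-8601 date-time with a UTC offset. */
    static boolean isValidTimestamp(String value) {
        try {
            OffsetDateTime.parse(value, DateTimeFormatter.ISO_OFFSET_DATE_TIME);
            return true;
        } catch (DateTimeParseException e) {
            // The hour field must be two digits, so "T0-2:38:00" ends up here.
            return false;
        }
    }
}
```

Here `isValidTimestamp("2012-12-04T0-2:38:00-08:00")` returns false, while the presumably intended value "2012-12-04T02:38:00-08:00" is accepted.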

#3 - NYAlert (http://rss.nyalert.gov/RSS/CapIndices/_NewYorkStateRSSCAPIndex.xml) incorrectly orders the elements in <area>?


CapException[reasons=[cvc-complex-type.2.4.a: Invalid content was found starting with element 'event'. One of '{"urn:oasis:names:tc:emergency:cap:1.1":language, "urn:oasis:names:tc:emergency:cap:1.1":category}' is expected.]]
	at com.google.publicalerts.cap.feed.CapFeedParser.parseAlert(CapFeedParser.java:372)


#4 - vialert.gov (http://rss.vialert.gov/RSS/CapIndices/_ALLVIRSSCAPIndex.xml) includes high resolution polygons for the entire territory and causes timeout exceptions on GAE.

java.net.SocketTimeoutException: Timeout while fetching URL: http://www.vialert.gov/Public/News/GetCapAlert.aspx?notID=3293245
	at com.google.appengine.api.urlfetch.URLFetchServiceImpl.convertApplicationException(URLFetchServiceImpl.java:142)
	at com.google.appengine.api.urlfetch.URLFetchServiceImpl.fetch(URLFetchServiceImpl.java:43)
	at com.google.apphosting.utils.security.urlfetch.URLFetchServiceStreamHandler$Connection.fetchResponse(URLFetchServiceStreamHandler.java:417)
	at com.google.apphosting.utils.security.urlfetch.URLFetchServiceStreamHandler$Connection.getInputStream(URLFetchServiceStreamHandler.java:296)
	at java.net.URL.openStream(URL.java:1029)

Steve Hakusa

Dec 5, 2012, 12:12:14 AM
to cap-libra...@googlegroups.com
Hi Angelo,

Nice summary.  I have some replies in-line.  We will also be following up with NOAA on the issues with their feed, some of which seem fairly recent.

Steve

On Tue, Dec 4, 2012 at 1:07 PM, Angelo Torres <adt...@gmail.com> wrote:

#1 - Alert Hub sometimes passes along an error message from a publisher's feed?



? cannot read /www1/cap_atom/web/cap/us.atom

com.sun.syndication.io.ParsingFeedException: Invalid XML: Error on line 1: Content is not allowed in prolog.
	at com.google.publicalerts.cap.feed.CapFeedParser$DocBuilder.buildDocument(CapFeedParser.java:465)

Yes, I believe we see this from time to time when loading NOAA's feed.

 
#2 - NWS (http://alerts.weather.gov/cap/us.php?x=1) occasionally has an invalid time in the <updated> element of their Atom feed? From what I understand, the parser expects <updated> to contain an RFC 3339 formatted date and chokes when it sees '0-' after the 'T' character?

Correct, "0-2" is invalid here.  Another NOAA issue.
 

<feed xmlns:cap="urn:oasis:names:tc:emergency:cap:1.1" xmlns="http://www.w3.org/2005/Atom" xmlns:ha="http://www.alerting.net/namespace/index_1.0">
<id>http://alerts.weather.gov/cap/us.atom</id>
<logo>http://alerts.weather.gov/images/xml_logo.gif</logo>
<generator>NWS CAP Server</generator>
<updated>2012-12-04T0-2:38:00-08:00</updated>

CapException[reasons=[character content of element "updated" invalid; must be an ISO date and time]]
	at com.google.publicalerts.cap.feed.CapFeedParser.validate(CapFeedParser.java:290)


#3 - NYAlert (http://rss.nyalert.gov/RSS/CapIndices/_NewYorkStateRSSCAPIndex.xml) incorrectly orders the elements in <area>?

Correct.  The error message could be a bit nicer.  It's somewhat unfortunate that the XSD for CAP specifies <sequence>, which makes element order significant.




CapException[reasons=[cvc-complex-type.2.4.a: Invalid content was found starting with element 'event'. One of '{"urn:oasis:names:tc:emergency:cap:1.1":language, "urn:oasis:names:tc:emergency:cap:1.1":category}' is expected.]]
	at com.google.publicalerts.cap.feed.CapFeedParser.parseAlert(CapFeedParser.java:372)
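
To illustrate why <sequence> produces this kind of error, here is a minimal sketch using the JDK's javax.xml.validation API. The schema is a toy written for this example, not the CAP XSD: it borrows the element names `category` and `event` from the error message, and the `wrapper` element and the `SequenceOrderDemo` class are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical demo; the schema below is a toy, NOT the CAP XSD. */
final class SequenceOrderDemo {
    private SequenceOrderDemo() {}

    // <wrapper> must contain <category> followed by <event>, in that order.
    static final String TOY_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"wrapper\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"category\" type=\"xs:string\"/>"
      + "<xs:element name=\"event\" type=\"xs:string\"/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns null if the document is valid, otherwise the validator's message. */
    static String validate(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(TOY_XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException | IOException e) {
            return e.getMessage();
        }
    }
}
```

A document with <category> before <event> validates, while the same two elements in the opposite order are rejected, even though no element is missing.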


#4 - vialert.gov (http://rss.vialert.gov/RSS/CapIndices/_ALLVIRSSCAPIndex.xml) includes high resolution polygons for the entire territory and causes timeout exceptions on GAE.


That's not an issue with the CAP library per se.  It does seem to take their server quite a long time to return data.
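
For slow publishers, one option is to set explicit timeouts on the connection rather than calling URL.openStream() with the defaults. The sketch below uses only standard java.net calls; `SlowFeedFetch` and the timeout values are hypothetical. Whether App Engine's URL Fetch honors these values, and up to what maximum, is an assumption to verify against the App Engine documentation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper for illustration; not part of cap-library. */
final class SlowFeedFetch {
    private SlowFeedFetch() {}

    /**
     * Opens (but does not yet connect) an HTTP connection with explicit timeouts.
     * The caller reads the response with getInputStream() and should catch
     * java.net.SocketTimeoutException for servers that are still too slow.
     */
    static HttpURLConnection openWithTimeouts(String url, int connectMillis, int readMillis) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectMillis);
            conn.setReadTimeout(readMillis);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller might use something like `openWithTimeouts(alertUrl, 10000, 50000)` and queue the alert for a later retry if the read still times out.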

Angelo Torres

Dec 5, 2012, 3:28:51 PM
to cap-libra...@googlegroups.com
Thanks Steve.

I hesitate to ask, but is working with publishers other than NOAA on correcting their feed an exercise left to the reader?

Steve Hakusa

Dec 5, 2012, 6:24:00 PM
to cap-libra...@googlegroups.com
Sorry, Google Crisis Response is not currently working with the other publishers on this list.

Steve

Angelo Torres

Dec 7, 2012, 2:36:45 PM
to cap-libra...@googlegroups.com
Thanks for the insight. It's good to know the NOAA feed is getting extra attention.
 
I came across another series of exceptions this morning. It looks like some NOAA alerts are prepending a '\n' to all of their element values.
 
Error while processing: http://alerts.weather.gov/cap/wwacapget.php?x=CA124CD5680CD4.FireWeatherWatch.124CD574E1FCCA.LOXRFWLOX.8f2a4d901005644752e75813981bd446
CapException[reasons=[cvc-pattern-valid: Value '
NOAA-NWS-ALERTS-CA124CD5680CD4.FireWeatherWatch.124CD574E1FCCA.LOXRFWLOX.8f2a4d901005644752e75813981bd446
' is not facet-valid with respect to pattern '[^\s,&<]+' for type '#AnonType_identifieralert'.;
 
...
 
at com.google.publicalerts.cap.feed.CapFeedParser.parseAlert(CapFeedParser.java:372)
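
The failure can be reproduced with the pattern from the error message. The sketch below uses java.util.regex, whose `\s` class is slightly broader than the XML Schema one, so treat it as an approximation. `IdentifierCheck` is a hypothetical name that is not part of cap-library.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of cap-library. */
final class IdentifierCheck {
    private IdentifierCheck() {}

    // Pattern copied from the cvc-pattern-valid message above.
    static final Pattern IDENTIFIER = Pattern.compile("[^\\s,&<]+");

    /** Returns true if the whole value matches the identifier pattern. */
    static boolean isFacetValid(String value) {
        return IDENTIFIER.matcher(value).matches();
    }
}
```

A value wrapped in newlines fails `isFacetValid`, while the same value after `String.trim()` passes, which is consistent with stray whitespace in the publisher's output being the cause.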

Steve Hakusa

Dec 10, 2012, 11:14:07 AM
to cap-libra...@googlegroups.com
Thanks.

We've sent these to our contacts at NOAA and suggested they sign up at http://cap-validator.appspot.com/subscribe to have such errors emailed to them.

Steve