Summary of OpenWayback call 17/08/2016


Kristinn Sigurðsson

Aug 19, 2016, 6:06:30 AM
to openway...@googlegroups.com
Dear all,

An OWB call was held on 17/08/16 @ 15:00 UTC. No agenda was sent out.

The following topics were discussed.


John Erik expects the new CDX server tools to be ready for testing very soon. This includes tools to generate the new CDXJ files as well as the CDX server itself. The CDX server will not be feature complete but should support the main use cases.


John Erik raised the question of whether the CDX server should be packaged as a servlet (WAR) that is deployed into a web server (e.g. Tomcat) or published as a stand-alone utility (effectively embedding the web server). The stand-alone option may reduce the complexity of setup and allow us to choose the most appropriate server. Currently John Erik is considering Grizzly for this (https://grizzly.java.net/dependencies.html). Comments on this are most welcome!


With a major new piece needing testing, Sawood raised the idea of a standard WARC dataset for testing. This has been discussed before and is usually well received in principle, but (so far) no one has volunteered to put together a suitable dataset. We'd very much welcome such volunteers!


There was some discussion about the practical differences between the Memento API and the CDX server API.


Mohamed raised a question about input sanitization on URLs searched for in OWB. The general consensus was that the search JSP pages might benefit from preventing some obvious data entry errors (a repeated protocol, for example) but that any API-level interfaces should leave this to the caller.
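
To illustrate the "repeated protocol" case, here is a minimal sketch of the kind of check a search form could apply before submitting a query. It is written in Python for brevity and is not OWB code; the function name `strip_repeated_scheme` is hypothetical, and it assumes only `http` and `https` prefixes need handling.

```python
import re

# Assumption: only http/https prefixes are considered. Matches one or more
# scheme prefixes at the start of the string that are immediately followed
# by another scheme prefix, e.g. the first "http://" in "http://http://...".
_REPEATED_SCHEME = re.compile(r'^(?:https?://)+(?=https?://)', re.IGNORECASE)


def strip_repeated_scheme(url: str) -> str:
    """Drop duplicated leading scheme prefixes, keeping the last one.

    Hypothetical helper for a search form; API-level callers would be
    expected to pass a well-formed URL themselves.
    """
    return _REPEATED_SCHEME.sub('', url.strip())
```

A URL with a single scheme passes through unchanged, as does a URL that merely contains another URL in its query string, because the pattern is anchored at the start of the input.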


Sawood advocated that issue https://github.com/iipc/openwayback/issues/285 be considered for the CDX server, i.e. that the CDX server advertise its version number in HTTP response headers.
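
As a rough illustration of advertising a version in response headers, the sketch below wraps a WSGI application so that every response carries an extra header. It is Python rather than the Java the CDX server is written in, and both the header name `X-CDX-Server-Version` and the version string are placeholders chosen for this example, not anything decided on the call or in the issue.

```python
# Both values are assumptions made up for this sketch.
VERSION_HEADER = "X-CDX-Server-Version"   # hypothetical header name
CDX_SERVER_VERSION = "0.0.0-example"      # placeholder version string


def with_version_header(app):
    """Wrap a WSGI app so each response also advertises the version."""
    def wrapped(environ, start_response):
        def start_with_version(status, headers, exc_info=None):
            # Append the version header to whatever the app already set.
            headers = list(headers) + [(VERSION_HEADER, CDX_SERVER_VERSION)]
            return start_response(status, headers, exc_info)
        return app(environ, start_with_version)
    return wrapped


def demo_app(environ, start_response):
    """Stand-in application that returns a fixed plain-text body."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]
```

In a servlet deployment the equivalent would presumably be a filter that sets the header on each response; the wrapper above only shows the general shape.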


There was a brief discussion on URI canonicalization. Existing canonicalizers can be overly aggressive (e.g. lowercasing the entire URL). OWB 3 will include a new canonicalizer.
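
The problem with lowercasing the entire URL is that paths and query strings are generally case-sensitive, so two distinct resources can collapse onto the same key. The Python sketch below contrasts that with lowercasing only the scheme and host. It is an illustration of the issue, not the OWB 3 canonicalizer, and both function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_everything(url: str) -> str:
    """Over-aggressive: also folds the case of the path and query."""
    return url.lower()


def lowercase_scheme_and_host(url: str) -> str:
    """More conservative: fold case only where URLs are case-insensitive.

    Assumption: the URL carries no userinfo ("user:pass@"), since that
    part of the netloc would be lowercased here as well.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path,       # left untouched
        parts.query,      # left untouched
        parts.fragment,
    ))
```

With the first function, `http://example.com/Page` and `http://example.com/page` end up with the same key; with the second they stay distinct.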


The next OWB call will be on September 7 @ 15:00 UTC.


Best,
Kristinn
-------------------------------------------------------------------------
Landsbókasafn Íslands - Háskólabókasafn | Arngrímsgötu 3 - 107 Reykjavík
Sími/Tel: +354 5255600 | www.landsbokasafn.is
-------------------------------------------------------------------------
fyrirvari/disclaimer - http://fyrirvari.landsbokasafn.is

Mohamed Elsayed

Sep 6, 2016, 7:41:22 AM
to openwayback-dev
Where are we going to share the WARC dataset for testing OWB 3?

Thank you.

Kristinn Sigurðsson

Sep 6, 2016, 9:24:55 AM
to openway...@googlegroups.com
Hi Mohamed,

That hasn't been determined. Suggestions are welcome!

Best,
Kris
