Thoughts of 3.0 changes


John Erik Halse

Oct 19, 2015, 9:13:06 AM
to openwayback-dev

Hi All,


While working with the code, I have come across some structural issues that I think should be addressed alongside the architectural and functional issues we are already solving. I would like to get the community’s opinion on the following.

Code style

We should adopt a style guide. The current code is a mix of different styles, which makes it harder to read than it needs to be. Heritrix has a document describing some simple guidelines (https://webarchive.jira.com/wiki/display/Heritrix/Style+Guide). Neither OpenWayback nor Heritrix follows this guide consistently.


The mix of tabs and spaces in the code renders badly on GitHub. I suggest we follow the guide and use spaces for indentation. That way the indentation always looks right, both in the editor and elsewhere, such as on GitHub and in e-mails.


I would also like to enforce the guidelines concerning curly braces mentioned in the style guide.


In short: I suggest we adopt the style guide for Heritrix.


Reformatting the code makes it difficult to see what has actually changed. But if we want a uniform style, this has to be done at some point. I think a major release such as 3.0 is a good candidate.

Formal API definition

While fixing bugs, I realized that there is little or no formal definition of the service API for OpenWayback. This is a problem when debugging, because you don’t know what the correct outcome is when methods rewrite URIs.


For OWB there is the old documentation describing some of the API (http://archive-access.sourceforge.net/projects/wayback/administrator_manual.html). But this document is not up to date. For example, there is no mention of how date ranges can be specified in the URI. OWB supports both path parameters (http://HOSTNAME:PORT/CONTEXT/TIMESTAMP/URL) and query parameters (http://HOSTNAME:PORT/CONTEXT/query?url=URL). The latter is not documented as far as I can see. Do we need both?
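
To make the question concrete, here is a minimal Java sketch of how the two URI forms could be reduced to one internal request. It is not taken from the OWB code base: the class and method names are hypothetical, the timestamp is assumed to be 1 to 14 digits, and anything else OWB may accept in the path (flags, wildcards, date ranges) is deliberately ignored.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of OpenWayback.
public class ReplayRequestParser {

    /** The result of parsing either URI form. */
    public static final class ParsedRequest {
        public final String timestamp; // null when the request carries no timestamp
        public final String url;

        ParsedRequest(String timestamp, String url) {
            this.timestamp = timestamp;
            this.url = url;
        }
    }

    // Assumption: a timestamp is 1 to 14 digits, followed by the target URL.
    private static final Pattern PATH_FORM = Pattern.compile("^/(\\d{1,14})/(.+)$");

    /**
     * @param pathAfterContext the request path with the CONTEXT prefix removed,
     *                         e.g. "/20151019091306/http://example.org/" or "/query"
     * @param urlQueryParam    the value of the "url" query parameter, or null
     */
    public static ParsedRequest parse(String pathAfterContext, String urlQueryParam) {
        if ("/query".equals(pathAfterContext)) {
            if (urlQueryParam == null || urlQueryParam.isEmpty()) {
                throw new IllegalArgumentException("Missing 'url' query parameter");
            }
            return new ParsedRequest(null, urlQueryParam);
        }
        Matcher m = PATH_FORM.matcher(pathAfterContext);
        if (m.matches()) {
            return new ParsedRequest(m.group(1), m.group(2));
        }
        throw new IllegalArgumentException("Unrecognized request: " + pathAfterContext);
    }
}
```

If both forms end up in the same internal request like this, a formal API definition only has to specify the mapping once, whichever form we decide to keep.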


The CDX server API is better documented, but it seems like some of the parameters are experimental and should not be considered stable.


We should be clear about which parts of the API are supported and stable, and which might change without notice, for both OWB and the CDX server.

Utilize the HTTP protocol

I think we should make better use of the features of the HTTP protocol where it makes sense.

  • The use of HTTP status codes needs a cleanup.

  • Both in OWB and CDX server there are parameters for selecting the output format (OWB: /xmlquery, CDX: ?output=json). I think we should utilize the ‘Accept’ header for this.

  • Use the ‘Accept-Encoding’ header for requesting compressed responses.
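
As a rough illustration of the ‘Accept’ idea, the following Java sketch picks an output format from the header's media types and q-values. Everything in it is an assumption for the sake of the example: the class name, the mapping from media types to format names, the plain-text default, and the choice to let an explicit format parameter win over the header. It also ignores partial wildcards such as application/*.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of OWB or the CDX server.
public class OutputFormatNegotiator {

    // Assumed mapping from media types to output format names.
    private static final Map<String, String> FORMATS = new LinkedHashMap<>();
    static {
        FORMATS.put("application/json", "json");
        FORMATS.put("application/xml", "xml");
        FORMATS.put("text/plain", "text");
    }

    // Assumed default when the client states no usable preference.
    private static final String DEFAULT_FORMAT = "text";

    /**
     * @param outputParam  value of an explicit format parameter, or null
     * @param acceptHeader value of the Accept request header, or null
     */
    public static String chooseFormat(String outputParam, String acceptHeader) {
        // An explicit, recognized parameter overrides the header.
        if (outputParam != null && FORMATS.containsValue(outputParam)) {
            return outputParam;
        }
        if (acceptHeader == null) {
            return DEFAULT_FORMAT;
        }
        String best = DEFAULT_FORMAT;
        double bestQ = 0.0;
        for (String range : acceptHeader.split(",")) {
            String[] parts = range.trim().split(";");
            String type = parts[0].trim().toLowerCase(Locale.ROOT);
            double q = 1.0; // q defaults to 1 when absent
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.startsWith("q=")) {
                    q = parseQ(param.substring(2));
                }
            }
            String format = "*/*".equals(type) ? DEFAULT_FORMAT : FORMATS.get(type);
            // Keep the supported format with the highest q-value; first one wins ties.
            if (format != null && q > bestQ) {
                best = format;
                bestQ = q;
            }
        }
        return best;
    }

    private static double parseQ(String value) {
        try {
            return Double.parseDouble(value.trim());
        } catch (NumberFormatException e) {
            return 0.0; // treat a malformed q-value as "not acceptable"
        }
    }
}
```

For example, chooseFormat(null, "application/xml;q=0.5, application/json") selects "json". A stricter variant could answer 406 Not Acceptable instead of falling back to the default when nothing in the header matches.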


Configuration

We are using Spring XML for user configuration. The consequence is that almost every non-private method in every class must be considered part of the public API: it cannot be changed in minor releases, and changes should be avoided as far as possible even in major releases. My suggestion is to move away from Spring as the mechanism for user configuration. It’s probably not realistic for OWB 3.0, but for the CDX server it should be possible in the 3.0 timeframe. We could still use Spring XML for assembling the application (even though Spring itself seems to be moving away from XML). But for user configuration we should find a solution that restricts the possible configuration options to something we are able to support over time.
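
One possible shape for such a restricted configuration, sketched in Java with only the standard library: a small settings class that accepts a fixed set of keys and rejects everything else. The class name, the keys ("port", "index.path") and the default port are all made up for this example; the point is only that the supported surface becomes a short, explicit list.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical settings class, for illustration only; keys and defaults are assumptions.
public class CdxServerSettings {

    // The complete list of configuration keys we would commit to supporting.
    private static final Set<String> KNOWN_KEYS =
            new HashSet<>(Arrays.asList("port", "index.path"));

    public final int port;
    public final String indexPath;

    private CdxServerSettings(int port, String indexPath) {
        this.port = port;
        this.indexPath = indexPath;
    }

    public static CdxServerSettings fromProperties(Properties props) {
        // Reject anything outside the supported set instead of silently ignoring it.
        for (String key : props.stringPropertyNames()) {
            if (!KNOWN_KEYS.contains(key)) {
                throw new IllegalArgumentException("Unknown configuration key: " + key);
            }
        }
        int port = Integer.parseInt(props.getProperty("port", "8080"));
        String indexPath = props.getProperty("index.path");
        if (indexPath == null) {
            throw new IllegalArgumentException("Missing required key: index.path");
        }
        return new CdxServerSettings(port, indexPath);
    }

    /** Convenience for tests and examples: parse settings from properties-file text. */
    public static CdxServerSettings parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fromProperties(props);
    }
}
```

With something like this, the internals can be refactored freely between releases as long as the listed keys keep their meaning.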


Best regards,


John Erik


andrew.jackson

Oct 19, 2015, 10:16:35 AM
to openwayback-dev
Hi All,

FWIW, here's my feeling on these points:

Code Style: Yes, let's use H3's standard and please enforce it in the build/CI via checkstyle and all that jazz.

Formal API definition: I agree this should be a lot clearer, as we have a few implementations now, and I'd like to know that they are interchangeable. I'm kinda expecting to move toward a model where the playback is separate from the 'cdx server/remote resource index' (I really like NLA's standalone and writable tinycdxserver, which implements the OpenSearch remote resource index API), so tightening this up would be great. I'd like to support the Accept header, but I'd rather keep the format URL parameter as an option because it's so handy. I think the OpenSearch API requires it anyway.

Configuration: I hate using Spring for user configuration - it gives you more than enough rope to shoot yourself in the foot*. Fully support this idea and prefer using Typesafe Config myself (https://github.com/typesafehub/config)

HTH,
Andy

* I do enjoy a well-mixed metaphor from time to time.

Kenji Nagahashi

Nov 5, 2015, 1:59:29 PM
to openwayback-dev
I'm definitely for fixing up the code style. I was tempted to do it many times, but refrained out of fear that it would make it difficult to track / merge changes.
I suggest running the code style cleanup on its own, once, over the entire code base, without mixing it with functional changes.

As to configuration, I posted a negative comment to https://github.com/iipc/openwayback/issues/296, misunderstanding your intention. Just disregard it... I view Typesafe Config as a more expressive replacement for properties files, not for Spring.

--Kenji 