Hi All,
When working with the code, I have come across some structural issues which I think should be addressed along with the architectural and functional issues we are already solving. I would like to get the community’s opinion on the following.
We should adopt a style guide. The current code is a mix of different styles, which makes reading the code more difficult than it needs to be. Heritrix has a document describing some simple guidelines (https://webarchive.jira.com/wiki/display/Heritrix/Style+Guide). Neither OpenWayback nor Heritrix follows this guide consistently.
The mix of tabs and spaces in the code renders badly on GitHub. I suggest we follow the guide and use spaces for indenting. This way the indentation always looks right, both in the editor and elsewhere, such as on GitHub and in e-mails.
I would also like to enforce the guidelines concerning curly braces mentioned in the style guide.
In short: I suggest we adopt the style guide for Heritrix.
Reformatting the code makes it difficult to see what has actually changed in the version history. But if we want a uniform style, this has to be done at some point. I think a major release such as 3.0 is a good candidate.
While fixing bugs, I realized that there is little or no formal definition of the service API for OpenWayback. This is a problem while debugging, because you don’t know what the right outcome is when methods rewrite URIs.
For OWB there is the old documentation describing some of the API (http://archive-access.sourceforge.net/projects/wayback/administrator_manual.html). But this document is not up to date. For example, there is no mention of how date ranges can be specified in the URI. OWB supports both path parameters (http://HOSTNAME:PORT/CONTEXT/TIMESTAMP/URL) and query parameters (http://HOSTNAME:PORT/CONTEXT/query?url=URL). The latter is not documented as far as I can see. Do we need both?
The CDX server API is better documented, but it seems like some of the parameters are experimental and should not be considered stable.
We should be clear on what API is supported and stable, and what might change without notice for both OWB and CDX server.
I think we should be better at using features of the HTTP protocol where it makes sense.
The use of return codes needs a clean-up.
Both in OWB and CDX server there are parameters for selecting the output format (OWB: /xmlquery, CDX: ?output=json). I think we should utilize the ‘Accept’ header for this.
We should also use the ‘Accept-Encoding’ header for requesting compressed responses.
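To make the two header suggestions concrete, here is a rough Java sketch of how a server could choose an output format from ‘Accept’ and decide on compression from ‘Accept-Encoding’. It is not taken from the OWB or CDX server code: the class name, the media-type-to-format table and the default format are all assumptions made for illustration, and the parsing is simplified (no partial wildcards such as application/*, no ‘*’ coding).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class HeaderNegotiation {

    // Hypothetical mapping from media types to output formats; the formats
    // actually supported by OWB and the CDX server may differ.
    private static final Map<String, String> FORMATS = new LinkedHashMap<>();
    static {
        FORMATS.put("application/json", "json");
        FORMATS.put("application/xml", "xml");
        FORMATS.put("text/plain", "cdx");
    }

    /** Reads the q-value of one header element such as "gzip;q=0.5"; defaults to 1.0. */
    static double qValue(String[] parts) {
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i].trim();
            if (p.startsWith("q=")) {
                try {
                    return Double.parseDouble(p.substring(2));
                } catch (NumberFormatException e) {
                    return 0.0; // treat a malformed q-value as "not acceptable"
                }
            }
        }
        return 1.0;
    }

    /**
     * Returns the format with the highest q-value among the known media types.
     * A missing header yields the default; null means nothing acceptable was
     * found, which a server would typically answer with 406 Not Acceptable.
     */
    public static String selectFormat(String accept, String defaultFormat) {
        if (accept == null || accept.trim().isEmpty()) {
            return defaultFormat;
        }
        String best = null;
        double bestQ = 0.0;
        for (String element : accept.split(",")) {
            String[] parts = element.trim().split(";");
            String type = parts[0].trim().toLowerCase(Locale.ROOT);
            String format = "*/*".equals(type) ? defaultFormat : FORMATS.get(type);
            double q = qValue(parts);
            if (format != null && q > bestQ) {
                best = format;
                bestQ = q;
            }
        }
        return best;
    }

    /** True if the Accept-Encoding header lists gzip with a non-zero q-value. */
    public static boolean acceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) {
            return false;
        }
        for (String element : acceptEncoding.split(",")) {
            String[] parts = element.trim().split(";");
            if ("gzip".equals(parts[0].trim().toLowerCase(Locale.ROOT))) {
                return qValue(parts) > 0.0;
            }
        }
        return false;
    }
}
```

With this sketch, a request carrying "Accept: application/xml;q=0.5, application/json" would select the json format, and a request without an ‘Accept’ header would fall back to the default. Whether the existing /xmlquery and ?output= parameters should remain as overrides is a separate question.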
We are using Spring XML for user configuration. The consequence is that almost every non-private method in every class must be considered part of the public API: it cannot be changed in minor releases, and changes should be avoided as far as possible in major releases. My suggestion is to move away from Spring as the mechanism for user configuration. It’s probably not realistic for OWB 3.0, but for the CDX server it should be possible in the 3.0 timeframe. We could still use Spring XML for assembling the application (even though Spring itself seems to be moving away from XML). But for user configuration we should find a solution which restricts the possible configuration options to something we are able to support over time.
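As one possible illustration of what “restricted” user configuration could look like, here is a minimal Java sketch that accepts only an explicit list of supported keys from a properties file and rejects everything else. The class name and the keys are hypothetical and not part of OWB or the CDX server; this is only meant to show the idea, not to propose a specific format.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

final class SupportedConfig {

    // Hypothetical list of keys we would commit to supporting over time.
    private static final Set<String> SUPPORTED_KEYS = new HashSet<>(Arrays.asList(
            "server.port", "cdx.path", "output.default"));

    private final Properties values;

    private SupportedConfig(Properties values) {
        this.values = values;
    }

    /** Builds a configuration, failing fast on any key outside the supported set. */
    public static SupportedConfig from(Properties props) {
        for (String key : props.stringPropertyNames()) {
            if (!SUPPORTED_KEYS.contains(key)) {
                throw new IllegalArgumentException("Unsupported configuration key: " + key);
            }
        }
        Properties copy = new Properties();
        copy.putAll(props); // defensive copy so later changes to props have no effect
        return new SupportedConfig(copy);
    }

    /** Returns the configured value, or the given default if the key is not set. */
    public String get(String key, String defaultValue) {
        return values.getProperty(key, defaultValue);
    }
}
```

The point of the sketch is that the supported surface becomes a short, explicit list instead of every non-private method reachable from Spring XML, so internal classes can change without breaking user configuration.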
Best regards,
John Erik