This post is meant to get the discussion started about designing the
new API. As a side note, I'm cross-posting this to the mailing list.
What I'd love to see are some "top down" pseudo-code snippets
that show how you envision a new API interacting with your code.
Some ideas/questions:
- Maybe for querying we should implement IQueryable<T>? How would that look?
- Maybe for indexing we should implement IObservable<T>/IObserver<T>?
How would that look?
- How can we facilitate parallelization? What kinds of domain entities
should be serializable so that you can send them across a wire as part
of a distribution model?
- How should transactions and locking work?
- What kind of architectural patterns make sense for this problem domain?
- We should totally implement IDisposable!.. or should we? Maybe not
everything needs to be disposable or should be. What do you think?
- Generic collections and IEnumerable<T> interfaces... Great... but
where exactly? What about collections that don't have a .NET BCL
implementation already? Existing libraries for that? or roll our own?
- Injectable behaviours using delegates like Action<T> or Func<T>...
for filtering, scoring, sorting?
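To make the last bullet a bit more concrete, here is a rough sketch of
what delegate-injected filtering, scoring and sorting might look like.
Everything in it (Hit, HitPipeline, Apply) is a made-up name for
illustration only, not an existing Lucene or Lucene.Net type, and it is
just one possible shape for the idea.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical search hit; not an existing Lucene.Net type.
public sealed class Hit
{
    public Hit(int docId, float score) { DocId = docId; Score = score; }
    public int DocId { get; private set; }
    public float Score { get; private set; }
}

// Hypothetical pipeline: the caller injects behaviour as delegates
// instead of subclassing filter/scorer/sorter base classes.
public static class HitPipeline
{
    public static List<Hit> Apply(
        IEnumerable<Hit> hits,
        Func<Hit, bool> filter,     // keep a hit when this returns true
        Func<Hit, float> rescore,   // compute the new score for a kept hit
        Comparison<Hit> order)      // defines the final ordering
    {
        if (hits == null) throw new ArgumentNullException("hits");
        if (filter == null) throw new ArgumentNullException("filter");
        if (rescore == null) throw new ArgumentNullException("rescore");
        if (order == null) throw new ArgumentNullException("order");

        var result = new List<Hit>();
        foreach (var hit in hits)
        {
            if (!filter(hit)) continue;
            result.Add(new Hit(hit.DocId, rescore(hit)));
        }
        result.Sort(order);
        return result;
    }
}
```

A caller would then pass lambdas, for example
HitPipeline.Apply(hits, h => h.Score > 0.2f, h => h.Score * 2f,
(a, b) => b.Score.CompareTo(a.Score)).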
That's just a start of some of the things floating around in my head
at the moment. I want to know what you think and I *really* want to
see some pseudo-code examples of how you think the API should work.
Thanks,
Troy
--
You received this message because you are subscribed to the Google Groups "Lucere" group.
To post to this group, send email to luc...@googlegroups.com.
To unsubscribe from this group, send email to lucere+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/lucere?hl=en.
My suggestion is to take the current Lucene.NET drop and start .Nettifying it.
I think people underestimate the cost of porting new Lucene Java releases.
Before starting to make a "real .NET library", I would suggest taking the
initial port of Lucene 3.0.2 (http://hg.slace.biz/lucene-porting/downloads),
which was created by an automated tool, and trying to get it compiled. (I'm not
even saying it should work, just get it compiled!)
DIGY
As a stop gap, we've been using a simple wrapper to add IDisposable
support. In a nutshell, the wrapper object implements IDisposable,
exposes the wrapped object via a property, and takes an Action<T> in
the constructor to define a dispose action. When Dispose is called on
the wrapper it invokes the Action<T>. An extension method is included
which makes using it a bit more light-weight.
It's worth noting that this is a generally useful class for
implementing IDisposable integration on any class that doesn't
implement it.
Here are the classes for that, with an example of how to use them:
https://gist.github.com/673545
There's room for improvement on that concept but it basically works.
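For anyone who doesn't want to click through, the wrapper described
above can be sketched roughly as follows. This is a minimal
illustration, not the gist's actual code: the names
DisposableWrapper<T>, Wrapped and AsDisposable are assumptions made up
for this sketch.

```csharp
using System;

// Hypothetical names; the classes in the gist may differ.
public sealed class DisposableWrapper<T> : IDisposable
{
    private readonly Action<T> _disposeAction;
    private bool _disposed;

    public DisposableWrapper(T wrapped, Action<T> disposeAction)
    {
        if (disposeAction == null) throw new ArgumentNullException("disposeAction");
        Wrapped = wrapped;
        _disposeAction = disposeAction;
    }

    // The wrapped object, exposed so callers can keep using it directly.
    public T Wrapped { get; private set; }

    public void Dispose()
    {
        if (_disposed) return;   // run the dispose action at most once
        _disposed = true;
        _disposeAction(Wrapped);
    }
}

public static class DisposableExtensions
{
    // Extension method so any object can be wrapped inline in a using statement.
    public static DisposableWrapper<T> AsDisposable<T>(this T wrapped, Action<T> disposeAction)
    {
        return new DisposableWrapper<T>(wrapped, disposeAction);
    }
}
```

Usage would look something like
using (var w = searcher.AsDisposable(s => s.Close())) { ... w.Wrapped ... }
where searcher stands for any object that exposes a Close method.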
The larger issue that we find is that calling Close on Lucene objects
doesn't always free resources... Which is one of the reasons it needs
some serious re-writing (not just refactoring) to work correctly on
.NET.
Thanks,
Troy
This is a very pragmatic response. I partially agree and disagree...
and as such, our plan of action includes both concepts, rather than
just one or the other.
Probably a good way to start this discussion is to explain what my
intended plan of action is.
The first step will be extracting the interfaces from the 3.0.2 port
and placing them in their own project, as well as converting the
static enums to real enum types in the same library.
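As a rough illustration of the enum conversion, here is the general
shape of a Java-style "static enum" as a mechanical port tends to emit
it, next to a real .NET enum. The names LegacyStore, Store and
StoreDemo are hypothetical and are not copied from the 3.0.2 port.

```csharp
using System;

// Before: a Java-style "type-safe enum" class with static instances.
// (Illustrative only; names are hypothetical, not taken from the port.)
public sealed class LegacyStore
{
    public static readonly LegacyStore Yes = new LegacyStore("YES");
    public static readonly LegacyStore No = new LegacyStore("NO");

    private readonly string _name;
    private LegacyStore(string name) { _name = name; }
    public override string ToString() { return _name; }
}

// After: a real enum, usable in switch statements and with Enum.Parse.
public enum Store
{
    No = 0,
    Yes = 1
}

public static class StoreDemo
{
    public static string Describe(Store store)
    {
        switch (store)
        {
            case Store.Yes: return "field value is stored in the index";
            case Store.No: return "field value is not stored";
            default: throw new ArgumentOutOfRangeException("store");
        }
    }
}
```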
The next step will be re-writing the unit tests to work against those
interfaces. The unit tests will all fail because there will be no
implementations backing them. This is good.
While performing those coding tasks, we will be engaging in community
discussion to derive an ideal interaction contract for the API. Once
we've come up with a basic plan for how the API should work, we will
refactor the interfaces (note: still no implementation!!) until they
look and feel how we want and support whatever we think they should
support. Then the unit tests and example apps will need to be
refactored to match.
After that we will implement mocks of all the types which contain
static data, hopefully allowing the unit tests to pass based on the
mock behaviour and data. Unit tests will probably pass but
integration tests will probably not pass.
Then we will start implementing those interfaces. At this stage, we may
find that we can refactor and improve the Lucene.Net automated 3.0.2
port to comply with the new interface... or maybe not.. or maybe only
partially. Where it doesn't fit we can either change it or write new
code.
Once the library is implemented and passes all the tests, we'll start
doing the same kind of port of the contrib libraries. After that we
will enter into maintenance mode where we attempt to integrate changes
from the Java Lucene project into our library. As Ayende said, keeping
up probably won't be that difficult assuming the Java developers
maintain their current awesome habits of explaining everything in good
detail and answering questions we may have.
Since we won't be part of the ASF, there will be no rigid expectations
set for our release schedule. This means that we can take as long as
necessary to do it right. I'm not afraid of being slow to release if
necessary, as long as what we release is quality code.
One other thing that should be stated: We will retain file-level
compatibility with all other Lucene implementations.
As far as aimee.net goes.. I like the idea of performing the
refactoring and making that a series of codeproject articles. I don't
want to interfere with his plans for that because I think it will be a
valuable exercise because the scope is small, so it can be done
quickly, and the articles will be valuable to people who are learning
about refactoring.
In fact, the aimee.net project relieves pressure from Lucere to do the
same thing and allows us to extend the initial release cycle longer.
We may find that just as we are entering into the implementation
stage, aimee.net is making its first release of a refactored
codebase... Which means we could just use the aimee.net code instead
of the Lucene.Net code to implement the contract we've designed.
So because of our intention to use design-by-contract and test driven
development, we do not want to just jump in and start adding
IDisposable and changing getters and setters to properties and what
not... Specifically, we don't want to be focused on implementation
details in the initial stage, we want to be focused on creating a
great user experience for consumers of the library by designing a
great interface first.
This may seem like an unpragmatic approach, but honestly, the pragmatic
approach is what Lucene.Net is already doing. This project is meant to
diverge from that and lean a little more in the idealistic direction.
It may be important for our community to understand that we are not
attempting to *port* the library, we are attempting to *reimplement*
it. This is very different and we are prepared for the challenges that
this approach will bring and positive about our ability to see it
through.
Thanks,
Troy
Lucene was originally written in 2000 and was probably written for
J2SE 1.2 / 1.3 ... At that time Java did not support annotations,
enums or generics and it still doesn't support lambdas.
It makes sense why the Lucene library seems so primitive compared to
the much more advanced modern-day .NET features. The Lucene project
itself is working on fixing this and re-defining their API to use
modern Java 5/6 features. It would be silly of us to work with the old
API, as Lucene won't even be using that moving forward. Instead, we
should be looking at the most recent release and the current
development branch to see where it's headed and design our API to be
able to support the API changes they will be making in the future.
The whole goal is to get un-stuck from the past.
Thanks,
Troy
I think this is a good set of processes to follow. However, it feels a
bit all or nothing which is where my thoughts - as Ayende initially
suggested - of taking 2.9.2 as a base came from.
I am happy to start with 3.0.2 ported and compiling instead of 2.9.2
but I think a pragmatic approach subsequent to that is to take X
amount of use cases and see if the approach works: index a Document,
retrieve a document using a Query.... For example. This would allow us
to prove the approach without going too deep too quickly.
I fully support the dev. of tests and the introduction of Mockable
Objects but I'd like to make sure that the file format stays the same
so I'd like a concrete-ish implementation of those core use cases
pretty early.
As for contrib projects, I'd far rather define the core initially and
get that working. Document, Field, IndexWriter etc .... There aren't
that many public objects required to support some very valuable use
case proofs.
I think the performance gains will be considerable and we should try
to benchmark from the start to prove this.
I note that the bare skeleton projects in the solution target .NET 4.
I am happy with this, is everybody else? Perhaps .NET 3.5 might be an
easier sell? I would not want to go any lower: 3.5 is 2.0 under a
slightly different guise after all.... The only real excuse for
someone not using it in production being the inclusion of .NET 2.0
SP2.
Finally, thanks for setting this project up; I really despaired of the
line by line approach going forward though I thank the Lucene.Net team
for their great work.
Ciaran
One way we could do that would be to work in layers. Though it is not
explicitly stated anywhere Lucene does seem to be nicely organized
into layers:
- Disk Access (Directory, File, Lock, etc..)
- Streamable Read and Write ( Various readers/writers )
- Persistable Domain Objects (Document, Field, Term, etc)
- Logical Domain Objects (Querys, Scorers, Searchers, Analyzers, etc)
A staged approach that works from the bottom up to implement the
interfaces and pass the unit tests, starting with the disk layer might
be a good approach.
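To sketch what starting at the disk layer could mean in practice: an
extracted interface plus an in-memory stand-in that lets the tests for
that layer run before any real file-system code exists. IDirectory and
InMemoryDirectory are hypothetical names and the member list is only a
guess at what the contract might contain, not the extracted Lucene
interface.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical contract for the lowest (disk access) layer.
public interface IDirectory
{
    IEnumerable<string> ListFiles();
    bool FileExists(string name);
    Stream OpenInput(string name);
    Stream CreateOutput(string name);
    void DeleteFile(string name);
}

// In-memory stand-in so tests written against IDirectory can run early.
public sealed class InMemoryDirectory : IDirectory
{
    private readonly Dictionary<string, byte[]> _files = new Dictionary<string, byte[]>();

    public IEnumerable<string> ListFiles() { return new List<string>(_files.Keys); }
    public bool FileExists(string name) { return _files.ContainsKey(name); }
    public void DeleteFile(string name) { _files.Remove(name); }

    public Stream OpenInput(string name)
    {
        byte[] data;
        if (!_files.TryGetValue(name, out data))
            throw new FileNotFoundException("No such file in directory", name);
        return new MemoryStream(data, false); // read-only view of the stored bytes
    }

    public Stream CreateOutput(string name)
    {
        // The written bytes become visible when the stream is disposed.
        return new CommitOnDisposeStream(bytes => _files[name] = bytes);
    }

    private sealed class CommitOnDisposeStream : MemoryStream
    {
        private readonly Action<byte[]> _commit;
        private bool _committed;

        public CommitOnDisposeStream(Action<byte[]> commit) { _commit = commit; }

        protected override void Dispose(bool disposing)
        {
            if (disposing && !_committed)
            {
                _committed = true;
                _commit(ToArray());
            }
            base.Dispose(disposing);
        }
    }
}
```

The layers above (readers/writers, then domain objects) would depend
only on the interface, so each one can be implemented and tested once
the layer beneath it passes.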
I like the idea of benchmarking as this goes on. This will provide an
opportunity to show how each of the decisions we make impacts
performance, and also set some goals for us if we find that our
implementation is slower for some reason.
Regarding using 2.9.2 or 3.0.2 as the base... I'd like to start with
3.0.2 because of the API changes that are already present in that
build, and because, in all likelihood, this project will not progress
as quickly as the Java Lucene project. That means we should start at
their current release so that the gap which is inevitably created by
their forward progress will be as small as possible.
Unfortunately, we don't have a functioning port of 3.0.2 available
from Lucene.Net, however I fully expect to see one by the end of the
year. In the bitbucket repo that Aaron Powell set up for testing
porting mechanisms, there's a fairly complete port in the
JavaToVbCSharpConverter subdirectory. It doesn't compile on my
machine, but I think I can make that work.
Here's that link:
http://hg.slace.biz/lucene-porting
Assuming that interfaces extracted from this build won't be terribly
different than the final product, we could start by taking just the
disk classes, extracting their interfaces, porting the unit tests that
apply to that layer (not included in the above package unfortunately),
and then filling in the implementation with the Lucene.Net automated
port code.
Establish a set of benchmarks based on those unit tests, and then
refactor that layer as needed for perf/desired API, verifying through
each refactoring that a) it continues to work as expected and b)
performance doesn't slip.
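A very small harness for the "performance doesn't slip" check could
look something like the sketch below. The Benchmark class, its method
names and the tolerance parameter are all assumptions for illustration;
a real setup would want more careful measurement (multiple runs, GC
control, and so on).

```csharp
using System;
using System.Diagnostics;

public static class Benchmark
{
    // Runs the action once as a warm-up, then 'iterations' times under a
    // Stopwatch, and returns the mean milliseconds per iteration.
    public static double MeasureMilliseconds(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException("action");
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");

        action(); // warm-up so JIT compilation is not counted

        var watch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++) action();
        watch.Stop();

        return watch.Elapsed.TotalMilliseconds / iterations;
    }

    // True when the current timing is no more than 'tolerance'
    // (for example 0.10 for 10%) slower than the recorded baseline.
    public static bool IsWithinBaseline(double baselineMs, double currentMs, double tolerance)
    {
        return currentMs <= baselineMs * (1.0 + tolerance);
    }
}
```

The idea would be to record a baseline per layer before refactoring and
to fail the build (or at least raise a flag) when IsWithinBaseline
returns false afterwards.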
This could be a nice rinse and repeat process to work our way up to
the top API layer.
Thanks,
Troy
A fully functioning library that is already field tested is going to
make a much more solid basis of comparison than whatever we take out
of 3.0.2... We're going to be mangling the API anyway so I guess API
level is less relevant. Also, I don't think 3.0.2 changed much in the
API vs 2.9.2, other than the obvious -- omitting deprecated/obsolete
elements.
So maybe our standard is to start from 2.9.2, but only incorporating
the non-deprecated/non-obsolete elements. If that sounds good to
everyone, along with the layered approach I described earlier, I'll
start this ball rolling.
That said, the questions I initially posed are still relevant, and
haven't yet been discussed:
- What should the new API look like, and what should it support?
The earlier we start implementation, the earlier we need to work out
those design requirements. We need to make sure that whatever
implementation we end up with is able to support our new API's needs
without too much hackage. I don't want to end up refactor-thrashing
(kind of like thread thrashing but for coding. ;))... Top level API
needs can easily cascade refactorings deep into the core layers. That
will be an arduous process if we have to iterate too many times.
Thanks,
Troy
Anyhow, I couldn't go back to 2.0 and I don't think that doing so
would be in anyone's best interest. Also, (and this is becoming a
convenient excuse) Lucene.Net already covers legacy framework
compatibility.
Regarding 2.9.2 vs 3.0.2.... If we can get a working build of 3.0.2
going, well.. step one is to make sure Lucene.Net gets that build into
their hands. If we can share a common implementation of 3.0.2 to work
from then I think that would be a better place to start. If we're
looking at many months before that's ready for use... It's a toss up.
I do like the idea of having something solid and well field-tested to
base testing and comparison on (ie 2.9.2)... so I'm leaning in that
direction at this point, because I think it will be a while before
Lucene.Net 3.0.2 is really fully fleshed out. My assumption is that a
release made under deadline pressure is going to be a weak release and
probably ship with bugs or at least reduced performance.
That said, I'm still not totally opposed to simply taking the latest
build of Java Lucene (3.x or 4.0) and extracting/porting the
interfaces from that and just manually implementing everything to
fulfill those contracts. It would take a long time, but really, I
think it might end up being a more healthy process than attempting to
improve and refactor the Lucene.Net code.
Prescott -- Did you say you were able to compile Aaron's port? I get
massive issues with the code that need manual porting with all
versions. Mostly related to uses of generics. Maybe I'm missing
something?
Thanks,
Troy
Interesting...
I made a local clone from the HG repo at
(http://hg.slace.biz/lucene-porting) and using a plain-jane install of
Visual Studio 2010 Express, opened the solution file, hit build and
got the errors listed below (21 errors, 4 warnings)... After
digging into the errors a bit, fixing them as I went, I noticed they
chained into other errors, and in total, there were quite a few errors
to address littered around the code with some very obvious syntax
issues, incorrect use of generics, missing classes, etc...
It's ~4:20am here in Portland, Oregon, so I'm not about to try to fix
all of them before going to bed.. ;)
Are we looking at the same code base? Is there another revision of it
available that I'm unaware of?
Thanks,
Troy
All paths below are relative to
C:\dev\lucene\lucene-porting\JavaToVbCSharpConverter\Lucene.Net 3.0.2\Lucene.Net 3.0.2 - AfterSomeDumpPostProcess\
and every entry is in the Lucene.Net project. Locations are file(line,column).

Error 1    Analysis\StopAnalyzer.cs(68,27): Invalid token '(' in class, struct, or interface member declaration
Error 2    Analysis\StopAnalyzer.cs(68,56): ; expected
Error 3    Analysis\StopAnalyzer.cs(68,95): ; expected
Error 4    Analysis\StopAnalyzer.cs(70,18): Invalid token '=' in class, struct, or interface member declaration
Error 5    Analysis\StopAnalyzer.cs(70,29): Invalid token ';' in class, struct, or interface member declaration
Error 6    Analysis\StopAnalyzer.cs(71,28): Invalid token '=' in class, struct, or interface member declaration
Error 7    Analysis\StopAnalyzer.cs(71,82): Invalid token '(' in class, struct, or interface member declaration
Error 8    Analysis\StopAnalyzer.cs(71,95): Invalid token ')' in class, struct, or interface member declaration
Error 9    Analysis\StopAnalyzer.cs(78,11): Expected class, delegate, enum, interface, or struct
Error 10   Analysis\StopAnalyzer.cs(88,11): Expected class, delegate, enum, interface, or struct
Error 11   Analysis\StopAnalyzer.cs(95,20): Expected class, delegate, enum, interface, or struct
Error 12   Analysis\StopAnalyzer.cs(97,14): Expected class, delegate, enum, interface, or struct
Error 13   Analysis\StopAnalyzer.cs(97,55): Expected class, delegate, enum, interface, or struct
Error 14   Analysis\StopAnalyzer.cs(106,20): Expected class, delegate, enum, interface, or struct
Error 15   Analysis\StopAnalyzer.cs(111,19): Expected class, delegate, enum, interface, or struct
Error 16   Analysis\StopAnalyzer.cs(112,26): Expected class, delegate, enum, interface, or struct
Error 17   Analysis\StopAnalyzer.cs(113,26): Expected class, delegate, enum, interface, or struct
Error 18   Analysis\StopAnalyzer.cs(115,3): Type or namespace definition, or end-of-file expected
Warning 19 Util\Cache\Cache.cs(32,37): Type parameter 'K' has the same name as the type parameter from outer type 'Lucene.Net.Util.Cache.Cache<K,V>'
Warning 20 Util\Cache\Cache.cs(32,40): Type parameter 'V' has the same name as the type parameter from outer type 'Lucene.Net.Util.Cache.Cache<K,V>'
Warning 21 Util\Cache\SimpleMapCache.cs(75,45): Type parameter 'K' has the same name as the type parameter from outer type 'Lucene.Net.Util.Cache.SimpleMapCache<K,V>'
Warning 22 Util\Cache\SimpleMapCache.cs(75,48): Type parameter 'V' has the same name as the type parameter from outer type 'Lucene.Net.Util.Cache.SimpleMapCache<K,V>'
Error 23   Analysis\StopAnalyzer.cs(101,18): The namespace '<global namespace>' already contains a definition for 'SavedStreams'
Error 24   Analysis\Standard\StandardAnalyzer.cs(109,25): Elements defined in a namespace cannot be explicitly declared as private, protected, or protected internal
Error 25   Analysis\Standard\StandardAnalyzer.cs(109,25): Elements defined in a namespace cannot be explicitly declared as private, protected, or protected internal
I was under the impression you were successfully building a 3.0.2
port. If you were, well, I think we would all be very happy to have
that codebase... ;)
Thanks,
Troy