German is totally OK, but I'll still stick with English to post this one
to the mailing list.
Faster retrieval is possible and integrated, but not fully tested.
You can find a test file here:
http://caliph-emir.svn.sourceforge.net/viewvc/caliph-emir/lire/src/test/java/net/semanticmetadata/lire/indexing/MetricSpacesTest.java?revision=48&view=markup
The implementation is here:
http://caliph-emir.svn.sourceforge.net/viewvc/caliph-emir/lire/src/main/java/net/semanticmetadata/lire/indexing/MetricSpacesInvertedListIndexing.java?revision=48&view=markup
The approach is based on the actual metric and the work of G. Amato
("Approximate Similarity Search in Metric Spaces using Inverted Files")
and performs quite well. However, the indexing process takes an extra
step with this approach.
One has to create a set of reference points and then re-index the
whole data set based on these reference points. It works fine from a
precision perspective and quite OK from a runtime perspective
(constant in terms of the number of images, linear in the number
of reference elements). However, I haven't tested the actual
performance yet (e.g. ms per search for 100,000 images).
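
To illustrate the idea of reference points, here is a minimal sketch. It is
not the code of MetricSpacesInvertedListIndexing: the class name, the method
names, the L1 distance and the parameter "used" are all assumptions made for
the example. Each object is described by the ranked ids of its nearest
reference points, and two objects are compared by how much these rankings
differ (Spearman footrule). Computing one ranking costs one distance
computation per reference point.

```java
import java.util.Arrays;

// Illustrative sketch only; this is not the code of MetricSpacesInvertedListIndexing.
class ReferencePointSketch {

    // L1 distance between two feature vectors (stand-in for the real feature metric).
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Ids of the `used` reference points closest to `feature`, nearest first.
    // Cost: one distance computation per reference point.
    static int[] nearestReferences(double[] feature, double[][] references, int used) {
        Integer[] ids = new Integer[references.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        Arrays.sort(ids, (Integer x, Integer y) -> Double.compare(
                distance(feature, references[x]), distance(feature, references[y])));
        int[] result = new int[Math.min(used, ids.length)];
        for (int i = 0; i < result.length; i++) {
            result[i] = ids[i];
        }
        return result;
    }

    // Position of `id` in `ranking`, or ranking.length if it is absent.
    static int positionOf(int id, int[] ranking) {
        for (int i = 0; i < ranking.length; i++) {
            if (ranking[i] == id) {
                return i;
            }
        }
        return ranking.length;
    }

    // Spearman footrule between two rankings; an id missing from one ranking
    // is treated as if it sat just past the end of that ranking.
    static int footrule(int[] a, int[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(i - positionOf(a[i], b));
        }
        for (int i = 0; i < b.length; i++) {
            if (positionOf(b[i], a) == a.length) {
                sum += Math.abs(i - a.length);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] references = {{0}, {10}, {20}, {30}};
        int[] a = nearestReferences(new double[]{1}, references, 2);
        int[] b = nearestReferences(new double[]{2}, references, 2);
        int[] c = nearestReferences(new double[]{29}, references, 2);
        System.out.println(Arrays.toString(a) + " vs " + Arrays.toString(b) + ": " + footrule(a, b));
        System.out.println(Arrays.toString(a) + " vs " + Arrays.toString(c) + ": " + footrule(a, c));
    }
}
```

Objects that are close in the original space tend to get similar rankings,
so a small footrule value serves as a cheap approximation of similarity.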
Perhaps you can find the time to try the test with your data set?
cheers,
Mathias
--
Dr. Mathias Lux
http://tinyurl.com/mlux-itec
Well, that is admittedly somewhat mysterious. You have to pass the
class of the feature used for analysis and the name of the field the
feature is stored in. You also have to make sure that the feature has
been extracted in the initial indexing process.
I'd recommend using CEDD like this:
new MetricSpacesInvertedListIndexing(CEDD.class,
DocumentBuilder.FIELD_NAME_CEDD)
after indexing with the DocumentBuilder obtained by:
DocumentBuilderFactory.getCEDDDocumentBuilder()
Another option would be to use a ChainedDocumentBuilder to extract a
whole range of features like this:
ChainedDocumentBuilder builder = new ChainedDocumentBuilder();
builder.addBuilder(DocumentBuilderFactory.getCEDDDocumentBuilder());
builder.addBuilder(DocumentBuilderFactory.getFCTHDocumentBuilder());
builder.addBuilder(new SimpleDocumentBuilder(false, false, true));
builder.addBuilder(DocumentBuilderFactory.getColorHistogramDocumentBuilder());
builder.addBuilder(DocumentBuilderFactory.getDefaultAutoColorCorrelationDocumentBuilder());
to make sure you have the choice.
All available field names are given in
net.semanticmetadata.lire.DocumentBuilder; the corresponding classes are
in net.semanticmetadata.lire.imageanalysis.* and extend LireFeature
(like CEDD.class, FCTH.class, ColorLayout.class,
SimpleColorHistogram.class, AutoColorCorrelogram.class, etc.).
cheers,
Mathias
On Wed, Feb 24, 2010 at 6:57 PM, seba...@kielmann.biz
<seba...@kielmann.biz> wrote:
> Hi,
>
> small question:
>
> What are the possible values for featureClass and featureFieldName in
> MetricSpacesInvertedListIndexing?
>
> While running the indexer, a lot of messages about no feature being stored
> in the document appear.
>
> Any help would be appreciated.
>
> bastian
Hello,
OK, I tested it.
The indexer is created with
ChainedDocumentBuilder builder = new ChainedDocumentBuilder();
builder.addBuilder(DocumentBuilderFactory.getCEDDDocumentBuilder());
builder.addBuilder(DocumentBuilderFactory.getFCTHDocumentBuilder());
builder.addBuilder(new SimpleDocumentBuilder(false, false, true));
builder.addBuilder(DocumentBuilderFactory.getColorHistogramDocumentBuilder());
builder.addBuilder(DocumentBuilderFactory.getDefaultAutoColorCorrelationDocumentBuilder());
and indexes 100,000 images. This takes about 3 hours, but I only have an AMD 64 with 1 GB of RAM.
Now the search:
I create the document for the query image with
ChainedDocumentBuilder builder = new ChainedDocumentBuilder();
builder.addBuilder(DocumentBuilderFactory.getCEDDDocumentBuilder());
Document doc = builder.createDocument(imageStream, null);
With
MetricSpacesInvertedListIndexing.numReferenceObjectsUsed = 100;
MetricSpacesInvertedListIndexing.numReferenceObjects = 500;
the search takes about 1923 ms.
This part is the one that takes the most time:
TopDocs docs = ms.search(doc, indexPath);
which takes about 1500 ms.
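
For reference, timings like these can be taken with a small helper along the
following lines. This is only a sketch: SearchTimer and averageMillis are
hypothetical names, and the loop in main is a stand-in workload that would be
replaced by the real search call.

```java
// Illustrative timing helper; the names are hypothetical and not part of LIRE.
class SearchTimer {

    // Runs `task` `runs` times and returns the average wall-clock time per run in milliseconds.
    static double averageMillis(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be > 0");
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        // Stand-in workload; replace the body with the search call to be measured.
        double ms = averageMillis(() -> {
            double x = 0;
            for (int i = 0; i < 1_000_000; i++) {
                x += Math.sqrt(i);
            }
            if (x < 0) {
                System.out.println("unreachable"); // keeps the loop from being optimized away
            }
        }, 5);
        System.out.println("average ms per run: " + ms);
    }
}
```

Averaging over several runs smooths out JVM warm-up and disk caching, which
can dominate a single measurement.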
With
MetricSpacesInvertedListIndexing.numReferenceObjectsUsed = 1000;
MetricSpacesInvertedListIndexing.numReferenceObjects = 1000;
the search takes 9 seconds.
Since LIRE doesn't have a paging function, I would need to store a result set of 1000 hits to implement paging.
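
A minimal sketch of such paging over a stored result list, assuming the hits
are kept in a java.util.List after the search. ResultPager and page are
hypothetical names, not LIRE classes; the integers in main stand in for hits.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; ResultPager is a hypothetical helper, not part of LIRE.
class ResultPager {

    // Returns page number `page` (0-based) with up to `pageSize` entries
    // from a cached result list; pages past the end are empty.
    static <T> List<T> page(List<T> results, int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        // long arithmetic avoids int overflow for large page numbers
        int from = (int) Math.min((long) page * pageSize, results.size());
        int to = (int) Math.min((long) from + pageSize, results.size());
        return new ArrayList<>(results.subList(from, to));
    }

    public static void main(String[] args) {
        List<Integer> cached = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            cached.add(i); // stand-in for the stored search hits
        }
        System.out.println(page(cached, 0, 10));
        System.out.println(page(cached, 2, 10));
    }
}
```

With this approach the expensive search runs once per query, and every
further page is served from the stored list.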
What exactly do
MetricSpacesInvertedListIndexing.numReferenceObjectsUsed = 1000;
MetricSpacesInvertedListIndexing.numReferenceObjects = 1000;
do?
Any ideas on how to improve the search? I aim to get results in under 1 second. Is that possible?
Cheers,
bastian
Mathias Lux <mathi...@gmail.com> wrote on 24 February 2010 at 20:50:
I just checked it in, so if you want to try it, check out revision 57 from
the SVN. The new and fast DocumentBuilder and ImageSearcher are
* CEDDImageSearcher
* CEDDDocumentBuilder
They both now use a byte[] array for storing their histograms in the
Lucene fields, resulting in a roughly threefold performance gain on
129,000 photos (589 ms per search with the new one in my tests versus
1900 ms per search with the old one).
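
A minimal sketch of what packing a histogram into a byte[] can look like. It
assumes every bin value fits into 0..255; HistogramBytes, toBytes and
fromBytes are hypothetical names, and the actual encoding used by
CEDDDocumentBuilder may differ.

```java
import java.util.Arrays;

// Illustrative sketch; not the encoding used by CEDDDocumentBuilder.
class HistogramBytes {

    // Packs one histogram bin per byte; assumes each value is in 0..255.
    static byte[] toBytes(int[] histogram) {
        byte[] out = new byte[histogram.length];
        for (int i = 0; i < histogram.length; i++) {
            if (histogram[i] < 0 || histogram[i] > 255) {
                throw new IllegalArgumentException("bin " + i + " does not fit into one byte");
            }
            out[i] = (byte) histogram[i];
        }
        return out;
    }

    // Restores the histogram; the mask undoes Java's signed byte interpretation.
    static int[] fromBytes(byte[] data) {
        int[] histogram = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            histogram[i] = data[i] & 0xFF;
        }
        return histogram;
    }

    public static void main(String[] args) {
        int[] histogram = {0, 7, 128, 255};
        byte[] stored = toBytes(histogram);
        System.out.println(Arrays.toString(fromBytes(stored)));
    }
}
```

A fixed-width binary layout like this can be read back with a single pass
over the array and without any text parsing at search time.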
hope that helps,
Mathias
On Fri, Mar 12, 2010 at 3:51 PM, seba...@kielmann.biz
<seba...@kielmann.biz> wrote:
> Hey, nice.
>
> I made some changes to my implementation and now have better performance, but I will check whether your solution is even better.
>
> Will get back on this soon.
>
> cheers,
> sebastian
>
> p.s. have a nice weekend
>
> Mathias Lux <mathi...@gmail.com> wrote on 12 March 2010 at 14:27: