Re: lire

Mathias Lux

Feb 24, 2010, 11:10:58 AM

to seba...@kielmann.biz, lire...@googlegroups.com

Hi!

German is totally OK, but I'll stick with English so I can post this one
to the mailing list.

Faster retrieval is possible and already integrated, but not fully tested.

You can find a test file here:
http://caliph-emir.svn.sourceforge.net/viewvc/caliph-emir/lire/src/test/java/net/semanticmetadata/lire/indexing/MetricSpacesTest.java?revision=48&view=markup

The implementation is here:
http://caliph-emir.svn.sourceforge.net/viewvc/caliph-emir/lire/src/main/java/net/semanticmetadata/lire/indexing/MetricSpacesInvertedListIndexing.java?revision=48&view=markup

The approach is based on the actual metric and the work of G. Amato
("Approximate Similarity Search in Metric Spaces using Inverted Files")
and performs quite well. However, indexing takes an extra step with
this approach.

One has to create a set of reference points and then re-index the
whole data set based on those reference points. It works fine in terms
of precision and quite OK in terms of runtime (constant in the number
of images, linear in the number of reference elements). However, I
haven't measured the actual performance yet (e.g. ms per search for
100,000 images).
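
To illustrate the reference-point idea, here is a minimal sketch. It is not the code from MetricSpacesInvertedListIndexing; the class name, the L1 distance and the "R<id>" term format are assumptions made up for this example. Each image is described by the ids of its k nearest reference objects, and those ids can then be indexed like words in a text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only; hypothetical names, not LIRE's implementation. */
class ReferenceObjectSketch {

    /** L1 distance between two feature vectors; stands in for the descriptor's own metric. */
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /** Returns the indices of the k reference objects closest to the feature, nearest first. */
    static List<Integer> nearestReferences(double[] feature, double[][] references, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < references.length; i++) {
            ids.add(i);
        }
        // Sort reference ids by their distance to the feature (stable, so ties keep id order).
        ids.sort(Comparator.comparingDouble(i -> distance(feature, references[i])));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    /** Encodes the nearest reference objects as space-separated terms for an inverted index. */
    static String toTerms(List<Integer> nearest) {
        StringBuilder sb = new StringBuilder();
        for (int id : nearest) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('R').append(id);
        }
        return sb.toString();
    }
}
```

The distance computation per image depends only on the number of reference objects, which is why the cost is linear in that number.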

Perhaps you can find the time to try the test with your data set?

cheers,
Mathias

--
Dr. Mathias Lux
http://tinyurl.com/mlux-itec

Mathias Lux

Feb 24, 2010, 2:50:37 PM

to lire...@googlegroups.com

Hi!

Well, that is admittedly somewhat mysterious. You have to pass the
feature class used for the analysis and the name of the field the
feature is stored in. You also have to make sure that the feature was
extracted in the initial indexing process.

I'd recommend using CEDD like this:

new MetricSpacesInvertedListIndexing(CEDD.class,
DocumentBuilder.FIELD_NAME_CEDD)

After indexing with the DocumentBuilder obtained from:
DocumentBuilderFactory.getCEDDDocumentBuilder()

Another option would be to use a ChainedDocumentBuilder to extract a
whole set of features, like this:

ChainedDocumentBuilder builder = new ChainedDocumentBuilder();
builder.addBuilder(DocumentBuilderFactory.getCEDDDocumentBuilder());
builder.addBuilder(DocumentBuilderFactory.getFCTHDocumentBuilder());
builder.addBuilder(new SimpleDocumentBuilder(false, false, true));
builder.addBuilder(DocumentBuilderFactory.getColorHistogramDocumentBuilder());
builder.addBuilder(DocumentBuilderFactory.getDefaultAutoColorCorrelationDocumentBuilder());

That way you can choose which feature to use later.

All available field names are defined in
net.semanticmetadata.lire.DocumentBuilder; the corresponding classes are
in net.semanticmetadata.lire.imageanalysis.*, extending LireFeature
(like CEDD.class, FCTH.class, ColorLayout.class,
SimpleColorHistogram.class, AutoColorCorrelogram.class, etc.)

cheers,
Mathias

On Wed, Feb 24, 2010 at 6:57 PM, seba...@kielmann.biz
<seba...@kielmann.biz> wrote:
> hi,
>
> small question:
>
> what are the possible values for featureClass and featureFieldName in
> MetricSpacesInvertedListIndexing?
>
> while running the indexer a lot of messages about no feature stored in this
> document are appearing.
>
> any help would be appreciated.
>
> bastian

Mathias Lux

Mar 12, 2010, 8:27:46 AM

to lire...@googlegroups.com

Hi!

Yesterday I released a new version, which has an updated implementation of the metric spaces approach. I recommend using values like these:

MetricSpacesInvertedListIndexing.numReferenceObjectsUsed = 50;

MetricSpacesInvertedListIndexing.numReferenceObjects = 1000;

for a million images.

The latter one controls how many reference objects exist in total: so to speak, the "vocabulary" you use to describe your data set. The more you have here, the more accurate your approximation will be. Typically 1000 - 5000 work really well for millions of images (according to my chat with Giuseppe :)

The first one is more critical. It defines how many of the "words" of your "vocabulary" are used to describe a single image. The choice of this parameter heavily influences the number of hits and therefore the performance. The number has to be significantly lower than the other one, otherwise every single image will be considered a result and we're back at linear search. However, numbers that are too low lead to "very approximate" results.

The best approach is to try some parameter values and compare the results to the linear search. Try to get the first one as low as possible while precision is still high enough, and you're there.
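
One way to do that comparison is to measure how many of the top-k hits of the linear search also show up in the top-k hits of the approximate search. The sketch below is only an illustration: the class and method names are made up, and it assumes each result list is a ranked list of image identifiers.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only; hypothetical helper, not part of LIRE. */
class OverlapSketch {

    /**
     * Fraction of the top-k linear-search results that also appear
     * in the top-k approximate results (1.0 means identical sets).
     */
    static double overlapAtK(List<String> linear, List<String> approximate, int k) {
        int n = Math.min(k, linear.size());
        if (n <= 0) {
            return 0.0;
        }
        Set<String> approx = new HashSet<>(approximate.subList(0, Math.min(k, approximate.size())));
        int shared = 0;
        for (String id : linear.subList(0, n)) {
            if (approx.contains(id)) {
                shared++;
            }
        }
        return (double) shared / n;
    }
}
```

Lowering numReferenceObjectsUsed step by step and watching this value over a handful of query images gives a rough feeling for where precision starts to drop.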

I'm currently working on another option for the serialization of the descriptors and have already tried it with CEDD, with very promising results. Runtime tests indicate that the linear search is at least twice as fast as the original one. I'll check it in later today.

cheers,

Mathias

On Thu, Feb 25, 2010 at 6:01 PM, seba...@kielmann.biz <seba...@kielmann.biz> wrote:

Hello,

ok. tested it.

the indexer is created with

ChainedDocumentBuilder builder = new ChainedDocumentBuilder();

builder.addBuilder(DocumentBuilderFactory.getCEDDDocumentBuilder());
builder.addBuilder(DocumentBuilderFactory.getFCTHDocumentBuilder());
builder.addBuilder(new SimpleDocumentBuilder(false, false, true));
builder.addBuilder(DocumentBuilderFactory.getColorHistogramDocumentBuilder());
builder.addBuilder(DocumentBuilderFactory.getDefaultAutoColorCorrelationDocumentBuilder());

and indexes 100,000 images. This takes about 3 hours, but I only have an AMD 64 with 1 GB RAM.

now the search:

i create the image to be searched with

ChainedDocumentBuilder builder = new ChainedDocumentBuilder();
builder.addBuilder(DocumentBuilderFactory.getCEDDDocumentBuilder());

Document doc = builder.createDocument(imageStream, null);

with

MetricSpacesInvertedListIndexing.numReferenceObjectsUsed = 100;
MetricSpacesInvertedListIndexing.numReferenceObjects = 500;

the search takes about 1923 ms

this part is the one that takes the most time:

TopDocs docs = ms.search(doc, indexPath);

which takes about 1500ms

with

MetricSpacesInvertedListIndexing.numReferenceObjectsUsed = 1000;
MetricSpacesInvertedListIndexing.numReferenceObjects = 1000;

the search takes 9 seconds.

Since LIRE doesn't have a paging function, I would need to keep a result set of 1000 hits to accomplish paging.
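
A minimal sketch of that kind of paging over a cached result list, assuming the hits have already been copied into a java.util.List (the class and method names are hypothetical, not part of LIRE):

```java
import java.util.Collections;
import java.util.List;

/** Illustrative only; pages over an already cached list of hits. */
class PagingSketch {

    /** Returns the zero-based page of the cached results, or an empty list past the end. */
    static <T> List<T> page(List<T> results, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        long from = (long) pageIndex * pageSize; // long avoids int overflow for large indices
        if (from >= results.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, results.size());
        return results.subList((int) from, to);
    }
}
```

With 1000 cached hits and a page size of 20, page(hits, 0, 20) returns the first 20 hits and page(hits, 49, 20) the last 20.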

what exactly do

MetricSpacesInvertedListIndexing.numReferenceObjectsUsed = 1000;
MetricSpacesInvertedListIndexing.numReferenceObjects = 1000;

do?

Any idea how to speed up the search? I aim to get results in under 1 second. Is that possible?

cheers

bastian

Mathias Lux <mathi...@gmail.com> wrote on February 24, 2010 at 20:50:

Mathias Lux

Mar 12, 2010, 2:39:32 PM

to lire...@googlegroups.com

Hi!

I just checked in, so if you want to try it, check out revision 57 from
SVN. The new and fast DocumentBuilder and ImageSearcher are

* CEDDImageSearcher
* CEDDDocumentBuilder

Both now use a byte[] array for storing their histograms in the
Lucene fields, resulting in a roughly threefold performance gain on
129,000 photos (589 ms per search with the new one in my tests versus
1900 ms per search with the old one).
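
To show the idea of storing a histogram as raw bytes, here is a minimal sketch. The one-byte-per-bin layout is an assumption for illustration only (it requires every bin value to fit into 0..255); it is not necessarily the layout used by CEDDDocumentBuilder, and the class name is made up.

```java
/** Illustrative only; hypothetical one-byte-per-bin encoding. */
class HistogramBytesSketch {

    /** Packs each histogram bin into one byte; bins must be in the range 0..255. */
    static byte[] toBytes(int[] histogram) {
        byte[] out = new byte[histogram.length];
        for (int i = 0; i < histogram.length; i++) {
            if (histogram[i] < 0 || histogram[i] > 255) {
                throw new IllegalArgumentException("bin " + i + " does not fit into one byte");
            }
            out[i] = (byte) histogram[i];
        }
        return out;
    }

    /** Restores the histogram; the mask turns Java's signed bytes back into 0..255. */
    static int[] fromBytes(byte[] data) {
        int[] out = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = data[i] & 0xFF;
        }
        return out;
    }
}
```

Reading the bins back is a plain array copy with a mask, so no string has to be split and parsed for every document visited during a search.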

hope that helps,
Mathias

On Fri, Mar 12, 2010 at 3:51 PM, seba...@kielmann.biz
<seba...@kielmann.biz> wrote:
>
> hey nice,
>
> I made some changes to my implementation and now have better performance, but I will check whether your solution is even better.
>
> will get back on this soon.
>
> cheers,
> sebastian
>
> p.s.
> have a nice weekend
>
> Mathias Lux <mathi...@gmail.com> wrote on March 12, 2010 at 14:27:

laxmikant patil

May 25, 2012, 2:34:13 AM

to lire...@googlegroups.com

hello,

The indexing code:

/*
 * To change this template, choose Tools | Templates
 * and open the template in the editor.
 */

/**
 * @author student
 */
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import net.semanticmetadata.lire.DocumentBuilder;
import net.semanticmetadata.lire.DocumentBuilderFactory;
import net.semanticmetadata.lire.utils.LuceneUtils;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class myindex {
    private static String[] testFiles = new String[]{"img01.JPG","img02.JPG","img03.JPG","img04.JPG","img05.JPG","img06.JPG"};
    private static String testFilespath = "/home/student/Desktop/images";
    private static String indexpath = "/home/student/Desktop/indexDemo";
    //private static String testExtensive="D:\\testlire";

    public void testCreateIndex() throws IOException {
        //System.out.println(">> Indexing " + testFiles.size() + " files.");
        DocumentBuilder builder = DocumentBuilderFactory.getExtensiveDocumentBuilder();
        try (IndexWriter iw = new IndexWriter(FSDirectory.open(new File(indexpath)), new IndexWriterConfig(Version.LUCENE_34, new SimpleAnalyzer()))) {
            for (String identifier : testFiles) {
                // Close each image stream as soon as its document has been built.
                try (FileInputStream fis = new FileInputStream(testFilespath + "/" + identifier)) {
                    Document doc = builder.createDocument(fis, identifier);
                    iw.addDocument(doc);
                }
            }
            iw.optimize();
        }
    }

    public void testCreateCorrelogramIndex() throws IOException {
        DocumentBuilder builder = DocumentBuilderFactory.getAutoColorCorrelogramDocumentBuilder();
        try (IndexWriter iw = LuceneUtils.createIndexWriter(indexpath + "-small", true)) {
            long ms = System.currentTimeMillis();
            for (String identifier : testFiles) {
                try (FileInputStream fis = new FileInputStream(testFilespath + "/" + identifier)) {
                    Document doc = builder.createDocument(fis, identifier);
                    iw.addDocument(doc);
                }
            }
            // Average indexing time per image.
            System.out.println("Time taken: " + ((System.currentTimeMillis() - ms) / testFiles.length) + " ms");
            iw.optimize();
        }
    }

    public static void main(String[] args) throws Exception {
        myindex res = new myindex();
        res.testCreateIndex();
        res.testCreateCorrelogramIndex();
        //System.exit(res);
    }
}

---------------------

And the searching code:

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import net.semanticmetadata.lire.DocumentBuilder;
import net.semanticmetadata.lire.ImageSearchHits;
import net.semanticmetadata.lire.ImageSearcher;
import net.semanticmetadata.lire.ImageSearcherFactory;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;

public class mysearch {
    private static String[] testFiles = new String[]{"img01.JPG", "img02.JPG"};
    private static String testFilespath = "/home/student/Desktop/images";
    private static String indexpath = "/home/student/Desktop/indexDemo";
    private int numsearches = 25;

    public void testSearch() throws IOException {
        // Open an IndexReader on the existing index.
        IndexReader reader = IndexReader.open(FSDirectory.open(new File(indexpath)));
        // Create the default image searcher.
        ImageSearcher searcher = ImageSearcherFactory.createDefaultSearcher();
        FileInputStream imagestream = new FileInputStream(testFilespath + "/" + testFiles[0]);
        BufferedImage bimg = ImageIO.read(imagestream);
        // Search for similar images.
        ImageSearchHits hits = null;
        hits = searcher.search(bimg, reader);
        int i;
        for (i = 0; i < 1; i++) {
            System.out.println("IMAGE OUTPUT:" + hits.score(i) + ":" + hits.doc(i).getField(DocumentBuilder.FIELD_NAME_IDENTIFIER).stringValue());
        }
        // Search again, using the top hit as the query document.
        Document document = hits.doc(0);
        hits = searcher.search(document, reader);
        for (i = 0; i < 1; i++) {
            System.out.println("out:" + hits.score(i) + ":" + hits.doc(i).getField(DocumentBuilder.FIELD_NAME_IDENTIFIER).stringValue());
        }
    }

    public static void main(String[] args) throws Exception {
        mysearch res = new mysearch();
        res.testSearch();
        //System.exit(res);
    }
}

---------------

The index is created, but when searching, the output says:

run:

May 25, 2012 11:47:18 AM net.semanticmetadata.lire.impl.GenericFastImageSearcher getDistance