Re: lire

Mathias Lux

Feb 24, 2010, 11:10:58 AM
to seba...@kielmann.biz, lire...@googlegroups.com
Hi!

German is totally fine, but I'll stick with English so that I can post
this to the mailing list.

Faster retrieval is possible and already integrated, but not fully tested.

You can find a test file here:
http://caliph-emir.svn.sourceforge.net/viewvc/caliph-emir/lire/src/test/java/net/semanticmetadata/lire/indexing/MetricSpacesTest.java?revision=48&view=markup

The implementation is here:
http://caliph-emir.svn.sourceforge.net/viewvc/caliph-emir/lire/src/main/java/net/semanticmetadata/lire/indexing/MetricSpacesInvertedListIndexing.java?revision=48&view=markup

The approach is based on the actual metric and the work of G. Amato
("Approximate Similarity Search in Metric Spaces using Inverted Files")
and performs quite well. However, the indexing process takes an extra
step with this approach.

One has to create a set of reference points and then re-index the
whole data set based on those reference points. It works fine in terms
of precision and quite OK in terms of runtime (constant in the number
of images, linear in the number of reference elements). However, I
haven't yet tested the actual performance (e.g. milliseconds per search
for 100,000 images).
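
To make the reference-point idea more concrete, here is a minimal sketch in plain Java. It is not LIRE code: the class name, the L1 distance, and the toy two-dimensional features are assumptions chosen only for illustration. Each image is described by the IDs of its nearest reference objects, so the number of distance computations per image depends on the number of reference objects, not on the size of the collection.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Toy illustration of describing an image by its nearest reference objects (hypothetical, not LIRE). */
final class ReferenceSignatureSketch {

    /** L1 (city block) distance, used here only as a stand-in for the feature's metric. */
    static double l1(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /**
     * Returns the indices of the k reference objects closest to the feature,
     * nearest first. Ties keep the lower index first because the sort is stable.
     */
    static int[] signature(double[] feature, double[][] references, int k) {
        Integer[] order = new Integer[references.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(i -> l1(feature, references[i])));
        int[] result = new int[Math.min(k, order.length)];
        for (int i = 0; i < result.length; i++) {
            result[i] = order[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] references = {{0, 0}, {10, 10}, {5, 5}, {0, 10}};
        double[] image = {4, 6};
        // distances to the references are 10, 10, 2 and 8, so this prints [2, 3]
        System.out.println(Arrays.toString(signature(image, references, 2)));
    }
}
```

The resulting list of reference IDs can then be stored like the words of a text document, which is what makes an inverted index applicable.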

Perhaps you can find the time to try the test with your data set?

cheers,
Mathias

--
Dr. Mathias Lux
http://tinyurl.com/mlux-itec

Mathias Lux

Feb 24, 2010, 2:50:37 PM
to lire...@googlegroups.com
Hi!

Well, that is admittedly somewhat mysterious. You have to pass the
class used for analysis (the feature class) and the name of the field
the feature is stored in. You also have to make sure that the feature
was extracted during the initial indexing process.

I'd recommend using CEDD like this:

new MetricSpacesInvertedListIndexing(CEDD.class,
DocumentBuilder.FIELD_NAME_CEDD)

After indexing with the DocumentBuilder obtained by:
DocumentBuilderFactory.getCEDDDocumentBuilder()

Another option would be to use a ChainedDocumentBuilder to extract a
whole lot of features like this:

ChainedDocumentBuilder builder = new ChainedDocumentBuilder();
builder.addBuilder(DocumentBuilderFactory.getCEDDDocumentBuilder());
builder.addBuilder(DocumentBuilderFactory.getFCTHDocumentBuilder());
builder.addBuilder(new SimpleDocumentBuilder(false, false, true));
builder.addBuilder(DocumentBuilderFactory.getColorHistogramDocumentBuilder());
builder.addBuilder(DocumentBuilderFactory.getDefaultAutoColorCorrelationDocumentBuilder());

to make sure you have the choice.

All available field names are defined in
net.semanticmetadata.lire.DocumentBuilder; the corresponding classes are
in net.semanticmetadata.lire.imageanalysis.* and extend LireFeature
(like CEDD.class, FCTH.class, ColorLayout.class,
SimpleColorHistogram.class, AutoColorCorrelogram.class, etc.).

cheers,
Mathias


On Wed, Feb 24, 2010 at 6:57 PM, seba...@kielmann.biz
<seba...@kielmann.biz> wrote:
> hi,
>
> small question:
>
> what are the possible values for featureClass and featureFieldName in
> MetricSpacesInvertedListIndexing?
>
> while running the indexer a lot of messages about no feature stored in this
> document are appearing.
>
> any help would be appreciated.
>
> bastian

Mathias Lux

Mar 12, 2010, 8:27:46 AM
to lire...@googlegroups.com
Hi!

Yesterday I released a new version, which has an updated implementation of the metric spaces approach. I recommend using
values like these:

MetricSpacesInvertedListIndexing.numReferenceObjectsUsed = 50;
MetricSpacesInvertedListIndexing.numReferenceObjects = 1000; 

for a million images.

The latter one controls how many reference objects exist in total, so to speak the "vocabulary" you use to describe your data set. The more you have here, the more accurate your approximation will be. Typically 1000 - 5000 work really fine for millions of images (according to my chat with Giuseppe :)

The first one is more critical. It defines how many of the "words" of your "vocabulary" are used to describe one single image. The choice of this parameter heavily influences the number of hits and therefore the performance. The number has to be significantly lower than the other one; otherwise every single image will be considered a result and we're back at linear search. However, numbers that are too low lead to "very approximate" results.
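
The effect of the two parameters can be seen in a small stand-alone sketch. This is an illustration only, with a hypothetical class and a plain in-memory map instead of a Lucene index: every image is posted under the reference object IDs in its description, and a query collects every image that shares at least one ID with it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory inverted list over reference object IDs (hypothetical, not LIRE). */
final class InvertedListSketch {

    // reference object id -> ids of the images that use it in their description
    private final Map<Integer, List<Integer>> postings = new HashMap<>();

    /** Registers an image under each reference object ID of its description. */
    void add(int imageId, int[] signature) {
        for (int ref : signature) {
            postings.computeIfAbsent(ref, r -> new ArrayList<>()).add(imageId);
        }
    }

    /** Returns all images sharing at least one reference object with the query. */
    Set<Integer> candidates(int[] querySignature) {
        Set<Integer> result = new TreeSet<>();
        for (int ref : querySignature) {
            result.addAll(postings.getOrDefault(ref, Collections.<Integer>emptyList()));
        }
        return result;
    }

    public static void main(String[] args) {
        InvertedListSketch index = new InvertedListSketch();
        index.add(0, new int[]{0, 1});
        index.add(1, new int[]{2, 3});
        index.add(2, new int[]{1, 2});

        // a short query description touches only part of the collection
        System.out.println(index.candidates(new int[]{0}));
        // using all four reference objects makes every image a candidate
        System.out.println(index.candidates(new int[]{0, 1, 2, 3}));
    }
}
```

With a description as long as the whole vocabulary, the candidate set is the entire collection, which is the "back at linear search" case described above.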

The best approach would be to try some parameter values and compare the results to the linear search. Try to get the first one as low as possible while precision is still high enough, and you're there.
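
One possible way to run that comparison is to measure how many of the top k results of the linear search also show up in the top k of the approximate search. The helper below is a sketch with hypothetical names; it assumes both result lists contain unique image identifiers ordered best first.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Compares an approximate result list against the linear search result (hypothetical helper). */
final class ApproximationQuality {

    /**
     * Fraction of the exact top-k identifiers that also appear in the
     * approximate top-k. Returns 0.0 if the exact list is empty.
     */
    static double overlapAtK(List<String> approximate, List<String> exact, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        Set<String> truth = new HashSet<>(exact.subList(0, Math.min(k, exact.size())));
        if (truth.isEmpty()) {
            return 0.0;
        }
        int found = 0;
        for (String id : approximate.subList(0, Math.min(k, approximate.size()))) {
            if (truth.contains(id)) {
                found++;
            }
        }
        return (double) found / truth.size();
    }

    public static void main(String[] args) {
        List<String> exact = Arrays.asList("a.jpg", "b.jpg", "c.jpg", "d.jpg");
        List<String> approximate = Arrays.asList("a.jpg", "c.jpg", "x.jpg", "b.jpg");
        // two of the three exact top-3 results are found in the approximate top-3
        System.out.println(overlapAtK(approximate, exact, 3));
    }
}
```

Running this for a handful of query images and several values of numReferenceObjectsUsed gives a rough picture of where the overlap starts to drop.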

I'm currently working on another option for the serialization of the descriptors and have already tried it with CEDD, with very promising results. Runtime tests indicated that the new linear search is at least twice as fast as the original one. I'll check it in later today.

cheers,
Mathias

On Thu, Feb 25, 2010 at 6:01 PM, seba...@kielmann.biz <seba...@kielmann.biz> wrote:

Hello,

 

ok. tested it.

the indexer is created with

ChainedDocumentBuilder builder = new ChainedDocumentBuilder();


builder.addBuilder(DocumentBuilderFactory.getCEDDDocumentBuilder());
builder.addBuilder(DocumentBuilderFactory.getFCTHDocumentBuilder());
builder.addBuilder(new SimpleDocumentBuilder(false, false, true));
builder.addBuilder(DocumentBuilderFactory.getColorHistogramDocumentBuilder());
builder.addBuilder(DocumentBuilderFactory.getDefaultAutoColorCorrelationDocumentBuilder());

 

and indexes 100,000 images. this takes about 3 hours. but i only have a 64 amd with 1 gb ram.

 

now the search:

i create the image to be searched with

ChainedDocumentBuilder builder = new ChainedDocumentBuilder();
builder.addBuilder(DocumentBuilderFactory.getCEDDDocumentBuilder());

Document doc = builder.createDocument(imageStream, null);

 

with

MetricSpacesInvertedListIndexing.numReferenceObjectsUsed = 100;
MetricSpacesInvertedListIndexing.numReferenceObjects = 500;

 

the search takes about 1923 ms

 

this part is the one that takes the most time:

TopDocs docs = ms.search(doc, indexPath);

which takes about 1500ms

 

with

MetricSpacesInvertedListIndexing.numReferenceObjectsUsed = 1000;
MetricSpacesInvertedListIndexing.numReferenceObjects = 1000;

the search takes 9 seconds.

 

since lire doesn't have a paging function, i would need to save a 1000-result set to accomplish paging.
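
The paging idea mentioned here, running the search once, keeping the larger result list, and slicing it per page, could look roughly like the following sketch. The class and method names are hypothetical and nothing in it depends on LIRE; the list could hold image identifiers or any other result type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Slices a cached result list into pages (hypothetical helper, independent of LIRE). */
final class ResultPager {

    /**
     * Returns the entries of the given zero-based page, or an empty list
     * if the page lies beyond the end of the cached results.
     */
    static <T> List<T> page(List<T> results, int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageNumber must be >= 0 and pageSize > 0");
        }
        long from = (long) pageNumber * pageSize;
        if (from >= results.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, results.size());
        // copy the slice so that the page does not keep a view on the cached list
        return new ArrayList<>(results.subList((int) from, to));
    }

    public static void main(String[] args) {
        List<Integer> cached = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            cached.add(i);
        }
        System.out.println(page(cached, 0, 10)); // first ten entries
        System.out.println(page(cached, 2, 10)); // last, shorter page
    }
}
```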

 

what exactly do

MetricSpacesInvertedListIndexing.numReferenceObjectsUsed = 1000;
MetricSpacesInvertedListIndexing.numReferenceObjects = 1000;

do?

 

any idea on how to improve the searching? i aim to have results under 1 second. possible?

 

cheers

bastian

 

 

Mathias Lux <mathi...@gmail.com> wrote on February 24, 2010 at 20:50:

Mathias Lux

Mar 12, 2010, 2:39:32 PM
to lire...@googlegroups.com
Hi!

I just checked in, so if you want to try it, check out revision 57 from
SVN. The new and fast DocumentBuilder and ImageSearcher are

* CEDDImageSearcher
* CEDDDocumentBuilder

Both now use a byte[] array for storing their histograms in the
Lucene fields, resulting in a roughly threefold performance gain on
129,000 photos (589 ms per search with the new one in my tests versus
1900 ms per search with the old one).
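
As a rough illustration of why a byte[] representation can be cheaper to read than a text one, here is a small sketch. It is not the actual CEDDDocumentBuilder code: the class and method names are hypothetical, and it assumes every histogram bin holds a value between 0 and 255. Reading the histogram back is then a single pass over the bytes with no string parsing.

```java
import java.util.Arrays;

/** Stores a small-valued histogram as one byte per bin (hypothetical sketch, not LIRE code). */
final class HistogramBytes {

    /** Packs the histogram into bytes; assumes each bin is in the range 0..255. */
    static byte[] toBytes(int[] histogram) {
        byte[] out = new byte[histogram.length];
        for (int i = 0; i < histogram.length; i++) {
            if (histogram[i] < 0 || histogram[i] > 255) {
                throw new IllegalArgumentException("bin " + i + " out of range: " + histogram[i]);
            }
            out[i] = (byte) histogram[i];
        }
        return out;
    }

    /** Restores the histogram; the mask undoes Java's signed byte interpretation. */
    static int[] fromBytes(byte[] data) {
        int[] histogram = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            histogram[i] = data[i] & 0xFF;
        }
        return histogram;
    }

    public static void main(String[] args) {
        int[] histogram = {0, 7, 128, 255};
        byte[] stored = toBytes(histogram);
        System.out.println(stored.length + " bytes, restored: " + Arrays.toString(fromBytes(stored)));
    }
}
```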

hope that helps,
Mathias

On Fri, Mar 12, 2010 at 3:51 PM, seba...@kielmann.biz
<seba...@kielmann.biz> wrote:
> hey nice,
>
> i made some changes to my implementation and have better performance. but will check if your solution is even better.
>
> will get back on this soon.
>
> cheers,
>
> sebastian
>
> p.s.
>
> have a nice weekend
>
> Mathias Lux <mathi...@gmail.com> wrote on March 12, 2010 at 14:27:

laxmikant patil

May 25, 2012, 2:34:13 AM
to lire...@googlegroups.com
hello,
 
The indexing code:
/**
 *
 * @author student
 */

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import net.semanticmetadata.lire.DocumentBuilder;
import net.semanticmetadata.lire.DocumentBuilderFactory;
import net.semanticmetadata.lire.utils.LuceneUtils;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class myindex {

    private static String[] testFiles = new String[]{"img01.JPG","img02.JPG","img03.JPG","img04.JPG","img05.JPG","img06.JPG"};
    private static String testFilespath = "/home/student/Desktop/images";
    private static String indexpath = "/home/student/Desktop/indexDemo";
//private static  String testExtensive="D:\\testlire";

    public void testCreateIndex() throws IOException {
//System.out.println(">> Indexing " + testFiles.size() + " files.");

        DocumentBuilder builder = DocumentBuilderFactory.getExtensiveDocumentBuilder();
        try (IndexWriter iw = new IndexWriter(FSDirectory.open(new File(indexpath)), new IndexWriterConfig(Version.LUCENE_34, new SimpleAnalyzer()))) {
            for (String identifier : testFiles) {
                try (FileInputStream fis = new FileInputStream(testFilespath + "/" + identifier)) {
                    Document doc = builder.createDocument(fis, identifier);
                    iw.addDocument(doc);
                }

            }
            iw.optimize();
        }
    }
    
    public void testCreateCorrelogramIndex() throws IOException {
        DocumentBuilder builder = DocumentBuilderFactory.getAutoColorCorrelogramDocumentBuilder();
        try (IndexWriter iw = LuceneUtils.createIndexWriter(indexpath + "-small", true)) {
            long ms = System.currentTimeMillis();
            for (String identifier : testFiles) {
                // close each image stream once its document has been created
                try (FileInputStream fis = new FileInputStream(testFilespath + "/" + identifier)) {
                    Document doc = builder.createDocument(fis, identifier);
                    iw.addDocument(doc);
                }
            }
            // average indexing time per image
            System.out.println("Time taken: " + ((System.currentTimeMillis() - ms) / testFiles.length) + " ms");
            iw.optimize();
        }
    }

    public static void main(String[] args) throws Exception {
        myindex res = new myindex();
        res.testCreateIndex();
        res.testCreateCorrelogramIndex();
        
       
//System.exit(res);
    }
}

---------------------

And the searching code:


import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import net.semanticmetadata.lire.DocumentBuilder;
import net.semanticmetadata.lire.ImageSearchHits;
import net.semanticmetadata.lire.ImageSearcher;
import net.semanticmetadata.lire.ImageSearcherFactory;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;


public class mysearch {
    private static String[] testFiles = new String[]{"img01.JPG", "img02.JPG"};
    private static String testFilespath = "/home/student/Desktop/images";
    private static String indexpath = "/home/student/Desktop/indexDemo";
    private int numsearches = 25;

    public void testSearch() throws IOException {
        // open the index reader and the query image; both are closed automatically
        try (IndexReader reader = IndexReader.open(FSDirectory.open(new File(indexpath)));
             FileInputStream imagestream = new FileInputStream(testFilespath + "/" + testFiles[0])) {
            ImageSearcher searcher = ImageSearcherFactory.createDefaultSearcher(); // creating imagesearcher
            BufferedImage bimg = ImageIO.read(imagestream);

            // search for images similar to the query image
            ImageSearchHits hits = searcher.search(bimg, reader);
            for (int i = 0; i < 1; i++) {
                System.out.println("IMAGE OUTPUT:" + hits.score(i) + ":" + hits.doc(i).getField(DocumentBuilder.FIELD_NAME_IDENTIFIER).stringValue());
            }

            // search again, using the document of the best hit as the query
            Document document = hits.doc(0);
            hits = searcher.search(document, reader);
            for (int i = 0; i < 1; i++) {
                System.out.println("out:" + hits.score(i) + ":" + hits.doc(i).getField(DocumentBuilder.FIELD_NAME_IDENTIFIER).stringValue());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        mysearch res = new mysearch();
        res.testSearch();
        //System.exit(res);
    }
}


---------------

The index is created, but when searching, the output is:
run:
May 25, 2012 11:47:18 AM net.semanticmetadata.lire.impl.GenericFastImageSearcher getDistance
WARNING: No feature stored in this document! (net.semanticmetadata.lire.imageanalysis.CEDD)
[same warning repeated 13 more times]
IMAGE OUTPUT:NaN:img01.JPG
May 25, 2012 11:47:19 AM net.semanticmetadata.lire.impl.GenericFastImageSearcher getDistance
WARNING: No feature stored in this document! (net.semanticmetadata.lire.imageanalysis.CEDD)
[same warning repeated 8 more times]
May 25, 2012 11:47:19 AM net.semanticmetadata.lire.impl.GenericFastImageSearcher getDistance
out:NaN:img01.JPG
WARNING: No feature stored in this document! (net.semanticmetadata.lire.imageanalysis.CEDD)
May 25, 2012 11:47:19 AM net.semanticmetadata.lire.impl.GenericFastImageSearcher getDistance
WARNING: No feature stored in this document! (net.semanticmetadata.lire.imageanalysis.CEDD)
[same warning repeated 7 more times]
BUILD SUCCESSFUL (total time: 0 seconds)
------------

Please help me regarding this.

laxmikant patil

May 25, 2012, 2:35:18 AM
to lire...@googlegroups.com, seba...@kielmann.biz