HTTP ERROR: 404 (NOT_FOUND) when clicking a query result link to a document in the web interface


jagadeesha kanihal

Aug 31, 2017, 1:57:52 PM
to MG4J
I am trying out big.mg4j on a very small corpus and querying it through the web interface.
Query results appear on the query page, but clicking a result link gives a 404 error instead of showing the corresponding document.

I also tried plain mg4j (not big.mg4j), and the problem occurs there too.
Is there a problem with the HTTP server code bundled with mg4j?

Here is the code that I am using and the screenshots of the error.
 

package trymg4j; // "try" is a reserved word in Java and cannot appear in a package name

import it.unimi.di.big.mg4j.document.FileSetDocumentCollection;
import it.unimi.di.big.mg4j.query.Query;
import it.unimi.di.big.mg4j.tool.IndexBuilder;

import java.io.File;
import java.util.ArrayList;
import java.util.Collection;

import org.apache.commons.io.FileUtils;

public class TryMg4j {
    /**
     * Indexes a directory of plain-text (.txt) document files and starts the web query interface.
     * From the command line, this can be run as
     * mvn exec:java -Dexec.mainClass="trymg4j.TryMg4j"
     * -Dexec.args="corpus index"
     * Then visit http://localhost:4242/Query
     *
     * @param args [0]=/path/to/corpus/dir [1]=/path/to/index/dir
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        final String corpusPath = args[0], indexPath = args[1];
        final File corpusDir = new File(corpusPath),
                indexDir = new File(indexPath);
        // Explicit check: a plain assert is skipped unless the JVM runs with -ea.
        if (!corpusDir.isDirectory() || !indexDir.isDirectory()) {
            throw new IllegalArgumentException("Both arguments must be existing directories");
        }

        // Build the argument list for the collection: factory, collection file, then the documents.
        Collection<File> docFiles =
                FileUtils.listFiles(corpusDir, new String[]{"txt"}, false);
        ArrayList<String> mg4jArgs = new ArrayList<String>();
        mg4jArgs.add("-f");
        mg4jArgs.add("it.unimi.di.big.mg4j.document.tika.TextDocumentFactory");
        final File collectionFile = new File(indexDir, "corpus.collection");
        mg4jArgs.add(collectionFile.getAbsolutePath());
        for (File docFile : docFiles) {
            mg4jArgs.add(docFile.getAbsolutePath());
        }

        // 1. Create the document collection.
        FileSetDocumentCollection.main(mg4jArgs.toArray(new String[0]));
        // 2. Build the index with basename "cs635".
        IndexBuilder.main(new String[]{
                "-S", collectionFile.getAbsolutePath(),
                (new File(indexDir, "cs635")).getAbsolutePath()
        });
        // 3. Start the HTTP query server on the "text" field index.
        Query.main(new String[]{
                "-h", "-i", "it.unimi.di.big.mg4j.query.FileSystemItem",
                "-c", collectionFile.getAbsolutePath(),
                (new File(indexDir, "cs635-text")).getAbsolutePath()
        });
    }
}



jagadeesha kanihal

Aug 31, 2017, 2:14:50 PM
to MG4J
These are the Maven mg4j dependencies that I'm using.

        <!--<dependency>-->
            <!--<groupId>it.unimi.di</groupId>-->
            <!--<artifactId>mg4j</artifactId>-->
            <!--<version>5.2</version>-->
        <!--</dependency>-->

        <dependency>
            <groupId>it.unimi.di</groupId>
            <artifactId>mg4j-big</artifactId>
            <version>5.4.3</version>
        </dependency>

Sebastiano Vigna

Aug 31, 2017, 7:14:55 PM
to mg...@googlegroups.com

> On 31 Aug 2017, at 19:57, jagadeesha kanihal <jagadk...@gmail.com> wrote:
>
> I am trying out big.mg4j on a very small corpus and querying it through the web interface.
> Query results appear on the query page, but clicking a result link gives a 404 error instead of showing the corresponding document.

If the URL is of the form file://, some browsers will not let you follow a link to a local file unless the page containing the link was itself loaded from a file. Which URL gives you the 404?

Ciao,

seba
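
To tell which case applies, the scheme of the failing link can be checked directly. The following is a minimal sketch, not part of MG4J: the class name `LinkSchemeCheck` and the example links are made up for illustration, and only `java.net.URI` from the JDK is used.

```java
import java.net.URI;

final class LinkSchemeCheck {
    /** Returns true when the link uses the file scheme (file://...). */
    static boolean isFileLink(String link) {
        String scheme = URI.create(link).getScheme();
        return "file".equalsIgnoreCase(scheme);
    }

    public static void main(String[] args) {
        // Hypothetical example links, not taken from the thread.
        System.out.println(isFileLink("file:///home/user/corpus/doc1.txt")); // prints true
        System.out.println(isFileLink("http://localhost:4242/some/path"));   // prints false
    }
}
```

If the check returns false for the link that fails, the browser's file:// restriction is not the cause and the 404 comes from the server side.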

jagadeesha kanihal

Sep 1, 2017, 2:40:25 AM
to MG4J, vi...@di.unimi.it

As you can see, the URL is not file://. There is something wrong with the HTTP servlet.

Using a Jetty server seems to work.

Can you please check FileSystemItem in mg4j?
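
For anyone who needs the documents reachable over HTTP in the meantime, here is a rough sketch of a stand-in file server. It is an assumption-laden workaround, not MG4J's HttpFileServer and not the Jetty setup mentioned above: it uses only the JDK's built-in `com.sun.net.httpserver`, and the class name `CorpusFileServer`, the port 8080, and the plain-text content type are all chosen for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class CorpusFileServer {
    /** Serves regular files under root on the loopback interface; port 0 picks a free port. */
    static HttpServer start(Path root, int port) throws IOException {
        final Path base = root.toAbsolutePath().normalize();
        HttpServer server = HttpServer.create(
                new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 0);
        server.createContext("/", exchange -> {
            // Resolve the request path under the root and refuse anything that escapes it.
            String relative = exchange.getRequestURI().getPath().replaceFirst("^/+", "");
            Path target = base.resolve(relative).normalize();
            if (!target.startsWith(base) || !Files.isRegularFile(target)) {
                exchange.sendResponseHeaders(404, -1); // -1 means no response body
                exchange.close();
                return;
            }
            byte[] body = Files.readAllBytes(target);
            // Assumption: the corpus consists of UTF-8 text files.
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=UTF-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical defaults: serve the current directory on port 8080.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        HttpServer server = start(root, 8080);
        System.out.println("Serving " + root.toAbsolutePath()
                + " on port " + server.getAddress().getPort());
    }
}
```

This only makes the files fetchable by URL; it does not change the links that the MG4J query page generates.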

jagadeesha kanihal

Sep 1, 2017, 1:31:37 PM
to mg...@googlegroups.com
@vigna
Any updates? Are you able to reproduce the issue? If you need log files, please let me know.


Sebastiano Vigna

Sep 1, 2017, 8:21:28 PM
to mg...@googlegroups.com

> On 1 Sep 2017, at 19:31, jagadeesha kanihal <jagadk...@gmail.com> wrote:
>
> @vigna
> Any updates? Are you able to reproduce the issue? If you need log files, please let me know.
>

I'm on vacation, so it won't be that quick, but yes, please send me the log.

Ciao,

seba

Sebastiano Vigna

Sep 9, 2017, 2:13:09 PM
to mg...@googlegroups.com

> On 31 Aug 2017, at 19:57, jagadeesha kanihal <jagadk...@gmail.com> wrote:
>
> I am trying out big.mg4j on a very small corpus and querying it through the web interface.
> Query results appear on the query page, but clicking a result link gives a 404 error instead of showing the corresponding document.
>
> I also tried plain mg4j (not big.mg4j), and the problem occurs there too.
> Is there a problem with the HTTP server code bundled with mg4j?
>

OK, I replicated the problem. I think HttpFileServer is not working for some reason (Jetty evolves very quickly). I'll try to understand what's wrong...

Ciao,

seba

jagadeesha kanihal

Sep 26, 2017, 2:11:15 AM
to MG4J
Hi,
Any updates on a fix for this issue?