FILE_READ('http://...') -- any chance to ... conn.setRequestProperty()

175 views
Skip to first unread message

Dani

unread,
Feb 9, 2012, 9:50:30 AM2/9/12
to H2 Database
Hey everybody,

I am fetching data from a website via FILE_READ('http://...') in my
SQL statement.
The site returns a "403" HTTP error because a check is made to
restrict access to "real" browser!

Is there a way to get a handle on the connection H2 establishes for
the FILE_READ operation in order to pass appropriate user-agent
information?

URL urlObject = new URL(urlString);
URLConnection conn = urlObject.openConnection();
conn.setRequestProperty ( "User-Agent", "Mozilla/4.0 (compatible; MSIE
6.0; Windows NT 5.0)" );
st = conn.createStatement() ;
st.executeUpdate("INSERT INTO RAWDATA (HTML) VALUES ((SELECT
FILE_READ('http://...);

Thanks for any advice...


Dani

Christoph Läubrich

unread,
Feb 9, 2012, 9:58:41 AM2/9/12
to h2-da...@googlegroups.com
You can try to set the default java useragent:
http://docs.oracle.com/javase/1.4.2/docs/guide/net/properties.html
(start with java -dhttp.agent="myAgent" ....) alternativly cou can provide a cstom URL sceme.

Dani

unread,
Feb 9, 2012, 10:40:43 AM2/9/12
to H2 Database
Hmm,

my above sample works when I am not using H2 but a BufferedReader in a
loop!

I am a bit helpless here...
I tried System.setProperty ("http.agent", "Mozilla/4.0 (compatible;
MSIE 6.0; Windows NT 5.0)");
but that does not work!

What do you mean:

> (start with java -d*http.agent*="myAgent" ....)

commandline? as a param with conn url org:h2 ?
How would I do that on startup with my program?

> provide a cstom URL sceme

within H2? How?


Thanks again,

Dani


On 9 Feb., 15:58, Christoph Läubrich <lae...@googlemail.com> wrote:
> You can try to set the default java useragent:http://docs.oracle.com/javase/1.4.2/docs/guide/net/properties.html
> (start with java -d*http.agent*="myAgent" ....) alternativly cou can

Thomas Mueller

unread,
Feb 13, 2012, 2:05:50 PM2/13/12
to h2-da...@googlegroups.com
Hi,

> What do you mean:
>> (start with java -d*http.agent*="myAgent" ....)
>
> commandline? as a param with conn url org:h2 ?

No, command line (terminal, console, shell) or starting a process.

> How would I do that on startup with my program?

java -Dhttp.agent="myAgent" -jar yourApplication.jar

Regards,
Thomas

Dani

unread,
Feb 21, 2012, 4:48:35 AM2/21/12
to H2 Database
Hey there,

I tried:

- commandline parameters: java "-Dhttp.agent=Mozilla/4.0 (compatible;
MSIE 6.0; Windows NT 5.0)" -jar my.jar
- System.setProperty ("http.agent", "Mozilla/4.0 (compatible;MSIE 6.0;
Windows NT 5.0)");

the only thing that does work for me is:

- not using H2 - FILE_READ but a buffered reader with:
conn.setRequestProperty ( "User-Agent", "Mozilla/4.0 (compatible;
MSIE6.0; Windows NT 5.0)" );
and save the data locally to disc -> then executing a FILE_READ :-(

this does not work (e.g. H2 Console):

DROP TABLE IF EXISTS RAWDATA;
CREATE TABLE RAWDATA (`HTML` CLOB );
INSERT INTO
RAWDATA
(HTML)
VALUES
(SELECT
FILE_READ('http://www.google.de/search?q=H2+database',
NULL));

I do not know if it is only my system!?
But it would be nice to have a FILE_READ / CSV_READ parameter 'User-
Agent' for urls

Thanks again,
Dani

On 13 Feb., 20:05, Thomas Mueller <thomas.tom.muel...@gmail.com>
wrote:

Noel Grandin

unread,
Feb 21, 2012, 7:10:29 AM2/21/12
to h2-da...@googlegroups.com, Dani

What you want to do is quite far out of the ordinary, so I don't think
we're going to build such a feature into H2.

However, you can implement it yourself and expose it to your SQL code
using CREATE ALIAS. See here:

http://www.h2database.com/html/grammar.html#create_alias

e.g.

CREATE ALIAS MATRIX FOR "org.h2.samples.Function.getMatrix"
SELECT * FROM MATRIX(?)

Christoph Läubrich

unread,
Feb 21, 2012, 11:10:10 AM2/21/12
to h2-da...@googlegroups.com
java -Dhttp.agent="Mozilla/4.0 (compatible;
MSIE 6.0; Windows NT 5.0)" -jar my.jar

(the quotes must be only around the argument)

For the Streamhandler you can try this:
URL.setURLStreamHandlerFactory(new URLStreamHandlerFactory() {

@Override
public URLStreamHandler createURLStreamHandler(String protocol) {
if (protocol.equalsIgnoreCase("httpwithagent")) {
return new URLStreamHandler() {
@Override
protected URLConnection openConnection(URL u) throws IOException {
URL httpurl = new URL("http", u.getHost(), u.getPort(), u.getFile(),
null);
URLConnection connection = httpurl.openConnection();
connection.setRequestProperty("User-Agent",


"Mozilla/4.0 (compatible; MSIE6.0; Windows NT 5.0)");

return connection;
}
};
}
return null;
}
});



CREATE TABLE RAWDATA (`HTML` CLOB );
INSERT INTO
RAWDATA
(HTML)
VALUES
(SELECT

FILE_READ('httpwithagent://www.google.de/search?q=H2+database',
NULL));

Dani

unread,
Feb 21, 2012, 12:09:23 PM2/21/12
to H2 Database
@Christoph

> (the quotes must be only around the argument)

THANK YOU!

That's all it took ... the quotes ... zzzzzzzzzzzzzzzz ...

Thomas Mueller

unread,
Feb 24, 2012, 1:35:17 PM2/24/12
to h2-da...@googlegroups.com
Hi,

I don't think it make sense to add this to FILE_READ and CSV_READ. The
http.agent system property is a solution, and the URLStreamHandler.
The real solution would be:

> The site returns a "403" HTTP error because a check is made to
> restrict access to "real" browser!

That's no reason to return 403.

Regards,
Thomas

Reply all
Reply to author
Forward
0 new messages