Hi,
I'm currently evaluating H2 for using it mainly as a key - value storage, which involves storing LOBs in the range in between 1KB and 10MB. The main features why I am looking at H2 are transactions, transparent storage of external LOBs in the case they are too large and transparent file encryption. All of this works quite fine for me. However, I ran into some 'problems', which someone can hopefully clarify for me.
0. My Setup:
H2 Version 1.3.166
System.setProperty("h2.lobInDatabase", "false");
DB setting: SET MAX_LENGTH_INPLACE_LOB=512
connection: jdbc:h2:split:28:fooDB;CIPHER=AES
DROP TABLES IF EXISTS documents;
CREATE TABLE documents(id INT PRIMARY KEY, name VARCHAR, data BLOB);
CREATE INDEX docNameIDX ON documents(name);
Finally, I insert 2 million documents, which results in an id range from 1 to 2,000,000.
1. Searches are usually done against name. Hence, I thought about speeding up those searches with a temporary memory table, which fails:
CREATE MEMORY TEMP TABLE IF NOT EXISTS memTempDocs AS SELECT * FROM documents;
Errormessage:
Eingabe/Ausgabe:
"java.io.FileNotFoundException:
X:\tmpDB\fooDB.lobs.db\1.lobs.db\1.lobs.db\1.lobs.db\16843009.t24.lob.db
Indeed, this file does not exist on the HDD. However, if I restrict the above select statement to AS SELECT * FROM documents WHERE id=x
and insert for x different values like 1 or 10,000 or 1,000,000 or whatever id exists, I get exactly the same error message.
If I omit the BLOBs it works:
CREATE MEMORY TEMP TABLE IF NOT EXISTS memTempDocs AS SELECT id,name FROM documents;
My question is whether it is possible to create a temporary memory table that still contains the BLOBs.
2. For the 2 Million documents, query time highly differs depending on the column queried. Is there an implementation reason for it?
a) about 60ms for SELECT * FROM documents WHERE id=someid;
--> returns at most 1 row
b) about 550ms for SELECT * FROM documents WHERE name=SELECT name FROM documents where id=someid;
--> possibly returns many elements because of similar names (on average 40 rows). I know the SELECT name statement takes time equal to a). However, in the index, the returned rows should be near each other in the search tree because of equality. Right now, I cannot use the column name as primary key as the names are not unique.
3. If I query the documents table with a prepared statement, is the binary data directly fetched from HDD or at the time when I access the binary stream from the returned resultset?
Example, which should use streams instead of a simple byte array (but for my tests ok):
String query = "SELECT * FROM documents WHERE id=" + index + ";";
ps = conn.prepareStatement(query);
rs = ps.executeQuery();
if(rs.first()) { //at most one element because of unique id!
String name = rs.getString("name");
Blob data = rs.getBlob("data");
if(retrieveData) {
byteData = data.getBytes(0,(int)data.length());
this.fetchedDataSize += byteData.length;
}
}
4. H2 manages the storage of external BLOBs transparently. I wonder whether I can get the explicit information of the BLOB's name. The use case here is twofold:
a) get statistical information about the occupied HDD-Space vs. Memory-Space (e.g. compressed files or encrypted vs. uncompressed files)
b) check whether some hero deleted BLOB-files to save some space on HDD ;-)
--> I guess I can surround the fetch statement with a try/catch block and mark rows with a missing file as invalid. However, I have the impression this is a workaround and a misuse of the java.io.FileNotFoundException.
Many thanks in advance,
Sascha