Information to and problems with LOBs not in DB

42 views
Skip to first unread message

SME

unread,
May 30, 2012, 5:47:44 AM5/30/12
to h2-da...@googlegroups.com
Hi,
 
I'm currently evaluating H2 for using it mainly as a key - value storage, which involves storing LOBs in the range in between 1KB and 10MB. The main features why I am looking at H2 are transactions, transparent storage of external LOBs in the case they are too large and transparent file encryption. All of this works quite fine for me. However, I ran into some 'problems', which someone can hopefully clarify for me.
 
0. My Setup:
H2 Version 1.3.166
System.setProperty("h2.lobInDatabase", "false");
DB setting: SET MAX_LENGTH_INPLACE_LOB=512
connection: jdbc:h2:split:28:fooDB;CIPHER=AES
 
DROP TABLES IF EXISTS documents;
CREATE TABLE documents(id INT PRIMARY KEY, name VARCHAR, data BLOB);
CREATE INDEX docNameIDX ON documents(name);
 
Finally, I insert 2 million documents, which results in an id range from 1 to 2,000,000.
 
 
1. Searches are usually done against name. Hence, I thought about speeding up those searches with a temporary memory table, which fails:
CREATE MEMORY TEMP TABLE IF NOT EXISTS memTempDocs AS SELECT * FROM documents;
Errormessage:
Eingabe/Ausgabe: "java.io.FileNotFoundException: X:\tmpDB\fooDB.lobs.db\1.lobs.db\1.lobs.db\1.lobs.db\16843009.t24.lob.db
Indeed, this file does not exist on the HDD. However, if I restrict the above select statement to AS SELECT * FROM documents WHERE id=x
and insert for x different values like 1 or 10,000 or 1,000,000 or whatever id exists, I get exactly the same error message.
 
If I omit the BLOBs it works:
CREATE MEMORY TEMP TABLE IF NOT EXISTS memTempDocs AS SELECT id,name FROM documents;
 
My question is whether it is possible to create a temporary memory table that still contains the BLOBs.
 
2. For the 2 Million documents, query time highly differs depending on the column queried. Is there an implementation reason for it?
a) about 60ms for SELECT * FROM documents WHERE id=someid;
--> returns at most 1 row
b) about 550ms for SELECT * FROM documents WHERE name=SELECT name FROM documents where id=someid;
--> possibly returns many elements because of similar names (on average 40 rows). I know the SELECT name statement takes time equal to a). However, in the index, the returned rows should be near each other in the search tree because of equality. Right now, I cannot use the column name as primary key as the names are not unique.
 
3. If I query the documents table with a prepared statement, is the binary data directly fetched from HDD or at the time when I access the binary stream from the returned resultset?
Example, which should use streams instead of a simple byte array (but for my tests ok):
String query = "SELECT * FROM documents WHERE id=" + index + ";";
ps = conn.prepareStatement(query);
rs = ps.executeQuery();
if(rs.first()) { //at most one element because of unique id!
  String name = rs.getString("name");
  Blob data = rs.getBlob("data");
  if(retrieveData) {
    byteData = data.getBytes(0,(int)data.length());
    this.fetchedDataSize += byteData.length;
  }
}
 
4. H2 manages the storage of external BLOBs transparently. I wonder whether I can get the explicit information of the BLOB's name. The use case here is twofold:
a) get statistical information about the occupied HDD-Space vs. Memory-Space (e.g. compressed files or encrypted vs. uncompressed files)
b) check whether some hero deleted BLOB-files to save some space on HDD ;-)
--> I guess I can surround the fetch statement with a try/catch block and mark rows with a missing file as invalid. However, I have the impression this is a workaround and a misuse of the java.io.FileNotFoundException.
 
 
Many thanks in advance,
 
Sascha
 
 

SME

unread,
May 30, 2012, 7:45:53 AM5/30/12
to h2-da...@googlegroups.com
I'm working on my problems posted above. So far, I found the following answers:
 
3. binary data is fetched at the point of time I try to access it because my idea for use case 4b) works
 
4b) workaround to check whether a file has been deleted:
if (rs.first()) { //at most one element because of unique id!

  Blob data = rs.getBlob("data");
  try{
    byteData = data.getBytes(0, (int) data.length());
  }catch(Exception e) {
    if(e instanceof JdbcSQLException) {
      if(e.getMessage().contains("[90124-")){//Error Code for FileNotFound
        System.err.println("Chuck Norris killed a file on id: " + index);
        //do some update work
      }
    }
  }
}
However, I rate this a workaround and not a solution. Suggestions to get a more failure proof method are welcome.

Thomas Mueller

unread,
Jun 4, 2012, 4:43:33 AM6/4/12
to h2-da...@googlegroups.com
Hi,

h2.lobInDatabase=false will not be supported in future versions of H2. If you need this feature, could you store the files in the filesystem yourself?

Regards,
Thomas


--
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To view this discussion on the web visit https://groups.google.com/d/msg/h2-database/-/drBCttCShRgJ.
To post to this group, send email to h2-da...@googlegroups.com.
To unsubscribe from this group, send email to h2-database...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/h2-database?hl=en.
Reply all
Reply to author
Forward
0 new messages