Instantiations Smalltalk - get file from disk then insert into database


Minh Tu

Jun 22, 2021, 5:09:22 PM
to VAST Community Forum
Hi, I want to write Smalltalk code to read a file of any type (e.g. text, picture, etc.) from the C:\ drive and then insert it into a Microsoft SQL database. Has anyone tried that before? I would appreciate any ideas and/or directions on how to do that.

Thanks,
Minh

Louis LaBrunda

Jun 22, 2021, 7:44:48 PM
to VAST Community Forum
Hi Minh,

Below is some code you can run in a workspace to read a file. It will prompt you for the file you want to read. You will have to add code where the comment is to parse each line and save it to your SQL database. For help with that, you will have to tell us more about the database table and its columns.

Lou


| file fileStream prompter |

prompter := CwFileSelectionPrompter new.
file := prompter
	accessType: XmOPEN;
	searchMask: '*.*';
	prompt.

file isNil ifTrue: [^nil].
fileStream := CfsReadFileStream open: file.
fileStream isCfsError ifTrue: [^fileStream inspect].
[fileStream atEnd] whileFalse: [ | line |
	line := fileStream nextLine.
	"At this point you have a line from the file.
	 You will need to parse it to get the values that you want to put into the
	 SQL columns."
	].
fileStream close.

Minh Tu

Jun 23, 2021, 1:33:10 PM
to VAST Community Forum
Hi Lou,

Thanks for your response. I had code similar to yours to read the file. However, I don't want to save each line of the file as a record in the database table. I want to save the whole file as the value of one column of the table. I defined the column as varbinary(max). Something like this:

CREATE TABLE ITEM_ATTACHMENT(
    OBJECT_ID VARCHAR(13) NOT NULL PRIMARY KEY CLUSTERED,
    ...
    ATTACHMENT VARBINARY(MAX) NOT NULL);

After I read the file, I save the contents of the stream to a data model, which is used to populate the column values for one table row. When I try to insert the row, I get an error. I think I need to convert the content of the stream to a binary format, but I'm not sure how to do it.

Thanks,
Minh

Richard Sargent

Jun 23, 2021, 1:41:17 PM
to VAST Community Forum
" When I try to insert the row, I get an error. "

As always, it would be best to include the complete details of the error you encountered and, when possible, the relevant code that triggered the error.

Louis LaBrunda

Jun 23, 2021, 5:19:09 PM
to VAST Community Forum
Hi Minh,

The code below should read the entire file into the variable #data as a byte array, which I think you can then save as you want.

Lou

| file fileStream prompter data |

prompter := CwFileSelectionPrompter new.
file := prompter
	accessType: XmOPEN;
	searchMask: '*.*';
	prompt.

file isNil ifTrue: [^nil].
fileStream := CfsReadFileStream open: file.
fileStream isCfsError ifTrue: [^fileStream inspect].
data := fileStream contents asByteArray.
fileStream close.
data inspect.
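
Once #data holds the bytes, they still have to get into the row before the insert. A minimal sketch, assuming your row object supports a dictionary-like #at:put: keyed by column name (verify that against your image; row is a placeholder for however you obtain your table row):

row at: 'ATTACHMENT' put: data.	"store the whole file as one binary value"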

Minh Tu

Jun 23, 2021, 6:47:06 PM
to VAST Community Forum
Hi Richard and Lou,

I use the AbtTable>>#addRow:ifError: method to insert the data. Now I don't get an error anymore (and I forgot to record the previous error before I made a lot of changes to my code). However, no data is inserted into the column, even with the #asByteArray method that Lou recommended.

The row inspection:
AbtOdbcRow of type: 
  AbtOdbcVarCharField(OBJECT_ID(13)) Data: 'A0001RD00000Q'
  AbtOdbcVarCharField(ITEM(13)) Data: 'A0000CZ00000P'
  AbtOdbcVarCharField(FILE_NAME_PATH(255)) Data: 'C:\data\pl2.txt'
  AbtDatabaseVarBinaryField(ATTACHMENT(0)) Data: 

Thanks,
Minh

Richard Sargent

Jun 23, 2021, 6:56:08 PM
to VAST Community Forum
On Wednesday, June 23, 2021 at 3:47:06 PM UTC-7 Minh Tu wrote:
Hi Richard and Lou,

I use the AbtTable>>#addRow:ifError: method to insert the data. Now I don't get an error anymore (and I forgot to record the previous error before I made a lot of changes to my code). However, no data is inserted into the column, even with the #asByteArray method that Lou recommended.

 
Minh, is this inspection done before adding the row or upon reading it back?
If the latter, make sure that the row you (think you) are adding really is correct. (It *probably* is, but certainty is the only thing that matters.)
In the send of #addRow:ifError:, what does the argument to ifError: do? I am guessing it takes a block, so you want to make sure the block will actually let you know about the error. "self halt" is your friend when debugging something that doesn't appear to work the way you think it should.
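
For illustration, a minimal shape for that block might be the sketch below; myTable and myRow are placeholders for your own objects, and the exact protocol of the error object handed to the block is a guess (inspect it first):

myTable
	addRow: myRow
	ifError: [:anError |
		Transcript show: 'addRow: failed: ', anError printString; cr.
		self halt].	"stop here so the error can be examined in the debugger"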

Louis LaBrunda

Jun 24, 2021, 8:07:36 AM
to VAST Community Forum
Hi Minh,

Richard has a good point about the error block and letting you know if there was an error or not.

I normally use the highest level of database classes and methods.  Some of them do things like begin transaction, update/insert and commit transaction.  If you are not using those methods, you will need to do the transaction stuff yourself.
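
Done by hand, that might look roughly like the sketch below. This is from memory of the VAST connection protocol, so verify the selectors against your image; myTable and myRow are placeholders:

| connection |
connection := AbtDbmSystem activeDatabaseConnection.
connection beginUnitOfWork.
myTable addRow: myRow ifError: [:anError |
	connection rollbackUnitOfWork.	"undo the partial work on failure"
	^anError].
connection commitUnitOfWork.	"make the insert permanent"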

Lou

Marcus Wagner

Jun 24, 2021, 8:13:30 AM
to VAST Community Forum
Hi Minh,
I saw you are using varbinary(max) in your DDL.
This potentially leads to another caveat.
All fields declared in the DB as xxx(max) require special attention.

Did you provide enough low-level buffer space for the ODBC interface?
Inspect AbtDatabaseVarBinaryField bufferSize to check whether you allocated enough space.
You can set this value higher at image load time.
Smalltalk and ODBC require allocation of a full database record in advance of a transfer.

If you map to AbtDatabaseVarBinaryField in Smalltalk, it uses 32767 bytes by default (potentially far less than you actually need).
Any content larger than the (lazily) initialized buffer size will be (silently) truncated in the data transfer.
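
In a workspace, the check and the fix look like this (the 10 MB value is only an example):

AbtDatabaseVarBinaryField bufferSize.	"inspect the current default (32767)"
AbtDatabaseVarBinaryField bufferSize: 10 * 1024 * 1024.	"allow fields up to 10 MB"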

[Anyhow, varbinary(max) or varchar(max) is now easier to handle than ever. In ancient times you had to use C(B)LOB and a protocol to be able to transfer such fields using ODBC.]

And beware: if you set this value very high, your image might grow too large, in particular with a 32-bit image.

Kind regards
Marcus

Minh Tu

Jun 24, 2021, 3:35:56 PM
to VAST Community Forum
Hi Richard and Lou,

The AbtOdbcRow inspection is done right before adding the row to the database table. My error block has code to print out the error's information to the Transcript for now so I know what to fix. I'm not getting any error at this point. The problem now is that no data is copied from the parameter object to the ATTACHMENT column.

Hi Marcus,

I also think the (max) in the SQL statement has something to do with the problem. When I walked through the Abt code in AbtDatabaseVarBinaryField, I saw that the baseLength attribute of the field has the value 0, and thus no data is copied from the parameter object to the ATTACHMENT column. I want the column to be able to handle the content of a 10-MB file, so I defined the length of the column as (max). However, (max) doesn't seem to work for me. This is where I'm at right now.

Thanks,
Minh

Richard Sargent

Jun 24, 2021, 4:23:10 PM
to VAST Community Forum
On Thursday, June 24, 2021 at 12:35:56 PM UTC-7 Minh Tu wrote:
Hi Richard and Lou,

The AbtOdbcRow inspection is done right before adding the row to the database table. My error block has code to print out the error's information to the Transcript for now so I know what to fix. I'm not getting any error at this point. The problem now is that no data is copied from the parameter object to the ATTACHMENT column.

At this point, I think your best choice is to put a breakpoint on the code that creates and populates the AbtDatabaseVarBinaryField, so that you can step through the code as it executes and discover where its behavior differs from what you expect. Your AbtOdbcRow instance does not look correct, so find out what's going wrong with how it gets created.

Minh Tu

Jun 24, 2021, 5:40:09 PM
to VAST Community Forum
Yes, I agree with you Richard. I'm walking through the code and trying a few things.

Richard Sargent

Jun 24, 2021, 8:12:31 PM
to VAST Community Forum
On Thursday, June 24, 2021 at 2:40:09 PM UTC-7 Minh Tu wrote:
Yes, I agree with you Richard. I'm walking through the code and trying a few things.

Please let us know what you learn, even if you only document the final answer and not the steps of your journey.
That will help the next person who tries to do what you are doing.

By the way, an arbitrary maximum attachment size may work for your application, but it won't work in general.
Way back in the dark ages, I worked with computer graphics and a 4K raw file would be 64MB. A company that I help encounters PDFs as large as 300MB+. Unless the database can store arbitrarily large fields without penalty, a flexible implementation would probably have to store an attachment as a series of "chunks" and allow arbitrarily large numbers of chunks. In other words, let your storage medium be the limiting factor, not your design.
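
To make the chunk idea concrete, here is a minimal sketch of the slicing step; chunkSize is arbitrary, and the actual insert of each chunk (with its sequence number) is left to the #addRow:ifError: code discussed above:

| data chunkSize |
data := fileStream contents asByteArray.	"the whole attachment, as read earlier"
chunkSize := 32000.	"keep each piece within the field's buffer size"
1 to: data size by: chunkSize do: [:start | | chunk |
	chunk := data copyFrom: start to: (start + chunkSize - 1 min: data size).
	"insert chunk, together with its sequence number, as one row here"
	Transcript show: 'chunk ', (start // chunkSize + 1) printString,
		' holds ', chunk size printString, ' bytes'; cr].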

Louis LaBrunda

Jun 25, 2021, 1:52:26 PM
to VAST Community Forum
Hi Minh,

Can you tell us a little about what this is for? What are the reasons for putting the files in an SQL database? What does being in an SQL database add?

For small amounts of data (a small element size, not the total database size), being in a database reduces the disk space used and makes getting at the data easier than having a lot of small files. But for large files, I think the directory system probably does a good enough job of accessing the files. Plus, one could grow their own indexing system in Smalltalk without much effort.
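
For instance, the simplest form of such a home-grown index is just a Dictionary in the image, mapping a business key to a file path (the key and path below are taken from the row inspection earlier in the thread, purely for illustration):

| index |
index := Dictionary new.
index at: 'A0001RD00000Q' put: 'C:\data\pl2.txt'.
Transcript show: (index at: 'A0001RD00000Q'); cr.	"writes the stored path"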

Lou

Marcus Wagner

Jun 25, 2021, 5:24:10 PM
to VAST Community Forum
Hi Minh,
Just a disclaimer at the beginning: my knowledge stems from IBM VAST 6.x and earlier. I do NOT know the current implementation of the later versions provided by Instantiations, that is, VAST 7 up to the current VAST 2021. Nevertheless, I hope my advice can help you, under the assumption that the DB implementation did not change significantly.

Having 0 for baseLength is bad. The ODBC field design and implementation requires you to allocate enough memory to transfer the longest possible field in either direction (to or from the database).

Thus you should try to set  (for example)

AbtDatabaseVarBinaryField  bufferSize: 10*1024*1024 

for 10MB max. length of ANY field of type "VARBINARY(MAX)" in advance. 

I admit this is very inflexible, in particular if a record is declared with several such fields, not to mention if you are running a query that returns many records of that kind.

Remember: if you set that value too low, any field longer than that buffer will be truncated in transit without error [whether you run an update, insert or select].

The ODBC API demands this low-level buffer wiring per field.
The Smalltalk implementation [remember, VAST 6.x and earlier] realizes this with lazy initialization of this bufferSize.

As Richard stated, in such cases a transfer in chunks would be preferable and would solve this.
Actually, ODBC provides this in its API (and the DBs provided it via the CLOB and BLOB types).
As I remember, VAST 6.x and earlier implemented CLOB and BLOB support for DB/2, and there was support for that in the visual part editors for query statements, though possibly nearly undocumented; unfortunately, this support is product specific.
There were also traces of that in the IBM VisualAge Oracle BLOB implementation, but nothing for ODBC.

So there are two ways to deal with this: 

A pragmatic solution: guarantee that no variable field is longer than a predetermined length, whatever that value might be, set this value in advance using AbtDatabaseVarBinaryField bufferSize: (typically in the load code of your application), and use the given visual parts with utmost care, in the sense that you will suffer problems if you transfer records that are too large.
That means a table with many varbinary(max) columns will allocate many buffers, each with the maximum possible size stated in bufferSize (your value, set in advance); for example, three such columns with a 10 MB bufferSize already require about 30 MB of buffer space per record. As I warned you earlier, that may blow up your image size.

The other way is to write your own low-level implementation of the ODBC protocol to transfer chunks. This is very complex. To give you the idea, the BLOB logic was roughly:
a) declare a record with all fields that are not too large (e.g. those that are neither varbinary(max) nor varchar(max)) and a stub for every field of type varbinary(max) or varchar(max),
b) run an insert, update or select which initially transfers only the fields that are not stubbed,
c) for every field of type varbinary(max) or varchar(max) (i.e. for every stub), implement a transfer loop that moves the data incrementally in chunks of reasonable size (thus requiring the allocation and wiring of only ONE buffer).
Before b) and after c) you have to provide the standard ODBC connection protocol start and stop.
As far as I remember, this block transfer of stubs also required a technique of specialized callbacks.

The design of the many AbtXxxField classes supports this approach of product-specific solutions.
In particular, there is a layer of product-independent classes for every field type, e.g. AbtDatabaseVarBinaryField or AbtDatabaseVarCharacterField, to be used in your application to declare queries, tables and host variables, and there is a layer of product-specific fields and records for ODBC, DB2 or other products, which are mapped (registered) to be used instead of the generic fields when you open and run your connection with a specific product. This mapping is static: you cannot change the database product at runtime (but I guess this is not a general problem).
In other words, if you only loaded the ODBC support and never the DB2 or Oracle database features, you will not have seen this approach.

Changing VARBINARY(MAX) to VARBINARY(10000) (10000 or any other value lower than max) will not change anything and will not help you.

I suggest you attempt the following:
a) determine a reasonable value for the maximum transferable length in your scope (as with Web uploads, deny any attempt to transfer anything longer than that; it is up to you to detect this, since neither ODBC nor the database will do it for you),
b) refrain from declaring tables, queries or host variables with many such fields; the trade-off is running many queries instead of one.
For b) I can give an example of what I mean:
Table A with col1, col2, col3, all three columns varbinary(max), is bad in this sense.
Table B with colNo, column, where colNo is an integer and column is varbinary(max), is better.
Instead of running one select on table A that returns one record at once, you will have to run a cursor with many selects on table B, each returning only ONE record per select. A multi-row result over table B would have the same problem as the single select on table A.
-> You have to work with LIMIT values in the SELECT to control the maximum number of rows returned from the database per call.

Kind regards
Marcus

Minh Tu

Jun 25, 2021, 8:56:05 PM
to VAST Community Forum
Hi Richard,

Yes, if I find out anything, I'll post an update.

Thanks,
Minh

Minh Tu

Jun 25, 2021, 9:04:15 PM
to VAST Community Forum
Hi Lou,

The file is an attachment to a business item. The ITEM_ATTACHMENT table has a foreign key to the ITEM table. I'm trying to see whether putting the file attachment into the database is a feasible solution. I will also have to consider performance. The other option I'm thinking of is to leave the file on disk and just save the path of the file in the database.

Minh Tu

Jun 25, 2021, 9:13:46 PM
to VAST Community Forum
Hi Marcus,

"Thus you should try to set  (for example)

AbtDatabaseVarBinaryField  bufferSize: 10*1024*1024 

for 10MB max. length of ANY field of type "VARBINARY(MAX)" in advance."

That's actually what I wanted to try next. Right now, I just got pulled to a different task. When I'm back to this proof of concept, I'll experiment with that.

Richard Sargent

Jun 28, 2021, 1:19:40 PM
to VAST Community Forum
On Friday, June 25, 2021 at 6:04:15 PM UTC-7 Minh Tu wrote:
Hi Lou,

The file is an attachment to a business item. The ITEM_ATTACHMENT table has a foreign key to the ITEM table. I'm trying to see whether putting the file attachment into the database is a feasible solution. I will also have to consider performance. The other option I'm thinking of is to leave the file on disk and just save the path of the file in the database.

The one disadvantage of storing attachments outside the database is the *possibility* of a logical inconsistency between the two data stores.
e.g. When can you safely delete the attachment file? What happens if it gets deleted or moved or renamed?

I'm not saying it's a bad idea. It has a lot of merit, but you need to plan and execute carefully. The things you don't think about will sooner or later bite you. (That's a general statement, but it's absolutely true.)

mar...@feldtmann.online

Jun 28, 2021, 5:20:24 PM
to VAST Community Forum
The approach (files in database) is a valid pattern and very often discussed - actually, putting large files into a relational database may result in heavy performance problems. Some databases therefore have special support and store files or blobs in a special extent/directory/hard-disk space; the reason is that otherwise useless disk swapping may result. It depends on how many files you store and how large they are.

I tried this approach with relational databases and NoSQL databases (CouchDB - which had special support for that) and most approaches simply fail when used very heavily (many thousands of pictures).

So I now store the files outside the database in normal directories, but then I have to manage multiple directories (for hundreds of thousands of pictures), due to performance problems with managing lots of files in one directory on Windows (around 25,000 files) and sometimes even Linux (around 100,000).

To manage the external files with internal SQL (or general database transactions), I add structures (file commands) to the database - a simple table with some columns holding information about success or failure - and an external task queries this table and executes the commands. The external task may then update the domain structure to signal that the files are available.

So external tasks are introduced, and this means either polling the database for new data/commands (easy, but bad) or introducing some kind of event system ... here again, a messaging library like ZeroMQ will be a great help.

The next point is how to retrieve the content again. Retrieving a file from a database may take some time ... in the case of web systems, a link is pretty nice to have ... the download can then be done by external tools and the file is available at once.

If anyone tries to get a file in my system, I copy the file, or a symbolic link to it, to the current session folder on the server (accessible via Apache), so the file is available as long as the session is valid. To make it even more secure, you may add database structures to hold the information about the link or the copied file and how long it should be valid (with an external task working on that) - which of course is difficult to predict if you have a very large file and a bad web connection. Other than that, the request for a file (regardless of whether it has been downloaded or not) may be logged in the database, again for security reasons.

This is also the approach I use currently with Gemstone/S in various projects/products.