High Memory usage due to uploads_in_blob

Mandar Vaze

Apr 2, 2015, 5:03:15 AM
to web...@googlegroups.com
Hi,

I have an application that uses the uploads_in_blob feature as follows:

    db._adapter.uploads_in_blob = True

This creates an additional column of type BYTEA at the database level (I am using PostgreSQL, if it matters).

Here is a sample table (This is just to give you the idea, this is not the exact definition):

    db.define_table('uploaded_docs',
        Field('document_name', 'string', default=''),
        Field('doc', 'upload', label=T('Document'),
              uploadfield=True,
              requires=IS_NOT_EMPTY(
                  error_message=T('Select a document to upload'))),
        Field('doc_property1', 'string'),
        Field('doc_property2', 'string'),
        Field('document_date', 'date'),
        Field('status', 'string', default='Open',
              requires=IS_IN_SET(['Draft', 'Approved', 'Under Review'])),
        Field('remarks', 'string', default='')
    )


  1. Users can upload documents in the "doc" field (which is the upload field).
  2. There is some metadata, as described by the other fields.
  3. There is no restriction on the size of the document (customer requirement, can't negotiate).

These documents are shown using SQLFORM.grid widget (automatic pagination, search, all the cool things)

Here is the problem:
Each time a DB query is run (and the results are returned to web2py), every row returned also carries the uploaded document itself.
E.g. if each row has a document of, say, 5 MB, then the 20 rows returned by default pagination consume 100 MB.
I am not sure when this memory is released/GC'ed, so after going through, say, 5 such queries, memory consumption can reach 500 MB.
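
To make the arithmetic concrete, here is a back-of-the-envelope estimate using the figures above. The 5 MB document size and the five queries are the illustrative numbers from this post, and the total is a worst case that assumes nothing is freed between queries, which may not hold in practice.

```python
# Rough worst-case estimate; assumes no memory is released between queries.
MB = 1024 * 1024
doc_size = 5 * MB      # illustrative size of one uploaded document
rows_per_page = 20     # rows returned per page by default pagination
queries = 5            # queries run before any memory is reclaimed

per_query = doc_size * rows_per_page
total = per_query * queries
print(per_query // MB, "MB per query;", total // MB, "MB after", queries, "queries")
```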

I have deployed the app on WebFaction, with the default memory block of 512 MB.

So at this point the app is killed, resulting in a "502 Bad Gateway" error for the end user.

The customer may not always download the file; they may just be looking at the records' metadata, so access to the BLOB isn't needed until the user clicks the download link (denoted by the "file" URL).
When NOT using uploads_in_blob, the upload field only contains a filename, and the file itself resides on disk in the uploads folder. IMO the filesystem is accessed only when needed.
Is there a way to handle the BLOB field in a similar fashion (access only when needed)?

Are there any suggestions on how to limit the memory usage?
(The app is already in production, so a fix via code changes is definitely preferred over a data migration.)
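
As a rough illustration of the "access only when needed" idea, here is a minimal sketch at the SQL level. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL and does not go through web2py's DAL; the column name doc_blob, the 1 MB payload, and the helper fetch_blob are assumptions made for the example only.

```python
import sqlite3

# In-memory SQLite database standing in for the real PostgreSQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE uploaded_docs ("
    "id INTEGER PRIMARY KEY, document_name TEXT, doc TEXT, doc_blob BLOB)"
)
payload = b"x" * (1024 * 1024)  # 1 MB stand-in for an uploaded document
conn.executemany(
    "INSERT INTO uploaded_docs (document_name, doc, doc_blob) VALUES (?, ?, ?)",
    [("report %d" % i, "doc_%d.pdf" % i, payload) for i in range(20)],
)

# Listing view: name only the metadata columns, so no blob is read.
listing = conn.execute(
    "SELECT id, document_name, doc FROM uploaded_docs ORDER BY id"
).fetchall()

def fetch_blob(row_id):
    """Load one document's bytes, only when a download is requested."""
    row = conn.execute(
        "SELECT doc_blob FROM uploaded_docs WHERE id = ?", (row_id,)
    ).fetchone()
    return row[0] if row else None

listing_bytes = sum(len(str(value)) for row in listing for value in row)
print(len(listing), "rows listed,", listing_bytes, "bytes of metadata")
print(len(fetch_blob(listing[0][0])), "bytes fetched for one download")
```

The listing query touches a few hundred bytes of metadata for 20 rows, while the blob is read for a single row only when it is explicitly requested.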

Thanks,
-Mandar

Paolo Valleri

Apr 2, 2015, 10:20:29 AM
to web...@googlegroups.com
That is a bug because Grid selects the 'hidden' doc_blob field.
Please open an issue on github

Paolo

Paolo Valleri

Apr 2, 2015, 10:28:49 AM
to web...@googlegroups.com
Try changing the line at https://github.com/web2py/web2py/blob/master/gluon/sqlhtml.py#L2151
to:
filter1 = lambda f: isinstance(f, Field) and f.type != 'blob'
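
To show what this predicate does in isolation, here is a small sketch. The Field class below is a minimal stand-in, not web2py's real Field, and the field list is a hypothetical approximation of the table once uploads_in_blob is enabled.

```python
class Field:
    """Minimal stand-in for web2py's Field; only name and type are modeled."""
    def __init__(self, name, type="string"):
        self.name = name
        self.type = type

# Hypothetical field list resembling the table with the hidden blob column.
table_fields = [
    Field("id", "id"),
    Field("document_name"),
    Field("doc", "upload"),
    Field("doc_blob", "blob"),
    Field("remarks"),
    "not a field",  # non-Field entries are rejected by the isinstance check
]

# The suggested predicate: keep Field objects whose type is not 'blob'.
filter1 = lambda f: isinstance(f, Field) and f.type != 'blob'

selected = [f.name for f in table_fields if filter1(f)]
print(selected)
```

With this predicate, the upload field holding the filename is kept, but the blob column holding the file contents is left out of the selected fields.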

Paolo

Mandar Vaze / मंदार वझे

Apr 2, 2015, 12:35:35 PM
to web...@googlegroups.com
Paolo,

On Thu, Apr 2, 2015 at 7:58 PM, Paolo Valleri <paolo....@gmail.com> wrote:
> Try to change https://github.com/web2py/web2py/blob/master/gluon/sqlhtml.py#L2151
> with
> filter1 = lambda f: isinstance(f, Field) and f.type != 'blob'

Thanks for the workaround.
 
On Thursday, April 2, 2015 at 4:20:29 PM UTC+2, Paolo Valleri wrote:
> That is a bug because Grid selects the 'hidden' doc_blob field.
> Please open an issue on github

Yes, I will.

-Mandar

 


Mandar Vaze

Apr 2, 2015, 12:42:41 PM
to web...@googlegroups.com


On Thursday, April 2, 2015 at 7:50:29 PM UTC+5:30, Paolo Valleri wrote:
> That is a bug because Grid selects the 'hidden' doc_blob field.
> Please open an issue on github
