storing large amounts of text in the DB

Mike

Jan 11, 2013, 1:35:50 AM
to django...@googlegroups.com
My users will upload text documents ranging from hundreds to thousands of words. At the moment I store the text in a TextField. Is this going to cause a performance problem in the future, or would it be better to store the text on the file system and put a file path in the data model? The text does not need to be indexed, and I'm using MySQL. I suppose the best way is to profile the app and see whether text retrieval is a bottleneck, but I thought someone on this list would already have experience with this.
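For a sense of scale, here is a minimal sketch (hypothetical helper names, standard library only) that measures a document's UTF-8 size and reports the smallest MySQL text column type that holds it. MySQL's limits are counted in bytes, not characters; English prose averages roughly 6 bytes per word including spaces (an assumption), while non-ASCII characters take 2 to 4 bytes each in UTF-8. Django's MySQL backend maps TextField to LONGTEXT, so the column limit itself is not the constraint here.

```python
# Sketch: does a document of a given length fit in MySQL's text column types?
# The limits below are maximum lengths in bytes, not characters.
MYSQL_TEXT_LIMITS = {
    "TEXT": 2**16 - 1,        # 65,535 bytes
    "MEDIUMTEXT": 2**24 - 1,  # 16,777,215 bytes
    "LONGTEXT": 2**32 - 1,    # 4,294,967,295 bytes
}


def utf8_size(text: str) -> int:
    """Bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def smallest_fitting_type(text: str) -> str:
    """Name of the smallest MySQL text type whose byte limit the text fits in."""
    size = utf8_size(text)
    for name, limit in MYSQL_TEXT_LIMITS.items():  # dicts preserve insertion order
        if size <= limit:
            return name
    raise ValueError(f"{size} bytes is larger than LONGTEXT allows")


if __name__ == "__main__":
    # Hypothetical 5,000-word document: 5,000 five-letter words plus spaces.
    doc = " ".join(["lorem"] * 5000)
    print(utf8_size(doc), smallest_fitting_type(doc))  # 29999 TEXT
```

Even several thousand words of mostly ASCII text stays well under the 64 KB TEXT limit.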

iñigo medina

Jan 11, 2013, 3:30:39 AM
to django...@googlegroups.com


On 11/01/2013 07:36, "Mike" <mike...@gmail.com> wrote:
>
> My users will upload text documents ranging from hundreds to thousands of words.  At the moment I store the text in a TextField.  Is this going to cause a performance problem in the future or would it be better to store the text on the file system and put a file path in the data model?  The text does not need to be indexed and I'm using MySQL.

That depends largely on the operations you perform on that field. Fetching? Searching? Concurrent updates?

        Iñigo

Mike

Jan 11, 2013, 5:22:20 AM
to django...@googlegroups.com


On Friday, January 11, 2013 9:30:39 AM UTC+1, iñigo medina wrote:
> That depends largely on the operations you perform on that field. Fetching? Searching? Concurrent updates?

I only need to fetch the text; I don't search or update it. I guess this is fairly case-specific, so I don't need to worry about it now, but if I run into performance problems in the future, this may be the first place to look.

Sreenivas Reddy T

Jan 11, 2013, 1:47:18 AM
to django...@googlegroups.com
On Fri, Jan 11, 2013 at 12:05 PM, Mike <mike...@gmail.com> wrote:
> My users will upload text documents ranging from hundreds to thousands of words.

What kind of documents? PDF? Word docs? Excel?

> At the moment I store the text in a TextField.  Is this going to cause a performance problem in the future or would it be better to store the text on the file system and put a file path in the data model?

If you upload to a folder and store the path in the DB, then whenever somebody moves the folder you need to update all the corresponding paths in the database. If somebody deletes the folder, everything is gone. You also need to handle duplicate file names.

Having said that, serving documents from the database is much slower than serving them from a folder.

Just my 2 cents.

> The text does not need to be indexed and I'm using MySQL.  I suppose the best way is to profile the app and see if the text retrieval is a bottleneck but I thought someone on this list would already have experience in this.
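The path-management concerns raised above (moved folders, duplicate names) can be reduced by storing only a path relative to a configurable root and naming each file after a hash of its contents. This is a minimal standard-library sketch; save_text, load_text and the two-character sharding scheme are hypothetical, not Django APIs (Django's own FileField likewise stores names relative to the storage location, MEDIA_ROOT by default). It does nothing about a deleted folder; that still calls for backups.

```python
import hashlib
from pathlib import Path


def save_text(root: Path, text: str) -> str:
    """Store text under root and return its path relative to root.

    The file is named after the SHA-256 digest of its content, so two different
    uploads will not overwrite each other just because users gave them the same
    name, and identical content is stored only once.
    """
    data = text.encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    # Shard by the first two hex digits so no single directory grows too large.
    rel_path = Path(digest[:2]) / f"{digest}.txt"
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        target.write_bytes(data)
    # Only this relative path goes in the database; moving the folder then
    # means changing `root` in one setting instead of rewriting every row.
    return rel_path.as_posix()


def load_text(root: Path, rel_path: str) -> str:
    """Read a stored document back, given the relative path from the database."""
    return (root / rel_path).read_text(encoding="utf-8")
```

Because the root is supplied at call time (for example from a settings value), relocating the storage directory touches one configuration line rather than every row.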

Tim Chase

Jan 11, 2013, 10:08:54 AM
to django...@googlegroups.com, Mike
On 01/11/13 00:35, Mike wrote:
> My users will upload text documents ranging from hundreds to
> thousands of words. At the moment I store the text in a
> TextField. Is this going to cause a performance problem in the
> future or would it be better to store the text on the file
> system and put a file path in the data model? The text does not
> need to be indexed and I'm using MySQL.

If it doesn't need to be indexed (by which I also assume that you're
not searching by its contents), that's actually a pretty small
quantity of data to stash in a TEXT field, so it should pose no
problem. The only other issue might be if you have cases where you
bring back large quantities of these fields and try to display them,
in which case you're pulling N records times M
average-bytes-per-text-record. But usually users don't want to see
that volume of data.
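The N times M estimate is also why list pages usually shouldn't load the text column at all. A minimal sketch, with the standard library's sqlite3 standing in for MySQL and a hypothetical document table: the list query selects only small columns, and the body is fetched for one document at a time. In Django the analogous tools are QuerySet.defer() and QuerySet.only().

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO document (title, body) VALUES (?, ?)",
    [(f"doc {i}", "word " * 3000) for i in range(100)],  # ~15 KB body each
)


def list_documents(conn):
    """List view: only id and title, so the large body column is never read."""
    return conn.execute("SELECT id, title FROM document ORDER BY id").fetchall()


def get_document(conn, doc_id):
    """Detail view: pull the body of exactly one document."""
    return conn.execute(
        "SELECT title, body FROM document WHERE id = ?", (doc_id,)
    ).fetchone()


def payload_bytes(n_records: int, avg_bytes: int) -> int:
    """The estimate above: N records times M average bytes of text per record."""
    return n_records * avg_bytes


if __name__ == "__main__":
    print(len(list_documents(conn)))      # 100
    print(len(get_document(conn, 1)[1]))  # 15000
    print(payload_bytes(100, 15000))      # 1500000
```

With a hypothetical Document model, the Django equivalent of the list query would be along the lines of Document.objects.defer("body").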

> I suppose the best way is to profile the app and see if the text
> retrieval is a bottleneck but I thought someone on this list
> would already have experience in this.

I'd code under the assumption that it's not an issue, and then
profile if it becomes one.

-tkc
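If profiling does become necessary, a simple first step is timing the retrieval path directly. This is a minimal, framework-agnostic sketch; median_seconds is a hypothetical helper, and the lambda stands in for whatever code loads one document. Inside Django, with DEBUG = True, django.db.connection.queries also records each SQL query and its execution time.

```python
import statistics
import time
from typing import Callable


def median_seconds(fn: Callable[[], object], repeat: int = 50) -> float:
    """Call fn() `repeat` times and return the median wall-clock duration in seconds."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload: replace with the call that fetches one document.
    body = "word " * 3000
    elapsed = median_seconds(lambda: body.encode("utf-8"))
    print(f"median: {elapsed * 1e6:.1f} microseconds")
```

The median is less sensitive than the mean to one-off pauses; for a breakdown of where time goes inside a view, the standard library's cProfile module is the next step.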


Mike

Jan 11, 2013, 11:03:07 AM
to django...@googlegroups.com


On Friday, January 11, 2013 7:47:18 AM UTC+1, Srinivas Reddy T wrote:
> What kind of documents? PDF? Word docs? Excel?
>
> If you upload to a folder and store the path in the DB, then whenever somebody moves the folder you need to update all the corresponding paths in the database. If somebody deletes the folder, everything is gone. You also need to handle duplicate file names.

Not a problem in this case: users will upload Word, PDF and other documents, but I will extract the text content and discard the original files.

Users will only see one document at a time, so I'll follow Tim Chase's advice and assume it's not an issue unless it becomes one in the future.

Thanks, everyone, for the replies. I have to say I like this mailing list a lot. It's a lot easier to ask questions here than on Stack Overflow.
