Any benefit in storing shorter strings or numbers to represent data for a choice char field?

30 views
Skip to first unread message

Radomir Wojcik

unread,
Dec 17, 2014, 11:56:08 AM12/17/14
to django...@googlegroups.com
The official django doc uses this as an example (see below). Is there any point to storing 'FR' instead of 'Freshman" so the choice matches what is stored? Is it faster at all? Saves room? What about for querying the data? Then you have to lookup the symbol 'FR' in the tuple if someone wants to query by 'Freshman'. 

What is the best practice? Is it even more efficient if integers were used, like 1 value to represent Freshman and so on? To me it makes sense to store 'Freshman' as 'Freshman' but I'd like to get some insight.

from django.db import models

class Student(models.Model):
    FRESHMAN = 'FR'
    SOPHOMORE = 'SO'
    JUNIOR = 'JR'
    SENIOR = 'SR'
    YEAR_IN_SCHOOL_CHOICES = (
        (FRESHMAN, 'Freshman'),
        (SOPHOMORE, 'Sophomore'),
        (JUNIOR, 'Junior'),
        (SENIOR, 'Senior'),
    )
    year_in_school = models.CharField(max_length=2,
                                      choices=YEAR_IN_SCHOOL_CHOICES,
                                      default=FRESHMAN)

    def is_upperclass(self):
        return self.year_in_school in (self.JUNIOR, self.SENIOR)

Carl Meyer

unread,
Dec 17, 2014, 12:14:24 PM12/17/14
to django...@googlegroups.com
Hi Radomir,

On 12/17/2014 09:56 AM, Radomir Wojcik wrote:
> The official django doc uses this as an example (see below). Is there any
> point to storing 'FR' instead of 'Freshman" so the choice matches what is
> stored? Is it faster at all? Saves room? What about for querying the data?
> Then you have to lookup the symbol 'FR' in the tuple if someone wants to
> query by 'Freshman'.
>
> What is the best practice? Is it even more efficient if integers were used,
> like 1 value to represent Freshman and so on? To me it makes sense to store
> 'Freshman' as 'Freshman' but I'd like to get some insight.

I'm no expert on database performance (and what I do know is entirely
specific to Postgres), but I think that a shorter string will be
somewhat more efficient (both in terms of space and speed) than a longer
string, and an integer more efficient than either. However, I'd consider
worrying about that to be premature optimization, unless you have
evidence of it being a problem.

For me the two primary considerations here are:

1) It is useful to separate the database-stored value from the displayed
representation for humans, because the latter is prone to change (in a
sense this is a normalization, so the real name is only stored in one
place, not in various rows throughout your table).

2) I find some ease-of-development value in using a text slug rather
than an integer for the database-stored value, simply because it makes
database rows more readable when working with the database directly.

Carl
signature.asc

Radomir Wojcik

unread,
Dec 17, 2014, 1:23:45 PM12/17/14
to django...@googlegroups.com
Thanks for the insight, I'm on the same page as you.

My only real concern is that I will have to  use a function, such as this one to get the human readable version of the stored data, then you need to lookup the value from the dict. And if you're doing querying on that object using key words, you have to query based on the human readable names, so that will cost you to look them all up every time and do a like SQL query or contains every time.

def school_year_to_dict():
return dict((x, y) for x, y in YEAR_IN_SCHOOL_CHOICES)

Carl Meyer

unread,
Dec 17, 2014, 1:40:24 PM12/17/14
to django...@googlegroups.com
Django will actually attach a helper method to your model for this; see
https://docs.djangoproject.com/en/1.7/ref/models/instances/#django.db.models.Model.get_FOO_display

If you're doing keyword-based searching and worried about performance,
you'll want to index all the queryable fields into a search engine
(Elasticsearch, Solr) or a full-text-search-indexed field in Postgres
anyway, so the lookup from slug/id to display name only happens at
indexing time, not on every search.

Carl

signature.asc

Radomir Wojcik

unread,
Dec 17, 2014, 1:52:52 PM12/17/14
to django...@googlegroups.com
So to answer my own question, you wouldn't use the short version in the db if you plan to do like/contains queries on them. Correct me if I'm wrong.

Unless I plan to use haystack later and index the human readable form later. I think I understand now, thanks 

Carl Meyer

unread,
Dec 17, 2014, 1:58:29 PM12/17/14
to django...@googlegroups.com


On 12/17/2014 11:52 AM, Radomir Wojcik wrote:
>
> So to answer my own question, you wouldn't use the short version in the db if you plan to do like/contains queries on them. Correct me if I'm wrong.

True. I don't think I've ever seen a case where a field had choices and
I also wanted to do like/contains queries on it. If a field has choices
set, it's likely that a more useful way to let a user search on that
field would be a fixed-choices filter rather than a keyword search.

> Unless I plan to use haystack later and index the human readable form
> later. I think I understand now, thanks

Right, the exception to the above would be when the value of this field
is just one field feeding into a larger keyword search on the whole
object -- and in that case I wouldn't ever do like/contains on the
original field, I'd feed its human readable form into a full text search
engine.

Carl

signature.asc

Alan Hicks

unread,
Dec 17, 2014, 2:12:38 PM12/17/14
to django...@googlegroups.com, Radomir Wojcik
You might want to try using a many to one relationship, it adds a table
lookup but offers flexibility where the data is in the database instead
of the application and for large data sets can be much faster. It's
also good database normalization practice.

class StudentType(Models.Model):
short_name = models.CharField(max_length=2)
name = models.CharField(max_length=30)

class Student(models.Model):
student_type = models.ForeignKey(StudentType)

It's good to learn all the different ways of doing things because each
project is different and can benefit from either choice. For example a
traditional normalized database such as PostgreSQL offers many benefits,
whereas NoSQL used in Big Data is the opposite and also offers many
benefits.

https://docs.djangoproject.com/en/1.7/topics/db/models/#many-to-one-relationships
http://en.wikipedia.org/wiki/Database_normalization
https://en.wikipedia.org/wiki/PostgreSQL
https://en.wikipedia.org/wiki/NoSQL

Alan
> --
> You received this message because you are subscribed to the Google
> Groups "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to django-users...@googlegroups.com
> <mailto:django-users...@googlegroups.com>.
> To post to this group, send email to django...@googlegroups.com
> <mailto:django...@googlegroups.com>.
> Visit this group at http://groups.google.com/group/django-users.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-users/e0a71f13-1541-437b-a7b9-24c06f08c7e5%40googlegroups.com
> <https://groups.google.com/d/msgid/django-users/e0a71f13-1541-437b-a7b9-24c06f08c7e5%40googlegroups.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.

--
Persistent Objects Ltd
128 Lilleshall Road
Morden SM4 6DR

The Home of Lasting Solutions

Mobile: 079 3030 5004
Tel: 020 8544 5292
Web: p-o.co.uk
Skype: alan-hicks-london
Personal blog https://plus.google.com/+AlanHicksLondon
Company blog https://plus.google.com/+PoCoUkLondon/posts
LinkedIn https://uk.linkedin.com/in/alanhickslondon/
GitHub https://github.com/alan-hicks

Radomir Wojcik

unread,
Dec 17, 2014, 4:24:36 PM12/17/14
to django...@googlegroups.com
Thanks Carl, I wasn't aware of that function, that works :)  Sometimes normalization can just get you in trouble, slow things down.  My question was mainly regarding the efficiency in terms of storage, and how you query that if you store the short version.  I would not consider dividing the data up into more models at the time.  I would use something like haystack with elasticsearch to index the human readable names if I needed them and that display function is a life saver. Thanks all.
Reply all
Reply to author
Forward
0 new messages