Storing IP address as integer within database to remove need for full text search

419 views
Skip to first unread message

Cal Leeming [Simplicity Media Ltd]

unread,
Jul 18, 2011, 10:56:05 AM7/18/11
to django...@googlegroups.com, django-d...@googlegroups.com
Hi,

I have created a ModelField called RealIPAddressField.

It stores the IP address in integer form, meaning the lookups on large tables are much faster:


@django-developers - Do you think there is any possibility of this getting included into the core?

@django-users - Hope this is of some use to someone else, would appreciate any criticisms / improvements.

Cal

Javier Guerra Giraldez

unread,
Jul 18, 2011, 11:13:04 AM7/18/11
to django...@googlegroups.com, django-d...@googlegroups.com
On Mon, Jul 18, 2011 at 9:56 AM, Cal Leeming [Simplicity Media Ltd]
<cal.l...@simplicitymedialtd.co.uk> wrote:
> It stores the IP address in integer form, meaning the lookups on large
> tables are much faster:

are they? hashtables shouldn't be too sensitive to key size, as
long as the string size stays bounded... like on IP addresses (max 15
chars)

of course, i don't know about the specific dict implementation on
Python. Sounds like a job for a microbenchmark!

--
Javier

Cal Leeming [Simplicity Media Ltd]

unread,
Jul 18, 2011, 11:15:50 AM7/18/11
to django...@googlegroups.com, django-d...@googlegroups.com
Sorry, I should have been a little more specific. 

I meant faster lookups in terms of database index, such as MySQL with InnoDB.

Cal


--
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django...@googlegroups.com.
To unsubscribe from this group, send email to django-users...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-users?hl=en.


Cal Leeming [Simplicity Media Ltd]

unread,
Jul 18, 2011, 11:17:15 AM7/18/11
to django...@googlegroups.com, django-d...@googlegroups.com
Or maybe you were talking about hash tables in MySQL.

Either way, you're right that this would probably need some benchmarks before being approved for the core.

@django-developers, if I was to provide some benchmarks, would this possibly be considered for the core?

Cal

Javier Guerra Giraldez

unread,
Jul 18, 2011, 11:21:02 AM7/18/11
to django...@googlegroups.com, django-d...@googlegroups.com
On Mon, Jul 18, 2011 at 10:13 AM, Javier Guerra Giraldez
<jav...@guerrag.com> wrote:
> On Mon, Jul 18, 2011 at 9:56 AM, Cal Leeming [Simplicity Media Ltd]
> <cal.l...@simplicitymedialtd.co.uk> wrote:
>> It stores the IP address in integer form, meaning the lookups on large
>> tables are much faster:
>
> are they?    hashtables shouldn't be too sensitive to key size, as
> long as the string size stays bounded... like on IP addresses (max 15
> chars)

OTOH, for really huge database tables, making the index 4 times
smaller can be a significant difference on RAM-starved servers....
but to fill 1G can hold something like of 50million IP addresses in
text form...

--
Javier

Łukasz Rekucki

unread,
Jul 18, 2011, 11:25:49 AM7/18/11
to django-d...@googlegroups.com
On 18 July 2011 16:56, Cal Leeming [Simplicity Media Ltd]

<cal.l...@simplicitymedialtd.co.uk> wrote:
> Hi,
> I have created a ModelField called RealIPAddressField.
> It stores the IP address in integer form, meaning the lookups on large
> tables are much faster:
> http://djangosnippets.org/snippets/2493/
> @django-developers - Do you think there is any possibility of this getting
> included into the core?

From a quick look, it doesn't seem to support IPv6 (which is a real
problem in year 2011 when all IPv4 addresses are allocated). So I
would be -1 on it. There is also the problem of signed int.

PS. Cross-posting like this to django-users and django-developers is
probably not a good idea as people on django-users will send emails
here by accident.

--
Łukasz Rekucki

Dmitry Gladkov

unread,
Jul 18, 2011, 11:28:22 AM7/18/11
to django-d...@googlegroups.com, django...@googlegroups.com
I like the idea, but I don't like the approach, it should be a
subclass of PositiveIntegerField, as it is an unsigned int on DB
level. Also, I agree with Łukasz that it should support both IPv6 and
IPv4.


--
Best wishes,
Dmitry Gladkov, mailto:dmitry....@gmail.com

+380 91 303-37-46

On Mon, Jul 18, 2011 at 6:21 PM, Javier Guerra Giraldez

> --
> You received this message because you are subscribed to the Google Groups "Django developers" group.
> To post to this group, send email to django-d...@googlegroups.com.
> To unsubscribe from this group, send email to django-develop...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
>
>

Cal Leeming [Simplicity Media Ltd]

unread,
Jul 18, 2011, 11:35:27 AM7/18/11
to django-d...@googlegroups.com
Thansk for your re

2011/7/18 Łukasz Rekucki <lrek...@gmail.com>

On 18 July 2011 16:56, Cal Leeming [Simplicity Media Ltd]
> Hi,
> I have created a ModelField called RealIPAddressField.
> It stores the IP address in integer form, meaning the lookups on large
> tables are much faster:
> http://djangosnippets.org/snippets/2493/
> @django-developers - Do you think there is any possibility of this getting
> included into the core?

From a quick look, it doesn't seem to support IPv6 (which is a real
problem in year 2011 when all IPv4 addresses are allocated). So I
would be -1 on it. There is also the problem of signed int.

Yeah, I had a feeling the signed int and IPv6 issue would be a bit of a deal breaker.

I'm not even sure there is a way to store IPv6 as a long within a single field. I had a feeling that future implementations of storing IPv6 in a database, would either be full text search engine powered, or would be split into multiple columns for each 'byte'?
 

PS. Cross-posting like this to django-users and django-developers is
probably not a good idea as people on django-users will send emails
here by accident.

My apologies, I'll refrain from doing so in the future.
 

--
Łukasz Rekucki

Tom Evans

unread,
Jul 18, 2011, 11:36:15 AM7/18/11
to django-d...@googlegroups.com

Cal, this may be too late (considering you are already proposing this
for core!), but there are nicer ways of doing this. I use
django-postgresql-netfields [1], which does require postgres as the
name suggests. This library supports (well, exposes postgres's
excellent inet support) ipv4, ipv6, CIDR masks etc, and allows you to
perform complex inet queries using the elegance of django's ORM:

ip=request.META['REMOTE_ADDR']
IPAuth.objects.filter(addr__net_contains_or_equals=ip)

Cheers

Tom

[1] https://github.com/adamcik/django-postgresql-netfields

Cal Leeming [Simplicity Media Ltd]

unread,
Jul 18, 2011, 11:42:30 AM7/18/11
to django-d...@googlegroups.com
On Mon, Jul 18, 2011 at 4:36 PM, Tom Evans <tevans.uk@googlemail.com> wrote:
On Mon, Jul 18, 2011 at 3:56 PM, Cal Leeming [Simplicity Media Ltd]
> Hi,
> I have created a ModelField called RealIPAddressField.
> It stores the IP address in integer form, meaning the lookups on large
> tables are much faster:
> http://djangosnippets.org/snippets/2493/
> @django-developers - Do you think there is any possibility of this getting
> included into the core?
> @django-users - Hope this is of some use to someone else, would appreciate
> any criticisms / improvements.
> Cal
>

Cal, this may be too late (considering you are already proposing this
for core!), but there are nicer ways of doing this. I use
django-postgresql-netfields [1], which does require postgres as the
name suggests. This library supports (well, exposes postgres's
excellent inet support) ipv4, ipv6, CIDR masks etc, and allows you to
perform complex inet queries using the elegance of django's ORM:

Ohh, now that is extremely nice. I like the fact you can query using CIDR masks too. 

I'm finding more reasons every day to try PostgreSQL, so thank you for suggesting this!

I think the original code I pasted is suitable as a snippet, but in hindsight, it's probably best to be kept out of the core (given the fact it won't support IPv6, and in comparison with how pretty django-postgresql-netfields is).
 
Thank you everyone for your input :)


ip=request.META['REMOTE_ADDR']
IPAuth.objects.filter(addr__net_contains_or_equals=ip)

Cheers

Tom

[1] https://github.com/adamcik/django-postgresql-netfields

--

Daniel Swarbrick

unread,
Jul 19, 2011, 8:02:10 AM7/19/11
to Django developers
The snippet seems to have been removed (returns "page not found"). I
was curious to have a look at how you were handling this. For sure,
Postgres has native support for ipv4 and ipv6 address storage. AFAIK,
MySQL does not... although could store an ipv4 address in a 32-bit
unsigned int field. I don't know of any DB engines that support 128
bit ints however (for ipv6).

Django 1.4 alpha already uses Postgres' 'inet' field type for
IPAddressField and GenericIPAddressField. Other DB backends use
varchar(15) and varchar(39) respectively - which probably leads to
some interest sorting side effects.

Cal Leeming [Simplicity Media Ltd]

unread,
Jul 19, 2011, 8:16:09 AM7/19/11
to django-d...@googlegroups.com


On Tue, Jul 19, 2011 at 1:02 PM, Daniel Swarbrick <daniel.s...@gmail.com> wrote:
The snippet seems to have been removed (returns "page not found").<snip>

Wtf? :X


Looks like they have suffered some sort of data loss.. I'm seeing only snippets from 2 weeks ago, then 1 from an hour ago lol.

I've reposted:


And as luck would have it, it's given me the EXACT same ID... wtf? More than a coincidence? lol.
 
I
was curious to have a look at how you were handling this. For sure,
Postgres has native support for ipv4 and ipv6 address storage. AFAIK,
MySQL does not... although could store an ipv4 address in a 32-bit
unsigned int field. I don't know of any DB engines that support 128
bit ints however (for ipv6).

Yeah, it's storing using an unsigned int, but there's no way the same logic could be applied to ipv6.

If I was to continue using MySQL for ipv6 storage, I'd probably create a table with a column for each byte, convert to an int, and apply a unique index to them all. 
 

Django 1.4 alpha already uses Postgres' 'inet' field type for
IPAddressField and GenericIPAddressField. Other DB backends use
varchar(15) and varchar(39) respectively - which probably leads to
some interest sorting side effects.

--

Daniel Swarbrick

unread,
Jul 19, 2011, 8:47:24 AM7/19/11
to Django developers
On Jul 19, 2:16 pm, "Cal Leeming [Simplicity Media Ltd]"
<cal.leem...@simplicitymedialtd.co.uk> wrote:
> If I was to continue using MySQL for ipv6 storage, I'd probably create a
> table with a column for each byte, convert to an int, and apply a unique
> index to them all.

I think MySQL supports 64 bit ints, so you could split an ipv6 address
into two 64 bit chunks (since IETF discourages allocation of blocks
smaller than /64 anyway).

But yeah... I prefer to use Postgres too ;-)
Reply all
Reply to author
Forward
0 new messages