Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

convert IP address into numerical value

2,706 views
Skip to first unread message

Paul Nevin

unread,
Nov 16, 2007, 11:25:06 PM11/16/07
to
I am trying to use SPSS to convert dotted quad IP addresses (e.g.
192.168.1.2) into a decimal equivalent for analysis. I can do this in
Excel using the following formula:

Conversion of IP address to decimal

Formula = ((VALUE(LEFT(A1, FIND(".", A1)-1)))*256^3)+((VALUE(MID(A1,
FIND(".", A1)+1, FIND(".", A1, FIND(".", A1)+1)-FIND(".",
A1)-1)))*256^2)+((VALUE(MID(A1, FIND(".", A1, FIND(".", A1)+1)+1,
FIND(".", A1, FIND(".", A1, FIND(".", A1)+1)+1)-FIND(".", A1, FIND(".",
A1)+1)-1)))*256)+(VALUE(RIGHT(A1, LEN(A1)-FIND(".", A1, FIND(".", A1,
FIND(".", A1)+1)+1))))

where cell A1 contains the IP address in the form 192.168.1.1 etc.

Unfortunately, I have 240,000 records and Excel can't work with that
sort of volume. Is SPSS the right tool for this job, if so, how can I
do it? I tried to create a data transform but couldn't work out how to
get it working.

Thanks in advance.

Art Kendall

unread,
Nov 17, 2007, 9:18:52 AM11/17/07
to
please post a few addresses and what "decimal" equivalent you want them
to be translated to. Are they input as string variables?


Art Kendall
Social Research Consultants

Paul Nevin

unread,
Nov 17, 2007, 7:41:30 PM11/17/07
to
The IP addresses are inputed as string variables in SPSS. They are in
the form xxx.xxx.xxx.xxx, where xxx are valid IP integers. An integer
equivalent of an IP address can be converted using the Excel formual
described below. For example, 192.168.1.2 can be converted to
3232235778. Try for yourself at
http://www.allredroster.com/iptodec.htm.

10.1.24.2 = 167843842
204.61.22.109 = 3426555501

I have 240,000 records of which there are two IP addresses per record.
Converting manually isnt an option but is essential for what I am
trying to do. I am trying to demonstrate clusters around IP addresses
(numeric values) that are consistent with Internet worms.

Perhaps I will have to learn Perl?

Paul Nevin

unread,
Nov 17, 2007, 7:48:22 PM11/17/07
to
An algorithm for doing what I need is:

Convert IP address aaa.bbb.ccc.ddd to IP equivalent

= aaa*256*256*256 + bbb*256*256 + ccc*256 + ddd

So, 192.168.1.2 = 192*256*256*256 + 168*256*256 + 1*256 + 2 = 3232235778

Is this the sort of thing you can do with the data transform function
in SPSS or should it be done prior to importing the records?


On 2007-11-18 01:18:52 +1100, Art Kendall <Arthur....@verizon.net> said:

Bruce Weaver

unread,
Nov 17, 2007, 8:29:13 PM11/17/07
to
Paul Nevin wrote:
> An algorithm for doing what I need is:
>
> Convert IP address aaa.bbb.ccc.ddd to IP equivalent
>
> = aaa*256*256*256 + bbb*256*256 + ccc*256 + ddd
>
> So, 192.168.1.2 = 192*256*256*256 + 168*256*256 + 1*256 + 2 = 3232235778
>
> Is this the sort of thing you can do with the data transform function in
> SPSS or should it be done prior to importing the records?

--- snip ---

Use the SUBSTR function for pulling out aaa, bbb, ccc, and ddd from your
string variable, and NUMBER for converting those substrings to numeric
variables. (You can get help for both of those functions in the GUI for
COMPUTE.) Then plug the various bits into a COMPUTE statement. This is
untested, but it will be something like:


compute aaa = number(substr(ipstring,1,3),f3).
compute bbb = number(substr(ipstring,5,3),f3).
compute ccc = number(substr(ipstring,9,3),f3).
compute ddd = number(substr(ipstring,11,3),f3).
compute newip = aaa*256**3 + bbb*256**2 + ccc*256 + ddd.
exe.

If ccc and ddd are not always single digit numbers, then you'll have to
use the INDEX function first to find the position numbers of the
periods, and use them to work out the values you need to feed into the
SUBSTR function.

--
Bruce Weaver
bwe...@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
"When all else fails, RTFM."

Paul Nevin

unread,
Nov 17, 2007, 10:40:00 PM11/17/07
to
On 2007-11-18 12:29:13 +1100, Bruce Weaver <bwe...@lakeheadu.ca> said:

> compute aaa = number(substr(ipstring,1,3),f3).
> compute bbb = number(substr(ipstring,5,3),f3).
> compute ccc = number(substr(ipstring,9,3),f3).
> compute ddd = number(substr(ipstring,11,3),f3).
> compute newip = aaa*256**3 + bbb*256**2 + ccc*256 + ddd.

Bruce,

Thanks for the tip it is a huge step forward. The problem I now have
is that the input data is not always consistently three numbers. IP
addresses can take the form of 123.123.123.123 or equally valid
1.23.123.1. Is there a similar function to substr that looks for the
"." instead of counting?

Ideall, the function should take in 1.1.1.1 as well as 1.2.123.123 and
any other combination.

By the way, 1.1.1.1 can be represented as 001.001.001.001 but a
transform would need to occur.

Maciej Lobinski

unread,
Nov 18, 2007, 10:01:16 AM11/18/07
to
Paul,


I hope it solves your problem. It doesn't matter how IP address looks like.

*vbariable IP should contains your IP's addresses.

string tempIP (a20).
compute newip=0.
compute tempIP=IP.

loop i=3 to 1 by -1.
compute dot1=index(tempIP,'.').
compute newip=newip+number(substr(tempIP,1,dot1-1),f3)*256**i.
compute tempIP=substr( tempIP,dot1+1).
end loop.


reg:)
lobin

JKPeck

unread,
Nov 18, 2007, 11:10:26 AM11/18/07
to

Close, but this calculation leaves out the last component of the ip
address, because there is no "." after the last component of the
address. You need to add
compute newip = newip + number(tempIP, f3).
after the end loop statement.

HTH,
Jon Peck

JKPeck

unread,
Nov 18, 2007, 1:41:13 PM11/18/07
to

As a bonus, here is Python programmability code to do this
calculation. This will work in SPSS 15 or later. The three curs
lines set up to read the ip variable and define a new variable,
ipdecimal with an F12.0 format.

The for case line starts iterating over the cases. All the
calculation is done in the reduce line, which gets the string variable
from SPSS, splits it into four parts on ".", converts to integers, and
accumulates the values multiplied by the appropriate power, and stores
the new value in the case.

begin program.
import spss, spssdata

curs = spssdata.Spssdata(indexes='ip', accessType='w')
curs.append(spssdata.vdef("ipdecimal", vfmt=("F", 12,0)))
curs.commitdict()

for case in curs:
curs.casevalues([reduce(lambda x,y: x*256 + int(y),
case[0].split("."), 0)])
curs.CClose()
end program.

Maciej Lobinski

unread,
Nov 18, 2007, 2:02:13 PM11/18/07
to

> Close, but this calculation leaves out the last component of the ip
> address, because there is no "." after the last component of the
> address. You need to add
> compute newip = newip + number(tempIP, f3).

Ups I forgot it.

> after the end loop statement.
>
> HTH,
> Jon Peck

Thanks Jon

Paul Nevin

unread,
Nov 18, 2007, 8:07:26 PM11/18/07
to
Thanks for all the help guys. Your solution is much better than the
one I came up with. I ended up using regex to work on the raw log data
to get it into the aaa.bbb.ccc.ddd format.

Month,Day,Time,SourceIP,SourcePort,TargetIP
Sep,20,02:01:12,192.168.1.2,1052,172.16.26.1

This data is saved in a file called 1.

# search for instances of .X and new line
perl -pe 's/(\.)([1234567890])(\n)/$1ZEROZERO$2$3/g' 1 > 2
# search
for instances of ,X.
perl -pe 's/(\,)([1234567890])(\.)/$1ZEROZERO$2$3/g' 2 > 1
# search for instances of .X.
perl -pe 's/(\.)([1234567890])(\.)/$1ZEROZERO$2$3/g' 1 > 2
# search for instances of .XX and new line
perl -pe 's/(\.)([123456789])([1234567890])(\n)/$1ZERO$2$3$4/g' 2 > 1
# search for instances of .XX.
perl -pe 's/(\.)([123456789])([1234567890])(\.)/$1ZERO$2$3$4/g' 1 > 2
# search for instances of ,XX.
perl -pe 's/(\,)([123456789])([1234567890])(\.)/$1ZERO$2$3$4/g' 2 > 1
# search for instances of .XX,
perl -pe 's/(\.)([123456789])([1234567890])(\,)/$1ZERO$2$3$4/g' 2 > 1
# replace ZERO text with 0 numeral
perl -pe 's/ZERO/0/g' 2 > 1

Ugly, I know... but it works. The log data is now in the format.
Month,Day,Time,SourceIP,SourcePort,TargetIP
Sep,20,02:01:12,192.168.001.002,1052,172.016.026.001


0 new messages