Conversion of IP address to decimal
Formula = ((VALUE(LEFT(A1, FIND(".", A1)-1)))*256^3)+((VALUE(MID(A1,
FIND(".", A1)+1, FIND(".", A1, FIND(".", A1)+1)-FIND(".",
A1)-1)))*256^2)+((VALUE(MID(A1, FIND(".", A1, FIND(".", A1)+1)+1,
FIND(".", A1, FIND(".", A1, FIND(".", A1)+1)+1)-FIND(".", A1, FIND(".",
A1)+1)-1)))*256)+(VALUE(RIGHT(A1, LEN(A1)-FIND(".", A1, FIND(".", A1,
FIND(".", A1)+1)+1))))
where cell A1 contains the IP address in the form 192.168.1.1 etc.
Unfortunately, I have 240,000 records and Excel can't work with that
sort of volume. Is SPSS the right tool for this job, if so, how can I
do it? I tried to create a data transform but couldn't work out how to
get it working.
Thanks in advance.
Art Kendall
Social Research Consultants
10.1.24.2 = 167843842
204.61.22.109 = 3426555501
I have 240,000 records of which there are two IP addresses per record.
Converting manually isnt an option but is essential for what I am
trying to do. I am trying to demonstrate clusters around IP addresses
(numeric values) that are consistent with Internet worms.
Perhaps I will have to learn Perl?
Convert IP address aaa.bbb.ccc.ddd to IP equivalent
= aaa*256*256*256 + bbb*256*256 + ccc*256 + ddd
So, 192.168.1.2 = 192*256*256*256 + 168*256*256 + 1*256 + 2 = 3232235778
Is this the sort of thing you can do with the data transform function
in SPSS or should it be done prior to importing the records?
On 2007-11-18 01:18:52 +1100, Art Kendall <Arthur....@verizon.net> said:
--- snip ---
Use the SUBSTR function for pulling out aaa, bbb, ccc, and ddd from your
string variable, and NUMBER for converting those substrings to numeric
variables. (You can get help for both of those functions in the GUI for
COMPUTE.) Then plug the various bits into a COMPUTE statement. This is
untested, but it will be something like:
compute aaa = number(substr(ipstring,1,3),f3).
compute bbb = number(substr(ipstring,5,3),f3).
compute ccc = number(substr(ipstring,9,3),f3).
compute ddd = number(substr(ipstring,11,3),f3).
compute newip = aaa*256**3 + bbb*256**2 + ccc*256 + ddd.
exe.
If ccc and ddd are not always single digit numbers, then you'll have to
use the INDEX function first to find the position numbers of the
periods, and use them to work out the values you need to feed into the
SUBSTR function.
--
Bruce Weaver
bwe...@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
"When all else fails, RTFM."
> compute aaa = number(substr(ipstring,1,3),f3).
> compute bbb = number(substr(ipstring,5,3),f3).
> compute ccc = number(substr(ipstring,9,3),f3).
> compute ddd = number(substr(ipstring,11,3),f3).
> compute newip = aaa*256**3 + bbb*256**2 + ccc*256 + ddd.
Bruce,
Thanks for the tip it is a huge step forward. The problem I now have
is that the input data is not always consistently three numbers. IP
addresses can take the form of 123.123.123.123 or equally valid
1.23.123.1. Is there a similar function to substr that looks for the
"." instead of counting?
Ideall, the function should take in 1.1.1.1 as well as 1.2.123.123 and
any other combination.
By the way, 1.1.1.1 can be represented as 001.001.001.001 but a
transform would need to occur.
I hope it solves your problem. It doesn't matter how IP address looks like.
*vbariable IP should contains your IP's addresses.
string tempIP (a20).
compute newip=0.
compute tempIP=IP.
loop i=3 to 1 by -1.
compute dot1=index(tempIP,'.').
compute newip=newip+number(substr(tempIP,1,dot1-1),f3)*256**i.
compute tempIP=substr( tempIP,dot1+1).
end loop.
reg:)
lobin
Close, but this calculation leaves out the last component of the ip
address, because there is no "." after the last component of the
address. You need to add
compute newip = newip + number(tempIP, f3).
after the end loop statement.
HTH,
Jon Peck
As a bonus, here is Python programmability code to do this
calculation. This will work in SPSS 15 or later. The three curs
lines set up to read the ip variable and define a new variable,
ipdecimal with an F12.0 format.
The for case line starts iterating over the cases. All the
calculation is done in the reduce line, which gets the string variable
from SPSS, splits it into four parts on ".", converts to integers, and
accumulates the values multiplied by the appropriate power, and stores
the new value in the case.
begin program.
import spss, spssdata
curs = spssdata.Spssdata(indexes='ip', accessType='w')
curs.append(spssdata.vdef("ipdecimal", vfmt=("F", 12,0)))
curs.commitdict()
for case in curs:
curs.casevalues([reduce(lambda x,y: x*256 + int(y),
case[0].split("."), 0)])
curs.CClose()
end program.
Ups I forgot it.
> after the end loop statement.
>
> HTH,
> Jon Peck
Thanks Jon
Month,Day,Time,SourceIP,SourcePort,TargetIP
Sep,20,02:01:12,192.168.1.2,1052,172.16.26.1
This data is saved in a file called 1.
# search for instances of .X and new line
perl -pe 's/(\.)([1234567890])(\n)/$1ZEROZERO$2$3/g' 1 > 2
# search
for instances of ,X.
perl -pe 's/(\,)([1234567890])(\.)/$1ZEROZERO$2$3/g' 2 > 1
# search for instances of .X.
perl -pe 's/(\.)([1234567890])(\.)/$1ZEROZERO$2$3/g' 1 > 2
# search for instances of .XX and new line
perl -pe 's/(\.)([123456789])([1234567890])(\n)/$1ZERO$2$3$4/g' 2 > 1
# search for instances of .XX.
perl -pe 's/(\.)([123456789])([1234567890])(\.)/$1ZERO$2$3$4/g' 1 > 2
# search for instances of ,XX.
perl -pe 's/(\,)([123456789])([1234567890])(\.)/$1ZERO$2$3$4/g' 2 > 1
# search for instances of .XX,
perl -pe 's/(\.)([123456789])([1234567890])(\,)/$1ZERO$2$3$4/g' 2 > 1
# replace ZERO text with 0 numeral
perl -pe 's/ZERO/0/g' 2 > 1
Ugly, I know... but it works. The log data is now in the format.
Month,Day,Time,SourceIP,SourcePort,TargetIP
Sep,20,02:01:12,192.168.001.002,1052,172.016.026.001