I've implemented these fairly simple algorithms twice now, in both C# and T-SQL:
(End of blog entry) http://snook.ca/archives/php/url-shortener
I can successfully shorten codes and watch them increment through the
character set, so I think that part's right, but no matter I do, I
can't decode/explode shortened values back to original form without
storing them in a DB. Is there a flaw in the decompression algo that
I'm not seeing?
∞ Andy Badera
∞ +1 518-641-1280 Google Voice
∞ This email is: [ ] bloggable [x] ask first [ ] private
∞ Google me: http://www.google.com/search?q=andrew%20badera
--ab
So as opposed to example.com/12345678 you have example.com/a14s or
the like thusly saving 4 chars. Basically you just creating your own
base System and converting between decimal and your custom base. So
the rounding / dropping of the remainder normally doesn't matter since
you are converting to whole integers normally used as the ID fields.
I do it in python (stolen from someplace on the web) like this..
url56 = '23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ'
def to_url56(num):
return dec_to_anybase(num, url56)
def from_url56(value):
return anybase_to_dec(value, url56)
# base 10 to any base using basestring for digits
def dec_to_anybase(num, basestring):
base = len(basestring)
new = ''
current = num
while current >= base:
remainder = current % base
digit = basestring[remainder]
new = '%s%s' % (digit, new)
current = current / base
if basestring[current]: # non-zero indexing
new = '%s%s' % (basestring[current], new)
return new
# any base defined by basestring to base 10
def anybase_to_dec(value, basestring):
base = len(basestring)
n = 0
count = 0
while value:
last = len(value) - 1
digit = value[last]
n += basestring.index(digit) * base**count
value = value[:last]
count += 1
return n
-Patrick
On Jan 6, 11:47 am, Andrew Badera <and...@badera.us> wrote:
> Actually taking another look at the compression side, this is shaving
> off remainders, right? Isn't that kind of ... broken?
>
> --ab
>
>
>
> On Wed, Jan 6, 2010 at 11:33 AM, Andrew Badera <and...@badera.us> wrote:
> > Hi all,
>
> > I've implemented these fairly simple algorithms twice now, in both C# and T-SQL:
>
> > (End of blog entry)http://snook.ca/archives/php/url-shortener
Does the base value need to be even? I'm coming up small percentages
short on decode.
> --
> You received this message because you are subscribed to the Google Groups "TechValley Ruby Brigade" group.
> To post to this group, send email to techva...@googlegroups.com.
> To unsubscribe from this group, send email to techvalleyrb...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/techvalleyrb?hl=en.
>
>
>
>
The IDs are floats? so you could have an id 1.2456 ? I not sure if
what will work do to the use of modulus, in the conversion algorithm.
The length of the base shouldn't matter, neither should weather or not
it is even or odd. since you are basically just using it as a
substitution hash. I like the 56char one that I was using since it
removes some of the letter/numbers that can be confused (0O 1l,
etc..).
-Patrick
> I've never seen this done without some sort of DB. Normally you are
> using this technique to shorten the ID field.
>
> So as opposed to example.com/12345678 you have example.com/a14s or
> the like thusly saving 4 chars. Basically you just creating your own
> base System and converting between decimal and your custom base. So
> the rounding / dropping of the remainder normally doesn't matter since
> you are converting to whole integers normally used as the ID fields.
I just implemented something like this in Ruby yesterday, using Doug
Crockford's enhanced base 32 encoding. I'll post code later today and
maybe put it into a gem.
Note also that in Ruby, Numeric#to_s and String#to_i take an optional
base argument.
Best,
--
Marnen Laibow-Koser
http://www.marnen.org
mar...@marnen.org
>
> floats with all 18 places to the left of 0. all whole numbers.
Then why not make it a bigint?
Yeah I re-read it, and was like oh yeah, forgot about the precision
field :)
Can you share the code that is causing the error? Perhaps something
off in your implementation?
-Patric
CREATE PROCEDURE dbo.ExplodeShortID
@shortenedID varchar(20),
@idExploded numeric(18,0) OUT
AS
BEGIN
SET @idExploded = -1
IF LTRIM(RTRIM(@shortenedID)) = ''
RETURN
DECLARE @ShortChars varchar(100) =
'8A2aBb95CcDdeFfGgHhJjK6kLMm3NnOPpQqRrSsTtUuV7vXxYyZ4z'
DECLARE @charBase int = LEN(@ShortChars)
DECLARE @currentCharacter char
DECLARE @currPos int = 0
DECLARE @exploded numeric (18, 0) = 0
DECLARE @lenShortenedID int = LEN(@shortenedID)
DECLARE @i int = @lenShortenedID
DECLARE @pow int = 0;
DECLARE @someInt int = 0
PRINT '@shortenedID: ' + @shortenedID
WHILE (@i > 0)
BEGIN
PRINT 'i: ' + CONVERT(varchar(20), @i)
SET @someInt = (-1 * ( @i - LEN(@shortenedID) ))
PRINT '@someInt: ' + CONVERT(varchar(20), @someInt)
PRINT 'Getting SUBSTRING(' + CONVERT(varchar(20), @shortenedID) + ',
' + CONVERT(varchar(20), @someInt) + ', 1)'
SET @currentCharacter = SUBSTRING( @shortenedID, @someInt+1, 1 )
PRINT '@currentCharacter: ' + ISNULL(@currentCharacter, 'NULL')
SET @currPos = CHARINDEX( @currentCharacter, @ShortChars )
PRINT '@currPos: ' + CONVERT(varchar(20), @currPos)
SET @pow = POWER(@charBase, @i-1)
PRINT 'pow: ' + CONVERT(varchar(20), @pow)
SET @exploded = @exploded + (@currPos * @pow)
PRINT 'currPos * pow: ' + CONVERT(varchar(20), (@currPos * @pow))
PRINT 'exploded: ' + CONVERT(varchar(20), @exploded)
SET @i = @i - 1
END
SET @idExploded = @exploded
END
On Jan 6, 2010, at 12:41 PM, Andrew Badera <and...@badera.us> wrote:
> because a bigint in sql server is only 8 bytes. numeric(18,0) starts
> there, and can run up to 38 significant places.
Ah, I'm happy to say I've never used MS SQL Server. Now I see.
[BTW, top-posting makes discussions rather hard to follow.]
A habit I've tried to break unsuccessfully.
We simply need smarter email clients.
Hmm what version of SQL Server is this? SQL Server 2k5 doesn't seem
to like the procedure (Lots of complainting about assigning defaults
to local variables).
but lets see if we can break this down and see (MY TSQL is a bit
rusty, so No promises )
>
> CREATE PROCEDURE dbo.ExplodeShortID
> @shortenedID varchar(20),
> @idExploded numeric(18,0) OUT
> AS
>
> BEGIN
> SET @idExploded = -1
>
> IF LTRIM(RTRIM(@shortenedID)) = ''
> RETURN
Return if blank..
>
> DECLARE @ShortChars varchar(100) =
> '8A2aBb95CcDdeFfGgHhJjK6kLMm3NnOPpQqRrSsTtUuV7vXxYyZ4z'
any reason above is varchar(100), why not char(53)? (mostly just
curious, doesn't seem to affect anything...
> DECLARE @charBase int = LEN(@ShortChars)
> DECLARE @currentCharacter char
> DECLARE @currPos int = 0
>
> DECLARE @exploded numeric (18, 0) = 0
>
> DECLARE @lenShortenedID int = LEN(@shortenedID)
> DECLARE @i int = @lenShortenedID
> DECLARE @pow int = 0;
> DECLARE @someInt int = 0
>
> PRINT '@shortenedID: ' + @shortenedID
> WHILE (@i > 0)
> BEGIN
> PRINT 'i: ' + CONVERT(varchar(20), @i)
>
> SET @someInt = (-1 * ( @i - LEN(@shortenedID) ))
> PRINT '@someInt: ' + CONVERT(varchar(20), @someInt)
>
> PRINT 'Getting SUBSTRING(' + CONVERT(varchar(20), @shortenedID) + ',
> ' + CONVERT(varchar(20), @someInt) + ', 1)'
> SET @currentCharacter = SUBSTRING( @shortenedID, @someInt+1, 1 )
> PRINT '@currentCharacter: ' + ISNULL(@currentCharacter, 'NULL')
>
> SET @currPos = CHARINDEX( @currentCharacter, @ShortChars )
IN looking I think your problem is with the char index? When I run
your code I get this
@shortenedID: a23
i: 3
@someInt: 0
Getting SUBSTRING(a23,
0, 1)
@currentCharacter: a
@currPos: 2
pow: 2809
currPos * pow: 5618
exploded: 5618
But in looking 'a' is in position 4 in the string , not position 2.
Position 2 is 'A' Which makes since by default most stuff in sql is
case-insensitive.
Try using this.
SET @currPos = CHARINDEX( @currentCharacter, @ShortChars COLLATE
Latin1_General_CS_AS)
which then gives this
@shortenedID: a23
i: 3
@someInt: 0
Getting SUBSTRING(a23,
0, 1)
@currentCharacter: a
@currPos: 4
which looks way better..
-Patrick
COLLATE did the trick! Thanks a million Doon!
--Andy Badera
On Jan 6, 3:14 pm, Andrew Badera <and...@badera.us> wrote:
WooHoo. Not bad for a non M$ guy right :) (We do have 2 MS SQL
servers that we store our backend in, and I am the author of
mod_sql_tds for proftpd, so I guess I not 100% non M$ :)) .
-Patrick
>
> --Andy Badera
> On Jan 6, 2010, at 12:08 PM, Doon <pmul...@gmail.com> wrote:
>
>> I've never seen this done without some sort of DB. Normally you are
>> using this technique to shorten the ID field.
>>
>> So as opposed to example.com/12345678 you have example.com/a14s or
>> the like thusly saving 4 chars. Basically you just creating your own
>> base System and converting between decimal and your custom base. So
>> the rounding / dropping of the remainder normally doesn't matter since
>> you are converting to whole integers normally used as the ID fields.
>
> I just implemented something like this in Ruby yesterday, using Doug Crockford's enhanced base 32 encoding. I'll post code later today and maybe put it into a gem.
Here's the code. It could probably use some refinement, and as I said, I'd like to make it into a gem, but...
-----begin base32.rb-----
# Utilities to support Crockford's base-32 encoding, described at http://crockford.com/wrmg/base32.html
module Base32
DIGITS = ('0'..'9').to_a + (('a'..'z').to_a - ['i', 'l', 'o', 'u'])
class << self
=begin rdoc
Encodes a number to a base-32 string.
If leading_zero is true, then a leading zero is added to the encoded string if it does not already contain a digit.
This makes it possible to determine if something is a base-32 string by checking for the presence of digits, which may be useful in writing routes.
=end
def encode(n, leading_zero = true)
return nil if !n or n.blank?
string = ''
tmp = n.to_i
begin
string = DIGITS[tmp % 32] + string
tmp = tmp >> 5 # n /= 32
end until tmp.zero?
prefix = (leading_zero and /\d/ !~ string) ? '0' : ''
prefix + string
end
# Decodes a base-32 string to an integer.
def decode(string)
return nil if string.blank?
int = 0
string.downcase.strip.each_char do |char|
int = int << 5 # int *= 32
int += DIGITS.index char
end
int
end
end
end
-------end base32.rb-------
Enjoy!
I was thinking a bit about your motivation for shortening the URLs as
it relates to support. It is my opinion, having done a considerable
amount of support related to seemingly-random mixed-case strings,
particularly those used in passwords, that you might want a solution
like Marnen's that produces strings that consist only of numbers and
lower-case letters. Knowing that the letter your discussing with the
person you are supporting is definitely lower-case is worth the work.
John Fitzpatrick
Vita Rara
On Jan 6, 5:09 pm, Marnen Laibow-Koser <mar...@marnen.org> wrote:
> On Jan 6, 2010, at 12:33 PM, Marnen Laibow-Koser wrote:
>
> > On Jan 6, 2010, at 12:08 PM, Doon <pmuld...@gmail.com> wrote:
>
> >> I've never seen this done without some sort of DB. Normally you are
> >> using this technique to shorten the ID field.
>
> >> So as opposed to example.com/12345678 you have example.com/a14s or
> >> the like thusly saving 4 chars. Basically you just creating your own
> >> base System and converting between decimal and your custom base. So
> >> the rounding / dropping of the remainder normally doesn't matter since
> >> you are converting to whole integers normally used as the ID fields.
>
> > I just implemented something like this in Ruby yesterday, using Doug Crockford's enhanced base 32 encoding. I'll post code later today and maybe put it into a gem.
>
> Here's the code. It could probably use some refinement, and as I said, I'd like to make it into a gem, but...
>
> -----begin base32.rb-----
>
> # Utilities to support Crockford's base-32 encoding, described athttp://crockford.com/wrmg/base32.html
You'll note easily-confused letters have generally been removed. But
that said, you do still have a pretty decent address space with just
numbers and lower case letters, and it certainly wouldn't hurt.
:)
--ab