HTH,
Bernie
MS Excel MVP
"jheby" <jh...@discussions.microsoft.com> wrote in message
news:FB83AD4A-0184-4717...@microsoft.com...
> Excel vlookup function returns #VALUE when the lookup value exceeds 256
> characters. I need a hash function to transform large strings into a value
> that does not exceed the 256 character limit. Are there any quick solutions
> to this problem?
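[Editorial sketch, not from the thread: the hashing idea the OP asks about could look like the following in Python. Excel has no built-in hash worksheet function, so in practice this would have to be wrapped in a VBA UDF; the function name here is illustrative.]

```python
import hashlib

def hash_key(s: str) -> str:
    # MD5 suffices here: we need a compact, deterministic lookup key,
    # not cryptographic security. The hex digest is always 32
    # characters, far below VLOOKUP's 256-character limit.
    return hashlib.md5(s.encode("utf-8")).hexdigest()
```

The same function must be applied both to the lookup value and to the first column of the lookup table. As with any hash, distinct keys could in principle collide, though with 128-bit digests that is vanishingly unlikely.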
I find it difficult to believe that there isn't any logic behind the
population of your keys. Still, that logic is the "key" to hashing.
Can you tell a bit more about the nature of your keys?
--
Kind regards,
Niek Otten
That isn't true in general. But it is if you want to use the string with VLOOKUP.
Maybe you should tell us more about the nature of the lookup problem too.
--
Kind regards,
Niek Otten
"Niek Otten" <nico...@xs4all.nl> wrote in message
news:ejDMvAhQ...@TK2MSFTNGP14.phx.gbl...
Regards,
Jon
AC_IS_, _Fixed, BT_Adds_Non_Recourse, _Cost_Amt, _Recv_Amt
to pick just five. You could envisage, then, a user-defined function
which is applied to the original string and returns a reduced string by
repeated application of SUBSTITUTE( ) to change these character groups
to some other character which is not contained within any of the
strings.
If you apply the UDF to the lookup table as well, then you can use the
modified string as the key field.
Is this the kind of thing you were looking for?
Pete
But probably Pete's suggestion for repeating strings is easier to code.
--
Kind regards,
Niek Otten
"Pete_UK" <pash...@auditel.net> wrote in message
news:1141832867....@i40g2000cwc.googlegroups.com...
"jheby" <jh...@discussions.microsoft.com> wrote in message
news:9D5A6507-411D-42EC...@microsoft.com...
> Unfortunately, the repeating strings, while present in this example, don't
> occur in most cases. This example is more of an anomaly in that regard.
There are tasks for which Excel is most definitely not the most
appropriate tool, and in this case, Excel may be one of the worst tools
for the task. Most scripting languages (Perl, Python, Ruby, gawk)
provide associative arrays that handle hashing arbitrarily long
strings. But if you want a simple spreadsheet approach, get a capable
spreadsheet like the one in OpenOffice. OpenOffice Calc doesn't choke
on string operations with string lengths > 256 characters. This is one
of the ugly little backwaters of Excel functionality that hasn't been
changed in more than a decade.
"Harlan Grove" wrote:
> jheby wrote...
> >I am pulling log entries into Excel from Essbase, a multi-dimensional
> >database, and want to assign categories to each entry using a vlookup. The
> >entries could use pretty much any character. Below is a sample of a large
> >string entry:
> ....
Your example text string is 428 characters long. I put it in A1 and
this formula in B1:
=SUBSTITUTE(A1,"BT_Adds_Non_Recourse",CHAR(161))
The length came down to 333 characters.
Changing the formula to this:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"BT_Adds_Non_Recourse",CHAR(161)),"_Cost_Amt",CHAR(162)),"_Recv_Amt",CHAR(163)),"AC_IS_",CHAR(164)),"_Fixed",CHAR(165))
brought the length down to 233 characters - so you could use this as a
key in your VLOOKUP formula. The formula merely changes the character
groups identified earlier into a single character.
Hopefully, you can see how this method could be extended by using a UDF
to pick up other character groups.
Pete
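[Editorial sketch: the nested-SUBSTITUTE idea above, expressed in Python for illustration only; the thread itself uses Excel formulas and VBA. The phrase list is taken from Pete's example formula. Note that longer phrases must be substituted before any phrase they contain.]

```python
# Each repeated phrase is replaced by a single character in the
# 161+ range, shrinking the key below VLOOKUP's limit.
SUBS = [
    ("BT_Adds_Non_Recourse", chr(161)),
    ("_Cost_Amt", chr(162)),
    ("_Recv_Amt", chr(163)),
    ("AC_IS_", chr(164)),
    ("_Fixed", chr(165)),
]

def reduce_key(s: str) -> str:
    for phrase, token in SUBS:
        s = s.replace(phrase, token)
    return s
```

Applied to both the lookup value and the table's key column, the reduced strings match exactly where the originals would.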
Regards,
Jon
You won't be able to extend the above formula very much because of the
limits on nesting. Here's an example UDF to do the same thing:
Function reduce(test As String) As String
    ' Return the text unchanged if it is already short enough for
    ' VLOOKUP; otherwise replace each known phrase with a single
    ' character in the 161+ range.
    reduce = test
    If Len(reduce) < 255 Then Exit Function
    reduce = Application.WorksheetFunction.Substitute(reduce, "BT_Adds_Non_Recourse", Chr(161))
    reduce = Application.WorksheetFunction.Substitute(reduce, "_Cost_Amt", Chr(162))
    reduce = Application.WorksheetFunction.Substitute(reduce, "_Recv_Amt", Chr(163))
    reduce = Application.WorksheetFunction.Substitute(reduce, "AC_IS_", Chr(164))
    reduce = Application.WorksheetFunction.Substitute(reduce, "_Fixed", Chr(165))
    reduce = Application.WorksheetFunction.Substitute(reduce, "_NB", Chr(166))
    reduce = Application.WorksheetFunction.Substitute(reduce, "_EB", Chr(167))
    reduce = Application.WorksheetFunction.Substitute(reduce, "Finance_Leases", Chr(168))
    reduce = Application.WorksheetFunction.Substitute(reduce, "Conditional_Sales_Contracts", Chr(169))
End Function
I've added a few extra substitutions to the end, which reduce the length
of the original string to 184 characters when used as:
=reduce(A1)
The function does not change any original text which is already less
than 255 characters in length. Obviously you can apply this to other
strings by copying down, but you can also extend the UDF to cope with
other patterns - just change the character number at the end of the
line.
Hope this helps.
Pete
And thinking holistically, how does this udf help? While it may reduce
the length of the 1st argument to VLOOKUP, it's not usable directly for
the first column of the 2nd argument to VLOOKUP. It's not even usable
directly in an INDEX/MATCH approach because it can't handle range or
array arguments and return values. At best, the OP would need to use
one additional cell per log file record with each of those cells
calling this UDF with the corresponding record value. Calling *any* udf
a few thousand times even on a superfast system will bring back fond
memories of the responsiveness of 110 baud dial-up mainframe
connections.
MATCH is afflicted with the same 256 character limit on comparisons as
VLOOKUP. However, the = operator isn't so afflicted. Use INDEX/MATCH.
Given A1:A6 containing
=REPT("0123456789ABCDEF",16)&" - "&CHAR(64+ROW())
and B1:B6 containing
=ROW()
Then instead of VLOOKUP(x,A1:B6,2,0) use array formulas (entered with Ctrl+Shift+Enter) like
=INDEX(B1:B6,MATCH(TRUE,(A1:A6=x),0))
A kludge, but this will be faster than any udf.
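[Editorial sketch, for illustration only: what the INDEX/MATCH array formula above does is a linear scan using plain equality, which has no length limit. The sample keys below are built the same way as the REPT formula.]

```python
def index_match(keys, values, x):
    for k, v in zip(keys, values):
        if k == x:  # '=' comparison: works on strings of any length
            return v
    raise LookupError("#N/A")  # mimic a failed MATCH

# Equivalent of =REPT("0123456789ABCDEF",16)&" - "&CHAR(64+ROW())
# in A1:A6 and =ROW() in B1:B6: 260-character keys, values 1..6.
keys = ["0123456789ABCDEF" * 16 + " - " + c for c in "ABCDEF"]
values = [1, 2, 3, 4, 5, 6]
```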
An ingenious solution! I like the out-of-the-box approach.
However, I'm not sure I agree with what you say about performance; in my
experience array formulas are rather slow.
I tested your solution (somewhat modified because of the limits of CHAR()).
Because it requires some effort to arrange for more records than there are
rows, I tested with 30,000 (instead of 300,000) of your formulas; that took
(on my machine) 284 seconds.
I built a reducekey function similar to Pete's; 30,000 reductions took
15 seconds.
If the 5K table is or can be sorted, Pete's solution (which indeed requires
some more work, like adding the reduced key to the lookup table) might be
quite a lot faster than applying array formulas.
I know UDFs have a bad reputation for performance. Indeed the overhead of
switching between Worksheet and VBA is significant. But often the
performance problems seem to be caused by inefficient programming,
especially wrong data types.
My actuarial function system manages 10,000 calls from a worksheet per
second.
It would be a lot of work, but it would be nice to test Jon's actual case
with the different approaches suggested.
--
Kind regards,
Niek Otten
"Harlan Grove" <hrl...@aol.com> wrote in message
news:1141928694....@j33g2000cwa.googlegroups.com...
It's not the array formulas per se, it's the 30K LONG string
comparisons. If the OP had relatively static data (apparently the
case), a macro essentially performing repeated Edit > Replace
operations on copies of the log file records would be better than UDFs.
One VBA invocation beats 30K separate ones.
The greater difficulty is whether the chaff that Pete_UK has
*hardcoded* into his udf would apply to *all* the OP's records. Usually
it's a bad idea to generalize from one sample record. But there's also
the question whether, given the *entire* population of Essbase fields,
categories, dimensions, etc., the substitution approach could prove
awkward. For instance, the
OP's single sample record contained
BT_Adds_Non_Recourse_NB_Recv_Amt,
BT_Adds_Non_Recourse_NB_Equip_Cost_Amt,
BT_Adds_Non_Recourse_NB_RE_Cost_Amt,
BT_Adds_Non_Recourse_EB_Recv_Amt,
BT_Adds_Non_Recourse_EB_Equip_Cost_Amt,
Pete_UK's udf would reduce these to
,
Equip,
RE,
,
Equip,
Do you begin to see a problem?
GIGO processing can be very runtime efficient. When you're after
correct results, however, it may require more calculation time and
perhaps a wee bit more time figuring out the full extent of the
problem.
Myself, I don't believe the OP can *RELIABLY* accomplish his/her task
by removing any text other than space characters on either side of
punctuation.
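[Editorial sketch of the one reduction Harlan considers reliable: deleting space characters on either side of punctuation. Because no alphanumeric text is removed, two distinct keys cannot be collapsed into the same reduced key. Whether this alone shortens a given record below the 256-character limit is another matter.]

```python
import re

def strip_spaces_near_punct(s: str) -> str:
    # Remove whitespace adjacent to any punctuation character
    # (anything that is neither a word character nor whitespace).
    return re.sub(r"\s*([^\w\s])\s*", r"\1", s)
```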
Applying the reduced key formula to the lookup table would only have to
be done once - the values could be fixed and then copied over the
original keys.
Jon said he does 300k weekly, so performance is an issue, and like you,
Niek, I have found that array formulae can be very slow. He wanted a
way to reduce his long strings so that he could use his existing
vlookup formulae, and my approach allows him to do that with not a lot
of effort in setting it up nor in using it - I also tested it on 30k
identical records following Harlan's posting, and it took less than 10
seconds.
Anyway, it's up to him - I think writing a generalised hashing
algorithm as he was contemplating would take considerable time.
Pete
this is what the final part of the string gets converted into:
[Business_Type(¡¦£,¡¦_Equip¢, ¡¦_RE¢,¡§£, ¡§_Equip¢, ]
which represents the following (original fragment → reduced form):
[Business_Type(                           → [Business_Type(
BT_Adds_Non_Recourse_NB_Recv_Amt,         → ¡¦£,
BT_Adds_Non_Recourse_NB_Equip_Cost_Amt,   → ¡¦_Equip¢,
BT_Adds_Non_Recourse_NB_RE_Cost_Amt,      → ¡¦_RE¢,
BT_Adds_Non_Recourse_EB_Recv_Amt,         → ¡§£,
BT_Adds_Non_Recourse_EB_Equip_Cost_Amt, ] → ¡§_Equip¢, ]
I am substituting the phrases with individual characters in the range
161 onwards, not with blanks. The uniqueness is maintained.
Pete
OK, I screwed up. Sorry.
This might shorten the OP's records enough, or it might not. Also, if there
just happen to be more than 128 such common phrases, you'd need to alter
your approach. I still believe a macro approach would be better than a udf
approach, and a macro approach could prompt for selection of a different
range containing a table of common substrings rather than hard coding them
into a udf.
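[Editorial sketch of that macro idea, in Python for illustration: the substitutions come from a data table rather than being hardcoded, and all records are processed in one pass, like repeated Edit > Replace operations instead of one UDF call per cell. Names here are illustrative.]

```python
def reduce_with_table(records, sub_table):
    # sub_table is a list of (phrase, token) pairs; in Excel this
    # would be a worksheet range the macro prompts the user for.
    out = []
    for rec in records:
        for phrase, token in sub_table:
            rec = rec.replace(phrase, token)
        out.append(rec)
    return out

# Note Harlan's caveat: if single characters in the 161-255 range are
# used as tokens, only about a hundred distinct phrases can be encoded
# before the approach has to change.
```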
Maybe you could wait for Excel 13 or 14, but this particular problem
won't be fixed in Excel 12. David Gainer confirmed in the Excel 12 blog
that VLOOKUP will remain limited to searching up to but no more than
255 chars in Excel 12.
FWIW, Lotus 123 97 Edition (almost 10 years old) handles lookup values
> 255 characters, as does the Windows port of Gnumeric 1.6.0. I'd be
surprised if Excel weren't the only arguably high-end spreadsheet that
can't handle such strings. Microsoft has a very bad tendency to leave
matters of actual functionality unchanged version to version over
decades but to spend thousands of developers' hours on interface and
other eyewash issues.
Excel 12 is Microsoft's latest attempt to prove their assertion that
they're years ahead of the competition. Then stuff like this comes
along and proves that when low-level backwaters of functionality are at
issue, it's Excel that's a decade or so behind the competition. And as
long as the lemmings dutifully upgrade, there'll be no incentive for
them to fix this.
=INDEX(B1:B6,MATCH(TRUE,(A1:A6=x),0))
Was that your intent? - when I tried that, I got a "formula too long" error.