Gmail Calendar Documents Reader Web more »
Recently Visited Groups | Help | Sign in
Google Groups Home
Message from discussion String handling slow?
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Kevin Williamson  
View profile  
 More options Oct 4 2001, 12:10 pm
Newsgroups: microsoft.public.vb.com
From: kwillonl...@microsoft.com (Kevin Williamson)
Date: Thu, 04 Oct 2001 14:55:15 GMT
Local: Thurs, Oct 4 2001 10:55 am
Subject: RE: String handling slow?
I have done some work in various methods of speeding up using strings in
VB.  Here are some of my findings.  Also, if you post the routine you are
using we may be able to find some things you could easily improve.  I
believe that the last one of these - the string concatenation one - will be
the most useful for your situation.

*************************************************************************** *
*****************************************
There are times when you may need to optimize the InStr function in Visual
Basic. This can be done by modifying the stored length of the string that
Visual Basic
stores with the String variable. This will trick Visual Basic into thinking
that the string is shorter than it
actually is, thus having less characters to search for a match.  Notice
that this is only useful if you know that you only want to search the
first X number of characters for the string, but that the string may not
exist. If the string exists within the first X number of characters, this
is not any
faster as the InStr function will not continue once it finds a match.

Create a new Visual Basic Standard EXE. Add a command button (Command1) to
the Form and insert the following code:  
    Option Explicit

    Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
        (pDst As Any, pSrc As Any, ByVal ByteLen As Long)

    Private Const SignBit As Long = &H80000000

    Private Function UnsignedDif(ByVal x As Long, ByVal y As Long) As Long
        'Note: x must be > y to avoid overflow
        If x And SignBit Then
            UnsignedDif = -((y Xor SignBit) - x Xor SignBit)
        Else
            UnsignedDif = (x Xor SignBit) - y Xor SignBit
        End If
    End Function

    Private Sub Command1_Click()
        Dim LenStart As Long
        Dim LenPtr As Long
        Dim s As String
        s = String(1000000, "A")
        s = s & "ABDA" & String(1000000, "A")

        ' Get a pointer to the string
        LenPtr = UnsignedDif(StrPtr(s), 4)
        ' Save the original length so we can restore it
        CopyMemory LenStart, ByVal LenPtr, 4
        ' Copy a new length in.
        ' Notice that 2000000 is 2x the length we want as BSTRs are stored
as Unicode
        CopyMemory ByVal LenPtr, 2000000, 4

        ' Notice InStr will not find ABDA because we cut the length just
short of it.
        Debug.Print "Instr(1, s, ""ABDA"") = " & InStr(1, s, "ABDA")

        ' Restore the original length
        CopyMemory ByVal LenPtr, LenStart, 4
    End Sub
*************************************************************************** *
*****************************************

*************************************************************************** *
*****************************************
You will often find strings that have information seperated by a token
(usually
a tab or other character).  To get that information it will be necessary to
parse the string to find all of the information between the tokens.  This
sample will show a routine that will put the information seperated by the
tokens into an array of strings.  The sample shows the difference in time
between the quick parse routine and a
normal parsing routine.  Both routines use a text comparison so Xx is the
same
as xx.  If you change the normal routine to use vbBinaryCompare instead of
vbTextCompare the Normal routine will be faster, but will produce
inaccurate
results.  Notice that both routines make no assumption on the number of
results returned,
so they do some processing to check the size of the array and resize it if
needed.  This could be further optimized if you knew ahead of time how many
elements would be found.  Also, if you knew there were a lot of elements to
find you could increase the size of the array by more than 50 items each
time.
1. Start a new standard EXE.
2. Add two command buttons (cmdQuickParse and cmdNormalParse) and the
following code:

    Option Explicit

    Private Type SAFEARRAY
       cDims As Integer
       fFeatures As Integer
       cbElements As Long
       cLocks As Long
       pvData As Long
       cElements As Long
       lLbound As Long
    End Type

    Private Declare Function VarPtrArray Lib "msvbvm60.dll" Alias "VarPtr" _
        (Ptr() As Any) As Long

    Private Declare Sub CopyMemory Lib "Kernel32" Alias "RtlMoveMemory" _
        (pDest As Any, pSrc As Any, ByVal ByteLen As Long)

    Private Declare Function SysAllocStringLen Lib "oleaut32.dll" _
        (ByVal pch As Long, ByVal cch As Long) As Long

    Private Declare Function GetTickCount Lib "Kernel32" () As Long
    Dim s As String

    Function ParseStringQuick(s As String, token As String, results() As
String) As Long
        Dim psaSearch() As Integer
        Dim psatoken() As Integer
        Dim saSearch As SAFEARRAY
        Dim satoken As SAFEARRAY
        Dim strlen As Long
        Dim tokenlen As Long
        Dim iChar As Integer
        Dim i As Long, j As Long
        Dim lCurStart As Long
        Dim lCurLen As Long
        Dim nelem As Long
        'UNDONE: results should always be erased (fixed size), or newly
dimensioned, or this will memleak
        nelem = 0
        strlen = Len(s)
        With saSearch
            .cbElements = 2
            .cDims = 1
            .lLbound = 0
            .cElements = strlen
            .pvData = StrPtr(s)
        End With
        CopyMemory ByVal VarPtrArray(psaSearch), VarPtr(saSearch), 4
        tokenlen = Len(token)
        With satoken
            .cbElements = 2
            .cDims = 1
            .lLbound = 0
            .cElements = tokenlen
            .pvData = StrPtr(token)
        End With
        CopyMemory ByVal VarPtrArray(psatoken), VarPtr(satoken), 4
        For i = 0 To strlen - 1
            'DoEvents
            iChar = psaSearch(i)
            For j = 0 To tokenlen - 1
                If psatoken(j) = iChar Then Exit For
            Next j
            If j < tokenlen Then
                If lCurLen Then
                    If nelem > UBound(results) Then       ' This line
checks the size of the array
                        ReDim Preserve results(nelem + 50) As String        
   'This line increases the size of the array
                    End If
                    CopyMemory ByVal VarPtr(results(nelem)),
SysAllocStringLen(VarPtr(psaSearch(lCurStart)), lCurLen), 4
                    lCurLen = 0
                    nelem = nelem + 1
                End If
                lCurStart = i + 1
            Else
                lCurLen = lCurLen + 1
            End If
        Next i
        If lCurLen Then
            CopyMemory ByVal VarPtr(results(nelem)),
SysAllocStringLen(VarPtr(psaSearch(lCurStart)), lCurLen), 4
            nelem = nelem + 1
        End If
        ParseStringQuick = nelem
        CopyMemory ByVal VarPtrArray(psaSearch), 0&, 4
        CopyMemory ByVal VarPtrArray(psatoken), 0&, 4
    End Function

    Private Function ParseStringNormal(s As String, token As String,
results() As String) As Long
        Dim i As Long
        Dim pos As Long
        Dim start As Long
        Dim count As Long
        count = 0
        start = 1
        pos = InStr(1, s, token, vbTextCompare)
        While pos <> 0
            If count > UBound(results) Then              ' This line checks
the size of the array
                ReDim Preserve results(count + 50) As String           '
This line increases the size of the array
            End If
            results(count) = Mid$(s, start, pos - start)
            start = pos + Len(token)
            pos = InStr(start, s, token, vbTextCompare)
            count = count + 1
        Wend
        ParseStringNormal = count
    End Function

    Private Sub cmdNormalParse_Click()
        Dim result() As String
        Dim ret As Long
        Dim starttick As Long

        ReDim result(1) As String
        starttick = GetTickCount
        ret = ParseStringNormal(s, "xx", result)
        Debug.Print "Normal Parse - " & GetTickCount - starttick & _
        vbTab & ret & " items."
    End Sub

    Private Sub cmdQuickParse_Click()
        Dim result() As String
        Dim ret As Long
        Dim starttick As Long

        ReDim result(1) As String
        starttick = GetTickCount
        ret = ParseStringQuick(s, "xx", result)
        Debug.Print "Quick Parse  - " & GetTickCount - starttick & _
        vbTab & ret & " items."
    End Sub

    Private Sub Form_Load()
        cmdQuickParse.Caption = "Quick"
        cmdNormalParse.Caption = "Normal"
        Dim i As Integer
        s = "This is a Xx test to see xx how the xx new string parsing
routine xx works.xx"
        For i = 1 To 8
            s = s & s
        Next i
    End Sub
*************************************************************************** *
*****************************************

*************************************************************************** *
*****************************************
String concatenation in Visual Basic can become very slow when dealing with
a
large numer of concatenations.  This is due to the way Visual Basic (or
more
appropriately oleaut32.dll) handles BSTRs: Every time you do a string
concatenation, you're allocating a new BSTR.  The
cost for a new BSTR gets higher the larger it gets.  Behind the scenes,
oleaut32.dll, which provides BSTR's for VB, actually caches BSTR values.  
BSTR's are cached in 4 different buffer ranges: 16,32,256,64k character
strings.  This means that repeatedly calling x = x & y repeatedly is not so
bad below a certain threshold because the memory for the strings isn't
actually released.  With big strings, however, you're no longer operating
in
the cache, so you end up in algorithms that are incredibly slow, taking
several minutes to do operations which should take less than a second.  The
following class will optimize string concatenations for instances when you
know you will have a large number of concatenations or large strings to
concatenate.  The differences in performance get even more exaggerated as
strings become larger and the number for the for loop increases.

1. Create a new Standard EXE.
2. Add a Class Module and name it SmartConcat.
3. Add the following code to the SmartConcat class:

    Option Explicit

    Private m_Separator As String

    Private m_MediumGrowSize As Long
    Private m_CurrentMediumSize As Long
    Private m_CurrentMediumIndex As Long
    Private m_MediumStrings() As String

    Private m_MaxMediumLength As Long
    Private m_SmallTotalLength As Long

    Private m_MaxSmallEntries As Long
    Private m_SmallIndex As Long
    Private m_SmallStrings() As String

    Public Function GenerateCurrentString() As String
       If m_SmallIndex Then ClearSmallStrings
       If m_CurrentMediumIndex Then
           'Shrink the size of the string array so that join doesn't get
extra stuff
           ReDim Preserve m_MediumStrings(m_CurrentMediumIndex - 1)
           GenerateCurrentString = Join(m_MediumStrings, m_Separator)
           ReDim Preserve m_MediumStrings(m_CurrentMediumSize - 1)
       End If
    End Function

    Public Function ClearStrings()
       Dim l As Long
       If m_SmallIndex Then
           For l = 0 To m_SmallIndex - 1
               m_SmallStrings(l) = vbNullString
           Next l
       End If
       m_SmallIndex = 0
       m_SmallTotalLength = 0
       m_CurrentMediumIndex = 0
       m_CurrentMediumSize = 0
       Erase m_MediumStrings
    End Function

    Public Sub AddString(NewString As String)
       Dim NewLen As Long
       NewLen = Len(NewString)
       If m_SmallTotalLength + NewLen > m_MaxMediumLength Then
           ClearSmallStrings
       End If
       m_SmallTotalLength = m_SmallTotalLength + NewLen
       m_SmallStrings(m_SmallIndex) = NewString
       m_SmallIndex = m_SmallIndex + 1
       If m_SmallIndex = m_MaxSmallEntries Then
           'Clear out now
           ClearSmallStrings
       End If
    End Sub

    Public Property Get TempStringCount() As Long
       TempStringCount = m_MaxSmallEntries
    End Property

    Public Property Let TempStringCount(ByVal RHS As Long)
       If RHS < 1 Then Err.Raise 5
       If m_SmallIndex Then ClearSmallStrings
       m_MaxSmallEntries = RHS
       ReDim Preserve m_SmallStrings(RHS - 1)
    End Property

    Public Property Get FinalCacheGrowSize() As Long
       FinalCacheGrowSize = m_MediumGrowSize
    End Property

    Public Property Let FinalCacheGrowSize(ByVal RHS As Long)
       If RHS < 1 Then Err.Raise 5
       If m_CurrentMediumIndex Then Err.Raise 5, , "Call before AddString
or after ClearStrings"
       m_MediumGrowSize = RHS
    End Property

    Public Property Get MaxTempLength() As Long
       MaxTempLength = m_MaxMediumLength
    End Property

    Public Property Let MaxTempLength(ByVal RHS As Long)
       If RHS < 1 Then Err.Raise 5
       m_MaxMediumLength = RHS
    End Property

    Public Property Get Separator() As String
       Separator = m_Separator
    End Property

    Public Property Let Separator(ByVal RHS As String)
       m_Separator = RHS
    End Property

    Private Function NextMediumIndex() As Long
       If (m_CurrentMediumIndex Mod m_MediumGrowSize) = 0 Then
           m_CurrentMediumSize = m_CurrentMediumSize + m_MediumGrowSize
           ReDim Preserve m_MediumStrings(m_CurrentMediumSize - 1)
       End If
       NextMediumIndex = m_CurrentMediumIndex
       m_CurrentMediumIndex = m_CurrentMediumIndex + 1
    End Function

    'Use the Join function to generate a medium length string
    'and move it to our medium length
    Private Sub ClearSmallStrings()
       Dim iNextMediumIndex As Long
       Debug.Assert m_SmallIndex

       'Temporarily shrink the array to stop Join from adding extra
separators
       'This isn't as bad as it seems because we'll be growing back to the
same
       'size, so it will likely reoccupy the same memory.  Although you
can't Stop
       'the memory from relocatin, you can see if it actually happened by
looking
       'at VarPtr(m_SmallStrings(0)) before and after each ReDim Preserve.
       ReDim Preserve m_SmallStrings(m_SmallIndex - 1)

       iNextMediumIndex = NextMediumIndex 'Note: Native/Fast bug, don't do
NextMediumIndex inline
       m_MediumStrings(iNextMediumIndex) = Join(m_SmallStrings, m_Separator)

       'We could clear all of the current strings here, but
       'it turns out that it is slightly faster to just leave them
       'alone and let them clear out naturally as the buffer is reused
       'Dim l As Long
       'For l = 0 To m_SmallIndex - 1
       '    m_SmallStrings(l) = vbNullString
       'Next l

       'Put the array size back where it should be
       ReDim Preserve m_SmallStrings(m_MaxSmallEntries - 1)

       'Clear the current length and index
       m_SmallIndex = 0
       m_SmallTotalLength = 0
    End Sub

    Private Sub Class_Initialize()
       'Set default and initialize array
       m_MaxMediumLength = 4095
       m_MaxSmallEntries = 128
       m_MediumGrowSize = 64
       ReDim m_SmallStrings(m_MaxSmallEntries - 1)
    End Sub

4. Add the following code to the Form:

    Option Explicit

    Private Declare Function GetTickCount Lib "Kernel32" () As Long

    Private Sub cmdNormalConcat_Click()
       Dim i As Integer
       Dim s As String
       Dim result As String
       Dim starttick As Long

       s = "This is a test to see how the new smart concat routine works."
       starttick = GetTickCount
       For i = 1 To 6000
           result = result & s
       Next i
       Debug.Print "Normal concat - " & GetTickCount - starttick & _
           vbTab & "Len(s) = " & Len(result)
    End Sub

    Private Sub cmdSmartConcat_Click()
       Dim i As Integer
       Dim s As String
       Dim result As String
       Dim starttick As Long
       Dim Concat As SmartConcat
       Set Concat = New SmartConcat

       s = "This is a test to see how the new smart concat routine works."
       starttick = GetTickCount
       For i = 1 To 6000
           Concat.AddString s
       Next i
       result = Concat.GenerateCurrentString
       Debug.Print "Smart concat - " & GetTickCount - starttick & _
           vbTab & "Len(s) = " & Len(result)
    End Sub

    Private Sub Form_Load()
       cmdSmartConcat.Caption = "Smart"
       cmdNormalConcat.Caption = "Normal"
    End Sub
*************************************************************************** *
*****************************************

Kevin Williamson - Microsoft Visual Basic Developer Support
This posting is provided "AS IS" with no warranties, and confers no rights.
You assume all risk for your use. © 2001 Microsoft Corporation. All rights
reserved.

--------------------
| From: "Yagting" <yagt...@yahoo.com>
| Subject: String handling slow?
| Date: Thu, 4 Oct 2001 09:18:03 +0800
| Lines: 12
| X-Priority: 3
| X-MSMail-Priority: Normal
| X-Newsreader: Microsoft Outlook Express 5.50.4133.2400
| X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400
| Message-ID: <#E##QMHTBHA.1220@tkmsftngp07>
| Newsgroups: microsoft.public.vb.com
| NNTP-Posting-Host: 203.172.6.141
| Path: cppssbbsa01.microsoft.com!tkmsftngp01!tkmsftngp07
| Xref: cppssbbsa01.microsoft.com microsoft.public.vb.com:20130
| X-Tomcat-NG: microsoft.public.vb.com
|
| Is there a way to speed up processing of strings in VB? I noticed that
| string manipulations in VB is somewhat slow. By the way, am processing a
| very large text file put it in a string and process it there (sort of
| reformatting the structure of the file). This file is a billing system run
| on Intebes and since it's not a formatted file, i have to make it appear
| like a billing system for archiving.  Is there a better way to get around
| this situation.
|
| If you could provide some input regarding this matter I'd appreciate much.
| Thanks in advance.
|
|
|


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.

Create a group - Google Groups - Google Home - Terms of Service - Privacy Policy
©2009 Google