Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

String handling slow?

141 views
Skip to first unread message

Yagting

unread,
Oct 3, 2001, 9:18:03 PM10/3/01
to
Is there a way to speed up processing of strings in VB? I noticed that
string manipulations in VB is somewhat slow. By the way, am processing a
very large text file put it in a string and process it there (sort of
reformatting the structure of the file). This file is a billing system run
on Intebes and since it's not a formatted file, i have to make it appear
like a billing system for archiving. Is there a better way to get around
this situation.

If you could provide some input regarding this matter I'd appreciate much.
Thanks in advance.


Jure Leskovec

unread,
Oct 4, 2001, 3:35:59 AM10/4/01
to
Hi

You didn't tell how you are doing it. If you are using variable-length
strings than this is the wrong way. Use fixed length strigs instead.

- Jure


Yagting

unread,
Oct 4, 2001, 3:50:29 AM10/4/01
to
Thats what I exactly do it.. Variable length strength.. Am reading a text
file and put it into a string variable. I cannot set the length of the
string because it should depend on the no. of bytes returned by the read.


"Jure Leskovec" <jurele...@yahoo.com> wrote in message
news:OD5o9eKTBHA.1488@tkmsftngp07...

Simon Woods

unread,
Oct 4, 2001, 7:53:55 AM10/4/01
to

http://www.vb2themax.com/HtmlDoc.asp?Table=Articles&ID=30

HTH

S

"Yagting" <yag...@yahoo.com> wrote in message
news:#E##QMHTBHA.1220@tkmsftngp07...

Ivan Demkovitch

unread,
Oct 4, 2001, 9:35:57 AM10/4/01
to
May be it doesn't work for you, but I read txt into disconnected ADO
recordset and then manipulate data using SQL statements. I found this much
faster and easier to implement
"Simon Woods" <simon...@virgin.net> wrote in message
news:u$31asMTBHA.2144@tkmsftngp05...

Anthony C

unread,
Oct 4, 2001, 11:19:14 AM10/4/01
to
Read it in chunks.

"Yagting" <yag...@yahoo.com> wrote in message

news:#SPR4nKTBHA.1216@tkmsftngp07...

Kevin Williamson

unread,
Oct 4, 2001, 10:55:15 AM10/4/01
to
I have done some work in various methods of speeding up using strings in
VB. Here are some of my findings. Also, if you post the routine you are
using we may be able to find some things you could easily improve. I
believe that the last one of these - the string concatenation one - will be
the most useful for your situation.

****************************************************************************
*****************************************
There are times when you may need to optimize the InStr function in Visual
Basic. This can be done by modifying the stored length of the string that
Visual Basic
stores with the String variable. This will trick Visual Basic into thinking
that the string is shorter than it
actually is, thus having less characters to search for a match. Notice
that this is only useful if you know that you only want to search the
first X number of characters for the string, but that the string may not
exist. If the string exists within the first X number of characters, this
is not any
faster as the InStr function will not continue once it finds a match.

Create a new Visual Basic Standard EXE. Add a command button (Command1) to
the Form and insert the following code:
Option Explicit

Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
(pDst As Any, pSrc As Any, ByVal ByteLen As Long)

Private Const SignBit As Long = &H80000000

Private Function UnsignedDif(ByVal x As Long, ByVal y As Long) As Long
'Note: x must be > y to avoid overflow
If x And SignBit Then
UnsignedDif = -((y Xor SignBit) - x Xor SignBit)
Else
UnsignedDif = (x Xor SignBit) - y Xor SignBit
End If
End Function

Private Sub Command1_Click()
Dim LenStart As Long
Dim LenPtr As Long
Dim s As String
s = String(1000000, "A")
s = s & "ABDA" & String(1000000, "A")

' Get a pointer to the string
LenPtr = UnsignedDif(StrPtr(s), 4)
' Save the original length so we can restore it
CopyMemory LenStart, ByVal LenPtr, 4
' Copy a new length in.
' Notice that 2000000 is 2x the length we want as BSTRs are stored
as Unicode
CopyMemory ByVal LenPtr, 2000000, 4

' Notice InStr will not find ABDA because we cut the length just
short of it.
Debug.Print "Instr(1, s, ""ABDA"") = " & InStr(1, s, "ABDA")

' Restore the original length
CopyMemory ByVal LenPtr, LenStart, 4
End Sub
****************************************************************************
*****************************************

****************************************************************************
*****************************************
You will often find strings that have information seperated by a token
(usually
a tab or other character). To get that information it will be necessary to
parse the string to find all of the information between the tokens. This
sample will show a routine that will put the information seperated by the
tokens into an array of strings. The sample shows the difference in time
between the quick parse routine and a
normal parsing routine. Both routines use a text comparison so Xx is the
same
as xx. If you change the normal routine to use vbBinaryCompare instead of
vbTextCompare the Normal routine will be faster, but will produce
inaccurate
results. Notice that both routines make no assumption on the number of
results returned,
so they do some processing to check the size of the array and resize it if
needed. This could be further optimized if you knew ahead of time how many
elements would be found. Also, if you knew there were a lot of elements to
find you could increase the size of the array by more than 50 items each
time.
1. Start a new standard EXE.
2. Add two command buttons (cmdQuickParse and cmdNormalParse) and the
following code:

Option Explicit

Private Type SAFEARRAY
cDims As Integer
fFeatures As Integer
cbElements As Long
cLocks As Long
pvData As Long
cElements As Long
lLbound As Long
End Type

Private Declare Function VarPtrArray Lib "msvbvm60.dll" Alias "VarPtr" _
(Ptr() As Any) As Long

Private Declare Sub CopyMemory Lib "Kernel32" Alias "RtlMoveMemory" _
(pDest As Any, pSrc As Any, ByVal ByteLen As Long)

Private Declare Function SysAllocStringLen Lib "oleaut32.dll" _
(ByVal pch As Long, ByVal cch As Long) As Long

Private Declare Function GetTickCount Lib "Kernel32" () As Long
Dim s As String

Function ParseStringQuick(s As String, token As String, results() As
String) As Long
Dim psaSearch() As Integer
Dim psatoken() As Integer
Dim saSearch As SAFEARRAY
Dim satoken As SAFEARRAY
Dim strlen As Long
Dim tokenlen As Long
Dim iChar As Integer
Dim i As Long, j As Long
Dim lCurStart As Long
Dim lCurLen As Long
Dim nelem As Long
'UNDONE: results should always be erased (fixed size), or newly
dimensioned, or this will memleak
nelem = 0
strlen = Len(s)
With saSearch
.cbElements = 2
.cDims = 1
.lLbound = 0
.cElements = strlen
.pvData = StrPtr(s)
End With
CopyMemory ByVal VarPtrArray(psaSearch), VarPtr(saSearch), 4
tokenlen = Len(token)
With satoken
.cbElements = 2
.cDims = 1
.lLbound = 0
.cElements = tokenlen
.pvData = StrPtr(token)
End With
CopyMemory ByVal VarPtrArray(psatoken), VarPtr(satoken), 4
For i = 0 To strlen - 1
'DoEvents
iChar = psaSearch(i)
For j = 0 To tokenlen - 1
If psatoken(j) = iChar Then Exit For
Next j
If j < tokenlen Then
If lCurLen Then
If nelem > UBound(results) Then ' This line
checks the size of the array
ReDim Preserve results(nelem + 50) As String
'This line increases the size of the array
End If
CopyMemory ByVal VarPtr(results(nelem)),
SysAllocStringLen(VarPtr(psaSearch(lCurStart)), lCurLen), 4
lCurLen = 0
nelem = nelem + 1
End If
lCurStart = i + 1
Else
lCurLen = lCurLen + 1
End If
Next i
If lCurLen Then
CopyMemory ByVal VarPtr(results(nelem)),
SysAllocStringLen(VarPtr(psaSearch(lCurStart)), lCurLen), 4
nelem = nelem + 1
End If
ParseStringQuick = nelem
CopyMemory ByVal VarPtrArray(psaSearch), 0&, 4
CopyMemory ByVal VarPtrArray(psatoken), 0&, 4
End Function

Private Function ParseStringNormal(s As String, token As String,
results() As String) As Long
Dim i As Long
Dim pos As Long
Dim start As Long
Dim count As Long
count = 0
start = 1
pos = InStr(1, s, token, vbTextCompare)
While pos <> 0
If count > UBound(results) Then ' This line checks
the size of the array
ReDim Preserve results(count + 50) As String '
This line increases the size of the array
End If
results(count) = Mid$(s, start, pos - start)
start = pos + Len(token)
pos = InStr(start, s, token, vbTextCompare)
count = count + 1
Wend
ParseStringNormal = count
End Function

Private Sub cmdNormalParse_Click()
Dim result() As String
Dim ret As Long
Dim starttick As Long

ReDim result(1) As String
starttick = GetTickCount
ret = ParseStringNormal(s, "xx", result)
Debug.Print "Normal Parse - " & GetTickCount - starttick & _
vbTab & ret & " items."
End Sub

Private Sub cmdQuickParse_Click()
Dim result() As String
Dim ret As Long
Dim starttick As Long

ReDim result(1) As String
starttick = GetTickCount
ret = ParseStringQuick(s, "xx", result)
Debug.Print "Quick Parse - " & GetTickCount - starttick & _
vbTab & ret & " items."
End Sub

Private Sub Form_Load()
cmdQuickParse.Caption = "Quick"
cmdNormalParse.Caption = "Normal"
Dim i As Integer
s = "This is a Xx test to see xx how the xx new string parsing
routine xx works.xx"
For i = 1 To 8
s = s & s
Next i
End Sub
****************************************************************************
*****************************************

****************************************************************************
*****************************************
String concatenation in Visual Basic can become very slow when dealing with
a
large numer of concatenations. This is due to the way Visual Basic (or
more
appropriately oleaut32.dll) handles BSTRs: Every time you do a string
concatenation, you're allocating a new BSTR. The
cost for a new BSTR gets higher the larger it gets. Behind the scenes,
oleaut32.dll, which provides BSTR's for VB, actually caches BSTR values.
BSTR's are cached in 4 different buffer ranges: 16,32,256,64k character
strings. This means that repeatedly calling x = x & y repeatedly is not so
bad below a certain threshold because the memory for the strings isn't
actually released. With big strings, however, you're no longer operating
in
the cache, so you end up in algorithms that are incredibly slow, taking
several minutes to do operations which should take less than a second. The
following class will optimize string concatenations for instances when you
know you will have a large number of concatenations or large strings to
concatenate. The differences in performance get even more exaggerated as
strings become larger and the number for the for loop increases.

1. Create a new Standard EXE.
2. Add a Class Module and name it SmartConcat.
3. Add the following code to the SmartConcat class:

Option Explicit

Private m_Separator As String

Private m_MediumGrowSize As Long
Private m_CurrentMediumSize As Long
Private m_CurrentMediumIndex As Long
Private m_MediumStrings() As String

Private m_MaxMediumLength As Long
Private m_SmallTotalLength As Long

Private m_MaxSmallEntries As Long
Private m_SmallIndex As Long
Private m_SmallStrings() As String

Public Function GenerateCurrentString() As String
If m_SmallIndex Then ClearSmallStrings
If m_CurrentMediumIndex Then
'Shrink the size of the string array so that join doesn't get
extra stuff
ReDim Preserve m_MediumStrings(m_CurrentMediumIndex - 1)
GenerateCurrentString = Join(m_MediumStrings, m_Separator)
ReDim Preserve m_MediumStrings(m_CurrentMediumSize - 1)
End If
End Function

Public Function ClearStrings()
Dim l As Long
If m_SmallIndex Then
For l = 0 To m_SmallIndex - 1
m_SmallStrings(l) = vbNullString
Next l
End If
m_SmallIndex = 0
m_SmallTotalLength = 0
m_CurrentMediumIndex = 0
m_CurrentMediumSize = 0
Erase m_MediumStrings
End Function

Public Sub AddString(NewString As String)
Dim NewLen As Long
NewLen = Len(NewString)
If m_SmallTotalLength + NewLen > m_MaxMediumLength Then
ClearSmallStrings
End If
m_SmallTotalLength = m_SmallTotalLength + NewLen
m_SmallStrings(m_SmallIndex) = NewString
m_SmallIndex = m_SmallIndex + 1
If m_SmallIndex = m_MaxSmallEntries Then
'Clear out now
ClearSmallStrings
End If
End Sub

Public Property Get TempStringCount() As Long
TempStringCount = m_MaxSmallEntries
End Property

Public Property Let TempStringCount(ByVal RHS As Long)
If RHS < 1 Then Err.Raise 5
If m_SmallIndex Then ClearSmallStrings
m_MaxSmallEntries = RHS
ReDim Preserve m_SmallStrings(RHS - 1)
End Property

Public Property Get FinalCacheGrowSize() As Long
FinalCacheGrowSize = m_MediumGrowSize
End Property

Public Property Let FinalCacheGrowSize(ByVal RHS As Long)
If RHS < 1 Then Err.Raise 5
If m_CurrentMediumIndex Then Err.Raise 5, , "Call before AddString
or after ClearStrings"
m_MediumGrowSize = RHS
End Property

Public Property Get MaxTempLength() As Long
MaxTempLength = m_MaxMediumLength
End Property

Public Property Let MaxTempLength(ByVal RHS As Long)
If RHS < 1 Then Err.Raise 5
m_MaxMediumLength = RHS
End Property

Public Property Get Separator() As String
Separator = m_Separator
End Property

Public Property Let Separator(ByVal RHS As String)
m_Separator = RHS
End Property

Private Function NextMediumIndex() As Long
If (m_CurrentMediumIndex Mod m_MediumGrowSize) = 0 Then
m_CurrentMediumSize = m_CurrentMediumSize + m_MediumGrowSize
ReDim Preserve m_MediumStrings(m_CurrentMediumSize - 1)
End If
NextMediumIndex = m_CurrentMediumIndex
m_CurrentMediumIndex = m_CurrentMediumIndex + 1
End Function

'Use the Join function to generate a medium length string
'and move it to our medium length
Private Sub ClearSmallStrings()
Dim iNextMediumIndex As Long
Debug.Assert m_SmallIndex

'Temporarily shrink the array to stop Join from adding extra
separators
'This isn't as bad as it seems because we'll be growing back to the
same
'size, so it will likely reoccupy the same memory. Although you
can't Stop
'the memory from relocatin, you can see if it actually happened by
looking
'at VarPtr(m_SmallStrings(0)) before and after each ReDim Preserve.
ReDim Preserve m_SmallStrings(m_SmallIndex - 1)

iNextMediumIndex = NextMediumIndex 'Note: Native/Fast bug, don't do
NextMediumIndex inline
m_MediumStrings(iNextMediumIndex) = Join(m_SmallStrings, m_Separator)

'We could clear all of the current strings here, but
'it turns out that it is slightly faster to just leave them
'alone and let them clear out naturally as the buffer is reused
'Dim l As Long
'For l = 0 To m_SmallIndex - 1
' m_SmallStrings(l) = vbNullString
'Next l

'Put the array size back where it should be
ReDim Preserve m_SmallStrings(m_MaxSmallEntries - 1)

'Clear the current length and index
m_SmallIndex = 0
m_SmallTotalLength = 0
End Sub

Private Sub Class_Initialize()
'Set default and initialize array
m_MaxMediumLength = 4095
m_MaxSmallEntries = 128
m_MediumGrowSize = 64
ReDim m_SmallStrings(m_MaxSmallEntries - 1)
End Sub




4. Add the following code to the Form:

Option Explicit

Private Declare Function GetTickCount Lib "Kernel32" () As Long

Private Sub cmdNormalConcat_Click()
Dim i As Integer
Dim s As String
Dim result As String
Dim starttick As Long

s = "This is a test to see how the new smart concat routine works."
starttick = GetTickCount
For i = 1 To 6000
result = result & s
Next i
Debug.Print "Normal concat - " & GetTickCount - starttick & _
vbTab & "Len(s) = " & Len(result)
End Sub

Private Sub cmdSmartConcat_Click()
Dim i As Integer
Dim s As String
Dim result As String
Dim starttick As Long
Dim Concat As SmartConcat
Set Concat = New SmartConcat

s = "This is a test to see how the new smart concat routine works."
starttick = GetTickCount
For i = 1 To 6000
Concat.AddString s
Next i
result = Concat.GenerateCurrentString
Debug.Print "Smart concat - " & GetTickCount - starttick & _
vbTab & "Len(s) = " & Len(result)
End Sub

Private Sub Form_Load()
cmdSmartConcat.Caption = "Smart"
cmdNormalConcat.Caption = "Normal"
End Sub
****************************************************************************
*****************************************

Kevin Williamson - Microsoft Visual Basic Developer Support
This posting is provided "AS IS" with no warranties, and confers no rights.
You assume all risk for your use. © 2001 Microsoft Corporation. All rights
reserved.

--------------------
| From: "Yagting" <yag...@yahoo.com>
| Subject: String handling slow?
| Date: Thu, 4 Oct 2001 09:18:03 +0800
| Lines: 12
| X-Priority: 3
| X-MSMail-Priority: Normal
| X-Newsreader: Microsoft Outlook Express 5.50.4133.2400
| X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400
| Message-ID: <#E##QMHTBHA.1220@tkmsftngp07>
| Newsgroups: microsoft.public.vb.com
| NNTP-Posting-Host: 203.172.6.141
| Path: cppssbbsa01.microsoft.com!tkmsftngp01!tkmsftngp07
| Xref: cppssbbsa01.microsoft.com microsoft.public.vb.com:20130
| X-Tomcat-NG: microsoft.public.vb.com

0 new messages