Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Remove extra Double codes from CSV Files (vb Script Language)

462 views
Skip to first unread message

Learn4Develop

unread,
Nov 16, 2009, 6:20:01 AM11/16/09
to
On daily bases I have received files with double quotes comma separated, but
in some records I found 2 times double quotes and some time different squence
of double quotes placements.

I just want to remove all unnecessary (extra) double quotes (see the
following values where i want to remove extra double quotes.

"0123x",""Company D-Val"","Class D, sector N","DD5894"
"4894D",""Recycle" Rubbish, C class","Class D, Sector F, Block N","D870GH"
"AB8679",""AB Ltd"",""Need" Clean Drive Way, stores","GF0347"


I am looking the output like that.

"0123x","Company D-Val","Class D, sector N","DD5894"
"4894D","Recycle Rubbish, C class","Class D, Sector F, Block N","D870GH"
"AB8679","AB Ltd","Need Clean Drive Way, stores","GF0347"


Following code i used is work fine with ""Company D-Val"" but didn't work
with ""Recycle" Rubbish, C class", The code is

code:-
Do While inFile.AtEndOfStream = False
record = InFile.ReadLine()
record = Replace(record, """", """") 'replace "" with single space
outFile.WriteLine record & vbclrf
Loop

Tom Lavedas

unread,
Nov 16, 2009, 10:28:03 AM11/16/09
to
On Nov 16, 6:20 am, Learn4Develop

<Learn4Deve...@discussions.microsoft.com> wrote:
> On daily bases I have received files with double quotes comma separated, but
> in some records I found 2 times double quotes and some time different squence
> of double quotes placements.
>
> I just want to remove all unnecessary (extra) double quotes (see the
> following values where i want to remove extra double quotes.
>
> "0123x",""Company D-Val"","Class D, sector N","DD5894"
> "4894D",""Recycle" Rubbish, C class","Class D, Sector F, Block N","D870GH"
> "AB8679",""AB Ltd"",""Need" Clean Drive Way, stores","GF0347"
>
> I am looking the output like that.
>
> "0123x","Company D-Val","Class D, sector N","DD5894"
> "4894D","Recycle Rubbish, C class","Class D, Sector F, Block N","D870GH"
> "AB8679","AB Ltd","Need Clean Drive Way, stores","GF0347"
>
{snip}

It was tricky getting the indexing right, but this seemed to do what
you want on the test sample you provided ...

Do Until inFile.AtEndOfStream
record = InFile.ReadLine
nPos = Instr(record, Chr(34) & Chr(34))
do while nPos > 0
record = Left(record, nPos - 1) & Mid(record, nPos + 1)
nPos = Instr(nPos + 1, record, Chr(34))
record = Left(record, nPos - 1) & Mid(record, nPos + 1)
nPos = Instr(nPos + 1, record, Chr(34) & Chr(34))
loop
outFile.WriteLine record
Loop

The approach removes the first of a pair of double quotes and the very
next double quote that follows; thereby removing the starting and
ending quotes that enclose text, when there are too many.
_____________________
Tom Lavedas

Pegasus [MVP]

unread,
Nov 16, 2009, 3:54:18 PM11/16/09
to

"Learn4Develop" <Learn4...@discussions.microsoft.com> wrote in message
news:C072F055-9B82-4D2D...@microsoft.com...

You could do it like this for every line of text:
1. Replace every "," with your own delimiter. I use chr(0).
2. Remove all remaining double quotes.
3. Replace your own delimiters with ",".
4. Surround the line with double quotes.
This code appears to work for your data sample:

Q = """"
Z = Chr(0)
sDelimiter = Q & "," & Q
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFile = oFSO.OpenTextFile("d:\test.txt")
While Not oFile.AtEndOfStream
sLine = Replace(oFile.ReadLine, sDelimiter, Z)
sLine = Replace(sLine, Q, "")
sLine = Q & Replace(sLine, Z, sDelimiter) & Q
WScript.Echo sLine
Wend


Pegasus [MVP]

unread,
Nov 16, 2009, 4:15:54 PM11/16/09
to
Strange - my initial reply appears to have disappeared into the never-never.
I'll try to post it once more:

Al Dunbar

unread,
Nov 16, 2009, 10:48:10 PM11/16/09
to

"Learn4Develop" <Learn4...@discussions.microsoft.com> wrote in message
news:C072F055-9B82-4D2D...@microsoft.com...
> On daily bases I have received files with double quotes comma separated,
> but
> in some records I found 2 times double quotes and some time different
> squence
> of double quotes placements.
>
> I just want to remove all unnecessary (extra) double quotes (see the
> following values where i want to remove extra double quotes.
>
> "0123x",""Company D-Val"","Class D, sector N","DD5894"
> "4894D",""Recycle" Rubbish, C class","Class D, Sector F, Block N","D870GH"
> "AB8679",""AB Ltd"",""Need" Clean Drive Way, stores","GF0347"

The rubbish line looks like exactly that. the first paired double quotes are
not matched by another such pair. In the context of your data, what does it
mean to have a doubled double-quote?

>
> I am looking the output like that.
>
> "0123x","Company D-Val","Class D, sector N","DD5894"
> "4894D","Recycle Rubbish, C class","Class D, Sector F, Block N","D870GH"
> "AB8679","AB Ltd","Need Clean Drive Way, stores","GF0347"
>
>
> Following code i used is work fine with ""Company D-Val"" but didn't work
> with ""Recycle" Rubbish, C class", The code is
>
> code:-
> Do While inFile.AtEndOfStream = False
> record = InFile.ReadLine()
> record = Replace(record, """", """") 'replace "" with single space

That replace call seems to change each double quote with exactly one double
quote.

Perhaps you just need to change each pair of double quotes with a single
double quote.

/Al

Todd Vargo

unread,
Nov 16, 2009, 11:16:34 PM11/16/09
to

record = Replace(record, Chr(34)&Chr(34), Chr(34)) 'replace "" with single "

--
Todd Vargo
(Post questions to group only. Remove "z" to email personal messages)

Learn4Develop

unread,
Nov 17, 2009, 11:46:11 AM11/17/09
to

"Tom Lavedas" wrote:

> .
>

Hi Tom
Thanks for the code, its working fine with the above criteria, but gives
me error if there is Null value between double quotes ("") i.e.

("AB8679","""",""Need" Clean Drive Way, stores","")

As i have some columns in CSV file where the value is Null "". How i can
resolve it.

Thanks again for your great help.

Pegasus [MVP]

unread,
Nov 17, 2009, 11:49:34 AM11/17/09
to

"Learn4Develop" <Learn4...@discussions.microsoft.com> wrote in message
news:70A7DE23-C1A4-417B...@microsoft.com...

Try my suggestion. It should beable to cope with empty fields.


Tom Lavedas

unread,
Nov 17, 2009, 1:52:59 PM11/17/09
to
On Nov 17, 11:46 am, Learn4Develop

<Learn4Deve...@discussions.microsoft.com> wrote:
> "Tom Lavedas" wrote:
> > On Nov 16, 6:20 am, Learn4Develop
> > <Learn4Deve...@discussions.microsoft.com> wrote:
> > > On daily bases I have received files with double quotes comma separated, but
> > > in some records I found 2 times double quotes and some time different squence
> > > of double quotes placements.
>
> > > I just want to remove all unnecessary (extra) double quotes (see the
> > > following values where i want to remove extra double quotes.
>
> > > "0123x",""Company D-Val"","Class D, sector N","DD5894"
> > > "4894D",""Recycle" Rubbish, C class","Class D, Sector F, Block N","D870GH"
> > > "AB8679",""AB Ltd"",""Need" Clean Drive Way, stores","GF0347"
>
> > > I am looking the output like that.
>
> > > "0123x","Company D-Val","Class D, sector N","DD5894"
> > > "4894D","Recycle Rubbish, C class","Class D, Sector F, Block N","D870GH"
> > > "AB8679","AB Ltd","Need Clean Drive Way, stores","GF0347"
>
{snip}
>
> Hi Tom
>   Thanks for the code, its working fine with the above criteria, but gives
> me error if there is Null value between double quotes ("") i.e.
>
> ("AB8679","""",""Need" Clean Drive Way, stores","")
>
> As i have some columns in CSV file where the value is Null "". How i can
> resolve it.
>
> Thanks again for your great help.

Try this (though Pegusus' approach is also workable) ...

Do Until inFile.AtEndOfStream
record = InFile.ReadLine
nPos = Instr(record, Chr(34) & Chr(34))
do while nPos > 0

record = Left(record, nPos) & Mid(record, nPos + 2)


nPos = Instr(nPos + 1, record, Chr(34))

if nPos > 0 then


record = Left(record, nPos - 1) & Mid(record, nPos + 1)
nPos = Instr(nPos + 1, record, Chr(34) & Chr(34))

else
record = record & Chr(34)
end if
loop
outFile.WriteLine record
Loop
_____________________
Tom Lavedas

Christoph Basedau

unread,
Nov 17, 2009, 6:31:32 PM11/17/09
to
Learn4Develop schrieb:

> On daily bases I have received files with double quotes comma separated, but
> in some records I found 2 times double quotes and some time different squence
> of double quotes placements.
>
> I just want to remove all unnecessary (extra) double quotes (see the
> following values where i want to remove extra double quotes.
>
> "0123x",""Company D-Val"","Class D, sector N","DD5894"
> "4894D",""Recycle" Rubbish, C class","Class D, Sector F, Block N","D870GH"
> "AB8679",""AB Ltd"",""Need" Clean Drive Way, stores","GF0347"
>
>
> I am looking the output like that.
>
> "0123x","Company D-Val","Class D, sector N","DD5894"
> "4894D","Recycle Rubbish, C class","Class D, Sector F, Block N","D870GH"
> "AB8679","AB Ltd","Need Clean Drive Way, stores","GF0347"

If you want to implement the appropriate algorithm, you cannot
use a replace-mechanism based on literals, you always have to be
aware of the quoting and delimiting context.
The right way, j2mc, is to parse the string char by char, count the
DQs and check whether a DQ is followed by a delimiter.
If the delimiter appears after an even number of DQs it truly terminates
the token, if the number is odd it's an "inline-delimiter".
A DQ on the other hand is kept if its the first one (obvious,
as opener) or one with an even ordinal number followed by the delimiter
(terminator).

Christoph


In VBS a function that follows these rules would like this:


'---------------------
Function unquote(record)

Const DQ = """"
Const CM = ","

Dim newRecord
Dim dqCount
Dim char
Dim nextChar
Dim i
Dim keep

newRecord = ""
dqCount = 0

For i = 1 To Len(Record)

char = Mid(record, i, 1)
nextChar = Mid(record, i+1, 1)
keep = 0

If char = DQ Then
If dqCount = 0 Then
'beginning of token
keep = 1
dqCount = 1

ElseIf (dqCount Mod 2 = 1) And _
(nextChar = CM OR nextChar = "") Then

'end of token marked by ", OR "\r\n
keep = 1
dqCount = 0
Else
'inline "
keep = 0
dqCount = dqCount + 1
End If
Else
'char other then "
keep = 1
End If
If keep = 1 Then
newRecord = newRecord & char
End If
'WSH.Echo dqCount, i, keep, char, nextchar, newRecord
Next
unquote = newRecord
End Function
'---------------------

To test the results, run:

'---------------------
Option Explicit
Dim records
records = Array ( _
"""0123x"",""""Company D-Val"""",""Class D, sector N"",""DD5894""" _
, """4894D"",""""Recycle"" Rubbish, C class"",""Class D, Sector F, Block
N"",""D870GH""" _
, """AB8679"",""""AB Ltd"""",""""Need"" Clean Drive Way,
stores"",""GF0347""" _
, """""AB8679"", Test New"",""""""AB Ltd"""""",""""Need"" Clean Drive
Way, stores"",""GF0347""" _
)

Dim Record

For Each Record in records
WSH.Echo record
WSH.Echo unquote(record)
WSH.Echo
Next


Function unquote(record)
...
End Function
'---------------------

Todd Vargo

unread,
Nov 17, 2009, 6:50:30 PM11/17/09
to
Learn4Develop wrote:
> Hi Tom
> Thanks for the code, its working fine with the above criteria, but gives
> me error if there is Null value between double quotes ("") i.e.
>
> ("AB8679","""",""Need" Clean Drive Way, stores","")
>
> As i have some columns in CSV file where the value is Null "". How i can
> resolve it.
>
> Thanks again for your great help.

ISTM, only strings containing a comma should need quoted. But since you say
it gives you an error, it begs the questions. What is giving the error? And
what is creating these incorrect csv files?

Paul Randall

unread,
Nov 17, 2009, 11:08:00 PM11/17/09
to

"Learn4Develop" <Learn4...@discussions.microsoft.com> wrote in message
news:C072F055-9B82-4D2D...@microsoft.com...

I agree with Christoph that either the mechanism that creates this
non-standard quoting of a CSV file should be made more robust, or a more
robust mechanism should be used to read the poorly implimented CSV file.
Perhaps a simple VBScript using ADO as the robust file reading mechanism
could be used.

-Paul Randall


0 new messages