I just want to remove all unnecessary (extra) double quotes (see the
following values where i want to remove extra double quotes.
"0123x",""Company D-Val"","Class D, sector N","DD5894"
"4894D",""Recycle" Rubbish, C class","Class D, Sector F, Block N","D870GH"
"AB8679",""AB Ltd"",""Need" Clean Drive Way, stores","GF0347"
I am looking the output like that.
"0123x","Company D-Val","Class D, sector N","DD5894"
"4894D","Recycle Rubbish, C class","Class D, Sector F, Block N","D870GH"
"AB8679","AB Ltd","Need Clean Drive Way, stores","GF0347"
Following code i used is work fine with ""Company D-Val"" but didn't work
with ""Recycle" Rubbish, C class", The code is
code:-
Do While inFile.AtEndOfStream = False
record = InFile.ReadLine()
record = Replace(record, """", """") 'replace "" with single space
outFile.WriteLine record & vbclrf
Loop
It was tricky getting the indexing right, but this seemed to do what
you want on the test sample you provided ...
Do Until inFile.AtEndOfStream
record = InFile.ReadLine
nPos = Instr(record, Chr(34) & Chr(34))
do while nPos > 0
record = Left(record, nPos - 1) & Mid(record, nPos + 1)
nPos = Instr(nPos + 1, record, Chr(34))
record = Left(record, nPos - 1) & Mid(record, nPos + 1)
nPos = Instr(nPos + 1, record, Chr(34) & Chr(34))
loop
outFile.WriteLine record
Loop
The approach removes the first of a pair of double quotes and the very
next double quote that follows; thereby removing the starting and
ending quotes that enclose text, when there are too many.
_____________________
Tom Lavedas
You could do it like this for every line of text:
1. Replace every "," with your own delimiter. I use chr(0).
2. Remove all remaining double quotes.
3. Replace your own delimiters with ",".
4. Surround the line with double quotes.
This code appears to work for your data sample:
Q = """"
Z = Chr(0)
sDelimiter = Q & "," & Q
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFile = oFSO.OpenTextFile("d:\test.txt")
While Not oFile.AtEndOfStream
sLine = Replace(oFile.ReadLine, sDelimiter, Z)
sLine = Replace(sLine, Q, "")
sLine = Q & Replace(sLine, Z, sDelimiter) & Q
WScript.Echo sLine
Wend
The rubbish line looks like exactly that. the first paired double quotes are
not matched by another such pair. In the context of your data, what does it
mean to have a doubled double-quote?
>
> I am looking the output like that.
>
> "0123x","Company D-Val","Class D, sector N","DD5894"
> "4894D","Recycle Rubbish, C class","Class D, Sector F, Block N","D870GH"
> "AB8679","AB Ltd","Need Clean Drive Way, stores","GF0347"
>
>
> Following code i used is work fine with ""Company D-Val"" but didn't work
> with ""Recycle" Rubbish, C class", The code is
>
> code:-
> Do While inFile.AtEndOfStream = False
> record = InFile.ReadLine()
> record = Replace(record, """", """") 'replace "" with single space
That replace call seems to change each double quote with exactly one double
quote.
Perhaps you just need to change each pair of double quotes with a single
double quote.
/Al
record = Replace(record, Chr(34)&Chr(34), Chr(34)) 'replace "" with single "
--
Todd Vargo
(Post questions to group only. Remove "z" to email personal messages)
"Tom Lavedas" wrote:
> .
>
Hi Tom
Thanks for the code, its working fine with the above criteria, but gives
me error if there is Null value between double quotes ("") i.e.
("AB8679","""",""Need" Clean Drive Way, stores","")
As i have some columns in CSV file where the value is Null "". How i can
resolve it.
Thanks again for your great help.
Try my suggestion. It should beable to cope with empty fields.
Try this (though Pegusus' approach is also workable) ...
Do Until inFile.AtEndOfStream
record = InFile.ReadLine
nPos = Instr(record, Chr(34) & Chr(34))
do while nPos > 0
record = Left(record, nPos) & Mid(record, nPos + 2)
nPos = Instr(nPos + 1, record, Chr(34))
if nPos > 0 then
record = Left(record, nPos - 1) & Mid(record, nPos + 1)
nPos = Instr(nPos + 1, record, Chr(34) & Chr(34))
else
record = record & Chr(34)
end if
loop
outFile.WriteLine record
Loop
_____________________
Tom Lavedas
> On daily bases I have received files with double quotes comma separated, but
> in some records I found 2 times double quotes and some time different squence
> of double quotes placements.
>
> I just want to remove all unnecessary (extra) double quotes (see the
> following values where i want to remove extra double quotes.
>
> "0123x",""Company D-Val"","Class D, sector N","DD5894"
> "4894D",""Recycle" Rubbish, C class","Class D, Sector F, Block N","D870GH"
> "AB8679",""AB Ltd"",""Need" Clean Drive Way, stores","GF0347"
>
>
> I am looking the output like that.
>
> "0123x","Company D-Val","Class D, sector N","DD5894"
> "4894D","Recycle Rubbish, C class","Class D, Sector F, Block N","D870GH"
> "AB8679","AB Ltd","Need Clean Drive Way, stores","GF0347"
If you want to implement the appropriate algorithm, you cannot
use a replace-mechanism based on literals, you always have to be
aware of the quoting and delimiting context.
The right way, j2mc, is to parse the string char by char, count the
DQs and check whether a DQ is followed by a delimiter.
If the delimiter appears after an even number of DQs it truly terminates
the token, if the number is odd it's an "inline-delimiter".
A DQ on the other hand is kept if its the first one (obvious,
as opener) or one with an even ordinal number followed by the delimiter
(terminator).
Christoph
In VBS a function that follows these rules would like this:
'---------------------
Function unquote(record)
Const DQ = """"
Const CM = ","
Dim newRecord
Dim dqCount
Dim char
Dim nextChar
Dim i
Dim keep
newRecord = ""
dqCount = 0
For i = 1 To Len(Record)
char = Mid(record, i, 1)
nextChar = Mid(record, i+1, 1)
keep = 0
If char = DQ Then
If dqCount = 0 Then
'beginning of token
keep = 1
dqCount = 1
ElseIf (dqCount Mod 2 = 1) And _
(nextChar = CM OR nextChar = "") Then
'end of token marked by ", OR "\r\n
keep = 1
dqCount = 0
Else
'inline "
keep = 0
dqCount = dqCount + 1
End If
Else
'char other then "
keep = 1
End If
If keep = 1 Then
newRecord = newRecord & char
End If
'WSH.Echo dqCount, i, keep, char, nextchar, newRecord
Next
unquote = newRecord
End Function
'---------------------
To test the results, run:
'---------------------
Option Explicit
Dim records
records = Array ( _
"""0123x"",""""Company D-Val"""",""Class D, sector N"",""DD5894""" _
, """4894D"",""""Recycle"" Rubbish, C class"",""Class D, Sector F, Block
N"",""D870GH""" _
, """AB8679"",""""AB Ltd"""",""""Need"" Clean Drive Way,
stores"",""GF0347""" _
, """""AB8679"", Test New"",""""""AB Ltd"""""",""""Need"" Clean Drive
Way, stores"",""GF0347""" _
)
Dim Record
For Each Record in records
WSH.Echo record
WSH.Echo unquote(record)
WSH.Echo
Next
Function unquote(record)
...
End Function
'---------------------
ISTM, only strings containing a comma should need quoted. But since you say
it gives you an error, it begs the questions. What is giving the error? And
what is creating these incorrect csv files?
I agree with Christoph that either the mechanism that creates this
non-standard quoting of a CSV file should be made more robust, or a more
robust mechanism should be used to read the poorly implimented CSV file.
Perhaps a simple VBScript using ADO as the robust file reading mechanism
could be used.
-Paul Randall