Reading malformed ASCII/text files in VB.NET


Andrew

Sep 3, 2008, 1:23:22 PM
to DotNetDevelopment, VB.NET, C# .NET, ADO.NET, ASP.NET, XML, XML Web Services,.NET Remoting
Hi all, I am trying to read a text-like file that is padded with some
garbage (or at least it is garbage to me). When I try to read it using
StreamReader, it does not make it past the first few awkward
characters. Can anyone suggest a method to read in this data? Simply
ignoring the garbage is an acceptable solution, as the end result is to
filter it out.



This is an example of what the input looks like. Everything before the
0H## is garbage, but StreamReader does not get past it.

µ á 0H12$HEHDT,224.71,T*1D
É á 0H14$WIMWV,009,R,023,N,A*2B
É á 0H15$VMVLW,301074.5,N,0296833,N*79
Ý á 0H23$AGHTD,V,3.9,R,R,N,10,10,0.5,10.0,,0.0,,T,A,A,A,224.7*6C
Ý á 0H12$HCHDM,232.90,M*13

Here is the code. It currently just dumps what is read into a text box
as a proof of concept.

myStream = opnFile.OpenFile()
If (myStream IsNot Nothing) Then
    'Read stream (reuses the stream returned by OpenFile)
    Using objReader As New StreamReader(myStream)
        txtOut.Text = objReader.ReadToEnd()
    End Using 'Closes the reader and the underlying stream
End If



Any help greatly appreciated.

Cheers and thanks.

Glenn

Sep 4, 2008, 7:48:20 AM
to DotNetDe...@googlegroups.com
Without running some tests, I'm only guessing here.  I suspect that StreamReader is treating the data as text.  Try using a BinaryReader instead to filter out the data.
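
A rough, untested sketch of that idea in VB.NET: read the raw bytes with a BinaryReader, throw away everything that is not printable ASCII or a line break, then keep each line from its "0H" marker onwards. The module and function names are made up for illustration, and it assumes the garbage never contains "0H" or stray CR/LF bytes itself, and that the file fits in memory.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module NmeaFilter
    ' Hypothetical helper: returns each line of the file starting at its "0H" marker.
    Function ReadCleanLines(ByVal path As String) As List(Of String)
        Dim raw As Byte()
        Using reader As New BinaryReader(File.OpenRead(path))
            ' Assumption: the file is small enough to load in one go.
            raw = reader.ReadBytes(CInt(reader.BaseStream.Length))
        End Using

        ' Keep printable ASCII plus CR/LF; silently drop every other byte.
        Dim sb As New StringBuilder(raw.Length)
        For Each b As Byte In raw
            If (b >= 32 AndAlso b <= 126) OrElse b = 10 OrElse b = 13 Then
                sb.Append(ChrW(b))
            End If
        Next

        Dim separators As Char() = New Char() {ControlChars.Cr, ControlChars.Lf}
        Dim lines As String() = sb.ToString().Split(separators, StringSplitOptions.RemoveEmptyEntries)

        ' Keep only the part of each line from the first "0H" onwards.
        Dim result As New List(Of String)
        For Each line As String In lines
            Dim start As Integer = line.IndexOf("0H", StringComparison.Ordinal)
            If start >= 0 Then result.Add(line.Substring(start))
        Next
        Return result
    End Function
End Module
```

With the original poster's form, something like txtOut.Text = String.Join(Environment.NewLine, NmeaFilter.ReadCleanLines(opnFile.FileName).ToArray()) should then show only the cleaned lines.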
 
...Glenn

Joseph Irizarry

Sep 4, 2008, 8:12:50 PM
to DotNetDe...@googlegroups.com
I haven't tested this but I think it's a step in the right direction... 
Change the encoding of the StreamReader...

.......
// CurrentEncoding is read-only, so the encoding has to go in the constructor
using (StreamReader sr = new StreamReader(...., System.Text.Encoding.Unicode)) {
var line = sr.ReadLine();
.......
}

I see this working unless you are having to deal with non-Unicode characters (not likely). Encoding.Unicode is UTF-16 (little-endian); there is also an Encoding.UTF32 property, which uses 32 bits per code unit instead of 16.
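
For the VB.NET side, a minimal sketch of choosing the encoding when the reader is created. The module and function names are hypothetical, and which encoding the file really uses is unknown, so the choice is left to the caller.

```vbnet
Imports System.IO
Imports System.Text

Module EncodingDemo
    ' Hypothetical helper: reads the whole file using the encoding the caller picks.
    Function ReadAllWithEncoding(ByVal path As String, ByVal enc As Encoding) As String
        ' The encoding is supplied at construction time; StreamReader.CurrentEncoding
        ' is read-only and cannot be assigned afterwards.
        Using reader As New StreamReader(path, enc)
            Return reader.ReadToEnd()
        End Using
    End Function
End Module
```

One option worth trying is Encoding.GetEncoding("iso-8859-1"), e.g. ReadAllWithEncoding(opnFile.FileName, Encoding.GetEncoding("iso-8859-1")). Latin-1 maps every byte to exactly one character, so nothing is rejected or merged, but that is a guess at what suits this file and not a confirmed fix. Passing Encoding.Unicode instead tries the UTF-16 suggestion above.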

rbdavidson

Sep 5, 2008, 3:54:32 PM
to DotNetDevelopment, VB.NET, C# .NET, ADO.NET, ASP.NET, XML, XML Web Services,.NET Remoting
Your problem is probably encoding-related. You may also want to use
BinaryReader, which is more tolerant of oddball data.

On a side note, dropping the data into a TextBox may not work the way
you expect. TextBox.Text does not behave the same as a String variable
or an array of characters or bytes. TextBox.Text often truncates data
when it runs into characters it can't display (an embedded null
character, for example); Strings and arrays don't. It may be that your
garbage characters are causing the TextBox.Text property to truncate
your data string, i.e. your code may be fine but you simply can't
display it all in a TextBox.
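
One way to check this is to clean the string before it goes anywhere near the TextBox. A small sketch, with made-up module and function names, that swaps control characters (other than CR, LF and tab) for a dot so that nothing can cut the display short:

```vbnet
Imports System.Text

Module DisplayHelper
    ' Hypothetical helper: returns a copy of the text with control characters
    ' (except CR, LF and tab) replaced by "." so that it can be displayed.
    Function MakeDisplayable(ByVal text As String) As String
        Dim sb As New StringBuilder(text.Length)
        For Each c As Char In text
            Dim keep As Boolean = (c = ControlChars.Cr) OrElse (c = ControlChars.Lf) OrElse (c = ControlChars.Tab)
            If Char.IsControl(c) AndAlso Not keep Then
                sb.Append("."c)
            Else
                sb.Append(c)
            End If
        Next
        Return sb.ToString()
    End Function
End Module
```

Comparing the length of the string returned by ReadToEnd with txtOut.TextLength after the assignment should also show whether the data was read in full and only the display was cut short.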

CK

Sep 8, 2008, 4:06:44 AM
to DotNetDevelopment, VB.NET, C# .NET, ADO.NET, ASP.NET, XML, XML Web Services,.NET Remoting
You may find your file has EOF characters within it, causing the
stream to end.
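
A quick way to test that theory is to scan the raw bytes for the suspect value. This is only a sketch with hypothetical names. It assumes the "EOF character" meant here is Ctrl+Z (byte &H1A), which is the traditional DOS end-of-file marker but is not stated above.

```vbnet
Imports System.Collections.Generic
Imports System.IO

Module EofScan
    ' Hypothetical helper: returns the offset of every byte in the file equal to value.
    Function FindByte(ByVal path As String, ByVal value As Byte) As List(Of Integer)
        Dim offsets As New List(Of Integer)
        Dim raw As Byte() = File.ReadAllBytes(path)
        For i As Integer = 0 To raw.Length - 1
            If raw(i) = value Then offsets.Add(i)
        Next
        Return offsets
    End Function
End Module
```

For example, EofScan.FindByte(opnFile.FileName, CByte(&H1A)).Count tells you how many Ctrl+Z bytes are in the file; the same call with CByte(0) looks for embedded null bytes.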