
Split large text file by number of lines?


ivan....@gmail.com

Feb 21, 2007, 4:58:59 PM
Hello,

I'm a beginner in VB.NET. Here is what I would like to do.

I have a text file (a list of names, one name per line) which
is about 350,000 lines long. I would like to split it, creating a new
file every, let's say, 20,000 lines, so the output directory would
look something like this:

File1: 1-20000 lines of the original file
File2: 20001-40000 lines of the original file
File3: 40001-60000 lines of the original file

etc.

Can it be done simply? One form with a field to enter the number of
lines, a button to load a text file, and a "Start" button...

Thanks in advance

Stephany Young

Feb 21, 2007, 6:19:23 PM
Yes.

Read the source file line by line

Write each line to the target file

After each nth line, close the target file and open a new one (with a
different name of course).
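
The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the module name SplitSketch, the routine name SplitByLines, and the part1.txt, part2.txt, ... naming scheme are made up for the example, and linesPerFile is assumed to be at least 1.

```vbnet
Imports System
Imports System.IO

Module SplitSketch

    ' Hypothetical helper: copies sourcePath into numbered files of at most
    ' linesPerFile lines each and returns how many files were written.
    Function SplitByLines(ByVal sourcePath As String, _
                          ByVal targetFolder As String, _
                          ByVal linesPerFile As Integer) As Integer
        Dim fileCount As Integer = 0
        Dim linesInCurrent As Integer = 0
        Dim writer As StreamWriter = Nothing

        Using reader As New StreamReader(sourcePath)
            Try
                ' Read the source file line by line
                Dim line As String = reader.ReadLine()
                While line IsNot Nothing
                    ' Open the next target file only when there is a line for it
                    If writer Is Nothing Then
                        fileCount += 1
                        Dim fileName As String = "part" & fileCount & ".txt"
                        writer = New StreamWriter(Path.Combine(targetFolder, fileName))
                        linesInCurrent = 0
                    End If

                    ' Write each line to the current target file
                    writer.WriteLine(line)
                    linesInCurrent += 1

                    ' After the nth line, close the target so a new one is started
                    If linesInCurrent = linesPerFile Then
                        writer.Close()
                        writer = Nothing
                    End If

                    line = reader.ReadLine()
                End While
            Finally
                ' Close a partially filled last file, if any
                If writer IsNot Nothing Then writer.Close()
            End Try
        End Using

        Return fileCount
    End Function

End Module
```

A call such as SplitByLines("names.txt", "C:\out", 20000) would then produce one file per 20,000 lines, with the last file holding whatever is left over. Both paths in that call are placeholders.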


<ivan....@gmail.com> wrote in message
news:1172095139.3...@a75g2000cwd.googlegroups.com...

Michael M.

Feb 21, 2007, 8:11:22 PM
This code I have written works, but it takes some time to complete (about 50
seconds for a 1 MB text file).

It may be better to do a "read all" and then use the Split(str, vbCrLf)
function. Anyway, this should do it.

Add a textbox and a button. This was created in VB.NET 2005 (the free
version from Microsoft).

Private Sub Button1_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles Button1.Click
    TextSplitter()
End Sub


Sub TextSplitter()

    ' Number of lines per output file, taken from the text box
    Dim LinesPerFile As Integer = CInt(TextBox1.Text)

    Dim sb As New System.Text.StringBuilder
    Dim LineCounter As Integer = 0
    Dim FileNumber As Integer = 0

    Me.Text = "Processing file... "

    ' Open the source file and read it line by line
    Using AsciiStreamReader As IO.StreamReader = _
            IO.File.OpenText("C:\HugeSourceTextFile1.txt")

        While Not AsciiStreamReader.EndOfStream

            sb.Append(AsciiStreamReader.ReadLine()).Append(vbCrLf)
            LineCounter += 1

            ' Let the form process window messages on every second line
            If LineCounter Mod 2 = 0 Then Application.DoEvents()

            If LineCounter = LinesPerFile OrElse AsciiStreamReader.EndOfStream Then

                ' Write the data stored in the StringBuilder (sb) to the next file
                FileNumber += 1
                IO.File.WriteAllText("C:\" & "File " & FileNumber & ".txt", _
                    sb.ToString(), System.Text.Encoding.ASCII)

                ' Reset the line count and clear the buffer
                LineCounter = 0
                sb.Length = 0

            End If

        End While

    End Using

    Me.Text = "Complete: created " & FileNumber & " files"

End Sub
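
The "read all, then split" idea mentioned at the top of this post might look something like the sketch below. It is an assumption-laden illustration, not a measured improvement: the names ReadAllSketch and SplitInMemory are invented here, File.ReadAllLines stands in for reading everything and splitting on line breaks, linesPerFile is assumed to be at least 1, and the whole file has to fit in memory.

```vbnet
Imports System
Imports System.IO

Module ReadAllSketch

    ' Hypothetical helper: loads the whole file into memory, then writes it
    ' back out in chunks of linesPerFile lines. Returns the number of files.
    Function SplitInMemory(ByVal sourcePath As String, _
                           ByVal targetFolder As String, _
                           ByVal linesPerFile As Integer) As Integer
        ' ReadAllLines reads everything and splits it on line breaks in one call
        Dim allLines As String() = File.ReadAllLines(sourcePath)
        Dim fileCount As Integer = 0

        For start As Integer = 0 To allLines.Length - 1 Step linesPerFile
            ' The last chunk may be shorter than linesPerFile
            Dim count As Integer = Math.Min(linesPerFile, allLines.Length - start)
            Dim chunk(count - 1) As String
            Array.Copy(allLines, start, chunk, 0, count)

            fileCount += 1
            Dim target As String = _
                Path.Combine(targetFolder, "File " & fileCount & ".txt")
            File.WriteAllLines(target, chunk)
        Next

        Return fileCount
    End Function

End Module
```

Whether this beats the line-by-line loop depends on the file size and on how much memory is available, so it is worth timing both on the real data.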

"Stephany Young" <noone@localhost> wrote in message
news:%23ZoLG7g...@TK2MSFTNGP05.phx.gbl...

Armin Zingler

Feb 21, 2007, 8:48:58 PM
"Michael M." <nos...@mike.com> wrote:

> This code I have written works, but it takes some time to
> complete (about 50 seconds for a 1 MB text file).
>
> It may be better to do a "read all" and then use the Split(str, vbCrLf)
> function. Anyway, this should do it.
>
> Add a textbox and a button. This was created in VB.NET 2005 (the
> free version from Microsoft).


Suggestion (untested):

Sub TextSplitter()

    Dim fsIN As IO.FileStream
    Dim fsOut As IO.FileStream = Nothing
    Dim sr As IO.StreamReader
    Dim sw As IO.StreamWriter = Nothing
    Dim OutCount As Integer
    ' Declared outside the loop so it clearly keeps its value between iterations
    Dim LineCount As Integer

    fsIN = New IO.FileStream( _
        "infile.txt", IO.FileMode.Open, IO.FileAccess.Read _
    )

    sr = New IO.StreamReader(fsIN, System.Text.Encoding.Default)

    Do
        Dim Line As String

        Line = sr.ReadLine()
        If Line Is Nothing Then Exit Do

        ' Start a new output file when none is currently open
        If fsOut Is Nothing Then
            OutCount += 1

            fsOut = New IO.FileStream( _
                "outfile" & OutCount & ".txt", _
                IO.FileMode.CreateNew, IO.FileAccess.Write _
            )

            sw = New IO.StreamWriter(fsOut, System.Text.Encoding.Default)
            LineCount = 0
        End If

        sw.WriteLine(Line)
        LineCount += 1

        ' Close the current output file once it holds 20000 lines
        If LineCount = 20000 Then
            sw.Close()
            fsOut = Nothing
        End If
    Loop

    ' Close the last, partially filled output file
    If fsOut IsNot Nothing Then
        sw.Close()
    End If

    fsIN.Close()

End Sub


Be aware that Encoding.ASCII supports only 7-bit characters, so any
character above 127 is written as "?".
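
To see what the 7-bit limit means for a list of names, here is a small sketch. The module name AsciiSketch and the helper RoundTrip are invented for illustration; the only library calls used are Encoding.GetBytes and Encoding.GetString. The test name is built with ChrW(252) (a lowercase u with umlaut) so the source file itself stays plain ASCII.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module AsciiSketch

    ' Encodes text to bytes with the given encoding and decodes it again,
    ' which is effectively what writing a file and reading it back does.
    Function RoundTrip(ByVal text As String, ByVal enc As Encoding) As String
        Return enc.GetString(enc.GetBytes(text))
    End Function

    Sub Main()
        ' "M" + u-umlaut + "ller"
        Dim sample As String = "M" & ChrW(252) & "ller"

        ' ASCII cannot represent the umlaut and substitutes a question mark
        Console.WriteLine(RoundTrip(sample, Encoding.ASCII) = "M?ller")

        ' UTF-8 keeps the name intact
        Console.WriteLine(RoundTrip(sample, Encoding.UTF8) = sample)
    End Sub

End Module
```

If the names contain accented letters, writing the output with the same encoding the input was read with (or with UTF-8) avoids this loss.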


Armin

Tom Leylan

Feb 21, 2007, 11:57:25 PM
I'm going to opt for an OOP solution which isn't quite so dependent upon all
the inputs being fixed. Personally I've learned that "specs change" and
planning for change saves the client money which makes for a happy client.

So try the other solutions out and then try this one. Do note that you can
change the input file name, set the output file names, set the line count
(independently per file) and it can produce more (or fewer) than 3 files by
calling the Copy() method as many times as you want.

Personally I'd add some error handling before I tried to sell it and I might
add a way to indicate that the end of the input file was reached. While
it should cause no harm, it seems pointless to keep calling Copy() if the
end-of-file was already reached.

The reason there are only 30 lines indicated in my example is that's all I
wanted to type into my test.

Tom

Dim oCopier As Copier = New Copier()

With oCopier
.Open("infile.txt")
.Copy("file1.txt", 10)
.Copy("file2.txt", 10)
.Copy("file3.txt", 10)
.Close()
End With


Public Class Copier
    Inherits Object

    Private fs As IO.FileStream
    Private sr As IO.StreamReader

    Public Sub Open(ByVal file As String)
        fs = New IO.FileStream(file, IO.FileMode.Open, IO.FileAccess.Read)
        sr = New IO.StreamReader(fs, System.Text.Encoding.Default)
    End Sub

    Public Sub Close()
        ' Closing the reader also closes the underlying stream
        sr.Close()
    End Sub

    Public Sub Copy(ByVal file As String, ByVal max As Int32)

        Dim fsOut As IO.FileStream = New IO.FileStream(file, _
            IO.FileMode.CreateNew, IO.FileAccess.Write)
        Dim sw As IO.StreamWriter = New IO.StreamWriter(fsOut, _
            System.Text.Encoding.Default)

        Dim input As String
        Dim count As Int32 = 0

        ' Copy up to max lines, stopping early at the end of the input.
        ' Counting only after a successful write avoids dropping a line.
        While count < max

            input = sr.ReadLine()
            If input Is Nothing Then Exit While

            sw.WriteLine(input)
            count += 1

        End While

        sw.Close()

    End Sub

End Class
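
On the end-of-file point raised earlier in this post, one possible approach is to have the copy routine report how many lines it actually copied, so the caller can stop once a call comes back short. The sketch below is an assumption, not part of the class above: the names CopySketch and CopyLines are hypothetical, and it works on TextReader and TextWriter so it can be tried without touching the disk.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic

Module CopySketch

    ' Hypothetical helper: copies up to max lines from reader to writer and
    ' returns the number of lines actually copied. A result smaller than max
    ' means the end of the input was reached.
    Function CopyLines(ByVal reader As TextReader, _
                       ByVal writer As TextWriter, _
                       ByVal max As Integer) As Integer
        Dim copied As Integer = 0
        While copied < max
            Dim line As String = reader.ReadLine()
            If line Is Nothing Then Exit While
            writer.WriteLine(line)
            copied += 1
        End While
        Return copied
    End Function

    Sub Main()
        ' Three input lines, copied two at a time
        Using reader As New StringReader("a" & vbLf & "b" & vbLf & "c")
            Dim part As Integer = 0
            Dim copied As Integer
            Do
                Dim target As New StringWriter()
                copied = CopyLines(reader, target, 2)
                If copied > 0 Then
                    part += 1
                    Console.WriteLine("part " & part & ": " & copied & " line(s)")
                End If
            Loop While copied = 2
        End Using
    End Sub

End Module
```

The same idea could be folded into Copy() by turning it into a Function that returns the count, which would let the calling code loop until the input runs out instead of hard-coding the number of Copy() calls.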


<ivan....@gmail.com> wrote in message
news:1172095139.3...@a75g2000cwd.googlegroups.com...

PFC Sadr

Feb 22, 2007, 1:19:15 AM
you know they make partitioning in databases right?

keep everything in one table and then you can use rank; or you can
filter, search-- anything you want to do.

and it doesn't matter if you have 20k records or 4m...

I mean seriously; why reinvent the wheel? Did I mention why reinvent
the wheel?

per...@gmail.com

Mar 1, 2007, 2:46:46 AM

Armin Zingler wrote:

Thanks man, this code does exactly what I need, and it's pretty fast.
