bufio Read Reverse


Luke Mauldin

Dec 31, 2013, 9:42:56 AM
to golan...@googlegroups.com
All,

I have an os.File that I need to search through until I find a delimiter, and I think that bufio.Reader.ReadString() would work well for my purposes.  However, I need to read the file in reverse order; in other words, I need to search for a string starting at the end of the file.  I could read the entire file into memory and then use bytes.LastIndex to find it, but I would prefer not to read the entire file into memory.  Any suggestions?

Luke

Konstantin Khomoutov

Dec 31, 2013, 10:00:26 AM
to Luke Mauldin, golan...@googlegroups.com
You do not need to use buffered I/O for this task.

The *os.File type implements both the io.Seeker and io.Reader (or
io.ReadSeeker, which embeds both) interfaces, so basically your
approach should be like this:

1. Settle on a buffer size, say, 8 KiB or whatever.
Let this value be named bufSize.

2. Get the file length by calling Stat() on the *os.File.

3. Subtract bufSize from the file's size (or use offset 0 if the
file is smaller than bufSize), save this number, Seek() to that
position, and read bufSize bytes into a byte slice of length bufSize
using the Read() method.

Note that the actual amount of data read might be less than
bufSize as the file might shrink during reading.
Seek() might fail due to the same reason.

4. Scan the bytes in the buffer backwards, looking for the terminating
character.

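5. If not found, subtract bufSize from the position saved on step (3)
(stopping at offset 0), Seek(), Read(), go to step (4). Once the chunk
starting at offset 0 has been scanned, the character is not in the file.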

Caleb Doxsey

Dec 31, 2013, 10:18:52 AM
to Konstantin Khomoutov, Luke Mauldin, golan...@googlegroups.com
With Konstantin's approach you'll have to keep in mind that your terminator may end up split between two slices. For example, if you were looking for "cat" and you ended up with a file sliced like this:

.... | ... ca | t ... | ...

You'd miss it.


Luke Mauldin

Dec 31, 2013, 3:23:31 PM
to golan...@googlegroups.com, Konstantin Khomoutov, Luke Mauldin, ca...@doxsey.net
That is a great point about the split identifiers.  Do I just need to build my own function on top of the standard library functions or does someone have a package that already encapsulates this functionality?

Luke

Kevin Gillette

Dec 31, 2013, 6:04:19 PM
to golan...@googlegroups.com
http://godoc.org/github.com/extemporalgenome/curio

Particularly <http://godoc.org/github.com/extemporalgenome/curio#example-Match--Reverse>.  As long as the symbol you're trying to find is printable ASCII (not a control character or a high-bit byte, e.g. part of a non-ASCII Unicode rune), then Match should work fine; it was designed to aid in parsing PDFs, which have file trailers and are 7-bit safe throughout (besides stream payloads).

roger peppe

Jan 2, 2014, 8:36:43 AM
to Luke Mauldin, golang-nuts
This package should solve your problem - it allows
you to use any of the Split functions from bufio to read in reverse.

http://godoc.org/code.google.com/p/rog-go/reverse

It hasn't been used much in anger - bug reports gratefully received.

cheers,
rog.

lukem...@gmail.com

Jan 2, 2014, 8:49:17 AM
to roger peppe, golang-nuts
Roger,

Thank you, that is exactly what I was looking for.

Luke

roger peppe

Jan 2, 2014, 11:11:35 AM
to Luke Mauldin, golang-nuts
Great!

Oleku Konko

Jan 2, 2014, 12:38:24 PM
to golan...@googlegroups.com
You are reading the whole file into memory, which might not be efficient for large log files.