Differences between os, io, ioutil, bufio, bytes (with Buffer type) packages for file reading

Cyru Sol

Feb 25, 2014, 2:20:45 AM
to golan...@googlegroups.com
Hi, I've posted this on reddit and got sent here.

I'm quite confused, as there seem to be multiple redundant ways to solve my problem (read a file, parse the content, serve it via http).
Most people on stackoverflow would use bufio, but I just can't get the differences between this package, the Buffer type of bytes, and just reading a file with the os methods.
Also, I don't know when and why I should choose those ways to do it when I have the simple, but non-versatile, ioutil.ReadFile.

This is how I solved it currently:

func loadPage(path string, f os.FileInfo, err error) error {
    if err != nil {
        panic(err)
    }
    if f.IsDir() {
        return nil
    }
    matched, err := filepath.Match("[0-9]-*.md", f.Name())
    if err != nil {
        panic(err)
    }
    if matched {
        titleStart := strings.Index(f.Name(), "-") + 1
        titleEnd := strings.LastIndex(f.Name(), ".md")
        title := strings.ToLower(f.Name()[titleStart:titleEnd])
        content, err := ioutil.ReadFile(path)
        if err != nil {
            panic(err)
        }
        pages[title] = bytes.NewBuffer(blackfriday.MarkdownCommon(content))
    }
    return nil
}

func loadPages() {
    pages = make(map[string]*bytes.Buffer)
    err := filepath.Walk(PAGEDIR, loadPage)
    if err != nil {
        panic(err)
    }
}


Besides replacing the panic calls, what could I do to improve this? I think there are too many confusing redundant possibilities...

Cyru Sol

Feb 25, 2014, 2:22:16 AM
to golan...@googlegroups.com
Forgot the handler:

func pageHandler(title string) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprint(w, pages[title])
    }
}

Tamás Gulácsi

Feb 25, 2014, 2:55:49 AM
to golan...@googlegroups.com
Look at http.ServeFile.

RickyS

Feb 25, 2014, 4:07:20 AM
to golan...@googlegroups.com
os is for opening and closing files.
bufio is for reading, writing and parsing, usually text files.
fmt is usually used for printing and formatting.

But http has lots of good stuff.

Carlos Castillo

Feb 25, 2014, 11:10:32 AM
to golan...@googlegroups.com
That won't work, he's trying to process the files.

On Monday, February 24, 2014 11:55:49 PM UTC-8, Tamás Gulácsi wrote:
Look at http.ServeFile.

Ondekoza

Feb 25, 2014, 12:03:53 PM
to golan...@googlegroups.com
The only problem that might occur with ReadFile is that it might fail on extremely large files,
because it reads the whole file into memory. But since your files are guaranteed to be markdown
files which are probably written by a human writer, this is not a problem. What is it that you
don't like about your solution? It seems to work for you.

I would use a regular expression to extract the title from the filename. The identification of
the title using indexes looks overly complicated to me. But that's only because it gives me
an opportunity to promote my tutorial on regular expressions. :-)
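
For reference, the regular-expression approach could look something like the sketch below. The pattern and the names pagePattern and titleFromName are assumptions made up for illustration, chosen to mirror the "[0-9]-*.md" glob in the original code; they are not taken from the tutorial mentioned above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pagePattern is a hypothetical pattern mirroring the glob "[0-9]-*.md":
// one digit, a dash, the title (captured), then the ".md" extension.
var pagePattern = regexp.MustCompile(`^[0-9]-(.*)\.md$`)

// titleFromName returns the lower-cased title and true when name matches
// pagePattern, or "" and false otherwise.
func titleFromName(name string) (string, bool) {
	m := pagePattern.FindStringSubmatch(name)
	if m == nil {
		return "", false
	}
	return strings.ToLower(m[1]), true
}

func main() {
	for _, name := range []string{"1-About.md", "2-contact-us.md", "notes.txt"} {
		title, ok := titleFromName(name)
		fmt.Printf("%q -> %q %v\n", name, title, ok)
	}
}
```

One pattern both checks the filename and extracts the title, so the separate filepath.Match call and the two index computations would no longer be needed.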

Carlos Castillo

Feb 25, 2014, 12:58:27 PM
to golan...@googlegroups.com
First of all, if you are presenting code, you should use playground links.
  • You shouldn't be using panic in load, filepath.Walk will return the error that load returns if the error is not nil, and not filepath.SkipDir: http://play.golang.org/p/2bu5gWluri
    • loadPages can panic, or log.Fatal, return the error to code that handles the error, or deal with the error itself (eg: try another path)
  • You can store just the processed byte slice in the map, not a *bytes.Buffer: http://play.golang.org/p/MjSi3dBHDr
    • If you need an io.Reader type (which your example doesn't), you can create a bytes.Reader on the []byte instead, which is much more lightweight, and safer to use
  • In the handler, with a byte-slice you can use http.ResponseWriter.Write directly: http://play.golang.org/p/dKM6MZKdas
    • Although the only thing that can fail is the write to the client (and so you can't give them an error message), you probably should log the error from the ResponseWriter
  • Alternatively to using filepath.Walk, with filepath.Match to check files, you could use filepath.Glob, to get the list of matching files and then make your own for-loop.
    • Upsides
      • You don't need to write callback code
      • Probably can fit it all in one function
    • Downsides:
      • No recursive traversal into subdirectories
        • May not be an issue
      • You don't get an os.FileInfo automatically (to test for dirs, etc...)
        • If the user has a directory that matches your pattern, it might make more sense to return the error from ioutil.ReadFile than to check and then skip it
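
A minimal sketch of how the points above might fit together, not a definitive implementation: filepath.Glob instead of filepath.Walk, errors returned instead of panicking, a map of []byte instead of *bytes.Buffer, and a handler that calls Write directly and logs a failed write. The render parameter is a hypothetical stand-in for blackfriday.MarkdownCommon, runDemo and its temporary directory exist only to make the example runnable, and os.ReadFile is the Go 1.16+ equivalent of ioutil.ReadFile.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
	"strings"
)

// loadPages reads every file in dir matching "[0-9]-*.md", passes its
// contents through render, and returns the results keyed by lower-cased title.
// render is a hypothetical stand-in for blackfriday.MarkdownCommon.
func loadPages(dir string, render func([]byte) []byte) (map[string][]byte, error) {
	paths, err := filepath.Glob(filepath.Join(dir, "[0-9]-*.md"))
	if err != nil {
		return nil, err
	}
	pages := make(map[string][]byte, len(paths))
	for _, path := range paths {
		// A directory that matches the pattern makes ReadFile fail,
		// and that error is returned rather than silently skipped.
		content, err := os.ReadFile(path)
		if err != nil {
			return nil, err
		}
		// The glob guarantees a "<digit>-" prefix and a ".md" suffix.
		name := filepath.Base(path)
		title := strings.ToLower(strings.TrimSuffix(name[2:], ".md"))
		pages[title] = render(content)
	}
	return pages, nil
}

// pageHandler serves a pre-rendered page and logs a failed write,
// since no error message can be sent to the client at that point.
func pageHandler(page []byte) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if _, err := w.Write(page); err != nil {
			log.Printf("writing %s: %v", r.URL.Path, err)
		}
	}
}

// runDemo builds a temporary page directory, loads it, and serves one page
// through an in-memory recorder so the example needs no network access.
func runDemo() (string, error) {
	dir, err := os.MkdirTemp("", "pages")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	if err := os.WriteFile(filepath.Join(dir, "1-About.md"), []byte("about page"), 0o644); err != nil {
		return "", err
	}
	identity := func(b []byte) []byte { return b } // stands in for the Markdown renderer
	pages, err := loadPages(dir, identity)
	if err != nil {
		return "", err
	}
	rec := httptest.NewRecorder()
	req := httptest.NewRequest("GET", "/about", nil)
	pageHandler(pages["about"]).ServeHTTP(rec, req)
	return rec.Body.String(), nil
}

func main() {
	body, err := runDemo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(body)
}
```

In a real program the caller of loadPages would decide what to do with the error (log.Fatal, try another path, and so on), and each handler would be registered with http.Handle or a mux.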
In your code, you are reading the contents of a directory once at startup, and storing the processed contents in memory. This means that you have to restart the program if you want to serve updated or new files, and that the memory for these files is always in use. Neither of these problems is critical, and I have no idea of your specific situation, so you don't need to change your program, but just be mindful of the limitations of the code you have written. Also your code doesn't distinguish the files "1-foo.md" from "2-foo.md" in the map, so the latter will overwrite the former. This may have been your intent though...

Finally to answer your initial question, the packages you mention in the subject line work as follows w.r.t. your needs:
  • io defines interfaces that handle streams of bytes (Reader, Writer, etc...) as well as functions that work generically with types that implement these interfaces (eg: io.Copy)
  • os defines types and functions that represent operating system objects/functionality at a low level that is portable across all go implementations.
    • *os.File is a type that implements io.Reader, and io.Writer (among others) which streams bytes to or from a file on disk
      • It is useful if you don't want to read the whole file into memory, or are using io.Copy (or some other method) to stream data to/from the file
      • It has the downside of being a lower level construct, meaning data must often be processed in loops (with error checks on each iteration), and that it must be manually managed (via Close())
  • io/ioutil provides helper functions for some non-trivial file and io tasks
    • ReadFile reads an entire file into memory (as a []byte) in a single call
      • It automatically allocates a byte slice of the correct size (no need to Read + append in a loop)
      • It automatically closes the file
      • It returns the first error that prevented it from working (so you only need a single error check)
  • bufio provides wrapper types for io.Reader and io.Writer that buffer the input / output to improve efficiency
    • The net/http package already buffers data for you (using bufio itself) so you don't need this package for that
    • If you are reading a file in one or a few large steps, you probably don't need it either
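    • Buffered input and output add some extra concerns: a bufio.Writer must be flushed (Flush) before you are done with it, and a bufio.Reader may read more from the underlying reader than you have consumed so far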
    • bufio.Scanner is a nice utility type to efficiently read independent lines of text from an io.Reader
  • bytes provides helper functions and types for interacting with byte slices ([]byte)
    • bytes.Reader turns a []byte into an io.Reader (as well as an io.Seeker to rewind)
    • bytes.Buffer uses a []byte to implement a reader/writer; it is useful when you want to use code that takes an io.Writer, and store the results in memory for use later
    • the strings package provides analogous behaviour for go strings
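
To make the list above concrete, here is a small sketch that touches each package once, under stated assumptions: streamFile, countLines, and demo are hypothetical names, the temporary file and its contents are invented for the example, and os.ReadFile is used as the Go 1.16+ equivalent of ioutil.ReadFile.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"os"
)

// streamFile copies the file at path to w without loading it all into
// memory. *os.File is the io.Reader here and has to be closed manually.
func streamFile(w io.Writer, path string) (int64, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	return io.Copy(w, f)
}

// countLines uses bufio.Scanner to read r one line at a time.
func countLines(r io.Reader) (int, error) {
	n := 0
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		n++
	}
	return n, sc.Err()
}

// demo writes a small temporary file, then reads it back in two ways.
func demo() (string, int, error) {
	tmp, err := os.CreateTemp("", "demo-*.md")
	if err != nil {
		return "", 0, err
	}
	defer os.Remove(tmp.Name())
	if _, err := tmp.WriteString("# Title\nfirst line\nsecond line\n"); err != nil {
		tmp.Close()
		return "", 0, err
	}
	if err := tmp.Close(); err != nil {
		return "", 0, err
	}

	// os + io: stream the file into a bytes.Buffer, which serves as an
	// in-memory io.Writer whose contents can be used later.
	var buf bytes.Buffer
	if _, err := streamFile(&buf, tmp.Name()); err != nil {
		return "", 0, err
	}

	// One-call read of the whole file, then bytes.NewReader turns the
	// []byte back into an io.Reader for the line scanner.
	data, err := os.ReadFile(tmp.Name())
	if err != nil {
		return "", 0, err
	}
	lines, err := countLines(bytes.NewReader(data))
	return buf.String(), lines, err
}

func main() {
	text, lines, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("streamed %d bytes, %d lines\n", len(text), lines)
}
```

For a handful of small markdown files the single ReadFile call is the simplest choice; the streaming version only pays off when a file is too large to hold in memory comfortably.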

Cyru Sol

Feb 25, 2014, 4:26:00 PM
to golan...@googlegroups.com
@Ondekoza, thank you, that is the kind of advice I'm looking for (the regex hint).

And a big thank you to Carlos! That is a fantastic explanation of the use cases :).

For my project: I will have a few (<10) markdown files that will be a few KB (<100) in size. They won't change for a long time (>1 year), so I thought it might be a good idea to store the (small amount of) content in RAM (of which I have 8 GB available) to reduce the amount of hard drive I/O.
Later I want to wrap them within a html template which I also want to store in the RAM.

I will switch from the *bytes.Buffer to the []byte - I wasn't aware of the ResponseWriter.Write method, although it is obvious. That is what I meant by getting confused. So big thanks for the clarification :)

I'll also test whether the Glob or maybe some regex stuff will work for me.

zhouhon...@gmail.com

May 10, 2018, 2:20:44 PM
to golang-nuts
Your answer is awesome, thank you



On Wednesday, February 26, 2014 at 6:58:27 AM UTC+13, Carlos Castillo wrote: