How to efficiently iterate and regexp-replace through a []byte

Thomas Kappler

Jul 10, 2011, 3:56:15 AM
to golang-nuts
Hi group,

I have multiline text in a []byte. It's in Markdown format, where
lines starting with four or more spaces are special. I'd like to
iterate over the lines of the string and do a regexp.Replace on lines
that do not start with four spaces. The result could be written back
directly, but since the Replace might change the length of the string,
I imagine I have to store the processed text somewhere else and
overwrite the old one in the end.

Being new to Go, I got a little lost in all the options I have, and
would appreciate some advice on idiomatic and efficient Go code (a
rough sketch is enough) to achieve the above goal.

I came up with the following so far:

for _, line := range bytes.Split(c.Content, []byte{'\n'}) {
    if !bytes.HasPrefix(line, []byte("    ")) {
        // line to string, Replace, back to []byte
    }
}

or

contentR := bufio.NewReader(strings.NewReader(string(c.Content)))
read := func() (string, error) { return contentR.ReadString('\n') }
// A final line without a trailing '\n' is returned together with io.EOF,
// so this loop condition skips it.
for line, err := read(); err == nil; line, err = read() {
    if !strings.HasPrefix(line, "    ") {
        // Replace, to []byte
    }
}

Thanks for your help,
Thomas
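
A minimal sketch of the reader-based variant from the question, assuming a current Go release where bufio.Scanner and bytes.Buffer are available. The names wordRE and replaceUnindented and the pattern "foo" are hypothetical and stand in for whatever replacement is actually needed. Note that bufio.Scanner drops a trailing newline and, by default, rejects lines longer than 64 KB.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"regexp"
)

// wordRE is a hypothetical pattern used only for illustration.
var wordRE = regexp.MustCompile(`foo`)

// replaceUnindented rewrites every line that does not start with four
// spaces and collects the result in a separate buffer, since a replacement
// may change the length of a line.
func replaceUnindented(content []byte) ([]byte, error) {
	indent := []byte("    ")
	var out bytes.Buffer
	out.Grow(len(content))

	sc := bufio.NewScanner(bytes.NewReader(content))
	first := true
	for sc.Scan() {
		if !first {
			out.WriteByte('\n') // Scanner strips line endings; restore them
		}
		first = false

		line := sc.Bytes()
		if !bytes.HasPrefix(line, indent) {
			line = wordRE.ReplaceAll(line, []byte("bar"))
		}
		out.Write(line)
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

func main() {
	got, err := replaceUnindented([]byte("see foo\n    foo code\nfoo end"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(got))
}
```

Working on []byte throughout avoids the string conversions mentioned in the question's comments.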

Christopher Dunn

Jul 10, 2011, 1:11:33 PM
to golan...@googlegroups.com
Until you are comfortable with the language, don't worry about efficiency. Worry about correctness.

I think the simplest solution is via regexp. I would match on "\n    ". I'd add a leading linefeed to the input, process it, and then strip the leading linefeed.

package main
import ("regexp"; "fmt")

func repl(match []byte) []byte {
    result := []byte("\n----")
    result = append(result, match[5:]...)
    result = append(result, []byte("====")...)
    return result 
}

func main() {
    src := []byte("\n" + "some markup\n    * more\n    * still more")
    re := regexp.MustCompile("\n    [^\n]*")
    fmt.Println(string(src))
    src = re.ReplaceAllFunc(src, repl)
    fmt.Println(string(src))
    src = src[1:]
}


###################

some markup
    * more
    * still more

some markup
----* more====
----* still more====


Unfortunately, Go's regexp package is not yet as powerful as PCRE. However, Go's source code is very readable; see "src/pkg/regexp/regexp.go". ReplaceAllFunc builds its result in a bytes.Buffer, which is efficient.

~Chris
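
In current Go releases the regexp package accepts the (?m) flag, which lets ^ match at the start of every line. Under that assumption, the leading-linefeed workaround is not needed. The sketch below mirrors the repl example; the names indentedRE, decorate, and markIndented are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// indentedRE matches whole lines that begin with four spaces. The (?m) flag
// makes ^ match at the start of each line, not only at the start of the input.
var indentedRE = regexp.MustCompile(`(?m)^    [^\n]*`)

// decorate replaces the four-space indent with "----" and appends "====".
func decorate(match []byte) []byte {
	result := []byte("----")
	result = append(result, match[4:]...)
	return append(result, "===="...)
}

// markIndented applies decorate to every indented line in src.
func markIndented(src []byte) []byte {
	return indentedRE.ReplaceAllFunc(src, decorate)
}

func main() {
	src := []byte("some markup\n    * more\n    * still more")
	fmt.Println(string(markIndented(src)))
}
```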

Thomas Kappler

Jul 11, 2011, 5:04:28 PM
to golang-nuts
Thanks, Chris. I agree that correctness comes first, but since my
algorithm is very simple I wanted to use it as a chance to learn a bit
about efficient and idiomatic Go.

After some more studying of the relevant packages and discovering the
ellipsis to expand slices to varargs, I think I came up with a
reasonably efficient solution:

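func preprocessContent(content []byte) []byte {
    // Length 0 with spare capacity; a non-zero length would leave that
    // many zero bytes in front of the appended text.
    newContent := make([]byte, 0, len(content)+200)
    for i, line := range bytes.Split(content, []byte{'\n'}) {
        if i > 0 {
            // Split drops the '\n' separators; put them back.
            newContent = append(newContent, '\n')
        }
        if !bytes.HasPrefix(line, []byte("    ")) {
            line = linkRE.ReplaceAllFunc(line, linkPages)
        }
        newContent = append(newContent, line...)
    }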
// ...
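
A point worth checking when preallocating a buffer for append: the second argument to make is the slice length, and the capacity is an optional third argument. The small sketch below, with hypothetical helper names withLength and withCapacity, shows how the two forms behave when followed by append.

```go
package main

import "fmt"

// withLength starts from a slice of length n, so append places data
// after n zero bytes.
func withLength(n int, data []byte) []byte {
	buf := make([]byte, n)
	return append(buf, data...)
}

// withCapacity starts from an empty slice that already has room for
// n bytes, so append fills it from the beginning.
func withCapacity(n int, data []byte) []byte {
	buf := make([]byte, 0, n)
	return append(buf, data...)
}

func main() {
	fmt.Printf("%q\n", withLength(3, []byte("go")))   // "\x00\x00\x00go"
	fmt.Printf("%q\n", withCapacity(3, []byte("go"))) // "go"
}
```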