[ANN] Gleam now supports distributed pure Go Map Reduce


ChrisLu

Jan 20, 2017, 10:38:32 PM
to golang-nuts

As many of you may know, Gleam used to support only LuaJIT for distributed map reduce.
After several rounds of rethinking, I settled on the syntax below for doing it in pure Go. Please let me know whether it looks OK or could be improved.

Chris

package main

import (
    "os"
    "strings"

    "github.com/chrislusf/gleam/flow"
    "github.com/chrislusf/gleam/gio"
)

var (
    MapperTokenizer = gio.RegisterMapper(tokenize)
    MapperAddOne    = gio.RegisterMapper(addOne)
    ReducerSum      = gio.RegisterReducer(sum)
)

func main() {

    gio.Init() // required, place it right after main() starts

    flow.New().TextFile("/etc/passwd").
        Mapper(MapperTokenizer). // invoke the registered "tokenize" mapper function.
        Mapper(MapperAddOne).    // invoke the registered "addOne" mapper function.
        ReducerBy(ReducerSum).   // invoke the registered "sum" reducer function.
        Sort(flow.OrderBy(2, true)).
        Fprintf(os.Stdout, "%s %d\n").Run()
}

// tokenize splits the input line into alphanumeric words and emits each one.
func tokenize(row []interface{}) error {
    line := string(row[0].([]byte))
    for _, s := range strings.FieldsFunc(line, func(r rune) bool {
        return !('A' <= r && r <= 'Z' || 'a' <= r && r <= 'z' || '0' <= r && r <= '9')
    }) {
        gio.Emit(s)
    }
    return nil
}

// addOne emits a (word, 1) pair for each incoming word.
func addOne(row []interface{}) error {
    word := string(row[0].([]byte))
    gio.Emit(word, 1)
    return nil
}

// sum adds two counts that belong to the same word.
func sum(x, y interface{}) (interface{}, error) {
    return x.(uint64) + y.(uint64), nil
}

Pablo Rozas-Larraondo

Jan 22, 2017, 2:36:43 AM
to golang-nuts
That's great news, Chris! Is this pure Go implementation documented anywhere in more detail? I'd love to know more about how you overcame the initial problems, such as external code execution, that led you to choose LuaJIT in the first place.

Thanks,
Pablo

Chris Lu

Jan 22, 2017, 8:19:26 PM
to Pablo Rozas-Larraondo, golang-nuts
Thanks! This page has a little more detail.


LuaJIT is fast. But Go code is more manageable, and with libraries it can express more complicated logic. Giving up some readability to the extra type assertions from interface{} seems a small price to pay.

Chris




Pablo Rozas Larraondo

Jan 22, 2017, 8:22:26 PM
to Chris Lu, golang-nuts
Thanks Chris!
I'll go through the documentation to understand this new approach.

Pablo