Streaming protobuf objects


Kyrylo Stokoz

Jan 30, 2016, 9:37:03 AM
to Akka User List
Hi All,

I'm trying to learn more about Akka Streams and have what at first sight looks like a trivial task.
I have a very large file of protobuf-encoded objects that I want to stream using Akka HTTP / Akka Streams.

So I would like to create the following flow:

 Read file(s) -> parse protobuf into seq(domain objects) -> transform each object -> render result into json / other format -> send to receiver.

I start by creating a source for each file I need to render, using FileIO.fromFile(file).
I'm stuck on the second step. To parse protobufs I have 3 options:
1. Obj.parseFrom(inputstream): Obj
2. Obj.parseFrom(byte[]): Obj
3. Obj.streamFromDelimitedInput(inputstream): Stream[Obj]

So far I have not found a way to properly convert Source[ByteString, _] to Source[Obj, _].

Is there any way to create the flow above, ideally without loading all the data into memory?

So far I have tried:
1. Framing: no success, I could not find the correct delimiter to parse protobuf.
2. Creating a Stream[Obj] with Obj.streamFromDelimitedInput(inputstream): Stream[Obj] and then creating a Source from that Stream, but this gives very poor performance.

Thanks in advance for your help,
Kyrylo

Viktor Klang

Jan 30, 2016, 10:16:53 AM
to Akka User List
What's the layout of the file with protobuf objects?

--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at https://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.



--
Cheers,

Kyrylo Stokoz

Jan 30, 2016, 12:19:40 PM
to Akka User List
I guess you are interested in how the objects are stored:

The initial bytes define the length of the record, followed by the record itself. A typical file holds about 350K records. Here is the head of an example file:
I hope this answers your question.

~$
^I326216374^R^KtrafficFlow^Xú<85>^B">bb1246cb-6f12-4b9e-a68c-fe3f67e65388* 324a6978e0ae0e159437dc03a3f00ad9~$
^I326216375^R^KtrafficFlow^Xú<85>^B">f647601e-5b1f-4325-a0a4-48cc1a1598a6* 9585fd96770a68faa41eb0bd202c45d1~$
^I326216380^R^KtrafficFlow^Xú<85>^B">6e3c8929-dd6e-4de7-8342-8f5f08da9e38* df06d0a9f4ef5d8866538f5d2352b3fd~$
...

Kyrylo Stokoz

Jan 30, 2016, 12:32:05 PM
to Akka User List
Sorry, it looks like I was wrong: there is no record length at the beginning. Here is an example proto definition:

message MyObject {
optional string FieldName1 = 1;
optional string FieldName2 = 2;
optional string FieldName3 = 3;
optional string FieldName4 = 4;
optional int64 FieldName5 = 5;
}

Viktor Klang

Jan 30, 2016, 12:50:13 PM
to Akka User List
Hi,

it is important to know how the file is laid out (i.e. what sits between the MyObjects).

Kyrylo Stokoz

Jan 31, 2016, 7:13:33 AM
to Akka User List
OK, I dug into how our objects are stored, and they do have a record length in front of them. Unfortunately protobuf encodes it as a base 128 varint (https://developers.google.com/protocol-buffers/docs/encoding), so the length field takes 1 byte for short records and 2-4 bytes for longer records.

So in this case I cannot split the stream into frames with Framing.lengthField, which expects a fixed-width length field.

Given this layout, is there anything I can do to build the pipeline above efficiently?
Any suggestions are much appreciated.

Thanks,
Kyrylo
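A base 128 varint stores 7 bits of the value per byte, least-significant group first, with the high bit set on every byte except the last, so the width of the length prefix is only known once its final byte has been read. That is what rules out a fixed-width field. A minimal plain-Scala sketch of encoding and decoding such a prefix (the `Varint` object and its methods are illustrative names, not protobuf or Akka API; only non-negative 32-bit lengths are handled). `decodeVarint` returns `None` when the bytes run out mid-prefix, which is exactly the case a streaming parser has to wait out:

```scala
object Varint {
  // Encode a non-negative Int as a protobuf base 128 varint:
  // 7 bits per byte, least-significant group first, high bit = "more bytes follow".
  def encodeVarint(value: Int): Array[Byte] = {
    require(value >= 0, "only non-negative lengths are handled here")
    val out = Array.newBuilder[Byte]
    var v = value
    while (v >= 0x80) {
      out += ((v & 0x7F) | 0x80).toByte
      v = v >>> 7
    }
    out += v.toByte
    out.result()
  }

  // Decode a varint starting at `offset`. Returns Some((value, bytesUsed)),
  // or None if the bytes end before the varint is complete.
  def decodeVarint(bytes: Array[Byte], offset: Int = 0): Option[(Int, Int)] = {
    var result = 0
    var shift = 0
    var i = offset
    while (i < bytes.length && shift < 35) {
      val b = bytes(i) & 0xFF
      result |= (b & 0x7F) << shift
      i += 1
      if ((b & 0x80) == 0) return Some((result, i - offset))
      shift += 7
    }
    if (shift >= 35) throw new IllegalArgumentException("varint longer than 5 bytes")
    None // prefix is incomplete: more bytes are needed
  }
}
```

For example, a 300-byte record gets the 2-byte prefix 0xAC 0x02, while any record shorter than 128 bytes gets a single prefix byte.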

Viktor Klang

Jan 31, 2016, 9:16:58 AM
to Akka User List

Sounds like you ought to write a parser GraphStage that decodes inbound ByteStrings and emits ByteStrings, one per object. Then you can decode the data whether it comes from a file or over the network, and the conversion from ByteString to your protobuf object can be a simple map().

--
Cheers,
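The framing part of the stage Viktor describes is a small state machine: append each incoming chunk to a buffer, cut off every complete `<varint length><payload>` record, and keep the remainder for the next chunk. Below is a plain-Scala sketch of that core under a few assumptions: `DelimitedFramer` is a hypothetical name, it uses `Array[Byte]` instead of Akka's `ByteString` so it runs without Akka, and the 16 MiB frame limit is arbitrary. Inside a GraphStage, `onPush` would call `feed` on each grabbed chunk and emit the returned frames (pulling again when there are none), after which a plain `map(bytes => Obj.parseFrom(bytes))` turns frames into objects.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: splits a byte stream of varint-length-delimited records
// into one Array[Byte] per record, across arbitrary chunk boundaries.
final class DelimitedFramer(maxFrameLength: Int = 16 * 1024 * 1024) {
  private var buffer = Array.emptyByteArray // bytes not yet forming a complete record

  // Returns Some((length, prefixSize)), or None if the varint prefix is incomplete.
  private def readLength(bytes: Array[Byte]): Option[(Int, Int)] = {
    var result = 0
    var shift = 0
    var i = 0
    while (i < bytes.length && i < 5) {
      val b = bytes(i) & 0xFF
      result |= (b & 0x7F) << shift
      i += 1
      if ((b & 0x80) == 0) return Some((result, i))
      shift += 7
    }
    if (i == 5) throw new IllegalStateException("length prefix longer than 5 bytes")
    None
  }

  // Feed one chunk; returns every record completed by it (possibly none).
  def feed(chunk: Array[Byte]): Seq[Array[Byte]] = {
    buffer = buffer ++ chunk
    val frames = ArrayBuffer.empty[Array[Byte]]
    var done = false
    while (!done) {
      readLength(buffer) match {
        case Some((length, prefixSize)) =>
          // Fail fast on absurd lengths instead of buffering forever
          require(length >= 0 && length <= maxFrameLength, s"invalid record length $length")
          if (buffer.length >= prefixSize + length) {
            frames += buffer.slice(prefixSize, prefixSize + length)
            buffer = buffer.drop(prefixSize + length)
          } else done = true // payload incomplete: wait for the next chunk
        case None =>
          done = true // length prefix incomplete (or buffer empty)
      }
    }
    frames.toList
  }

  // True if no partial record is pending; check this when upstream completes.
  def isEmpty: Boolean = buffer.isEmpty
}
```

Concatenating arrays copies the buffer on every chunk; a `ByteString`-based version avoids most of those copies, since `ByteString` concatenation and slicing share the underlying arrays.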

Leopoldo Müller

Dec 29, 2016, 5:16:33 AM
to Akka User List
Kyrylo,

As Viktor (√) suggested, I implemented the GraphStage, and it seems to be working so far.

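import com.google.protobuf.Parser

import akka.actor._
import akka.stream._
import akka.stream.scaladsl._
import akka.stream.stage._
import akka.util._

import scala.util.{Success, Try}

object BmruleView extends App {

    implicit val system = ActorSystem()
    implicit val executor = system.dispatcher
    implicit val materializer = ActorMaterializer()

    // MYMESSAGE stands for the protoc-generated Java class of the records
    StreamConverters.fromInputStream(() => System.in)
        .via(new ParseDelimitedFromStage(MYMESSAGE.parser()))
        .runForeach(println)
        .onComplete(_ => system.terminate())
}

// Parses varint-length-delimited protobuf records (the format produced by writeDelimitedTo)
class ParseDelimitedFromStage[T](parser: Parser[T]) extends GraphStage[FlowShape[ByteString, T]] {
    val in = Inlet[ByteString]("ParseDelimitedFromStage.in")
    val out = Outlet[T]("ParseDelimitedFromStage.out")
    override val shape = FlowShape(in, out)

    // Bytes of a record that has not been completely received yet
    private var tail = ByteString.empty

    override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = new GraphStageLogic(shape) with InHandler with OutHandler {

        setHandlers(in, out, this)

        // Parse every complete record in `tail`; an incomplete one stays there
        private def parseAvailable(): Vector[T] = {
            val parsed = Vector.newBuilder[T]
            var incomplete = false
            while (!incomplete && tail.nonEmpty) {
                val iterator = tail.iterator
                // parseDelimitedFrom consumes the bytes it reads from the iterator
                Try(parser.parseDelimitedFrom(iterator.asInputStream)) match {
                    case Success(obj) if obj != null =>
                        parsed += obj
                        tail = tail.takeRight(iterator.len) // drop the consumed record
                    case _ =>
                        incomplete = true // truncated record: wait for more bytes
                }
            }
            parsed.result()
        }

        override def onPush(): Unit = {
            tail = tail ++ grab(in)
            val parsed = parseAvailable()
            // If nothing is complete yet, pull again instead of stalling
            if (parsed.isEmpty) pull(in) else emitMultiple(out, parsed)
        }

        // Without this, leftover bytes at the end of the input would be dropped silently
        override def onUpstreamFinish(): Unit =
            if (tail.isEmpty) completeStage()
            else failStage(new IllegalStateException(s"stream ended inside a record (${tail.length} bytes left)"))

        override def onPull(): Unit = pull(in)
    }
}
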

Viktor Klang

Dec 29, 2016, 9:42:46 AM
to Akka User List
Glad it worked out well for you!

You'll want to put `private var tail = ByteString.empty` inside the creation of the GraphStageLogic (otherwise your stage does not rematerialize properly: every materialization of the stage would share the same buffer).
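Why the placement matters can be shown with a toy model that needs no Akka: the GraphStage is a reusable blueprint and `createLogic` runs once per materialization, so a `var` in the stage is shared by every running copy of the stream, while a `var` in the logic is private to one. `Logic`, `SharedStateBlueprint` and `PerLogicStateBlueprint` are hypothetical names, not Akka API:

```scala
// Toy model of a stage blueprint; not Akka API.
trait Logic { def onPush(chunk: String): String }

// State in the blueprint: shared by every materialization (the bug).
final class SharedStateBlueprint {
  private var tail = ""
  def createLogic(): Logic = new Logic {
    def onPush(chunk: String): String = { tail += chunk; tail }
  }
}

// State in the logic: each materialization gets its own buffer (the fix).
final class PerLogicStateBlueprint {
  def createLogic(): Logic = new Logic {
    private var tail = ""
    def onPush(chunk: String): String = { tail += chunk; tail }
  }
}
```

With the shared version, bytes pushed into one materialization leak into the buffer of another, and since materializations can run on different threads they would also race on the same variable. In the real stage the fix is just moving the `private var tail` line into the `new GraphStageLogic(shape) ... { ... }` body.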

--
Cheers,