Light ScalaPB?


Elias Ross

Oct 18, 2019, 8:20:18 PM
to sca...@googlegroups.com
Hi everyone,

It's no surprise that ScalaPB generates a ton of classes and code. In
fact, it generates jar files about 5 times the size of the Java
version's (29M versus 5.6M in my case). For complicated projects, it is
such a headache that we are considering dropping ScalaPB entirely.
Using Java within Scala, though, isn't a great fit. I'd prefer Scala
collections and Option values if possible.

There's also the issue about unknown fields being dropped. In some
instances, we can't use ScalaPB at all since we are losing
information, like in the case of processing serialized data on disk.

I'm wondering how to approach this a bit differently. I would like it
if the ScalaPB files were simply case classes that could be created
directly from the Java protobuf messages. In particular, something
like this would be fine for my use:

message Foo {
  required string name = 1;
  optional uint32 age = 2;
}

Would be:

case class FooProto(name: String, age: Option[Int] = None, unknown: Unknown)
    extends ScalaMessage[JavaFooProto] {
  def toJavaProto: JavaFooProto = ...
}

trait ScalaMessage[A] {
  def toJavaProto: A
  def unknown: Unknown
}

'unknown' here would hold the unknown fields (in serialized form) so
they aren't lost. Naming is of course up for debate.

The companion would basically look like this:

object FooProto {
  def apply(javaProto: JavaFooProto): FooProto = ...
}
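
The proposal above can be made concrete with a minimal sketch. It is only an illustration under stated assumptions: JavaFooProto here is a hand-written stand-in for the class protoc would generate (so the snippet compiles without protobuf-java), and Unknown is a hypothetical holder for the raw bytes of unknown fields. Neither is part of ScalaPB.

```scala
// Hypothetical stand-in for the protoc-generated Java class (assumption:
// a real one would expose getName/hasAge/getAge in a similar way).
final case class JavaFooProto(name: String, age: java.lang.Integer, unknownBytes: Vector[Byte]) {
  def getName: String = name
  def hasAge: Boolean = age != null
  def getAge: Int = if (age == null) 0 else age.intValue
}

// Hypothetical opaque holder for fields the schema does not know about.
final case class Unknown(bytes: Vector[Byte])

trait ScalaMessage[A] {
  def toJavaProto: A
  def unknown: Unknown
}

final case class FooProto(name: String, age: Option[Int] = None, unknown: Unknown)
    extends ScalaMessage[JavaFooProto] {
  // Convert back to the Java representation, carrying the unknown bytes along.
  def toJavaProto: JavaFooProto =
    JavaFooProto(name, age.map(a => Integer.valueOf(a)).orNull, unknown.bytes)
}

object FooProto {
  // Build the Scala view from the Java message; an optional field becomes an Option.
  def apply(javaProto: JavaFooProto): FooProto =
    FooProto(
      javaProto.getName,
      if (javaProto.hasAge) Some(javaProto.getAge) else None,
      Unknown(javaProto.unknownBytes)
    )
}
```

In this sketch a round trip through FooProto(javaMessage).toJavaProto gives back an equal Java-side value, including the unknown bytes, which is the property the proposal is after.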

Maybe I'd keep some functionality such as withName() and getName().
Maybe I'd add some serialization features. But I don't really make use
of the other features and they really just bloat everything too much.

I'm sort of wondering how to approach a 'lite' version since I'm
pretty new to the code base, or if there was any interest outside of
my world in such a thing.

Nadav Samet

Oct 18, 2019, 9:43:12 PM
to Elias Ross, ScalaPB
Hi Elias,

You are bringing up two issues: (1) generated code size and (2) preservation of unknown fields during serialization/deserialization.

For (1):
> Maybe I'd keep some functionality such as withName() and getName().
> Maybe I'd add some serialization features. But I don't really make use
> of the other features and they really just bloat everything too much.

I think that the main optional feature that you have not listed is lenses, which you can turn off by passing the "no_lenses" option to the generator, or by setting the option "lenses: false" as a file-level or package-level option.
It has been requested in the past to provide an option to not generate `withX` and `clearX` - I know they contribute quadratically (in the number of fields within each message) to the code size: each withX method calls copy(), which accesses all fields.
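
To illustrate the quadratic growth, here is a hand-written approximation of a three-field message. It is a sketch only: Person and its fields are hypothetical names, and this is not actual ScalaPB output.

```scala
// Illustrative message with three fields (hypothetical, hand-written).
final case class Person(name: String = "", age: Int = 0, email: String = "") {
  // Each withX delegates to copy(). The compiled call passes every field:
  // the changed one plus the current value of each of the others. With n
  // fields there are n such methods, each referencing n fields, so the
  // total grows as n * n.
  def withName(v: String): Person = copy(name = v)
  def withAge(v: Int): Person = copy(age = v)
  def withEmail(v: String): Person = copy(email = v)
}
```

Adding a fourth field would add one more method and also one more argument to every existing copy() call.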

Can you be more specific about the "other features" that we can make optional? Maybe we can consider not generating descriptors as well, though I am doubtful this is going to substantially reduce code size.

It also sounds like you are willing to trade off performance for code size by delegating serialization/deserialization to Java and performing a conversion. Is that right?

For (2):
ScalaPB has an option to preserve unknown fields, but it is disabled by default in 0.9.x. This default is planned to change in a future minor version to match the behavior of the Java implementation. In the meantime, to enable preservation of unknown fields, set "preserve_unknown_fields" to true.

Hope this helps.
-Nadav



Elias Ross

Oct 23, 2019, 3:01:06 PM
to Nadav Samet, ScalaPB
On Fri, Oct 18, 2019 at 6:43 PM Nadav Samet <thes...@gmail.com> wrote:
>
> Hi Elias,
>
> You are bringing up two issues: (1) generated code size and (2) preservation of unknown fields during serialization/deserialization.
>
> For (1):
>>
>> Maybe I'd keep some functionality such as withName() and getName().
>> Maybe I'd add some serialization features. But I don't really make use
>> of the other features and they really just bloat everything too much.
>
>
> I think that the main optional feature that you have not listed is lenses, which you can turn off by passing the "no_lenses" option to the generator, or by setting the option "lenses: false" as a file-level or package-level option.
> It has been requested in the past to provide an option to not generate `withX` and `clearX` - I know they contribute quadratically (in the number of fields within each message) to the code size: each withX method calls copy(), which accesses all fields.

Hi Nadav,

I haven't looked at a recent version of the project (my apologies!)
but it's good that you have an option to disable the lenses feature.

> Can you be more specific about the "other features" that we can make optional? Maybe we can consider not generating descriptors as well, though I am doubtful this is going to substantially reduce code size.

Perhaps disabling generation of writeTo, mergeFrom, serializedSize,
getField and fromFieldsMap would reduce code size dramatically.

Here's what I see with a typical Protobuf class in terms of class count:

fromFieldsMap - 4 classes
serializedSize - 2
getField - 6
toJavaProto - 10
writeTo - 8
lenses - 33

Each class seems to be about 1-2 kilobytes. Scala 2.11 here.
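
As a back-of-the-envelope tally of the figures above (the counts and the 1-2 KB estimate come from this message; the variable names are only illustrative):

```scala
// Class counts per feature, copied from the list above.
val classCounts: Seq[(String, Int)] = Seq(
  "fromFieldsMap"  -> 4,
  "serializedSize" -> 2,
  "getField"       -> 6,
  "toJavaProto"    -> 10,
  "writeTo"        -> 8,
  "lenses"         -> 33
)

// Total number of extra classes for one typical message: 63.
val totalClasses: Int = classCounts.map(_._2).sum

// Share of those classes that come from lenses alone (a little over half).
val lensFraction: Double = classCounts.toMap.apply("lenses").toDouble / totalClasses

// At 1-2 KB per class, the rough cost per message in kilobytes: 63 to 126.
val (lowKb, highKb) = (totalClasses * 1, totalClasses * 2)
```

By this estimate, lenses account for more of the listed classes than all the other features combined.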

I do wonder if Scala is going to reduce code size in a future release,
so perhaps this is an unnecessary optimization looking ahead?

> It also sounds like you are willing to trade off performance for code size by delegating serialization/deserialization to Java and performing a conversion. Is that right?

I think so. I haven't checked every API, but 90% of my APIs don't
require speed in de/serialization.

> For (2):
> ScalaPB has an option to preserve unknown fields, but it is disabled by default in 0.9.x. This default is planned to change in a future minor version to match the behavior of the Java implementation. In the meantime, to enable preservation of unknown fields, set "preserve_unknown_fields" to true.
>

Thanks for the help here.

Nadav Samet

Oct 23, 2019, 3:51:36 PM
to Elias Ross, ScalaPB
Can you check how your use case is doing on Scala 2.12 or 2.13? There have been a number of class-size-related issues that were fixed between 2.11 and 2.12.
--
-Nadav