Question on "total time": does not include creation?

Tatu Saloranta

Nov 10, 2012, 8:22:05 PM
to java-serializat...@googlegroups.com
I started looking into Avro serialization/deserialization (outside of
the benchmark), and realized that Avro incurs some of its serialization
costs ahead of time; basically, it does UTF-8 encoding when building
GenericData instances. This means that repeated serialization calls on
the same object are faster than serialization calls on identical
but distinct objects.
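
To illustrate the effect described above, here is a minimal sketch in plain Java. PreEncodedString is a hypothetical stand-in for a value type that encodes eagerly; it is not Avro's actual class, and the encode counter exists only to make the difference visible.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a value type that encodes eagerly (not Avro's own class).
final class PreEncodedString {
    // Counts how many times UTF-8 encoding ran; demo only, not thread-safe.
    static int encodeCount = 0;

    private final byte[] utf8;

    PreEncodedString(String value) {
        // The encoding cost is paid here, at object-creation time.
        this.utf8 = value.getBytes(StandardCharsets.UTF_8);
        encodeCount++;
    }

    // "Serialization" only copies bytes that were encoded earlier.
    int writeTo(byte[] target, int offset) {
        System.arraycopy(utf8, 0, target, offset, utf8.length);
        return utf8.length;
    }
}

final class PreEncodingDemo {
    // Same object serialized repeatedly: the string is encoded once.
    static int serializeSameObject(String text, int rounds, byte[] buffer) {
        PreEncodedString value = new PreEncodedString(text);
        int written = 0;
        for (int i = 0; i < rounds; i++) {
            written += value.writeTo(buffer, 0);
        }
        return written;
    }

    // Identical but distinct object each round: the string is encoded every round.
    static int serializeFreshObjects(String text, int rounds, byte[] buffer) {
        int written = 0;
        for (int i = 0; i < rounds; i++) {
            written += new PreEncodedString(text).writeTo(buffer, 0);
        }
        return written;
    }
}
```

Both methods write the same bytes, but only the second one pays the encoding cost on every round, which is the case a benchmark misses if it keeps serializing one pre-built object.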

To compensate for that, the jvm-serializer benchmark has a separate 'create'
timing section, measured using the "forward" method; this accounts
for the additional time in the case of Avro and other codecs that do similar
pre-processing. So far so good.

However, I noticed that "total" only includes serialization and
deserialization, but not create time.
This is probably just an oversight, and I can fix it. But I thought I
would first verify that I am not overlooking something.
(An alternative to adding 'create' time would be to just call
"transformer.forward(value)" as part of serialization.)
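
The two options above can be sketched roughly as follows. This is not the benchmark's actual code: the Transformer and Codec interfaces, ToyUtf8Codec, Timings and TotalTimeSketch are all made up for illustration (only the method name forward comes from the benchmark), and there is no warm-up or other JIT handling, so the numbers it prints are not meaningful measurements.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical interfaces; only the method name forward(...) is taken from the thread.
interface Transformer<V, S> {
    S forward(V value); // build the codec-specific object from the plain value
}

interface Codec<S> {
    byte[] serialize(S object);
    S deserialize(byte[] data);
}

// Toy codec for the demo: the "codec object" is already-encoded UTF-8 bytes.
final class ToyUtf8Codec implements Codec<byte[]> {
    @Override public byte[] serialize(byte[] object) { return object.clone(); }
    @Override public byte[] deserialize(byte[] data) { return data.clone(); }
}

final class Timings {
    long createNanos;
    long serNanos;
    long deserNanos;
    long totalNanos;
}

final class TotalTimeSketch {
    static <V, S> Timings measure(Transformer<V, S> transformer, Codec<S> codec, V value,
                                  int iterations, boolean forwardInsideSer) {
        if (iterations < 1) throw new IllegalArgumentException("iterations must be >= 1");
        Timings t = new Timings();

        // 'create': always measured on its own so it can be reported separately.
        long start = System.nanoTime();
        S object = transformer.forward(value);
        for (int i = 1; i < iterations; i++) {
            object = transformer.forward(value);
        }
        t.createNanos = (System.nanoTime() - start) / iterations;

        // 'ser': either reuse one pre-built object, or rebuild it on every call.
        byte[] bytes = null;
        start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            S input = forwardInsideSer ? transformer.forward(value) : object;
            bytes = codec.serialize(input);
        }
        t.serNanos = (System.nanoTime() - start) / iterations;

        // 'deser': decode the bytes produced above.
        start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            codec.deserialize(bytes);
        }
        t.deserNanos = (System.nanoTime() - start) / iterations;

        // Option 1: add 'create' to the total. Option 2: 'ser' already contains it.
        t.totalNanos = forwardInsideSer
                ? t.serNanos + t.deserNanos
                : t.createNanos + t.serNanos + t.deserNanos;
        return t;
    }

    public static void main(String[] args) {
        Transformer<String, byte[]> toUtf8 = s -> s.getBytes(StandardCharsets.UTF_8);
        Timings t = measure(toUtf8, new ToyUtf8Codec(), "media", 100_000, false);
        System.out.println("create=" + t.createNanos + " ser=" + t.serNanos
                + " deser=" + t.deserNanos + " total=" + t.totalNanos);
    }
}
```

Either way the creation cost ends up in "total" exactly once; the difference is only whether it shows up in the "ser" column or stays in its own "create" column.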

-+ Tatu +-

Nate

Nov 10, 2012, 9:13:00 PM
to java-serializat...@googlegroups.com
Either fix sounds good to me.

We should also put results back up on the wiki.

-Nate




--
You received this message because you are subscribed to the Google Groups "java-serialization-benchmarking" group.
To post to this group, send email to java-serializat...@googlegroups.com.
To unsubscribe from this group, send email to java-serialization-be...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/java-serialization-benchmarking?hl=en.


cng...@gmail.com

Aug 2, 2013, 11:31:54 AM
to java-serializat...@googlegroups.com
This change seems somewhat inaccurate, or skewed, against Avro's specific record serialization/deserialization.  Generally, if you are using Avro's generated objects, you are unlikely to be converting back and forth between them and some other format when serializing; you'd just use the generated objects themselves.



Kannan Goundan

Aug 2, 2013, 12:54:49 PM
to java-serializat...@googlegroups.com
It's true that you can take the extra effort to make your code use Avro's Utf8 directly.  But I think it's more typical that your code uses java.lang.String objects that need to be converted to Avro Utf8 objects.

Since we're only showing one benchmark number, I think it's better to show the common case.  This is not ideal, but I think it's more helpful to the average programmer who looks at the benchmarks.


cng...@gmail.com

Aug 5, 2013, 4:31:41 AM
to java-serializat...@googlegroups.com
that's not accurate for SpecificRecord.  you can use Strings in SpecificRecord; you don't have to use Utf8.

point being that if i've gone to the trouble of generating e.g. the avro Media types that are in the benchmark, i'm not going to then insist on using a completely different set of Media domain objects and only convert to the avro Media type when i'm serializing.  i'm going to use the avro Media types everywhere.  so the benchmark shows an unrealistic use case for almost any library that generates its own domain objects.  the general intention of those libraries is that you are going to use the generated domain objects.

Kannan Goundan

Aug 5, 2013, 5:51:25 PM
to java-serializat...@googlegroups.com
Hmm, looks like I misunderstood your objection.  It looks like you don't think the "total" time should include the time required to create the object?

I agree that it's common to use Avro-generated classes everywhere.  However, I wouldn't consider it "unrealistic" to only use Avro for serialization.

For example, if you use Avro for client/server RPC, it's probably normal to create one message per serialization operation.  It's reasonable that the RPC structures aren't necessarily the best structures for the internal logic of the client or server.

Also, even if you use Avro-generated classes everywhere, I think many realistic applications will create at least one Avro object per serialization operation (though it may have been created way ahead of time).

So, yeah, I don't think it's horrible to count creation time in the "total" results.  Even for serializers that use the hand-written POJOs, I think we include the time to create those objects in the "total" results.  Unless something has changed?


Nate

Aug 6, 2013, 12:42:17 AM
to java-serializat...@googlegroups.com
In general, generated classes don't make good first class citizens in an object model. While using only Avro classes is also very useful, I would agree that it is reasonable to include "create" time in "total" time.

cng...@gmail.com

Aug 6, 2013, 6:15:35 AM
to java-serializat...@googlegroups.com
you both make good points.

i suppose what i mean is that i would prefer to see the ser time not include the creation time as it does now.  so we would remove the transform time from ser, but add create time to total (i.e. total = create + ser + deser).

at the moment total = ser + deser, but ser includes creation time as well.  so i don't think it presents a very accurate picture for avro generic in particular, where at a cursory glance you see high object creation time, and then reasonably high ser time as well.  but in fact ser includes creation time again.

if we change total to include create time and remove create time from ser time, i think it gives a more accurate picture.  then you can see the cost of e.g. not using the serialization-specific objects as domain objects.
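
to make the arithmetic concrete, here is a small sketch of the two ways of reporting.  the class and method names are made up for illustration, and pureSer simply means serialization time with object creation excluded; any numbers fed in are invented, not benchmark results.

```java
// Hypothetical helper comparing the two reporting schemes discussed in the thread.
// All inputs are per-operation times in nanoseconds.
final class ReportingSketch {
    // Current scheme as described above: 'ser' already contains 'create',
    // and total = ser + deser. Returns { create, ser, total }.
    static long[] currentReport(long create, long pureSer, long deser) {
        long ser = create + pureSer;
        return new long[] { create, ser, ser + deser };
    }

    // Proposed scheme: 'ser' excludes 'create',
    // and total = create + ser + deser. Returns { create, ser, total }.
    static long[] proposedReport(long create, long pureSer, long deser) {
        return new long[] { create, pureSer, create + pureSer + deser };
    }

    public static void main(String[] args) {
        long[] now = currentReport(400, 600, 500);
        long[] then = proposedReport(400, 600, 500);
        System.out.println("current:  create=" + now[0] + " ser=" + now[1] + " total=" + now[2]);
        System.out.println("proposed: create=" + then[0] + " ser=" + then[1] + " total=" + then[2]);
    }
}
```

with invented inputs create=400, pureSer=600, deser=500, both schemes give total=1500, but the current one shows ser=1000 next to create=400, so creation appears twice across the columns, while the proposed one shows ser=600.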


Nate

Aug 6, 2013, 6:51:24 AM
to java-serializat...@googlegroups.com
Ah, that makes sense to me.

-Nate
