Mavenization PR submitted


James Northrup

Feb 4, 2016, 4:03:55 AM
to java-serialization-benchmarking
Hello, I have committed a branch in my tree called "teraformed", with a PR at https://github.com/eishay/jvm-serializers/pull/62

I am experienced with Maven and can assist with build organization if someone with more experience can judge what can and can't work in the maven-native-compiler, maven-exec, and maven-antrun plugins, and wants to help guide a Maven port over IM. My own serializer code currently depends on a Maven plugin for code generation.

Otherwise, there is now a Maven repo in src/main/repo, built from the contents of tpc/lib/, which makes for easy ports to Ivy, Gradle, and related tools.

cheers

Kannan Goundan

Feb 4, 2016, 5:21:05 AM
to java-serialization-benchmarking
Interesting!  Is this currently working?  I like the idea of getting JARs from Maven Central when possible.

My initial concerns:
- Is it easy to make the existing code generators work with Maven?
- Will it be a lot slower than the current Makefile-based build?

Also, out of curiosity, was it difficult to add your code generator to the current Makefile-based build?  It should be easy as long as there's a "main()" function that will run the generator.

--
You received this message because you are subscribed to the Google Groups "java-serialization-benchmarking" group.
To unsubscribe from this group and stop receiving emails from it, send an email to java-serialization-be...@googlegroups.com.
To post to this group, send email to java-serializat...@googlegroups.com.
Visit this group at https://groups.google.com/group/java-serialization-benchmarking.
For more options, visit https://groups.google.com/d/optout.

James Northrup

Feb 4, 2016, 7:46:49 AM
to java-serialization-benchmarking
Hello Kannan,

I didn't even attempt to run the POM, knowing there are shell scripts and makefiles I didn't account for.

For my serializer, I just added the benchmark's schema to my serializer's Maven build and threw the finished JAR file in.

I'm not sure where the guides are for writing the Transformers and Serializers. Why would anyone publish Proto IDL in order to murder their hardware with all these class-level attribute conversions? There's no way this is a useful benchmark of a 0-copy cursor library for MMap and DirectByteBuffers. The examples that come close appear to be Wuby, and it's plagued with horrible forwarding as well. If I'm using proto to generate Java interfaces, shouldn't I just refactor the test to use my interfaces instead of the forwarding?

cheers.


Tatu Saloranta

Feb 4, 2016, 11:07:45 AM
to java-serializat...@googlegroups.com
James, a little less hostile wording would help keep the conversation civil. No one here is particularly invested in the current implementation, but there are reasons for some of the choices. It helps to keep an open mind and first ask about possible reasons before declaring that it's all total and utter crap.

Regarding delegation: as far as I know, it is only used for converting input data in a more automated fashion; actual serialization/deserialization uses the native, direct approach where applicable. The intent is not to burden codecs with unreasonable overhead, but at the same time an effort is made to keep the playing field level. That is a non-trivial task given the number of codecs included and the number of developers who have contributed their wrappers with varying levels of knowledge.

As to a 0-copy cursor library, the focus is on straightforward data-binding, not on low-level cursor/iterator access. While there are "manual" cases, those are mostly of interest to library developers themselves. Comparison of specific access patterns would be even more difficult to do; there are enough concerns about "byte[] vs InputStream" and "pre-allocated whole buffer vs incremental" that it would be impractical to worry about memory mapping as well.
In the end, it is also doubtful whether more optimal access would matter much, considering the modest size of the payloads being tested.

Hope this helps,

-+ Tatu +-

PS: Maybe we also need to make sure the Wuby example doesn't try to be too clever with optimizing access. The benchmark is not about the absolute highest performance for any given codec, but about relative performance with the most straightforward, likely real-world usage (for the given, somewhat simplistic, task).



James Northrup

Feb 4, 2016, 4:48:07 PM
to java-serialization-benchmarking
Hi cowtowncoder, no hostilities are meant, sorry. There's useful I/O benchmarking here, so I did persevere along the grain.

I think that for benchmarking the commonalities of access costs, a duck-typing solution could level the field greatly.

Something I've learned from teammates pasting serialization results from XML and JSON into unit test assertions is how to do duck-type comparison with a serialize/deserialize pair.

Between any two versions of the same JSON serializer, you may have differently ordered hash buckets in your Java proxies.

The workaround is to serialize both objects and compare them for string likeness.

So if the milliseconds of cost for a fast JSON serializer are eliminated from all serializer comparisons, it could be an apples-to-apples comparison of large arrays upstream of equality by JSON.

I believe Gson tends to use sorted keys predictably.

Once again, if I'm describing an option that's already in the suite, some pointers would be good.




James Northrup

Feb 4, 2016, 4:49:26 PM
to java-serialization-benchmarking
> The workaround is to serialize both objects and compare them for string likeness.

Compare them for equality, string or otherwise. I think Gson does map comparison for two Gson objects.
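
A minimal sketch of that comparison, using only the JDK rather than Gson or any serializer from the suite. The class name `CanonicalCompare` and the sample fields are made up for illustration, and the rendering is only JSON-like (no string escaping), so treat it as a picture of the idea and not as a real JSON writer:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CanonicalCompare {

    /** Renders nested Maps/Lists/scalars as a string with map keys sorted. */
    static String canonical(Object value) {
        if (value instanceof Map) {
            // Sort by key so the iteration order of the source map does not matter.
            TreeMap<String, String> sorted = new TreeMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                sorted.put(String.valueOf(e.getKey()), canonical(e.getValue()));
            }
            return sorted.entrySet().stream()
                    .map(e -> "\"" + e.getKey() + "\":" + e.getValue())
                    .collect(Collectors.joining(",", "{", "}"));
        }
        if (value instanceof List) {
            // List order is meaningful, so it is preserved.
            return ((List<?>) value).stream()
                    .map(CanonicalCompare::canonical)
                    .collect(Collectors.joining(",", "[", "]"));
        }
        if (value instanceof String) {
            return "\"" + value + "\""; // no escaping: illustration only
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        // Same content, different insertion order (hypothetical fields).
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("uri", "http://example.org/media");
        a.put("width", 640);
        Map<String, Object> b = new LinkedHashMap<>();
        b.put("width", 640);
        b.put("uri", "http://example.org/media");

        System.out.println(a.toString().equals(b.toString()));  // naive string comparison
        System.out.println(canonical(a).equals(canonical(b)));  // sorted-key string comparison
        System.out.println(a.equals(b));                         // Map.equals ignores order
    }
}
```

The naive string comparison is the one that breaks when key order changes; the other two comparisons do not depend on order.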

Tatu Saloranta

Feb 4, 2016, 9:22:55 PM
to java-serializat...@googlegroups.com
On Thu, Feb 4, 2016 at 1:48 PM, James Northrup <northru...@gmail.com> wrote:
> Hi cowtowncoder, no hostilities are meant, sorry. There's useful I/O benchmarking here, so I did persevere along the grain.

OK, no problem. And I can't really blame anyone for being frustrated -- the code, build system and many other parts have their quirks.

> I think that for benchmarking the commonalities of access costs, a duck-typing solution could level the field greatly.

Just to make sure I understand this: are you thinking of "untyped" deserialization into Maps, Lists, etc. (or equivalent format-specific tree models)? Some tests actually use these as well, in the case of libs/formats that do not offer data-binding, or occasionally when a lib provides multiple processing models.

> Something I've learned from teammates pasting serialization results from XML and JSON into unit test assertions is how to do duck-type comparison with a serialize/deserialize pair.
>
> Between any two versions of the same JSON serializer, you may have differently ordered hash buckets in your Java proxies.
> The workaround is to serialize both objects and compare them for string likeness.
>
> So if the milliseconds of cost for a fast JSON serializer are eliminated from all serializer comparisons, it could be an apples-to-apples comparison of large arrays upstream of equality by JSON.
>
> I believe Gson tends to use sorted keys predictably.
>
> Once again, if I'm describing an option that's already in the suite, some pointers would be good.


I guess I am not 100% sure I follow here.
Perhaps you are referring to the existing use of Transformers, which convert from the canonical input data representation into either POJOs (for non-schema-based codecs) or lib/datatype-specific generated objects for formats like protoc. Transformation is not really needed with POJOs (or Maps/Lists), but rather to work with source-generation-based packages: protoc, Thrift, Avro's object binding and many others.

Validation of correct round-tripping (that is, data being serialized from input into the format, then deserialized back) uses explicit comparison of input and output POJOs, using a canonical representation. Use of Maps/Lists would slightly simplify the code (enforcing ordering is simple enough; as you point out, many libs like GSON and Jackson compare JSON Objects without assuming a specific ordering), but would then require the ability to convert to/from Maps.
For schema-generated cases this would conversely add a bit more complexity, I think.
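
As a rough illustration of the round-trip validation described above (not the suite's actual code), here is a self-contained sketch. `Media`, `Codec`, and the toy `TOY` codec are hypothetical names invented for this example; the project's real Serializer and Transformer classes have their own signatures. For a schema-generated codec, a transformer step would sit before `serialize` and after `deserialize`:

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;

public class RoundTripCheck {

    /** Hypothetical POJO standing in for the canonical input data. */
    static final class Media {
        final String uri;
        final int width;

        Media(String uri, int width) {
            this.uri = uri;
            this.width = width;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Media)) return false;
            Media m = (Media) o;
            return width == m.width && Objects.equals(uri, m.uri);
        }

        @Override
        public int hashCode() {
            return Objects.hash(uri, width);
        }
    }

    /** Hypothetical codec interface; an assumption, not the project's API. */
    interface Codec<T> {
        byte[] serialize(T value);
        T deserialize(byte[] bytes);
    }

    /** Toy codec that writes "width|uri" as UTF-8 text. */
    static final Codec<Media> TOY = new Codec<Media>() {
        @Override
        public byte[] serialize(Media m) {
            return (m.width + "|" + m.uri).getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public Media deserialize(byte[] bytes) {
            String s = new String(bytes, StandardCharsets.UTF_8);
            int sep = s.indexOf('|'); // first separator ends the width field
            return new Media(s.substring(sep + 1), Integer.parseInt(s.substring(0, sep)));
        }
    };

    /** Serializes, deserializes, and compares input and output explicitly. */
    static <T> boolean roundTrips(Codec<T> codec, T input) {
        T output = codec.deserialize(codec.serialize(input));
        return input.equals(output);
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(TOY, new Media("http://example.org/a.mp4", 640)));
    }
}
```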

-+ Tatu +-

 



James Northrup

Feb 5, 2016, 1:59:54 AM
to java-serialization-benchmarking
I had given some thought to creating a JMH Maven archetype to present an n*m compatibility and covariance matrix for the

 {ZERO_KNOWLEDGE. FULLGRAPH|FLAT_TREE} (RHS) vs all (LHS) serializers.

First, RHS publishes a set of objects (Object) for LHS to ingest as a benchmark;
then LHS publishes the objects back for comparison.

I'm sure I'm skipping something important here.

To me, this seems like the way to isolate an individual serializer's strengths while removing the hand-written glue.

For RHS a...f x LHS a...z, a rectangular graph can show pass/fail and benchmark timers. Covariance can indicate where one conversion did something that affected its counterparts in a consistent way.
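
A rough sketch of what such a grid could look like in plain Java. Everything here is an assumption for illustration: `SerializerMatrix`, `Cell`, and the toy "publishers" and "round-trippers" are invented names, `System.nanoTime()` is only a crude stand-in for a proper JMH measurement, and no covariance is computed:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

public class SerializerMatrix {

    /** One cell of the grid: pass/fail plus a crude elapsed time. */
    static final class Cell {
        final boolean pass;
        final long nanos;

        Cell(boolean pass, long nanos) {
            this.pass = pass;
            this.nanos = nanos;
        }
    }

    /** Runs every LHS round-tripper against the objects of every RHS publisher. */
    static Map<String, Map<String, Cell>> run(
            Map<String, Supplier<List<Object>>> publishers,      // RHS
            Map<String, UnaryOperator<Object>> roundTrippers) {  // LHS
        Map<String, Map<String, Cell>> grid = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<List<Object>>> rhs : publishers.entrySet()) {
            List<Object> objects = rhs.getValue().get();
            Map<String, Cell> row = new LinkedHashMap<>();
            for (Map.Entry<String, UnaryOperator<Object>> lhs : roundTrippers.entrySet()) {
                long start = System.nanoTime();
                boolean pass = true;
                try {
                    for (Object o : objects) {
                        // LHS ingests the object and publishes it back for comparison.
                        pass &= o.equals(lhs.getValue().apply(o));
                    }
                } catch (RuntimeException e) {
                    pass = false; // failing to ingest the data fails the cell
                }
                row.put(lhs.getKey(), new Cell(pass, System.nanoTime() - start));
            }
            grid.put(rhs.getKey(), row);
        }
        return grid;
    }

    public static void main(String[] args) {
        Map<String, Supplier<List<Object>>> publishers = new LinkedHashMap<>();
        publishers.put("strings", () -> Arrays.asList("a", "b"));
        publishers.put("numbers", () -> Arrays.asList(1, 2, 3));

        Map<String, UnaryOperator<Object>> roundTrippers = new LinkedHashMap<>();
        roundTrippers.put("identity", o -> o);
        roundTrippers.put("viaString", o -> String.valueOf(o)); // drops non-string types

        run(publishers, roundTrippers).forEach((r, row) ->
                row.forEach((l, cell) ->
                        System.out.println(r + " x " + l + ": " + (cell.pass ? "pass" : "fail"))));
    }
}
```

In a real port, each cell would presumably be its own JMH benchmark, with the pass/fail check kept outside the timed region.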

"How" is another question. I like the existing "make" results, but I want to jar it all up into something Maven-like up front, with the features and not all the glue code.

I think it's possible that each individual unit test à la JUnit (Maven src/test/java/) gets an isolated classloader and JVM under Maven, or is at least mostly hermetic.

I'm not a Maven site publishing expert, since it seems broken more often than it works, but I hope there are some recipes for JMH JUnit reporting in Maven.
 
