HashMap is O(1) cost for puts and gets, TreeMap is O(log n) for those
operations, with the fairly small (only a few hundred entries) size of
the attributes map in CRN that shouldn't change performance much at all
and would amortize the minor cost increase over each map.put operation.
-Eric
> --~--~---------~--~----~------------~-------~--~----~
> You received this message because you are subscribed to the Google Groups "Cernunnos Discussion" group.
> To post to this group, send email to cernunnos-...@googlegroups.com
> To unsubscribe from this group, send email to cernunnos-discus...@googlegroups.com
> For more options, visit this group at http://groups.google.com/group/cernunnos-discussion?hl=en
> -~----------~----~----~----~------~----~------~--~---
>
>
--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Cernunnos Discussion" group. To post to this group, send email to cernunnos-...@googlegroups.com
On that note I have a few areas I'd like to see the logging switched
from INFO to DEBUG as the messages are very verbose or are not INFO
level info for end users.
-Eric
wills...@gmail.com wrote:
> > Thinking about this a bit more doing the sort once per log call is much
> > more expensive than just having these maps be TreeMaps.
>
> But they're only logged once, in the ScriptRunner, which happens one
> time per invocation of a CRN XML document.
>
> It might be a good idea to move the sorting code itself into a private
> method w/in ScriptRunner, to prevent it from being used unwittingly
> elsewhere.
>
> >
> > The only other thing to be careful with either approach is if the keys
> > do not all implement Comparable an exception will be thrown.
> >
>
> Ah sure. In our case, the keys are all Strings per the API.
>
> drew wills
Ah but what about the uPortal import/export where subscripts are used everywhere and called thousands of times.
On that note I have a few areas I'd like to see the logging switched from INFO to DEBUG as the messages are very verbose or are not INFO level info for end users.
-Eric> Thinking about this a bit more doing the sort once per log call is muchBut they're only logged once, in the ScriptRunner, which happens one time per invocation of a CRN XML document.
> more expensive than just having these maps be TreeMaps.
It might be a good idea to move the sorting code itself into a private method w/in ScriptRunner, to prevent it from being used unwittingly elsewhere.
On Fri, Jan 16, 2009 at 6:56 PM, Eric Dalquist <eric.d...@doit.wisc.edu> wrote:
Ah but what about the uPortal import/export where subscripts are used everywhere and called thousands of times.
By switching mergedAttributes from a HashMap implementation to a TreeMap implementation (or LinkedHashMap or any other implementation where the sort order of the keys is guaranteed) the keys are (re-)sorted every time you add an attribute to it. It wouldn't be noticeable for little jobs but for an export or import of thousands of records, it could be significant.
On that note I have a few areas I'd like to see the logging switched from INFO to DEBUG as the messages are very verbose or are not INFO level info for end users.
I agree. For example, if the preamble was logged at DEBUG rather than INFO, then the patch I submitted is fine. Furthermore, large jobs with lots of script invocations execute without any noticeable overhead (maybe they would even run faster since the log statements may not even be ran at all).
-Eric> Thinking about this a bit more doing the sort once per log call is muchBut they're only logged once, in the ScriptRunner, which happens one time per invocation of a CRN XML document.
> more expensive than just having these maps be TreeMaps.
Right.
It might be a good idea to move the sorting code itself into a private method w/in ScriptRunner, to prevent it from being used unwittingly elsewhere.
That's why my patch scoped the method as package private. It felt like the "right" place to put it but it certainly does not feel "right" for that method to be public. I'm not intimately familiar with the runtime package and where else something like that would be useful so busting it down to a private method in ScriptRunner is reasonable (even in-lining it in the logging statement before the iterating through the keys is fine too).
Ah sure. In our case, the keys are all Strings per the API.
>
> The only other thing to be careful with either approach is if the keys
> do not all implement Comparable an exception will be thrown.
>
drew wills
Clearly there are some performance advantages here. But I was under
the impression that HashMaps are generally faster. From the API:
"This implementation provides constant-time performance for the basic
operations (get and put), assuming the hash function disperses the
elements properly among the buckets." Obviously the hashCode methods
for the element types need to be written carefully. String's hashCode
function would be as close to a guarantee that this happened. With
the small number of elements in the attribute maps, the difference in
implementations is negligible.
>
> A full sort, which the patch does on each log call costs N*log2(N) since you have to loop over every key (N) and add it to the tree (log2(N)).
OK.
>
> In the case of the attribute Map sizes we have (only a few hundred elements) the log2(N) may actually be faster than the linear time work as calculating the hash position is fairly complex.
It would actually be quite a bit faster (I'm a mathematician more than
a data structure expert). The curve for a log function is more
constant (shallower) over time.
>
> Just advocating that switching the attributes map to a TreeMap would solve the problem and likely have other benefits down the road with no cost most of the time and a speed up when using lots of subscripts and logging the start stanza.
>> On that note I have a few areas I'd like to see the logging switched from INFO to DEBUG as the messages are very verbose or are not INFO level info for end users.
Sure that would be fine too. Also though, if the preamble were logged
at DEBUG level (which is probably more appropriate as you hinted at
earlier) that would also work too.