some questions re: current design

0 views
Skip to first unread message

Joshua O'Madadhain

unread,
Jan 4, 2009, 9:13:59 PM1/4/09
to obvious...@googlegroups.com
Breaking the thus-far deafening silence on this list: there are a few
questions I have, now that I've taken another look at a few things
(and synced to the repository again).

Some of these suggest interface changes (better now than later, even
if we're all (still) tired of this topic).

(1) Iterable vs. Collection
Right now the methods return Collections. They could instead return
Iterables (which only have an iterator() method).
Arguments for:
- makes it harder for people to screw up our data models by modifying
the Collection returned
- allows us to return (and use internally) arrays, which are Iterables
against:
- we need to provide extra methods (size, contains for each
Collection-returning method, probably).

(2) An easy one (I hope): what purpose does the interface Data serve?
It has no methods and seems to specify no behavior; if it has no
present use I'd suggest removing it (and adding it back in later if
appropriate).

(3) Rather than reinvent Predicate, why not use the
commons-collections interface by that name, which comes with many
useful utilities?

(4) What is the Scale interface for? Some class-level documentation
would be handy; the method-level docs are obscure without the larger
context.

I hope that you all have enjoyed your holidays.

Regards,

Joshua


--
joshua.o...@gmail.com...................www.ics.uci.edu/~jmadden
Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
It's that moment of dawning comprehension that I live for. -- Bill Watterson
My opinions are too rational and insightful to be those of any organization.

Mike Smoot

unread,
Jan 5, 2009, 7:49:04 PM1/5/09
to obvious...@googlegroups.com
Hi Joshua,

Regarding point (1), I'd argue for keeping Collections.
  • We can prevent people from mucking with our data models by returning unmodifiable collections or by returning copies of collections. 
  • If we provided an iterator() method, then to achieve thread safety we'd probably need to make an internal copy of whatever our internal collection is anyway so as not to lock the entire data structure.
  • We can still use arrays internally and provide Collection access with Arrays.toList().
  • Adding size(), contains() and other methods makes it sound like we're recreating the Collection interface.
  • If we're talking about graphs (and that's my reference point for the previous comments) then we'd need to provide methods which return Iterable objects (as opposed to iterators) for both nodes and edges.  This doesn't seem to be how Iterable is normally used.
I agree about point (3).  Can't help with the other questions.

cheers,
Mike
--
____________________________________________________________
Michael Smoot, Ph.D.               Bioengineering Department
tel: 858-822-4756         University of California San Diego

Joshua O'Madadhain

unread,
Jan 6, 2009, 1:57:39 PM1/6/09
to obvious...@googlegroups.com
On Mon, Jan 5, 2009 at 4:49 PM, Mike Smoot <msm...@ucsd.edu> wrote:
> Hi Joshua,

Hey, thanks for the response.

> Regarding point (1), I'd argue for keeping Collections.
>
> We can prevent people from mucking with our data models by returning
> unmodifiable collections or by returning copies of collections.

True. Since an array isn't a collection, though, we can't return an
array if we return a Collection.

> If we provided an iterator() method, then to achieve thread safety we'd
> probably need to make an internal copy of whatever our internal collection
> is anyway so as not to lock the entire data structure.

Possibly. But you typically need to make a new data structure
(Collection, array, whatever) for returning things like
graph.getNeighbors(vertex) anyway; you don't tend to store all
structures being implicitly maintained as elements of the topology.

> We can still use arrays internally and provide Collection access with
> Arrays.toList().

Of course you can always convert from an array to a Collection or vice
versa; the point is to avoid this sort of overhead where possible.

> Adding size(), contains() and other methods makes it sound like we're
> recreating the Collection interface.

Not at all. Collection includes a number of methods that I wouldn't
expect to need: mutation methods (add, addAll, clear, remove,
removeAll, retainAll) plus a couple of convenience methods
(containsAll, toArray, isEmpty) that we also don't need (given a way
of getting the size). The two remaining methods are equals and
hashCode, both of which are in Object anyway.

Let me be more explicit about what I'm proposing. For example, in place of

Collection<V> getNeighbors(V v)

we would have

Iterable<V> getNeighbors(V v)
int getNeighborCount(V v)
boolean isNeighbor(V v1, V v2)

and in place of

Collection<V> getNodes()

we'd have

Iterable<V> getNodes()
int getNodeCount()
boolean containsNode()

and so forth.

In my experience, when processing graphs (for algorithms, rendering,
whatever), 90% of the time you want to just iterate over the
neighbors, or all vertices, or whatever. The reason to have the count
and contains-like methods is for efficiency: you don't want to be
required to iterate over the whole thing to get the size or to do a
contains check. Everything else is gravy (or, rather, syntactic
sugar).

> If we're talking about graphs (and that's my reference point for the
> previous comments) then we'd need to provide methods which return Iterable
> objects (as opposed to iterators) for both nodes and edges.

Not sure we're on the same page. See above for examples.

> This doesn't
> seem to be how Iterable is normally used.

How do you think Iterable is normally used? This seems like a
perfectly appropriate use to me.

Joshua

Jean-Daniel Fekete

unread,
Jan 7, 2009, 6:03:22 PM1/7/09
to obvious...@googlegroups.com
Hi Joshua and Mike,

Thanks for keeping the conversation lively. I've been busy writing
reports during the winter break but now, I'm back. So, I start by
wishing a happy new year to everyone in the list.

I would like to raise another issue that is the "portability" of the
interface for other languages.
The Java classes are well designed but quite specific.
Therefore, I would avoid relying on Java idioms as much as possible when
they don't translate well into other languages.
This is why I would prefer:


Iterable<V> getNeighbors(V v)
int getNeighborCount(V v)
boolean isNeighbor(V v1, V v2)

to

Collection<V> getNeighbors(V v)

Furthermore, Iterators are nice, but when you have to create them when
iterating a large graph, the cost is important, due to the allocation
and garbage-collection.
On the same line, AWT Components that initially had only the
Rectangle getBounds()
method, now have
Rectangle getBounds(Rectangle r)
to avoid allocating rectangles when possible. I implemented iterators
the same way in my toolkit:
Iterator<V> getNeighbors(V v, Iterator<V> i)
so that the same iterator object could be reused. It also saves the
double indirection of Iterable<V>.
I know it is ugly but quite effective.

Iterating over a graph becomes:
Iterator<V> vi = null;
for (Iterator<V> v = graph.getNodes(null); v.hasNext(); ) {
for (vi = graph.getNeighbors(v, vi); vi.hasNext(); ) { // note that vi
is reused
V neighbor = vi.next();
...
}
}

Another idiom that Colt likes is mapper. However, it does not translates
well in C/C++ that don't have inner classes or lamda functions.

Therefore, I would vote for NOT using Collections.

Joshua's question 2:


> (2) An easy one (I hope): what purpose does the interface Data serve?
> It has no methods and seems to specify no behavior; if it has no
> present use I'd suggest removing it (and adding it back in later if
> appropriate).

I am not sure of the purpose of the Data interface in Java that has a
toplevel Object type. For other languages, it is handy.

> (3) Rather than reinvent Predicate, why not use the
> commons-collections interface by that name, which comes with many
> useful utilities?

Again, this is a JavaCentric proposition but that would be fine if we
had a Tuple object that would carry the Table and the row.
BTW, I am definitively in favor of adding a Tuple object if, like the
Iterator, it can be reused. It then becomes a bridge between the table
model and the object model at an acceptable cost. Otherwise, it is too
expensive.

> (4) What is the Scale interface for? Some class-level documentation
> would be handy; the method-level docs are obscure without the larger
> context.

Agreed. I just worked on these items that I call Axis. It could be
called AxisScale.
I added some javadoc to give an example. Maybe Jeff wants to elaborate.

Joshua O'Madadhain a écrit :

> ------------------------------------------------------------------------
>
>
> No virus found in this incoming message.
> Checked by AVG - http://www.avg.com
> Version: 8.0.176 / Virus Database: 270.10.3/1878 - Release Date: 06/01/2009 07:56
>
>

Joshua O'Madadhain

unread,
Jan 8, 2009, 6:45:27 PM1/8/09
to obvious...@googlegroups.com
On Wed, Jan 7, 2009 at 3:03 PM, Jean-Daniel Fekete
<Jean-Dani...@laposte.net> wrote:
>
> Hi Joshua and Mike,
>
> Thanks for keeping the conversation lively. I've been busy writing
> reports during the winter break but now, I'm back. So, I start by
> wishing a happy new year to everyone in the list.
>
> I would like to raise another issue that is the "portability" of the
> interface for other languages.

In my opinion, if we worry too much about this we're not going to make
much progress. Sure, it would be nice if these libraries were easy to
use for any language. But in practice, I think that it's more
important that this standard get used widely (or even at all!) in Java
software than that it be universal, which is an impractical (if not
impossible) goal anyway.

> The Java classes are well designed but quite specific.
> Therefore, I would avoid relying on Java idioms as much as possible when
> they don't translate well into other languages.
> This is why I would prefer:
> Iterable<V> getNeighbors(V v)
> int getNeighborCount(V v)
> boolean isNeighbor(V v1, V v2)
>
> to
>
> Collection<V> getNeighbors(V v)
>
> Furthermore, Iterators are nice, but when you have to create them when
> iterating a large graph, the cost is important, due to the allocation
> and garbage-collection.

I'm not sure what you're getting at.

You seem to like the Iterable-plus-size-and-contains methods solution
that I was proposing as an alternative.

But I'm not sure what this has to do with the creation-of-iterators issue.

> On the same line, AWT Components that initially had only the
> Rectangle getBounds()
> method, now have
> Rectangle getBounds(Rectangle r)
> to avoid allocating rectangles when possible. I implemented iterators
> the same way in my toolkit:
> Iterator<V> getNeighbors(V v, Iterator<V> i)
> so that the same iterator object could be reused.

I'd like to see the implementation of this method. My understanding
is that Iterator objects are not meant to be re-used, and I don't know
of any way to do so. This is, in fact, at least part of why I have
been advised to not return Iterators by people whose Java programming
expertise I respect (the rest of the argument is made nicely here:
http://jroller.com/tackline/entry/keep_iterators_to_yourself).

> It also saves the
> double indirection of Iterable<V>.

From a syntactic standpoint, this is a non-issue as of Java 1.5. Rather than

for (Iterator<V> it = graph.getNodes(); it.hasNext(); ) {
V v = it.next();
...

this is much cleaner:

for (V v : graph.getNodes()) {
...

If you're talking about the extra method call overhead...it seems to
me that Java itself comes with much greater overhead than that, and
I'm not sure why we're using Java at all if we're really that
concerned about such things.

> I know it is ugly but quite effective.
>
> Iterating over a graph becomes:
> Iterator<V> vi = null;
> for (Iterator<V> v = graph.getNodes(null); v.hasNext(); ) {
> for (vi = graph.getNeighbors(v, vi); vi.hasNext(); ) { // note that vi
> is reused
> V neighbor = vi.next();
> ...
> }
> }

I don't understand this snippet. v is declared as an Iterator but
seems then to be used as a vertex inside getNeighbors.

> Another idiom that Colt likes is mapper. However, it does not translates
> well in C/C++ that don't have inner classes or lamda functions.

Not sure what you're referring to here. Can you clarify? (Java
doesn't have lambda functions either...)

> Therefore, I would vote for NOT using Collections.
>
> Joshua's question 2:
>> (2) An easy one (I hope): what purpose does the interface Data serve?
>> It has no methods and seems to specify no behavior; if it has no
>> present use I'd suggest removing it (and adding it back in later if
>> appropriate).
>
> I am not sure of the purpose of the Data interface in Java that has a
> toplevel Object type. For other languages, it is handy.

Again, I'd prefer to keep things as simple as possible to drive
adoption. One of the ways to do this is to not include interfaces for
which we don't yet have an actual use, as it reduces the number of
questions that people will have when they're deciding whether it's
worth the trouble to use our stuff.

Just out of curiosity, in what way is this interface handy elsewhere?
The fact that everything inherits from Data doesn't give us anything
as far as I can tell, because Data defines no capabilities.

>> (3) Rather than reinvent Predicate, why not use the
>> commons-collections interface by that name, which comes with many
>> useful utilities?
>
> Again, this is a JavaCentric proposition but that would be fine if we
> had a Tuple object that would carry the Table and the row.
> BTW, I am definitively in favor of adding a Tuple object if, like the
> Iterator, it can be reused. It then becomes a bridge between the table
> model and the object model at an acceptable cost. Otherwise, it is too
> expensive.

All we need is a wrapper. If by 'Tuple' you mean just a simple
2-element wrapper for Table/row, that's fine by me. An example of
what I mean is the Context class in JUNG (in graph.util).

>> (4) What is the Scale interface for? Some class-level documentation
>> would be handy; the method-level docs are obscure without the larger
>> context.
>
> Agreed. I just worked on these items that I call Axis. It could be
> called AxisScale.
> I added some javadoc to give an example. Maybe Jeff wants to elaborate.

Thanks for the documentation.

Joshua

Jean-Daniel Fekete

unread,
Jan 9, 2009, 3:18:16 AM1/9/09
to obvious...@googlegroups.com
Joshua O'Madadhain a écrit :
> In my opinion, if we worry too much about this we're not going to make
> much progress. Sure, it would be nice if these libraries were easy to
> use for any language. But in practice, I think that it's more
> important that this standard get used widely (or even at all!) in Java
> software than that it be universal, which is an impractical (if not
> impossible) goal anyway.
>
>
Just to keep that in mind.

>> The Java classes are well designed but quite specific.
>> Therefore, I would avoid relying on Java idioms as much as possible when
>> they don't translate well into other languages.
>> This is why I would prefer:
>> Iterable<V> getNeighbors(V v)
>> int getNeighborCount(V v)
>> boolean isNeighbor(V v1, V v2)
>>
>> to
>>
>> Collection<V> getNeighbors(V v)
>>
>> Furthermore, Iterators are nice, but when you have to create them when
>> iterating a large graph, the cost is important, due to the allocation
>> and garbage-collection.
>>
>
> I'm not sure what you're getting at.
>
> You seem to like the Iterable-plus-size-and-contains methods solution
> that I was proposing as an alternative.
>
> But I'm not sure what this has to do with the creation-of-iterators issue.
>
I am in favor of the Iterable-plus-size-and-contains methods but I hate
the allocation/GC overhead that is large when iterating large graphs.
The allocation/GC overhead can be larger than the computing code, for
example when collecting statistics or searching for connected components.

>> On the same line, AWT Components that initially had only the
>> Rectangle getBounds()
>> method, now have
>> Rectangle getBounds(Rectangle r)
>> to avoid allocating rectangles when possible. I implemented iterators
>> the same way in my toolkit:
>> Iterator<V> getNeighbors(V v, Iterator<V> i)
>> so that the same iterator object could be reused.
>>
>
> I'd like to see the implementation of this method. My understanding
> is that Iterator objects are not meant to be re-used, and I don't know
> of any way to do so. This is, in fact, at least part of why I have
> been advised to not return Iterators by people whose Java programming
> expertise I respect (the rest of the argument is made nicely here:
> http://jroller.com/tackline/entry/keep_iterators_to_yourself).
>
>

Reusing iterators is easy:

Iterator<E> getLinks(V v, Iterator<V> old) {
MyEdgeIterator ei = null;
if (old instanceof MyEdgeIterator) {
ei = (MyEdgeIterator) old;
ei.reset(whatever);
}
else {
ei = new MyEdgeIterator(whatever);
}
return ei;
}

I'd be happy to strictly conform to Java good practices but the price to
pay is huge. The code above works also
when a null iterator is supplied so it fits beginners/lazy programmers
practice but it also allows for optimizations.
I understand Java syntax is not optimally designed for that as you point
out.


>> It also saves the
>> double indirection of Iterable<V>.
>>
>
> >From a syntactic standpoint, this is a non-issue as of Java 1.5. Rather than
>
> for (Iterator<V> it = graph.getNodes(); it.hasNext(); ) {
> V v = it.next();
> ...
>
> this is much cleaner:
>
> for (V v : graph.getNodes()) {
> ...
>
> If you're talking about the extra method call overhead...it seems to
> me that Java itself comes with much greater overhead than that, and
> I'm not sure why we're using Java at all if we're really that
> concerned about such things.
>
>

Not method call overhead, allocation/GC overhead. Creating a new object
for iterating each entity in the graph is expensive.


>> I know it is ugly but quite effective.
>>
>> Iterating over a graph becomes:

Here is the corrected version. It is ok to allocate one or two iterators
for a graph traversal but not as many as there are nodes.

Iterator<V> vi = null;

for (Iterator<V> iter = graph.getNodes(null); iter.hasNext(); ) {
V v = iter.next();
for (vi = graph.getNeighbors(v, vi); vi.hasNext(); ) { // vi is reused


V neighbor = vi.next();
...
}
}

>> Another idiom that Colt likes is mapper. However, it does not translates


>> well in C/C++ that don't have inner classes or lamda functions.
>>
>
> Not sure what you're referring to here. Can you clarify? (Java
> doesn't have lambda functions either...)
>

It has inner classes that I see as a sort of palliative for
lambda/closure in functional languages.

Yep, exactly, plus all the handy methods.

> Version: 8.0.176 / Virus Database: 270.10.5/1881 - Release Date: 07/01/2009 17:59
>
>

Joshua O'Madadhain

unread,
Jan 9, 2009, 11:16:32 AM1/9/09
to obvious...@googlegroups.com
On Fri, Jan 9, 2009 at 12:18 AM, Jean-Daniel Fekete
<Jean-Dani...@laposte.net> wrote:
>
> I am in favor of the Iterable-plus-size-and-contains methods but I hate
> the allocation/GC overhead that is large when iterating large graphs.
> The allocation/GC overhead can be larger than the computing code, for
> example when collecting statistics or searching for connected components.

I suppose it could. Has this been a problem in practice for you in
recent years?

> Reusing iterators is easy:
>
> Iterator<E> getLinks(V v, Iterator<V> old) {
> MyEdgeIterator ei = null;
> if (old instanceof MyEdgeIterator) {
> ei = (MyEdgeIterator) old;
> ei.reset(whatever);
> }
> else {
> ei = new MyEdgeIterator(whatever);
> }
> return ei;
> }
>
> I'd be happy to strictly conform to Java good practices but the price to
> pay is huge. The code above works also
> when a null iterator is supplied so it fits beginners/lazy programmers
> practice but it also allows for optimizations.
> I understand Java syntax is not optimally designed for that as you point
> out.

Ah. So in order to make this optimization possible you have to
provide custom Iterators that can be reset. This is not something
that's really natural to Java programmers in my experience.

The price of conforming to Java good practices may be less efficient
code in some regards (although it's not clear to me that that's
actually a big issue in practice these days). The price of _not_
doing so is that our APIs are more confusing and lead to more
confusing code. Generally speaking, unless we're talking about
large-scale computing, if there's a tradeoff between programmer time
and computer time, I know which one I want to optimize for. :)

>>> I know it is ugly but quite effective.
>>>
>>> Iterating over a graph becomes:
> Here is the corrected version. It is ok to allocate one or two iterators
> for a graph traversal but not as many as there are nodes.
>
> Iterator<V> vi = null;
> for (Iterator<V> iter = graph.getNodes(null); iter.hasNext(); ) {
> V v = iter.next();
> for (vi = graph.getNeighbors(v, vi); vi.hasNext(); ) { // vi is reused
> V neighbor = vi.next();
> ...
> }
> }

Seems to me that if you're really worried about this, you would return
arrays or Lists so that you wouldn't have to use iterators at all. (I
am not recommending this! :) )

Jean-Daniel Fekete

unread,
Jan 10, 2009, 4:40:24 AM1/10/09
to obvious...@googlegroups.com

Joshua O'Madadhain a écrit :

> On Fri, Jan 9, 2009 at 12:18 AM, Jean-Daniel Fekete
> <Jean-Dani...@laposte.net> wrote:
>
>> I am in favor of the Iterable-plus-size-and-contains methods but I hate
>> the allocation/GC overhead that is large when iterating large graphs.
>> The allocation/GC overhead can be larger than the computing code, for
>> example when collecting statistics or searching for connected components.
>>
>
> I suppose it could. Has this been a problem in practice for you in
> recent years?
>
>

Yes. I visualize the structure of Wikipedia (page to page links). The
French Wikipedia is 600,000 pages and 6,000,000 links.
See http://www.aviz.fr/~elm/projects/zame.html

I propose two levels of interface, one that adheres strictly to standard
java good practices and a second level to allow optimizations.
Just like the Component.getBounds() methods. One allocates whereas the
other can avoid allocating.

>>>> I know it is ugly but quite effective.
>>>>
>>>> Iterating over a graph becomes:
>>>>
>> Here is the corrected version. It is ok to allocate one or two iterators
>> for a graph traversal but not as many as there are nodes.
>>
>> Iterator<V> vi = null;
>> for (Iterator<V> iter = graph.getNodes(null); iter.hasNext(); ) {
>> V v = iter.next();
>> for (vi = graph.getNeighbors(v, vi); vi.hasNext(); ) { // vi is reused
>> V neighbor = vi.next();
>> ...
>> }
>> }
>>
>
> Seems to me that if you're really worried about this, you would return

> arrays or Lists so that you wouldn't have to use iterators at all. (Iby

> am not recommending this! :) )
>

There is a tension when designing interfaces. You want to abstract the
underlying structure but you want to keep performance and resources
reasonable.
In our case, since we design "models" of the data structures, there is
always the possibility of using the underlying implementation when you
know it and use whatever concrete data structure it maintains directly
if it provides more efficient mechanisms.
This iterator thing will provide a 10x increase of speed compared to the
standard one. If you want a 100x, you have to access the raw data
structure, that's my point.

Jean-Daniel.
> Joshua

Mike Smoot

unread,
Jan 14, 2009, 11:42:53 AM1/14/09
to obvious...@googlegroups.com
Hi Guys,

Sorry for the slow response.  I'll just glom my comments together here rather than parse through the past points.
  • I still prefer Collection to Iterable, and actually my real preference would be to return List objects, but I don't see that suggestion going anywhere.  This all has to do with utility to the API consumer, not ease of coding for the library author. As far as I'm concerned, we're *supposed* to the dirty work and let users have an easy-to-use API.  I think the more functionality we can provide, the happier users will be and the more popular the library will be.  It's of course a fine balance deciding what is too much, but I think the (very) minor inconvenience of calling Arrays.toList() or other wrapper methods is small cost to pay in the case that you're using arrays internally.  That said, I don't think this is a big deal and I can implement either interface without issue.
  • Jean-Daniel, is the reason that you want to reuse iterators because your iterators contain a copy of all objects being iterated over?  In my experience iterators are very light weight with just a few references back to some data set, so I'm not clear on why there is a performance concern here.
  • I actually support getNodeCount() and contains() methods as optimizations.  I agree that you don't want to construct an entire collection just to count the nodes. 
  • I'm with Josh on being extremely wary of attempting a cross-language library.  If we want to support multiple languages then I think we should strive for conceptual consistency rather than method-by-method consistency.  Different languages have such different capabilities that I think we'd miss much of the benefit of modern languages if we relied on a common subset of features.

thanks,
Mike

Jean-Daniel Fekete

unread,
Feb 1, 2009, 3:34:03 PM2/1/09
to obvious...@googlegroups.com
Hi,

I am very late at catching up with the discussion... INRIA has an
evaluation and I had another one last week so it was a mess to prepare
all these assessments. Anyway, the work is done so I am back at serious
issues.

I agree with your points Mike, List is fine with me.
Iterators seem lightweight until you allocate millions of them. Let's
not bother about that yet and focus on a clean interface first that
won't hurt future optimizations.
Cross-language issues are at the same level I think. We can stick to
Java idioms if they don't imply too many load (e.g. the reflection API).

The only issue I have with these Collection/List interfaces is that they
carry a fairly large number of methods (contains, addAll, containsAll,
etc.) that are boring to do. Anyway, they are convenient for the user so
that's fine.
Could we try to implement the more specific/convenient one first (List)
and then see if there are serious issues with it?

Jean-Daniel.

Mike Smoot a écrit :
> Hi Guys,
>
> Sorry for the slow response. I'll just glom my comments together here
> rather than parse through the past points.
>
> * I still prefer Collection to Iterable, and actually my real
> preference would be to return List objects, but I don't see that
> suggestion going anywhere. This all has to do with utility to
> the API consumer, not ease of coding for the library author. As
> far as I'm concerned, we're *supposed* to the dirty work and let
> users have an easy-to-use API. I think the more functionality
> we can provide, the happier users will be and the more popular
> the library will be. It's of course a fine balance deciding
> what is too much, but I think the (very) minor inconvenience of
> calling Arrays.toList() or other wrapper methods is small cost
> to pay in the case that you're using arrays internally. That
> said, I don't think this is a big deal and I can implement
> either interface without issue.
> * Jean-Daniel, is the reason that you want to reuse iterators
> because your iterators contain a copy of all objects being
> iterated over? In my experience iterators are very light weight
> with just a few references back to some data set, so I'm not
> clear on why there is a performance concern here.
> * I actually support getNodeCount() and contains() methods as
> optimizations. I agree that you don't want to construct an
> entire collection just to count the nodes.
> * I'm with Josh on being extremely wary of attempting a
> cross-language library. If we want to support multiple
> languages then I think we should strive for conceptual
> consistency rather than method-by-method consistency. Different
> languages have such different capabilities that I think we'd
> miss much of the benefit of modern languages if we relied on a
> common subset of features.
>
>
> thanks,
> Mike
>
>
> On Sat, Jan 10, 2009 at 1:40 AM, Jean-Daniel Fekete
> <Jean-Dani...@laposte.net
> <mailto:Jean-Dani...@laposte.net>> wrote:
>
>
>
>
> Joshua O'Madadhain a écrit :
> > On Fri, Jan 9, 2009 at 12:18 AM, Jean-Daniel Fekete
> > <Jean-Dani...@laposte.net
> <mailto:Jean-Dani...@laposte.net>> wrote:
> >
> >> I am in favor of the Iterable-plus-size-and-contains methods
> but I hate
> >> the allocation/GC overhead that is large when iterating large
> graphs.
> >> The allocation/GC overhead can be larger than the computing
> code, for
> >> example when collecting statistics or searching for connected
> components.
> >>
> >
> > I suppose it could. Has this been a problem in practice for you in
> > recent years?
> >
> >
> Yes. I visualize the structure of Wikipedia (page to page links). The
> French Wikipedia is 600,000 pages and 6,000,000 links.
> See http://www.aviz.fr/~elm/projects/zame.html
> <http://www.aviz.fr/%7Eelm/projects/zame.html>
> Version: 8.0.176 / Virus Database: 270.10.6/1888 - Release Date: 12/01/2009 07:04
>
>

Joshua O'Madadhain

unread,
Feb 4, 2009, 12:06:16 PM2/4/09
to obvious...@googlegroups.com
On Sun, Feb 1, 2009 at 12:34 PM, Jean-Daniel Fekete
<Jean-Dani...@laposte.net> wrote:
>
> Hi,
>
> I am very late at catching up with the discussion... INRIA has an
> evaluation and I had another one last week so it was a mess to prepare
> all these assessments. Anyway, the work is done so I am back at serious
> issues.
>
> I agree with your points Mike, List is fine with me.

List has the problem that the available implementations don't provide
efficient contains() checks.

It's important to keep a couple of things in mind here:

(1) We're talking about specifying interfaces, not implementations.
We can always change the implementations, or write new ones.

(2) No one will take our cookies away if we change the interfaces a
few times before we release anything.
Let me emphasize this: _I should not have brought this issue up in the
first place before actually checking some implementations into the
repository._

> Iterators seem lightweight until you allocate millions of them. Let's
> not bother about that yet and focus on a clean interface first that
> won't hurt future optimizations.
> Cross-language issues are at the same level I think. We can stick to
> Java idioms if they don't imply too many load (e.g. the reflection API).

I cannot imagine why we would need to reference the reflection API in
our interfaces. Let's definitely not do that. :)

>
> The only issue I have with these Collection/List interfaces is that they
> carry a fairly large number of methods (contains, addAll, containsAll,
> etc.) that are boring to do. Anyway, they are convenient for the user so
> that's fine.

This is only an issue if the person implementing one of these Graph
classes doesn't want to use internally the Collections implementations
provided as part of the Java API.

> Could we try to implement the more specific/convenient one first (List)
> and then see if there are serious issues with it?

I applaud the notion of providing an implementation first and then
having more discussions on design.

I will say one more thing on this subject, and then shut up until I
get something checked in.

If we really, really can't agree on what these methods should return,
then we can do this:

public interface Graph<V, E, VC extends Collection<V>, EC extends Collection<E>>
{
VC getNeighbors(V vertex);
EC getIncidentEdges(V vertex);
// etc.
}

That is, we can genericize the interface so that the precise type of
Collection [or Iterable] that is returned is specified at compile
time. This complicates the declaration of a Graph somewhat, but
basically pushes the "return a List? or a Set? or a Collection?"
question off to the implementation, thus:

public class ListBasedGraph<V,E> implements Graph<V,E, List<V>, List<E>>
{
List<V> getNeighbors(V vertex) { ... }
List<E> getIncidentEdges(V vertex) { ... }
// etc.
}

Joshua
Reply all
Reply to author
Forward
0 new messages