* All the unit tests dealing with null/empty collections currently
fail because I haven't implemented the OOB property to track null
state yet.
* We need a unit test for a null object in an @Embedded array; right
now it will come back with a constructed (but otherwise empty) object
of the correct type.
* Haven't implemented @TransientWith yet.
* The test for duplicate @OldName population is failing. It can be
solved at the cost of some code complexity, but is it a big deal?
* @OldName should work as advertised for fields anywhere in the
object graph, including embedded classes and collections.
* You can't currently mix @OldName on *methods* with @Embedded.
* Registration is complicated but at runtime the code is
super-efficient and barely creates any garbage while loading and
saving entities.
Tomorrow I'll revisit the null/empty debate and probably implement
something. I'd like to get 2.0b1 out tomorrow (Thursday).
Big thanks to Matt for doing the first pass of implementation and
writing a ton of great unit tests!!
Jeff
> * The test for duplicate @OldName population is failing. It can be
> solved at the cost of some code complexity, but is it a big deal?
I think I could live with that, so long as the new-name value clearly
trumps the old-name value.
> Tomorrow I'll revisit the null/empty debate and probably implement
> something. I'd like to get 2.0b1 out tomorrow (Thursday).
My main input is: so long as it is clear and obvious to users what the
OOB property names will be (so they can be used in queries and
compound indexes) I'm happy. I have a gut preference that the default
behavior should be zero OOB properties, but you made a good case for
the opposite in the previous thread.
While we are talking about 2.0, there are a couple of things I'd like
to propose for 2.0:
1) Instead of calling Class.newInstance(), we should use reflection to
get a hold of the zero-arg constructor and use that to create new
instances. That way, we can call private zero-arg constructors.
2) Should we replace the phrase "OKey" with "Key" in the ObjectFactory
rawKeyToOKey() and oKeyToRawKey() methods?
3) We should test and document the fact that Blobs and Texts go to the
end of list properties, no matter what their initial order (this only
impacts heterogeneous Lists)
4) Key used to have a getKind(), but that was changed to
getKindClassName() for GWT serialization purposes. I'd like to try
and change that back, if I can work out how to still get it to work
with serialization. Using Kind instead of KindClassName seems more
orthogonal to the datastore Key's class.
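For proposal 1, a minimal sketch of what the reflective construction could look like. The class and method names (Instantiator, construct, Hidden) are made up for illustration and are not existing Objectify code; only the java.lang.reflect calls are real.

```java
import java.lang.reflect.Constructor;

final class Instantiator {
    /** Creates an instance through the zero-arg constructor, even if it is private. */
    static <T> T construct(Class<T> type) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true); // permits private and package-private constructors
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "No usable zero-arg constructor on " + type.getName(), e);
        }
    }

    /** Hypothetical entity whose only constructor is private. */
    static final class Hidden {
        final String marker;
        private Hidden() { this.marker = "built"; }
    }

    public static void main(String[] args) {
        Hidden h = construct(Hidden.class);
        System.out.println(h.marker);
    }
}
```

Keeping construction behind a single method like this would also give one obvious place to swap in a different creation strategy later.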
And here are some post-2.0 suggestions:
1) java.util.Map support. Serialize the keys and values as two lists, e.g.
class Foo {
Map m = new HashMap();
}
gets saved as:
m.keys = ["a", "b", "c"]
m.values = ["A", "B", "C"]
Actually, that's not what I was after with Map support. What I was
after was a capability similar to GAE/Py's Expando capability.
http://code.google.com/appengine/docs/python/datastore/entitiesandmodels.html#Expando_Models
For example:
class Foo {
@Id Long id;
String fixed;
@Expando Map<String, ?> extra = ...;
}
Foo f = new Foo();
f.fixed = "foo";
f.extra.put("a", 1);
f.extra.put("b", "B");
f.extra.put("c", new String[] {"x", "y", "z"});
Would produce:
fixed = "foo"
extra.a = 1
extra.b = "B"
extra.c = ["x", "y", "z"]
Specifically:
- The type of an @Expando must be Map, and the first type argument
must be String (or is assumed to be).
- The values of an @Expando map can be any core data type, or an
array/Collection of any core data type.
The use case for @Expando is when you have some dynamic set of fields
on an Entity. The above mechanism allows you to do that, and even
makes those fields searchable.
An @Expando map can be @UnIndexed, but there is no way to make
individual keys of an @Expando map unindexed.
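To make the @Expando idea concrete, here is a rough sketch of the flattening step, assuming a plain Map<String, Object> stands in for the datastore entity's properties. ExpandoSketch and flatten are invented names, and @Expando itself is only a proposal at this point.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class ExpandoSketch {
    /** Copies each map entry to a property named "<field>.<key>"; arrays become lists. */
    static Map<String, Object> flatten(String fieldName, Map<String, ?> extra) {
        Map<String, Object> props = new LinkedHashMap<>();
        for (Map.Entry<String, ?> e : extra.entrySet()) {
            Object value = e.getValue();
            if (value instanceof Object[]) {
                // An array is stored as one multi-valued property.
                value = Arrays.asList((Object[]) value);
            }
            props.put(fieldName + "." + e.getKey(), value);
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, Object> extra = new LinkedHashMap<>();
        extra.put("a", 1);
        extra.put("b", "B");
        extra.put("c", new String[] {"x", "y", "z"});
        System.out.println(flatten("extra", extra));
    }
}
```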
I'm not sure an attribute @Expando is needed -- it makes it opt-in,
and seems self-documenting.
=Matt "I come up with wacky ideas" Quail
But I do like the idea of being able to store arbitrary, and extra
data. I know you can just change your class to include these fields,
but there are other options too.
I was thinking about how to support merge semantics so a load/save is
not destructive for properties (not supported by the current version
of the object). Two thoughts came to mind which when combined allow
complete control over persistence while still making things easy by
default.
1.) Support @RawEntity for a field of Entity type. This will load the
entity into this field when loading, and will use the object in this
field as the base for saving (all the save code will just replace
properties on this object) before persisting.
2.) Create a lifecycle interface that each entity (or factory) can
implement to get the following events (postLoad, prePopulateEntity,
prePut). This would allow anyone to alter the load behavior (like with
OOB properties). In addition it would let you alter the state of the
object before it creates the entity, and then before the entity is put
in the datastore.
If you are using an object with a @RawEntity and that supports the
lifecycle interface then you can do anything with the raw datastore
entity both on loading and storing.
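A rough sketch of how the two ideas could combine, with heavy assumptions: a Map<String, Object> stands in for the datastore Entity, the Lifecycle interface and its signatures are guesses based only on the event names above, and load/save are hypothetical helpers rather than Objectify methods.

```java
import java.util.HashMap;
import java.util.Map;

final class RawEntityMerge {
    /** Hypothetical lifecycle hooks; the default bodies do nothing. */
    interface Lifecycle {
        default void postLoad(Map<String, Object> raw) {}
        default void prePopulateEntity() {}
        default void prePut(Map<String, Object> raw) {}
    }

    static final class Note implements Lifecycle {
        String title;
        Map<String, Object> raw = new HashMap<>(); // plays the role of a @RawEntity field

        @Override public void prePopulateEntity() {
            if (title != null) title = title.trim(); // tidy state before it is copied out
        }
    }

    /** Keeps the whole raw entity around so unknown properties are not lost. */
    static Note load(Map<String, Object> stored) {
        Note note = new Note();
        note.raw = new HashMap<>(stored);
        note.title = (String) stored.get("title");
        note.postLoad(note.raw);
        return note;
    }

    /** Uses the raw entity as the base and only replaces the properties the class knows. */
    static Map<String, Object> save(Note note) {
        note.prePopulateEntity();
        Map<String, Object> out = note.raw;
        out.put("title", note.title);
        note.prePut(out);
        return out;
    }
}
```

With this shape, a property written by an older or newer version of the class survives a load/save cycle because it is never removed from the raw map.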
On Thu, Feb 4, 2010 at 1:53 AM, Matt Quail <spud...@gmail.com> wrote:
[snip]
>> Tomorrow I'll revisit the null/empty debate and probably implement
>> something. I'd like to get 2.0b1 out tomorrow (Thursday).
>
> My main input is: so long as it is clear and obvious to users what the
> OOB property names will be (so they can be used in queries and
> compound indexes) I'm happy. I have a gut preference that the default
> behavior should be zero OOB properties, but you made a good case for
> the opposite in the previous thread.
I understand both sides, but I'd prefer less being written rather
than more. I'd vote for no OOB stuff unless the user opts in.
> While we are talking about 2.0, there are a couple of things I'd like
> to propose for 2.0:
>
> 1) Instead of calling Class.newInstance(), we should use reflection to
> get a hold of the zero-arg constructor and use that to create new
> instances. That way, we can call private zero-arg constructors.
+1, Plus, I'm using a DI framework and want to make sure there is an
easy place to replace how objects are created.
>
> 2) Should we replace the phrase "OKey" with "Key" in the ObjectFactory
> rawKeyToOKey() and oKeyToRawKey() methods?
>
> 3) We should test and document the fact that Blobs and Texts go to the
> end of list properties, no matter what their initial order (this only
> impacts heterogeneous Lists)
>
> 4) Key used to have a getKind(), but that was changed to
> getKindClassName() for GWT serialization purposes. I'd like to try
> and change that back, if I can work out how to still get it to work
> with serialization. Using Kind instead of KindClassName seems more
> orthogonal to the datastore Key's class.
+1 (if we can find an easy way to make this work.)
We could add some lombok-like support to create constants on the class
based on the "OOB" properties we create. Then we can do things like
this:
class MyTest {
@Id Long id;
@TrackEmptyNull <@Embedded ? required>
Collection<String> strings;
}
Query q = ObjectifyService.createQuery(MyTest.class);
q.filter(MyTest.StringsEmpty, ObjectifyService.EmptyCollection);
This would make it easier for people to know what the OOB names are
without just creating strings (which could be wrong). What do you
think?
It probably is about the same difficulty to do either of these. So
which do we prefer? Error or new-trumps-old? I suspect error is
safer in general; to me this indicates a programmer error. They can
work around it with an @OldName method (except on embedded
collections... sigh).
> My main input is: so long as it is clear and obvious to users what the
> OOB property names will be (so they can be used in queries and
> compound indexes) I'm happy. I have a gut preference that the default
> behavior should be zero OOB properties, but you made a good case for
> the opposite in the previous thread.
We have one required OOB property right now (not yet implemented),
which is the list that determines nulls in an @Embedded array or
collection. An example for discussion...
class Person {
String name;
int age;
}
class Town {
...
int population;
@Embedded Person[] nullFolk;
@Embedded Person[] emptyFolk;
@Embedded Person[] normalFolk;
}
town = new Town();
town.population = 200;
town.nullFolk = null;
town.emptyFolk = new Person[0];
town.normalFolk = new Person[] { bob, null, sally };
How about killing three birds with one stone?
town.population = 200
town.nullFolk^state = null
town.emptyFolk^state = 0
town.normalFolk^state = [true, false, true]
town.normalFolk.name = ["Bob", null, "Sally"]
town.normalFolk.age = [43, 0, 22]
Of course, if normalFolk contains no nulls, the normalFolk^state field
would not exist.
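Here is one way the ^state rules above could be written down, as a sketch only. A HashMap stands in for the entity's properties (it can hold a null value, which matters for the null-collection case), and StateProperty and writeState are invented names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StateProperty {
    /** Writes "<name>^state" per the scheme above; writes nothing if no element is null. */
    static void writeState(String name, Object[] array, Map<String, Object> props) {
        String key = name + "^state";
        if (array == null) {
            props.put(key, null);            // null collection
        } else if (array.length == 0) {
            props.put(key, 0);               // empty collection
        } else {
            List<Boolean> present = new ArrayList<>();
            boolean anyNull = false;
            for (Object element : array) {
                present.add(element != null); // true = object, false = null
                anyNull |= (element == null);
            }
            if (anyNull) {
                props.put(key, present);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        writeState("nullFolk", null, props);
        writeState("emptyFolk", new Object[0], props);
        writeState("normalFolk", new Object[] {"bob", null, "sally"}, props);
        System.out.println(props);
    }
}
```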
This particular format of ^state means we can implement:
Query.filterNullCollection(property) by translating to
filter("property^state", null)
Query.filterEmptyCollection(property) by translating to
filter("property^state", 0)
Query.filter(property, null) by translating to filter("property^state", false)
Since we already need this handling for embedded collections, we can
just add something identical for leaf collections of basic types -
minus the [true, false, true] tracking because we don't need it.
Ok, just thinking... maybe we should flip this around:
Query.filter(property, value) always filters on the exact content of
the property.
Query.filterContent(property, value) always filters on the contents of
a collection or array.
What do you think of this?
ofy.query(Town.class).filter("nullFolk", null); // gets the town
ofy.query(Town.class).filterContent("nullFolk", null); // doesn't
ofy.query(Town.class).filter("emptyFolk", new Person[0]); // gets the town
ofy.query(Town.class).filter("normalFolk", null); // DOESN'T get town
ofy.query(Town.class).filterContent("normalFolk", null); // gets town
ofy.query(Town.class).filter("population", 200); // gets the town
ofy.query(Town.class).filterContent("population", 200); // exception, bad query?
The upside of this is that it's much more intuitive for the user.
The downside of this is that it's one more abstraction removed from
the actual way the datastore works.
I'm on the fence, but I kinda like it. I think it would remove a lot
of confusion surrounding how filtering collections works. Want to
filter a collection? Make it explicit.
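To pin down the semantics being proposed, here is an in-memory model of the two operations. It works on already-loaded field values rather than on the datastore, and FilterSemantics is an invented class, so treat it as a statement of intent, not as how the query translation would be implemented.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Objects;

final class FilterSemantics {
    /** filter(): compares the whole property value; arrays are compared element by element. */
    static boolean filter(Object stored, Object wanted) {
        if (stored instanceof Object[] && wanted instanceof Object[]) {
            return Arrays.equals((Object[]) stored, (Object[]) wanted);
        }
        return Objects.equals(stored, wanted);
    }

    /** filterContent(): looks inside a collection or array; anything else is a bad query. */
    static boolean filterContent(Object stored, Object wanted) {
        if (stored == null) {
            return false; // assumption: a null collection has no contents to match
        }
        if (stored instanceof Object[]) {
            return Arrays.asList((Object[]) stored).contains(wanted);
        }
        if (stored instanceof Collection) {
            return ((Collection<?>) stored).contains(wanted);
        }
        throw new IllegalArgumentException(
                "filterContent() needs a collection or array property");
    }
}
```

A real implementation would decide "collection or not" from the declared field type at registration, not from the runtime value as this model does.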
> While we are talking about 2.0, there are a couple of things I'd like
> to propose for 2.0:
>
> 1) Instead of calling Class.newInstance(), we should use reflection to
> get a hold of the zero-arg constructor and use that to create new
> instances. That way, we can call private zero-arg constructors.
Good idea. Can we start filing issues for these things? I'm afraid
we'll forget some.
> 2) Should we replace the phrase "OKey" with "Key" in the ObjectFactory
> rawKeyToOKey() and oKeyToRawKey() methods?
I just changed to "rawKeyToTypedKey" and "typedKeyToRawKey".
> 3) We should test and document the fact that Blobs and Texts go to the
> end of list properties, no matter what their initial order (this only
> impacts heterogeneous Lists)
That's weird. Can you point me at some documentation that describes
this? I can see how it would affect the index, but it's really
strange this would affect object retrieval.
This will have really, really bad consequences for Strings that get
turned into Text fields because they exceed 500 chars. It will
corrupt data in @Embedded arrays and lists. Have you verified this
behavior? If so, we must disable the automatic 500-char String ->
Text.
I'll add a unit test to see what happens.
> 4) Key used to have a getKind(), but that was changed to
> getKindClassName() for GWT serialization purposes. I'd like to try
> and change that back, if I can work out how to still get it to work
> with serialization. Using Kind instead of KindClassName seems more
> orthogonal to the datastore Key's class.
Go for it, I'd love to see this too :-)
Jeff
This would be better as "filterCollection()" rather than "filterContent()".
Jeff
The unit test is in ConversionTests. It's bad. This behavior is horrible!
1) We must disable String<->Text autoconversion within @Embedded collections.
2) We must disallow Text and Blob fields inside @Embedded collections.
I'm tempted to disable String<->Text autoconversion entirely.
Jeff
Do you want to truncate or throw an exception?
I want to keep autoconversion to Text for the general case. There
will need to be a few documented restrictions on what you can do
inside embedded collections, and this is one of them: Strings can't
be more than 500 chars.
Jeff
Let's just throw an exception, and that's all :)
On Feb 5, 6:25 am, Jeff Schnitzer <j...@infohazard.org> wrote:
> We should never truncate or otherwise change a user's data. The
> thing is, I actually like autoconversion - the Text class is just an
> annoying wrapper and I wish it would go away.
>
> I want to keep autoconversion to Text for the general case. There
> will need to be a few documented restrictions on what you can do
> inside embedded collections, and this is one of them: Strings can't
> be more than 500 chars.
>
> Jeff
>
> On Thu, Feb 4, 2010 at 1:49 PM, Scott Hernandez
>
> <scotthernan...@gmail.com> wrote:
> > +1 on disabling automatic string->text
>
> > Do you want to truncate or throw an exception?
>
> > On Thu, Feb 4, 2010 at 1:45 PM, Jeff Schnitzer <j...@infohazard.org> wrote:
http://code.google.com/p/objectify-appengine/issues/detail?id=12
http://code.google.com/p/objectify-appengine/issues/detail?id=13
http://code.google.com/p/objectify-appengine/issues/detail?id=14
As an aside, congrats on the @Embedded work. It's a very cool and
powerful feature, and doesn't need to handle every edge case (like
mixed types in a list) to be extremely useful for 99.9% of the cases.
"Perfect is the enemy of good."
/dmc
http://turbomanage.wordpress.com
On Feb 4, 6:01 pm, ZeroCool <zero...@gmail.com> wrote:
> I'd vote on keeping things simple and straight.
> Autoconversion is some kind of complication, and may have
> unpredictable consequence.
> Such as:
> User may lose the index if he just inputs more than 500 chars.
>
> Let's just throw an exception, and that's all :)
The difference is whether we document this (as it is currently in trunk):
* Strings greater than 500 chars will not be indexed
* Embedded collections and arrays cannot contain strings greater than 500 chars
* Embedded collections and arrays cannot contain Text, Blob, or byte[]
Or this, which is how it would have to be if we do not autoconvert
String to Text:
* *All* String values must be less than 500 chars
* Embedded collections and arrays cannot contain Text, Blob, or byte[]
While the documentation may be simpler, it forces the user to do extra
work in the general case.
Jeff
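For reference, the first set of rules (the trunk behavior) boils down to something like the sketch below. StringRules, Storage and classify are invented names, and the enum only labels what would happen to the value; it does not touch the real Text class.

```java
final class StringRules {
    static final int MAX_INDEXED_LENGTH = 500;

    enum Storage { INDEXED_STRING, UNINDEXED_TEXT }

    /**
     * Decides how a String would be stored: short strings stay indexed, long ones
     * are autoconverted (and lose their index), and long ones inside an embedded
     * collection or array are rejected outright.
     */
    static Storage classify(String value, boolean insideEmbeddedCollection) {
        if (value.length() <= MAX_INDEXED_LENGTH) {
            return Storage.INDEXED_STRING;
        }
        if (insideEmbeddedCollection) {
            throw new IllegalArgumentException(
                    "Strings in embedded collections cannot exceed "
                            + MAX_INDEXED_LENGTH + " chars");
        }
        return Storage.UNINDEXED_TEXT;
    }
}
```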
That does seem like a good choice.
> * Embedded collections and arrays cannot contain strings greater than 500
> chars
> * Embedded collections and arrays cannot contain Text, Blob, or byte[]
Are these rules necessary? Isn't it only heterogeneous
arrays/collections containing Text/Blob that are the problem? The
documentation I read made it clear that the relative order of Text in
a list property isn't changed:
"Order is generally preserved, so when entities are returned by
queries and get(), list properties will have values in the same order
as when they were stored. There's one exception to this: Blob and Text
values will be moved to the end of the list. They'll still retain
their original order relative to each other, though."
http://code.google.com/appengine/docs/python/datastore/entitiesandmodels.html#Lists
So Text[] should be fine. It is only an Object[] that may contain a
Text that is the problem?
=Matt
class Thing {
int age;
String name;
}
class Parent {
@Id Long id;
@Embedded Thing[] things;
}
For three things, this produces an entity structure of:
things.age = [33, 44, 55]
things.name = ["name1", "name2", "name3"]
If name2 exceeded 500 chars and was autoconverted, it would get moved
to the end and break the index matching to things.age.
So we allow Text/Blob and disable autoconversion inside @Embedded
collections & arrays. Easy enough to fix.
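A small simulation of why the reordering breaks embedded arrays. FakeText is a stand-in for the datastore Text wrapper (not the SDK class), and storeAndLoad only mimics the documented "Text and Blob move to the end" behavior; nothing here talks to a datastore.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ReorderDemo {
    /** Stand-in for the datastore Text wrapper. */
    static final class FakeText {
        final String value;
        FakeText(String value) { this.value = value; }
        @Override public String toString() { return "Text(" + value + ")"; }
    }

    /** Mimics the documented list behavior: Text values move to the end, relative order kept. */
    static List<Object> storeAndLoad(List<Object> values) {
        List<Object> plain = new ArrayList<>();
        List<Object> texts = new ArrayList<>();
        for (Object v : values) {
            (v instanceof FakeText ? texts : plain).add(v);
        }
        plain.addAll(texts);
        return plain;
    }

    public static void main(String[] args) {
        List<Object> ages = storeAndLoad(Arrays.<Object>asList(33, 44, 55));
        List<Object> names = storeAndLoad(
                Arrays.<Object>asList("name1", new FakeText("name2"), "name3"));
        // After the round trip, position 1 pairs age 44 with "name3" instead of name2.
        for (int i = 0; i < ages.size(); i++) {
            System.out.println(ages.get(i) + " -> " + names.get(i));
        }
    }
}
```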
Not sure how we will document the heterogeneous collection aspect. I
have one thought: Let's disallow Object[] entirely. There will be
all sorts of odd behavior if someone declares a field of List<Object>
or Object[]; for example, numbers won't be restored in the same size
as they were saved since BigTable makes everything a long. We won't
know if we should convert from Key to Key<?>. Just a big mess. We
need types.
Jeff
Maybe we should. It sounds like a number of people consider this
autoconversion as possibly dangerous. Having some values of java
"String" indexed, and some not is "dangerous". I documented it in the
earlier versions in the wiki, but felt that it maybe was an overstep
of the default behaviour. If @Unindexed is used there is no problem,
but if the field is indexed, well....
Does anyone have an opinion on this? It would diverge from the Python
notion of filter(), but in a way that I think is more clear to the
user.
Filtering on an exact property value: filter()
Filtering on the contents of a collection: filterCollection()
Jeff
If you want to check special OOB properties (for null/empty
collections) you should need to do it explicitly.
* Empty or null arrays and collections will not be stored on put(),
and consequently the field will be ignored on get().
* A state tracking field is still required to determine nulls in
embedded collections. No way around this.
This does violate the principle of minimal munging ("as much as
possible, objects that are put() and then get() should come back in
the same condition"). However, I think it will be comfortable as long
as we emphasize this design pattern:
* You should always initialize collections and arrays in your
constructors (or field initializers).
In this case, the principle of minimal munging will actually work just
fine. I'm actually tempted to force this requirement at registration
by instantiating an object and checking the field.
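The registration-time check could be as simple as the sketch below. It is only an illustration: InitializedCollectionsCheck and uninitializedFields are made-up names, it looks at declared fields only (not superclasses), and how registration would report the result is left open.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class InitializedCollectionsCheck {
    /** Instantiates the type and returns the names of collection/array fields left null. */
    static List<String> uninitializedFields(Class<?> type) {
        try {
            Constructor<?> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            Object instance = ctor.newInstance();
            List<String> missing = new ArrayList<>();
            for (Field field : type.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue;
                }
                Class<?> fieldType = field.getType();
                if (fieldType.isArray() || Collection.class.isAssignableFrom(fieldType)) {
                    field.setAccessible(true);
                    if (field.get(instance) == null) {
                        missing.add(field.getName());
                    }
                }
            }
            return missing;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot inspect " + type.getName(), e);
        }
    }

    /** Hypothetical entity that follows the pattern. */
    static class Good {
        List<String> tags = new ArrayList<>();
        String[] names = new String[0];
    }

    /** Hypothetical entity that would be rejected at registration. */
    static class Bad {
        List<String> tags;
        String[] names = new String[0];
    }
}
```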
We can revisit this decision for 2.1. Adding null/empty state
tracking is forward-compatible.
This still leaves the question of filter() vs filterCollection().
filterCollection() only makes sense if we have state tracking. Given
the uncertainties, let's stay with boring old filter() that works just
like Python.
Btw, the state tracking property for embedded collections will look like this:
blah.blah^embed = [true, true, false, true]
True is an object, false is a null. If everything is true, the whole
property is missing.
Unless someone screams, I should have this working in an hour.
Jeff
Here is an example.
class Course {
@Id Long id;
String title;
@Embedded List<Student> students = new ArrayList<Student>();
}
class Student {
String name;
}
Course woodshop = new Course ("basic woodshop");
woodshop.students.add(new Student("Jeff"));
woodshop.students.add(new Student("Scott"));
woodshop.students.add(null);
woodshop.students.add(new Student("Frank"));
would result in this:
entity.title = "basic woodshop"
entity.students.name = ["Jeff", "Scott", "Frank"]
entity.students^embed = [true, true, false, true]
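The ^embed round trip for the example above could be sketched like this. EmbedMask and its three methods are invented names, and returning null from mask() to mean "property omitted" is just a convenience for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class EmbedMask {
    /** One flag per element (true = object, false = null), or null when nothing is null. */
    static List<Boolean> mask(List<?> elements) {
        List<Boolean> flags = new ArrayList<>();
        boolean anyNull = false;
        for (Object element : elements) {
            flags.add(element != null);
            anyNull |= (element == null);
        }
        return anyNull ? flags : null;
    }

    /** Drops the nulls, leaving the values that the per-field lists are built from. */
    static <T> List<T> dense(List<T> elements) {
        List<T> out = new ArrayList<>();
        for (T element : elements) {
            if (element != null) {
                out.add(element);
            }
        }
        return out;
    }

    /** Rebuilds the original list; a null mask means the property was missing (no nulls). */
    static <T> List<T> restore(List<T> dense, List<Boolean> mask) {
        if (mask == null) {
            return new ArrayList<>(dense);
        }
        List<T> out = new ArrayList<>();
        int next = 0;
        for (boolean present : mask) {
            out.add(present ? dense.get(next++) : null);
        }
        return out;
    }
}
```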
On Fri, Feb 5, 2010 at 5:33 PM, Jeff Schnitzer <je...@infohazard.org> wrote:
[snip]
woodshop.students.add(new Student("Jeff"));
woodshop.students.add(new Student("Scott"));
woodshop.students.add(null);
woodshop.students.add(null);
woodshop.students.add(new Student("Frank"));
students.name = ["Jeff", "Scott", "Frank"]
students^null = [2, 3]
I like this.
Jeff
Yeah I like that too. And if there are no nulls, then no
"students^null" property is emitted, correct?
Would the "^null" OOB property always be unindexed? Only if the
collection was? I can't see any reason why you would want to search on
it.
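The index-list variant is just as easy to round-trip. A sketch, with NullIndexes as an invented name and with the assumption that the stored indexes are in ascending order:

```java
import java.util.ArrayList;
import java.util.List;

final class NullIndexes {
    /** Positions of the null elements, e.g. [2, 3]; empty when there are none. */
    static List<Integer> nullIndexes(List<?> elements) {
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < elements.size(); i++) {
            if (elements.get(i) == null) {
                indexes.add(i);
            }
        }
        return indexes;
    }

    /** Re-inserts nulls at their original positions; relies on ascending index order. */
    static <T> List<T> restore(List<T> dense, List<Integer> nullIndexes) {
        List<T> out = new ArrayList<>(dense);
        for (int index : nullIndexes) {
            out.add(index, null);
        }
        return out;
    }
}
```

An empty index list would simply mean the "^null" property is not written at all.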
Yes, that's a great strategy, +1.
>
> * Empty or null arrays and collections will not be stored on put(),
> and consequently the field will be ignored on get().
>
> * A state tracking field is still required to determine nulls in
> embedded collections. No way around this.
>
> This does violate the principle of minimal munging ("as much as
> possible, objects that are put() and then get() should come back in
> the same condition"). However, I think it will be comfortable as long
> as we emphasize this design pattern:
>
> * You should always initialize collections and arrays in your
> constructors (or field initializers).
Yep, +1 to all of the above.
> This still leaves the question of filter() vs filterCollection().
> filterCollection() only makes sense if we have state tracking. Given
> the uncertainties, let's stay with boring old filter() that works just
> like Python.
I agree with Scott, we should stick with the Python idiom. The
datastore and GQL *do* work that way, and better to toe the
"datastore" line than to be different.
Jeff, this is awesome stuff!
=Matt