Hi Ben,
Thanks for the email. Comments inline.
On Mon, Jan 27, 2014 at 8:13 PM, Ben Roling <ben.r...@gmail.com> wrote:
> I was playing around with HBase datasets a little bit this morning and was
> curious how schema evolution is supposed to work. I successfully read and
> wrote some data using the kite-examples HBase example and then tried to
> write some additional data with one new attribute added to the Avro model.
>
> First, I tried to write the data without doing a DatasetRepository.update()
> to give the dataset the new schema. I didn't know if I would be allowed to
> write data with a newer schema that can still be read using the
> dataset's defined schema.
>
> I added favoriteFood in user.avsc:
>
> ...
> {
> "name": "age",
> "type": "int",
> "default": 0,
> "mapping": { "type": "column", "value": "meta:age" }
> },
> {
> "name": "favoriteFood",
> "type": ["string", "null"],
> "default": "null",
> "mapping": { "type": "column", "value": "meta:favoriteFood" }
> }
> ...
It's not causing the problems you are seeing, but you may want to say
"default": null,
so that the default is a null reference, not a string with value "null".
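
A sketch of the field with that change applied. One addition here is an assumption on top of the suggestion above: the union is reordered so that "null" comes first, because the Avro specification has traditionally required a union field's default to match the first branch of the union. The field name and column mapping are copied unchanged from your schema.

```json
{
  "name": "favoriteFood",
  "type": ["null", "string"],
  "default": null,
  "mapping": { "type": "column", "value": "meta:favoriteFood" }
}
```

With the original ordering, ["string", "null"], a strict parser may reject a null default, so the two changes are best made together.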
>
> When I attempted to put an instance of this new User model, it failed with a
> NullPointerException:
>
> Exception in thread "main" java.lang.NullPointerException
> at org.kitesdk.data.hbase.avro.VersionedAvroEntityMapper.mapFromEntity(VersionedAvroEntityMapper.java:239)
> at org.kitesdk.data.hbase.avro.VersionedAvroEntityMapper.mapFromEntity(VersionedAvroEntityMapper.java:58)
> at org.kitesdk.data.hbase.impl.HBaseClientTemplate.put(HBaseClientTemplate.java:447)
> at org.kitesdk.data.hbase.impl.HBaseClientTemplate.put(HBaseClientTemplate.java:421)
> at org.kitesdk.data.hbase.impl.BaseDao.put(BaseDao.java:75)
> at org.kitesdk.data.hbase.DaoDataset.put(DaoDataset.java:140)
> at org.kitesdk.examples.data.WriteUserDataset.run(WriteUserDataset.java:58)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.kitesdk.examples.data.WriteUserDataset.main(WriteUserDataset.java:70)
>
> I'm not sure whether you would expect writing data that doesn't exactly
> match the schema of the dataset to be allowed, but even if not it seems an
> NPE is a bug?
This does look like a bug. I've opened
https://issues.cloudera.org/browse/CDK-292 to track it.
>
> After that I tried to update the schema of the dataset with
> DatasetRepository.update(). That failed too. It failed with this
> exception:
>
> Exception in thread "main" org.kitesdk.data.IncompatibleSchemaException:
> Column mappings of schema not compatible with other schema for the table.
> ... (message trimmed for brevity)
> at org.kitesdk.data.hbase.manager.DefaultSchemaManager.validateCompatibleWithTableSchemas(DefaultSchemaManager.java:532)
> at org.kitesdk.data.hbase.manager.DefaultSchemaManager.migrateSchema(DefaultSchemaManager.java:293)
> at org.kitesdk.data.hbase.HBaseMetadataProvider.update(HBaseMetadataProvider.java:130)
> at org.kitesdk.data.hbase.HBaseDatasetRepository.update(HBaseDatasetRepository.java:60)
> at org.kitesdk.examples.data.WriteUserDataset.run(WriteUserDataset.java:50)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.kitesdk.examples.data.WriteUserDataset.main(WriteUserDataset.java:73)
I just ran a unit test with this schema update and it passed for me.
Can you send the part that was trimmed so we can see what it reported?
Also, I just noticed a mistake in the example: the checked-in version of
user.avsc already had the 'age' field, even though that field is meant
to be added manually as a migration step. I've checked in a fix on
GitHub. You might like to try the example again from scratch to see if
that part works for you.
>
> From a brief look at the code, it seems that to avoid this exception I
> would have to create new column mappings for all of the attributes in the
> schema just to get this one new column in. I'm thinking maybe this is a bug
> too?
>
> Overall I'm just curious to know more about how schema evolution is expected
> to work with the HBase datasets. An example that covers that would be
> something great to have in the documentation.
I agree. The example we have at the moment is just a start, and we'd
like to add more. This is tracked by
https://issues.cloudera.org/browse/CDK-34 and
https://issues.cloudera.org/browse/CDK-35. In the meantime you can
look at some of the schema migration tests in
https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-hbase/src/test/java/org/kitesdk/data/hbase/avro/ManagedDaoTest.java.
Cheers,
Tom
>
> Thanks,
> Ben