I am trying to write a nested record to a Parquet file using the Kite API, but it fails with a NullPointerException. I am not sure whether I can reuse the same builder for the nested record. Can somebody help me with the correct syntax?
Error:
Exception in thread "main" java.lang.NullPointerException
    at org.apache.avro.generic.GenericRecordBuilder.set(GenericRecordBuilder.java:114)
    at org.apache.avro.generic.GenericRecordBuilder.set(GenericRecordBuilder.java:104)
    at org.kitesdk.examples.data.CreateComplexDatasetParquet.run(CreateComplexDatasetParquet.java:63)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.kitesdk.examples.data.CreateComplexDatasetParquet.main(CreateComplexDatasetParquet.java:77)
Code (line 63, where the exception is thrown, is inside the chained builder.set(...) statement in the loop):
DatasetDescriptor descriptor = new DatasetDescriptor.Builder()
    .schemaUri("resource:complex.avsc")
    .format(Formats.PARQUET)
    .build();
Dataset<Record> users = Datasets.create(
    // "dataset:hdfs:/tmp/data/users", descriptor, Record.class);
    "dataset:file:/tmp/data/complex", descriptor, Record.class);
// Get a writer for the dataset and write some users to it
DatasetWriter<Record> writer = null;
try {
  writer = users.newWriter();
  Random rand = new Random();
  GenericRecordBuilder builder = new GenericRecordBuilder(descriptor.getSchema());
  String[] citiesArray = {"milpitas", "fremont"};
  List<String> cities = Arrays.asList(citiesArray);
  for (int i = 0; i < 100; i++) {
    // The NullPointerException is thrown inside this chained statement (line 63 in the trace)
    Record record = builder.set("username", "user-" + i)
        .set("creationDate", System.currentTimeMillis())
        .set("favoriteColor", colors[rand.nextInt(colors.length)])
        .set("cities", cities)
        // The same builder is reused here for the fields of the nested record
        .set("test_rec", builder.set("test_id", "1").set("test_name", "sample").build())
        .build();
    writer.write(record);
  }
} finally {
  if (writer != null) {
    writer.close();
  }
}
Schema file (complex.avsc):
{
  "type": "record",
  "name": "User",
  "namespace": "org.kitesdk.examples.data",
  "doc": "A user record",
  "fields": [
    {
      "name": "username",
      "type": "string"
    },
    {
      "name": "creationDate",
      "type": "long"
    },
    {
      "name": "favoriteColor",
      "type": "string"
    },
    {
      "name": "cities",
      "type": {
        "type": "array",
        "items": "string"
      }
    },
    {
      "name": "test_rec",
      "type": {
        "name": "test_name",
        "type": "record",
        "fields": [
          {
            "name": "test_id",
            "type": "string"
          },
          {
            "name": "test_name",
            "type": "string"
          }
        ]
      }
    }
  ]
}
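For reference, here is a minimal sketch of what I suspect the nested case should look like. It assumes the NullPointerException comes from the outer builder looking up "test_id" in the User schema, where no such field exists, so the nested record gets its own GenericRecordBuilder created from the schema of the test_rec field. The class name NestedRecordSketch, the buildUser helper, and the inline SCHEMA_JSON constant are hypothetical names used only for illustration; the schema is a trimmed copy of the one above, and the Kite writer code is left out.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData.Record;
import org.apache.avro.generic.GenericRecordBuilder;

public class NestedRecordSketch {

  // Trimmed copy of the schema above (hypothetical inline constant, only the relevant fields).
  static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"org.kitesdk.examples.data\","
      + "\"fields\":["
      + "{\"name\":\"username\",\"type\":\"string\"},"
      + "{\"name\":\"cities\",\"type\":{\"type\":\"array\",\"items\":\"string\"}},"
      + "{\"name\":\"test_rec\",\"type\":{\"type\":\"record\",\"name\":\"test_name\",\"fields\":["
      + "{\"name\":\"test_id\",\"type\":\"string\"},"
      + "{\"name\":\"test_name\",\"type\":\"string\"}]}}"
      + "]}";

  // Hypothetical helper: builds one User record with its nested test_rec record.
  static Record buildUser(Schema userSchema, String username, List<String> cities) {
    // Look up the schema of the nested record from the outer schema.
    Schema nestedSchema = userSchema.getField("test_rec").schema();

    // Separate builder bound to the nested schema, so "test_id" and "test_name" resolve.
    Record nested = new GenericRecordBuilder(nestedSchema)
        .set("test_id", "1")
        .set("test_name", "sample")
        .build();

    // The outer builder only sets fields that exist in the User schema.
    return new GenericRecordBuilder(userSchema)
        .set("username", username)
        .set("cities", cities)
        .set("test_rec", nested)
        .build();
  }

  public static void main(String[] args) {
    Schema userSchema = new Schema.Parser().parse(SCHEMA_JSON);
    Record user = buildUser(userSchema, "user-0", Arrays.asList("milpitas", "fremont"));
    System.out.println(user);
  }
}
```

If this reading is right, the loop in my code above would need a second builder created from descriptor.getSchema().getField("test_rec").schema() and the result of its build() passed to set("test_rec", ...), instead of reusing the outer builder.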