Write nested record using Kite API


Sunil Parmar

May 18, 2016, 5:20:11 PM
to CDK Development
I am trying to write a nested record to a Parquet file using the Kite API, but it fails with the error below. I am not sure whether I can reuse the same builder for the nested record. Can somebody help me with the syntax?

Error
Exception in thread "main" java.lang.NullPointerException
    at org.apache.avro.generic.GenericRecordBuilder.set(GenericRecordBuilder.java:114)
    at org.apache.avro.generic.GenericRecordBuilder.set(GenericRecordBuilder.java:104)
    at org.kitesdk.examples.data.CreateComplexDatasetParquet.run(CreateComplexDatasetParquet.java:63)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.kitesdk.examples.data.CreateComplexDatasetParquet.main(CreateComplexDatasetParquet.java:77)



Code (line 63 from the stack trace was highlighted)
    DatasetDescriptor descriptor = new DatasetDescriptor.Builder()
        .schemaUri("resource:complex.avsc")
        .format(Formats.PARQUET)
        .build();
    Dataset<Record> users = Datasets.create(
        // "dataset:hdfs:/tmp/data/users", descriptor, Record.class);
        "dataset:file:/tmp/data/complex", descriptor, Record.class);

    // Get a writer for the dataset and write some users to it
    DatasetWriter<Record> writer = null;
    try {
      writer = users.newWriter();
      Random rand = new Random();
      GenericRecordBuilder builder = new GenericRecordBuilder(descriptor.getSchema());
      String[] citiesArray = {"milpitas","fremont"};
      List<String> cities =  Arrays.asList(citiesArray);
      for (int i = 0; i < 100; i++) {
        Record record = builder.set("username", "user-" + i)
            .set("creationDate", System.currentTimeMillis())
            .set("favoriteColor", colors[rand.nextInt(colors.length)])
            .set("cities", cities)
            .set("test_rec", builder.set("test_id", "1").set("test_name", "sample").build())
            .build();
        writer.write(record);
      }
    } finally {
      if (writer != null) {
        writer.close();
      }
    }


Schema file

{
  "type": "record",
  "name": "User",
  "namespace": "org.kitesdk.examples.data",
  "doc": "A user record",
  "fields": [
    {
      "name": "username",
      "type": "string"
    },
    {
      "name": "creationDate",
      "type": "long"
    },
    {
      "name": "favoriteColor",
      "type": "string"
    },
    {
      "name": "cities",
      "type": {
        "type": "array",
        "items": "string"
      }
    },
    {
      "name": "test_rec",
      "type": {
        "name": "test_name",
        "type": "record",
        "fields": [
          {
            "name": "test_id",
            "type": "string"
          },
          {
            "name": "test_name",
            "type": "string"
          }
        ]
      }
    }
  ]
}
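
A possible explanation, offered as a sketch rather than a confirmed answer: the builder created from descriptor.getSchema() is bound to the User schema, so builder.set("test_id", ...) looks up a field that User does not define, and the NullPointerException inside GenericRecordBuilder.set is consistent with that lookup returning null. If that is the cause, the nested record needs its own GenericRecordBuilder created from the schema of the test_rec field. The class name NestedRecordSketch, the buildUser helper, and the inlined copy of the schema are made up for illustration; only the Avro calls (Schema.Parser, Schema.getField, Field.schema, GenericRecordBuilder) come from the library.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData.Record;
import org.apache.avro.generic.GenericRecordBuilder;

// Hypothetical class name; only the Avro calls are real API.
class NestedRecordSketch {

  // Same shape as the schema file above, inlined so the sketch runs on its own.
  static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"User\","
      + "\"namespace\":\"org.kitesdk.examples.data\",\"fields\":["
      + "{\"name\":\"username\",\"type\":\"string\"},"
      + "{\"name\":\"creationDate\",\"type\":\"long\"},"
      + "{\"name\":\"favoriteColor\",\"type\":\"string\"},"
      + "{\"name\":\"cities\",\"type\":{\"type\":\"array\",\"items\":\"string\"}},"
      + "{\"name\":\"test_rec\",\"type\":{\"name\":\"test_name\",\"type\":\"record\","
      + "\"fields\":[{\"name\":\"test_id\",\"type\":\"string\"},"
      + "{\"name\":\"test_name\",\"type\":\"string\"}]}}]}";

  // Hypothetical helper: builds one User record with its nested test_rec.
  static Record buildUser(Schema userSchema, int i, String color, List<String> cities) {
    // The nested record has its own schema, reachable through the test_rec field.
    Schema nestedSchema = userSchema.getField("test_rec").schema();

    // A separate builder bound to the nested schema, so test_id and
    // test_name resolve to real fields.
    Record nested = new GenericRecordBuilder(nestedSchema)
        .set("test_id", "1")
        .set("test_name", "sample")
        .build();

    // The outer builder only ever sets fields that exist on User.
    return new GenericRecordBuilder(userSchema)
        .set("username", "user-" + i)
        .set("creationDate", System.currentTimeMillis())
        .set("favoriteColor", color)
        .set("cities", cities)
        .set("test_rec", nested)
        .build();
  }

  public static void main(String[] args) {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
    List<String> cities = Arrays.asList("milpitas", "fremont");
    // Prints the record in Avro's JSON-like text form.
    System.out.println(buildUser(schema, 0, "green", cities));
  }
}
```

In the loop from the post, the same idea would mean creating a second builder from descriptor.getSchema().getField("test_rec").schema() next to the existing one and passing its build() result to .set("test_rec", ...); the writer.write(record) call would stay as it is. Whether Kite's Parquet writer then accepts the record has not been checked here.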