UDF for complex types


Ankur Goel

Aug 9, 2016, 3:57:43 PM
to Presto
Hi Guys,
I have a table with a VARCHAR column whose values are Base64-encoded protobuf messages.
I am trying to write a UDF that Base64-decodes the value, parses the protobuf, and returns
the result as a ROW type.

I looked at the documentation but did not find anything for this use case.
Any ideas on how to do this?

Thanks
-Ankur

Ankur Goel

Aug 9, 2016, 5:03:19 PM
to Presto
Is it even possible to return a RowType from a UDF?

Ankur Goel

Aug 9, 2016, 6:47:10 PM
to Presto

This is what my sample code looks like:

@Description("Function to test blob decoding")
@ScalarFunction("decode")
@SqlType(StandardTypes.ROW)
public static Block decode(@SqlType(StandardTypes.VARCHAR) Slice string)
{
    // Hard-code the information for now. Get it after blob decoding
    List<Type> typeParams = new ArrayList<>();
    typeParams.add(BooleanType.BOOLEAN);
    typeParams.add(IntegerType.INTEGER);

    BlockBuilder builder = new InterleavedBlockBuilder(typeParams, new BlockBuilderStatus(), typeParams.size());
    BooleanType.BOOLEAN.writeBoolean(builder, true);
    IntegerType.INTEGER.writeLong(builder, 77);
    return builder.build();
}

Presto fails on restart with the following stack trace:

java.lang.NullPointerException: returnType is null
	at java.util.Objects.requireNonNull(Objects.java:228)
	at com.facebook.presto.metadata.FunctionListBuilder.verifyMethodSignature(FunctionListBuilder.java:345)
	at com.facebook.presto.metadata.FunctionListBuilder.processScalarFunction(FunctionListBuilder.java:258)
	at com.facebook.presto.metadata.FunctionListBuilder.scalar(FunctionListBuilder.java:199)


Haozhun Jin

Aug 9, 2016, 6:52:52 PM
to presto...@googlegroups.com

Here is an example of functions that return row types:

·         https://github.com/prestodb/presto/blob/0.130/presto-main/src/main/java/com/facebook/presto/operator/scalar/TestingRowConstructor.java

 

We removed these functions in recent versions because we now have a generic row constructor. Right now, there isn’t a function in trunk that returns a row type. If you need to, you can nest complex types inside complex types.

 

The way you write out a row type signature has changed since then. You can find out what your type signature should look like by constructing the RowType you want and calling toString on it.

 

We already have a function from_base64. You can write a function that takes a VARBINARY input and returns a row. This would probably be better than writing a function that deals specifically with base64-encoded protobuf.
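
To illustrate the split suggested above (Base64 decoding as one step, protobuf parsing of raw bytes as another), here is a minimal plain-Java sketch. It is not Presto code and not the UDF from this thread. The message layout (a bool in field 1, an int32 in field 2), the class name ProtoRowSketch, and the DecodedRow holder are all hypothetical, and the hand-rolled varint reader only stands in for the generated protobuf classes a real function would likely use.

```java
import java.util.Base64;

// Hypothetical message layout, assumed for illustration only:
//   message Sample { bool flag = 1; int32 count = 2; }
final class ProtoRowSketch {

    // Plain holder for the two decoded fields (stands in for the ROW value).
    static final class DecodedRow {
        final boolean flag;
        final int count;

        DecodedRow(boolean flag, int count) {
            this.flag = flag;
            this.count = count;
        }
    }

    // Step 1: the Base64 decoding that from_base64 would perform in SQL.
    static byte[] fromBase64(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    // Step 2: what a function taking raw bytes would do with them.
    // Only varint fields (wire type 0) are handled in this sketch.
    static DecodedRow parse(byte[] bytes) {
        boolean flag = false;
        int count = 0;
        int pos = 0;
        while (pos < bytes.length) {
            long[] tag = readVarint(bytes, pos);
            pos = (int) tag[1];
            int fieldNumber = (int) (tag[0] >>> 3);
            int wireType = (int) (tag[0] & 0x7);
            if (wireType != 0) {
                throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
            long[] value = readVarint(bytes, pos);
            pos = (int) value[1];
            if (fieldNumber == 1) {
                flag = value[0] != 0;
            } else if (fieldNumber == 2) {
                count = (int) value[0];
            }
            // Varint fields with other numbers are ignored.
        }
        return new DecodedRow(flag, count);
    }

    // Reads one base-128 varint starting at pos; returns {value, next position}.
    private static long[] readVarint(byte[] bytes, int pos) {
        long result = 0;
        int shift = 0;
        while (pos < bytes.length && shift <= 63) {
            byte b = bytes[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return new long[] {result, pos};
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated or oversized varint");
    }
}
```

Under the assumed layout, the bytes 08 01 10 4D (Base64 "CAEQTQ==") parse to flag = true and count = 77, matching the values hard-coded in the sample earlier in the thread. Keeping the parser on raw bytes means the same function works regardless of how the column happens to be encoded.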

 

Since you didn’t mention passing in the protobuf definition in your initial email, I’m going to assume that the protobuf format is known in advance. Therefore, the returned “row” type will be something concrete. If this is not the case, we can discuss further.

Ankur Goel

Aug 9, 2016, 9:16:44 PM
to Presto, hj...@fb.com
Awesome! 

Since the struct is known well in advance, I was able to add the correct method signature and annotations,
along with the protobuf dependencies.

The UDF then decoded and parsed the blob correctly and provided the expected results.

Your help is highly appreciated.

Thanks
-Ankur 