NullPointerException When Using Custom UDF with GROUP BY in Dremio Community Edition 24.3.2

Hello Dremio Community,

I’ve developed a custom UDF in Dremio named ARRAY_CONTAINS_ALL. The function is designed to check whether all elements of a given input array are present in an array column within a table. For example, if the array column contains [a, b, c, d], and the input array is [a, b], the function should return true because all items in the input array are present in the column. Conversely, if the input array is [a, e], it should return false since ‘e’ is not present.
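To make the intended semantics concrete, here is a minimal plain-Java sketch of the same check. It has no Dremio dependencies; the class name `ContainsAllSketch` and the method `containsAll` are hypothetical names chosen for illustration and are not part of the UDF itself.

```java
import java.util.Arrays;
import java.util.List;

public class ContainsAllSketch {
    // Returns true when every element of `wanted` also appears in `column`.
    // Hypothetical helper for illustration only; not the Dremio UDF.
    public static boolean containsAll(List<String> column, List<String> wanted) {
        return column.containsAll(wanted);
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("a", "b", "c", "d");
        System.out.println(containsAll(column, Arrays.asList("a", "b"))); // true
        System.out.println(containsAll(column, Arrays.asList("a", "e"))); // false
    }
}
```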

The function performs as expected in standard queries, such as:

SELECT * FROM table_name WHERE ARRAY_CONTAINS_ALL(array_col_name, ARRAY['a', 'b'])

However, I encounter a NullPointerException when this UDF is used in queries involving the GROUP BY clause, as shown in the following query:

SELECT count(*) FROM table_name WHERE ARRAY_CONTAINS_ALL(array_col_name, ARRAY['a', 'b']) GROUP BY string_col_name;

The exception occurs only when the query combines the UDF with a GROUP BY clause.

Has anyone faced a similar issue, or can anyone offer insight into why this exception occurs and how to resolve it? Any help or guidance on debugging it would be greatly appreciated.

Thank you in advance for your assistance!

@balaji.ramaswamy Can you help us with this?

Hi @jithin.odattu, when you say UDF, how did you develop it? Is it a custom jar that you deployed?

Yes @balaji.ramaswamy, it is a custom jar I developed. I copied it into the jars/3rdparty directory.

@jithin.odattu Someone has to look at the jar file to see why you are getting an NPE.

package com.dremio.udfs;

import com.dremio.exec.expr.SimpleFunction;
import com.dremio.exec.expr.annotations.FunctionTemplate;
import com.dremio.exec.expr.annotations.Output;
import com.dremio.exec.expr.annotations.Param;
import org.apache.arrow.vector.complex.reader.FieldReader;
import org.apache.arrow.vector.holders.NullableBitHolder;


public class ArrayContainsAllUDF {
    @FunctionTemplate(
            name = "ARRAY_CONTAINS_ALL",
            scope = FunctionTemplate.FunctionScope.SIMPLE,
            nulls = FunctionTemplate.NullHandling.INTERNAL)
    public static class ArrayContainsAllUDFImpl implements SimpleFunction {

        @Param
        private FieldReader left;

        @Param
        private FieldReader right;

        @Output
        private NullableBitHolder out;

        @Override
        public void setup() {
        }

        @Override
        public void eval() {

            if (!left.isSet()
                    || left.readObject() == null
                    || !right.isSet()
                    || right.readObject() == null) {
                out.isSet = 0;
                return;
            }

            if (left.getMinorType() != org.apache.arrow.vector.types.Types.MinorType.LIST) {
                throw new UnsupportedOperationException(
                        String.format(
                                "First parameter to ARRAY_CONTAINS_ALL must be a LIST. Was given: %s",
                                left.getMinorType().toString()));
            }

            if (right.getMinorType() != org.apache.arrow.vector.types.Types.MinorType.LIST) {
                throw new UnsupportedOperationException(
                        String.format(
                                "Second parameter to ARRAY_CONTAINS_ALL must be a LIST. Was given: %s",
                                right.getMinorType().toString()));
            }

            if (!left.reader().getMinorType().equals(right.reader().getMinorType())) {
                throw new UnsupportedOperationException(
                        String.format(
                                "List of %s is not comparable with %s",
                                left.reader().getMinorType().toString(),
                                right.reader().getMinorType().toString()));
            }


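            // Materialize both lists so membership can be checked with List.contains.
            java.util.List<?> leftList = (java.util.List<?>) left.readObject();
            java.util.List<?> rightList = (java.util.List<?>) right.readObject();

            // Return false as soon as any element of the second list is missing from the first.
            for (Object o : rightList) {
                if (!leftList.contains(o)) {
                    out.isSet = 1;
                    out.value = 0;
                    return;
                }
            }

            // Every element matched, but a null element in either list makes the result NULL.
            if (leftList.contains(null) || rightList.contains(null)) {
                out.isSet = 0;
                return;
            }

            // All elements of the second list are present and no nulls are involved.
            out.isSet = 1;
            out.value = 1;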
        }
    }
}
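For anyone trying to narrow this down, the decision logic in eval() can be exercised outside Dremio. The sketch below mirrors the order of the checks above using plain `java.util.List` values, with a `null` return standing in for `out.isSet = 0`. The class `ArrayContainsAllLogicSketch` and its `evaluate` method are hypothetical names, and the sketch deliberately leaves out the `FieldReader` calls, so it can only rule the comparison logic in or out; it says nothing about how the readers behave under GROUP BY.

```java
import java.util.Arrays;
import java.util.List;

public class ArrayContainsAllLogicSketch {
    // Mirrors the order of checks in eval(): null input, missing element,
    // null element, then success. A null return represents SQL NULL.
    public static Boolean evaluate(List<?> left, List<?> right) {
        if (left == null || right == null) {
            return null;
        }
        for (Object o : right) {
            if (!left.contains(o)) {
                return Boolean.FALSE;
            }
        }
        if (left.contains(null) || right.contains(null)) {
            return null;
        }
        return Boolean.TRUE;
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("a", "b", "c", "d");
        System.out.println(evaluate(column, Arrays.asList("a", "b")));              // true
        System.out.println(evaluate(column, Arrays.asList("a", "e")));              // false
        System.out.println(evaluate(null, Arrays.asList("a")));                     // null
        System.out.println(evaluate(Arrays.asList("a", null), Arrays.asList("a"))); // null
    }
}
```

Arrays.asList is used here rather than List.of because the immutable lists returned by List.of throw a NullPointerException from contains(null).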

@balaji.ramaswamy Please see the UDF definition.

Also please see the reference code I have used…

dremio-oss/sabot/kernel/src/main/java/com/dremio/exec/expr/fn/impl/ArrayContains.java at master · dremio/dremio-oss (github.com)

@jithin.odattu I have asked our Engineering team whether someone can help.