There is a test query:
SELECT upper(text1) a FROM dwh.public.test_upper WHERE trunc(months_between(CURRENT_DATE(), CURRENT_DATE()) / 12) <= 59 LIMIT 10
We are using Dremio version 15 (an upgrade to a newer version is planned) with UTF-16 character support enabled.
Earlier, in our fork, we changed the lower / upper functions in StringFunctions in the sabot kernel so that they work correctly with UTF-8 characters (upstream uses a character code shift for this, which only works for Latin characters).
When we run the test query on a test table, we observe the following behavior: for Latin characters, the lower / upper functions work, but for non-Latin UTF-8 characters (in our case, Russian), they do not.
The following line is observed in the log:
pool-18-thread-1 - 1e98e714-7da4-0c62-6ca7-4748b90c3e00: frag: 0: 0] DEBUG cdexec.expr.ExpressionSplitter - Expression executed entirely in Gandiva FunctionHolderExpression [args = [ValueVectorReadExpression [fieldId = TypeddsFieldId [fieldIpeddsFieldId , remainder = null]]], name = lower, returnType = varchar, isRandom = false]
As I understand it, this means the function was executed by Gandiva. This is how I picture the query-processing logic:
- The query is checked for pushdown capability. In this case, the ARP file for the connector does not define the months_between function, so pushdown does not happen.
- Dremio decides whether the function will be executed by the Java implementation (apparently, for this case, the one declared in com/dremio/exec/expr/fn/impl/StringFunctions.java) or by the Gandiva implementation.
- Judging by the log message, the lower function from Gandiva is applied. I looked at the Gandiva source code: it also uses the character code shift, i.e. it works only for Latin characters.
- As a result, the English letters are converted to the required case, but the Russian ones are not.
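To illustrate what I mean by a character code shift, here is a minimal Java sketch. It is not the actual Dremio or Gandiva code; the class and method names (CaseShiftDemo, asciiShiftUpper, unicodeUpper) are made up for this example. It only shows why a byte-wise shift leaves Cyrillic text unchanged while a Unicode-aware conversion does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical demo class, not part of Dremio or Gandiva.
public class CaseShiftDemo {

    // Byte-wise "code shift": only bytes in the range 'a'..'z' are moved
    // to 'A'..'Z'. Bytes of multi-byte UTF-8 sequences (e.g. Cyrillic)
    // fall outside that range and pass through unchanged.
    static String asciiShiftUpper(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] >= 'a' && bytes[i] <= 'z') {
                bytes[i] -= 32; // distance between 'a' and 'A' in ASCII
            }
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Unicode-aware conversion using the JDK's case mapping.
    static String unicodeUpper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "abc " followed by the Russian word "privet" in Cyrillic,
        // written with Unicode escapes so the source file encoding does not matter.
        String input = "abc \u043f\u0440\u0438\u0432\u0435\u0442";
        System.out.println(asciiShiftUpper(input)); // Latin part upper-cased only
        System.out.println(unicodeUpper(input));    // both parts upper-cased
    }
}
```

The first method reproduces the behavior we see when the function runs in Gandiva; the second corresponds to what we expect from our patched StringFunctions implementation.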
We found a workaround: changing the original query or extending the ARP file so that the query is always pushed down to the database. But for us this is only a temporary solution, because some of the queries are generated automatically.
So the question is: is it possible to disable Gandiva execution for specific functions (lower, upper), so that our implementation in the StringFunctions class is used instead?
For example, with some kind of flag when launching Dremio, a setting inside Dremio, or a change to the source code?
Or can this only be solved by fixing Gandiva itself?