The third parameter of LOCATE function in Dremio SQL is inconsistent with the standard interpretation.
The standard is to treat position as 1-based in both the parameters and the result. In Dremio implementation the result is 1-based, but the third parameter (startPosition) seems to be 0-based.
Examples: LOCATE('abc', 'abcdef', 1) returns 0 (expected: 1) LOCATE('abc', 'abcdef', 0) returns 1 (startPosition=0 is probably invalid; at least MySQL always returns 0 in such case) LOCATE('abc', 'xabcdef', 1) returns 2 (as expected) LOCATE('abc', 'xabcdef', 2) returns 0 (expected: 2)
Unfortunately, this breaks functionality of our existing Tableau workbooks when we try to migrate from file-based Tableau datasources to Dremio datasource. E.g. when filtering by a substring values (contains condition), Tableau generates the following SQL:
SELECT "my_dataset"."ccy_pair" AS "ccy_pair",
"my_dataset"."tenor" AS "tenor"
FROM "mannyu"."my_dataset" "my_dataset"
WHERE ( {fn LOCATE( 'gbp',{fn LCASE( "my_dataset"."ccy_pair")},1)} > 0)
GROUP BY "my_dataset"."ccy_pair",
"my_dataset"."tenor"
This is a perfectly valid SQL, but in Dremio it filters out results where ccy_pair starts with “GBP”.
Assuming you acknowledge the bug, when would it be possible to fix it? Or could you suggest a workaround for Tableau without major changes in the existing workbooks?
I used MongoDB accelerated by a reflection, but tried now with a simple one-row csv file and am getting the same wrong results.
CSV file contents:
1,xabcdef
Dremio query:
SELECT t.B
, {fn LOCATE( 'xabc',{fn LCASE( t.B)}, 1)} as c1
, {fn LOCATE( 'xabc',{fn LCASE( t.B)}, 0)} as c2
, {fn LOCATE( 'abc',{fn LCASE( t.B)}, 1)} as c3
, {fn LOCATE( 'abc',{fn LCASE( t.B)}, 2)} as c4
FROM "@mannyu".one_row as t
Result as JSON:
{"B":"xabcdef","c1":0,"c2":1,"c3":2,"c4":0}
Interestingly, your example works correctly for me. So it looks like executing as an expression and executing with a field value (irrespective of datasource) use different code paths. Could you please try with a CSV like mine?
My bad. Yes, it looks like depending on the execution context, it might return incorrect results. The following query would return 0 for example:
SELECT LOCATE('abc', a, 2) FROM (VALUES('xabcdef')) tbl(a)
As a workaround, it is possible to modify the Tableau TDC file to prevent LOCATE with 3-args to be used until a new version of Dremio with a fix is released, and hopefully Tableau can work around that limitation.
I don’t have an ETA to share at that time, but if you or your company is a registered customer, I would suggest to reach to your dedicated support contact.