Incorrect behaviour of LOCATE function breaks Tableau queries

The third parameter of LOCATE function in Dremio SQL is inconsistent with the standard interpretation.
The standard is to treat position as 1-based in both the parameters and the result. In Dremio implementation the result is 1-based, but the third parameter (startPosition) seems to be 0-based.

Examples:
LOCATE('abc', 'abcdef', 1) returns 0 (expected: 1)
LOCATE('abc', 'abcdef', 0) returns 1 (startPosition=0 is probably invalid; at least MySQL always returns 0 in such case)
LOCATE('abc', 'xabcdef', 1) returns 2 (as expected)
LOCATE('abc', 'xabcdef', 2) returns 0 (expected: 2)

Unfortunately, this breaks functionality of our existing Tableau workbooks when we try to migrate from file-based Tableau datasources to Dremio datasource. E.g. when filtering by a substring values (contains condition), Tableau generates the following SQL:

SELECT "my_dataset"."ccy_pair" AS "ccy_pair",
  "my_dataset"."tenor" AS "tenor"
FROM "mannyu"."my_dataset" "my_dataset"
WHERE ( {fn LOCATE( 'gbp',{fn LCASE( "my_dataset"."ccy_pair")},1)} > 0)
GROUP BY "my_dataset"."ccy_pair",
  "my_dataset"."tenor"

This is a perfectly valid SQL, but in Dremio it filters out results where ccy_pair starts with “GBP”.

Assuming you acknowledge the bug, when would it be possible to fix it? Or could you suggest a workaround for Tableau without major changes in the existing workbooks?

Dremio build: 2.0.1-201804132205050000-10b1de0

Hi @muv,

I run a simplified version of your example (simply SELECT LOCATE('abc', 'xabcdef', 2)) and it returned the correct result:

EXPR$0
2

I’m guessing this might be an incorrect push down to your source. Could you share which source you are using to run this query?

Hi @laurent,

I used MongoDB accelerated by a reflection, but tried now with a simple one-row csv file and am getting the same wrong results.

CSV file contents:

1,xabcdef

Dremio query:

SELECT t.B
    , {fn LOCATE( 'xabc',{fn LCASE( t.B)}, 1)} as c1
    , {fn LOCATE( 'xabc',{fn LCASE( t.B)}, 0)} as c2
    , {fn LOCATE( 'abc',{fn LCASE( t.B)}, 1)} as c3
    , {fn LOCATE( 'abc',{fn LCASE( t.B)}, 2)} as c4
FROM "@mannyu".one_row as t

Result as JSON:

{"B":"xabcdef","c1":0,"c2":1,"c3":2,"c4":0}

Interestingly, your example works correctly for me. So it looks like executing as an expression and executing with a field value (irrespective of datasource) use different code paths. Could you please try with a CSV like mine?

My bad. Yes, it looks like depending on the execution context, it might return incorrect results. The following query would return 0 for example:

SELECT LOCATE('abc', a, 2) FROM (VALUES('xabcdef')) tbl(a)

As a workaround, it is possible to modify the Tableau TDC file to prevent LOCATE with 3-args to be used until a new version of Dremio with a fix is released, and hopefully Tableau can work around that limitation.

Do you mean it will try to generate the same condition in a different way? Do you know how to do this? I would try.

It depends if Tableau can find a way to perform the exact same thing without using the function (after a quick test, it looks it does).

If you want to try, add the following line to the DremioConnector.tdc TDC file (under /My Tableau Repository/Datasources):

<customization name="SQL_STRING_FUNCTIONS" value="73051"/>

Thanks @laurent, this worked - at least it solved this specific issue. Tableau now passes only 2 parameters to LOCATE.

Do you have an idea when the permanent fix will be released? So that we know when to stop maintaining the TDC patch.

I don’t have an ETA to share at that time, but if you or your company is a registered customer, I would suggest to reach to your dedicated support contact.