Errors when using array functions with column data from CSV in array format

Hello,

We’ve got some csv files coming in that have columns of array data like this example column “hobbies”:

"[
  ""Volunteering"",
  ""Traveling"",
  ""Running""
]"

When I try to access the column using array functions I get various errors.

This works:

select array_contains(convert_from('["Swimming","Travel","Fishing"]','json'),'Travel');

But this fails with “‘ARRAY_CONTAINS’ expects an ARRAY, but instead got: ANY”:

select  array_contains(convert_from(hobbies,'json'),'Travel') from ArrayTesting;

This works:

select array_slice(convert_from('["Swimming","Travel","Fishing"]','json'),0,3);

This fails with “‘ARRAY_SLICE’ does not accept the supplied operand types: ANY, INTEGER, INTEGER”:

select array_slice(convert_from(hobbies,'json'),0,3) from ArrayTesting;

Visual comparison of the two looks like the arrays match when selected like this:

Comparing the two convert_from values returns true, again implying that they match:

select 
    case when 
        convert_from('["Swimming","Travel","Fishing"]','json') =  convert_from(hobbies,'json')
        then 'true'
        else 'false'
    end
FROM ArrayTesting;

Is there something under the hood that’s treating these as the same in this instance, but not when used in array functions? Do I need to do some additional conversion to make this compatible?

Below are various tests I’ve run just to see how things respond.

/*
Result: Success
Output:
    EXPR$0
    ["Swimming","Travel","Fishing"]
*/
select convert_from('["Swimming","Travel","Fishing"]','json');

/*
Result: Success
Output:
    EXPR$0
    ["Swimming","Travel","Fishing"]
*/
select convert_from(hobbies,'json')
from ArrayTesting;



/*
Compare the two convert_from values
Result: Values appear to match
Output:
    EXPR$0
    true
*/
select 
    case when 
        convert_from('["Swimming","Travel","Fishing"]','json') =  convert_from(hobbies,'json')
        then 'true'
        else 'false'
    end
FROM ArrayTesting;

/*
Result: Success
Output:
    EXPR$0
    true
*/
select array_contains(convert_from('["Swimming","Travel","Fishing"]','json'),'Travel');


/*
Result: Fail
Output:
    'ARRAY_CONTAINS' expects an ARRAY, but instead got: ANY
*/
select  array_contains(convert_from(hobbies,'json'),'Travel')
from ArrayTesting;


/*
Result: Success
Output:
    EXPR$0
    3
*/
select array_length(convert_from('["Swimming","Travel","Fishing"]','json'));


/*
Result: Fail
Output:
    'ARRAY_LENGTH' does not accept the supplied operand types: ANY
*/
select  array_length(convert_from(hobbies,'json'))
from ArrayTesting;


@gray By any chance would you be able to generate the same in JSON or parquet and see if you hit the same planning issue, trying to narrow down if it is specific to CSV. I will see if we can reproduce this behavior

@balaji.ramaswamy ,

I converted the CSV to JSON and I’m seeing the same issue; both Query 5 and Query 7 fail.

The test JSON file created just contains

{
  "ID": "1",
  "HOBBIES": "[\n  \"Swimming\",\n  \"Travel\",\n  \"Fishing\"\n]"
}

I thought maybe the \n was problematic, so stripped it out, but this also has the same results:

{
  "ID": "1",
  "HOBBIES": "[\"Swimming\",\"Travel\",\"Fishing\"]"
}

@gray Are you able to send a sample file for us to investigate

@balaji.ramaswamy ,

Sure thing, I’ve sent the files to you directly.

Thanks @gray Will get back to you in a couple of days