Select column if exist

I have some parquet file stored on S3. Each parquet is a table on Dremio, for legacy reasons. Some of them ( for an update on the data) has a new Column. I need to query the new Column if it exist otherwise return a default value ( not NULL).

I have created something the following query ( based on stackoverflow example) but it seems it don’t work:

SELECT
 CASE WHEN EXISTS (
          SELECT COLUMN_NAME 
          FROM INFORMATION_SCHEMA.COLUMNS 
          WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'test_table' AND COLUMN_NAME = 'Column'
         )
 THEN (
       SELECT COLUMN_NAME 
       FROM INFORMATION_SCHEMA.COLUMNS 
       WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'test_table' AND COLUMN_NAME = 'Column'
      ) 
 ELSE 'Default' END AS Column 
FROM test."test_table"

The solution provide the excution of two queries programmatically. The first check if columkn exist for the table in the INFORMATION_SCHEMA and the secon add the column to the SELECT if the column exist otherwise add it with the default value.

Is there a way to do it in Dremio without making two query programatically?

@stefano.castoldi Once Dremio does schema learning, it should automatically learn the new column, fill in values in Parquet files it exists and leave the rest as null, just to test this, what if you run the below query
select * from <application_table_name> where <column_name> is not null

If this returns the right set of rows which means the rows where the column exists then we can write a CASE statement. If this query returns all rows including the ones the column does not exists then check the value of this column when it foes not exists and use that logic

@balaji.ramaswamy I don’t understand what you mean. Maybe my question was a little bit confusing.

I have two parquet file created at different time :

  • first has column1, column2, column3 when I create dremio mapping(table1) i have a table with column1, column2, column3

  • second has column1, column2, column3, column4 when I create dremio mapping (table2) i have a table with column1, column2, column3,column4

If I make your query on table1 I get an error that the column dose not exist so the query is not executed.

select * from table1 where column4 is not null

if I make the query on the second table it works, because the table2 has the column.

Can you provide me an example?

Hi @stefano.castoldi If columns 1,2,3 are same on both Parquet files, can you consider promoting them under one folder as one PDS so that dataset can have 4 columns with columns 1,2,3 having all rows while column 4 will only populate from the 2 parquet file

I know that it’s the solution, but for legacy reason I can’t do that right now. The only way to solve the problem is to it programmatically through the query of INFORMATION_SCHEMA.