HDFS source with different file schema

Hi Team,

Can you please clarify my understanding of how the HDFS source works? What happens if I map an HDFS directory that contains files with different schemas (let's say CSV files)? Will Dremio union the columns from all files, or will it result in an error?
My understanding was that Dremio would union the fields from all files, but the query fails with the error “Schema change detected but unable to learn schema, query failed. A full table scan may be necessary to fully learn the schema.” With Parquet files the behavior is a little different: there the schema is unioned across multiple files.
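To make "different schema" concrete, here is a minimal sketch that counts the distinct header rows in a directory of CSV files and lists the union of column names. It is only a way to inspect the data, not how Dremio learns schemas. It assumes the files have been copied from HDFS to a local directory (or are reachable through a mount), that each file starts with a header row, and the function names are made up for this illustration.

```python
import csv
from collections import Counter
from pathlib import Path


def header_variants(directory):
    """Count how many CSV files share each distinct header row."""
    variants = Counter()
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="") as handle:
            # An empty file yields an empty header instead of raising.
            header = next(csv.reader(handle), [])
        variants[tuple(header)] += 1
    return variants


def union_of_columns(variants):
    """Return every column name seen across all headers, in first-seen order."""
    seen = {}
    for header in variants:
        for name in header:
            seen.setdefault(name, None)
    return list(seen)
```

The number of keys in the result of `header_variants` gives a rough idea of how many schema variations a scan of the directory would run into.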

Any insight will be helpful. Thanks.

@Monika_Goel this is typically caused by too many variations in the schema (different types, columns, etc.). Dremio caps its automatic schema learning retries at 10 per query to avoid wasting resources. In your case, it looks like 10 retries weren't enough because more changes were detected. Schema learning is continuous; it does not start from the beginning every time you run a query. So you could try running something like select * from table where random() = random() (any query that will scan the whole table, ideally without returning many results) a few times and see if that is sufficient to cover all the variations in the schema.
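If you would rather script the "run it a few times" step, a minimal sketch could look like the one below. It assumes you supply your own `run_query` callable (for example a thin wrapper around whatever ODBC, JDBC, or REST client you already use) that raises an exception containing the error text when the query fails. `learn_schema`, `run_query`, and `max_attempts` are hypothetical names for this illustration, not part of any Dremio API.

```python
SCHEMA_LEARNING_QUERY = "select * from {table} where random() = random()"
SCHEMA_CHANGE_MARKER = "Schema change detected"


def learn_schema(run_query, table, max_attempts=5):
    """Re-run a full-scan query until it stops failing with a schema-change error.

    Returns the number of attempts used. Errors that are not schema-change
    errors are re-raised immediately; if every attempt fails with a
    schema-change error, the last one is raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # The table name is interpolated as-is, so pass only trusted identifiers.
    sql = SCHEMA_LEARNING_QUERY.format(table=table)
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            run_query(sql)
            return attempt
        except Exception as error:
            if SCHEMA_CHANGE_MARKER not in str(error):
                raise
            # Schema learning is continuous, so the next run resumes from here.
            last_error = error
    raise last_error
```

Calling `learn_schema(my_client_execute, "my_source.my_folder")` with your own client function and table path would then repeat the scan until it succeeds or the attempt limit is reached.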