Why not Apache Hudi?


I watched a number of presentations from Subsurface. A lot of attention seems to be going to Apache Iceberg and Delta Lake.

I wonder why Apache Hudi is not mentioned. Is there any limitation in licensing, features, etc.? Is there any comparison of how these kinds of layers differ?

Currently we are assessing Dremio on AWS. It looks like the features are more advanced there.


I am also curious about this. In fact, Hudi was integrated into EMR long ago, so it is very attractive to use Hudi on AWS.

Any thoughts on this, @LucioDaza?

@balaji.ramaswamy Hi Balaji, any info on this one? Thanks.

Our initial focus is Iceberg and Delta Lake given that those two projects seem to have more momentum/demand in the market. But we haven’t ruled out Hudi support.


Thanks for the reply. However, from my observation, Hudi is more popular and mature in the market, and more people participate in its community.

@tshiran Assuming work on those is already under way, do you have any information on how Dremio thinks about them as a metadata source when compared to some other less reliable/performant sources such as Hive?

Can you elaborate a bit on the question? Are you asking how metadata refresh works in Dremio when dealing with a transactional table format like Iceberg or Delta?
The experience will certainly be better than HMS or Glue because these formats provide snapshotting/transactions, which makes it possible to know what has changed without having to scan unnecessary metadata. I’m happy to discuss in more detail.
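To make the point about snapshots concrete, here is a minimal Python sketch of a toy snapshot log. It is not the Iceberg or Delta Lake API, and it does not describe how Dremio implements metadata refresh; the `Snapshot` class, the `changes_since` function, and the file names are all hypothetical. It only shows the general idea: if each commit records which files it added and removed, a reader that remembers the last snapshot it saw can work out what changed by reading the newer snapshots alone, rather than re-listing the whole table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One commit in a toy table log (hypothetical, not a real table-format API)."""
    snapshot_id: int
    added: frozenset = frozenset()
    removed: frozenset = frozenset()


def changes_since(log, last_seen_id):
    """Return the net (added, removed) file sets after last_seen_id.

    Only snapshots newer than last_seen_id are inspected; older ones are skipped.
    """
    added, removed = set(), set()
    for snap in log:
        if snap.snapshot_id <= last_seen_id:
            continue  # already known to the reader
        for f in snap.added:
            if f in removed:
                removed.discard(f)  # removed earlier in the window, then re-added
            else:
                added.add(f)
        for f in snap.removed:
            if f in added:
                added.discard(f)  # added and removed inside the window cancel out
            else:
                removed.add(f)
    return added, removed


# Hypothetical commit history: two appends, then a rewrite that replaces a file.
log = [
    Snapshot(1, added=frozenset({"a.parquet", "b.parquet"})),
    Snapshot(2, added=frozenset({"c.parquet"})),
    Snapshot(3, added=frozenset({"d.parquet"}), removed=frozenset({"a.parquet"})),
]

# A reader that last refreshed at snapshot 1 only needs snapshots 2 and 3.
new_files, dropped_files = changes_since(log, last_seen_id=1)
```

With a catalog that only lists partitions and files, the equivalent refresh would typically mean listing the current state and comparing it against what was cached, because there is no commit log to read the delta from.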
