Is it possible to support Parquet files compressed with ZSTD?
Currently it is not supported. Only snappy, gzip, and none are supported:
```java
EnumeratedStringValidator PARQUET_WRITER_COMPRESSION_TYPE_VALIDATOR = new EnumeratedStringValidator(
    PARQUET_WRITER_COMPRESSION_TYPE, "snappy", "snappy", "gzip", "none");
```
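In the quoted line, "snappy" appears twice. In the Drill-derived option validators this pattern appears to be the default value followed by the list of accepted values. A minimal sketch of that idea, using a hypothetical CompressionOptionSketch class (not Dremio's actual EnumeratedStringValidator), shows why "zstd" is rejected today and what appending it to the list would change:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an enumerated string option validator. It is NOT
// Dremio's EnumeratedStringValidator; it only illustrates the same idea:
// a default value plus a fixed set of accepted values.
final class CompressionOptionSketch {
    private final String name;
    private final String defaultValue;
    private final Set<String> allowed;

    CompressionOptionSketch(String name, String defaultValue, String... allowed) {
        this.name = name;
        this.defaultValue = defaultValue;
        this.allowed = new LinkedHashSet<>(Arrays.asList(allowed));
        if (!this.allowed.contains(defaultValue)) {
            throw new IllegalArgumentException("default must be one of the allowed values");
        }
    }

    String defaultValue() {
        return defaultValue;
    }

    // This sketch compares case-insensitively (its own choice, not a claim about Dremio).
    boolean accepts(String value) {
        return allowed.contains(value.toLowerCase(Locale.ROOT));
    }

    // Returns the normalized value, or throws if it is not in the accepted set.
    String validate(String value) {
        String v = value.toLowerCase(Locale.ROOT);
        if (!allowed.contains(v)) {
            throw new IllegalArgumentException(
                "Option " + name + " must be one of " + allowed + ", got: " + value);
        }
        return v;
    }

    // Mirrors the current list quoted above: default "snappy", accepted snappy/gzip/none.
    static final CompressionOptionSketch CURRENT = new CompressionOptionSketch(
        "PARQUET_WRITER_COMPRESSION_TYPE", "snappy", "snappy", "gzip", "none");

    // The same option with "zstd" appended to the accepted values.
    static final CompressionOptionSketch WITH_ZSTD = new CompressionOptionSketch(
        "PARQUET_WRITER_COMPRESSION_TYPE", "snappy", "snappy", "gzip", "none", "zstd");
}
```

Widening the accepted values only lets the option be set; something on the writer side still has to act on it.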
Is there any way to add it?
If you want to support ZSTD, you need to modify sabot/kernel/src/main/java/com/dremio/exec/store/parquet/ParquetRecordWriter.java and also sabot/kernel/src/main/java/com/dremio/exec/ExecConstants.java.
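Only the two files are named here, so the following is a guess at the writer-side change: the validated option string has to be mapped to a Parquet compression codec. parquet-mr's CompressionCodecName enum does include a ZSTD constant; in this sketch a hypothetical local CodecSketch enum mirrors a few of its names so the example compiles without Parquet on the classpath, and WriterCodecMapping.toCodec is likewise hypothetical, not ParquetRecordWriter's real code:

```java
import java.util.Locale;

// Hypothetical mirror of a subset of parquet-mr's CompressionCodecName constants,
// declared locally so this sketch compiles without Parquet on the classpath.
enum CodecSketch { UNCOMPRESSED, SNAPPY, GZIP, ZSTD }

final class WriterCodecMapping {
    private WriterCodecMapping() {}

    // Translates the validated option value into a codec; the "zstd" case is the new branch.
    static CodecSketch toCodec(String option) {
        switch (option.toLowerCase(Locale.ROOT)) {
            case "none":
                return CodecSketch.UNCOMPRESSED;
            case "snappy":
                return CodecSketch.SNAPPY;
            case "gzip":
                return CodecSketch.GZIP;
            case "zstd":
                return CodecSketch.ZSTD;
            default:
                throw new IllegalArgumentException("Unsupported compression: " + option);
        }
    }
}
```

Presumably both changes have to go in together: accepting "zstd" in the option without a matching writer branch would only move the failure from option validation to write time.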
I modified the kernel and compiled a build you can try: rongfengliang/dremio-parquet-zstd on GitHub.
Is it possible to merge it?
As far as I know, that doesn't seem possible.
We are currently doing this so that we can read ZStandard-compressed Parquet files.
Basically, all that’s needed is two native libraries: a build of libhadoop.so compiled with ZSTD support, and the ZSTD native library itself.
We run Dremio in Docker, so we do it like this:
```yaml
---
version: '3'
services:
  executor-0:
    # ...
    volumes:
      # ...
      - /opt/dremio/lib/native/libhadoop.so.3.3.2:/opt/dremio/lib/libhadoop.so
      - /opt/dremio/lib/native/libzstd.so.1.5.2:/opt/dremio/lib/libzstd.so
      - /opt/dremio/lib/hadoop-common-3.3.2-dremio-202207041927090255-61c2bd1.jar:/opt/dremio/jars/3rdparty/hadoop-common-3.3.2-dremio-202207041927090255-61c2bd1.jar
```
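After starting the containers, it can help to confirm that the mounted libraries actually landed where the executor expects them. Below is a small pre-flight sketch, assuming /opt/dremio/lib (the mount target in the compose file above) is the directory to check. NativeLibCheck is a hypothetical helper, and it only checks that the files exist, not that libhadoop was built with ZSTD support:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check: for each file-name prefix, looks for a matching
// regular file in any of the given directories. Presence only; it cannot tell
// whether libhadoop was compiled with ZSTD support.
final class NativeLibCheck {
    private NativeLibCheck() {}

    // Returns the prefixes that were NOT found in any of the directories.
    static List<String> missing(List<File> dirs, String... prefixes) {
        List<String> missing = new ArrayList<>();
        for (String prefix : prefixes) {
            boolean found = false;
            for (File dir : dirs) {
                File[] entries = dir.listFiles();
                if (entries == null) {
                    continue; // not a directory, or not readable
                }
                for (File f : entries) {
                    if (f.isFile() && f.getName().startsWith(prefix)) {
                        found = true;
                        break;
                    }
                }
                if (found) {
                    break;
                }
            }
            if (!found) {
                missing.add(prefix);
            }
        }
        return missing;
    }
}
```

For example, NativeLibCheck.missing(Collections.singletonList(new File("/opt/dremio/lib")), "libhadoop.so", "libzstd.so") returns an empty list when both files are present. If the Hadoop command-line tools are available, hadoop checknative -a reports whether the native Hadoop library was built with zstd support.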
One word of caution, though: there’s a bug in the Dremio CE Parquet reader library (which is free, but its source isn’t open) that doesn’t release the decompressor after use. It makes no difference for Snappy, but the ZStandard decompressor uses native memory, which is then leaked. This has caused the OOM killer to kill our executors several times per day (~350 GB of memory leaked before that happens).
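The leak described above comes down to a decompressor whose native buffers are never released. The following sketch illustrates the failure mode and the deterministic fix with a hypothetical FakeNativeDecompressor that simulates native memory with a counter; it does not reproduce the closed-source reader. In Hadoop terms the release would roughly correspond to Decompressor.end() or handing the instance back via CodecPool.returnDecompressor, but which of these the reader should call is an assumption:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a decompressor holding off-heap ("native") buffers.
// Native memory is simulated with a counter so the leak is observable in plain Java.
final class FakeNativeDecompressor implements AutoCloseable {
    static final AtomicLong NATIVE_BYTES = new AtomicLong();

    private final long size;
    private boolean closed;

    FakeNativeDecompressor(long size) {
        this.size = size;
        NATIVE_BYTES.addAndGet(size); // simulated native allocation
    }

    byte[] decompress(byte[] input) {
        if (closed) {
            throw new IllegalStateException("already released");
        }
        return input.clone(); // placeholder for real decompression
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            NATIVE_BYTES.addAndGet(-size); // simulated native free, exactly once
        }
    }
}

final class PageReaderSketch {
    private PageReaderSketch() {}

    // Buggy pattern: the decompressor is never released, so every page leaks its buffers.
    static byte[] readPageLeaky(byte[] page) {
        FakeNativeDecompressor d = new FakeNativeDecompressor(1 << 20);
        return d.decompress(page);
    }

    // Fixed pattern: try-with-resources releases the buffers even if decompression throws.
    static byte[] readPage(byte[] page) {
        try (FakeNativeDecompressor d = new FakeNativeDecompressor(1 << 20)) {
            return d.decompress(page);
        }
    }
}
```

With the fixed pattern the counter returns to its starting value after every page; with the leaky one it grows by one buffer per page read.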
The hadoop-common-3.3.2 jar mapped in the compose file above contains a somewhat hacky fix of mine that releases the native memory in a finalizer, so the leak is somewhat contained. I have confirmed that the fix contains the issue, so I’ll open a separate thread about it (or maybe someone from Dremio will notice it here). If nobody at Dremio picks it up, I’ll work on a cleaner, more permanent fix.
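The finalizer-based containment can be sketched as follows. Object.finalize() is deprecated in current JDKs, so this sketch swaps in java.lang.ref.Cleaner (Java 9+), its recommended replacement, rather than reproducing the actual patch in the jar; CleanedDecompressor and the counter standing in for native memory are hypothetical:

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical decompressor whose simulated native buffers are freed either by an
// explicit close() or, as a safety net, after the object becomes unreachable.
final class CleanedDecompressor implements AutoCloseable {
    static final AtomicLong NATIVE_BYTES = new AtomicLong();
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup state must not reference the CleanedDecompressor itself,
    // otherwise the object never becomes unreachable and is never cleaned.
    private static final class NativeState implements Runnable {
        private final long size;

        NativeState(long size) {
            this.size = size;
            NATIVE_BYTES.addAndGet(size); // simulated native allocation
        }

        @Override
        public void run() {
            NATIVE_BYTES.addAndGet(-size); // simulated native free
        }
    }

    private final Cleaner.Cleanable cleanable;

    CleanedDecompressor(long size) {
        this.cleanable = CLEANER.register(this, new NativeState(size));
    }

    // Explicit release. Cleanable.clean() runs the action at most once, so a later
    // GC-triggered cleanup cannot free the same memory twice.
    @Override
    public void close() {
        cleanable.clean();
    }
}
```

As the post says, this only contains the leak: a GC-triggered cleanup runs whenever the collector happens to process the object, and the Java heap may be nowhere near full while native memory is running out. Releasing the decompressor explicitly after use remains the permanent fix.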