Is it possible to support parquet files compressed via ZSTD?
Currently not supported. Only snappy, gzip, and none are supported:
EnumeratedStringValidator PARQUET_WRITER_COMPRESSION_TYPE_VALIDATOR = new EnumeratedStringValidator(
PARQUET_WRITER_COMPRESSION_TYPE, "snappy", "snappy", "gzip", "none");
Is there any way to add it?
If you want to support ZSTD, you need to modify sabot/kernel/src/main/java/com/dremio/exec/store/parquet/ParquetRecordWriter.java
and also sabot/kernel/src/main/java/com/dremio/exec/ExecConstants.java.
I modified the kernel and compiled a build that you can try: GitHub - rongfengliang/dremio-parquet-zstd
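For anyone wondering what that change amounts to, here is a rough, self-contained sketch. It is not the actual Dremio code: `AllowedValues` is a hypothetical stand-in for `EnumeratedStringValidator` (assuming its arguments are a default value followed by the allowed values), `Codec` is a local stand-in for Parquet's codec enum, and the `switch` in `toCodec` is only a guess at the kind of mapping the record writer performs.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class ZstdOptionSketch {
    // Local stand-in for Parquet's codec enum (hypothetical, for illustration only).
    enum Codec { SNAPPY, GZIP, ZSTD, UNCOMPRESSED }

    // Hypothetical stand-in for EnumeratedStringValidator: a default plus allowed values.
    static final class AllowedValues {
        private final String defaultValue;
        private final Set<String> allowed;

        AllowedValues(String defaultValue, String... allowed) {
            this.defaultValue = defaultValue;
            this.allowed = new LinkedHashSet<>(Arrays.asList(allowed));
        }

        // Returns the normalized option value, or throws if it is not in the allowed set.
        String validate(String value) {
            String v = (value == null) ? defaultValue : value.toLowerCase(Locale.ROOT);
            if (!allowed.contains(v)) {
                throw new IllegalArgumentException(
                    "Unsupported compression type '" + value + "', expected one of " + allowed);
            }
            return v;
        }
    }

    // Step 1 (the ExecConstants side): add "zstd" to the list of allowed values.
    static final AllowedValues COMPRESSION_TYPE =
        new AllowedValues("snappy", "snappy", "gzip", "zstd", "none");

    // Step 2 (the ParquetRecordWriter side): map the new option value to a codec.
    static Codec toCodec(String option) {
        switch (COMPRESSION_TYPE.validate(option)) {
            case "snappy": return Codec.SNAPPY;
            case "gzip":   return Codec.GZIP;
            case "zstd":   return Codec.ZSTD;
            case "none":   return Codec.UNCOMPRESSED;
            default: throw new IllegalStateException("Validated value without a codec mapping");
        }
    }

    public static void main(String[] args) {
        System.out.println(toCodec("zstd"));
    }
}
```

The point of the sketch is that both places have to change together: if only the validator is extended, the option is accepted but the writer has no codec to map it to.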
Is it possible to merge it?
As far as I know, it seems impossible.
We are currently doing this so that we can read ZStandard-compressed Parquet files.
Basically, all that's needed is adding two native libraries: a version of libhadoop.so that is compiled with ZSTD support, and the ZSTD native library.
We are running Dremio using docker, so we do it like this:
---
version: '3'
services:
  executor-0:
    ...
    volumes:
      ...
      - /opt/dremio/lib/native/libhadoop.so.3.3.2:/opt/dremio/lib/libhadoop.so
      - /opt/dremio/lib/native/libzstd.so.1.5.2:/opt/dremio/lib/libzstd.so
      - /opt/dremio/lib/hadoop-common-3.3.2-dremio-202207041927090255-61c2bd1.jar:/opt/dremio/jars/3rdparty/hadoop-common-3.3.2-dremio-202207041927090255-61c2bd1.jar
One word of caution, though. There's a bug in the Dremio CE Parquet reader library (which is free, but the source isn't open): it doesn't release the decompressor after use. That makes no difference for Snappy, but the ZStandard decompressor uses native memory, which is then leaked. This has caused our executors to get killed by the OOM killer several times per day (~350GB of memory leaked before that happens).
That's why hadoop-common-3.3.2 is mapped in the above: it contains a somewhat hacky fix I made that releases the native memory in a finalizer, so that the leak is somewhat contained. I have confirmed that the fix contains the issue, so I'll make a thread on it specifically (or maybe someone from Dremio will notice it here). If no one at Dremio picks it up, I'll do a nicer and more permanent fix for it.
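To illustrate the idea (this is not the actual patch), here is a minimal sketch of a finalizer used as a safety net for native memory. Everything in it is hypothetical: `NativeDecompressor` is a stand-in class, and the "native" allocation is simulated with a counter so the example runs without any native library. The `decompressOnce` method shows what a more permanent fix would presumably look like, with the caller releasing the decompressor explicitly in a `finally` block.

```java
import java.util.concurrent.atomic.AtomicLong;

public class FinalizerReleaseSketch {
    // Simulated count of native bytes currently held; stands in for real off-heap memory.
    static final AtomicLong NATIVE_BYTES = new AtomicLong();

    // Hypothetical stand-in for a decompressor that owns a native context.
    static class NativeDecompressor {
        private static final long CONTEXT_SIZE = 1L << 20;
        private boolean released;

        NativeDecompressor() {
            NATIVE_BYTES.addAndGet(CONTEXT_SIZE); // "allocate" the native context
        }

        // Explicit release; idempotent, so calling it twice frees the memory only once.
        synchronized void end() {
            if (!released) {
                released = true;
                NATIVE_BYTES.addAndGet(-CONTEXT_SIZE);
            }
        }

        // Safety net: if the caller never calls end(), the memory is released when
        // the garbage collector finalizes this object. Timing is not guaranteed,
        // which is why this only contains a leak rather than fixing it.
        @Override
        @SuppressWarnings({"deprecation", "removal"})
        protected void finalize() throws Throwable {
            try {
                end();
            } finally {
                super.finalize();
            }
        }
    }

    // Shape of a cleaner fix: release deterministically as soon as the work is done.
    static long decompressOnce() {
        NativeDecompressor d = new NativeDecompressor();
        try {
            return NATIVE_BYTES.get(); // bytes held while the decompressor is in use
        } finally {
            d.end();
        }
    }

    public static void main(String[] args) {
        System.out.println("held during use: " + decompressOnce());
        System.out.println("held after use:  " + NATIVE_BYTES.get());
    }
}
```

Finalizers run whenever the GC gets around to them, and `finalize()` is deprecated in recent Java versions, so this pattern only bounds the leak. Releasing in the reader itself is the real fix.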