Dremio UTF-8 ko_KR charset support: error when inserting UTF-8 Korean text

Hello, everyone!

I am testing Dremio version 24.2.

I created an Iceberg table in MinIO and tried to insert data.

The insert fails when a value contains UTF-8 Korean text.


  1. The DDL is
CREATE TABLE minio."dremio-bucket".test2 (username VARCHAR not null, addr1 VARCHAR null, tel varchar null, age int)

  2. The DML is
insert into minio."dremio-bucket".test2 (username, addr1, tel, age) values ('netbee', '파주시', '01037388009',46)
  3. The server error message is
2023-12-04 16:14:02,298 [1a920534-d978-5753-cb8d-56d633015700:foreman] INFO  c.d.exec.work.foreman.AttemptManager - 1a920534-d978-5753-cb8d-56d633015700: State change requested ENQUEUED --> FAILED, Exception com.dremio.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: java.lang.AssertionError: to VARCHAR(3) from _UTF-8'???':VARCHAR(3) CHARACTER SET "UTF-8" COLLATE "UTF-8$en_US$primary"
2023-12-04 16:14:02,299 [1a920534-d978-5753-cb8d-56d633015700:foreman] ERROR c.d.exec.work.foreman.AttemptManager - AssertionError: to VARCHAR(3) from _UTF-8'???':VARCHAR(3) CHARACTER SET "UTF-8" COLLATE "UTF-8$en_US$primary"
com.dremio.common.exceptions.UserException: AssertionError: to VARCHAR(3) from _UTF-8'???':VARCHAR(3) CHARACTER SET "UTF-8" COLLATE "UTF-8$en_US$primary"
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:926)
at com.dremio.exec.work.foreman.AttemptManager$AttemptResult.close(AttemptManager.java:765)
at com.dremio.exec.work.foreman.AttemptManager.moveToState(AttemptManager.java:888)
at com.dremio.exec.work.foreman.AttemptManager.run(AttemptManager.java:513)
at com.dremio.common.concurrent.ContextMigratingExecutorService$1.run(ContextMigratingExecutorService.java:69)
at com.dremio.context.RequestContext.run(RequestContext.java:109)
at com.dremio.common.concurrent.ContextMigratingExecutorService.lambda$decorate$4(ContextMigratingExecutorService.java:226)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: com.dremio.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: java.lang.AssertionError: to VARCHAR(3) from _UTF-8'???':VARCHAR(3) CHARACTER SET "UTF-8" COLLATE "UTF-8$en_US$primary"
at com.dremio.exec.work.foreman.AttemptManager.run(AttemptManager.java:514)
... 6 common frames omitted
Caused by: java.util.concurrent.ExecutionException: java.lang.AssertionError: to VARCHAR(3) from _UTF-8'???':VARCHAR(3) CHARACTER SET "UTF-8" COLLATE "UTF-8$en_US$primary"
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at com.dremio.exec.work.foreman.AttemptManager.run(AttemptManager.java:467)
... 6 common frames omitted
Caused by: java.lang.AssertionError: to VARCHAR(3) from _UTF-8'???':VARCHAR(3) CHARACTER SET "UTF-8" COLLATE "UTF-8$en_US$primary"
at com.dremio.exec.planner.logical.ValuesRel.verifyRowType(ValuesRel.java:168)
at com.dremio.exec.planner.logical.ValuesRel.<init>(ValuesRel.java:75)
at com.dremio.exec.planner.logical.ValuesRule.onMatch(ValuesRule.java:38)
at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:214)
at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:650)
at com.dremio.exec.planner.DremioVolcanoPlanner.findBestExp(DremioVolcanoPlanner.java:113)
at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:321)
at com.dremio.exec.planner.sql.handlers.PlannerUtil.lambda$transform$1(PlannerUtil.java:185)
at com.dremio.exec.planner.sql.handlers.PlannerUtil.doTransform(PlannerUtil.java:209)
at com.dremio.exec.planner.sql.handlers.PlannerUtil.transform(PlannerUtil.java:198)
at com.dremio.exec.planner.sql.handlers.DrelTransformer.convertToDrel(DrelTransformer.java:146)
at com.dremio.exec.planner.sql.handlers.query.DataAdditionCmdHandler.convertToDrel(DataAdditionCmdHandler.java:295)
at com.dremio.exec.planner.sql.handlers.query.DataAdditionCmdHandler.getPlan(DataAdditionCmdHandler.java:218)
at com.dremio.exec.planner.sql.handlers.query.InsertTableHandler.doInsert(InsertTableHandler.java:110)
at com.dremio.exec.planner.sql.handlers.query.InsertTableHandler.getPlan(InsertTableHandler.java:76)
at com.dremio.exec.planner.sql.handlers.commands.HandlerToExec.plan(HandlerToExec.java:59)
at com.dremio.exec.work.foreman.AttemptManager.plan(AttemptManager.java:565)
at com.dremio.exec.work.foreman.AttemptManager.lambda$run$4(AttemptManager.java:462)
at com.dremio.service.commandpool.ReleasableBoundCommandPool.lambda$getWrappedCommand$3(ReleasableBoundCommandPool.java:140)
at com.dremio.service.commandpool.CommandWrapper.run(CommandWrapper.java:70)
at com.dremio.context.RequestContext.run(RequestContext.java:109)
at com.dremio.common.concurrent.ContextMigratingExecutorService.lambda$decorate$4(ContextMigratingExecutorService.java:226)
at com.dremio.common.concurrent.ContextMigratingExecutorService$ComparableRunnable.run(ContextMigratingExecutorService.java:206)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 common frames omitted

Thanks for raising this. There are a couple of workarounds that may work for now. The easiest is to prefix the string literal with a charset marker so that it is treated as UTF-8. On your insert, that looks like this:

insert into minio."dremio-bucket".test2 (username, addr1, tel, age) values ('netbee', _utf8'파주시', '01037388009',46);

Notice the _utf8 prefix on the Korean string.
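If many existing statements need the marker, the prefixing could be scripted. Below is a minimal sketch in Python, assuming literals are single-quoted with '' as the escaped quote; the helper name mark_utf8_literals is hypothetical and not part of Dremio. It is a regex pass, not a SQL parser, so quotes inside comments or unusual identifiers are not handled.

```python
import re

# A single-quoted SQL string literal, optionally preceded by a charset
# introducer such as _utf8. Inside the literal, '' is an escaped quote.
_LITERAL = re.compile(r"(?P<prefix>_[A-Za-z0-9-]+)?'(?P<body>(?:[^']|'')*)'")


def mark_utf8_literals(sql):
    """Prefix _utf8 to string literals that contain non-ASCII characters.

    Hypothetical helper: literals that already carry an introducer, and
    pure-ASCII literals, are returned unchanged.
    """

    def fix(match):
        has_prefix = match.group("prefix") is not None
        non_ascii = any(ord(ch) > 127 for ch in match.group("body"))
        if non_ascii and not has_prefix:
            return "_utf8" + match.group(0)
        return match.group(0)

    return _LITERAL.sub(fix, sql)


stmt = (
    "insert into minio.\"dremio-bucket\".test2 (username, addr1, tel, age) "
    "values ('netbee', '파주시', '01037388009',46)"
)
print(mark_utf8_literals(stmt))
```

Running the helper on its own output leaves the statement unchanged, so it should be safe to apply to a script that already contains some marked literals.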

Another option, if you don’t want to add the _utf8 marker to your insert statements, is to set the default charset to UTF-16LE in your dremio-env file. This is the line you need to add:

DREMIO_JAVA_SERVER_EXTRA_OPTS="-Dsaffron.default.charset=UTF-16LE"
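One thing to watch: dremio-env is a shell file, so if it already assigns DREMIO_JAVA_SERVER_EXTRA_OPTS, a second assignment replaces the first rather than adding to it. A rough sketch in Python of merging the flag into an existing value instead; the function name with_charset_flag is hypothetical, and it only recognises an uncommented, double-quoted assignment at the start of a line.

```python
import re

FLAG = "-Dsaffron.default.charset=UTF-16LE"
VAR = "DREMIO_JAVA_SERVER_EXTRA_OPTS"

# An uncommented assignment of the form VAR="..." on a line of its own.
_ASSIGN = re.compile(rf'^{VAR}="(?P<value>[^"]*)"[ \t]*$', re.MULTILINE)


def with_charset_flag(env_text):
    """Return dremio-env text in which VAR contains FLAG (hypothetical helper)."""
    matches = list(_ASSIGN.finditer(env_text))
    if not matches:
        # No active assignment found: append a new line with just the flag.
        sep = "" if env_text.endswith("\n") or not env_text else "\n"
        return f'{env_text}{sep}{VAR}="{FLAG}"\n'
    # In a shell file the last assignment wins, so that is the one to edit.
    match = matches[-1]
    opts = match.group("value").split()
    if FLAG in opts:
        return env_text  # already configured, nothing to do
    merged = " ".join(opts + [FLAG])
    return env_text[: match.start()] + f'{VAR}="{merged}"' + env_text[match.end():]


# -Dexample.flag=1 stands in for whatever options are already set.
print(with_charset_flag('DREMIO_JAVA_SERVER_EXTRA_OPTS="-Dexample.flag=1"\n'))
```

The function works on text only; reading and writing the actual file is left out on purpose so the result can be reviewed first.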

To make this update, stop your Dremio instance, edit the dremio-env file, and then restart the instance. This should allow you to run your original insert statement without the workaround mentioned above.
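As a rough illustration of why the charset matters: the '???' in the error has one character per Korean character, which would be consistent with the literal having passed through a charset that cannot represent Hangul. That reading is an assumption, since the log does not say which charset was involved; ISO-8859-1 below is chosen only as an example of such a charset.

```python
text = "파주시"

# A single-byte charset such as ISO-8859-1 has no Hangul, so a lossy
# conversion substitutes one '?' per character it cannot encode.
lossy = text.encode("iso-8859-1", errors="replace").decode("iso-8859-1")
print(lossy)

# UTF-8 and UTF-16LE can both represent the string, so it round-trips intact.
for charset in ("utf-8", "utf-16-le"):
    assert text.encode(charset).decode(charset) == text
```

Both workarounds above amount to making sure the literal is handled in a charset that can hold it, either per literal with _utf8 or server-wide with the default charset setting.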

I hope this helps,

Dan
