CTAS Folder already exists at path and JSON schema learning

Hello,

I am running Dremio Build 4.2.1-202004111451200819-0c3ecaea Community Edition (11/04/2020 16:56:30) in Docker on Windows 10.

I get a "Folder already exists at path" error on a CTAS query, even if I delete the target directory before launching the query.

I'm trying to create Parquet files from a SELECT on a VDS over a physical repository of JSON files (approx. 35000 .gz-compressed files). Dremio needs multiple attempts to learn the correct schema over the JSON files, and I suspect the problem comes from a link between "This query was attempted x times due to schema learning x-1" and "Folder already exists at path".

Any advice?

The query is:

create table sata."elsevier_query".thomas as SELECT * FROM "@thomasm".nifi_crossref."vds_nifi_output" where publisher='Elsevier BV' and type='journal-article'

Thanks

Thomas

2020-05-27 17:04:30,341 [213188ea-fffd-7fdf-69da-19a7c216ce00/9:foreman-planning] ERROR c.d.s.commandpool.CommandWrapper - command 213188ea-fffd-7fdf-69da-19a7c216ce00/9:foreman-planning failed
com.dremio.common.exceptions.UserException: Folder already exists at path: sata.elsevier.query.
    at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:802)
    at com.dremio.exec.store.dfs.FileSystemPlugin.createNewTable(FileSystemPlugin.java:1302)
    at com.dremio.exec.catalog.CatalogImpl.createNewTable(CatalogImpl.java:379)
    at com.dremio.exec.catalog.SourceAccessChecker.createNewTable(SourceAccessChecker.java:161)
    at com.dremio.exec.catalog.DelegatingCatalog.createNewTable(DelegatingCatalog.java:150)
    at com.dremio.exec.planner.sql.handlers.query.DataAdditionCmdHandler.convertToDrel(DataAdditionCmdHandler.java:228)
    at com.dremio.exec.planner.sql.handlers.query.DataAdditionCmdHandler.getPlan(DataAdditionCmdHandler.java:155)
    at com.dremio.exec.planner.sql.handlers.query.CreateTableHandler.getPlan(CreateTableHandler.java:53)
    at com.dremio.exec.planner.sql.handlers.commands.HandlerToExec.plan(HandlerToExec.java:59)
    at com.dremio.exec.work.foreman.AttemptManager.plan(AttemptManager.java:392)
    at com.dremio.exec.work.foreman.AttemptManager.lambda$run$1(AttemptManager.java:310)
    at com.dremio.service.commandpool.CommandWrapper.run(CommandWrapper.java:62)
    at com.dremio.common.concurrent.ContextMigratingExecutorService.lambda$decorate$3(ContextMigratingExecutorService.java:192)
    at com.dremio.common.concurrent.ContextMigratingExecutorService$ComparableRunnable.run(ContextMigratingExecutorService.java:174)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

We have the same problem; a feature request has been submitted.

What we are using successfully as a workaround: insert a document that contains the full schema into your dataset and make sure that it is read early. It must be among the first 4096 documents, because they determine the schema of the whole dataset. So, depending on your structure, create either a file aaaaa.json or a folder aaaa/ with the prepared document in it. I hope the idea is clear and that it helps.
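
In case it is useful, here is a rough Python sketch of how such a prepared document could be generated from a sample of the existing files. It is only an illustration, not something Dremio provides: the helper names (read_docs, merge_docs, build_seed, write_seed) are made up for this example, and it assumes each file holds one JSON document per line. Type conflicts between documents are not resolved; the first non-null value simply wins.

```python
import gzip
import json
from pathlib import Path


def read_docs(path):
    """Yield JSON documents from a plain or .gz file.

    Assumption: one JSON document per line (newline-delimited JSON).
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def merge_docs(base, extra):
    """Recursively merge two JSON values, keeping every field seen in either."""
    if isinstance(base, dict) and isinstance(extra, dict):
        merged = dict(base)
        for key, value in extra.items():
            merged[key] = merge_docs(merged[key], value) if key in merged else value
        return merged
    if isinstance(base, list) and isinstance(extra, list):
        # Collapse to one representative element carrying all nested fields.
        items = base + extra
        if not items:
            return []
        representative = items[0]
        for item in items[1:]:
            representative = merge_docs(representative, item)
        return [representative]
    # Prefer a non-null value so the field carries a concrete type.
    return extra if base is None else base


def build_seed(paths):
    """Merge all documents found in `paths` into one full-schema document."""
    seed = {}
    for path in paths:
        for doc in read_docs(path):
            seed = merge_docs(seed, doc)
    return seed


def write_seed(seed, folder, name="aaaaa.json"):
    """Write the seed as a single-line JSON file; the name is chosen to sort early."""
    target = Path(folder) / name
    target.write_text(json.dumps(seed) + "\n", encoding="utf-8")
    return target
```

You would call build_seed on a sample of files that together cover every field, then write_seed into the dataset folder. Note that the seed document will also show up as a row in query results, so you may want to give it a recognizable marker value and filter it out in the VDS.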

Best, Tim