Hi!
I installed Dremio CE (v22.0.0-202206221430090603-1fa4049f) on an EC2 instance.
I configured an IAM role (instance profile) granting read and write permissions on an S3 bucket, as described in the Dremio docs.
I set this parameter in dremio.conf: dist: "dremioS3:///my-private-bucket/accel"
I configured core-site.xml with this content:
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.dremioS3.impl</name>
    <description>The FileSystem implementation. Must be set to com.dremio.plugins.s3.store.S3FileSystem</description>
    <value>com.dremio.plugins.s3.store.S3FileSystem</value>
  </property>
  <property>
    <name>fs.s3a.aws.credentials.provider</name>
    <description>The credential provider type.</description>
    <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
  </property>
</configuration>
I connected a MongoDB source and wrote some queries…
After I created the reflections (the files do show up in S3), my queries stopped returning results!
When I click the “Preview” button, I get data. Nice…
For the same query, when I click “Run”, the result panel “flashes” briefly and then shows the “No results” message.
I noticed this only happens when the query uses FLATTEN. Here is my query:
SELECT code, date_start, nested_0.group_platform.transactions AS platform_transactions, nested_0.group_platform.sessions AS platform_sessions, nested_0.group_platform."_id" AS platform_name
FROM (
SELECT code, date_start, FLATTEN(group_platform) AS group_platform
FROM MySource.summaries AS summaries
) nested_0
With the default reflection storage (the local data folder), this didn’t happen: the reflections were created and the data was shown correctly (more than 600k rows).
I’m new to Dremio and this is driving me nuts!
What could be causing this? Maybe I should just store reflections on EBS…
You might be hitting a bug where zero records are returned if the “group_platform” array column contains nulls. This bug has been fixed in version 23.
In the query profile, if you compare the FLATTEN and PROJECT operators, you’ll see that in the “Run” profile the PROJECT receives no records. The “Preview” profile is limited to 10,000 records, so the preview probably didn’t encounter any null values in the “group_platform” column.
As a workaround, you can try rewriting the SQL with a left outer join, similar to:
SELECT x.c1, x.c2, x.c3, y.flat_c4
FROM "flatten_result" x
LEFT OUTER JOIN (SELECT c1, c2, c3, FLATTEN(c4) AS flat_c4 FROM "flatten_result") y ON x.c1 = y.c1
I don’t think this problem has anything to do with reflections or with whether dremio.iceberg.enabled is set. (Both of your profiles show the same reflection materialization being used, stored in Iceberg table format.)
You definitely don’t want to disable dremio.iceberg.enabled; otherwise the unlimited splits feature will be disabled too. See Dremio
Hello!
I removed all reflections, enabled dremio.iceberg.enabled again and recreated them. Now they’re working!
One thing I noticed: if I create a reflection on a query that uses FLATTEN, it shows “Reflection cannot accelerate. Reflection in progress”.
I changed my query to use a LEFT OUTER JOIN as you suggested in your answer, but the result is the same.
When I create reflections on other VDSs (with no FLATTEN anywhere in the source hierarchy), they work well.
P.S.: For another test I limited the data to only a few rows, with no NULL fields, and the reflection still doesn’t accelerate. When I remove the FLATTEN field, it works.
SELECT "_id", nested_0.group_items.counter_c AS group_items_counter_c, nested_0.group_items.counter_b AS group_items_counter_b, nested_0.group_items.counter_a AS group_items_counter_a, nested_0.group_items."_id" AS group_items_id
FROM (
SELECT "_id", flatten(group_items) AS group_items
FROM "@user".flatten_dataset AS flatten_dataset
) nested_0
Your REFRESH REFLECTION profile looks good. You can see 3 records written by the PARQUET_WRITER, with the ICEBERG_MANIFEST_WRITER above it.
I tried out your JSON file, built a raw reflection on it, and used the reflection to accelerate queries without any problems. Everything works fine on my end.
I think you should query sys.reflections and sys.materializations to see if you can get any more information on why this reflection is in the CANNOT_ACCELERATE_SCHEDULED state. I can see you uploaded the dataset into your home folder; by design, datasets there have a refresh policy of never refresh and never expire. The “Reflection in progress” message is misleading; that issue is fixed in v23.
select *
from sys.materializations inner join sys.reflections on sys.materializations.reflection_id = sys.reflections.reflection_id
where sys.reflections.reflection_id = 'ef0a5252-b4bb-431f-8f00-631af15eb930'
@almirb There could have been an issue either during reflection creation or during query substitution. Can we get the server.log covering the time this reflection was created (“2022-08-15 19:22:31”) and the time of query execution (“2022-08-16 00:59:22”)? Please also provide a PARQUET or JSON download of “sys.reflections” and of “select * from sys.materializations”.
I did a fresh install from the tarball and connected distributed storage to one of my S3 buckets.
Next, I uploaded the JSON file as a PDS and then created the VDS with the query below:
SELECT "_id", nested_0.group_items.counter_c AS group_items_counter_c, nested_0.group_items.counter_b AS group_items_counter_b, nested_0.group_items.counter_a AS group_items_counter_a, nested_0.group_items."_id" AS group_items_id
FROM (
SELECT "_id", flatten(group_items) AS group_items
FROM "@almir".flatten_dataset AS flatten_dataset
) nested_0
… and tried to query the VDS.
The query wasn’t accelerated (profile available in zip file below).
I think these are all the files needed for debugging (including my S3 bucket contents): dremio_debug_files.zip (66.9 KB)
I didn’t mention it before, but I’m using OpenJDK 11:
openjdk version "11.0.16" 2022-07-19
OpenJDK Runtime Environment (build 11.0.16+8-post-Ubuntu-0ubuntu120.04)
OpenJDK 64-Bit Server VM (build 11.0.16+8-post-Ubuntu-0ubuntu120.04, mixed mode, sharing)
I went through the files you uploaded. Here is what I can see: the reflection materialization was clearly built successfully, since sys.materializations shows it in the DONE state with data (bytes) recorded. However, in sys.reflections the status is CANNOT_ACCELERATE_SCHEDULED. This doesn’t happen often. It means you do have a successful materialization, but it is not available for query acceleration/substitution.
In your server.log I see this:
01:00 WARN: [kryo] Unable to load class org.apache.calcite.sql.type.OperandTypes$$Lambda$250/0x00000008403d4040 with kryo’s
Kryo is the serializer we use for reflection logical plans. In Dremio EE we use a different serializer, which might explain why I can’t reproduce your error: on my end, I am able to accelerate queries with the reflection.
So, let’s see whether you are able to use the other serializer. It is specified in sabot-module.conf. In server.log at server start, you can see where we scan the classpath for all instances of this config file. You can create a file with the same name and put it somewhere on your JVM classpath.
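A note on the kryo warning quoted above, offered as an interpretation rather than a diagnosis confirmed in this thread: the class it cannot load, OperandTypes$$Lambda$250/0x00000008403d4040, is a lambda class that the JVM generates at runtime. Such a class has no class file behind it, so a serializer that records the class name and later resolves it with Class.forName cannot find it again. The minimal Java sketch below (LambdaClassNames and its method are hypothetical names for illustration; it uses neither Kryo nor Dremio) shows the effect on a lambda of its own.

```java
import java.util.function.Supplier;

// Hypothetical illustration: a regular class can be looked up by name,
// but the runtime-generated class behind a lambda cannot.
public class LambdaClassNames {

    // Returns true if Class.forName can resolve the name with the given loader.
    static boolean isLoadableByName(String className, ClassLoader loader) {
        try {
            // initialize=false: resolve the class only, do not run static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Supplier<String> lambda = () -> "hello";
        // On JDK 11 this looks like "LambdaClassNames$$Lambda$1/0x..."
        String lambdaName = lambda.getClass().getName();
        ClassLoader loader = LambdaClassNames.class.getClassLoader();

        System.out.println("java.lang.String loadable by name: "
                + isLoadableByName("java.lang.String", loader));
        System.out.println(lambdaName + " loadable by name: "
                + isLoadableByName(lambdaName, loader));
    }
}
```

On OpenJDK 11 this should report true for java.lang.String and false for the lambda's generated name. That is consistent with a plan that can be serialized but not read back, which would match a materialization that is DONE yet unusable for substitution; it does not prove that this is what happened inside Dremio.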
I created the configuration as you said. Here are my findings.
After adding the configuration you described, I got this error in server.log when Dremio tried to load the class:
2022-08-23 12:32:59,840 [main] INFO com.dremio.common.config.SabotConfig - User Error Occurred [ErrorId: e1bb3df5-f170-4fd6-a241-1c63933daa65]
com.dremio.common.exceptions.UserException: Failure while attempting to load instance of the class of type com.dremio.exec.planner.serialization.RelSerializerFactory requested at path dremio.planning.serializer.
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:890)
at com.dremio.common.config.SabotConfig.getInstance(SabotConfig.java:69)
at com.dremio.exec.planner.serialization.RelSerializerFactory.getFactory(RelSerializerFactory.java:78)
at com.dremio.exec.planner.serialization.RelSerializerFactory.getPlanningFactory(RelSerializerFactory.java:65)
at com.dremio.exec.planner.sql.SqlConverter.getSerializerFactory(SqlConverter.java:239)
at com.dremio.exec.planner.acceleration.MaterializationExpander.deserializePlan(MaterializationExpander.java:245)
at com.dremio.exec.planner.acceleration.MaterializationExpander.expand(MaterializationExpander.java:70)
at com.dremio.exec.planner.acceleration.MaterializationDescriptor.getMaterializationFor(MaterializationDescriptor.java:160)
at com.dremio.service.reflection.ReflectionServiceImpl$CacheHelperImpl.expand(ReflectionServiceImpl.java:1136)
at com.dremio.service.reflection.ReflectionServiceImpl$CacheHelperImpl.expand(ReflectionServiceImpl.java:1117)
at com.dremio.service.reflection.MaterializationCache.updateEntry(MaterializationCache.java:222)
at com.dremio.service.reflection.MaterializationCache.safeUpdateEntry(MaterializationCache.java:212)
at com.dremio.service.reflection.MaterializationCache.updateCache(MaterializationCache.java:145)
at com.dremio.service.reflection.MaterializationCache.compareAndSetCache(MaterializationCache.java:109)
at com.dremio.service.reflection.MaterializationCache.refresh(MaterializationCache.java:102)
at com.dremio.service.reflection.ReflectionServiceImpl.start(ReflectionServiceImpl.java:303)
at com.dremio.service.SingletonRegistry$AbstractServiceReference.start(SingletonRegistry.java:137)
at com.dremio.service.ServiceRegistry.start(ServiceRegistry.java:88)
at com.dremio.service.SingletonRegistry.start(SingletonRegistry.java:33)
at com.dremio.dac.daemon.DACDaemon.startServices(DACDaemon.java:196)
at com.dremio.dac.daemon.DACDaemon.init(DACDaemon.java:202)
at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:104)
Caused by: java.lang.ClassNotFoundException: com.dremio.exec.planner.serializer.ProtoRelSerializerFactory
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:315)
at com.dremio.common.config.SabotConfig.getInstance(SabotConfig.java:64)
... 20 common frames omitted
2022-08-23 12:32:59,903 [main] INFO c.d.s.r.ReflectionServiceImpl - Reflections masterInit
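The Caused by: java.lang.ClassNotFoundException above is raised from SabotConfig.getInstance (SabotConfig.java:64), directly under Class.forName in the trace. Instead of editing sabot-module.conf and restarting Dremio for each guess, one way to test a candidate class name is a small standalone probe run against the same jars. This is a sketch under assumptions: SerializerProbe is a hypothetical helper, not part of Dremio, and its default candidates are simply the three class names that come up in this thread.

```java
import java.util.List;

// Hypothetical helper: reports whether each class name resolves on the current classpath.
public class SerializerProbe {

    // The serializer class names mentioned in this thread; pass names as arguments to override.
    static final List<String> DEFAULT_CANDIDATES = List.of(
            "com.dremio.exec.planner.serialization.kryo.KryoRelSerializerFactory",
            "com.dremio.exec.planner.serializer.ProtoRelSerializerFactory",
            "com.dremio.exec.planner.serialization.ProtoRelSerializerFactory");

    static String probe(String className) {
        try {
            // initialize=false: load and link only, do not run static initializers.
            Class.forName(className, false, SerializerProbe.class.getClassLoader());
            return "FOUND";
        } catch (ClassNotFoundException e) {
            return "NOT FOUND";
        } catch (LinkageError e) {
            // The class file exists, but one of its dependencies could not be loaded.
            return "FOUND BUT NOT LINKABLE (" + e + ")";
        }
    }

    public static void main(String[] args) {
        List<String> names = args.length > 0 ? List.of(args) : DEFAULT_CANDIDATES;
        for (String name : names) {
            System.out.println(probe(name) + "  " + name);
        }
    }
}
```

Compile it with javac SerializerProbe.java and run it as java -cp "/opt/dremio/jars/*:." SerializerProbe (the jars/* wildcard adds every jar in that folder; adjust the path to your install). A name reported as NOT FOUND here would be expected to fail in the same way when set as dremio.planning.serializer.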
Investigating Dremio’s jars (dremio-sabot-kernel-22.0.0-202206221430090603-1fa4049f.jar), I found that this serializer class path doesn’t exist. My guess was that the path should be com.dremio.exec.planner.serialization.ProtoRelSerializerFactory rather than com.dremio.exec.planner.serializer.ProtoRelSerializerFactory, so I changed it and tried again:
2022-08-23 12:40:14,364 [scheduler-21] INFO com.dremio.common.config.SabotConfig - User Error Occurred [ErrorId: 0eed65b3-b28b-45ab-accb-017820fe6f4d]
com.dremio.common.exceptions.UserException: Failure while attempting to load instance of the class of type com.dremio.exec.planner.serialization.RelSerializerFactory requested at path dremio.planning.serializer.
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:890)
at com.dremio.common.config.SabotConfig.getInstance(SabotConfig.java:69)
at com.dremio.exec.planner.serialization.RelSerializerFactory.getFactory(RelSerializerFactory.java:78)
at com.dremio.exec.planner.serialization.RelSerializerFactory.getPlanningFactory(RelSerializerFactory.java:65)
at com.dremio.exec.planner.sql.SqlConverter.getSerializerFactory(SqlConverter.java:239)
at com.dremio.exec.planner.acceleration.MaterializationExpander.deserializePlan(MaterializationExpander.java:245)
at com.dremio.exec.planner.acceleration.MaterializationExpander.expand(MaterializationExpander.java:70)
at com.dremio.exec.planner.acceleration.MaterializationDescriptor.getMaterializationFor(MaterializationDescriptor.java:160)
at com.dremio.service.reflection.ReflectionServiceImpl$CacheHelperImpl.expand(ReflectionServiceImpl.java:1136)
at com.dremio.service.reflection.ReflectionServiceImpl$CacheHelperImpl.expand(ReflectionServiceImpl.java:1117)
at com.dremio.service.reflection.MaterializationCache.updateEntry(MaterializationCache.java:222)
at com.dremio.service.reflection.MaterializationCache.safeUpdateEntry(MaterializationCache.java:212)
at com.dremio.service.reflection.MaterializationCache.updateCache(MaterializationCache.java:145)
at com.dremio.service.reflection.MaterializationCache.compareAndSetCache(MaterializationCache.java:109)
at com.dremio.service.reflection.MaterializationCache.refresh(MaterializationCache.java:102)
at com.dremio.service.reflection.ReflectionServiceImpl.refreshCache(ReflectionServiceImpl.java:463)
at com.dremio.service.reflection.ReflectionServiceImpl$CacheRefresher.run(ReflectionServiceImpl.java:1232)
at com.dremio.service.scheduler.LocalSchedulerService$CancellableTask.run(LocalSchedulerService.java:226)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: com.dremio.exec.planner.serialization.ProtoRelSerializerFactory
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:315)
at com.dremio.common.config.SabotConfig.getInstance(SabotConfig.java:64)
... 22 common frames omitted
After experimenting a little with the configs, I realized the default serializer in Dremio CE is com.dremio.exec.planner.serialization.kryo.KryoRelSerializerFactory.
Looking inside dremio-sabot-kernel-22.0.0-202206221430090603-1fa4049f.jar, I saw there were only Kryo packages and nothing about ProtoRelSerializerFactory.
What can I try next? Maybe this serializer really doesn’t exist in the CE version of Dremio…
Sorry for the delay. It sounds like you really tried to debug the problem. When you installed the CE edition, was there a JAR named “dremio-ce-sabot-serializer*.jar”? If so, this jar should be added to the classpath. The original setting I gave you was correct. If you can find this jar, you can confirm that the package and class exist in it.
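To confirm that a given package and class exist in a jar, as suggested above, you can look at the jar’s entries: a class a.b.C is stored as the entry a/b/C.class, and the JDK’s jar tf <file.jar> prints the full listing. The Java sketch below (JarClassFinder is a hypothetical helper, not a Dremio tool) checks for one class and prints the classes under the same package prefix, which makes a wrong package guess such as serialization vs serializer easy to spot.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: checks whether a jar contains a class, and lists its package neighbours.
public class JarClassFinder {

    // "com.dremio.Foo" -> "com/dremio/Foo.class"
    static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(toEntryName(className)) != null;
        }
    }

    // Returns every .class entry whose path starts with the prefix (includes subpackages).
    static List<String> listClasses(String jarPath, String prefix) throws IOException {
        List<String> matches = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.startsWith(prefix) && name.endsWith(".class")) {
                    matches.add(name);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarClassFinder.java <path/to/file.jar> <fully.qualified.ClassName>");
            System.exit(2);
        }
        String jarPath = args[0];
        String className = args[1];
        System.out.println((containsClass(jarPath, className) ? "found:     " : "not found: ") + className);

        // Print the classes under the same package prefix to help spot a wrong package guess.
        String entry = toEntryName(className);
        String packagePrefix = entry.substring(0, entry.lastIndexOf('/') + 1);
        for (String name : listClasses(jarPath, packagePrefix)) {
            System.out.println("  " + name);
        }
    }
}
```

For example, with Java 11’s single-file launcher: java JarClassFinder.java /opt/dremio/jars/dremio-sabot-kernel-22.0.0-202206221430090603-1fa4049f.jar com.dremio.exec.planner.serializer.ProtoRelSerializerFactory.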
With a Google search, I found that this file exists in the AWS Edition of Dremio CE (at least in older versions), because it appears in another user’s error log.
Could I get this file some other way?
Thanks!
I copied the file dremio-ce-sabot-serializer-22.0.0-202206221302090800-5be46fc9.jar from the Dremio AWS CE AMI disk into the /opt/dremio/jars folder.
I placed sabot-module.conf in the conf folder, containing this line:
dremio.planning.serializer = com.dremio.exec.planner.serializer.ProtoRelSerializerFactory
After restarting Dremio, the reflection was created successfully. No more Kryo errors.
Maybe the Dremio team could add this file to the Dremio CE tarball.
That’s awesome you got it working! I will create an internal JIRA to see if it is possible to update the CE serializer to be the same as EE serializer.
Hi @Benny_Chow !
I saw Dremio CE was updated to version 23.0.1! =D
Could you tell me whether the serializer was changed to ProtoRelSerializerFactory, or whether the problem I described in my first message was solved by one of the fixes in this new version?