Reflections on S3: no results when clicking "Run"

Hi!
I installed Dremio CE (v22.0.0-202206221430090603-1fa4049f) on an EC2 instance.

I configured an IAM role (instance profile) granting read and write permissions on an S3 bucket, as described in the Dremio docs.

I configured dremio.conf with this parameter:
dist: "dremioS3:///my-private-bucket/accel"

I configured core-site.xml with this content:

<?xml version="1.0"?>
<configuration>
<property>
   <name>fs.dremioS3.impl</name>
   <description>The FileSystem implementation. Must be set to com.dremio.plugins.s3.store.S3FileSystem</description>
   <value>com.dremio.plugins.s3.store.S3FileSystem</value>
</property>
<property>
   <name>fs.s3a.aws.credentials.provider</name>
   <description>The credential provider type.</description>
   <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
</property>
</configuration>
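Before restarting, a file like this can be sanity-checked by parsing it and comparing its properties with what you expect. Below is a minimal Python sketch using only the standard library's xml.etree.ElementTree; the helper names (read_properties, check_properties) are hypothetical, and the expected values are simply copied from the file above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Values copied from the core-site.xml above; adjust if your setup differs.
EXPECTED: Dict[str, str] = {
    "fs.dremioS3.impl": "com.dremio.plugins.s3.store.S3FileSystem",
    "fs.s3a.aws.credentials.provider": "com.amazonaws.auth.InstanceProfileCredentialsProvider",
}


def read_properties(xml_text: str) -> Dict[str, str]:
    """Return {name: value} for every <property> under a Hadoop-style <configuration> root."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def check_properties(xml_text: str, expected: Dict[str, str] = EXPECTED) -> List[str]:
    """Describe each missing or mismatched property; an empty list means all expected values match."""
    props = read_properties(xml_text)
    problems: List[str] = []
    for name, want in expected.items():
        got = props.get(name)
        if got is None:
            problems.append(f"missing property: {name}")
        elif got != want:
            problems.append(f"{name} is {got!r}, expected {want!r}")
    return problems
```

For example, check_properties(open("core-site.xml").read()) returns an empty list when both properties are present with the values above; point it at whichever copy your Dremio install actually loads.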

I connected a MongoDB source and wrote some queries…
After creating the reflections (the files show up in S3), my queries stopped returning results!
When I click the “Preview” button, I get data. Nice…
On the same query, when I click “Run”, the result panel “flashes” briefly and then shows the “No results” message.
I noticed this only happens when the query uses FLATTEN. Here is my query:

SELECT code, date_start, nested_0.group_platform.transactions AS platform_transactions, nested_0.group_platform.sessions AS platform_sessions, nested_0.group_platform."_id" AS platform_name
    FROM (
      SELECT code, date_start, FLATTEN(group_platform) AS group_platform
      FROM MySource.summaries AS summaries
    ) nested_0

With the default reflection storage (the local data folder), this didn’t happen: reflections are created and the data is shown correctly (more than 600k rows).

I’m new to Dremio and this is driving me nuts!
What could be causing this behavior? Maybe I should just store reflections on EBS…

Thanks!

@almirb

Can you send us the query profiles for both the preview and the run?

Hello!
Here are the files.
5978a6de-df57-495e-b9ea-a7037518ae3c_preview.zip (17.0 KB)
cb049d6d-657f-412f-b46c-da5fd5276247_run.zip (15.8 KB)

I also noticed that disabling Iceberg in “Support Settings”:
dremio.iceberg.enabled: false
…and then recreating the reflections fixes the problem.

What would be the drawbacks of disabling Iceberg?

Thanks!

You might be hitting a bug where, if the “group_platform” array column contains nulls, zero records are returned. This bug has been fixed in version 23.

In the query profile, if you compare the FLATTEN and PROJECT operators, you’ll see that PROJECT receives no records in the “Run” profile. The “Preview” profile is limited to 10,000 records, so the preview may simply not have encountered any null values in the “group_platform” column.

As a workaround, you can try rewriting the SQL with a left outer join, similar to:

SELECT x.c1, x.c2, x.c3, y.flat_c4
FROM "flatten_result" x
LEFT OUTER JOIN (SELECT c1, c2, c3, FLATTEN(c4) AS flat_c4 FROM "flatten_result") y ON x.c1 = y.c1
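The workaround keeps every row of the outer query and only attaches flattened values where they exist. Here is a small in-memory Python model of the two shapes, assuming FLATTEN acts as an inner unnest (a null or empty array yields no output rows). That is the expected behavior, not the v22 bug that returned zero rows for the whole query. The function names are hypothetical and only mirror the SQL above.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def flatten(rows: List[Row], col: str, out: str) -> List[Row]:
    """Model of an inner unnest: one output row per array element.

    A row whose array is null or empty contributes no output rows (assumed semantics).
    """
    result: List[Row] = []
    for row in rows:
        base = {k: v for k, v in row.items() if k != col}
        for item in row.get(col) or []:
            result.append({**base, out: item})
    return result


def left_join_flatten(rows: List[Row], key: str, col: str, out: str) -> List[Row]:
    """Model of the LEFT OUTER JOIN workaround: every input row survives.

    Rows with no flattened values get out=None. Like the SQL, this joins on `key`
    alone, so `key` must be unique or matched rows will be duplicated.
    """
    matches: Dict[Any, List[Any]] = {}
    for r in flatten(rows, col, out):
        matches.setdefault(r[key], []).append(r[out])
    result: List[Row] = []
    for row in rows:
        base = {k: v for k, v in row.items() if k != col}
        # No match (null or empty array): emit one row with a null flattened value.
        for value in matches.get(row[key]) or [None]:
            result.append({**base, out: value})
    return result
```

With rows whose c4 is [10, 20], null and [], flatten returns only the two rows from the first input, while left_join_flatten also keeps the other two with flat_c4 set to None.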

I don’t think this problem has anything to do with reflections or with whether dremio.iceberg.enabled is set: both of your profiles show the same reflection materialization, stored in Iceberg table format, being used.

You definitely don’t want to disable dremio.iceberg.enabled; otherwise the unlimited splits feature will be disabled. See Dremio


Hello!
I removed all reflections, enabled dremio.iceberg.enabled again, and recreated them. Now they’re working!
One thing I noticed: if I try to create a reflection on a query that uses FLATTEN, it shows “Reflection cannot accelerate. Reflection in progress”.
I changed my query to use a LEFT OUTER JOIN as you suggested, but the result is the same.
When I create reflections on other VDSs (with no FLATTEN anywhere in their source hierarchy), they work fine.

Here are all related files:

7be225c2-6010-48a1-ad15-29bee11cd31b_LOAD_MAT_METADATA.zip (4.9 KB)
68a72055-a8b5-4321-b9ac-131622113def_REFRESH_REFLECTION.zip (40.6 KB)

P.S.: I limited the data to only a few rows for another test (no NULL fields), and the reflection still doesn’t accelerate. When I remove the FLATTEN field, it works.

Continuing my reflection-creation saga, I created a minimal reproducible example.

  1. Upload this item as a PDS:
    flatten_dataset.json.zip (362 bytes)

  2. Create a VDS based on the query below:

SELECT "_id", nested_0.group_items.counter_c AS group_items_counter_c, nested_0.group_items.counter_b AS group_items_counter_b, nested_0.group_items.counter_a AS group_items_counter_a, nested_0.group_items."_id" AS group_items_id
FROM (
  SELECT "_id", flatten(group_items) AS group_items
  FROM "@user".flatten_dataset AS flatten_dataset
) nested_0

  3. See the result:

  4. Try to create a raw reflection on this VDS.

  5. Check the result. Mine is not good…

Here are the reflection creation job history files:
b7b03119-ff40-4a66-b842-78867fb31de6_LOAD_MAT_METADATA.zip (4.9 KB)
93cd169a-472a-41d7-9a8a-a700bcc858d9_REFRESH_REFLECTION.zip (14.8 KB)

P.S. 1: It doesn’t depend on dremio.iceberg.enabled; I get the same result whether it is enabled or disabled.
P.S. 2: I tried on two different machines; it doesn’t work on either.

Hi Almir

Your REFRESH REFLECTION profile looks good. You can see 3 records written by the PARQUET_WRITER, with the ICEBERG_MANIFEST_WRITER above it.

I tried out your JSON file, built a raw reflection on it, and used the reflection to accelerate queries without any problems. Everything works fine on my end.

I suggest querying sys.reflections and sys.materializations to see if you can get any more information on why this reflection is in the CANNOT_ACCELERATE_SCHEDULED state. I can see you uploaded the dataset into your home folder; by design, the refresh policy for such datasets is never refresh and never expire. The “Reflection in progress” message is misleading; that issue is fixed in v23.

Here is the result of this query:

select * 
from sys.materializations inner join sys.reflections on sys.materializations.reflection_id = sys.reflections.reflection_id
where sys.reflections.reflection_id = 'ef0a5252-b4bb-431f-8f00-631af15eb930'

1d0514d4-ac11-5647-6c82-ae11c97b6e00.zip (776 bytes)

Apparently, no failures.

And here is the profile of the query:

select * FROM Curated.Temp.flatten_dataset_vds

The query wasn’t accelerated.
4a0c831c-0bb3-4a77-bf91-14c41c19c118.zip (10.8 KB)

When will version 23 be released to the public?

Thanks!

@almirb There could have been an issue either during reflection creation or during query substitution. Can we get the server.log from when this reflection was created (“2022-08-15 19:22:31”) and from the query execution (“2022-08-16 00:59:22”)? Please also provide a PARQUET or JSON download of “sys.reflections” and of “select * from sys.materializations”.

Hi @balaji.ramaswamy !

I did a fresh install from the tarball and connected distributed storage to one of my S3 buckets.
Next, I uploaded the JSON file as a PDS and then created the VDS with the query below:

SELECT "_id", nested_0.group_items.counter_c AS group_items_counter_c, nested_0.group_items.counter_b AS group_items_counter_b, nested_0.group_items.counter_a AS group_items_counter_a, nested_0.group_items."_id" AS group_items_id
FROM (
  SELECT "_id", flatten(group_items) AS group_items
  FROM "@almir".flatten_dataset AS flatten_dataset
) nested_0

Then I created the raw reflection:

…and tried to query the VDS.
The query wasn’t accelerated (the profile is in the zip file below).

I think these are all the files needed for debugging (including my S3 bucket contents).
dremio_debug_files.zip (66.9 KB)

I didn’t mention it before, but I’m using OpenJDK 11:
openjdk version "11.0.16" 2022-07-19
OpenJDK Runtime Environment (build 11.0.16+8-post-Ubuntu-0ubuntu120.04)
OpenJDK 64-Bit Server VM (build 11.0.16+8-post-Ubuntu-0ubuntu120.04, mixed mode, sharing)

I hope you can find what’s happening here.
Thanks!

Hi!
Did you find something useful with the information I provided?

Thanks!

Hello @almirb

I went through the files you uploaded. Here is what I can see: the reflection materialization was clearly built successfully, since it is in the DONE state with a non-zero byte count in the sys.materializations table. However, in sys.reflections, the status is CANNOT_ACCELERATE_SCHEDULED. This doesn’t happen often. It means you do have a successful materialization, but it is not available for query acceleration (substitution).
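To spot this state across many reflections, you could cross-check the two system tables. Here is a minimal Python sketch, assuming the rows have been exported as dictionaries (for example from the JSON download of sys.reflections and sys.materializations) and that the relevant columns are named reflection_id, status, state and bytes. Verify those names against your version; stuck_reflections is a hypothetical helper, not a Dremio API.

```python
from typing import Dict, Iterable, List


def stuck_reflections(reflections: Iterable[Dict], materializations: Iterable[Dict]) -> List[str]:
    """Return ids of reflections that have a DONE, non-empty materialization but cannot accelerate.

    Column names (reflection_id, state, bytes, status) are assumptions about the exported rows.
    """
    done_ids = {
        m["reflection_id"]
        for m in materializations
        if m.get("state") == "DONE" and (m.get("bytes") or 0) > 0
    }
    return [
        r["reflection_id"]
        for r in reflections
        if r["reflection_id"] in done_ids
        and str(r.get("status", "")).startswith("CANNOT_ACCELERATE")
    ]
```

A non-empty result points at the situation described above: the data was written, but the planner is not using it.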

In your server.log I see this:

01:00 WARN: [kryo] Unable to load class org.apache.calcite.sql.type.OperandTypes$$Lambda$250/0x00000008403d4040 with kryo’s

Kryo is the serializer we use for reflection logical plans. In Dremio EE we use a different serializer, which might explain why I can’t reproduce your error: I am able to accelerate queries with the reflection.

So, let’s see if you can use the other serializer. It is specified in sabot-module.conf. In server.log, at server start, you can see where we scan the classpath for all instances of this config file. You can create a file with this same name and put it somewhere on your JVM classpath.

In this file, just specify:

dremio.planning.serializer = com.dremio.exec.planner.serializer.ProtoRelSerializerFactory

Let’s see if that works. There should be no more “kryo” warnings in server.out.
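If you prefer to script this step, the sketch below creates or updates sabot-module.conf so that it contains the serializer setting exactly once. It assumes the target directory is on the JVM classpath and that the setting is written in the flat key = value form shown above; nested HOCON forms of the same key are not detected. set_planning_serializer is a hypothetical helper name, and the setting only helps if the class it names can actually be loaded.

```python
from pathlib import Path

# Setting quoted above; the named class must be loadable from the JVM classpath.
SERIALIZER_KEY = "dremio.planning.serializer"
SERIALIZER_CLASS = "com.dremio.exec.planner.serializer.ProtoRelSerializerFactory"


def set_planning_serializer(conf_dir: str, class_name: str = SERIALIZER_CLASS) -> Path:
    """Create or update <conf_dir>/sabot-module.conf so it sets the planning serializer once."""
    path = Path(conf_dir) / "sabot-module.conf"
    existing = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    # Drop any previous flat-form serializer line so two conflicting values never coexist.
    kept = [line for line in existing if not line.strip().startswith(SERIALIZER_KEY)]
    kept.append(f"{SERIALIZER_KEY} = {class_name}")
    path.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return path
```

Running it twice leaves a single serializer line, so it is safe to re-run while experimenting with class names.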


Hi @Benny_Chow !

I created the configuration as you described. Here are my findings.
After adding it, I got these errors in server.log when Dremio tried to load the class:

2022-08-23 12:32:59,840 [main] INFO  com.dremio.common.config.SabotConfig - User Error Occurred [ErrorId: e1bb3df5-f170-4fd6-a241-1c63933daa65]
com.dremio.common.exceptions.UserException: Failure while attempting to load instance of the class of type com.dremio.exec.planner.serialization.RelSerializerFactory requested at path dremio.planning.serializer.
	at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:890)
	at com.dremio.common.config.SabotConfig.getInstance(SabotConfig.java:69)
	at com.dremio.exec.planner.serialization.RelSerializerFactory.getFactory(RelSerializerFactory.java:78)
	at com.dremio.exec.planner.serialization.RelSerializerFactory.getPlanningFactory(RelSerializerFactory.java:65)
	at com.dremio.exec.planner.sql.SqlConverter.getSerializerFactory(SqlConverter.java:239)
	at com.dremio.exec.planner.acceleration.MaterializationExpander.deserializePlan(MaterializationExpander.java:245)
	at com.dremio.exec.planner.acceleration.MaterializationExpander.expand(MaterializationExpander.java:70)
	at com.dremio.exec.planner.acceleration.MaterializationDescriptor.getMaterializationFor(MaterializationDescriptor.java:160)
	at com.dremio.service.reflection.ReflectionServiceImpl$CacheHelperImpl.expand(ReflectionServiceImpl.java:1136)
	at com.dremio.service.reflection.ReflectionServiceImpl$CacheHelperImpl.expand(ReflectionServiceImpl.java:1117)
	at com.dremio.service.reflection.MaterializationCache.updateEntry(MaterializationCache.java:222)
	at com.dremio.service.reflection.MaterializationCache.safeUpdateEntry(MaterializationCache.java:212)
	at com.dremio.service.reflection.MaterializationCache.updateCache(MaterializationCache.java:145)
	at com.dremio.service.reflection.MaterializationCache.compareAndSetCache(MaterializationCache.java:109)
	at com.dremio.service.reflection.MaterializationCache.refresh(MaterializationCache.java:102)
	at com.dremio.service.reflection.ReflectionServiceImpl.start(ReflectionServiceImpl.java:303)
	at com.dremio.service.SingletonRegistry$AbstractServiceReference.start(SingletonRegistry.java:137)
	at com.dremio.service.ServiceRegistry.start(ServiceRegistry.java:88)
	at com.dremio.service.SingletonRegistry.start(SingletonRegistry.java:33)
	at com.dremio.dac.daemon.DACDaemon.startServices(DACDaemon.java:196)
	at com.dremio.dac.daemon.DACDaemon.init(DACDaemon.java:202)
	at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:104)
Caused by: java.lang.ClassNotFoundException: com.dremio.exec.planner.serializer.ProtoRelSerializerFactory
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Class.java:315)
	at com.dremio.common.config.SabotConfig.getInstance(SabotConfig.java:64)
	... 20 common frames omitted
2022-08-23 12:32:59,903 [main] INFO  c.d.s.r.ReflectionServiceImpl - Reflections masterInit

Investigating Dremio’s jars (dremio-sabot-kernel-22.0.0-202206221430090603-1fa4049f.jar), I found that this serializer class path doesn’t exist. My guess was that the path should be com.dremio.exec.planner.serialization.ProtoRelSerializerFactory rather than com.dremio.exec.planner.serializer.ProtoRelSerializerFactory, so I changed it and tried again:

2022-08-23 12:40:14,364 [scheduler-21] INFO  com.dremio.common.config.SabotConfig - User Error Occurred [ErrorId: 0eed65b3-b28b-45ab-accb-017820fe6f4d]
com.dremio.common.exceptions.UserException: Failure while attempting to load instance of the class of type com.dremio.exec.planner.serialization.RelSerializerFactory requested at path dremio.planning.serializer.
	at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:890)
	at com.dremio.common.config.SabotConfig.getInstance(SabotConfig.java:69)
	at com.dremio.exec.planner.serialization.RelSerializerFactory.getFactory(RelSerializerFactory.java:78)
	at com.dremio.exec.planner.serialization.RelSerializerFactory.getPlanningFactory(RelSerializerFactory.java:65)
	at com.dremio.exec.planner.sql.SqlConverter.getSerializerFactory(SqlConverter.java:239)
	at com.dremio.exec.planner.acceleration.MaterializationExpander.deserializePlan(MaterializationExpander.java:245)
	at com.dremio.exec.planner.acceleration.MaterializationExpander.expand(MaterializationExpander.java:70)
	at com.dremio.exec.planner.acceleration.MaterializationDescriptor.getMaterializationFor(MaterializationDescriptor.java:160)
	at com.dremio.service.reflection.ReflectionServiceImpl$CacheHelperImpl.expand(ReflectionServiceImpl.java:1136)
	at com.dremio.service.reflection.ReflectionServiceImpl$CacheHelperImpl.expand(ReflectionServiceImpl.java:1117)
	at com.dremio.service.reflection.MaterializationCache.updateEntry(MaterializationCache.java:222)
	at com.dremio.service.reflection.MaterializationCache.safeUpdateEntry(MaterializationCache.java:212)
	at com.dremio.service.reflection.MaterializationCache.updateCache(MaterializationCache.java:145)
	at com.dremio.service.reflection.MaterializationCache.compareAndSetCache(MaterializationCache.java:109)
	at com.dremio.service.reflection.MaterializationCache.refresh(MaterializationCache.java:102)
	at com.dremio.service.reflection.ReflectionServiceImpl.refreshCache(ReflectionServiceImpl.java:463)
	at com.dremio.service.reflection.ReflectionServiceImpl$CacheRefresher.run(ReflectionServiceImpl.java:1232)
	at com.dremio.service.scheduler.LocalSchedulerService$CancellableTask.run(LocalSchedulerService.java:226)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: com.dremio.exec.planner.serialization.ProtoRelSerializerFactory
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Class.java:315)
	at com.dremio.common.config.SabotConfig.getInstance(SabotConfig.java:64)
	... 22 common frames omitted

After playing a little with the configs, I realized the default serializer in Dremio CE is com.dremio.exec.planner.serialization.kryo.KryoRelSerializerFactory.

Looking inside dremio-sabot-kernel-22.0.0-202206221430090603-1fa4049f.jar, I saw there were only kryo packages and nothing about ProtoRelSerializerFactory:


What can I try next? Maybe this serializer really doesn’t exist in the CE version of Dremio…

Thanks!

Hi @Benny_Chow !

Any news on this?

Thanks!

Sorry for the delay. It sounds like you really dug into the problem. When you installed the CE edition, was there a JAR named “dremio-ce-sabot-serializer*.jar”? If so, this jar should be added to the classpath. The original setting I gave you was correct. If you can find this jar, you can confirm that the package and class exist in it.
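Since a jar is just a zip archive, one way to do that check is to look for the class file entry directly. Here is a minimal Python sketch using the standard zipfile module; jars_containing and class_entry are hypothetical helper names, and the directory you scan (for example /opt/dremio/jars) is an assumption about your install.

```python
import zipfile
from pathlib import Path
from typing import List


def class_entry(class_name: str) -> str:
    """Convert 'a.b.C' into the entry name used inside a jar: 'a/b/C.class'."""
    return class_name.replace(".", "/") + ".class"


def jars_containing(jar_dir: str, class_name: str) -> List[str]:
    """Return the names of *.jar files directly in jar_dir that contain class_name."""
    entry = class_entry(class_name)
    hits: List[str] = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            continue  # skip files that are not valid archives
    return hits
```

For example, jars_containing("/opt/dremio/jars", "com.dremio.exec.planner.serializer.ProtoRelSerializerFactory") returns an empty list if no jar in that folder provides the class, which matches the ClassNotFoundException above.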

Hello!

This file doesn’t exist in the 22.0.0 tarball, nor in 22.1.1 (as shown in the figure below):

Through a Google search, I found that this file exists in the AWS Edition of Dremio CE (at least in older versions), because it appears in a user’s error log:

Could I get this file some other way?
Thanks!

I got the file dremio-ce-sabot-serializer-22.0.0-202206221302090800-5be46fc9.jar from the Dremio AWS CE AMI disk and put it in the /opt/dremio/jars folder.
I placed sabot-module.conf, containing this line, in the conf folder:
dremio.planning.serializer = com.dremio.exec.planner.serializer.ProtoRelSerializerFactory

After restarting Dremio, the reflection was created successfully. No more Kryo errors.
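To double-check after a restart that the warnings are really gone, you can scan the log for the kryo message. Here is a minimal Python sketch, assuming the warning has the same shape as the line quoted earlier in this thread ("[kryo] Unable to load class …"); log formats can change between versions, and kryo_unloadable_classes is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Assumed shape, based on the warning quoted in this thread: "... [kryo] Unable to load class <name> ..."
KRYO_WARNING = re.compile(r"\[kryo\] Unable to load class (\S+)")


def kryo_unloadable_classes(lines: Iterable[str]) -> List[str]:
    """Return the class names from kryo 'Unable to load class' warnings, in log order."""
    found: List[str] = []
    for line in lines:
        match = KRYO_WARNING.search(line)
        if match:
            found.append(match.group(1))
    return found
```

For example, with open(path_to_server_out) as f: print(kryo_unloadable_classes(f)) prints an empty list when no such warning was logged.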

Maybe the Dremio team could add this file to the Dremio CE tarball.

Thanks for all your efforts.


That’s awesome you got it working! I will create an internal JIRA to see if it is possible to update the CE serializer to be the same as the EE serializer.


Hi! Do you know when the next Dremio CE version will be available? Do you think it’ll include this change? Thanks!

Hi @Benny_Chow !
I saw Dremio CE was updated to version 23.0.1! =D
Could you tell me whether the serializer was changed to ProtoRelSerializerFactory, or whether the problem I described in my first message was solved by one of the fixes in this new version?

Thanks!