Failure to create Aggregation Reflections

Hello all!

I have an interesting issue that occurs when I attempt to create an aggregation reflection on one of my data sets. After selecting 3 dimensions (all of which are just text columns) for the aggregation reflection and saving, I get a yellow warning triangle on the reflection (with a footprint of 0 bytes). Following the link on the warning, I see that the reflection creation failed with the following message:

Reflection could not be created as it uses context-sensitive functions. Functions like IS_MEMBER, USER, etc. cannot be used in reflections since they require context to complete.

This confuses me because the columns I used for dimensions are all simple text fields representing fields in a data model. They are not generated fields, as the error message seems to suggest. These three fields are the only dimensions in the aggregation reflection, and there are no other sort, partition, or measure fields selected.

Any idea what might be going on? I couldn’t find any information about this error message online, so I’m not sure where to start looking.

@cygnus

Would you be able to provide us with the profile and the screenshot of the reflection settings?

Thanks
Bali

Hi Bali,

Sorry for the delay! I can certainly provide an example query profile (attached), but I think that we’ve identified the possible issue. One quick note on the profile: I didn’t have the time to create a truly distinct data source from our normal data, so I have redacted some of the original columns in the profile that do not relate to this issue. dummy_table_with_user_example.zip (9.2 KB)

I am trying to create a reflection on an HDFS Parquet-backed table. One of the columns of the underlying data model is called user. However, Dremio treats user as a reserved word for the user who is logged in to the Dremio UI. This means that if I run SELECT * FROM DataSource, every value under the “user” column is the currently logged-in Dremio user. This is an issue in and of itself, but it gets worse when it comes to creating reflections.

In the Dremio codebase, you have the following:

if(!ExpansionNode.findNodes(relNode, r -> r.isContextSensitive()).isEmpty()) {
  throw UserException.validationError()
    .message("Reflection could not be created as it uses context-sensitive functions. "
        + "Functions like IS_MEMBER, USER, etc. cannot be used in reflections since "
        + "they require context to complete.")
    .build(logger);
}

This check lives inside ReflectionPlanNormalizer. It shows that if any node isContextSensitive(), the reflection will fail. It doesn’t matter that the reflection doesn’t use the user field from our data model; it fails simply because Dremio treats our user column as the reserved Dremio user rather than as an ordinary column. You can see the problem in the profile:

ExpansionNode(path=[REDACTED_MODEL_PATH], contextSensitive=[true]): rowcount = 2.8898855E7, cumulative cost = {5.779771E7 rows, 1.936223285E9 cpu, 9.8256107E8 io, 9.8256107E8 network, 0.0 memory}, id = 1003680

Since the whole expansion node for the reflection query is contextSensitive, we will always fail.
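To illustrate why a single flagged node is enough to reject the whole reflection, here is a minimal, self-contained sketch of that kind of tree check. It is not Dremio's implementation: PlanNode, findNodes, canCreateReflection, and the node names are hypothetical stand-ins that I made up to mirror the shape of the snippet quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class ContextSensitivitySketch {

    // Hypothetical stand-in for a plan node; NOT Dremio's actual ExpansionNode.
    static final class PlanNode {
        final String name;
        final boolean contextSensitive;
        final List<PlanNode> inputs;

        PlanNode(String name, boolean contextSensitive, PlanNode... inputs) {
            this.name = name;
            this.contextSensitive = contextSensitive;
            this.inputs = List.of(inputs);
        }
    }

    // Collects every node in the tree that matches the predicate (depth-first).
    static List<PlanNode> findNodes(PlanNode root, Predicate<PlanNode> predicate) {
        List<PlanNode> matches = new ArrayList<>();
        collect(root, predicate, matches);
        return matches;
    }

    private static void collect(PlanNode node, Predicate<PlanNode> predicate, List<PlanNode> out) {
        if (predicate.test(node)) {
            out.add(node);
        }
        for (PlanNode input : node.inputs) {
            collect(input, predicate, out);
        }
    }

    // Same shape as the quoted check: any flagged node anywhere rejects the plan.
    static boolean canCreateReflection(PlanNode root) {
        return findNodes(root, n -> n.contextSensitive).isEmpty();
    }

    public static void main(String[] args) {
        // A node flagged as context-sensitive sits below an otherwise plain aggregate.
        PlanNode flagged = new PlanNode("model_with_user_column", true);
        PlanNode plan = new PlanNode("aggregate_on_three_text_dimensions", false, flagged);

        System.out.println(canCreateReflection(plan));                               // prints false
        System.out.println(canCreateReflection(new PlanNode("plain_scan", false)));  // prints true
    }
}
```

In this sketch the aggregate itself never touches anything context-sensitive, yet the plan is still rejected because one of its inputs carries the flag, which matches what I am seeing with my three text dimensions.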

Looking at the provided profile, we also see user=['$dremio$'] somewhere in the chain of tables, meaning that Dremio is indeed misinterpreting a Parquet column named user as the Dremio user.

@cygnus

Can you try qualifying and quoting the column? For example, if your table is named “users” and it has a column called “user”, then the query below would work:

SELECT user, users."user" name FROM users

The output would be

{"user":"dremio","name":"steven"}

Thanks
Bali

@balaji.ramaswamy

Hm, interestingly that did not seem to work in my instance:

That’s very curious, if you’d expect it to work. For this particular table, we are reading Parquet files from HDFS, and when I download an individual Parquet file, I can confirm that the user field has the correct data as expected (I would never expect to see reader in the raw data). So something must have gone wrong at some point in the process for the column to now be associated with just the Dremio-specific user context.

@cygnus

I just validated this on a CSV file and it works fine. See the attached “user.csv.zip”; uncompress it and upload it to Dremio.

Then run the query below:

SELECT user, "user"."user" as realuser, username FROM "user"

Expected output

{"user":"dremio","realuser":"Stefan","username":"Edberg"}
{"user":"dremio","realuser":"Boris","username":"Becker"}

user.csv.zip (234 Bytes)