Raw reflection of a disconnected / temporarily unreachable source (e.g. S3 not reachable behind a firewall) is not shown

Hi,

I have connected to S3, enabled a raw reflection on a dataset, and set both expiry and refresh to 1 week.

When I take my laptop and connect to the office network (where S3 is not accessible), Dremio doesn’t show the source at all.

I was expecting that I could still see and use the dataset, since the raw reflection is already enabled. I assumed that was the benefit of a raw reflection: even when we cannot connect to the source (in this case S3, say because the internet is off or a firewall blocks it), the reflection / cache would still work and return data.

Is this by design (i.e. even if a raw reflection is enabled, the connection is still validated), or is this a bug, or is there some config I have to set?

Hi @Sathish_Senathi.

Is that your only data source on your Dremio UI? In other words, do you have any local data sources that show up?

Thanks,
@balaji.ramaswamy

Hi Balaji,

No, I have a few of them. The steps to reproduce are:

  1. Create a Sample source (which connects to S3).

  2. Save the dataset; it becomes purple.

  3. Enable raw reflection on the dataset under this Sample source and make
     sure the reflection is done (the fire symbol appears).

  4. Query the dataset successfully.

  5. Create other sources (say, a flat file).

  6. Switch off wifi (disconnect from the internet). When you go into the
     Sample source, it shows nothing.

What I am wondering is how I can continue to use the dataset even when the
data is not reachable (for example, offline because wifi is off).

My use case is that the data source is not reachable during the daytime, but I
want the data (cached via the reflection) to still be available. Is this
possible?

Thanks

Hi Sathish - thanks for posting this. I think it’s an interesting idea.

Currently reflections are not designed as an availability feature. If a source is offline, then queries cannot be executed against the source. The idea of reflections is that they provide the query planner additional options for creating a query plan. You could, for example, create a reflection on a subset of columns/fields in your source, and the query planner would have the option of using a reflection if the columns/fields are part of the reflection, or push the query down if they aren’t.
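
To make the behavior described above concrete, here is a toy sketch in Python. It is not Dremio's planner or any Dremio API; the names `Reflection` and `choose_plan` are made up for illustration, and the only logic it encodes is what is stated in this thread: an offline source means the query cannot run, and otherwise a reflection is one option the planner may pick when it covers the requested columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reflection:
    """Hypothetical stand-in for a raw reflection on a subset of columns."""
    dataset: str
    columns: frozenset


def choose_plan(dataset, query_columns, reflections, source_online):
    """Toy model only: returns a label describing how the query would be served."""
    # Per this thread, reflections are not an availability feature:
    # if the source is offline, the query is not executed at all.
    if not source_online:
        return "fail: source unavailable"

    # A reflection is an option only if it covers every requested column.
    wanted = set(query_columns)
    for reflection in reflections:
        if reflection.dataset == dataset and wanted <= reflection.columns:
            return "use reflection"

    # Otherwise the query is pushed down to the source.
    return "push down to source"


# Hypothetical dataset and column names, for illustration only.
orders_reflection = Reflection("sample.orders", frozenset({"id", "amount"}))

covered = choose_plan("sample.orders", ["id", "amount"], [orders_reflection], True)
not_covered = choose_plan("sample.orders", ["id", "region"], [orders_reflection], True)
offline = choose_plan("sample.orders", ["id", "amount"], [orders_reflection], False)
```

In this sketch the third call fails even though the reflection covers both columns, which is the situation in the original question.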

This might be a feature we consider in the future, but currently Dremio is not designed to support your use case.

Hi Kelly,

I have a similar use case. I was wondering whether such a capability has been developed in Dremio during the past couple of years?

Thanks