S3 Table not found after a while

#1

Hello,

   I created a table from JSON files on S3. It works fine, but after a while (probably an hour or two) I get this error when trying to access the table: "{"message": "Log searching failed with exception Table 'json' not found."}"

 It looks like a timeout of some sort to me. The closest information I found is this:

 https://community.dremio.com/t/getting-a-timeout-error-when-trying-to-load-results-from-a-s3/760/2

 But the S3 properties field no longer exists in the current version of Dremio.

 Any hints as to what could be causing this?

   Thanks for reading,

#2

Hi @marvinwu

The error you pasted does not seem to be due to a timeout issue. Have you increased “fs.s3a.connection.maximum”? Also, are your executors on EC2?

Thanks
@balaji.ramaswamy

#3

Hi @balaji.ramaswamy, thanks very much for the reply. No, I haven’t increased the connection maximum yet. I am using the Mac standalone version.

I have just increased “fs.s3a.connection.maximum” to 3000 and will report back on whether it fixes the problem.

Appreciate the help.

#4

Hi @balaji.ramaswamy, I have increased fs.s3a.connection.maximum to 3000, but I still get the same error. When I try to access the table, it says "Error while expanding the view".

The setting on the s3 source is the following:

Any suggestions I could try?

P.S. If I recreate the table from the same S3 source, both the new and the original table work fine.

#5

Hi @marvinwu

It looks like you are unable to expand the VDS. Can you please send us the query profile?

Share A Query Profile

Thanks
@balaji.ramaswamy

#6

Hi Balaji,

Query profile attached. I appreciate your help; have a good weekend.

5e973e51-a686-4b7c-9708-f736a8a474b4.zip (5.3 KB)

#7

@marvinwu

You are trying to select "data_media_type" from "hope-datalake"."hope-datalake"."full-length-movie"."reddit-post", and it seems that column is not there, at least according to Dremio. If this column was added subsequently, please refresh the dataset and try again. If this column does not exist at all, then please remove it from the query.

To refresh metadata, click on “New Query” and run the statement below:

alter pds "hope-datalake"."hope-datalake"."full-length-movie"."reddit-post" refresh metadata
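
For anyone who wants to generate this statement from a script, here is a minimal Python sketch. The helper names `quote_identifier` and `refresh_metadata_sql` are hypothetical, and the code only builds the SQL string; it does not connect to Dremio. It assumes identifiers are wrapped in straight double quotes (curly quotes copied from a web page will most likely not parse) and that an embedded double quote is escaped by doubling it, as in standard SQL.

```python
from typing import Sequence


def quote_identifier(part: str) -> str:
    """Wrap one path component in straight double quotes.

    Assumption: an embedded double quote is escaped by doubling it,
    as in standard SQL.
    """
    return '"' + part.replace('"', '""') + '"'


def refresh_metadata_sql(path_parts: Sequence[str]) -> str:
    """Build an ALTER PDS ... REFRESH METADATA statement for a dataset path."""
    if not path_parts:
        raise ValueError("dataset path must have at least one component")
    dotted = ".".join(quote_identifier(p) for p in path_parts)
    return f"ALTER PDS {dotted} REFRESH METADATA"


if __name__ == "__main__":
    # Dataset path taken from the query discussed in this thread.
    path = ["hope-datalake", "hope-datalake", "full-length-movie", "reddit-post"]
    print(refresh_metadata_sql(path))
```

The resulting string can be pasted into a “New Query” window or submitted through whatever client you already use to run SQL against Dremio.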

#8

Hi @balaji.ramaswamy, thanks very much for the analysis. The table was created from partitioned JSON data on S3; it is static and we are not changing the data set. But let me try adding the refresh step and I will report back.

#9

@balaji.ramaswamy, after some more digging, I think this issue is related to the Dremio S3 driver.

I created two tables from identical data. The only difference is that one was created from S3 and the other from a local NAS (I copied the S3 data to my Mac). Neither data set was changed in between.

The table created from S3 runs into the not-found error after an hour or two, while the one created from the local NAS is fine.

If I uncheck the “Remove dataset definitions if underlying data is unavailable.” option in the S3 source properties, the error does not happen. The downside is that the data will never be refreshed again.