Error creating Reflection

Hi,

I’m getting an error when attempting to create a Reflection on a dataset. What exactly does this error mean and how do I fix it?

SYSTEM ERROR: TimeoutException: Timeout waiting for task.

SqlOperatorImpl PARQUET_WRITER
Location 1:17:1
Fragment 1:17

[Error Id: 6d7b8b30-6d20-48f2-9904-69256855646b on ip-X-X-X-X.us-west-2.compute.internal:-1]

  (java.io.IOException) Timeout occured during I/O request for sabot://ip-X-X-X-X.us-west-2.compute.internal:45678
    com.dremio.exec.store.dfs.RemoteNodeFileSystem.getFileStatus():547
    com.dremio.exec.store.dfs.PseudoDistributedFileSystem$GetFileStatusTask$1.call():1045
    com.dremio.exec.store.dfs.PseudoDistributedFileSystem$GetFileStatusTask$1.call():1041
    java.util.concurrent.FutureTask.run():266
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():748
  Caused By (java.util.concurrent.TimeoutException) Timeout waiting for task.
    com.google.common.util.concurrent.AbstractFuture$Sync.get():269
    com.google.common.util.concurrent.AbstractFuture.get():96
    com.google.common.util.concurrent.ForwardingFuture.get():69
    com.google.common.util.concurrent.AbstractCheckedFuture.checkedGet():107
    com.dremio.exec.store.dfs.RemoteNodeFileSystem.getFileStatus():544
    com.dremio.exec.store.dfs.PseudoDistributedFileSystem$GetFileStatusTask$1.call():1045
    com.dremio.exec.store.dfs.PseudoDistributedFileSystem$GetFileStatusTask$1.call():1041
    java.util.concurrent.FutureTask.run():266
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():748

Anyone? A hint to point me in the right direction?

Apologies, we are heads down getting 1.5 ready for everyone.

What is the storage subsystem you are using here? EBS? S3?

Thanks for responding @kelly. It’s not clear to me what nodes are responsible for what when it comes to reflections, but here’s the topology:

1 master node
3 coordinator nodes
3 executor nodes
All are running in Docker containers on ECS, using host networking. Each has a local EBS volume and a shared EFS attached.

master, coordinator, and executor nodes all have the same paths in dremio.conf:

paths: {
  local: "/host", # /host is a Docker volume that points to the local EBS volume
  dist: "pdfs:///share", # /share is a Docker volume that points to the shared EFS drive
  db: "pdfs:///share/db",
  accelerator: "pdfs:///share/accelerator",
  downloads: "pdfs:///share/downloads",
  uploads: "pdfs:///share/uploads",
  results: "pdfs:///share/results",
  scratch: "pdfs:///share/scratch"
}

It wasn’t clear to me from the docs which nodes should use distributed storage and which shouldn’t. The page at https://docs.dremio.com/deployment/distributed-storage.html only covers configuring the different distributed storage types. I’m also not clear on how service discovery works in Dremio and how nodes discover each other.

Thanks in advance for any help!

You may want to try S3 for storing your reflections, see details here: https://docs.dremio.com/deployment/distributed-storage.html
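For what it’s worth, a minimal sketch of what that change might look like in dremio.conf. The bucket name and path here are placeholders, and the exact URI scheme and S3 credential setup (via core-site.xml) are described on that docs page:

```
paths: {
  local: "/host",
  # Placeholder bucket/path; S3 credentials are configured separately in core-site.xml.
  dist: "dremioS3:///my-dremio-bucket/dremio"
}
```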

While EFS should work, that’s not something we have tested much. What was the rationale for using EFS instead of S3?

Also, if you tell us more about your deployment we may be able to weigh in on the number of coordinators/executors you have configured. There may be a better use of resources, or maybe you’ve got it all figured out. :slight_smile:

My assumption (untested, unverified, just a hunch) was that latency would be a problem if storing reflections on S3, vs a NAS. I can certainly try S3 though.

I have absolutely nothing figured out! LOL.

Our goal is very simple: analytic queries must respond in under a second. We have a moving window of a year’s worth of data that we need to query in an ad hoc fashion. We have a table of about 550M posts, and about 50M profiles that will need to be joined. There may be join tables to denormalize things like hash tags associated with a post, but I haven’t gotten that far yet with Dremio.

So whatever hardware, topology, or configuration we need to make that sub-second query goal happen, we’re all ears! :slightly_smiling_face:

Thanks @kelly

Fun!

Let’s see what we can figure out here. Some of my colleagues may chime in.

It would help to know a few things:

  1. what are the data sources?
  2. can you share some sample queries, or representative queries?
  3. what is the tool you will use to generate the queries/perform visualization?
  4. how many concurrent queries do you need to serve with the sub-second SLA?
  5. are you able to query the raw data currently?

Thanks!

  1. JSON files in an S3 bucket, one gzipped file per day of the year.
  2. Nothing exotic, just simple counts and group bys with at least one join involved.
  3. Queries will be generated via a JDBC connection to Dremio. Results are visualized in our product.
  4. About 10 concurrent queries at peak, but 2-3 concurrent queries at a time most of the time.
  5. Yes, I can query the raw data in S3 just fine in Dremio.
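To make #2 concrete, here’s a toy Python sketch (made-up fields and data) of the shape our queries take: join posts to profiles, then count per group. In Dremio this would be a single SQL statement over the S3 data.

```python
from collections import Counter

# Made-up sample rows standing in for the posts and profiles tables.
posts = [
    {"post_id": 1, "profile_id": 10},
    {"post_id": 2, "profile_id": 10},
    {"post_id": 3, "profile_id": 20},
]
profiles = {10: {"country": "US"}, 20: {"country": "DE"}}

# Equivalent of: SELECT pr.country, COUNT(*) FROM posts p
#                JOIN profiles pr ON p.profile_id = pr.id GROUP BY pr.country
counts = Counter(profiles[post["profile_id"]]["country"] for post in posts)
print(dict(counts))  # {'US': 2, 'DE': 1}
```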

Thanks!

Thanks, that’s helpful. Sounds promising.

Sounds like the data can be partitioned by time? In what increment does your yearly window move? Daily, weekly, monthly?

1.5 will have some important features that I think will be useful for your use case, and it’s coming pretty soon. Meanwhile, storing your reflections on S3 instead of EFS is worth trying. Also, I think you’ll want to create one or more aggregation reflections to support your group-by queries. You may not need raw reflections at all if every query is a group by, but we can explore that as you get going.

The moving window is monthly. So at the beginning of every month, we move the moving window of a year’s worth of data forward by 1 month.
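In case it helps, the window arithmetic is simple. This little helper is hypothetical (all we’ve specified is a 12-month window that advances one month at a time):

```python
from datetime import date

def window_bounds(today: date) -> tuple[date, date]:
    # Hypothetical helper: [start, end) covering the 12 full months
    # before the first day of the current month.
    end = date(today.year, today.month, 1)
    start = date(end.year - 1, end.month, 1)
    return start, end

print(window_bounds(date(2017, 11, 15)))
# (datetime.date(2016, 11, 1), datetime.date(2017, 11, 1))
```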

Sounds good. I will try moving the reflections to S3 and see how that goes, and keep an eye out for 1.5.

As for our dremio.conf files above, do those all look kosher, the usage of EFS notwithstanding?

Thanks again @kelly!

In case it’s useful, an official Docker image is now available: Official Docker Image for Dremio

More on tools for Docker and Kubernetes here.