JSON Sub Scan extremely slow

I’m running Dremio on a regular Digital Ocean droplet, and the “SF Incidents” query from the tutorial is taking 4-5+ minutes. The same query took just a few seconds running locally on my Mac, so I suspect something is off with my setup. Here’s the profile:

|SqlOperatorImpl ID|Type|Min Setup Time|Avg Setup Time|Max Setup Time|Min Process Time|Avg Process Time|Max Process Time|Min Wait Time|Avg Wait Time|Max Wait Time|Avg Peak Memory|Max Peak Memory|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|00-xx-00|SCREEN|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.002s|0.002s|0.002s|194KB|194KB|
|00-xx-01|PROJECT|0.001s|0.001s|0.001s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|194KB|194KB|
|00-xx-02|WRITER_COMMITTER|0.020s|0.020s|0.020s|0.032s|0.032s|0.032s|0.000s|0.000s|0.000s|195KB|195KB|
|00-xx-03|ARROW_WRITER|0.008s|0.008s|0.008s|0.001s|0.001s|0.001s|0.001s|0.001s|0.001s|194KB|194KB|
|00-xx-04|PROJECT|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|-|-|
|00-xx-05|PROJECT|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|-|-|
|00-xx-06|LIMIT|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|-|-|
|00-xx-07|TOP_N_SORT|0.018s|0.018s|0.018s|0.001s|0.001s|0.001s|0.000s|0.000s|0.000s|210KB|210KB|
|00-xx-08|HASH_AGGREGATE|0.001s|0.001s|0.001s|0.001s|0.001s|0.001s|0.000s|0.000s|0.000s|1MB|1MB|
|00-xx-09|HASH_AGGREGATE|0.001s|0.001s|0.001s|0.082s|0.082s|0.082s|0.000s|0.000s|0.000s|1MB|1MB|
|00-xx-10|PROJECT|0.000s|0.000s|0.000s|0.005s|0.005s|0.005s|0.000s|0.000s|0.000s|80KB|80KB|
|00-xx-11|SELECTION_VECTOR_REMOVER|0.000s|0.000s|0.000s|0.058s|0.058s|0.058s|0.000s|0.000s|0.000s|161KB|161KB|
|00-xx-12|FILTER|0.003s|0.003s|0.003s|0.031s|0.031s|0.031s|0.000s|0.000s|0.000s|138KB|138KB|
|00-xx-13|JSON_SUB_SCAN|0.110s|0.110s|0.110s|0.825s|0.825s|0.825s|4m47s|4m47s|4m47s|161KB|161KB|

You can see it’s the “JSON_SUB_SCAN” operator, specifically the “4m47s” wait time (min, average, and max are all the same).

Does that tell me anything about my setup that I should change to improve this? I assume my Mac is a little more powerful than a standard DO droplet, but not by much, and the internet connection on my droplet is confirmed to be very fast (800 Mbps down / 200 Mbps up).

EDIT: I just resized my instance to 4 GB of RAM and it’s maybe a little faster for some queries, but not others. Do I just need to throw RAM at the problem?
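
In case it matters, here’s roughly how I’m checking whether Dremio even sees the extra RAM after the resize. The path and the variable names (DREMIO_MAX_HEAP_MEMORY_SIZE_MB / DREMIO_MAX_DIRECT_MEMORY_SIZE_MB) are just what I have on my install, so treat them as assumptions if your layout or version differs:

```
# Path and variable names are assumptions from my install; adjust for wherever
# your TAR/RPM put Dremio. The point is to confirm the memory caps were actually
# raised after resizing the droplet.
grep -E 'DREMIO_MAX_(HEAP|DIRECT)_MEMORY_SIZE_MB' /opt/dremio/conf/dremio-env

# Confirm the running JVM picked the limits up (look for -Xmx / -XX:MaxDirectMemorySize).
ps -ef | grep '[d]remio' | tr ' ' '\n' | grep -E '^-Xmx|MaxDirectMemorySize'
```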

What are the exact specs (CPU/memory) of the Digital Ocean box where you are doing your testing? I’m guessing the JSON_SUB_SCAN may be taking a while due to an inability to multi-thread.

FYI: you mentioned 4 GB of RAM, which is significantly below our recommended requirements: https://docs.dremio.com/deployment/system-requirements.html

That totally makes sense. I’m mostly trying to figure out the minimum setup where this will work for some light testing. My MacBook seems to work fine; for POC purposes it’d be great to get a hosted version that’s basically at parity with my local one.

Just upgraded to these specs:

8 GB RAM
4 vCPUs
25 GB disk

And I’m still getting really laggy, long-running queries for the same reason: >4 minutes for simple stuff that takes 4-5 seconds locally :confused:

It is a shared machine, though. Do you think it just won’t work with that setup? Maybe I need to go to a dedicated-CPU droplet or something? I suppose it could be the specs, but upgrading pretty significantly twice now has made essentially no difference, so I’m wondering if something else is going on, like whether multi-threading is somehow disabled on my setup.
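
Here’s the quick sanity check I’m planning to run, just standard Linux tools and nothing Dremio-specific, to see whether the Dremio process is actually spawning multiple threads and spreading work across the vCPUs (mpstat comes from the sysstat package, which may need installing):

```
# How many threads the Dremio JVM currently has (nlwp = number of lightweight processes).
ps -o nlwp= -p "$(pgrep -f dremio | head -n 1)"

# Watch per-core load while the slow query runs; if only one core is ever busy,
# that would support the "no multi-threading" theory.
mpstat -P ALL 2
```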

In case it’s helpful, here’s my cpuinfo at the moment:

cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 79
model name	: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
stepping	: 1
microcode	: 0x1
cpu MHz		: 2199.998
cache size	: 30720 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass
bogomips	: 4399.99
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

Processors 1-3 report identical values to processor 0, differing only in their processor, physical id, apicid, and initial apicid fields.
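
And a more compact view of the same box, in case that’s easier to read than the raw cpuinfo dump (these are just the standard util-linux/procps tools):

```
# One-line summaries instead of the full /proc/cpuinfo output.
nproc                                                        # online vCPUs (4 here)
lscpu | grep -E '^(CPU\(s\)|Thread|Core|Socket|Model name)'
free -h                                                      # total / used / free RAM
```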

Just upgraded again, this time to a dedicated-CPU droplet with these specs:

8 GB RAM
4 vCPUs
50 GB disk

And it’s still the same situation. That should be a better setup than my laptop, so I feel like something is actually wrong with my config, or possibly with the program itself.

Ok, had a minor breakthrough at least: I had previously been running the newest TAR of Dremio (2.0.5) using setup instructions (the GIST) that were written for v1.4.4 but had worked fine.

I had a hunch, so I went back and destroyed my droplet, started a new one with 8 GB of RAM, 4 vCPUs, and 50 GB of disk (the same specs I had been using unsuccessfully), but installed v1.4.4 instead, and things are much better. The query from the SF Incidents tutorial now takes about 30-40 seconds, which is still slow but similar to my laptop (about 10-20 seconds).

Is it possible that:

A. the setup instructions I’m using (the GIST) are leading me to do something in the config that is breaking things

OR

B. something in v2.0.5 is materially different and less performant on Ubuntu 16.x machines, or conflicts with some other aspect of my dev setup.

?

Ran another droplet with the same specs on CentOS, with 2.0.5 installed via the RPM, and got the same bad performance as my previous installations of 2.0.5. Upgrading the CentOS installation to

32 GB RAM
16 vCPUs
160 GB disk
CPU-Optimized droplet

brings 2.0.5 on CentOS to a good state, with queries taking 2-10 seconds.

I guess this answers my question? Is that essentially the “minimum viable droplet”?

I can’t replicate getting this setup to run quickly. I’ve rebuilt it a few times to be sure, and it becomes slow every time. I thought I had the solution, but clearly something else is going on.

32 GB RAM
16 vCPUs
160 GB disk
CPU-Optimized droplet

Still extremely slow. As I mentioned above, I got this setup to work well one time, but after rebuilding that same setup a few more times, none of the subsequent installations worked as expected.

The only other info I have is that it’s only JSON that has this issue, which makes sense since it’s the JSON_SUB_SCAN that is the blocker.

Man, I’ve really tried everything I can think of at this point: I’ve installed every version of Dremio going back to 1.4.4, tried every droplet size possible, CentOS vs. Ubuntu, etc. Is there some configuration I’m missing? This is my build process (simplified); maybe I’m missing something? In every single configuration and setup, the JSON queries take 30 seconds to 10 minutes, except for two times when, for some reason, the server could process them quickly. I thought I had a deterministic explanation for why that happened, but I couldn’t repeat it.
sudo yum install java-1.8.0-openjdk

sudo yum install wget

wget https://download.dremio.com/community-server/2.0.5-201806021755080191-767cfb5/dremio-community-2.0.5-201806021755080191_767cfb5_1.noarch.rpm

sudo yum localinstall dremio-community-2.0.5-201806021755080191_767cfb5_1.noarch.rpm

sudo service dremio start

(of course, substitute a different version of Dremio where applicable if you’re trying an older one)
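
After starting it, I also check that the daemon actually came up and skim the log for anything obvious. The /var/log/dremio path is just where the RPM install writes logs on my box, so adjust it if yours logs elsewhere:

```
# Confirm the service is actually running.
sudo service dremio status

# Skim recent startup output and count errors/exceptions; the log path is an
# assumption from my RPM install.
sudo tail -n 100 /var/log/dremio/server.log
sudo grep -icE 'error|exception' /var/log/dremio/server.log
```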

It would help to understand what exactly you are trying to run or test. A query profile would also be useful.

I know you mentioned you are running a “query from the SF Incidents Tutorial”. I assume you are following our tutorial here? If so, that dataset is actually quite small (50 MB, 150,491 records), small enough that your original setup (I recall 4 vCPUs / 8 GB) should run it in less than 10 seconds, and small enough that multi-threading isn’t necessary.

A big random guess here: you are running your Dremio in DigitalOcean and trying to connect to an AWS S3 dataset. Those two cloud providers are direct competitors. Maybe some type of network throttling?

Here is the direct download link to the SF JSON. It may be worth uploading this into DigitalOcean via our UI and testing that directly, rather than going through AWS S3.
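
If you want to sanity-check the throttling theory first, it may also be worth timing a raw download of that file from the droplet itself; the sketch below uses a placeholder URL, so substitute the direct link above:

```
# Placeholder URL; substitute the direct SF incidents JSON link.
URL="https://example.com/path/to/sf_incidents.json"

# Wall-clock time and effective throughput for a straight download from the droplet.
curl -sL -o /dev/null \
  -w 'downloaded %{size_download} bytes at %{speed_download} B/s in %{time_total}s\n' \
  "$URL"
```

If that one download crawls while other endpoints are fast from the same droplet, throttling between the providers becomes a lot more plausible.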

I agree it’s weird! I’ll post a profile below. It’s only JSON, and it’s always the “JSON_SUB_SCAN” operation taking up all the time. I’m going to try putting that data source in my own S3 bucket, as well as uploading it to Digital Ocean directly via the UI, and see if it makes a difference. I’ll also share the full profile, not just the text.

|SqlOperatorImpl ID|Type|Min Setup Time|Avg Setup Time|Max Setup Time|Min Process Time|Avg Process Time|Max Process Time|Min Wait Time|Avg Wait Time|Max Wait Time|Avg Peak Memory|Max Peak Memory|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|00-xx-00|SCREEN|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.002s|0.002s|0.002s|194KB|194KB|
|00-xx-01|PROJECT|0.001s|0.001s|0.001s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|194KB|194KB|
|00-xx-02|WRITER_COMMITTER|0.020s|0.020s|0.020s|0.032s|0.032s|0.032s|0.000s|0.000s|0.000s|195KB|195KB|
|00-xx-03|ARROW_WRITER|0.008s|0.008s|0.008s|0.001s|0.001s|0.001s|0.001s|0.001s|0.001s|194KB|194KB|
|00-xx-04|PROJECT|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|-|-|
|00-xx-05|PROJECT|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|-|-|
|00-xx-06|LIMIT|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|0.000s|-|-|
|00-xx-07|TOP_N_SORT|0.018s|0.018s|0.018s|0.001s|0.001s|0.001s|0.000s|0.000s|0.000s|210KB|210KB|
|00-xx-08|HASH_AGGREGATE|0.001s|0.001s|0.001s|0.001s|0.001s|0.001s|0.000s|0.000s|0.000s|1MB|1MB|
|00-xx-09|HASH_AGGREGATE|0.001s|0.001s|0.001s|0.082s|0.082s|0.082s|0.000s|0.000s|0.000s|1MB|1MB|
|00-xx-10|PROJECT|0.000s|0.000s|0.000s|0.005s|0.005s|0.005s|0.000s|0.000s|0.000s|80KB|80KB|
|00-xx-11|SELECTION_VECTOR_REMOVER|0.000s|0.000s|0.000s|0.058s|0.058s|0.058s|0.000s|0.000s|0.000s|161KB|161KB|
|00-xx-12|FILTER|0.003s|0.003s|0.003s|0.031s|0.031s|0.031s|0.000s|0.000s|0.000s|138KB|138KB|
|00-xx-13|JSON_SUB_SCAN|0.110s|0.110s|0.110s|0.825s|0.825s|0.825s|4m47s|4m47s|4m47s|161KB|161KB|

@anthony here is the profile for the query from the tutorial I just ran.

Of course, now that I’m actually posting the profile, it ran the fastest it has run in 10 deployments, at 1 minute 6 seconds, but that’s still about 10x longer than it takes on my Mac, so I assume it shares the problem this thread is about.

be3f4748-7ba9-44f3-afda-3eac79bd0380.zip (6.5 KB)

I will subsequently be testing and adding profiles for:

  1. this same data uploaded directly to Digital Ocean
  2. this same query with the same data, but from a separate bucket that I control, in case they’re rate limiting YOUR bucket (rough copy commands sketched below the list)
  3. this same query run from an AWS instance, if I can spin one up easily enough
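
For item 2, the rough plan is just to copy the file into a bucket I own and point Dremio at that instead; the bucket names and URL below are placeholders:

```
# Bucket names and URL are placeholders; assumes the AWS CLI is configured with
# credentials that can read the source object and write to my bucket.
aws s3 cp s3://source-sample-bucket/sf_incidents.json s3://my-own-bucket/sf_incidents.json

# Fallback if a direct bucket-to-bucket copy isn't permitted: download, then upload.
curl -sLo sf_incidents.json "https://example.com/path/to/sf_incidents.json"
aws s3 cp sf_incidents.json s3://my-own-bucket/sf_incidents.json
```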

Wow, ok, I think I can stop the search. Running this same data from my own S3 bucket solved the problem 100%. What a wild goose chase!

I think something is up with reading the sample data from your public Dremio S3 bucket: either S3, Digital Ocean (it must be them?), or someone in between may be rate limiting it. That’s crazy. One of the weirdest bugs I’ve encountered; I should have tried my own dataset way earlier in this process.

Glad it works now :smiley: AWS is huge, with many data centers around the world. Maybe it was just a networking topology glitch in the matrix…

Yeah, seriously! Thank you so much. This was super helpful, especially on a weekend; I’m really grateful for the assistance.