High setup time for PARQUET_ROW_GROUP_SCAN

Ran into this issue today, so I wanted to share it in case it helps others.

I'm running Dremio locally on macOS, and accelerated queries were consistently taking more than 15 seconds when I knew they should finish in under a second. The query profile showed 15 seconds of setup time for PARQUET_ROW_GROUP_SCAN, and nothing in the logs helped explain what was going on.

A series of jstack dumps captured during query execution showed threads like the following:

"e3 - 23adb94e-a5c2-24d7-51b1-e24f75089d00:frag:2:1" #216 prio=5 os_prio=31 tid=0x00007fe379956800 nid=0xbe03 runnable [0x00007000079ad000]
   java.lang.Thread.State: RUNNABLE
	at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
	at java.net.InetAddress.getLocalHost(InetAddress.java:1500)
	- locked <0x000000074003e860> (a java.lang.Object)

Here the fragment thread is stuck inside InetAddress.getLocalHost(), waiting on a native lookup of the local hostname. @steven pointed me at this blog:

As the post explains, /etc/hosts needs to be amended to include the machine's hostname, i.e. the output of the hostname command. For me, this meant changing the original first entry in /etc/hosts:

127.0.0.1 localhost

To append the output of hostname:

127.0.0.1 localhost Henrys-MacBook-Pro-2.local

I did the same for the ::1 entry.

This solved the issue for me.
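
If anyone wants to check whether their machine is affected, or confirm the fix took, timing the same call that shows up in the stack trace above should be enough. This is only a rough sketch: the class name LocalHostTiming and the helper method are made up for illustration and are not part of Dremio.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Dremio: times the call seen in the jstack output.
public class LocalHostTiming {

    // Runs one InetAddress.getLocalHost() lookup and returns the elapsed milliseconds.
    static long timeLocalHostLookupMillis() {
        long start = System.nanoTime();
        try {
            InetAddress addr = InetAddress.getLocalHost();
            System.out.println("Resolved local host: " + addr);
        } catch (UnknownHostException e) {
            // The hostname did not resolve at all; still report how long the attempt took.
            System.out.println("Lookup failed: " + e.getMessage());
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("getLocalHost took " + timeLocalHostLookupMillis() + " ms");
    }
}
```

Run it in a fresh JVM before and after editing /etc/hosts. A lookup that takes seconds rather than milliseconds would point at the same hostname resolution problem.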

I believe this issue is limited to macOS when storing data reflections locally, in case anyone else out there runs into something similar.
