Hi there,
I’m using Hive external tables as my physical source and trying to create virtual datasets from them. Below are my cluster specs:
- 2 coordinators (m5d.2xl)
- 10 executors (m5d.4xl)
When I try to open the Hive table in the SQL query window to save it as a virtual dataset (as-is), I do not see any CPU or memory activity on the Admin --> Node Activity screen. CPU usage across all nodes stays between 0% and 1%, and memory usage is stuck at 0.73%; there is no spike whatsoever. However, the virtual dataset creation fails with the following error:
ConnectionPoolTimeoutException: Timeout waiting for connection from pool
I have a two-part question:
[1] How do I monitor CPU, memory, and disk usage across all nodes (both coordinators and executors)? And how do I debug activity across the nodes while a particular job is in progress?
[2] For a dataset of just 400 MB, what could be causing this? I have been able to create virtual datasets from much larger datasets, ranging from 1 GB to 50 GB.