Hi, I have about a dozen mysql-compatible Aurora sources which I’d like to use with Dremio. These average ~1-3 TB each, and several hundred tables (of which 3-5 tables are > 100GB, and ~50 are > 500MB).
Problem is, adding each source takes far too long. As I write this, the load of one source has been running for 2 hours with no apparent sign of progress (on a Dremio instance on my laptop); due to this I haven’t been able to successfully load any of them into Dremio.
Hi Evan, are you referring to the Add Source dialog not completing or are you building a reflection?
Sorry for the question - you use the phrases “add source” and “load” and I just want to be clear about what is consuming time.
Have you successfully connected to a small Aurora instance with just a few databases/tables? The reason I ask is that Aurora sources are not part of our regular testing and so there may be a basic issue in play here. We can take a closer look at this part of your scenario.
When setting up the initial connection it should be very quick, certainly not hours, so there’s definitely something going on.
Thanks Kelly. This was from adding a source - the Add Source dialog was just spinning after clicking ‘save’. I’ll try spinning up a smaller Aurora instance to connect to and see how that goes.
Adding an Aurora mysql source with a single ~5MB, 50k row table (and ~100 small ‘information_schema’-like tables) completed successfully in just a few seconds, so the issue I was experiencing does appear to be related to the table sizes.
Aurora is a managed mysql-compatible db instance. Storage is an SSD-based virtualized storage layer, but clients only have access through standard DB connection methods.
I once again attempted to add the smallest Aurora source DB today(~700GB). I started it a minute or two before noon, and received the error ‘Unable to connect to source aurora_test’ at a little after 4pm. Logs showed no errors/activity between noon and when the exception was generated.
2018-01-11 11:45:08,849 [main] INFO com.dremio.dac.server.WebServer - Started on http://localhost:9047
2018-01-11 15:45:00,373 [scheduler-17] INFO c.d.s.a.AccelerationServiceImpl - removing acceleration 3bf085e3-21ed-4b34-8568-5a260cbd882e because dataset is removed
2018-01-11 16:11:28,789 [qtp685033452-193] INFO c.d.e.s.StoragePluginRegistryImpl - User Error Occurred [ErrorId: 0ad6961b-d824-4651-a794-1f4027e951dd]
com.dremio.common.exceptions.UserException: Unable to connect to source aurora_test
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:648) ~[dremio-common-1.3.1-201712020438070881-a7af5c8.jar:1.3.1-201712020438070881-a7af5c8]
at com.dremio.exec.store.StoragePluginRegistryImpl.create(StoragePluginRegistryImpl.java:439) [dremio-sabot-kernel-1.3.1-201712020438070881-a7af5c8.jar:1.3.1-201712020438070881-a7af5c8]
at com.dremio.exec.store.StoragePluginRegistryImpl.createOrUpdate(StoragePluginRegistryImpl.java:288) [dremio-sabot-kernel-1.3.1-201712020438070881-a7af5c8.jar:1.3.1-201712020438070881-a7af5c8]
at com.dremio.dac.service.source.SourceService.registerSourceWithRuntime(SourceService.java:124) [dremio-dac-backend-1.3.1-201712020438070881-a7af5c8.jar:1.3.1-201712020438070881-a7af5c8]
at com.dremio.dac.service.source.SourceService.registerSourceWithRuntime(SourceService.java:116) [dremio-dac-backend-1.3.1-201712020438070881-a7af5c8.jar:1.3.1-201712020438070881-a7af5c8]
at com.dremio.dac.resource.PutSourceResource.putSource(PutSourceResource.java:81) [dremio-dac-backend-1.3.1-201712020438070881-a7af5c8.jar:1.3.1-201712020438070881-a7af5c8]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_151]
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[na:1.8.0_151]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[na:1.8.0_151]
at java.lang.reflect.Method.invoke(Unknown Source) ~[na:1.8.0_151]
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) [jersey-common-2.25.1.jar:na]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) [jersey-common-2.25.1.jar:na]
at org.glassfish.jersey.internal.Errors.process(Errors.java:315) [jersey-common-2.25.1.jar:na]
at org.glassfish.jersey.internal.Errors.process(Errors.java:297) [jersey-common-2.25.1.jar:na]
at org.glassfish.jersey.internal.Errors.process(Errors.java:267) [jersey-common-2.25.1.jar:na]
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317) [jersey-common-2.25.1.jar:na]
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473) [jersey-container-servlet-core-2.25.1.jar:na]
at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427) [jersey-container-servlet-core-2.25.1.jar:na]
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388) [jersey-container-servlet-core-2.25.1.jar:na]
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341) [jersey-container-servlet-core-2.25.1.jar:na]
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228) [jersey-container-servlet-core-2.25.1.jar:na]
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812) [jetty-servlet-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669) [jetty-servlet-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83) [jetty-servlets-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:301) [jetty-servlets-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) [jetty-servlet-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) [jetty-servlet-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) [jetty-server-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) [jetty-servlet-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) [jetty-server-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:95) [jetty-server-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.Server.handle(Server.java:499) [jetty-server-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311) [jetty-server-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258) [jetty-server-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544) [jetty-io-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) [jetty-util-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) [jetty-util-9.2.22.v20170606.jar:9.2.22.v20170606]
at java.lang.Thread.run(Unknown Source) [na:1.8.0_151]
Sure. I could also share additional detail on table structure if we move to a private conversation.
First data source, 1.5TB, 403 tables in one schema. Following are tables with data_size + index_size > 5gb.
table_rows is from information_schema and is not exact.
There may be some temp tables in the total table count, but those are almost always relatively small.
table_rows
avg_row_length
column_count
table_size_mb
3,360,799,337
73
3
456,243
2,479,584,893
73
6
425,494
765,086,366
68
4
90,840
268,528,126
84
10
70,885
311,070,252
71
13
42,801
63,899,952
637
9
41,764
452,272,434
41
4
34,242
212,252,574
35
5
31,895
604,812
49528
23
28,641
609,264,348
30
1
17,865
82,520,682
58
6
16,358
134,022,384
74
5
15,486
131,954,224
74
5
15,243
124,704,228
74
5
14,535
123,965,146
74
5
14,304
121,164,110
74
5
14,081
121,315,372
73
5
13,945
121,297,219
73
5
13,929
118,381,956
74
5
13,778
117,841,140
73
5
13,628
117,121,851
73
5
13,515
70,117,289
50
4
11,966
59,157,707
63
6
10,318
40,686,328
263
3
10,236
109,471
95188
7
9,941
12,065,438
730
5
8,875
64,832,290
55
5
6,634
518,305
11634
3
5,751
2nd data source, 112GB, 156 tables in 9 schema. Following are tables > 1GB data + index.
No temp tables here. This is more like a data warehouse, storing summaries and aggregations of data from sources like the first one above.
do you see this warning cropping up in the log file:
“Took longer than {} seconds to query row count for {}, Using default value
of {}”?
Do you see this line in the log file as well:
“Table, {}, has {} rows.”
Where the {} are placeholders. When getting metadata, Dremio will run a
SELECT COUNT(*) on tables it detects to get row count estimates. I’m
wondering if the process of getting the row count on each table is causing
this to be slow.
I attempted this on two separate Windows machines, one running win 8 and one win 10 with a fresh Dremio install, both 64 bit Windows, with Java 1.8.0-151+. In neither case did I see any files in the Dremio log directory generate any lines similar to what you mention (checked via grep -ri).
On the Windows 10 machine, when attempting to add the source, here is what I saw in server.log:
2018-01-23 00:32:04,093 [qtp762677300-169] DEBUG c.d.dac.service.source.SourceService - Connection Object:
{
"type" : "jdbc",
"driver" : "org.mariadb.jdbc.Driver",
"url" : "jdbc:mariadb://[dbhost]:3306",
"username" : "[user]",
"password" : "[pass]",
"fetchSize" : 0
}
2018-01-23 00:32:05,427 [qtp762677300-168] DEBUG c.d.dac.server.socket.SocketServlet - socket message received.
2018-01-23 00:32:20,429 [qtp762677300-166] DEBUG c.d.dac.server.socket.SocketServlet - socket message received.
... (8 hours of entries like this)
2018-01-23 08:23:51,063 [qtp762677300-168] DEBUG c.d.dac.server.socket.SocketServlet - socket message received.
2018-01-23 08:24:06,064 [qtp762677300-170] DEBUG c.d.dac.server.socket.SocketServlet - socket message received.
2018-01-23 08:24:19,581 [qtp762677300-169] INFO c.d.e.s.StoragePluginRegistryImpl - User Error Occurred [ErrorId: 001d1a40-efde-457a-bc18-842087854fd4]
com.dremio.common.exceptions.UserException: Unable to connect to source aurora
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:648) ~[dremio-common-1.3.1-201712020438070881-a7af5c8.jar:1.3.1-201712020438070881-a7af5c8]
at com.dremio.exec.store.StoragePluginRegistryImpl.create(StoragePluginRegistryImpl.java:439) [dremio-sabot-kernel-1.3.1-201712020438070881-a7af5c8.jar:1.3.1-201712020438070881-a7af5c8]
at com.dremio.exec.store.StoragePluginRegistryImpl.createOrUpdate(StoragePluginRegistryImpl.java:288) [dremio-sabot-kernel-1.3.1-201712020438070881-a7af5c8.jar:1.3.1-201712020438070881-a7af5c8]
at com.dremio.dac.service.source.SourceService.registerSourceWithRuntime(SourceService.java:124) [dremio-dac-backend-1.3.1-201712020438070881-a7af5c8.jar:1.3.1-201712020438070881-a7af5c8]
at com.dremio.dac.service.source.SourceService.registerSourceWithRuntime(SourceService.java:116) [dremio-dac-backend-1.3.1-201712020438070881-a7af5c8.jar:1.3.1-201712020438070881-a7af5c8]
at com.dremio.dac.resource.PutSourceResource.putSource(PutSourceResource.java:81) [dremio-dac-backend-1.3.1-201712020438070881-a7af5c8.jar:1.3.1-201712020438070881-a7af5c8]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_161]
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[na:1.8.0_161]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[na:1.8.0_161]
at java.lang.reflect.Method.invoke(Unknown Source) ~[na:1.8.0_161]
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) [jersey-common-2.25.1.jar:na]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) [jersey-common-2.25.1.jar:na]
at org.glassfish.jersey.internal.Errors.process(Errors.java:315) [jersey-common-2.25.1.jar:na]
at org.glassfish.jersey.internal.Errors.process(Errors.java:297) [jersey-common-2.25.1.jar:na]
at org.glassfish.jersey.internal.Errors.process(Errors.java:267) [jersey-common-2.25.1.jar:na]
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317) [jersey-common-2.25.1.jar:na]
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154) [jersey-server-2.25.1.jar:na]
at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473) [jersey-container-servlet-core-2.25.1.jar:na]
at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427) [jersey-container-servlet-core-2.25.1.jar:na]
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388) [jersey-container-servlet-core-2.25.1.jar:na]
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341) [jersey-container-servlet-core-2.25.1.jar:na]
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228) [jersey-container-servlet-core-2.25.1.jar:na]
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812) [jetty-servlet-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669) [jetty-servlet-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83) [jetty-servlets-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:301) [jetty-servlets-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) [jetty-servlet-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) [jetty-servlet-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) [jetty-server-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) [jetty-servlet-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) [jetty-server-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:95) [jetty-server-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.Server.handle(Server.java:499) [jetty-server-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311) [jetty-server-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258) [jetty-server-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544) [jetty-io-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) [jetty-util-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) [jetty-util-9.2.22.v20170606.jar:9.2.22.v20170606]
at java.lang.Thread.run(Unknown Source) [na:1.8.0_161]
I noticed this time you aren’t able to connect to Aurora at all. Are you able to connect to your Aurora instance with other JDBC tools, such as DBVisualizer?
Hi James,
Thanks for the suggestion. DBVisualizer 64-bit with non-bundled Java (1.8.0_151) is able to connect to and query the Aurora source without any issues.
Driver used in that app for the connection:
MySQL Connector Java
mysql-connector-java-5.1.42 ( Revision: 1f61b0b0270d9844b006572ba4e77f19c0f230d4 )
Upgrading to 1.4.4 did not fix this. I am going to try an install on Linux.
Update: Ubuntu 16 seems to exhibit identical behavior on 1.4.4 - logfile output is identical. Connection object details, then nothing but ‘socket message received’ every 15 seconds.