External Scheduler tool

Ram 2h
Hi Laurent - Can you please suggest a few external scheduler tools that can be connected to Dremio?

It would also be helpful if you could point me to any community link where this is already being discussed.

Thanks,
Ram
Ram 4d, to laurent
Hi Laurent - I need to query an Elasticsearch index that is created on a daily basis. Is there any way I can schedule a job, say every 4 hours, so that the query runs and pulls the data from the index at the specified interval?

laurent 4d
Is this a new question, unrelated to the current topic? Make sure to open a new topic next time so it’s easier for people to find answers to similar questions. For now, Dremio has no way to run a query periodically, but that should be easy to do with an external tool connecting to Dremio.

laurent 2h
I don’t think there is a community link for this yet. My request was more about opening a new one if you have a new question so that people can discover it more easily.

As for schedulers, I cannot make a recommendation as it really depends on your needs, but it can be as simple as using a tool like sqlline to run JDBC queries and crontab to run a script periodically.

Ram 30m
Hi Laurent - I am running a Dremio SQL query on an Elasticsearch index that holds my application log details (please see the sample below). Basically, I am extracting the details required for my daily metrics from the logs. The index ("ES-196"."logstash-order-2018.03.04-log") is created on a daily basis, and I can give the daily indexes a common name like ‘test-alias’, as described in one of your forum posts.

My requirement is to schedule this query at 1:00 AM. Are you saying Dremio supports crontab? If so, how do I configure it? I understand you cannot comment on specific external tools, but how do I integrate external scheduler tools with Dremio? Is there any documentation available?

Any help on this is highly appreciated.

SELECT operation, serviceName, COUNT(*) AS Count_Star
FROM (
  SELECT extract_pattern(log_message, '(?<=Label=)(.*)(?=,LastValue)', 0, 'INDEX') AS operation,
         extract_pattern(log_message, '\d+', 0, 'INDEX') AS lastvalue,
         extract_pattern(log_message, '(?<=Last_Access=)(.*)(?=\))', 0, 'INDEX') AS lastaccess,
         serviceName, cspUserId, serviceVersion
  FROM "ES-196"."logstash-order-2018.03.04-log".syslog AS syslog
  WHERE regexp_like(log_message, '.*?\QLabel=\E.*?')
) nested_0
GROUP BY operation, serviceName
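
Since the index name in the query above embeds the date, a scheduled script that does not rely on an alias would have to build the table path for the day it runs. A minimal Python sketch of that step follows. The helper name `daily_table` is hypothetical, and the naming pattern is only inferred from the single example index shown in this post, so treat it as an assumption.

```python
from datetime import date


def daily_table(day: date, source: str = "ES-196") -> str:
    """Return the quoted Dremio path for the log index of the given day.

    Assumes indexes are named logstash-order-YYYY.MM.DD-log, as in the
    one example from this thread.
    """
    index = f"logstash-order-{day:%Y.%m.%d}-log"
    # Dremio quotes identifiers containing dashes or dots with double quotes.
    return f'"{source}"."{index}".syslog'


if __name__ == "__main__":
    # Print the path a job running today would put in its FROM clause.
    print(daily_table(date.today()))
```

The returned string can be substituted into the FROM clause of the query before it is sent to Dremio.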

laurent 1m
Sorry, but it’s not very feasible for me to reply to private messages. Could you please open a new topic at https://community.dremio.com?

Laurent

There are many tools out there to run periodic jobs, one of the best known being cron on Unix systems (https://en.wikipedia.org/wiki/Cron). Combining it with a CLI JDBC client like sqlline (https://github.com/julianhyde/sqlline) and maybe some scripting would be a simple way to run queries periodically and generate an export every 4 hours or so.

If the community has recommendations for similar tools, please share them in this topic!

I understand cron and the other scheduling tools. But how do I integrate them with Dremio?

I need to run the provided Elasticsearch Dremio query periodically.

I’m not sure if I understand your use case, but let me try.

I think you would write a small program to run your query. This program could be written in Java, Python, etc. It would connect to Dremio over ODBC or JDBC, issue the query, and do something with the results that are returned, such as writing them out to your NAS as a flat file. Or, as you seem familiar with PyArrow, you could write a Python program that saves the results as Parquet.

Your cron job would then execute your Python program on the schedule you specify.
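
The program-plus-cron approach described above could look roughly like the following Python sketch. It is only an illustration under several assumptions: the third-party `pyodbc` package and a Dremio ODBC driver are installed, an ODBC data source named `Dremio` is configured (hypothetical name), and credentials are supplied through the hypothetical environment variables `DREMIO_USER` and `DREMIO_PASSWORD`. The SQL is read from a file so the query itself stays unchanged.

```python
import csv
import os
import sys


def export_query(cursor, sql, out_path):
    """Run sql on a DB-API cursor and write the result set to a CSV file.

    Returns the number of data rows written.
    """
    cursor.execute(sql)
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)  # header row taken from the result metadata
        writer.writerows(rows)
    return len(rows)


def main(sql_path, out_path):
    import pyodbc  # third-party; imported here so export_query has no hard dependency

    with open(sql_path) as f:
        sql = f.read()

    # Assumption: DSN name and environment variable names are placeholders.
    conn = pyodbc.connect(
        "DSN=Dremio;UID={};PWD={}".format(
            os.environ["DREMIO_USER"], os.environ["DREMIO_PASSWORD"]
        ),
        autocommit=True,
    )
    try:
        count = export_query(conn.cursor(), sql, out_path)
    finally:
        conn.close()
    print("wrote {} rows to {}".format(count, out_path))


# Only runs when invoked as: python dremio_export.py query.sql output.csv
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

A crontab entry such as `0 1 * * * /usr/bin/python3 /path/to/dremio_export.py /path/to/query.sql /path/to/metrics.csv` (paths are placeholders) would run it daily at 1:00 AM; `0 */4 * * *` would run it every 4 hours instead.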

In the future you’ll be able to issue the query via REST, but today you can only issue SQL queries via ODBC/JDBC. Dremio’s REST interface as of 1.4 is only available for creating virtual datasets and creating data sources, along with a few other operations.

Does that help?

Thanks, I got that. I am planning to connect Tableau to Dremio via ODBC so that I can make use of the Tableau scheduler. I will let you know in case of any issues.