First of all, congratulations on a really great piece of infrastructure. The premise is great.
I was wondering whether Dremio is meant for, or suitable for, big STREAMING data, for example 1 TB a day of streaming logs.
Would one realistically be able to operate on and query the data as it's streaming?
- Are there streaming connectors?
- Would Dremio notice when data is appended to a file on HDFS?
- What if the data lives in, say, Elasticsearch, grows continuously, and rolls over to a new index name every day?
I am thinking that if the above is a use case Dremio supports, it would effectively offer a kind of Splunk-style "schema last" capability: reshape the data with SQL operators, get results slowly at first, and faster later once you reindex (a.k.a. create reflections).
However, to use Dremio this way, wouldn't it need better support for streaming results? I get the impression that in many cases Dremio would simply read the entire original file before returning anything, rather than giving you "some" rows first. Or is it just a matter of putting a LIMIT in the SQL statement, in which case Dremio would in fact return only a few rows early?
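To make the LIMIT question concrete, here is the kind of query I have in mind (the dataset and column names are made up for illustration):

```sql
-- Hypothetical: "logs" is a dataset backed by a continuously growing
-- HDFS directory, or a daily-rolling Elasticsearch index.
SELECT event_time, level, message
FROM logs
WHERE level = 'ERROR'
LIMIT 100;
-- Question: would Dremio return the first 100 matching rows as soon as
-- it finds them while scanning, or scan the entire source first?
```

If LIMIT short-circuits the scan, that would already cover a lot of the "give me some results quickly" use case I'm describing.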
Sorry if I'm being unclear; I'm still investigating this.
Thanks in advance.