There is a problem when loading a CSV file with Windows line endings (CRLF).
Generate a file in which every field contains 1. Here is the code to do so:
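#!/usr/bin/env python3
# Each record: 50 x "1;" followed by "1\r\n" (51 fields, Windows CRLF line ending)
one = b'\x31\x3b'  # b'1;'
with open('buggy.txt', 'wb') as fw:
    for _ in range(100000):      # 100,000 records
        for _ in range(50):
            fw.write(one)
        fw.write(b'\x31\x0d\x0a')  # b'1\r\n': last field plus CRLF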
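To rule out the generator itself, a quick check like the following can confirm the file's layout before it goes anywhere near Dremio. It is only a sketch: `write_file`, `check_layout` and `LINE` are hypothetical names, and the record layout (50 × `1;` plus `1\r\n`, i.e. 103 bytes per line) is derived from the script above.

```python
#!/usr/bin/env python3
"""Sanity-check the layout of buggy.txt (hypothetical helper, not part of Dremio)."""
import os

# One expected record: 51 fields separated by ';', terminated by CRLF (103 bytes).
LINE = b"1;" * 50 + b"1\r\n"


def write_file(path: str, lines: int = 100000) -> None:
    """Write the same bytes as the generator script in the post."""
    with open(path, "wb") as fw:
        for _ in range(lines):
            fw.write(LINE)


def check_layout(path: str) -> int:
    """Return the number of records; raise ValueError on any malformed record."""
    size = os.path.getsize(path)
    if size % len(LINE) != 0:
        raise ValueError(f"size {size} is not a multiple of {len(LINE)}")
    count = 0
    with open(path, "rb") as f:
        # Read one fixed-size record at a time and compare it byte for byte.
        while chunk := f.read(len(LINE)):
            if chunk != LINE:
                raise ValueError(f"unexpected bytes in record {count}")
            count += 1
    return count


if __name__ == "__main__":
    write_file("buggy.txt")
    n = check_layout("buggy.txt")
    print(n, "records,", os.path.getsize("buggy.txt"), "bytes")
```

On the generated file it should report 100000 records and 10300000 bytes, so any empty field Dremio returns cannot come from the file content.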
Now create a data source on this file and import it with Windows line endings ('\r\n').
Then run:
select * from "buggy.txt" where A = '';
You will get some rows back, although there should be none, since every field contains 1.
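For comparison, the expected result of that query can be computed outside Dremio. This is a minimal plain-Python sketch, assuming `;` as the field delimiter (as written by the generator) and treating column A as the first field; `rows_with_empty_a` is a hypothetical helper. A blank record, such as one produced by a line break being split in two, has no fields at all, so it is counted too.

```python
#!/usr/bin/env python3
"""Plain-Python equivalent of: select * from "buggy.txt" where A = ''"""
import csv
import os
from typing import List


def rows_with_empty_a(path: str, delimiter: str = ";") -> List[int]:
    """Return the 0-based indexes of records whose first field (column A) is empty.

    Blank records have no fields at all, so they are reported as well.
    """
    hits = []
    # newline="" lets the csv module handle the \r\n terminators itself.
    with open(path, newline="", encoding="ascii") as f:
        for i, row in enumerate(csv.reader(f, delimiter=delimiter)):
            if not row or row[0] == "":
                hits.append(i)
    return hits


if __name__ == "__main__":
    if os.path.exists("buggy.txt"):
        print(rows_with_empty_a("buggy.txt"))
    else:
        print("buggy.txt not found; run the generator first")
```

On the generated file this prints an empty list, which is also the result the Dremio query should return.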
Apparently, it depends on the size of the file, and it affects the lines that begin at a byte offset where offset + 1 is divisible by 4 KB.
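Because every record in this file has the same length, the lines that hypothesis points at can be listed in advance and compared with the rows the query returns. A minimal sketch, assuming 0-based byte offsets, 4 KB = 4096 bytes, and the fixed 103-byte record produced by the generator; `suspect_records` is a hypothetical helper.

```python
#!/usr/bin/env python3
"""List the records whose start offset o satisfies (o + 1) % block == 0.

Assumptions: every record has the same length (103 bytes in buggy.txt),
offsets count from 0, and "4 KB" means 4096 bytes.
"""
from typing import List

RECORD_LEN = len(b"1;" * 50 + b"1\r\n")  # 103 bytes per record in buggy.txt


def suspect_records(n_records: int, record_len: int = RECORD_LEN,
                    block: int = 4096) -> List[int]:
    """Return 0-based indexes of records starting at offset o with (o + 1) % block == 0."""
    return [i for i in range(n_records) if (i * record_len + 1) % block == 0]


if __name__ == "__main__":
    hits = suspect_records(100000)
    print(len(hits), "suspect records")
    for i in hits[:3]:
        # 1-based line number and the byte offset where that line starts.
        print(f"line {i + 1} starts at byte {i * RECORD_LEN}")
```

Since 103 and 4096 share no common factor, exactly one record in every 4096 matches: 25 records in this file, the first being index 1193 (line 1194, starting at byte 122879).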
Our Dremio version is 4.1.8-202003120636020140-9c2a6b13.
The code as I originally posted it works once the – characters are replaced with spaces: the editor strips leading spaces, so I could not indent my Python source code correctly.
Well, how would you like me to share this file with you?
How did you load this file in Dremio? What exactly did you do to try to reproduce it?
You have to do the following:
1. Put the file on a Linux machine.
2. Map it in Dremio as a Windows file (CRLF line endings).
3. Execute the SQL query.
Three of our developers ran this test on several setups and on different (large) files, and each of us reproduced the bug every time. So there is definitely a problem.