There is a problem when loading a CSV file with Windows line endings (CRLF).
Generate a file containing only 1s. Here is the code to do so:
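one = b'\x31\x3b'  # the two bytes "1;": the value 1 followed by the ; delimiter
with open('buggy.txt', 'wb') as fw:
    for _ in range(100000):      # 100,000 rows
        for _ in range(50):      # 50 fields per row
            # The loop body was lost from the post; the two writes are a reconstruction.
            fw.write(one)
        fw.write(b'\r\n')        # Windows (CRLF) line ending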
Now create a data source on this file and import it with the Windows line delimiter '\r\n'.
Then run:
select * from "buggy.txt" where A = '';
Some rows are returned, which should not happen, since every field contains 1.
Apparently, it depends on the size of the file, and it affects the lines whose starting address + 1 is divisible by 4 KB.
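To separate a problem in the file from a problem in the reader, the file can be checked outside Dremio. The Python sketch below (the helper names are hypothetical) does two things: it lists CRLF rows whose first field is empty, which should return nothing for buggy.txt, and it lists rows whose preceding CRLF straddles a 4 KB block, i.e. rows that start at offset n × 4096 + 1. That second check is only one possible reading of the "address + 1 / 4 KB" remark, and it assumes 4 KB means 4096 bytes; it is not a confirmed root cause.

```python
def empty_first_field_rows(data: bytes, delimiter: bytes = b";") -> list[int]:
    """Return 0-based indices of CRLF-terminated rows whose first field is empty."""
    rows = data.split(b"\r\n")
    if rows and rows[-1] == b"":
        rows.pop()  # drop the empty piece after the final CRLF
    return [i for i, row in enumerate(rows) if row.split(delimiter, 1)[0] == b""]


def rows_after_split_crlf(data: bytes, block: int = 4096) -> list[int]:
    """Return 0-based indices of rows whose preceding CRLF straddles a block boundary.

    Hypothesis being tested (an assumption): such a row starts at offset
    n * block + 1, because its CR is the last byte of one block and its LF
    is the first byte of the next block.
    """
    hits = []
    row = 0
    pos = data.find(b"\r\n")
    while pos != -1:
        row += 1  # index of the row that starts right after this CRLF
        start = pos + 2
        if start < len(data) and start % block == 1:
            hits.append(row)
        pos = data.find(b"\r\n", start)
    return hits
```

Usage: read the file with `data = open('buggy.txt', 'rb').read()` and call both helpers. An empty result from the first one means any row Dremio returns for `A = ''` comes from the reader, not from the file.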
Our Dremio version is
The provided code does not work. Could you share the buggy.txt file you generated directly?
The provided code works once you replace each – with spaces. The forum editor strips leading spaces, so I could not indent my Python code correctly.
Well, how would you like me to share this file with you?
The issue is not with the for loop; it is with line 2: one = b’\x31\x3b’.
Anyway, you can attach buggy.txt here using the forum's upload option.
Yes, indeed: when you copy and paste the code above, there are two problems:
- the quotes have to be replaced: they are the wrong characters (curly quotes instead of straight ones);
- the keyword in is missing between _ and range.
The file is attached: buggy.zip (34.3 KB)
I do not see any issues.
Running
select * from "buggy.txt" where A = ''
returns no rows.
Can you explain this: 'Apparently, it is linked to the size of the file and for the lines that begin on an address + 1 divided by 4ko'?
How did you load this file into Dremio? What exactly did you do to reproduce the issue?
Do the following:
- Put the file on a Linux machine.
- Map it in Dremio as a Windows file (CRLF line endings).
- Run the SQL query.
Three of our developers ran this test on several setups and on different (large) files, and each of us reproduced the bug every time. So there is a problem.
Thank you for taking a look.
I can reproduce the issue.
I will get back with the root cause and solution.
We have a workaround: load the file as a Unix/Linux file (LF line endings), whatever its actual line endings are. We are just reporting a bug.
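The workaround fits the 4 KB observation: with LF as the delimiter, the row terminator is a single byte, so it can never be split across two read buffers, while a two-byte CRLF can. The Python sketch below is a hypothetical illustration of that idea, not Dremio's reader: it splits on LF only, carries partial rows from one chunk to the next, and strips the trailing CR left over from CRLF. Whether Dremio strips that CR in Unix mode is not stated in the thread, so the last column may need trimming after the import.

```python
from typing import Iterable, Iterator


def read_lf_rows(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield rows split on LF only, carrying partial rows across chunks.

    Each CRLF row arrives with a trailing CR, which is stripped here.
    """
    pending = b""
    for chunk in chunks:
        pending += chunk
        # Every piece except the last ends with LF; the last may be incomplete.
        *complete, pending = pending.split(b"\n")
        for row in complete:
            yield row[:-1] if row.endswith(b"\r") else row
    if pending:  # final row without a trailing LF
        yield pending[:-1] if pending.endswith(b"\r") else pending


def chunked(data: bytes, size: int) -> Iterator[bytes]:
    """Cut data into fixed-size pieces, like a reader filling a buffer."""
    for start in range(0, len(data), size):
        yield data[start:start + size]
```

Because the delimiter is one byte, the rows come out the same for every chunk size, including sizes that put the CR and the LF of a CRLF in different chunks.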