I am using Dremio, and we have one job per table that reads the data from an Oracle table (the data source) and loads it as a CSV file into a Google Cloud Storage bucket.
We then configured the bucket in Dremio so that we can access the files as PDSs.
I found that the row counts for the Oracle table, the CSV file, and the PDS do not match.
Please help me here, and correct me if I'm doing something wrong.
Hi @mevadadhruv ,
Please answer the following:
- What queries do you run to determine the counts for all 3 types of sources?
- What are the exact counts for each source?
- What type of CSV delimiters do you use?
- If you run each query twice in Dremio, do you get the same count for both runs?
hi @bogdan.coman ,
1. select count(*) from tablename; (for all 3 types of sources)
2. 45300 in my db source while creating csv from that, csv also get same count but later on, on dremio i got 45296.
3. I'm actually using Talend Studio to create the CSV; let me share something that may be helpful.
4. Yes, same.
@mevadadhruv Are you able to send the profiles of the queries that gave the wrong count?
- 45300 in my db source while creating csv from that, csv also get same count but later on, on dremio i got 45296.
What do you mean by “csv also get same count but later on”? Do you mean that you ran the query on the CSV multiple times, got different results, and eventually got 45300 back?
Do you have reflections enabled?
How did you place the data in the PDS? Did you use COPY INTO? If yes, can we also get the profile of that query? Just want to double check whether or not 4 rows were rejected when you copied the data over.
Also, I wonder: if you had just over 30,000 records in the CSV instead, would the number of missing records be 3 instead of 4?
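One thing that may be worth checking on the CSV side is whether the count of physical lines in the file differs from the count of parsed records, which can happen when a quoted field contains a line break. The sketch below is only an illustration in Python using the standard `csv` module, not something taken from Dremio or Talend; the function name `count_lines_and_records` and the sample data are made up for the example, and the delimiter and quote character are assumptions that would need to match the real export settings.

```python
import csv
import io


def count_lines_and_records(text, delimiter=",", quotechar='"'):
    """Return (physical_lines, parsed_records) for CSV text.

    The delimiter and quotechar defaults are assumptions; set them to
    whatever the export job actually uses.
    """
    # Physical lines: what a naive line count would report.
    physical_lines = len(text.splitlines())

    # Parsed records: what a CSV parser that honors quoting reports.
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,
        quotechar=quotechar,
    )
    parsed_records = sum(1 for _ in reader)
    return physical_lines, parsed_records


# Hypothetical sample: one header row and two data rows, where the first
# data row has a line break inside a quoted field.
sample = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

lines, records = count_lines_and_records(sample)
print("physical lines:", lines)
print("parsed records:", records)
```

If the two numbers differ for the real file, the gap between the source count and the PDS count might come from how multi-line or quoted fields are interpreted rather than from rows being lost. For a real file, read it with `open(path, newline="", encoding=...)` and pass the contents in, using the encoding of the export.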