Bad data in TPC-DS catalog_sales.cs_sold_date_sk

I’m trying to run some TPC-DS queries and the date surrogate key in catalog_sales.cs_sold_date_sk contains the wrong data:

select 'catalog_sales.cs_sold_date_sk-min', min(cs_sold_date_sk) from catalog_sales
union all
select 'catalog_sales.cs_sold_date_sk-max', max(cs_sold_date_sk) from catalog_sales
union all
select 'date_dim.d_date_sk-min', min(d_date_sk) from date_dim
union all
select 'date_dim.d_date_sk-max', max(d_date_sk) from date_dim
order by 1


The expected values should be:

Any suggestions? Thanks

I assume you are querying the data in the “S3 Sample Source”, is that right? If so, then yes, that data does not conform to the TPC-DS spec and should not be used for benchmarking purposes.
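
For anyone who wants to confirm the mismatch on their own copy, one possible check is to count the catalog_sales rows whose cs_sold_date_sk has no matching d_date_sk in date_dim. This is only a sketch using the table and column names from the question; the orphan_rows alias is made up for illustration, and rows with a null key are left out on the assumption that nulls are not the problem being investigated.

```sql
-- Count sales rows whose sold-date key does not exist in date_dim.
-- A non-zero result suggests the surrogate keys do not line up.
select count(*) as orphan_rows
from catalog_sales cs
left join date_dim d
  on cs.cs_sold_date_sk = d.d_date_sk
where cs.cs_sold_date_sk is not null  -- skip rows with no sold date at all
  and d.d_date_sk is null             -- keep only keys with no match
```

If this returns 0, every non-null key has a match in date_dim, and any remaining problem would have to be in the values themselves rather than missing dates.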


Hey Steven! Yes, that’s right, it’s the “S3 Sample Source”. Thanks for responding.