Cannot register Iceberg tables in a bucket in a different region than the project

I’ve got a project in us-east-1 and an AWS S3 bucket in eu-north-1 configured as an external data source. As a test, I created a table, test_table, in the eu-north-1 bucket using a CTAS in the GUI (which confirms that Dremio can reach the bucket), and noticed that it is not automatically registered in the Dremio Open Catalog. So I tried registering it manually, using PyIceberg to connect to the Iceberg REST Catalog API and supplying the metadata location in the bucket. But it looks like the Dremio server signs the S3 request with the project region rather than the bucket region, which leads to this failure:

>>> catalog.register_table(('test','test_table'), "s3://<bucket>/test_table/metadata/v1.metadata.json")
Traceback (most recent call last):
  File "/home/<user>/projects/<workspace>/.venv/lib/python3.12/site-packages/pyiceberg/catalog/rest/__init__.py", line 598, in register_table
    response.raise_for_status()
  File "/home/<user>/projects/<workspace>/.venv/lib/python3.12/site-packages/requests/models.py", line 1026, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://catalog.dremio.cloud/api/iceberg/v1/<project>/namespaces/test/register

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/<user>/projects/<workspace>/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 338, in wrapped_f
    return copy(f, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/<user>/projects/<workspace>/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 477, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/<user>/projects/<workspace>/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 378, in iter
    result = action(retry_state)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/<user>/projects/<workspace>/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 400, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
                                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/<user>/projects/<workspace>/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 480, in __call__
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/<user>/projects/<workspace>/.venv/lib/python3.12/site-packages/pyiceberg/catalog/rest/__init__.py", line 600, in register_table
    _handle_non_200_response(exc, {409: TableAlreadyExistsError})
  File "/home/<user>/projects/<workspace>/.venv/lib/python3.12/site-packages/pyiceberg/catalog/rest/response.py", line 111, in _handle_non_200_response
    raise exception(response) from exc
pyiceberg.exceptions.BadRequestError: S3Exception: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'eu-north-1' (Service: S3, Status Code: 400, Request ID: <redacted>, Extended Request ID: <redacted>) (SDK Attempt Count: 1)
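
For anyone triaging the same failure, here is a minimal sketch of pulling the two regions out of the S3 error text above, so a script can tell a region mismatch apart from other 400 responses. It assumes the message keeps the wording shown in the traceback; the helper name parse_region_mismatch is hypothetical and not part of PyIceberg or Dremio.

```python
import re

# Matches the region-mismatch wording quoted in the traceback above.
# Assumption: S3 keeps this exact phrasing; other errors will not match.
_REGION_MISMATCH = re.compile(
    r"the region '(?P<signed>[^']+)' is wrong; expecting '(?P<expected>[^']+)'"
)


def parse_region_mismatch(message: str):
    """Return (signed_region, expected_region), or None for any other error text."""
    match = _REGION_MISMATCH.search(message)
    if match is None:
        return None
    return match.group("signed"), match.group("expected")


# Error text copied from the traceback, with the redacted request IDs omitted.
error_text = (
    "S3Exception: The authorization header is malformed; "
    "the region 'us-east-1' is wrong; expecting 'eu-north-1' "
    "(Service: S3, Status Code: 400)"
)

print(parse_region_mismatch(error_text))
```

In practice the message would come from str(exc) on the caught pyiceberg.exceptions.BadRequestError. Here the first value is the region the request was signed with (the project region) and the second is the region the bucket actually lives in.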

Does Dremio not support registering datasets from buckets in different regions in the same Catalog?

PS. I am running Dremio Cloud NG, not Classic.

Thanks for reporting this and for the detailed description.

We’ve identified the issue and implemented a fix. I can’t say exactly when it will land in Dremio Software, but it should reach Dremio Cloud Next Gen in the next few weeks.

Thanks again for bringing this to our attention.

Thanks for the quick response, and good to hear that this was a bug. This is a very important use case for us, and probably for any other users who run Dremio in a stack with multi-tenancy or data residency requirements.