I’ve got a Dremio project in us-east-1, with an AWS S3 bucket in eu-north-1 as an external data source. As a test, I created a table test_table in the eu-north-1 bucket with a CTAS from the GUI, which confirms that Dremio can reach the bucket, and noticed that the table is not automatically registered in the Dremio Open Catalog. So I tried registering it manually, using PyIceberg to connect to the Iceberg REST Catalog API and supplying the metadata location in the bucket. But it looks like the Dremio server signs the S3 request with the project region rather than the bucket region, and the call fails:
>>> catalog.register_table(('test','test_table'), "s3://<bucket>/test_table/metadata/v1.metadata.json")
Traceback (most recent call last):
File "/home/<user>/projects/<workspace>/.venv/lib/python3.12/site-packages/pyiceberg/catalog/rest/__init__.py", line 598, in register_table
response.raise_for_status()
File "/home/<user>/projects/<workspace>/.venv/lib/python3.12/site-packages/requests/models.py", line 1026, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://catalog.dremio.cloud/api/iceberg/v1/<project>/namespaces/test/register
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/<user>/projects/<workspace>/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 338, in wrapped_f
return copy(f, *args, **kw)
^^^^^^^^^^^^^^^^^^^^
File "/home/<user>/projects/<workspace>/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 477, in __call__
do = self.iter(retry_state=retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/<user>/projects/<workspace>/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 378, in iter
result = action(retry_state)
^^^^^^^^^^^^^^^^^^^
File "/home/<user>/projects/<workspace>/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 400, in <lambda>
self._add_action_func(lambda rs: rs.outcome.result())
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/home/<user>/projects/<workspace>/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 480, in __call__
result = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/<user>/projects/<workspace>/.venv/lib/python3.12/site-packages/pyiceberg/catalog/rest/__init__.py", line 600, in register_table
_handle_non_200_response(exc, {409: TableAlreadyExistsError})
File "/home/<user>/projects/<workspace>/.venv/lib/python3.12/site-packages/pyiceberg/catalog/rest/response.py", line 111, in _handle_non_200_response
raise exception(response) from exc
pyiceberg.exceptions.BadRequestError: S3Exception: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'eu-north-1' (Service: S3, Status Code: 400, Request ID: <redacted>, Extended Request ID: <redacted>) (SDK Attempt Count: 1)
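To make the mismatch explicit, here is a minimal sketch that pulls the two regions out of the error text above. The names `parse_region_mismatch` and `RegionMismatch` are my own hypothetical helpers, not part of PyIceberg or Dremio, and the regex assumes the S3 message keeps the wording shown in the traceback.

```python
import re
from typing import NamedTuple, Optional


class RegionMismatch(NamedTuple):
    """Hypothetical container for the two regions named in the S3 error."""
    signed_region: str    # region used to sign the request
    expected_region: str  # region S3 expected (where the bucket lives)


# Assumes the wording from the traceback:
# "the region 'X' is wrong; expecting 'Y'"
_REGION_PATTERN = re.compile(r"the region '([^']+)' is wrong; expecting '([^']+)'")


def parse_region_mismatch(message: str) -> Optional[RegionMismatch]:
    """Return the signed and expected regions, or None if the text does not match."""
    match = _REGION_PATTERN.search(message)
    if match is None:
        return None
    return RegionMismatch(match.group(1), match.group(2))


# Error text copied from the traceback (request IDs omitted).
message = (
    "S3Exception: The authorization header is malformed; "
    "the region 'us-east-1' is wrong; expecting 'eu-north-1' "
    "(Service: S3, Status Code: 400)"
)

mismatch = parse_region_mismatch(message)
if mismatch is not None:
    print(f"signed with {mismatch.signed_region}, bucket is in {mismatch.expected_region}")
```

In my case the signed region is the project region and the expected region is the bucket region, which is why I suspect the signing happens server-side with the project's settings.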
Does Dremio not support registering tables from buckets in different regions in the same Catalog?
PS. I am running Dremio Cloud NG, not Classic.