You can use `POST /api/v3/catalog` to create the S3 source in Dremio via the REST API. For example, using cURL:
curl --request POST \
  --url 'http://localhost:9047/api/v3/catalog' \
  --header 'authorization: _dremio{authorization token}' \
  --header 'content-type: application/json' \
  --data '{
    "entityType": "source",
    "config": {
      "accessKey": "your S3 access key here",
      "accessSecret": "your S3 access secret here",
      "secure": false,
      "allowCreateDrop": true,
      "rootPath": "/",
      "credentialType": "ACCESS_KEY",
      "enableAsync": true,
      "compatibilityMode": false,
      "isCachingEnabled": true,
      "maxCacheSpacePct": 100,
      "requesterPays": false,
      "enableFileStatusCheck": true
    },
    "type": "S3",
    "name": "testing-S3",
    "metadataPolicy": {
      "authTTLMs": 86400000,
      "namesRefreshMs": 3600000,
      "datasetRefreshAfterMs": 3600000,
      "datasetExpireAfterMs": 10800000,
      "datasetUpdateMode": "PREFETCH_QUERIED",
      "deleteUnavailableDatasets": true,
      "autoPromoteDatasets": false
    },
    "accelerationGracePeriodMs": 10800000,
    "accelerationRefreshPeriodMs": 3600000,
    "accelerationNeverExpire": false,
    "accelerationNeverRefresh": false,
    "allowCrossSourceSelection": false,
    "accessControlList": {},
    "permissions": [],
    "checkTableAuthorizer": true
  }'
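If you don't already have an authorization token for the `authorization` header, you can obtain one from Dremio's login endpoint and pass it as `_dremio` followed by the token. A minimal sketch, assuming Dremio is running at the default `localhost:9047` and `your-username` / `your-password` are placeholders for your credentials:

curl --request POST \
  --url 'http://localhost:9047/apiv2/login' \
  --header 'content-type: application/json' \
  --data '{"userName": "your-username", "password": "your-password"}'

The response is JSON containing a `token` field; the header in the request above is then `authorization: _dremio<that token>`.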
Note that the `rootPath` here is set to `/`, so you will see all the buckets in this S3 account that the credentials have access to. Then, as @balaji.ramaswamy noted, assuming that `part_0.csv` and `part_1.csv` have the same schema, you can promote (format) the `datadir` folder to a physical dataset, which will contain records from both of the files.
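If you'd rather expose only a single bucket instead of the whole account, point `rootPath` at that bucket when creating the source. A small sketch, where `my-bucket` is a placeholder for your bucket name:

"config": {
  ...
  "rootPath": "/my-bucket",
  ...
}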
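The promotion step can also be done through the API rather than the UI. A rough sketch, assuming the bucket is named `my-bucket` and the files are comma-delimited with a header row: first look up the folder with `GET /api/v3/catalog/by-path` to get its `id`, then POST it back as a `PHYSICAL_DATASET` with a `format` (the id in the second URL must be URL-encoded):

curl --request GET \
  --url 'http://localhost:9047/api/v3/catalog/by-path/testing-S3/my-bucket/datadir' \
  --header 'authorization: _dremio{authorization token}'

curl --request POST \
  --url 'http://localhost:9047/api/v3/catalog/{URL-encoded id from the GET response}' \
  --header 'authorization: _dremio{authorization token}' \
  --header 'content-type: application/json' \
  --data '{
    "entityType": "dataset",
    "id": "{id from the GET response}",
    "path": ["testing-S3", "my-bucket", "datadir"],
    "type": "PHYSICAL_DATASET",
    "format": {
      "type": "Text",
      "fieldDelimiter": ",",
      "extractHeader": true
    }
  }'

After that, `SELECT * FROM "testing-S3"."my-bucket"."datadir"` should return the rows from both files.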