I have a table in Greenplum with around 5 million records. The first time I loaded the data into Dremio, it seemed to consume a lot of resources on the Greenplum server, which caused Greenplum to hang.
Is there any way to control or limit the rate at which data is loaded into Dremio?
Not directly from Dremio. Dremio simply pushes logic down into a source and waits for it to execute. You could use Greenplum resource queues to manage when a query is run so it doesn’t starve other higher-priority queries.
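To make the resource queue suggestion a bit more concrete, here is a minimal sketch that only builds the Greenplum SQL you would run as an administrator. It assumes Dremio connects to Greenplum as a dedicated, non-superuser role (Greenplum exempts superusers from resource queue limits). The names `dremio_queue` and `dremio_reader` are made up for illustration, and the limits are placeholders to tune for your cluster.

```python
import re
from typing import List

# Conservative identifier check so the generated SQL cannot be broken by odd names.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")
# Priority levels accepted by CREATE RESOURCE QUEUE in Greenplum.
_PRIORITIES = {"MIN", "LOW", "MEDIUM", "HIGH", "MAX"}


def resource_queue_sql(queue: str, role: str,
                       active_statements: int = 1,
                       priority: str = "LOW") -> List[str]:
    """Return the SQL statements that create a resource queue and assign a role to it."""
    for name in (queue, role):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if priority not in _PRIORITIES:
        raise ValueError(f"unknown priority: {priority!r}")
    if active_statements < 1:
        raise ValueError("active_statements must be at least 1")
    return [
        # Cap how many statements from this queue may run concurrently.
        f"CREATE RESOURCE QUEUE {queue} WITH "
        f"(ACTIVE_STATEMENTS={active_statements}, PRIORITY={priority});",
        # Route everything the (hypothetical) Dremio role runs through that queue.
        f"ALTER ROLE {role} RESOURCE QUEUE {queue};",
    ]


if __name__ == "__main__":
    for stmt in resource_queue_sql("dremio_queue", "dremio_reader"):
        print(stmt)
```

With one active statement and low priority, a large initial load from Dremio would still run, but it should yield to other work on the Greenplum side rather than compete with it.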
I would recommend building raw reflections on top of the virtual datasets in Dremio that use your Greenplum data, and setting an appropriate refresh interval on them. That way, queries are only executed live against Greenplum at that interval. You can also trigger the refreshes through the REST API so they run during off-peak hours, in addition to using the resource queues feature in Greenplum.
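As a rough sketch of the off-peak REST idea, the snippet below decides whether the current time falls inside a quiet window and, if so, prepares a refresh request. Treat the details as assumptions to check against your Dremio version: it assumes the catalog refresh endpoint `POST /api/v3/catalog/{id}/refresh` (which applies to physical datasets), a token obtained from Dremio's login API and sent as `_dremio<token>`, and a 01:00 to 05:00 window. The base URL, dataset id, and token are placeholders.

```python
import urllib.request
from datetime import datetime, time
from urllib.parse import quote


def is_off_peak(now: time, start: time = time(1, 0), end: time = time(5, 0)) -> bool:
    """True if `now` lies in [start, end); handles windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def build_refresh_request(base_url: str, dataset_id: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST asking Dremio to refresh a dataset's reflections."""
    # Assumed endpoint; verify it against the REST API docs for your Dremio version.
    url = f"{base_url.rstrip('/')}/api/v3/catalog/{quote(dataset_id, safe='')}/refresh"
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Authorization": f"_dremio{token}",  # assumed token scheme
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    if is_off_peak(datetime.now().time()):
        req = build_refresh_request("http://localhost:9047", "<dataset-id>", "<token>")
        # Uncomment to actually send the request:
        # urllib.request.urlopen(req)
        print("would refresh:", req.full_url)
    else:
        print("peak hours, skipping refresh")
```

Running a script like this from cron (or any scheduler) keeps the heavy reads against Greenplum inside the window you choose; the reflection's own scheduled refresh interval should then be set so it does not also fire during the day.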