Throughput is the number of requests or the amount of data that an inference service can process per unit time, usually measured in "requests per second" or "samples per second". It is a key indicator for evaluating the concurrent processing capability of inference services.



