Batching
Pipelines automatically batches requests received over HTTP or from a Worker. Batching reduces the number of output files written to your destination, which can make the stored data more efficient to query.
There are three ways to define how requests are batched:
- batch-max-mb: The maximum amount of data that will be batched, in megabytes. Default is 10 MB, maximum is 100 MB.
- batch-max-rows: The maximum number of rows or events in a batch before data is written. Default, and maximum, is 10,000 rows.
- batch-max-seconds: The maximum duration of a batch before data is written, in seconds. Default is 15 seconds, maximum is 600 seconds.
All three batch definitions work together. Whichever limit is reached first triggers the delivery of a batch.
For example, suppose batch-max-mb is set to 100 and batch-max-seconds is set to 600. If 100 MB of events are posted to the Pipeline within 600 seconds, the batch is delivered as soon as the 100 MB limit is reached. If it takes longer than 600 seconds for 100 MB of events to arrive, the batch is delivered at the 600-second mark and contains every event posted during that window. In both cases the batch-max-rows limit still applies, so a batch is delivered earlier if the row limit is reached first.
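The "whichever limit is reached first" rule can be sketched in a few lines of Python. This is an illustrative model only, not how Pipelines is implemented: the function name `flush_reason` and its parameters are hypothetical, and treating a limit as reached when the value is greater than or equal to it is an assumption.

```python
from typing import Optional


def flush_reason(
    size_mb: float,
    rows: int,
    age_seconds: float,
    max_mb: float = 10,        # documented default for batch-max-mb
    max_rows: int = 10_000,    # documented default for batch-max-rows
    max_seconds: float = 15,   # documented default for batch-max-seconds
) -> Optional[str]:
    """Return the name of the limit that triggers delivery, or None.

    Hypothetical helper: each limit is checked independently, and
    reaching any one of them is enough to deliver the batch.
    """
    if size_mb >= max_mb:
        return "batch-max-mb"
    if rows >= max_rows:
        return "batch-max-rows"
    if age_seconds >= max_seconds:
        return "batch-max-seconds"
    return None


# The example above: 100 MB arrives after 30 seconds, so size triggers delivery.
print(flush_reason(100, 500, 30, max_mb=100, max_seconds=600))

# Only 40 MB has arrived when 600 seconds elapse, so the timeout triggers delivery.
print(flush_reason(40, 500, 600, max_mb=100, max_seconds=600))
```

In a real stream at most one limit is normally reached at any moment, so the order of the checks only matters when two limits are hit simultaneously.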
You can configure the following batch-level settings to adjust how Pipelines creates a batch:
| Setting | Default | Minimum | Maximum |
|---|---|---|---|
| Maximum Batch Size (batch-max-mb) | 10 MB | 0.001 MB | 100 MB |
| Maximum Batch Timeout (batch-max-seconds) | 15 seconds | 0 seconds | 600 seconds |
| Maximum Batch Rows (batch-max-rows) | 10,000 rows | 1 row | 10,000 rows |
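
If you generate pipeline configuration programmatically, it can help to check values against the documented ranges before submitting them. The sketch below is a hypothetical local helper, not part of any Cloudflare SDK or CLI: the names `LIMITS` and `validate_batch_settings` are assumptions, and the bounds are copied from the table above and treated as inclusive.

```python
# Inclusive (minimum, maximum) bounds taken from the settings table.
LIMITS = {
    "batch-max-mb": (0.001, 100),
    "batch-max-seconds": (0, 600),
    "batch-max-rows": (1, 10_000),
}


def validate_batch_settings(settings: dict) -> list:
    """Return a list of error messages; an empty list means all values are in range."""
    errors = []
    for name, value in settings.items():
        if name not in LIMITS:
            errors.append(f"{name}: unknown setting")
            continue
        low, high = LIMITS[name]
        if not low <= value <= high:
            errors.append(f"{name}: {value} is outside {low}..{high}")
    return errors


# Both values sit exactly at their documented maximums, so no errors are reported.
print(validate_batch_settings({"batch-max-mb": 100, "batch-max-seconds": 600}))

# 20,000 rows exceeds the documented maximum of 10,000.
print(validate_batch_settings({"batch-max-rows": 20_000}))
```

Settings omitted from the dictionary are simply not checked, which mirrors the fact that each setting has a default.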