Backfills¶
BackfillResult¶
Returned by CodeRepository.backfill().
| Attribute | Type | Description |
|---|---|---|
| backfill_id | str | Unique identifier for the backfill |
| num_partitions | int | Total number of partitions in the backfill |
| num_runs | int | Number of runs created |
| status | str | Current status ("in_progress", "completed", "failed", "canceled") |
| completed | int | Number of completed partitions |
| failed | int | Number of failed partitions |
| canceled | int | Number of canceled partitions |
| run_ids | list[str] | IDs of runs created by the backfill |
| is_dry_run | bool | Whether this was a dry-run preview |
| partition_keys | list[PartitionKey] | Partition keys included in the backfill |
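
As a rough illustration of reading these counters, the sketch below builds a one-line summary from a result object. `FakeResult` and `summarize` are hypothetical names that carry only a subset of the documented attributes. The "pending" figure assumes that any partition not yet counted as completed, failed, or canceled is still outstanding, which the reference does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Hypothetical stand-in with a subset of BackfillResult's attributes."""
    backfill_id: str
    num_partitions: int
    status: str
    completed: int = 0
    failed: int = 0
    canceled: int = 0
    is_dry_run: bool = False


def summarize(result) -> str:
    """Build a one-line summary from the documented counters."""
    # Assumption: anything not completed/failed/canceled is still pending.
    pending = (
        result.num_partitions - result.completed - result.failed - result.canceled
    )
    prefix = "[dry run] " if result.is_dry_run else ""
    return (
        f"{prefix}{result.backfill_id}: {result.status}, "
        f"{result.completed}/{result.num_partitions} completed, "
        f"{result.failed} failed, {result.canceled} canceled, {pending} pending"
    )


example = FakeResult("bf-1", num_partitions=10, status="in_progress", completed=6, failed=1)
summary = summarize(example)
```

Because `summarize` only reads attributes, it should work equally well on a real BackfillResult, provided the attribute names match the table above.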
BackfillStatus¶
Returned by CodeRepository.get_backfill().
| Attribute | Type | Description |
|---|---|---|
| backfill_id | str | Unique identifier for the backfill |
| status | str | Current status ("in_progress", "completed", "failed", "canceled") |
| total_partitions | int | Total number of partitions |
| completed_partitions | int | Number of completed partitions |
| failed_partitions | int | Number of failed partitions |
| canceled_partitions | int | Number of canceled partitions |
| run_ids | list[str] | IDs of runs created by the backfill |
| error | str \| None | Error message if the backfill failed |
| tags | list[tuple[str, str]] | Tags attached to every run launched by the backfill |
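
When a backfill is launched with block=False, one plausible way to follow it is to poll for a status object until it reaches a terminal state. The sketch below is a minimal polling loop under that assumption. `wait_for_backfill` is a hypothetical helper, and it takes the lookup function as a parameter so that it runs here against canned data rather than a real repository.

```python
import time
from types import SimpleNamespace
from typing import Callable, Optional

# The three terminal values from the status column above.
TERMINAL_STATUSES = {"completed", "failed", "canceled"}


def wait_for_backfill(
    get_status: Callable[[str], Optional[object]],
    backfill_id: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Poll until the status is terminal, then return the last status object."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status(backfill_id)
        if status is None:
            raise LookupError(f"backfill {backfill_id!r} not found")
        if status.status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"backfill {backfill_id!r} did not finish in time")
        sleep(poll_seconds)


# Demo with canned statuses instead of a real repository.
_canned = iter([
    SimpleNamespace(status="in_progress", completed_partitions=1, error=None),
    SimpleNamespace(status="completed", completed_partitions=3, error=None),
])
final = wait_for_backfill(lambda _id: next(_canned), "bf-1", sleep=lambda _s: None)
```

With a real repository, passing its get_backfill method as `get_status` should fit, assuming that method takes the backfill ID as its only argument.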
BackfillStrategy¶
Controls how partition keys are grouped into runs during a backfill.
BackfillStrategy.multi_run()¶
One run per partition key (default).
BackfillStrategy.single_run()¶
All partition keys in a single run.
BackfillStrategy.per_dimension()¶
BackfillStrategy.per_dimension(
    multi_run: list[str],
    single_run: list[str],
) -> BackfillStrategy.PerDimension
Per-dimension control for multi-dimensional partitions.
| Parameter | Type | Description |
|---|---|---|
| multi_run | list[str] | Dimensions iterated across runs (at least one required) |
| single_run | list[str] | Dimensions batched within each run (at least one required) |
BackfillStrategy.PerDimension attributes:
| Attribute | Type |
|---|---|
| multi_run | list[str] |
| single_run | list[str] |
Warning
A dimension cannot appear in both multi_run and single_run.
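
To make the per-dimension grouping concrete, the sketch below models it in plain Python. It does not call the library. `validate_dimensions` and `group_into_runs` are hypothetical helpers that encode the constraints stated above and one reading of "iterated across runs" versus "batched within each run".

```python
from itertools import product


def validate_dimensions(multi_run: list[str], single_run: list[str]) -> None:
    """Check the documented constraints on per_dimension() arguments."""
    if not multi_run or not single_run:
        raise ValueError("multi_run and single_run each need at least one dimension")
    overlap = set(multi_run) & set(single_run)
    if overlap:
        raise ValueError(f"dimensions in both lists: {sorted(overlap)}")


def group_into_runs(
    keys_by_dim: dict[str, list[str]],
    multi_run: list[str],
    single_run: list[str],
) -> list[list[dict[str, str]]]:
    """Return one list of partition keys (as dicts) per run."""
    validate_dimensions(multi_run, single_run)
    runs = []
    # One run for each combination of values across the multi_run dimensions...
    for outer in product(*(keys_by_dim[d] for d in multi_run)):
        fixed = dict(zip(multi_run, outer))
        # ...containing every combination of the single_run dimensions.
        batch = [
            {**fixed, **dict(zip(single_run, inner))}
            for inner in product(*(keys_by_dim[d] for d in single_run))
        ]
        runs.append(batch)
    return runs


runs = group_into_runs(
    {"date": ["2024-01-01", "2024-01-02"], "region": ["us", "eu", "asia"]},
    multi_run=["date"],
    single_run=["region"],
)
```

In this model, two dates iterated across runs and three regions batched within each run give two runs of three partition keys each.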
PartitionKeyRange¶
Specifies a range of partition keys for a backfill.
PartitionKeyRange.single()¶
Range of single-dimension partition keys from from_key to to_key (inclusive).
PartitionKeyRange.multi()¶
Multi-dimension partition key range. Each dimension value can be:
- A (from, to) tuple for a range
- A list[str] for explicit keys
rs.PartitionKeyRange.multi({
    "date": ("2024-01-01", "2024-01-07"),
    "region": ["us", "eu", "asia"],
})
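
For intuition about how many partition keys such a range covers, the sketch below expands the same specification in plain Python. It rests on two assumptions that the reference does not confirm: that the date dimension is partitioned daily with ISO-formatted keys, and that ranges in multi() are inclusive at both ends, as they are for single(). The library presumably resolves ranges against the actual partition definition instead. `expand_dimension` and `expand_multi` are hypothetical names.

```python
from datetime import date, timedelta
from itertools import product


def expand_dimension(spec):
    """Expand one dimension: a (from, to) tuple or an explicit list of keys."""
    if isinstance(spec, tuple):
        start, end = (date.fromisoformat(s) for s in spec)
        days = (end - start).days
        # Assumption: daily partitions, inclusive on both ends.
        return [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]
    return list(spec)


def expand_multi(spec: dict) -> list[dict]:
    """Return the cartesian product of all expanded dimensions."""
    names = list(spec)
    values = [expand_dimension(spec[name]) for name in names]
    return [dict(zip(names, combo)) for combo in product(*values)]


keys = expand_multi({
    "date": ("2024-01-01", "2024-01-07"),
    "region": ["us", "eu", "asia"],
})
```

Under these assumptions the range covers seven dates and three regions, for 21 partition keys in total.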
CodeRepository.backfill()¶
CodeRepository.backfill(
    selection: list[str] | None = None,
    partition_keys: list[PartitionKey] | None = None,
    partition_range: PartitionKeyRange | None = None,
    strategy: BackfillStrategy | None = None,
    failure_policy: str = "continue",
    max_concurrency: int = 4,
    tags: list[tuple[str, str]] | None = None,
    config: dict[str, dict[str, Any]] | None = None,
    block: bool = True,
    dry_run: bool = False,
) -> BackfillResult
Launch a backfill to reprocess partitions.
| Parameter | Type | Default | Description |
|---|---|---|---|
| selection | list[str] \| None | None | Asset keys to backfill. None selects all partitioned assets. |
| partition_keys | list[PartitionKey] \| None | None | Explicit list of partition keys to process. |
| partition_range | PartitionKeyRange \| None | None | Range of partition keys. Mutually exclusive with partition_keys. |
| strategy | BackfillStrategy \| None | None | How to group partitions into runs. Falls back to asset-level strategy, then MultiRun. |
| failure_policy | str | "continue" | "continue" to keep processing on failure, "stop_on_failure" to halt. |
| max_concurrency | int | 4 | Maximum number of concurrent runs. |
| tags | list[tuple[str, str]] \| None | None | Tags attached to the backfill and its runs. Use ("rivers/priority", "N") to override default priority (-10). |
| config | dict[str, dict[str, Any]] \| None | None | Per-asset config overrides (keyed by asset name). |
| block | bool | True | If True, wait for the backfill to complete before returning. |
| dry_run | bool | False | If True, compute the plan without executing. |
Note
Provide either partition_keys or partition_range, not both.
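
A caller that assembles these arguments dynamically may want to check them before launching. The sketch below is one hypothetical pre-flight check, not the library's own validation. `check_backfill_args` is an invented name, and the lower bound on max_concurrency is an assumption.

```python
VALID_FAILURE_POLICIES = ("continue", "stop_on_failure")


def check_backfill_args(
    partition_keys=None,
    partition_range=None,
    failure_policy: str = "continue",
    max_concurrency: int = 4,
) -> None:
    """Raise ValueError for argument combinations the reference rules out."""
    if partition_keys is not None and partition_range is not None:
        raise ValueError("pass partition_keys or partition_range, not both")
    if failure_policy not in VALID_FAILURE_POLICIES:
        raise ValueError(f"failure_policy must be one of {VALID_FAILURE_POLICIES}")
    # Assumption: a concurrency limit below 1 is meaningless; the source does not say.
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")


# Passes silently: only one of the two partition arguments is given.
check_backfill_args(partition_keys=["2024-01-01"], failure_policy="stop_on_failure")
```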
Priority
Backfill runs default to priority -10 (lower than regular runs at priority 0), ensuring scheduled and manually triggered runs are dequeued first when a run queue is configured. Override with tags=[("rivers/priority", "5")].
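
The priority rule can be sketched as a small lookup over the tag list. This is an illustration of the documented behavior rather than the library's code. `effective_priority` is a hypothetical helper, and letting the first matching tag win is an assumption.

```python
PRIORITY_TAG = "rivers/priority"
DEFAULT_BACKFILL_PRIORITY = -10


def effective_priority(tags=None) -> int:
    """Return the priority implied by a tag list, or the backfill default."""
    for key, value in tags or []:
        if key == PRIORITY_TAG:
            # Tag values are strings, e.g. "5". Assumption: the first match wins.
            return int(value)
    return DEFAULT_BACKFILL_PRIORITY


default_priority = effective_priority()
overridden = effective_priority([("team", "data"), ("rivers/priority", "5")])
```

Here `default_priority` is -10 and `overridden` is 5, so the tagged backfill would rank above regular runs at priority 0, assuming higher numbers are dequeued first as the description implies.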
CodeRepository.get_backfill()¶
Get the current status of a backfill by ID. Returns None if not found.
CodeRepository.cancel_backfill()¶
Cancel a running or requested backfill. Returns True if the backfill has live state in this process and its in-process coordinator was signalled. Otherwise, falls back to writing a storage-level cancel marker and returns False.
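
Because a False return still means a cancel was requested through storage, callers may want to treat the two outcomes differently rather than reading False as failure. The sketch below shows one way to do that. `FakeRepo` and `cancel_and_describe` are hypothetical, and the sketch assumes cancel_backfill takes the backfill ID as its only argument, which the reference does not show.

```python
class FakeRepo:
    """Hypothetical stand-in; only the return value of cancel_backfill matters here."""

    def __init__(self, live_ids):
        self._live = set(live_ids)

    def cancel_backfill(self, backfill_id: str) -> bool:
        # True only when the backfill has live state "in this process".
        return backfill_id in self._live


def cancel_and_describe(repo, backfill_id: str) -> str:
    """Cancel a backfill and explain which cancellation path was taken."""
    signalled = repo.cancel_backfill(backfill_id)
    if signalled:
        return f"{backfill_id}: coordinator signalled in this process"
    return f"{backfill_id}: cancel marker written to storage; confirm via get_backfill()"


repo = FakeRepo(live_ids=["bf-1"])
local_message = cancel_and_describe(repo, "bf-1")
remote_message = cancel_and_describe(repo, "bf-2")
```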
CodeRepository.rerun_backfill()¶
CodeRepository.rerun_backfill(
    backfill_id: str,
    block: bool = True,
    dry_run: bool = False,
) -> BackfillResult
Re-launch the failed and canceled partitions of a previous backfill.
| Parameter | Type | Default | Description |
|---|---|---|---|
| backfill_id | str | required | ID of the backfill to retry. |
| block | bool | True | If True, wait for the rerun to complete before returning. |
| dry_run | bool | False | If True, compute the plan without executing. |
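
One cautious pattern is to preview a rerun with dry_run=True and execute it only if the plan contains work. The sketch below illustrates this against a hypothetical `FakeRepo`. It assumes that the dry-run result's num_partitions reflects the partitions the rerun would process, which the reference implies but does not state.

```python
from types import SimpleNamespace


def rerun_if_needed(repo, backfill_id: str):
    """Preview a rerun with dry_run=True and only execute it if it would do work."""
    preview = repo.rerun_backfill(backfill_id, dry_run=True)
    if preview.num_partitions == 0:
        return None  # nothing failed or was canceled, so there is nothing to retry
    return repo.rerun_backfill(backfill_id, block=True, dry_run=False)


class FakeRepo:
    """Hypothetical stand-in that records how rerun_backfill was called."""

    def __init__(self, retryable: int):
        self.retryable = retryable
        self.calls = []

    def rerun_backfill(self, backfill_id, block=True, dry_run=False):
        self.calls.append(dry_run)
        return SimpleNamespace(
            backfill_id=backfill_id,
            num_partitions=self.retryable,
            is_dry_run=dry_run,
            status="completed",
        )


clean_repo = FakeRepo(retryable=0)
skipped = rerun_if_needed(clean_repo, "bf-1")

dirty_repo = FakeRepo(retryable=2)
rerun = rerun_if_needed(dirty_repo, "bf-1")
```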