pipeline-manager: dismiss deployment error and clear bootstrap policy#5663
pipeline-manager: dismiss deployment error and clear bootstrap policy#5663
Conversation
|
This sounds like a breaking change for all orchestration workflows (that is, all scripts need to call dismiss error before calling start to be idempotent, and that's a poor UX). Could you clarify? What's wrong with start() always overriding deployment_errors? |
|
Error dismissal is not integrated into the |
|
It's for the following use case:
|
|
For the client (chiefly, Python, fda doesn't have retries at least currently), it needs to use the separate endpoint for this reason. Such that when it does
|
And that looks looks fine to me. If they want the deployment error to block a start in orchestration code, they can call start(ignore_deployment_error=false). What am I missing? |
|
The default behavior of the HTTP API |
|
I could add a Also, it's unclear what idempotency means when the pipeline is already running and one calls due to retry |
|
I'm with @lalithsuresh . It's a footgun if an error from the previous run can cause /start to fail in the default case. I think the default behavior of /start should be |
|
I do like the new |
e3e739a to
2f4c855
Compare
|
@claude Review the code changes |
|
Claude finished @snkas's task in 2m 50s —— View job Code Review
Overall the implementation is well-structured and the bootstrap_policy fix is a good correction. A few issues worth addressing below. Bugs1. Busy-wait loop in test without sleep ( while time.time() < deadline:
if pipeline.status() == PipelineStatus.STOPPED:
return
raise TimeoutError(...)There is no 2. Docstring says "False by default" but actual default is def start_pipeline(
self,
pipeline_name: str,
...
dismiss_error: bool = True, # <-- actual default
) -> Optional[PipelineStatus]:
"""
:param dismiss_error: Set True to dismiss any deployment error before starting. False by default. # <-- wrong
"""The docstring says "False by default" but the signature has Design Issues3.
4. Two-step dismiss+start in Python client is intentionally non-atomic The Python 5. HTTP API default for The PR description says "dismiss_error=false (default)" for the fn default_pipeline_start_dismiss_error() -> bool {
true
}The implementation matches what the reviewers (ryzhyk, lalithsuresh) requested. The PR description should be updated to reflect this. Minor6. Response body of Ok(HttpResponse::Ok().json(json!("Pipeline deployment error has been dismissed")))This serializes as a JSON string Positives
|
|
@claude Review the code changes again |
|
Claude finished @snkas's task in 3m 54s —— View job Code Review
|
|
@claude Review the code changes again |
|
Claude finished @snkas's task in 3m 53s —— View job Code Review
All items from the previous two reviews have been addressed. One minor item remains outstanding; the other "still present" issue from the last review was a false positive. Minor1.
error_occurred = False
try:
pipeline.start(dismiss_error=False)
except FelderaAPIError as e:
error_occurred = True
assert e.status_code == 400
assert e.error_code == "CannotStartWithUndismissedError"
assert error_occurredClarification from previous review: The previous review stated this "silently passes" if the exception isn't raised — that's incorrect. That said, with pytest.raises(FelderaAPIError) as exc_info:
pipeline.start(dismiss_error=False)
assert exc_info.value.status_code == 400
assert exc_info.value.error_code == "CannotStartWithUndismissedError"
assert pipeline.deployment_error() is not NoneCorrection from Previous ReviewThe The previous review flagged this as inconsistent with other lifecycle endpoints: Ok(HttpResponse::Ok().json(json!("Pipeline deployment error has been dismissed")))On inspection, all lifecycle action endpoints in this codebase return bare JSON strings in the same style:
The Positives
|
|
Probably we should document the new API in documentations docs.feldera.com dir pipelines/lifecycle.md and modifying.md, mention that /dismiss_error is a sync operation that guarantees error is cleared in tutorials/restapi. Also would be helpful to add an example how to use this api in python/docs/examples.rst and/or introduction.rst |
docs.feldera.com/docs/changelog.md
Outdated
| The `ignore_deployment_error` parameter has been removed from the Python | ||
| `pipeline.start()` method. Instead, make use of the newly added `dismiss_error` parameter. | ||
| A pipeline deployment error can now be separately dismissed using a dedicated endpoint | ||
| and the corresponding client functions. |
There was a problem hiding this comment.
it would be good to add a code-snippet of how this was invoked before and how an invocation of start looks like now
There was a problem hiding this comment.
I can update the changelog with all api changes in my PR, if it works
There was a problem hiding this comment.
Edited Changelog to:
The `ignore_deployment_error` parameter has been removed from the Python
`pipeline.start()` method. Instead, make use of the newly added `dismiss_error` parameter.
If you do not want the pipeline to start if there is a pre-existing deployment error,
you should call `pipeline.start(dismiss_error=False)`. Otherwise call `pipeline.start()`
which is by default equivalent to `pipeline.start(dismiss_error=True)` (preserving
existing behavior). If the start results in an error occurring, the method will still
throw an error as before. A pipeline deployment error can now also be separately dismissed
using a dedicated endpoint and the corresponding client functions (e.g.,
`pipeline.dismiss_error()` for the Python client).
d435b9a to
c86adee
Compare
@igorscs I updated |
|
@claude Please review the code changes again |
|
Claude finished @snkas's task in 3m 27s —— View job Code Review
All items from previous reviews have been addressed. One minor style nit remains; nothing blocking. Minor (carried from review #3)
error_occurred = False
try:
pipeline.start(dismiss_error=False)
except FelderaAPIError as e:
error_occurred = True
assert e.status_code == 400
assert e.error_code == "CannotStartWithUndismissedError"
assert error_occurredAs noted in review #3, this is functionally correct ( Correctness of New Items Since Last Review
The
@staticmethod
def from_str(value):
for member in BootstrapPolicy:
if member.name.lower() == value.lower():
return member
raise ValueError(...)
The DB fix is applied symmetrically in both the model (
The check
fda dismiss-error p1 # no-op (no error present), idempotent
fda start p1 --no-dismiss-error # start without dismissing, succeeds since no error
fda start p1 # normal start with dismiss (default)This sequence correctly exercises both code paths. Changelog ( The migration guidance is clear: calls using Positives
|
mythical-fred
left a comment
There was a problem hiding this comment.
The implementation is correct and the design is sound. The key design choices are well-reasoned:
- as the HTTP API default matches what Leonid and Lalith asked for — old errors from a previous crash shouldn't silently block a retry
- The two-step dismiss-then-start in the Python client (always calling first, then ) is the right call for idempotent retries: a retry must not silently eat a new crash error that occurred in the retry window
- being cleared on Stopping/Stopped but preserved through Provisioning/Provisioned is the correct lifecycle semantics
The bot did four rounds of careful review, each catching real issues (busy-wait loop, wrong docstring, missing on //) — all were fixed. The result is a tight, well-tested PR. Happy to approve.
Add the `/dismiss_error` endpoint for the user to dismiss the content of the `deployment_error` field, which is set when the pipeline becomes `Stopping`/`Stopped` due to an fatal error occurring deployment. The functionality is also integrated into the `/start` endpoint with the parameter `?dismiss_error=false/true` (default: `true`). Python API changes: - `pipeline.start()` has the `ignore_deployment_error` parameter removed (not backward compatible) - `pipeline.start()` has the `dismiss_error` parameter added which is by default `True` - `pipeline.dismiss_error()` is added CLI client `fda` changes: - `fda start` has the `--no-dismiss-error` parameter added - `fda dismiss-error` is added The `bootstrap_policy` field is fixed to be cleared along with `deployment_initial` when the pipeline is stopping or has stopped. Signed-off-by: Simon Kassing <simon.kassing@feldera.com>
c86adee to
19ba231
Compare

Add the
/dismiss_errorendpoint for the user to dismiss the content of thedeployment_errorfield, which is set when the pipeline becomesStopping/Stoppeddue to an fatal error occurring deployment. The functionality is also integrated into the/startendpoint with the parameter?dismiss_error=false/true(default:true).Python API changes:
pipeline.start()has theignore_deployment_errorparameter removed (not backward compatible)pipeline.start()has thedismiss_errorparameter added which is by defaultTruepipeline.dismiss_error()is addedCLI client
fdachanges:fda starthas the--no-dismiss-errorparameter addedfda dismiss-erroris addedThe
bootstrap_policyfield is fixed to be cleared along withdeployment_initialwhen the pipeline is stopping or has stopped.PR information
fdatestignore_deployment_errorparameter for the Python APIpipeline.start()method