There’s a category of bash one-liners I keep rewriting: “run this thing, and if it fails, try again a few times with a delay.” curl against a service that’s still booting. A kubectl rollout status that flickers. A flaky integration test. A docker pull from a registry having a bad five seconds.
You can write the loop yourself in five lines of bash. I’ve done it hundreds of times. But I always forget the exit code handling, the sleep arithmetic, the cap on retries. So I wrote retry — a single binary that does exactly this, with proper exponential backoff if you want it.
The basic shape
# Default: 3 tries, no delay
retry "flaky-command"
# 5 attempts, 2s between each
retry --max-tries 5 --delay 2s "curl https://api.example.com"
# Exponential backoff — recommended for anything network-y
retry --backoff exponential --base-delay 1s --max-delay 30s \
"curl https://api.example.com"
That’s the whole thing. The command goes in quotes, the flags configure how patient you are.
Exponential backoff, properly
The reason exponential backoff matters: when a service is overloaded, hammering it every second makes things worse. Doubling the delay each attempt — 1s, 2s, 4s, 8s — gives the upstream room to recover.
# 1s, 2s, 4s, 8s, ...
retry --backoff exponential "make test"
# Custom multiplier and max delay
retry --backoff exp --base-delay 100ms --multiplier 1.5 --max-delay 10s \
"flaky-service-check"
# Short-form flags, infinite tries with a 1-minute ceiling
retry -B exp -b 500ms -M 1m -t 10 "network-dependent-command"
You can also configure everything via environment variables, which is handy in CI:
export RETRY_MAX_TRIES=5
export RETRY_BACKOFF=exponential
export RETRY_BASE_DELAY=500ms
export RETRY_MAX_DELAY=30s
retry "your-command"
Where I actually use it
Three places, every week:
- Waiting for services to come up in docker-compose tests.
retry --backoff exp -t 20 "curl -sf http://localhost:8080/health"is much nicer than asleep 30that’s either too short or too long. - CI/CD jobs that call flaky external APIs. Anything that touches a third-party SaaS. The retry loop turns transient 502s into a non-event.
- Kubernetes init containers. Wait for a database, a service, a config to exist. Drop the binary in, exit 0 when ready.
The repo has a whole examples/ directory with worked-through cases for GitHub Actions, GitLab CI, Docker healthchecks, k8s init containers, database migrations, and more. Worth a browse if you want copy-paste starting points.
It’s also a Go library
If you’d rather embed the retry logic in your own program than shell out, the same package is importable:
import (
"context"
"time"
"github.com/sgaunet/retry/pkg/logger"
"github.com/sgaunet/retry/pkg/retry"
)
func main() {
r, _ := retry.NewRetry("your-command", retry.NewStopOnMaxTries(5))
r.SetBackoffStrategy(retry.NewExponentialBackoff(
time.Second,
time.Minute,
2.0,
))
appLogger := logger.NewLogger("info")
_ = r.RunWithLogger(context.Background(), appLogger)
}
The library has a few things the CLI doesn’t expose directly:
- Composite stop conditions — combine “max tries OR max duration” with
NewCompositeCondition - Custom success conditions — exit code 0 isn’t always success; you can match on output substrings via
NewSuccessContains - Multiple backoff strategies — fixed, exponential, linear, Fibonacci, jitter-wrapped, custom
The full GoDoc lives at pkg.go.dev/github.com/sgaunet/retry/pkg/retry.
Install
A static binary from the release page is the easiest path. There’s also a Docker image at ghcr.io/sgaunet/retry, mostly useful as a COPY --from=... source in your own Dockerfiles.
Closing
- Source: github.com/sgaunet/retry
- License: MIT
It’s a small tool that does one thing and gets out of the way. If you’ve been writing the same until ...; do sleep N; done loop in bash for years, this might save you a few keystrokes — and remove a class of bugs you didn’t know you were shipping.