There’s a category of bash one-liners I keep rewriting: “run this thing, and if it fails, try again a few times with a delay.” curl against a service that’s still booting. A kubectl rollout status that flickers. A flaky integration test. A docker pull from a registry having a bad five seconds.

You can write the loop yourself in five lines of bash. I’ve done it hundreds of times. But I always forget the exit code handling, the sleep arithmetic, the cap on retries. So I wrote retry — a single binary that does exactly this, with proper exponential backoff if you want it.

The basic shape

# Default: 3 tries, no delay
retry "flaky-command"

# 5 attempts, 2s between each
retry --max-tries 5 --delay 2s "curl https://api.example.com"

# Exponential backoff — recommended for anything network-y
retry --backoff exponential --base-delay 1s --max-delay 30s \
  "curl https://api.example.com"

That’s the whole thing. The command goes in quotes, the flags configure how patient you are.

Exponential backoff, properly

The reason exponential backoff matters: when a service is overloaded, hammering it every second makes things worse. Doubling the delay each attempt — 1s, 2s, 4s, 8s — gives the upstream room to recover.

# 1s, 2s, 4s, 8s, ...
retry --backoff exponential "make test"

# Custom multiplier and max delay
retry --backoff exp --base-delay 100ms --multiplier 1.5 --max-delay 10s \
  "flaky-service-check"

# Short-form flags, infinite tries with a 1-minute ceiling
retry -B exp -b 500ms -M 1m -t 10 "network-dependent-command"

You can also configure everything via environment variables, which is handy in CI:

export RETRY_MAX_TRIES=5
export RETRY_BACKOFF=exponential
export RETRY_BASE_DELAY=500ms
export RETRY_MAX_DELAY=30s

retry "your-command"

Where I actually use it

Three places, every week:

  • Waiting for services to come up in docker-compose tests. retry --backoff exp -t 20 "curl -sf http://localhost:8080/health" is much nicer than a sleep 30 that’s either too short or too long.
  • CI/CD jobs that call flaky external APIs. Anything that touches a third-party SaaS. The retry loop turns transient 502s into a non-event.
  • Kubernetes init containers. Wait for a database, a service, a config to exist. Drop the binary in, exit 0 when ready.

The repo has a whole examples/ directory with worked-through cases for GitHub Actions, GitLab CI, Docker healthchecks, k8s init containers, database migrations, and more. Worth a browse if you want copy-paste starting points.

It’s also a Go library

If you’d rather embed the retry logic in your own program than shell out, the same package is importable:

import (
    "context"
    "time"

    "github.com/sgaunet/retry/pkg/logger"
    "github.com/sgaunet/retry/pkg/retry"
)

func main() {
    r, _ := retry.NewRetry("your-command", retry.NewStopOnMaxTries(5))
    r.SetBackoffStrategy(retry.NewExponentialBackoff(
        time.Second,
        time.Minute,
        2.0,
    ))

    appLogger := logger.NewLogger("info")
    _ = r.RunWithLogger(context.Background(), appLogger)
}

The library has a few things the CLI doesn’t expose directly:

  • Composite stop conditions — combine “max tries OR max duration” with NewCompositeCondition
  • Custom success conditions — exit code 0 isn’t always success; you can match on output substrings via NewSuccessContains
  • Multiple backoff strategies — fixed, exponential, linear, Fibonacci, jitter-wrapped, custom

The full GoDoc lives at pkg.go.dev/github.com/sgaunet/retry/pkg/retry.

Install

A static binary from the release page is the easiest path. There’s also a Docker image at ghcr.io/sgaunet/retry, mostly useful as a COPY --from=... source in your own Dockerfiles.

Closing

It’s a small tool that does one thing and gets out of the way. If you’ve been writing the same until ...; do sleep N; done loop in bash for years, this might save you a few keystrokes — and remove a class of bugs you didn’t know you were shipping.