Argo Workflows runs DAG- and step-based pipelines as Kubernetes-native custom resources (CRDs). Each step is a container. By 2026 it's a common default for pipelines in Kubernetes shops and for ML training jobs. This post covers the working set: the features you'll actually use day to day.
Why Argo
- Container per step: language-agnostic and reproducible.
- Kubernetes-native: every step is a pod, so it works with the cluster autoscaler, GPU scheduling, and the rest of your cluster tooling.
- DAG: parallelism, dependencies, conditionals.
- Artifacts: shared between steps via S3 / GCS / MinIO.
- UI for monitoring runs.
Hello world
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-
spec:
  entrypoint: main
  templates:
    - name: main
      container:
        image: alpine:3.20
        command: [sh, -c]
        args: ["echo Hello from Argo"]
```

```bash
argo submit -n argo workflow.yaml
argo watch -n argo @latest
```
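Argo creates one pod for the step. `argo watch` tracks its status until it finishes, and `argo logs -n argo @latest` prints its output. That's the fundamental unit: one template, one container.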
DAG
```yaml
spec:
  entrypoint: pipeline
  templates:
    - name: pipeline
      dag:
        tasks:
          - name: extract
            template: extract-step
          - name: transform-a
            template: transform
            arguments: { parameters: [{ name: input, value: "a" }] }
            dependencies: [extract]
          - name: transform-b
            template: transform
            arguments: { parameters: [{ name: input, value: "b" }] }
            dependencies: [extract]
          - name: load
            template: load-step
            dependencies: [transform-a, transform-b]
```
transform-a and transform-b run in parallel once extract finishes; load waits for both. Argo schedules from the dependency graph, not from the order tasks are listed.
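The dependency lists above are just a graph. This minimal Python sketch (stdlib `graphlib`, Python 3.9+) groups the tasks into waves whose dependencies are satisfied. It's a simplified model, not Argo's scheduler: the controller starts each task as soon as its own dependencies finish instead of waiting for a whole wave.

```python
from graphlib import TopologicalSorter

# Task name -> tasks it depends on, mirroring the `dependencies` fields above.
DAG = {
    "extract": set(),
    "transform-a": {"extract"},
    "transform-b": {"extract"},
    "load": {"transform-a", "transform-b"},
}


def waves(dag):
    """Group tasks into batches whose dependencies have all completed."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # simplification: the whole batch finishes together
    return batches


if __name__ == "__main__":
    for i, batch in enumerate(waves(DAG)):
        print(i, batch)
```

Running it prints three waves: extract, then transform-a and transform-b together, then load.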
Parameters
```yaml
spec:
  entrypoint: main
  arguments:
    parameters:
      - name: dataset
        value: "users"
  templates:
    - name: main
      inputs:
        parameters:
          - name: dataset
      container:
        image: my/etl:1.0
        command: [python, run.py]
        args: ["--dataset", "{{inputs.parameters.dataset}}"]
```
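Parameters are substituted wherever a `{{inputs.parameters.<name>}}` placeholder appears, most often in args or env vars. Override the default at submit time:

```bash
argo submit -p dataset=orders workflow.yaml
```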
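To make the precedence concrete, here is a toy Python model of the substitution. `resolve_params` and `render_args` are hypothetical helpers, not Argo APIs; the real substitution happens inside the workflow controller.

```python
import re

# Matches placeholders like {{inputs.parameters.dataset}}.
PLACEHOLDER = re.compile(r"\{\{inputs\.parameters\.([A-Za-z0-9_-]+)\}\}")


def resolve_params(defaults, overrides):
    """Submit-time values (argo submit -p k=v) take precedence over spec defaults."""
    return {**defaults, **overrides}


def render_args(args, params):
    """Replace every placeholder in a container's args with its resolved value."""
    return [PLACEHOLDER.sub(lambda m: params[m.group(1)], arg) for arg in args]


if __name__ == "__main__":
    params = resolve_params({"dataset": "users"}, {"dataset": "orders"})
    print(render_args(["--dataset", "{{inputs.parameters.dataset}}"], params))
```

With no `-p` flag the default `users` is used; with `-p dataset=orders` the container sees `--dataset orders`.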
Artifacts
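```yaml
- name: extract
  outputs:
    artifacts:
      - name: data
        path: /tmp/data.json
        s3:
          bucket: my-pipelines
          # per-run key, so concurrent runs don't overwrite each other
          key: "{{workflow.name}}/extract-output.json"
  container:
    image: my/extract:1.0
    # writes to /tmp/data.json
- name: transform
  inputs:
    artifacts:
      - name: input
        path: /tmp/input.json
  container: ...

# In the DAG, wire extract's output artifact into transform's input:
- name: transform
  template: transform
  dependencies: [extract]
  arguments:
    artifacts:
      - name: input
        from: "{{tasks.extract.outputs.artifacts.data}}"
```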
Argo handles upload/download to/from S3. Steps share data without sharing volumes.
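From the container's point of view, artifacts are just local files. A minimal Python sketch of the two step bodies, assuming the paths declared above; the records are made up for illustration. Neither function touches S3: Argo uploads `/tmp/data.json` after extract exits and places the file at `/tmp/input.json` before transform starts.

```python
import json
from pathlib import Path

# These must match the `path` fields declared in the templates above.
EXTRACT_OUT = Path("/tmp/data.json")
TRANSFORM_IN = Path("/tmp/input.json")


def extract(out_path=EXTRACT_OUT):
    """Write the step's output where Argo will pick it up for upload."""
    # Hypothetical records; a real step would read from a source system.
    records = [{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]
    out_path.write_text(json.dumps(records))
    return len(records)


def transform(in_path=TRANSFORM_IN):
    """Read the input artifact that Argo downloaded before the step started."""
    records = json.loads(in_path.read_text())
    return [{**r, "name": r["name"].upper()} for r in records]
```

Because the code only knows about paths, you can test it locally by pointing both functions at the same temporary file.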
Retries and timeouts
```yaml
- name: flaky-step
  retryStrategy:
    limit: 3
    # Always retries failed steps and errors; OnError alone skips non-zero exits
    retryPolicy: Always
    backoff:
      duration: 1m
      factor: 2
      maxDuration: 30m
  activeDeadlineSeconds: 3600
  container: ...
```
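The step is retried up to three times with exponential backoff, and activeDeadlineSeconds caps each attempt at one hour. Pick retryPolicy deliberately: OnFailure (the default) retries steps whose main container fails, OnError retries only Argo-level errors such as a deleted pod, and Always covers both.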
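To see what `duration: 1m` with `factor: 2` means in practice, here is a Python sketch of the waits. It's a simplified model, assuming the delay before retry n is `duration * factor**n` and treating `maxDuration` as a cap on total waiting; it ignores time spent running the attempts themselves.

```python
from datetime import timedelta


def retry_delays(limit, duration, factor, max_duration):
    """Delays before each retry under a simplified exponential-backoff model."""
    delays = []
    total = timedelta(0)
    for n in range(limit):
        delay = duration * factor**n
        # Stop once the cumulative wait would exceed the cap.
        if total + delay > max_duration:
            break
        delays.append(delay)
        total += delay
    return delays


if __name__ == "__main__":
    for d in retry_delays(3, timedelta(minutes=1), 2, timedelta(minutes=30)):
        print(d)
```

With the config above, the three retries wait 1, 2, and 4 minutes, well inside the 30-minute cap. Raising `limit` to 10 would stop after the 8-minute wait, since the next 16 minutes would push the total past 30.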
Conditional execution
```yaml
- name: maybe-deploy
  when: "{{steps.test.outputs.parameters.passed}} == true"
  template: deploy-step
```
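maybe-deploy runs only if the expression is true; otherwise Argo marks it Skipped. In a DAG, reference {{tasks.test.outputs.parameters.passed}} instead of steps.test.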
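The `passed` parameter has to come from the test step. One common way is for that step to write `true` or `false` to a file and declare it as an output parameter with `valueFrom.path`. A minimal Python sketch of the container side, assuming a hypothetical `/tmp/passed` path and pytest as the test runner. The step exits 0 either way, so the workflow keeps going and `when` makes the deploy decision; if the test step itself failed, the workflow would stop before reaching maybe-deploy.

```python
import subprocess
from pathlib import Path

# Hypothetical path; the template would declare an output parameter
# `passed` with valueFrom.path pointing at this file.
RESULT_PATH = Path("/tmp/passed")


def run_tests(cmd):
    """Run the test command and map its exit code to "true" / "false"."""
    proc = subprocess.run(cmd)
    return "true" if proc.returncode == 0 else "false"


def main(cmd, result_path=RESULT_PATH):
    passed = run_tests(cmd)
    # No trailing newline, so the parameter value is exactly "true" or "false".
    result_path.write_text(passed)
    # Returning normally keeps the step green; `when` decides on deploy.
    return passed


if __name__ == "__main__":
    main(["pytest", "-q"])  # assumption: pytest is the project's test runner
```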
Resources / GPU
```yaml
container:
  image: my/training:1.0
  resources:
    requests: { cpu: "8", memory: "32Gi", nvidia.com/gpu: "1" }
    limits: { cpu: "16", memory: "64Gi", nvidia.com/gpu: "1" }
nodeSelector:
  accelerator: nvidia-h100  # node label; depends on how your cluster labels GPU nodes
```
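Each step requests its own GPUs, and the Kubernetes scheduler places the pod on a node that matches the selector and has a free GPU. Combined with Karpenter or the cluster autoscaler, GPU nodes are spun up only while a step needs them.
For ML training pipelines, this is the killer combination.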
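Resource blocks are easy to get subtly wrong, and a quick pre-submit check can catch it. Below is a Python sketch that parses a subset of Kubernetes quantity suffixes and flags two problems: a request larger than its limit, and a GPU request that differs from its limit (Kubernetes requires extended resources such as `nvidia.com/gpu` to have equal requests and limits when both are set). `parse_quantity` and `check_resources` are hypothetical helpers, not a Kubernetes client API.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes: binary, decimal SI, and milli.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "m": Decimal("0.001"),
}


def parse_quantity(q):
    """Parse strings like '8', '500m', '32Gi' into a Decimal."""
    # Try two-character suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)


def check_resources(requests, limits):
    """Return a list of problems; an empty list means the block looks sane."""
    problems = []
    for name, limit in limits.items():
        req = requests.get(name)
        if req is None:
            continue
        if name == "nvidia.com/gpu":
            # Extended resources can't be overcommitted: request must equal limit.
            if parse_quantity(req) != parse_quantity(limit):
                problems.append(f"{name}: request {req} != limit {limit}")
        elif parse_quantity(req) > parse_quantity(limit):
            problems.append(f"{name}: request {req} > limit {limit}")
    return problems
```

The training block above passes: `check_resources` returns an empty list for it.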
Loops / fanout
```yaml
- name: process-list
  steps:
    - - name: process
        template: process-item
        arguments: { parameters: [{ name: item, value: "{{item}}" }] }
        withItems:
          - alpha
          - beta
          - gamma
```
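This creates one pod per item, run in parallel by default (cap it with parallelism). For a list produced at runtime, use withParam with a previous step's JSON output, e.g. withParam: "{{steps.generate.outputs.result}}", where generate is the step that prints the list.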
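For the withParam case, the generating step only needs to print one JSON array to stdout, which Argo captures as the step's `outputs.result`. A minimal Python sketch; `list_shards` and the shard naming scheme are hypothetical. Anything else the step wants to log should go to stderr so stdout stays valid JSON.

```python
import json
import sys


def list_shards(prefix, count):
    """Build the items the fanout step should process (hypothetical shard names)."""
    return [f"{prefix}-{i:03d}" for i in range(count)]


def main(prefix="users", count=3):
    print(f"generating {count} shards", file=sys.stderr)  # logs go to stderr
    # Exactly one JSON array on stdout: this becomes outputs.result,
    # and withParam expands it into one step per element.
    print(json.dumps(list_shards(prefix, count)))


if __name__ == "__main__":
    main()
```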
CronWorkflow
```yaml
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata: { name: nightly-etl }
spec:
  schedule: "0 2 * * *"
  workflowSpec: {...}  # the same spec you'd put in a Workflow
```
Use it instead of a plain Kubernetes CronJob for pipeline scheduling: each run is a full Workflow, with its DAG, retries, and history in the UI.
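Cron strings are easy to misread, so it can help to check when `0 2 * * *` actually fires. This Python sketch handles only the daily `MINUTE HOUR * * *` form and is shown in UTC; as far as I know, a CronWorkflow interprets its schedule in the controller's local time zone unless you set `spec.timezone`, so check that before trusting "2 a.m.".

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now, hour, minute):
    """Next fire time for a cron schedule of the form 'MINUTE HOUR * * *'.

    Handles only that daily form, and `now` must be timezone-aware.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has passed (or is right now): fire tomorrow.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("0 2 * * * next fires at", next_daily_run(now, hour=2, minute=0))
```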
Argo Events
For event-triggered workflows (S3 upload, Kafka message, webhook), use Argo Events: an EventSource receives the event, and a Sensor matches it and triggers a Workflow.
It's a less common pattern, but a powerful one for reactive pipelines.
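For the webhook case, whatever fires the event just makes an HTTP POST to the EventSource. A minimal Python sketch using only the standard library; the URL (service name, port 12000, `/example` path) is an assumption modeled on a typical webhook EventSource setup, so substitute wherever yours is exposed.

```python
import json
import urllib.request

# Hypothetical URL: wherever your webhook EventSource's Service is exposed.
WEBHOOK_URL = "http://webhook-eventsource-svc.argo-events:12000/example"


def build_event_request(url, payload):
    """Build (but don't send) a JSON POST for a webhook EventSource."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(payload, url=WEBHOOK_URL, timeout=10):
    """POST the event and return the HTTP status code."""
    with urllib.request.urlopen(build_event_request(url, payload), timeout=timeout) as resp:
        return resp.status
```

Splitting request construction from sending keeps the payload logic testable without a cluster.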
When Argo wins
- K8s-shop: already running K8s for apps.
- Polyglot pipelines: each step a different language.
- GPU / ML: native K8s GPU integration.
- Heavy parallelism: K8s scales pods.
- CI for ML: train + eval + deploy as DAG.
When it doesn’t
- Simple Python-native workflows: Prefect or Dagster offer a simpler developer experience.
- Heavy data orchestration with metadata: Dagster.
- Existing Airflow shop: don’t migrate just because.
- Non-K8s deployment: Argo needs K8s.
Argo vs alternatives
| Tool | Strengths | Weaknesses |
|---|---|---|
| Argo Workflows | K8s-native; container-per-step; GPU | YAML; learning curve |
| Airflow | Mature; huge plugin ecosystem | Python-centric; ops-heavy |
| Prefect | Python DX; cloud option | Less K8s-native |
| Dagster | Asset-centric; data lineage | Different mental model |
| Temporal | Durable execution; code-first | More for app workflows than data |
For ML / DevOps pipelines on K8s: Argo. For data engineering: Dagster or Airflow. For app workflows: Temporal.
See Temporal Workflow Engine.
Operational realities
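- The workflow controller walks each DAG and creates pods as steps become ready; pods come and go.
- Live workflow state is stored in the Workflow objects themselves (etcd). PostgreSQL or MySQL is an optional persistence layer for the workflow archive and for offloading large workflow status.
- Logs go to standard pod logs; persist them via your usual log pipeline.
- History: with the archive enabled, completed workflows land in the DB; clean them up periodically.
- Scale limits: fanning out to thousands of pods bloats the workflow's status and the controller's memory, and can OOM the controller. Set concurrency limits with parallelism.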
CI / CD with Argo Workflows
```yaml
# On PR
- run: argo submit ci.yaml -p git_sha="$SHA" -p branch="$BRANCH" --watch
```
CI pipelines become workflows: multi-step builds, tests, and deploys, with shared templates (WorkflowTemplates) reused across repos.
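If the CI step grows beyond a one-liner, a small Python wrapper that builds the argv as a list sidesteps shell quoting problems with odd branch names. A minimal sketch; the `SHA` and `BRANCH` environment variable names are carried over from the snippet above and depend on your CI system.

```python
import os
import subprocess


def build_submit_argv(workflow_file, sha, branch):
    """argv for `argo submit`; a list avoids shell quoting issues entirely."""
    return [
        "argo", "submit", workflow_file,
        "-p", f"git_sha={sha}",
        "-p", f"branch={branch}",
        "--watch",
    ]


def main():
    # Assumption: CI exposes the commit and branch as SHA and BRANCH.
    argv = build_submit_argv("ci.yaml", os.environ["SHA"], os.environ["BRANCH"])
    # check=True raises CalledProcessError if the argo CLI exits non-zero.
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    main()
```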
For app deploys: Argo CD is the sibling (GitOps). See GitOps with Argo CD.
Common mistakes
1. Heavy steps without resource limits
Steps get OOM-killed and the failures are confusing to debug. Set resource requests and limits.
2. No retries on flaky steps
A single network blip fails the whole workflow. Add a retryStrategy.
3. Massive YAML
Keep it DRY with parameterized templates, and share them as reusable WorkflowTemplate libraries.
4. Forgot to clean up history
Completed Workflow objects accumulate in the cluster. Set ttlStrategy or clean up periodically.
5. Cron + every minute
A CronWorkflow on * * * * * starts a new workflow every minute, 1,440 a day. Most schedules can be much coarser.
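ttlStrategy (for example `secondsAfterCompletion`) is the first fix for mistake 4. For workflows that have already piled up, a periodic job can pick what to delete. A Python sketch, assuming its input is the `items` list from `kubectl get workflows -o json` and relying only on `metadata.name`, `status.phase`, and `status.finishedAt`; `expired_workflows` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

DONE_PHASES = {"Succeeded", "Failed", "Error"}


def parse_time(value):
    """Parse Kubernetes timestamps such as 2026-01-15T02:00:00Z as UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def expired_workflows(items, now, max_age):
    """Names of finished workflows whose finishedAt is older than max_age."""
    names = []
    for wf in items:
        status = wf.get("status", {})
        finished = status.get("finishedAt")
        # Never touch running or pending workflows.
        if status.get("phase") not in DONE_PHASES or not finished:
            continue
        if now - parse_time(finished) > max_age:
            names.append(wf["metadata"]["name"])
    return names
```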
What I’d ship today
For K8s pipelines:
- Argo Workflows for ML / data / CI pipelines.
- CronWorkflows for scheduled jobs.
- Argo Events for reactive triggers.
- Karpenter + Argo for GPU node autoscaling.
- Standard templates in your org (DRY).
- TTL strategy to clean up history.
Read this next
- Argo Workflows vs Airflow
- GitOps with Argo CD
- Kubernetes Resource Limits 2026
- Temporal Workflow Engine 2026
If you want my Argo Workflows templates (ML training, CI, ETL), they're at rajpoot.dev.
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work; see my projects and ways to reach me at rajpoot.dev.