Argo Workflows runs DAG and step-based pipelines as Kubernetes-native CRDs. Each step is a container. By 2026 it’s the de-facto choice for K8s-shop pipelines and ML training jobs. This post is the working set.

Why Argo

  • Each step is a container — language-agnostic; reproducible.
  • K8s-native — uses pods; integrates with autoscaler, GPUs, etc.
  • DAG: parallelism, dependencies, conditionals.
  • Artifacts: shared between steps via S3 / GCS / MinIO.
  • UI for monitoring runs.

Hello world

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-
spec:
  entrypoint: main
  templates:
    - name: main
      container:
        image: alpine:3.20
        command: [sh, -c]
        args: ["echo Hello from Argo"]

argo submit -n argo workflow.yaml
argo watch -n argo @latest

The pod runs and argo watch follows its status; argo logs -n argo @latest prints the output. Fundamentals.

DAG

spec:
  entrypoint: pipeline
  templates:
    - name: pipeline
      dag:
        tasks:
          - name: extract
            template: extract-step
          - name: transform-a
            template: transform
            arguments: { parameters: [{ name: input, value: "a" }] }
            dependencies: [extract]
          - name: transform-b
            template: transform
            arguments: { parameters: [{ name: input, value: "b" }] }
            dependencies: [extract]
          - name: load
            template: load-step
            dependencies: [transform-a, transform-b]

Parallel transforms; serial load. Argo schedules based on the graph.

Parameters

spec:
  entrypoint: main
  arguments:
    parameters:
      - name: dataset
        value: "users"
  templates:
    - name: main
      inputs:
        parameters:
          - name: dataset
      container:
        image: my/etl:1.0
        command: [python, run.py]
        args: ["--dataset", "{{inputs.parameters.dataset}}"]

Inputs flow as args / env vars. Override at submit:

argo submit -p dataset=orders workflow.yaml
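The env-var route mentioned above might look like the sketch below. It reuses the template from the example; the variable name DATASET is a placeholder, and it assumes run.py reads its configuration from the environment.

```yaml
# Template-list fragment: pass the input parameter as an environment variable
- name: main
  inputs:
    parameters:
      - name: dataset
  container:
    image: my/etl:1.0
    command: [python, run.py]
    env:
      - name: DATASET                              # hypothetical variable name
        value: "{{inputs.parameters.dataset}}"
```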

Artifacts

- name: extract
  outputs:
    artifacts:
      - name: data
        path: /tmp/data.json
        s3:
          bucket: my-pipelines
          key: extract-output.json
  container:
    image: my/extract:1.0
    # writes to /tmp/data.json

- name: transform
  inputs:
    artifacts:
      - name: input
        path: /tmp/input.json
  container: ...

Argo handles upload/download to/from S3. Steps share data without sharing volumes.
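The two templates above still need to be wired together. A minimal sketch of a DAG that does this, assuming the templates are named extract and transform as shown (the DAG name pipeline is illustrative):

```yaml
# Template-list fragment: hand extract's output artifact to transform
- name: pipeline
  dag:
    tasks:
      - name: extract
        template: extract
      - name: transform
        template: transform
        dependencies: [extract]
        arguments:
          artifacts:
            - name: input        # matches inputs.artifacts[].name in transform
              from: "{{tasks.extract.outputs.artifacts.data}}"
```

Argo downloads the artifact to /tmp/input.json in the transform pod before the container starts.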

Retries and timeouts

- name: flaky-step
  retryStrategy:
    limit: 3
    retryPolicy: OnError
    backoff:
      duration: 1m
      factor: 2
      maxDuration: 30m
  activeDeadlineSeconds: 3600
  container: ...

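Retries with exponential backoff (waits of 1m, 2m, then 4m here) and a hard one-hour deadline per step. Note that retryPolicy: OnError only retries errors such as pod deletion or controller failures; to also retry a container that exits non-zero, use OnFailure (the default) or Always.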

Conditional execution

- name: maybe-deploy
  when: "{{steps.test.outputs.parameters.passed}} == true"
  template: deploy-step

Skip steps based on previous outputs.
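For context, here is one way the passed parameter could be produced. This is a sketch: the template names ci and run-tests and the file path are placeholders, and deploy-step is assumed to be defined elsewhere.

```yaml
# Template-list fragment: a test step exports a parameter, a later step checks it
- name: ci
  steps:
    - - name: test
        template: run-tests
    - - name: maybe-deploy
        template: deploy-step          # assumed to exist elsewhere in the spec
        when: "{{steps.test.outputs.parameters.passed}} == true"

- name: run-tests
  container:
    image: alpine:3.20
    command: [sh, -c]
    # printf avoids a trailing newline in the exported value
    args: ["printf true > /tmp/passed"]
  outputs:
    parameters:
      - name: passed
        valueFrom:
          path: /tmp/passed
```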

Resources / GPU

container:
  image: my/training:1.0
  resources:
    requests: { cpu: "8", memory: "32Gi", nvidia.com/gpu: "1" }
    limits: { cpu: "16", memory: "64Gi", nvidia.com/gpu: "1" }
# nodeSelector is a template-level field, a sibling of container
nodeSelector:
  accelerator: nvidia-h100

Per-step GPU request. Kubernetes schedules the pod on a matching node. Combined with Karpenter or the cluster autoscaler, GPU nodes spin up only when needed.

For ML training pipelines: this is the killer combo.
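GPU node pools are often tainted so ordinary pods stay off them. If yours are, the template also needs a toleration. A sketch, assuming a taint with key nvidia.com/gpu and effect NoSchedule (a common convention, not something Argo sets); the template name train is illustrative:

```yaml
# Template-list fragment: placement fields sit next to container, not inside it
- name: train
  nodeSelector:
    accelerator: nvidia-h100
  tolerations:
    - key: nvidia.com/gpu        # assumed taint key on the GPU node pool
      operator: Exists
      effect: NoSchedule
  container:
    image: my/training:1.0
    resources:
      limits: { nvidia.com/gpu: "1" }
```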

Loops / fanout

- name: process-list
  steps:
    - - name: process
        template: process-item
        arguments: { parameters: [{ name: item, value: "{{item}}" }] }
        withItems:
          - alpha
          - beta
          - gamma

Generates one pod per item. For a dynamic list produced by a previous step, use withParam, which takes a JSON array from that step’s output.
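A sketch of the dynamic variant: a script step prints a JSON array to stdout, and the next step fans out over it. The names fanout and list-items are placeholders; process-item is the template from the example above.

```yaml
# Template-list fragment: fan out over a list computed at runtime
- name: fanout
  steps:
    - - name: list-items
        template: list-items
    - - name: process
        template: process-item
        arguments:
          parameters:
            - name: item
              value: "{{item}}"
        # outputs.result is the script's stdout; it must be a JSON array
        withParam: "{{steps.list-items.outputs.result}}"

- name: list-items
  script:
    image: python:3.12-alpine
    command: [python]
    source: |
      import json
      print(json.dumps(["alpha", "beta", "gamma"]))
```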

CronWorkflow

apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata: { name: nightly-etl }
spec:
  schedule: "0 2 * * *"
  workflowSpec: {...}

Replaces Kubernetes CronJobs for pipeline scheduling.
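A fuller sketch with the fields I would usually set alongside the schedule. The timezone, history limits, and the trivial workflowSpec are illustrative values, not recommendations.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: nightly-etl
spec:
  schedule: "0 2 * * *"
  timezone: "Europe/Berlin"        # illustrative; the default is the controller's local time
  concurrencyPolicy: Forbid        # skip a run if the previous one is still going
  startingDeadlineSeconds: 300     # still start a missed run if at most 5 minutes late
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  workflowSpec:
    entrypoint: main
    templates:
      - name: main
        container:
          image: alpine:3.20
          command: [sh, -c]
          args: ["echo nightly run"]
```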

Argo Events

For event-triggered workflows (S3 upload, Kafka message, webhook):

# EventSource → Sensor → triggers Workflow

Reactive pipelines. Less common pattern but powerful.
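To make the chain concrete, here is a webhook-triggered sketch modeled on the Argo Events webhook example as I recall it. It assumes Argo Events and an EventBus are already installed. The resource names, port, endpoint, and service account are placeholders; check the field names against your installed version.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: webhook
spec:
  webhook:
    ingest:                        # event name, referenced by the Sensor below
      port: "12000"
      endpoint: /ingest
      method: POST
---
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: webhook
spec:
  template:
    serviceAccountName: operate-workflow-sa   # hypothetical SA allowed to create Workflows
  dependencies:
    - name: ingest-dep
      eventSourceName: webhook
      eventName: ingest
  triggers:
    - template:
        name: run-workflow
        k8s:
          operation: create
          source:
            resource:
              apiVersion: argoproj.io/v1alpha1
              kind: Workflow
              metadata:
                generateName: triggered-
              spec:
                entrypoint: main
                templates:
                  - name: main
                    container:
                      image: alpine:3.20
                      command: [sh, -c]
                      args: ["echo triggered by webhook"]
```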

When Argo wins

  • K8s-shop: already running K8s for apps.
  • Polyglot pipelines: each step a different language.
  • GPU / ML: native K8s GPU integration.
  • Heavy parallelism: K8s scales pods.
  • CI for ML: train + eval + deploy as DAG.

When it doesn’t

  • Python-native simple workflows: Prefect / Dagster simpler DX.
  • Heavy data orchestration with metadata: Dagster.
  • Existing Airflow shop: don’t migrate just because.
  • Non-K8s deployment: Argo needs K8s.

Argo vs alternatives

| Tool | Strengths | Weaknesses |
|---|---|---|
| Argo Workflows | K8s-native; container-per-step; GPU | YAML; learning curve |
| Airflow | Mature; huge plugin ecosystem | Python-centric; ops-heavy |
| Prefect | Python DX; cloud option | Less K8s-native |
| Dagster | Asset-centric; data lineage | Different mental model |
| Temporal | Durable execution; code-first | More for app workflows than data |

For ML / DevOps pipelines on K8s: Argo. For data engineering: Dagster or Airflow. For app workflows: Temporal.

See Temporal Workflow Engine.

Operational realities

  • Workflow controller handles DAG; pods come and go.
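  • State lives in the Workflow resources themselves (stored in etcd through the Kubernetes API). PostgreSQL or MySQL is optional: it backs the workflow archive and node-status offloading for large workflows.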
  • Logs go to standard pod logs; persist via your usual log pipeline.
  • History: archived workflows in DB; clean up periodically.
  • Resource limits: the controller and the cluster come under pressure when a workflow generates thousands of pods. Cap concurrency with parallelism.
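A sketch of concurrency caps for a large fan-out. The numbers are arbitrary, and process-item is assumed to be defined elsewhere in the spec.

```yaml
spec:
  entrypoint: fanout
  parallelism: 20            # at most 20 pods of this workflow run at once
  templates:
    - name: fanout
      parallelism: 5         # tighter cap on this template's children
      steps:
        - - name: process
            template: process-item     # assumed to exist elsewhere
            arguments: { parameters: [{ name: item, value: "{{item}}" }] }
            withSequence:
              count: "1000"
```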

CI / CD with Argo Workflows

# On PR
- run: argo submit ci.yaml -p git_sha=$SHA -p branch=$BRANCH --watch

Pipelines as workflows. Shareable templates. Multi-step builds, tests, deploys.
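What ci.yaml might contain, as a minimal sketch. The build and test commands are stand-ins; only the parameter names git_sha and branch come from the submit command above.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: ci-
spec:
  entrypoint: ci
  arguments:
    parameters:
      - name: git_sha        # no default: must be passed with -p
      - name: branch
  templates:
    - name: ci
      steps:
        - - name: build
            template: run
            arguments:
              parameters:
                - name: cmd
                  value: "echo building {{workflow.parameters.git_sha}}"
        - - name: test
            template: run
            arguments:
              parameters:
                - name: cmd
                  value: "echo testing {{workflow.parameters.branch}}"
    - name: run              # generic step that runs a shell command
      inputs:
        parameters:
          - name: cmd
      container:
        image: alpine:3.20
        command: [sh, -c]
        args: ["{{inputs.parameters.cmd}}"]
```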

For app deploys: Argo CD is the sibling (GitOps). See GitOps with Argo CD.

Common mistakes

1. Heavy steps without resource limits

Pods get OOM-killed and the failure is confusing to debug. Set resource requests and limits.

2. No retries on flaky steps

Network blips kill the workflow. Add a retryStrategy.

3. Massive YAML

DRY: use templates and parameters. Keep reusable templates in a shared library (WorkflowTemplate).
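A sketch of the library pattern: a WorkflowTemplate holds the shared step, and workflows call it through templateRef. The names etl-lib and use-lib- are placeholders; the image and arguments echo the earlier examples.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: etl-lib
spec:
  templates:
    - name: transform
      inputs:
        parameters:
          - name: input
      container:
        image: my/etl:1.0
        command: [python, run.py]
        args: ["--input", "{{inputs.parameters.input}}"]
---
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: use-lib-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: transform-a
            templateRef:
              name: etl-lib          # the WorkflowTemplate above
              template: transform
            arguments:
              parameters:
                - name: input
                  value: "a"
```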

4. Forgot to clean up history

Completed Workflow objects accumulate in the cluster. Use ttlStrategy or periodic cleanup.
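A sketch of automatic cleanup at the workflow level. The durations are examples; pick ones that match how long you need to inspect runs.

```yaml
spec:
  ttlStrategy:
    secondsAfterSuccess: 3600        # delete successful workflows after 1 hour
    secondsAfterFailure: 604800      # keep failed ones for 7 days
    secondsAfterCompletion: 604800   # upper bound for any completed workflow
  podGC:
    strategy: OnWorkflowSuccess      # delete pods once the workflow succeeds
```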

5. Cron + every minute

CronWorkflow with * * * * *: pods every minute, all day. Most schedules can be coarser.

What I’d ship today

For K8s pipelines:

  • Argo Workflows for ML / data / CI pipelines.
  • CronWorkflows for scheduled jobs.
  • Argo Events for reactive triggers.
  • Karpenter + Argo for autoscaling GPU.
  • Standard templates in your org (DRY).
  • TTL strategy to clean up history.

Read this next

If you want my Argo Workflows templates (ML training, CI, ETL), they’re at rajpoot.dev.