MongoDB Cheatsheet 03 — Aggregation Pipeline

Aggregation pipeline.

Core stages

db.orders.aggregate([
    { $match: { status: "completed", created: { $gte: ISODate("2026-01-01") } } },
    { $group: { _id: "$user_id", total: { $sum: "$amount" }, count: { $sum: 1 } } },
    { $sort: { total: -1 } },
    { $limit: 100 },
    { $project: { user_id: "$_id", total: 1, count: 1, avg: { $divide: ["$total", "$count"] }, _id: 0 } },
])

$match early

Push $match first; uses indexes. Same for $sort (with index).

$group operators

$sum $avg $min $max
$first $last
$push (array)
$addToSet
$count

$lookup (join)

{ $lookup: {
    from: "users",
    localField: "user_id",
    foreignField: "_id",
    as: "user",
} },
{ $unwind: "$user" },

Slower than relational JOIN. Use when needed; index foreignField.

$lookup with pipeline

{ $lookup: {
    from: "posts",
    let: { uid: "$_id" },
    pipeline: [
        { $match: { $expr: { $eq: ["$author_id", "$$uid"] } } },
        { $sort: { created: -1 } },
        { $limit: 5 },
    ],
    as: "recent_posts",
} }

{ $facet: {
    counts: [{ $group: { _id: "$status", count: { $sum: 1 } } }],
    top: [{ $sort: { score: -1 } }, { $limit: 10 }],
    avg: [{ $group: { _id: null, avg: { $avg: "$score" } } }],
} }

Runs sub-pipelines on same input.

$unwind

{ $unwind: "$tags" }                          // explode array
{ $unwind: { path: "$tags", preserveNullAndEmptyArrays: true } }

$addFields / $set

{ $addFields: { full_name: { $concat: ["$first", " ", "$last"] } } }

$project

{ $project: { name: 1, email: 1, _id: 0 } }

Window functions ($setWindowFields)

{ $setWindowFields: {
    partitionBy: "$category",
    sortBy: { created: 1 },
    output: {
        cumTotal: { $sum: "$amount", window: { documents: ["unbounded", "current"] } },
        rank: { $rank: {} },
    },
} }

$merge / $out

{ $merge: "reports.daily" }           // upsert
{ $out: "reports.daily" }              // replace collection

Date operators

$year $month $dayOfMonth $hour
$dateToString $dateFromString
$dateAdd $dateSubtract $dateDiff
$dateTrunc

{ $project: { day: { $dateToString: { format: "%Y-%m-%d", date: "$created" } } } }

String operators

$concat $substr $toLower $toUpper
$split $trim $regex
$strLenCP $indexOfCP

Conditional

$cond $ifNull $switch

{ $project: { status: { $cond: [{ $gt: ["$age", 18] }, "adult", "minor"] } } }

Array operators

$size $arrayElemAt $slice $map $filter $reduce
$in $allElementsTrue $anyElementTrue

{ $project: { active_tags: { $filter: { input: "$tags", cond: { $eq: ["$$this.active", true] } } } } }

Performance

Index for $match and $sort at start.
Avoid $lookup at scale; denormalize.
allowDiskUse: true for big aggregations.
Limit early.

explain

db.coll.explain("executionStats").aggregate([...])

Common mistakes

$match after $group → can’t use index.
$sort without index → in-memory sort (32MB limit).
$lookup on unindexed field → full scan per doc.
Forgetting allowDiskUse on big runs.
Trying to do everything in DB; sometimes app-side is clearer.

MongoDB Cheatsheet 03 — Aggregation Pipeline

Core stages

$match early

$group operators

$lookup (join)

$lookup with pipeline

$facet (multiple pipelines)

$unwind

$addFields / $set

$project

Window functions ($setWindowFields)

$merge / $out

Date operators

String operators

Conditional

Array operators

Performance

explain

Common mistakes

Read this next

Core stages#

$match early#

$group operators#

$lookup (join)#

$lookup with pipeline#

$facet (multiple pipelines)#

$unwind#

$addFields / $set#

$project#

Window functions ($setWindowFields)#

$merge / $out#

Date operators#

String operators#

Conditional#

Array operators#

Performance#

explain#

Common mistakes#

Read this next#

Core stages

$match early

$group operators

$lookup (join)

$lookup with pipeline

$facet (multiple pipelines)

$unwind

$addFields / $set

$project

Window functions ($setWindowFields)

$merge / $out

Date operators

String operators

Conditional

Array operators

Performance

explain

Common mistakes

Read this next