Aggregation pipeline.

Core stages

db.orders.aggregate([
    { $match: { status: "completed", created: { $gte: ISODate("2026-01-01") } } },
    { $group: { _id: "$user_id", total: { $sum: "$amount" }, count: { $sum: 1 } } },
    { $sort: { total: -1 } },
    { $limit: 100 },
    { $project: { user_id: "$_id", total: 1, count: 1, avg: { $divide: ["$total", "$count"] }, _id: 0 } },
])

$match early

Push $match first; uses indexes. Same for $sort (with index).

$group operators

$sum $avg $min $max
$first $last
$push (array)
$addToSet
$count

$lookup (join)

{ $lookup: {
    from: "users",
    localField: "user_id",
    foreignField: "_id",
    as: "user",
} },
{ $unwind: "$user" },

Slower than relational JOIN. Use when needed; index foreignField.

$lookup with pipeline

{ $lookup: {
    from: "posts",
    let: { uid: "$_id" },
    pipeline: [
        { $match: { $expr: { $eq: ["$author_id", "$$uid"] } } },
        { $sort: { created: -1 } },
        { $limit: 5 },
    ],
    as: "recent_posts",
} }

$facet (multiple pipelines)

{ $facet: {
    counts: [{ $group: { _id: "$status", count: { $sum: 1 } } }],
    top: [{ $sort: { score: -1 } }, { $limit: 10 }],
    avg: [{ $group: { _id: null, avg: { $avg: "$score" } } }],
} }

Runs sub-pipelines on same input.

$unwind

{ $unwind: "$tags" }                          // explode array
{ $unwind: { path: "$tags", preserveNullAndEmptyArrays: true } }

$addFields / $set

{ $addFields: { full_name: { $concat: ["$first", " ", "$last"] } } }

$project

{ $project: { name: 1, email: 1, _id: 0 } }

Window functions ($setWindowFields)

{ $setWindowFields: {
    partitionBy: "$category",
    sortBy: { created: 1 },
    output: {
        cumTotal: { $sum: "$amount", window: { documents: ["unbounded", "current"] } },
        rank: { $rank: {} },
    },
} }

$merge / $out

{ $merge: "reports.daily" }           // upsert
{ $out: "reports.daily" }              // replace collection

Date operators

$year $month $dayOfMonth $hour
$dateToString $dateFromString
$dateAdd $dateSubtract $dateDiff
$dateTrunc
{ $project: { day: { $dateToString: { format: "%Y-%m-%d", date: "$created" } } } }

String operators

$concat $substr $toLower $toUpper
$split $trim $regex
$strLenCP $indexOfCP

Conditional

$cond $ifNull $switch
{ $project: { status: { $cond: [{ $gt: ["$age", 18] }, "adult", "minor"] } } }

Array operators

$size $arrayElemAt $slice $map $filter $reduce
$in $allElementsTrue $anyElementTrue
{ $project: { active_tags: { $filter: { input: "$tags", cond: { $eq: ["$$this.active", true] } } } } }

Performance

  • Index for $match and $sort at start.
  • Avoid $lookup at scale; denormalize.
  • allowDiskUse: true for big aggregations.
  • Limit early.

explain

db.coll.explain("executionStats").aggregate([...])

Common mistakes

  • $match after $group → can’t use index.
  • $sort without index → in-memory sort (32MB limit).
  • $lookup on unindexed field → full scan per doc.
  • Forgetting allowDiskUse on big runs.
  • Trying to do everything in DB; sometimes app-side is clearer.

Read this next

If you want my aggregation cookbook, it’s at rajpoot.dev .


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .