AI/LLM Cheatsheet 14 — Multimodal LLMs

Cheatsheet: vision LLMs, image inputs, audio, video.

Multimodal LLMs in 2026 — Vision, Audio, and What's Actually Useful

A practical guide to multimodal LLMs: vision-aware document understanding, audio transcription and reasoning, text-to-image generation, video understanding, and where multimodal pays off.