Multimodal AI
Core ConceptsSimple Definition
AI systems that can process and generate multiple types of data — text, images, audio, video, and code.
Full Explanation
Early AI models handled only one modality (text-only or image-only). Modern multimodal models like GPT-4o, Gemini Ultra, and Claude 3 can accept images, audio, and documents alongside text. This enables use cases like analyzing a photograph, describing a video, reading a chart, or answering questions about a PDF — all in natural language.
Example
GPT-4o can analyze a photo of a restaurant menu and recommend dishes based on dietary restrictions.
Related Terms
Last verified: 2026-03-30← Back to Glossary