gemini
npx machina-cli add skill G1Joshi/Agent-Skills/gemini --openclaw
Gemini is Google's natively multimodal model family: it accepts video input and very large context windows (2M+ tokens) out of the box. As of 2025, the current generations are Gemini 2.0 and 3.0.
When to Use
- Massive Context: "Here is a 1-hour video. Find the timestamp where..."
- Multimodal Live: Real-time voice/video interaction.
- Google Ecosystem: Integrated with Vertex AI, Search (Grounding), and Workspace.
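All three scenarios funnel into the same generate-content call. As a minimal sketch, assuming the public Gemini REST API's v1beta request shape (field names may differ between API versions, so verify against the current Google AI docs):

```python
import json

# Hypothetical helper: builds the minimal request body for the public
# Gemini REST generateContent endpoint (assumed v1beta shape).
def build_generate_request(prompt: str) -> dict:
    return {"contents": [{"parts": [{"text": prompt}]}]}

print(json.dumps(build_generate_request("Summarize this PDF in three bullets.")))
```

The body would be POSTed to `https://generativelanguage.googleapis.com/v1beta/models/<model>:generateContent` with an API key; the official SDKs wrap this same shape.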
Core Concepts
Models
- Pro: The best all-rounder.
- Flash: Extremely fast and cheap. High throughput.
- Ultra: The largest reasoning model.
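The tier trade-off above can be captured in a tiny routing helper. This is illustrative only: the tier strings are placeholders, since concrete model IDs (e.g. a specific Flash release) change between versions:

```python
# Placeholder tier names -- concrete model IDs change between releases,
# so map your own task types to current model IDs accordingly.
def pick_model(task: str) -> str:
    tiers = {
        "rag": "flash",             # high throughput, low cost
        "general": "pro",           # best all-rounder
        "deep_reasoning": "ultra",  # largest reasoning model
    }
    return tiers.get(task, "pro")   # default to the all-rounder

print(pick_model("rag"))  # -> flash
```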
Grounding
Connects the model to Google Search to provide citations and up-to-date info.
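In the REST API, grounding is a one-line addition to the request body. The `google_search` tool entry below assumes the Gemini 2.0-era v1beta shape (earlier 1.5 models used a differently named retrieval tool):

```python
def build_grounded_request(prompt: str) -> dict:
    # The (assumed) "google_search" tool asks the model to ground its
    # answer in Search results and return citation metadata.
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "tools": [{"google_search": {}}],
    }

print(build_grounded_request("Who won the 2024 Tour de France?"))
```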
Context Caching
Cache a large context (e.g., a massive manual) once so that subsequent queries reuse it at lower cost and latency.
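Caching is a two-step flow: create the cache once, then reference it from later queries. A sketch, assuming the v1beta `cachedContents` REST shape (field names like `ttl` and `cachedContent` are taken from public docs but may change):

```python
def build_cache_request(model: str, manual_text: str, ttl_seconds: int) -> dict:
    # One-time creation of a cached-content resource holding the manual.
    return {
        "model": f"models/{model}",
        "contents": [{"role": "user", "parts": [{"text": manual_text}]}],
        "ttl": f"{ttl_seconds}s",
    }

def build_cached_query(cache_name: str, question: str) -> dict:
    # Follow-up queries reference the cache instead of resending the manual.
    return {
        "contents": [{"parts": [{"text": question}]}],
        "cachedContent": cache_name,
    }

print(build_cache_request("gemini-2.0-flash", "<manual text>", 3600)["ttl"])
```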
Best Practices (2025)
Do:
- Use Flash for RAG: 2.0 Flash is capable enough for most RAG workloads and is cheaper and faster.
- Use Grounding: Reduce hallucinations by enabling Google Search grounding, which attaches citations.
- Upload Video: Don't transcribe video manually; Gemini watches it.
Don't:
- Don't confuse with PaLM: Gemini replaced PaLM 2 completely.
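The "upload video" advice maps to a file part in the same request body. A sketch, assuming a file URI already returned by a Files API upload step and an .mp4 mime type (both are assumptions for illustration):

```python
def build_video_request(file_uri: str, question: str) -> dict:
    # Pair the uploaded video (referenced by URI) with a text question;
    # the model processes the video directly, no manual transcript needed.
    return {
        "contents": [{
            "parts": [
                {"file_data": {"file_uri": file_uri, "mime_type": "video/mp4"}},
                {"text": question},
            ]
        }]
    }

req = build_video_request("files/abc123", "At what timestamp is the whiteboard shown?")
print(len(req["contents"][0]["parts"]))  # -> 2
```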
References
Source
https://github.com/G1Joshi/Agent-Skills/blob/main/skills/ai-ml/gemini/SKILL.md
Overview
Gemini is Google's natively multimodal model family: it accepts video input and enormous context windows (2M+ tokens) out of the box. It offers Pro, Flash, and Ultra variants and can ground responses in Google Search to provide citations and up-to-date information. Context caching lets you preload large manuals once to reduce latency and cost on later queries.
How This Skill Works
Gemini processes multimodal inputs and can ground responses in Google Search for citations and up-to-date information. Its model variants balance speed against reasoning power: Flash for fast RAG tasks, Pro as the all-rounder, and Ultra for the heaviest reasoning. Context caching preloads large context to cut latency and cost on subsequent queries.
When to Use It
- Massive context tasks such as analyzing a 1-hour video and extracting exact timestamps
- Multimodal live interactions with real-time voice and video processing
- Integrated Google ecosystem workflows via Vertex AI, Grounding, and Workspace
- Video-heavy analytics and content summarization for media libraries
- RAG pipelines that rely on video data and up-to-date citations from Google Search
Quick Start
- Step 1: Set up Gemini in Vertex AI and enable Grounding to Google Search
- Step 2: Upload your video and enable context caching if you have large manuals or docs
- Step 3: Run queries using Flash for RAG or Ultra/Pro for deeper reasoning, and review citations
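The three steps converge on one endpoint per model. A sketch of the URL construction, assuming the public generativelanguage host and v1beta path (Vertex AI deployments use a different host and auth scheme):

```python
# Assumed public Google AI host and API version prefix.
BASE = "https://generativelanguage.googleapis.com/v1beta"

def generate_url(model: str) -> str:
    # Non-streaming generateContent endpoint for a given model ID.
    return f"{BASE}/models/{model}:generateContent"

print(generate_url("gemini-2.0-flash"))
```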
Best Practices
- Use Flash for most RAG tasks to get high throughput at lower cost
- Enable Grounding to attach Google Search citations and reduce hallucinations
- Upload video directly; Gemini can watch and interpret video without manual transcription
- Leverage context caching for large manuals or datasets to reduce latency and cost
- Choose the model by task: Pro for general-purpose work, Ultra for heavy reasoning, Flash for fast RAG
Example Use Cases
- Index and answer questions about a 1-hour lecture by locating exact moments and topics
- Provide real-time answers during live customer support with video for context
- Grounded QA over a large product manual integrated with search citations
- Tag, summarize, and catalog video assets in a media library
- Research assistant that summarizes multi-source video content with up-to-date citations