
gemini

npx machina-cli add skill G1Joshi/Agent-Skills/gemini --openclaw
Files (1): SKILL.md (1.3 KB)

Gemini

Gemini is Google's natively multimodal model family. Uniquely, it accepts video input and very large contexts (2M+ tokens) natively. As of 2025, the current generations are Gemini 2.0 and 3.0.

When to Use

  • Massive Context: "Here is a 1-hour video. Find the timestamp where..."
  • Multimodal Live: Real-time voice/video interaction.
  • Google Ecosystem: Integrated with Vertex AI, Search (Grounding), and Workspace.

Core Concepts

Models

  • Pro: The best all-rounder.
  • Flash: Extremely fast and cheap. High throughput.
  • Ultra: The largest reasoning model.

Grounding

Connects the model to Google Search to provide citations and up-to-date info.
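As an illustration, a grounded call can be expressed as a plain REST request body with the `google_search` tool enabled. The sketch below only builds the payload and does not send it; the endpoint path and field names follow the public generativelanguage v1beta API as of Gemini 2.0, so treat them as assumptions to verify against current docs:

```python
# Sketch: build (but do not send) a Google Search-grounded generateContent
# request. Field names per the public v1beta API; verify before use.
import json

def build_grounded_request(question: str, model: str = "gemini-2.0-flash") -> dict:
    """Return the URL and JSON body for a Search-grounded Gemini query."""
    url = (
        "https://generativelanguage.googleapis.com/v1beta/"
        f"models/{model}:generateContent"
    )
    body = {
        "contents": [{"parts": [{"text": question}]}],
        # The google_search tool asks the model to ground its answer in
        # live Search results and return citation metadata.
        "tools": [{"google_search": {}}],
    }
    return {"url": url, "body": body}

req = build_grounded_request("Who won the most recent Ballon d'Or?")
print(json.dumps(req["body"], indent=2))
```

POSTing that body with an API key returns candidates whose grounding metadata carries the Search citations.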

Context Caching

Cache the context (e.g., a massive manual) to reduce cost/latency on subsequent queries.
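A minimal sketch of the two request bodies involved, assuming the v1beta `cachedContents` endpoint and the `model`, `ttl`, and `cachedContent` field names from the public API docs:

```python
# Sketch: request bodies for Gemini context caching. Field names are
# assumptions from the public v1beta API docs; verify before use.
def build_cache_request(manual_text: str,
                        model: str = "models/gemini-2.0-flash-001",
                        ttl_seconds: int = 3600) -> dict:
    """Body for POST /v1beta/cachedContents: cache a large manual once."""
    return {
        "model": model,
        "contents": [{"role": "user", "parts": [{"text": manual_text}]}],
        "ttl": f"{ttl_seconds}s",  # cache lifetime; cached tokens are billed per hour
    }

def build_cached_query(question: str, cache_name: str) -> dict:
    """Body for generateContent that reuses the cached context by name."""
    return {
        "contents": [{"parts": [{"text": question}]}],
        "cachedContent": cache_name,  # e.g. "cachedContents/abc123" from the create call
    }
```

The create call returns a resource name; every follow-up query passes that name instead of re-sending the manual.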

Best Practices (2025)

Do:

  • Use Flash for RAG: 2.0 Flash is capable enough for most RAG tasks while being cheaper and faster.
  • Use Grounding: Reduce hallucinations by enabling Google Search grounding.
  • Upload Video: Don't transcribe video manually; Gemini watches it.
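The "upload video" pattern above amounts to pairing a file reference with a text prompt in one request. The sketch below builds such a body; the `file_data` part shape follows the public v1beta API, and the URI would come from a prior Files API upload (the one in the test is a placeholder):

```python
# Sketch: generateContent body that asks a question about an uploaded video.
# Part field names are assumptions from the public v1beta API docs.
def build_video_question(file_uri: str, question: str) -> dict:
    """Pair an uploaded video (by Files API URI) with a text prompt."""
    return {
        "contents": [{
            "parts": [
                # file_uri references a prior Files API upload; no raw
                # video bytes travel in this request body.
                {"file_data": {"mime_type": "video/mp4", "file_uri": file_uri}},
                {"text": question},
            ]
        }]
    }
```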

Don't:

  • Don't confuse with PaLM: Gemini replaced PaLM 2 completely.

References

Source: https://github.com/G1Joshi/Agent-Skills/blob/main/skills/ai-ml/gemini/SKILL.md

Overview

Gemini is Google's natively multimodal model that accepts video input and enormous context (2M+ tokens). It offers Pro, Flash, and Ultra variants and can ground results to Google Search to provide citations and up-to-date information. Context Caching preloads large manuals to reduce latency and cost.

How This Skill Works

Gemini processes multimodal inputs and can ground its answers in Google Search for citations and up-to-date information. Its model variants balance speed and reasoning power: Flash for fast RAG tasks, Pro as an all-rounder, and Ultra for heavy reasoning. Context Caching preloads large context to cut latency on subsequent queries.

When to Use It

  • Massive context tasks such as analyzing a 1-hour video and extracting exact timestamps
  • Multimodal live interactions with real-time voice and video processing
  • Integrated Google ecosystem workflows via Vertex AI, Grounding, and Workspace
  • Video-heavy analytics and content summarization for media libraries
  • RAG pipelines that rely on video data and up-to-date citations from Google Search

Quick Start

  1. Set up Gemini in Vertex AI and enable Grounding with Google Search
  2. Upload your video and enable Context Caching if you have large manuals or docs
  3. Run queries using Flash for RAG or Pro/Ultra for deeper reasoning, and review citations
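The query step above can be sketched as a complete HTTP call assembled offline with the standard library; the endpoint and the `x-goog-api-key` header follow the public v1beta API, and the key value here is a placeholder:

```python
# Sketch: assemble (without sending) a Gemini Flash generateContent call.
# Endpoint and auth header per the public v1beta API; verify before use.
import json
import urllib.request

def make_flash_request(question: str, api_key: str) -> urllib.request.Request:
    """Build a ready-to-send POST request for a Flash query."""
    url = ("https://generativelanguage.googleapis.com/v1beta/"
           "models/gemini-2.0-flash:generateContent")
    body = json.dumps({"contents": [{"parts": [{"text": question}]}]}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )

req = make_flash_request("Summarize chapter 3 of the manual.", "YOUR_API_KEY")
# urllib.request.urlopen(req) would send it; kept offline in this sketch.
```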

Best Practices

  • Use Flash for most RAG tasks to get high throughput at lower cost
  • Enable Grounding to attach Google Search citations and reduce hallucinations
  • Upload video directly; Gemini can watch and interpret video without manual transcription
  • Leverage Context Caching for large manuals or datasets to reduce latency and cost
  • Choose the model by task: Pro for general purpose, Ultra for large reasoning, Flash for fast RAG
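The model-by-task guidance above can be sketched as a tiny routing helper. The Flash ID matches a published model; the Pro and Ultra IDs below are placeholders, not verified identifiers:

```python
# Sketch: route a coarse task type to a Gemini variant, per the guidance
# above. Pro/Ultra model IDs are hypothetical placeholders.
def pick_model(task: str) -> str:
    """Map a task category to a model name; default to Flash."""
    table = {
        "rag": "gemini-2.0-flash",        # fast, cheap, high throughput
        "general": "gemini-2.0-pro",      # all-rounder (placeholder ID)
        "reasoning": "gemini-2.0-ultra",  # largest reasoning model (placeholder ID)
    }
    return table.get(task, "gemini-2.0-flash")

print(pick_model("rag"))  # → gemini-2.0-flash
```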

Example Use Cases

  • Index and answer questions about a 1-hour lecture by locating exact moments and topics
  • Provide real-time answers during live customer support with video for context
  • Grounded QA over a large product manual integrated with search citations
  • Tag, summarize, and catalog video assets in a media library
  • Research assistant that summarizes multi-source video content with up-to-date citations
