Executive Overview: The Gemini 3 Launch
On November 18, 2025, Google DeepMind released Gemini 3 Pro, marking what may be the most significant milestone in the current AI arms race. Arriving just six days after OpenAI's GPT-5.1 and seven months after Gemini 2.5, the release underscores the blistering pace of frontier AI development—and Google's determination to reclaim technical leadership.
We're hosting a Gemini 3 Build Day on December 13th, co-sponsored by Google, to help builders unlock the potential of these capabilities. This memo provides a technical deep dive into Gemini 3's innovations and concludes with 25 curated project ideas designed to showcase what makes this model unique.
Headline Benchmarks
Gemini 3 Pro became the first large language model to cross 1500 Elo on LMArena, the industry's most-watched human-preference benchmark. It outperformed competitors on 19 of 20 major benchmarks evaluated at launch.
Note: Gemini 3 led every major benchmark at launch except one—Claude retains the lead on SWE-Bench for real-world bug fixes.
Key Technical Differentiators
- Native multimodality: Trained from scratch on text, images, audio, and video simultaneously—not bolted together post-hoc
- Sparse Mixture-of-Experts (MoE): Trillions of total parameters with only ~15-20B activated per query, enabling massive capacity at manageable inference cost
- 1M token context window: Equivalent to ~700,000 words or 11 hours of audio—roughly 8x larger than GPT-5.1's ~128K
- Generative UI: Novel capability to dynamically generate complete interactive interfaces, tools, and applications on the fly
- Agentic architecture: Deep Think reasoning mode, tool calling, and autonomous task execution via new Antigravity IDE
Strategic Positioning
Gemini 3 represents Google's answer to whether vertical integration can outcompete specialized players. The model benefits from a decade of TPU development (now seventh-generation Ironwood chips), proprietary training infrastructure (JAX + ML Pathways), and unmatched distribution: 650M monthly Gemini app users and 2B monthly AI Overviews users in Search. The launch of the Antigravity IDE positions Google directly against Cursor, Windsurf, Claude Code, and OpenAI Codex in the emerging "agentic coding" platform war.
Research & Development Context
Google's LLM Evolution: A Brief History
Google's path to Gemini 3 traces through several pivotal moments:
- 2017 — "Attention Is All You Need": Foundational Transformer paper (Vaswani et al.), cited over 173,000 times
- 2021 — LaMDA: Conversational AI optimized for open-ended dialogue
- 2022 — PaLM: Large language model demonstrating scaling laws
- Feb 2023 — Bard: Rushed ChatGPT response; widely seen as a misstep that galvanized internal focus
- Dec 2023 — Gemini 1.0: First native multimodal model; outperformed on 30/32 benchmarks
- Feb 2024 — Gemini 1.5: Introduced 1M token context window (~8x GPT-4 Turbo's 128K)
- Dec 2024 — Gemini 2.0: "Agentic era" model with multi-step planning and tool use
- Mar 2025 — Gemini 2.5: "Thinking model" with chain-of-thought; scored 4.9% on ARC-AGI-2
- Nov 2025 — Gemini 3.0: Frontier reasoning + generative UI; 31.1% on ARC-AGI-2 (6x improvement)
Core Technical Innovations
1. Native Multimodality
Research Foundation: The core architectural question in multimodal AI is when to fuse information from different modalities. The seminal April 2025 paper "Scaling Laws for Native Multimodal Models" (Apple + Sorbonne) demonstrated that early-fusion models trained from scratch achieve comparable or better performance than late-fusion approaches—with fewer parameters required. Crucially, they found that Mixture-of-Experts architectures "significantly benefit early-fusion NMMs, showing substantial improvements over dense models at equivalent inference costs."
Gemini 3's Implementation: Uses a unified early-fusion architecture where text, images, audio, and video are converted into a shared token space and processed jointly through the same transformer backbone. The Gemini 1.5 Technical Report states: "Since the model is natively multimodal and supports interleaving of data from different modalities, it can support a mix of audio, visual, text, and code inputs in the same input sequence."
This enables unique capabilities like cross-modal needle-in-a-haystack retrieval—finding a specific frame in hours of video based on a text query—with >99% accuracy up to 10M tokens.
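The early-fusion idea can be illustrated with a toy sketch. Everything below—vocabulary sizes, modality tags, token IDs—is invented for illustration and is not Gemini's actual tokenization; the point is only that each modality maps into one shared ID space and interleaves into a single sequence that a single transformer backbone consumes.

```python
# Toy illustration of early fusion: every modality is embedded into ONE
# shared token sequence, rather than encoded separately and merged late.
# All vocabulary sizes and offsets below are invented for illustration.

TEXT_VOCAB = 32_000   # hypothetical text vocabulary size
PATCH_VOCAB = 8_192   # hypothetical image-patch codebook size
AUDIO_VOCAB = 4_096   # hypothetical audio codebook size

# Give each modality a disjoint slice of one shared ID space.
OFFSETS = {"text": 0,
           "image": TEXT_VOCAB,
           "audio": TEXT_VOCAB + PATCH_VOCAB}

def to_shared_tokens(modality: str, local_ids: list[int]) -> list[int]:
    """Map modality-local token IDs into the shared ID space."""
    return [OFFSETS[modality] + i for i in local_ids]

def interleave(*segments: tuple[str, list[int]]) -> list[int]:
    """Build one interleaved sequence, the way an early-fusion model
    consumes mixed text/image/audio input in a single context."""
    seq: list[int] = []
    for modality, ids in segments:
        seq.extend(to_shared_tokens(modality, ids))
    return seq

# Caption tokens, then image-patch tokens, then a text question:
seq = interleave(("text", [5, 17]), ("image", [3, 9, 1]), ("text", [42]))
```

Because every token lives in the same sequence, cross-modal attention falls out for free; a late-fusion system would instead run separate encoders and merge their outputs afterwards.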
Competitive Comparison:
- GPT-5.1: Uses late-fusion approach; separate vision encoder bolted onto language model. Strong but lacks native cross-modal reasoning depth.
- Claude 4.5: Primarily text-focused with image understanding added; limited video/audio support.
- Gemini 3: Native multimodal from ground up; true cross-modal attention; strongest performance on video understanding (87.6% Video-MMMU).
2. Mixture-of-Experts Architecture
Research Foundation: The MoE concept dates to 1991 ("Adaptive mixtures of local experts," Jacobs et al.). The modern deep learning application was revolutionized by the 2017 paper "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer" (Shazeer et al.), which enabled models with billions of parameters while keeping per-query compute tractable.
Key subsequent advances:
- GShard (2020): Lepikhin et al. scaled MoE to 600B parameters for multilingual translation; introduced token-level expert routing
- Switch Transformer (2021): Fedus et al. simplified routing to top-1 expert selection, achieving 7x pre-training speedups. First trillion-parameter model.
- Expert Choice Routing (2022): Google Research inverted the paradigm—experts select tokens rather than tokens selecting experts—achieving 2x+ training efficiency
Gemini 3's Implementation: A sparse mixture-of-experts Transformer trained on TPUv4/Ironwood hardware across multiple datacenters. The architecture enables trillions of total parameters while activating only ~15-20B per query, providing massive model capacity with manageable inference cost.
What distinguishes Gemini's implementation is the combination of MoE with native multimodality. The Apple/Sorbonne scaling laws paper noted that MoE had been "extensively studied for language models" but "application to multimodal systems remains limited." Gemini demonstrates that MoE + early-fusion multimodality can work at scale.
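A minimal sketch of sparse top-k routing—the mechanism introduced by Shazeer et al., not Gemini's actual (undisclosed) router—shows why only a fraction of parameters runs per token. Dimensions here are toy-sized:

```python
import numpy as np

def top_k_moe(x, gate_w, expert_ws, k=2):
    """Toy sparse MoE layer: route each token to its top-k experts by
    gate score and combine their outputs with softmax weights.
    x: (tokens, d); gate_w: (d, n_experts); expert_ws: list of (d, d)."""
    scores = x @ gate_w                           # (tokens, n_experts)
    top = np.argsort(scores, axis=-1)[:, -k:]     # indices of top-k experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = scores[t, top[t]]
        w = np.exp(sel - sel.max()); w /= w.sum() # softmax over selected
        for weight, e in zip(w, top[t]):
            out[t] += weight * (x[t] @ expert_ws[e])
    return out, top

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 16, 4
x = rng.normal(size=(tokens, d))
gate_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y, routed = top_k_moe(x, gate_w, experts, k=2)
# Only 2 of 16 experts run per token: roughly 1/8 the per-token FLOPs
# of a dense layer with the same total parameter count.
```

Scale the same ratio up and you get Gemini's headline property: trillions of stored parameters, only tens of billions active per query.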
Competitive Comparison:
- GPT-5.1: Believed to be dense transformer (all parameters active per query). More predictable but less parameter-efficient.
- Claude 4.5: Architecture undisclosed; likely dense or hybrid.
- Mixtral/DeepSeek: Open-weight MoE models demonstrating cost-efficiency; DeepSeek V3 reportedly trained for ~$5.5M vs. hundreds of millions for US rivals.
- Gemini 3: MoE + native multimodality + Google's TPU infrastructure = unique combination of scale, efficiency, and capability.
3. Agentic Coding & Antigravity
Research Context: "Agentic AI" refers to systems that move beyond single-turn responses to autonomous planning, tool use, and multi-step task execution. This builds on research in planning (classical AI), tool-augmented LLMs (Toolformer, 2023), and code generation (Codex, AlphaCode). The agentic coding space has seen rapid commercialization in 2024-2025, with tools like Cursor, Windsurf, Claude Code, and OpenAI Codex.
Antigravity: Google's Entry: Launched November 18, 2025 alongside Gemini 3, Antigravity is an "agentic development platform" that Google describes as evolving "the IDE toward an agent-first future."
Key features:
- Two Interfaces: Editor View (familiar AI-powered IDE) and Manager View (mission control for orchestrating multiple agents)
- Artifact System: Agents produce human-readable artifacts (task lists, implementation plans, screenshots, browser recordings) rather than raw tool-call logs
- Browser Control: Agents can autonomously drive Chrome to test applications, with screen recordings as verification
- Model Flexibility: Supports Gemini 3, Claude Sonnet 4.5, and OpenAI GPT-OSS—no vendor lock-in
- Knowledge Base Learning: Agents save useful context and code snippets to improve future tasks
Google hired the Windsurf team (including CEO Varun Mohan) in July 2025 in a deal reportedly worth $2.4B.
4. Generative UI
Research Background: Generative UI represents a shift from AI generating content to AI generating complete user experiences. Prior work includes Claude Artifacts (July 2024), ChatGPT Canvas (October 2024), and academic research on dynamic interface generation.
Google's research paper "Generative UI: LLMs are Effective UI Generators" (November 2025) introduces a framework for dynamically creating "immersive visual experiences and interactive interfaces—such as web pages, games, tools, and applications—that are automatically designed and fully customized in response to any question, instruction, or prompt."
Gemini 3's Implementation: Uses three components: (1) Tool access to image generation, web search, and other services; (2) Carefully crafted system instructions with formatting specifications; (3) Post-processing to address common issues.
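Component (3) can be as simple as sanitizing the model's raw output before rendering. The sketch below is an assumption about common failure modes (markdown fences, leading prose), not Google's actual pipeline:

```python
import re

def postprocess_ui(raw: str) -> str:
    """Minimal post-processing pass for model-generated HTML:
    strip markdown code fences and any stray prose before the
    document root. (Illustrative only; a real pipeline does far
    more validation before rendering.)"""
    # Unwrap ```html ... ``` fences the model sometimes emits.
    m = re.search(r"```(?:html)?\s*(.*?)```", raw, re.DOTALL)
    if m:
        raw = m.group(1)
    # Drop leading conversational prose before <!DOCTYPE> or <html>.
    start = raw.find("<!DOCTYPE")
    if start == -1:
        start = raw.find("<html")
    return raw[start:].strip() if start != -1 else raw.strip()

cleaned = postprocess_ui(
    "Sure! Here is your page:\n```html\n<html><body>hi</body></html>\n```")
```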
Deployed as "Dynamic View" in the Gemini app and AI Mode in Google Search, generative UI enables use cases like interactive probability learning tools, event planners, and custom galleries—all generated on-the-fly.
Competitive Comparison:
- Claude Artifacts: Generates standalone code snippets, React components, SVGs—user must render separately. Strength: code quality and iteration.
- ChatGPT Canvas: Document/code editing workspace with inline suggestions. Strength: collaborative refinement. Weakness: no live preview.
- Gemini Generative UI: Renders complete, styled, interactive experiences automatically within the product. Strength: zero-friction user experience. Weakness: longer generation times (can take 1+ minute).
5. Extended Context Window
Capability: Gemini 3 Pro supports 1M input tokens (~700,000 words) and 64K output tokens. For context: this is roughly 8x GPT-5.1's ~128K and 5x Claude's ~200K.
Research Foundation: The Gemini 1.5 Technical Report demonstrated near-perfect (>99%) retrieval accuracy up to 10M tokens—"a generational leap over existing models." The architecture combines efficient attention mechanisms with learned positional encodings that generalize to longer sequences.
Practical Impact: Enables processing entire codebases, multi-hour videos, book-length documents, or days of audio in a single context. This is particularly powerful for code review, legal document analysis, and video understanding tasks.
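In practice, the first question for long-context work is whether an input fits at all. A rough sketch—the ~4 characters/token heuristic is a common approximation for English text, not the model's actual tokenizer:

```python
def fits_in_context(texts, context_limit=1_000_000, chars_per_token=4):
    """Estimate whether a batch of documents fits in one context window,
    using the rough ~4 characters/token heuristic for English text.
    Returns (fits, estimated_tokens)."""
    est = sum(len(t) for t in texts) // chars_per_token
    return est <= context_limit, est

# A ~300-page book (~600k characters) fits comfortably in 1M tokens:
ok, est = fits_in_context(["x" * 600_000])
```

If the estimate is close to the limit, count precisely with the API's token-counting endpoint before committing to a single-context strategy.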
6. Reasoning Advances
The most striking Gemini 3 result is on ARC-AGI-2, a benchmark testing abstract pattern recognition: 31.1% vs. 4.9% for Gemini 2.5 Pro (6x improvement in 7 months).
Deep Think mode: Extended reasoning capability that achieves 41% on Humanity's Last Exam (vs. 37.5% base). The mode spends additional internal "thinking" tokens on chain-of-thought before committing to an answer.
Investment Implication: Rapid progress on reasoning benchmarks suggests we may be approaching systems that can tackle novel problems rather than just pattern-matching against training data. ARC-AGI was specifically designed to resist memorization.
Transitioning from Research to Building
The technical innovations above—native multimodality, 1M token context, MoE efficiency, agentic coding, generative UI, and advanced reasoning—create unique opportunities for builders. The following project ideas are designed to showcase these capabilities in action.
Projects are organized by domain and difficulty level:
- Beginner (8-16 hours): Focus on generative UI and basic multimodality
- Intermediate (1-2 days): Add agentic coding or extended workflows
- Advanced (2-3 days): Combine multiple capabilities or integrate specialized tools
Hackathon Project Ideas
1. Gaming & Interactive Experiences
Project 1: Gesture-Controlled Particle Physics Playground
Difficulty: Intermediate
Build an interactive web app where users manipulate millions of 3D particles via webcam gestures, simulating nebulae, fluid dynamics, or electromagnetic fields for educational demos.
Tech Stack: MediaPipe (hand tracking), React Three Fiber or Three.js, Gemini 3 API
Gemini 3 Advantage: Multimodality (real-time video input analysis), coding agents (generate complex Three.js particle systems with physics from natural language), generative UI (dynamic controls adapting to simulation type). Native video understanding + ability to generate complete working code from descriptions like "create gravity-based particle system with 100k particles responding to hand position."
Project 2: Browser-Based Retro Game Generator
Difficulty: Beginner-Intermediate
Build a tool that generates playable browser games (Pong variants, Snake, simple platformers) from text descriptions. Users describe mechanics and visual style; Gemini generates the complete HTML5 Canvas or Phaser.js game.
Tech Stack: Phaser.js or HTML5 Canvas, Vercel/Netlify, Gemini 3 API
Gemini 3 Advantage: Coding agents (generate complete game logic, collision detection, scoring in one shot), generative UI (styled game interfaces with menus/HUDs), advanced reasoning (plan multi-step mechanics like enemy AI). Antigravity's agentic coding + generative UI means going from "make me a breakout game with power-ups" to playable demo in minutes.
Project 3: Interactive Story World with Dynamic Maps
Difficulty: Intermediate
Create a text-based adventure game where Gemini generates branching narratives, character dialogue, and dynamically updates an SVG map as players explore. Each location has AI-generated descriptions and encounters.
Tech Stack: React or vanilla JS, SVG rendering, Gemini 3 API
Gemini 3 Advantage: Advanced reasoning (maintain narrative coherence across branches, track world state), generative UI (dynamically create/update SVG maps, character portraits, inventory), 1M token context (remember entire story history for consistent callbacks). Context window allows sprawling narratives without losing plot threads or character relationships.
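The narrative-coherence trick in Project 3 is simply to keep explicit world state and replay it every turn. A minimal sketch—class and method names here are hypothetical scaffolding, not any SDK's API:

```python
# Minimal world-state tracker: with a 1M-token window, the whole story
# history can simply be replayed to the model on every turn.
class StoryWorld:
    def __init__(self):
        self.visited = []    # locations in visit order
        self.inventory = set()
        self.history = []    # full transcript, fed back as context

    def move(self, location: str, narration: str):
        """Record a scene so later prompts can reference it."""
        self.visited.append(location)
        self.history.append(f"[{location}] {narration}")

    def context_prompt(self) -> str:
        """Everything the model needs to stay consistent across branches."""
        return "\n".join(self.history)

world = StoryWorld()
world.move("Harbor", "You arrive at a fog-covered harbor.")
world.move("Lighthouse", "The keeper recognizes you from the harbor.")
prompt = world.context_prompt()
```

With smaller-context models, this history would need summarization or retrieval; here it can stay verbatim, which is what makes long-range callbacks reliable.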
2. Developer Tools & Agentic Coding
Project 4: Codebase Documentation Generator
Difficulty: Intermediate
Point the tool at a GitHub repository; it analyzes the entire codebase and generates architectural diagrams, API documentation, component relationship maps, and onboarding guides for new developers.
Tech Stack: Gemini 3 API, Mermaid.js, GitHub API, Antigravity IDE (optional)
Gemini 3 Advantage: 1M token context (process entire codebases at 50K-500K tokens), coding agents (understand structure, dependencies, patterns), generative UI (interactive documentation with collapsible sections), advanced reasoning (infer architectural decisions). Only model that can fit entire medium-large codebases in context at once for holistic analysis.
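The core move in Project 4 is packing a whole repository into one prompt. A minimal sketch—the file-extension filter, header format, and token heuristic are all assumptions you would tune:

```python
import tempfile
from pathlib import Path

def pack_repo(root: str, exts=(".py", ".js", ".ts", ".md"),
              max_tokens=900_000, chars_per_token=4) -> str:
    """Concatenate source files into one annotated prompt string,
    stopping before a rough token budget is exceeded so the result
    fits a 1M-token window with headroom for instructions."""
    parts, used = [], 0
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in exts or not path.is_file():
            continue
        text = path.read_text(errors="ignore")
        cost = len(text) // chars_per_token
        if used + cost > max_tokens:
            break
        parts.append(f"=== {path} ===\n{text}")
        used += cost
    return "\n\n".join(parts)

# Tiny demo repo:
demo = tempfile.mkdtemp()
Path(demo, "main.py").write_text("print('hello')")
prompt = pack_repo(demo)
```

Send the packed string alongside an instruction like "produce a Mermaid architecture diagram"; the per-file `===` headers let the model cite paths in its answer.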
Project 5: One-Shot SaaS Prototype Generator
Difficulty: Beginner-Intermediate
CLI tool that generates full-stack app prototypes from single prompts. Input: "Build a lead generation CRM with email automation." Output: Complete Next.js app with frontend, backend routes, database schema, and deployment config.
Tech Stack: Gemini 3 API via Antigravity or CLI, Next.js, Vercel, Supabase (optional)
Gemini 3 Advantage: Agentic coding (Antigravity scaffolds entire projects autonomously), generative UI (styled, responsive interfaces matching app purpose), function calling (integrate external APIs like Stripe/SendGrid), advanced reasoning (plan database schema, auth flow, API structure). Antigravity's agent-first design uniquely suited for autonomous full-stack development.
Project 6: PR Review Assistant with Codebase Context
Difficulty: Advanced
GitHub app that reviews pull requests with full awareness of the entire codebase. Identifies potential bugs, suggests optimizations, checks style consistency, and validates that changes align with architectural patterns used elsewhere in the repo.
Tech Stack: Gemini 3 API, GitHub API and webhooks, Redis (caching), Probot framework
Gemini 3 Advantage: 1M token context (load entire codebase for contextual review), coding agents (identify bugs, suggest refactors, check test coverage), advanced reasoning (understand how PR changes affect other system parts). Context window large enough to see both PR diff AND full codebase for informed reviews.
Project 7: Automated Workflow Builder with Browser Control
Difficulty: Intermediate
Natural language interface to automate web workflows: scrape data from sites, fill forms, clean Excel files, sync to databases. Uses Antigravity's browser control to execute tasks autonomously.
Tech Stack: Gemini 3 API via Antigravity, Puppeteer/Playwright, Supabase or Airtable
Gemini 3 Advantage: Browser agents (navigate websites, interact with elements, extract data), multimodality (process Excel files, PDFs, images during workflow), advanced reasoning (chain multi-step tasks: scrape → clean → transform → sync), function calling (structured outputs for database writes). Antigravity's browser control + multimodal processing ideal for complex web automation.
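The scrape → clean → transform → sync chain in Project 7 can be modeled as a tiny pipeline whose steps the agent fills in with tool calls. A sketch with stubbed steps (all names are hypothetical; real steps would wrap Playwright actions and database writes):

```python
def run_pipeline(data, steps):
    """Run named (label, fn) steps in order, keeping a human-readable
    log -- the kind of artifact an agent can surface for review."""
    log = []
    for label, fn in steps:
        data = fn(data)
        log.append(f"{label}: ok ({len(data)} rows)")
    return data, log

# Stub steps standing in for scrape/clean/sync tool calls:
scraped = [{"price": " $5 "}, {"price": "$7"}, {"price": None}]
steps = [
    ("clean",     lambda rows: [r for r in rows if r["price"]]),
    ("transform", lambda rows: [{"price": float(r["price"].strip().lstrip("$"))}
                                for r in rows]),
]
result, log = run_pipeline(scraped, steps)
```

The log doubles as the "artifact" Antigravity-style agents emit in place of raw tool-call traces: each entry is something a human can skim and veto.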
3. Productivity & Automation
Project 8: Multi-Agent Research Dashboard
Difficulty: Intermediate-Advanced
Input a research query like "Analyze the EV battery supply chain in Southeast Asia." Multiple Gemini agents collaborate: one searches the web, another analyzes financial data, another generates visualizations, and a coordinator synthesizes findings into an interactive report.
Tech Stack: Gemini 3 API with function calling, Recharts or D3.js, Google Search API or Serper, React
Gemini 3 Advantage: Function calling (structured web search, data extraction), advanced reasoning (multi-step research planning, source evaluation), generative UI (dynamic dashboards reorganizing based on findings), Deep Think mode (extended reasoning for complex analysis). Combination of function calling for gathering + Deep Think for analysis + generative UI for presentation.
Project 9: Smart Meeting Transcription & Action Tracker
Difficulty: Intermediate
Upload meeting recordings (audio or video up to 11 hours); Gemini transcribes, identifies speakers, extracts action items with timestamps, generates summaries by topic, and creates follow-up task lists with assignees.
Tech Stack: Gemini 3 API, Linear or Asana API (task creation), simple web interface
Gemini 3 Advantage: 1M token context (process multi-hour meetings without chunking), multimodality (native audio/video understanding), advanced reasoning (identify implicit action items, distinguish decisions from discussions), function calling (structured output for task management tools). Can process 11-hour meetings in single context window (GPT-5.1 would require splitting).
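The function-calling half of Project 9 amounts to requesting strict JSON from the model and validating it before pushing tasks to Linear/Asana. A sketch—the schema below is an illustrative assumption, not a fixed API contract:

```python
import json
from dataclasses import dataclass

@dataclass
class ActionItem:
    task: str
    assignee: str
    timestamp: str  # "HH:MM:SS" position in the recording

def parse_action_items(model_output: str) -> list[ActionItem]:
    """Validate the model's JSON response into typed action items,
    skipping any entry with missing fields rather than crashing."""
    items = []
    for raw in json.loads(model_output):
        if all(k in raw for k in ("task", "assignee", "timestamp")):
            items.append(ActionItem(raw["task"], raw["assignee"],
                                    raw["timestamp"]))
    return items

sample = ('[{"task": "Send deck", "assignee": "Ana",'
          ' "timestamp": "01:12:30"}, {"task": "incomplete"}]')
items = parse_action_items(sample)
```

Structured-output modes in the API can enforce this schema at generation time; the validation layer is still worth keeping as a second line of defense.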
Project 10: Legal Contract Cross-Reference Analyzer
Difficulty: Advanced
Upload 10-50 contracts; tool identifies contradictions, extracts key terms, builds comparison tables, flags unusual clauses, and generates risk summaries. Perfect for M&A due diligence or contract management.
Tech Stack: Gemini 3 API, PDF parsing libraries, React, Airtable or Notion
Gemini 3 Advantage: 1M token context (load dozens of contracts simultaneously for cross-referencing), advanced reasoning (identify logical contradictions, evaluate clause implications), multimodality (process scanned PDFs with mixed text/tables), function calling (structured extraction of dates, parties, terms). Only model with context window large enough for batch contract analysis.
4. Creative & Design Tools
Project 11: SVG Animation Website Builder
Difficulty: Beginner-Intermediate
Turn sketches or text descriptions into animated SVG-based websites. Gemini generates the SVGs, animation timings, and full website code. Users can iterate with natural language: "make the logo bounce more" or "add a parallax background."
Tech Stack: Gemini 3 API, Framer Motion or GSAP, Tailwind CSS, Vercel
Gemini 3 Advantage: Coding agents (generate SVG code, animation logic, responsive layouts), generative UI (complete styled websites from descriptions), multimodality (optional: analyze uploaded sketch images for layout inspiration). Generative UI excels at creating complete, styled interactive experiences from scratch.
Project 12: Gesture-Based UI Prototyping Tool
Difficulty: Advanced
Prototype hand-controlled interfaces without coding. Record hand gestures via webcam; Gemini generates the JavaScript code for gesture recognition and UI responses. Export to Figma or working web prototype.
Tech Stack: MediaPipe, Three.js or vanilla JS, Figma API (optional), Gemini 3 API
Gemini 3 Advantage: Multimodality (real-time video analysis for gesture mapping), coding agents (generate gesture recognition logic and event handlers), generative UI (create interactive UI components responding to gestures). Native video understanding + code generation enables rapid gesture-to-interaction prototyping.
Project 13: AI Podcast Production Studio
Difficulty: Intermediate-Advanced
Generate podcast episodes from outlines: Gemini writes scripts, creates voice synthesis prompts for ElevenLabs, generates synchronized visual slides/animations, and produces timestamped show notes.
Tech Stack: Gemini 3 API, ElevenLabs API, Remotion or similar, YouTube API
Gemini 3 Advantage: Advanced reasoning (narrative planning, pacing, natural dialogue generation), multimodality (coordinate text/script + audio/voice + video/visuals), function calling (orchestrate external services for TTS, video rendering). Advanced reasoning for engaging narratives + function calling to coordinate production pipeline.
Project 14: Dynamic Brand Asset Generator
Difficulty: Intermediate
Input brand guidelines (colors, fonts, values); Gemini generates complete brand asset sets: social media templates, presentation decks, email signatures, website mockups—all maintaining consistent style.
Tech Stack: Gemini 3 API, Canva API or direct HTML/CSS generation, Figma API
Gemini 3 Advantage: Generative UI (create styled templates across multiple formats), advanced reasoning (maintain brand consistency across assets), coding agents (generate CSS/HTML matching brand guidelines precisely). Generative UI's ability to create complete, styled interfaces makes it ideal for design system generation.
5. Education & Learning
Project 15: Video Lecture Analyzer & Quiz Generator
Difficulty: Beginner-Intermediate
Upload 3-hour lecture videos; get timestamped summaries, auto-generated quiz questions with difficulty ratings, key concept extraction, and visual learning aids. Students can ask questions about specific parts.
Tech Stack: Gemini 3 API, simple web interface, YouTube player API (timestamp navigation)
Gemini 3 Advantage: 1M token context (process 11-hour videos without chunking), multimodality (native video understanding, extract slides/diagrams), advanced reasoning (generate pedagogically sound quiz questions, identify learning objectives). Can process entire lecture courses in single context for coherent quiz generation.
Project 16: Interactive Science Simulation Builder
Difficulty: Intermediate
Upload textbook diagrams or descriptions (e.g., photosynthesis, Newtonian mechanics); Gemini generates step-by-step 3D interactive simulations with quizzes and real-time explanations.
Tech Stack: Three.js or Babylon.js, Gemini 3 API, WebGL
Gemini 3 Advantage: Multimodality (analyze textbook diagrams and images), coding agents (generate Three.js/WebGL code for physics simulations), generative UI (interactive controls, step-by-step tutorials), advanced reasoning (break complex processes into teachable steps). Diagram analysis + code generation for interactive educational experiences.
Project 17: Adaptive Language Learning Companion
Difficulty: Intermediate
Webcam-based app for sign language or spoken language learning. Analyzes user gestures/pronunciation, provides real-time feedback, generates custom practice exercises based on mistakes, and adapts difficulty.
Tech Stack: MediaPipe (gesture tracking), Web Speech API, TensorFlow.js (optional enhancement), Gemini 3 API
Gemini 3 Advantage: Multimodality (real-time video/audio analysis), advanced reasoning (diagnose specific learning gaps, create targeted exercises), generative UI (adaptive lesson interfaces). Multimodal understanding enables feedback on both visual (gestures) and audio (pronunciation) simultaneously.
Project 18: Historical Document Explorer
Difficulty: Advanced
Upload collections of historical documents (letters, newspapers, images); Gemini creates an interactive timeline with searchable entities, relationship graphs between people/places/events, and generated narrative summaries.
Tech Stack: Gemini 3 API, D3.js (timeline/graph visualizations), React, OCR preprocessing
Gemini 3 Advantage: 1M token context (process large document collections simultaneously), multimodality (analyze text documents, photographs, maps together), advanced reasoning (build entity relationship graphs, identify historical patterns), generative UI (interactive timelines and exploration interfaces). Large context window + multimodal processing ideal for cross-referencing historical sources.
6. Data Analysis & Research
Project 19: Multi-Modal Research Assistant
Difficulty: Intermediate-Advanced
Upload research papers (PDFs) + supplementary videos + GitHub repos + datasets; get unified synthesis with cross-references, code examples that implement paper algorithms, and visual explanations of concepts.
Tech Stack: Gemini 3 API, Jupyter notebooks (optional), Mermaid.js
Gemini 3 Advantage: Native multimodality (process PDFs, videos, code repositories together), 1M token context (hold multiple papers + codebases simultaneously), advanced reasoning (synthesize findings across modalities, identify connections), coding agents (implement algorithms from papers). Only model that can natively process academic PDFs, implementation code, and lecture videos together for unified understanding.
Project 20: Real-Time Conference Video QA System
Difficulty: Advanced
Stream a multi-hour conference or webinar; users ask questions like "What did the panelist say about climate policy?" and get exact timestamps, verbatim quotes, and related segments from other speakers.
Tech Stack: Gemini 3 API, video streaming setup (YouTube Live API or custom), vector database (optional)
Gemini 3 Advantage: 1M token context (hold hours of video in memory), multimodality (video understanding + temporal reasoning), advanced reasoning (identify thematically related segments across presentations). 11-hour video context window enables entire conference in single query context.
Project 21: Financial Document Analysis Pipeline
Difficulty: Advanced
Upload earnings reports, SEC filings, analyst presentations; Gemini extracts financial metrics, builds comparison tables across quarters/competitors, flags unusual disclosures, and generates investment thesis summaries.
Tech Stack: Gemini 3 API, PDF parsing, financial data APIs (Alpha Vantage, Polygon.io), Recharts
Gemini 3 Advantage: 1M token context (compare dozens of financial documents simultaneously), multimodality (process tables, charts, text from PDFs), advanced reasoning (identify trends, contradictions, risks), function calling (structured data extraction for quantitative analysis). Context window enables cross-document financial analysis that would require separate queries with other models.
7. Multimodal Content Processing
Project 22: Video Content Repurposing Engine
Difficulty: Intermediate
Upload long-form videos (podcasts, tutorials, presentations); Gemini generates: social media clips with captions, blog post summaries, Twitter threads, LinkedIn carousels, and timestamped highlight reels.
Tech Stack: Gemini 3 API, FFmpeg (video clipping), Canva API (optional)
Gemini 3 Advantage: 1M token context (process entire long-form videos), multimodality (video understanding for identifying key moments), generative UI (create social media graphics and layouts), advanced reasoning (identify compelling hooks and viral moments). Can watch entire 3-hour podcast to identify best clips vs. analyzing pre-selected segments.
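Once the model returns highlight timestamps, clipping is mechanical. A sketch that builds the FFmpeg invocations—the `(start, end, slug)` tuple format is an assumption about what you would prompt the model to emit:

```python
def clip_commands(video: str,
                  highlights: list[tuple[str, str, str]]) -> list[str]:
    """Turn (start, end, slug) highlight tuples into ffmpeg commands.
    -ss/-to set the cut points; -c copy skips re-encoding for speed
    (at the cost of keyframe-aligned cuts)."""
    return [
        f"ffmpeg -ss {start} -to {end} -i {video} -c copy clips/{slug}.mp4"
        for start, end, slug in highlights
    ]

cmds = clip_commands("podcast.mp4",
                     [("00:12:04", "00:12:58", "origin-story"),
                      ("01:40:10", "01:41:00", "hot-take")])
```

Swap `-c copy` for a real encode (e.g. `-c:v libx264`) when frame-accurate cuts matter more than speed.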
Project 23: Accessibility Enhancement Tool
Difficulty: Intermediate
Upload images, videos, PDFs; Gemini generates: detailed alt text, audio descriptions for visually impaired users, simplified text explanations, and keyboard navigation code for interactive elements.
Tech Stack: Gemini 3 API, Web Speech API, ARIA attribute generators
Gemini 3 Advantage: Multimodality (analyze visual content for detailed descriptions), advanced reasoning (generate contextually appropriate alt text, not just object detection), coding agents (add ARIA labels and keyboard navigation to existing code). Deep visual understanding enables truly descriptive alt text beyond simple object labeling.
Project 24: Medical Image Analysis Teaching Tool
Difficulty: Advanced
Upload medical diagrams, X-rays, or educational videos; Gemini generates interactive quizzes, branching diagnostic simulations, and explanatory annotations. For medical student training or patient education.
Tech Stack: Gemini 3 API, React, medical image libraries (optional: OpenSeadragon for DICOM)
Gemini 3 Advantage: Multimodality (analyze medical images and videos), advanced reasoning (create branching diagnostic scenarios), generative UI (interactive anatomy diagrams with labels), Deep Think mode (extended reasoning for complex diagnostic trees). Multimodal medical understanding + reasoning depth for educational scenario generation.
Note: Include a disclaimer that this tool is for educational purposes only, not clinical use.
Project 25: Virtual Museum Tour Creator
Difficulty: Intermediate-Advanced
Upload historical photos, architectural drawings, and artifact images; Gemini generates: explorable 3D environments (via Three.js code generation), guided audio tours, interactive fact panels, and AR overlay specifications.
Tech Stack: Three.js or Babylon.js, Gemini 3 API, AR.js (optional), Web Speech API
Gemini 3 Advantage: Multimodality (analyze historical images and architectural plans), coding agents (generate Three.js code for 3D virtual spaces), generative UI (interactive information panels and navigation), advanced reasoning (historical context and narrative generation). Can analyze collection of historical images + generate the 3D exploration code in one workflow.
See you at the Build Day on December 13th! We're excited to see what you create.


