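This workflow enables multimodal file analysis by connecting Google Gemini tools to a text-only LLM agent. Users can upload images, videos, audio files, or documents via a chat interface. The workflow uploads each file to Gemini, passes the resulting file URLs and the user's message to the agent, and lets the agent call the matching Gemini tool before replying.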
Unlike end-to-end multimodal LLMs (such as Gemini 1.5 or GPT-4o), this template offers the following:
Feature | Benefit |
---|---|
Modular | LLM and tools are decoupled, so each can be updated independently |
Cost-Efficient | No need to pay for a fully multimodal model; tools are used only when needed |
Tool-based Reasoning | Agent invokes tools on demand, in the style of Toolformer-like tool use |
Fast | Groq-hosted LLMs respond with low latency |
Memory | Includes context buffer for multi-turn chats (15 messages) |
The user sends a message, optionally with files, through chatTrigger.
- If no files: the prompt is passed directly to the agent.
- If files are included: a new chatInput is dynamically generated, containing:
  - User message
  - Media: [array of file data]
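The branching above can be sketched in a few lines of Python. This is only an illustration, not the workflow's actual node code: the function name `build_chat_input` and the file fields (`name`, `mimeType`, `url`) are hypothetical, and the exact text layout of the generated chatInput is an assumption based on the two parts listed above.

```python
import json


def build_chat_input(user_message: str, files: list[dict]) -> str:
    """Return the prompt unchanged when no files are attached;
    otherwise append a Media section describing the uploaded files."""
    if not files:
        # No files: the prompt goes to the agent as-is.
        return user_message
    # Files present: serialize the file metadata next to the user message.
    media = json.dumps(files, ensure_ascii=False)
    return f"User message: {user_message}\nMedia: {media}"


# Hypothetical file entry; the field names are assumptions for illustration.
example_files = [
    {"name": "report.pdf", "mimeType": "application/pdf",
     "url": "https://example.invalid/files/report.pdf"}
]
print(build_chat_input("What does this PDF say?", example_files))
```

Keeping the file metadata in the prompt text is what lets a text-only LLM know which files exist and which URLs to hand to a tool.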
The Langchain Agent receives:
- The enriched prompt
- File URLs
- Memory context (15 turns)
- Access to 4 Gemini tools:
  - IMG: analyze image
  - VIDEO: analyze video
  - AUDIO: analyze audio
  - DOCUMENT: analyze document

The agent autonomously decides whether and how to use tools, then responds with concise output.
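In the workflow, the LLM agent itself chooses the tool. As a rough mental model only, the mapping from file type to tool could look like the deterministic sketch below. The function `pick_tool`, the MIME-type rules, and the set of document types are all assumptions for illustration; the real agent decides through reasoning, not a lookup table.

```python
from typing import Optional

# Assumed mapping from MIME-type prefix to the tool names used in this template.
TOOL_BY_PREFIX = {"image/": "IMG", "video/": "VIDEO", "audio/": "AUDIO"}

# Assumed examples of document types; not an exhaustive or official list.
DOCUMENT_TYPES = {"application/pdf", "text/plain"}


def pick_tool(mime_type: str) -> Optional[str]:
    """Return the name of the tool that would handle this MIME type,
    or None when no tool applies."""
    for prefix, tool in TOOL_BY_PREFIX.items():
        if mime_type.startswith(prefix):
            return tool
    if mime_type in DOCUMENT_TYPES:
        return "DOCUMENT"
    return None


print(pick_tool("application/pdf"))
```

A fixed table like this cannot skip a tool call when the question does not need one, which is the flexibility the agent-based approach provides.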
Category | Node / Tool | Purpose |
---|---|---|
Chat Input | chatTrigger | User interface with file support |
File Processing | splitOut, splitInBatches | Process each uploaded file |
Upload | googleGemini | Uploads each file to Gemini, gets URL |
Metadata | set, aggregate | Builds structured file info |
AI Agent | Langchain Agent | Receives context + file data |
Tools | googleGeminiTool | Analyze media with Gemini |
LLM | lmChatGroq (Qwen 32B) | Text reasoning, high-speed |
Memory | memoryBufferWindow | Maintains session context |
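The idea behind a window-style memory buffer can be illustrated with a short Python sketch. It is not the memoryBufferWindow node's implementation: the class `WindowMemory` is hypothetical, and it counts individual messages, whereas the node may count the window differently.

```python
from collections import deque


class WindowMemory:
    """Keep only the most recent `window` messages of a conversation."""

    def __init__(self, window: int = 15):
        # A deque with maxlen silently drops the oldest entry when full.
        self._messages = deque(maxlen=window)

    def add(self, role: str, text: str) -> None:
        self._messages.append((role, text))

    def context(self) -> list:
        # Oldest-to-newest list of the retained messages.
        return list(self._messages)


memory = WindowMemory(window=15)
for i in range(20):
    memory.add("user", f"message {i}")
print(len(memory.context()))
```

Bounding the history keeps the prompt size predictable across long sessions, at the cost of forgetting older messages.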
Replace existing credentials on:
- Upload a file
- GeminiTool (IMG, VIDEO, AUDIO, DOCUMENT)
- lmChatGroq

"Hola, ¿qué dice este PDF?" ("Hi, what does this PDF say?")
The user uploads a document → the agent routes it to the Gemini DOCUMENT tool → the tool returns the extracted content → the LLM summarizes it in Spanish.
multimodal, agent, langchain, groq, gemini, image analysis, audio analysis, document parsing, video analysis, file uploader, chat assistant, LLM tools, memory, AI tools