Cooper

Cooper

Multimodal Chat Assistant with GPT-4o for Text, Images, and PDFs

Chat with thing

This n8n template lets you build a smart AI chat assistant that can handle text, images, and PDFs โ€” using OpenAI's GPT-4o multimodal model. It supports dynamic conversations and file analysis, making it great for AI-driven support bots, personal assistants, or embedded chat widgets.


๐Ÿ” How it Works

  • The chat trigger node kicks off a session using n8n's hosted chat UI.
  • Users can send text or upload images or PDFs โ€” the workflow checks if a file was included.
  • If an image is uploaded, the file is converted to base64 and analyzed using GPT-4o's vision capabilities.
  • GPT-4o generates a natural language description of the image and answers the user's question in context.
  • A memory buffer keeps track of the conversation thread, so follow-up questions are handled intelligently.
  • OpenAI's chat model handles both text-only and mixed media input seamlessly.

๐Ÿงช How to Use

  • You can embed this in a website or use it with your own webhook/chat interface.
  • The logic is modular โ€” just swap out the chatTrigger node for another input (eg form or API).
  • To use with documents, you can modify the logic to pass PDF content to GPT-4 directly.
  • You can extend it with action nodes, eg saving results to Notion, Airtable, or sending replies via email or Slack.

๐Ÿ” Requirements

  • Your OpenAI GPT-4o API key
  • Set File Upload on the chat

๐Ÿš€ Use Cases

  • PDF explainer bot
  • Internal knowledge chat with media support
  • Personal assistant for mixed content
Do you want to automate your business?

Let's talk about your project