Multi-Modal Travel Planning Agent in Minutes


GitHub Repository: Strands Agent Samples

Build a production-ready multi-modal AI travel assistant using Amazon Bedrock and Strands Agents. Process images, PDFs, and videos with persistent memory. Complete Python tutorial with code examples.

Part 4: Building a Real-World Travel Assistant

Travel planning involves processing multiple content types: destination photos, booking documents, PDF guides, and video tours. Traditional text-based chatbots can't handle this variety. With Strands Agents, you can build, in minutes, an AI travel assistant that processes images, documents, and videos in the same conversation and turns that visual and structured data into personalized recommendations.

This post continues our series on multi-modal AI agents with Strands Agents. We've covered basic multi-modal processing, FAISS memory integration, and scalable Amazon S3 Vectors storage. Today, you'll build a real-world travel assistant that analyzes destination photos, processes booking documents, and creates personalized travel recommendations.

What you'll build:

✈️ Photo Analysis - Identify destinations, architectural styles, and vacation types from images
📄 Document Processing - Extract booking details, itineraries, and travel guides from PDFs
🎥 Video Understanding - Analyze destination videos for activities and experiences
💾 Persistent Memory - Remember user preferences across conversations using Amazon S3 Vectors

The Three Components: Model, Tools, Agent

The travel assistant uses three core components from the Strands Agents framework:

from strands import Agent
from strands.models import BedrockModel
from strands_tools import image_reader, file_read
# Custom tools built in earlier parts of this series (local modules in the repository)
from video_reader import video_reader
from s3_memory import s3_vector_memory

# Configure model
model = BedrockModel(
    model_id="us.anthropic.claude-3-5-sonnet-20241022-v2:0",
    region_name="us-east-1"  # BedrockModel takes region_name, not region
)

# Create travel assistant agent (TRAVEL_ASSISTANT_PROMPT is defined in Step 1 below)
travel_agent = Agent(
    model=model,
    tools=[image_reader, file_read, video_reader, s3_vector_memory],
    system_prompt=TRAVEL_ASSISTANT_PROMPT
)

With this short configuration, the agent can read images, documents, and videos, and store what it learns in persistent memory.

Built-In Tools: Your Agent's Superpowers

Strands Agents includes 40+ production-ready tools. For travel planning, you need two built-in tools and two custom tools:

🖼️ `image_reader` - Visual Intelligence

The image_reader tool processes images in PNG, JPEG, GIF, and WebP formats. It understands:

Destination characteristics (architecture, landscapes, activities)
Visual themes (beach vacations, mountain retreats, urban exploration)
Seasonal indicators (weather conditions, crowd levels)

# Analyze destination photos
response = travel_agent(
    "Analyze these destination photos and tell me what type of "
    "vacation this would be: photos/destination1.jpg, photos/destination2.jpg"
)

📄 `file_read` - Document Processing

The file_read tool handles PDF, CSV, DOCX, XLS, and XLSX files. Use this tool for:

Itinerary documents (schedules, bookings, confirmations)
Travel guides (recommendations, maps, tips)
Budget spreadsheets (costs, expenses, comparisons)

# Extract booking details
response = travel_agent(
    "Extract all booking confirmations and create a summary "
    "from documents/travel-itinerary.pdf"
)

🔧 Additional Built-In Tools

Beyond the two used here, other built-in tools are useful for travel planning, such as:

http_request - Fetch real-time flight prices, weather data, or hotel availability
retrieve - Implement RAG with Amazon Bedrock Knowledge Bases
python_repl - Perform budget calculations or itinerary optimizations

Check the complete tools documentation for advanced capabilities.

Custom Tools: Extending Capabilities

For specialized functionality, you can create custom tools. This travel assistant uses two custom tools created in previous posts:

🎥 `video_reader` - Video Analysis

Created in Part 1 of this series, this tool processes travel videos to extract:

Destination activities and experiences
Visual themes and aesthetics
Temporal information (seasons, times of day)

💾 `s3_vector_memory` - Persistent Memory

Created in Part 3 of this series, this tool provides:

User preference storage with isolation by USER_ID
Conversation history across sessions
Semantic search for relevant context
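
To make "isolation by USER_ID" and "semantic search" concrete, here is a toy, in-process sketch. It is not the s3_vector_memory tool from Part 3: the class `ToyUserMemory`, the `embed` function, and the bag-of-words similarity are all hypothetical stand-ins for a real embedding model and Amazon S3 Vectors. It only shows the shape of the idea: entries are stored per user, and retrieval ranks only that user's entries by similarity to the query.

```python
import math
from collections import defaultdict


def embed(text: str) -> dict[str, float]:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    counts: dict[str, float] = defaultdict(float)
    for word in text.lower().split():
        counts[word.strip(".,!?")] += 1.0
    return dict(counts)


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyUserMemory:
    """Illustration only: per-user storage with similarity-ranked retrieval."""

    def __init__(self) -> None:
        self._store: dict[str, list[tuple[str, dict[str, float]]]] = defaultdict(list)

    def store(self, user_id: str, text: str) -> None:
        self._store[user_id].append((text, embed(text)))

    def retrieve(self, user_id: str, query: str, top_k: int = 1) -> list[str]:
        # Only this user's entries are searched; that is the isolation.
        q = embed(query)
        entries = self._store.get(user_id, [])
        ranked = sorted(entries, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


memory = ToyUserMemory()
memory.store("user123", "I prefer gluten-free cuisine and local food markets")
memory.store("user123", "I enjoy walking tours and photography")
memory.store("user456", "I love beach resorts")

print(memory.retrieve("user123", "What food do I like?"))
print(memory.retrieve("user456", "What food do I like?"))
```

The second query never sees the first user's preferences, even though they would match better. A production store adds real embeddings, persistence, and access control on top of the same pattern.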

Building the Travel Assistant

Step 1: Define the System Prompt

Define the system prompt to guide agent behavior and decision-making:

TRAVEL_ASSISTANT_PROMPT = """You are an expert travel planning assistant with multi-modal 
content processing capabilities. You help travelers by analyzing destination photos, processing 
booking documents, and creating personalized travel recommendations.

Your capabilities:
- **Image Analysis**: Use image_reader to analyze destination photos, identify locations, 
  understand visual themes, and suggest similar destinations
- **Document Processing**: Use file_read to extract information from PDFs, itineraries, 
  booking confirmations, and travel guides
- **Video Analysis**: Use video_reader to understand destination experiences from video content
- **Persistent Memory**: Use s3_vector_memory to remember user preferences and conversation history

When processing requests:
1. Identify the content type (image, document, or video)
2. Use the appropriate tool to process the content
3. Analyze the information thoroughly
4. Provide actionable, personalized recommendations
5. Always respond in the user's language

For destination photos:
- Identify key features (architecture, nature, activities)
- Determine vacation type (relaxation, adventure, cultural)
- Suggest similar destinations or complementary activities

For travel documents:
- Extract structured information (dates, locations, costs)
- Identify potential conflicts or optimization opportunities
- Highlight important details travelers might miss

Always format responses clearly with sections, bullet points, and highlighted information.
"""

Step 2: Initialize the Agent (Done!)

The agent was already created with the code shown earlier; just make sure TRAVEL_ASSISTANT_PROMPT is defined before that code runs. From there, Strands Agents handles the complexity of:

Tool orchestration and selection
Multi-modal content routing
Error handling and retries
Response formatting
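
Tool selection happens inside the model at run time, guided by the system prompt. When testing, it can help to have a small deterministic reference for which tool you expect the agent to pick for a given file. The sketch below is one way to write that reference: `expected_tool` is a hypothetical helper, the image and document extensions come from the formats listed earlier in this post, and the video extensions are an assumption (the post does not list what video_reader accepts).

```python
from pathlib import Path
from typing import Optional

# Formats listed in this post for image_reader and file_read.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
DOCUMENT_EXTS = {".pdf", ".csv", ".docx", ".xls", ".xlsx"}
# Assumption: adjust to whatever your custom video_reader supports.
VIDEO_EXTS = {".mp4", ".mov"}


def expected_tool(path: str) -> Optional[str]:
    """Return the tool name we expect the agent to use, or None if unknown."""
    ext = Path(path).suffix.lower()
    if ext in IMAGE_EXTS:
        return "image_reader"
    if ext in DOCUMENT_EXTS:
        return "file_read"
    if ext in VIDEO_EXTS:
        return "video_reader"
    return None


for sample in ["photos/destination1.jpg", "documents/travel-itinerary.pdf", "notes.txt"]:
    print(sample, "->", expected_tool(sample))
```

A check like this is also a cheap way to reject unsupported files before spending a model call on them.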

Testing the Travel Assistant

Creating User Preferences

Because persistent memory is stored in Amazon S3 Vectors and isolated by USER_ID, start by telling the agent your preferences and including your USER_ID in the message:

USER_ID = "user123"  # Unique identifier for memory isolation

response = travel_agent(
    f"""Hi! I'm planning my next trip and wanted to share my travel preferences with you.

    Here's what I love:
    - **Architecture**: I'm fascinated by modern architecture, especially Art Nouveau and Modernist styles
    - **Food**: I prefer gluten-free cuisine and love exploring local food markets
    - **Sustainability**: I try to travel sustainably - public transport, eco-friendly hotels, supporting local businesses
    - **Activities**: I enjoy walking tours, photography, and cultural experiences over beach/resort vacations
    - **Pace**: I prefer a relaxed pace with time to really experience each place

    Please remember these preferences for our future conversations.

    USER_ID: {USER_ID}"""
)

The agent automatically stores these preferences in Amazon S3 Vectors, making them available for all future conversations.
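
Every prompt in this post ends with a `USER_ID: ...` line, and forgetting it means the agent has no key for looking up memory. A small helper can make that harder to miss. This is only a sketch: `with_user_id` is a hypothetical name, not part of Strands Agents, and the trailing-line format simply mirrors the prompts used here.

```python
def with_user_id(message: str, user_id: str) -> str:
    """Append the 'USER_ID: ...' line used by the prompts in this post."""
    if not user_id.strip():
        raise ValueError("user_id must be non-empty")
    return f"{message.strip()}\n\nUSER_ID: {user_id}"


prompt = with_user_id(
    "Based on my preferences, can you suggest some destinations I might enjoy?",
    "user123",
)
print(prompt)
```

Pass the returned string to the agent exactly as you would a hand-written prompt.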

Analyzing Destination Photos

response = travel_agent(
    f"""I found this amazing photo of a place I'd love to visit!

    Please analyze the image at: output/professional_travel_photography_of_alcatraz.png

    USER_ID: {USER_ID}"""
)

The agent performs these operations:

Retrieves your stored preferences from S3 memory
Analyzes the image using image_reader
Compares visual features with your preferences (architecture, activities, pace)
Provides personalized recommendations based on your profile
Stores this analysis for future context

Processing Travel Documents

response = travel_agent(
    f"""I have my flight booking confirmation. Can you extract the details 
    and check if it aligns with my preferences?

    Document: documents/flight_confirmation.pdf

    USER_ID: {USER_ID}"""
)

Testing Memory Persistence

To test the agent's ability to remember across conversations, clear the conversation history and ask a follow-up question:

# Clear conversation history to simulate a new session
travel_agent.messages.clear()

# Ask a follow-up question
response = travel_agent(
    f"""Based on my preferences, can you suggest some destinations I might enjoy?

    USER_ID: {USER_ID}"""
)

The agent retrieves your preferences from S3 memory and provides personalized recommendations, even though the conversation history was cleared.

Generating Sample Travel Content

To test your travel assistant, you can generate sample images, videos, and documents using Strands Agents' built-in generative tools. Check out the travel content generator script:

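from strands import Agent
from strands.models import BedrockModel
from strands_tools import generate_image, nova_reels, file_write, use_aws  # Ready to use!

# Placeholder prompt for illustration; the generator script defines its own
system_prompt = "You generate sample travel images, videos, and documents for testing."

agent = Agent(
    model=BedrockModel(model_id="us.anthropic.claude-3-5-sonnet-20241022-v2:0"),
    # generate_image creates images; nova_reels creates videos
    tools=[generate_image, nova_reels, file_write, use_aws],
    system_prompt=system_prompt
)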

# Generate content with a prompt
response = agent("Generate a travel image of San Francisco and a 6-second video tour")

Use the generated files as inputs when testing the travel assistant.

Try It Yourself

Prerequisites

AWS account with Amazon Bedrock access
Python 3.10 or later (required by the Strands Agents SDK)
AWS credentials configured (AWS CLI, environment variables, or IAM role)
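
Before running the notebook, a quick check for locally visible credentials can save a confusing first error. The sketch below uses only the standard library and looks for the standard AWS environment variables and the AWS CLI files under `~/.aws`. The function name `credential_sources` is hypothetical, and the check cannot detect an IAM role attached to an instance or container, so an empty result does not always mean credentials are missing.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def credential_sources(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> list[str]:
    """List the credential sources that are visible on this machine."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    found = []
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        found.append("environment variables")
    if env.get("AWS_PROFILE"):
        found.append("named profile " + env["AWS_PROFILE"])
    aws_dir = home / ".aws"
    if (aws_dir / "credentials").is_file() or (aws_dir / "config").is_file():
        found.append("AWS CLI files in ~/.aws")
    return found


sources = credential_sources()
print(sources if sources else "No local credentials found (an IAM role may still apply)")
```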

Installation

1. Clone repository

git clone https://github.com/elizabethfuentes12/strands-agent-samples
cd strands-agent-samples/notebook/multimodal_understanding

2. Create virtual environment

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Configure AWS credentials

Configure AWS credentials for Amazon Bedrock access. See AWS CLI Configuration for details.

5. Run the notebook

06-travel-assistant-demo.ipynb

Explore the complete notebook with additional examples, code explanations, and output samples.

What You Learned

Process destination photos with image_reader
Extract data from travel documents with file_read
Analyze destination videos with custom video_reader
Create intelligent, context-aware recommendations
Build production-ready multi-modal agents with persistent memory
Deploy agents from prototype to production using Amazon S3 Vectors

Series Recap

Throughout this series, we've built progressively sophisticated agents:

Part 1: Basic Multi-Modal Processing - Process images, documents, and videos with minimal code
Part 2: Adding FAISS Memory - Persistent memory for local development and prototyping
Part 3: Scaling with Amazon S3 Vectors - Production-ready, enterprise-scale memory with cloud infrastructure
Part 4 (This Post): Real-World Application - Build a practical travel assistant with all components integrated

Each part built on the previous, maintaining Strands Agents' core principle: powerful AI agents should remain accessible to build.

Resources

Strands Agents Documentation - Complete framework reference
Community Tools Package - 40+ production-ready tools
GitHub Repository - All code samples from this series
Travel Assistant Notebook - Complete implementation
Strands Agent Builder - Interactive development toolkit
Amazon Bedrock - Managed foundation models service
Amazon S3 Vectors - Scalable vector storage
Getting Started Course - Free comprehensive learning path

Thank you! 🇻🇪🇨🇱