Multi-Modal Travel Planning Agent in Minutes
🇻🇪🇨🇱 Dev.to | LinkedIn | GitHub | Twitter
GitHub Repository: Strands Agent Samples
Build a production-ready multi-modal AI travel assistant using Amazon Bedrock and Strands Agents. Process images, PDFs, and videos with persistent memory. Complete Python tutorial with code examples.
Part 4: Building a Real-World Travel Assistant
Travel planning involves processing multiple content types: destination photos, booking documents, PDF guides, and video tours. Traditional text-based chatbots can't handle this variety. With Strands Agents, you can build an AI travel assistant that processes images, documents, and videos simultaneously, creating personalized recommendations based on visual and structured data in minutes.
This post continues our series on multi-modal AI agents with Strands Agents. We've covered basic multi-modal processing, FAISS memory integration, and scalable Amazon S3 Vectors storage. Today, you'll build a real-world travel assistant that analyzes destination photos, processes booking documents, and creates personalized travel recommendations.
What you'll build:
- ✈️ Photo Analysis - Identify destinations, architectural styles, and vacation types from images
- 📄 Document Processing - Extract booking details, itineraries, and travel guides from PDFs
- 🎥 Video Understanding - Analyze destination videos for activities and experiences
- 💾 Persistent Memory - Remember user preferences across conversations using Amazon S3 Vectors
The Three Components: Model, Tools, Agent
The travel assistant uses three core components from the Strands Agents framework:
from strands import Agent
from strands.models import BedrockModel
from strands_tools import image_reader, file_read
from video_reader import video_reader
from s3_memory import s3_vector_memory
# Configure model
model = BedrockModel(
model_id="us.anthropic.claude-3-5-sonnet-20241022-v2:0",
region="us-east-1"
)
# Create travel assistant agent
travel_agent = Agent(
model=model,
tools=[image_reader, file_read, video_reader, s3_vector_memory],
system_prompt=TRAVEL_ASSISTANT_PROMPT
)
This configuration provides complete multi-modal processing capabilities with minimal code.
Built-In Tools: Your Agent's Superpowers
Strands Agents includes 40+ production-ready tools. For travel planning, you need three built-in tools and two custom tools:
🖼️ image_reader - Visual Intelligence
The image_reader tool processes images in PNG, JPEG, GIF, and WebP formats. It understands:
- Destination characteristics (architecture, landscapes, activities)
- Visual themes (beach vacations, mountain retreats, urban exploration)
- Seasonal indicators (weather conditions, crowd levels)
# Analyze destination photos
response = travel_agent(
"Analyze these destination photos and tell me what type of "
"vacation this would be: photos/destination1.jpg, photos/destination2.jpg"
)
📄 file_read - Document Processing
The file_read tool handles PDF, CSV, DOCX, XLS, and XLSX files. Use this tool for:
- Itinerary documents (schedules, bookings, confirmations)
- Travel guides (recommendations, maps, tips)
- Budget spreadsheets (costs, expenses, comparisons)
# Extract booking details
response = travel_agent(
"Extract all booking confirmations and create a summary "
"from documents/travel-itinerary.pdf"
)
🔧 Additional Built-In Tools
Strands Agents includes 40+ built-in tools such as:
-
http_request- Fetch real-time flight prices, weather data, or hotel availability -
retrieve- Implement RAG with Amazon Bedrock Knowledge Bases -
python_repl- Perform budget calculations or itinerary optimizations
Check the complete tools documentation for advanced capabilities.
Custom Tools: Extending Capabilities
For specialized functionality, you can create custom tools. This travel assistant uses two custom tools created in previous posts:
- 🎥
video_reader- Video Analysis
Created in Part 1 of this series, this tool processes travel videos to extract:
- Destination activities and experiences
- Visual themes and aesthetics
Temporal information (seasons, times of day)
💾
s3_vector_memory- Persistent Memory
Created in Part 3 of this series, this tool provides:
- User preference storage with isolation by
USER_ID - Conversation history across sessions
- Semantic search for relevant context
Building the Travel Assistant
Step 1: Define the System Prompt
Define the system prompt to guide agent behavior and decision-making:
TRAVEL_ASSISTANT_PROMPT = """You are an expert travel planning assistant with multi-modal
content processing capabilities. You help travelers by analyzing destination photos, processing
booking documents, and creating personalized travel recommendations.
Your capabilities:
- **Image Analysis**: Use image_reader to analyze destination photos, identify locations,
understand visual themes, and suggest similar destinations
- **Document Processing**: Use file_read to extract information from PDFs, itineraries,
booking confirmations, and travel guides
- **Video Analysis**: Use video_reader to understand destination experiences from video content
- **Persistent Memory**: Use s3_vector_memory to remember user preferences and conversation history
When processing requests:
1. Identify the content type (image, document, or video)
2. Use the appropriate tool to process the content
3. Analyze the information thoroughly
4. Provide actionable, personalized recommendations
5. Always respond in the user's language
For destination photos:
- Identify key features (architecture, nature, activities)
- Determine vacation type (relaxation, adventure, cultural)
- Suggest similar destinations or complementary activities
For travel documents:
- Extract structured information (dates, locations, costs)
- Identify potential conflicts or optimization opportunities
- Highlight important details travelers might miss
Always format responses clearly with sections, bullet points, and highlighted information.
"""
Step 2: Initialize the Agent (Done!)
The agent is already initialized with the code shown earlier. Strands Agents handles all the complexity of:
- Tool orchestration and selection
- Multi-modal content routing
- Error handling and retries
- Response formatting
Testing the Travel Assistant
Creating User Preferences
With persistent memory stored in Amazon S3 by USER_ID, provide the agent with your preferences
USER_ID = "user123" # Unique identifier for memory isolation
response = travel_assistant(
f"""Hi! I'm planning my next trip and wanted to share my travel preferences with you.
Here's what I love:
- **Architecture**: I'm fascinated by modern architecture, especially Art Nouveau and Modernist styles
- **Food**: I prefer gluten-free cuisine and love exploring local food markets
- **Sustainability**: I try to travel sustainably - public transport, eco-friendly hotels, supporting local businesses
- **Activities**: I enjoy walking tours, photography, and cultural experiences over beach/resort vacations
- **Pace**: I prefer a relaxed pace with time to really experience each place
Please remember these preferences for our future conversations.
USER_ID: {USER_ID}"""
)
The agent automatically stores these preferences in Amazon S3 Vectors, making them available for all future conversations.
Analyzing Destination Photos
response = travel_assistant(
f"""I found this amazing photo of a place I'd love to visit!
Please analyze the image at: output/professional_travel_photography_of_alcatraz.png
USER_ID: {USER_ID}"""
)
The agent performs these operations:
- Retrieves your stored preferences from S3 memory
- Analyzes the image using
image_reader - Compares visual features with your preferences (architecture, activities, pace)
- Provides personalized recommendations based on your profile
- Stores this analysis for future context
Processing Travel Documents
response = travel_assistant(
f"""I have my flight booking confirmation. Can you extract the details
and check if it aligns with my preferences?
Document: documents/flight_confirmation.pdf
USER_ID: {USER_ID}"""
)
Testing Memory Persistence
To test the agent's ability to remember across conversations, clear the conversation history and ask a follow-up question:
# Clear conversation history to simulate a new session
travel_assistant.messages.clear()
# Ask a follow-up question
response = travel_assistant(
f"""Based on my preferences, can you suggest some destinations I might enjoy?
USER_ID: {USER_ID}"""
)
The agent retrieves your preferences from S3 memory and provides personalized recommendations, even though the conversation history was cleared.
Generating Sample Travel Content
To test your travel assistant, you can generate sample images, videos, and documents using Strands Agents' built-in generative tools. Check out the travel content generator script:
from strands import Agent
from strands.models import BedrockModel
from strands_tools import generate_image, file_write, nova_reels, use_aws # Ready to use!
agent = Agent(
model=BedrockModel(model_id="us.anthropic.claude-3-5-sonnet-20241022-v2:0"),
tools=[generate_image, generate_video, nova_reels],
system_prompt=system_prompt
)
# Generate content with a prompt
response = agent("Generate a travel image of San Francisco and a 6-second video tour")
This script generates travel content for destinations.
Try It Yourself
Prerequisites
- AWS account with Amazon Bedrock access
- Python 3.9 or later
- AWS credentials configured (AWS CLI, environment variables, or IAM role)
Installation
1. Clone repository
git clone https://github.com/elizabethfuentes12/strands-agent-samples
cd strands-agent-samples/notebook/multimodal_understanding
2. Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
3. Install dependencies
pip install -r requirements.txt
4. Configure AWS credentials
Configure AWS credentials for Amazon Bedrock access. See AWS CLI Configuration for details.
5. Run the notebook
06-travel-assistant-demo.ipynb
Explore the complete notebook with additional examples, code explanations, and output samples.
What You Learned
- Process destination photos with
image_reader - Extract data from travel documents with
file_read - Analyze destination videos with custom
video_reader - Create intelligent, context-aware recommendations
- Build production-ready multi-modal agents with persistent memory
- Deploy agents from prototype to production using Amazon S3 Vectors
Series Recap
Throughout this series, we've built progressively sophisticated agents:
- Part 1: Basic Multi-Modal Processing - Process images, documents, and videos with minimal code
- Part 2: Adding FAISS Memory - Persistent memory for local development and prototyping
- Part 3: Scaling with Amazon S3 Vectors - Production-ready, enterprise-scale memory with cloud infrastructure
- Part 4 (This Post): Real-World Application - Build a practical travel assistant with all components integrated
Each part built on the previous, maintaining Strands Agents' core principle: powerful AI agents should remain accessible to build.
Resources
- Strands Agents Documentation - Complete framework reference
- Community Tools Package - 40+ production-ready tools
- GitHub Repository - All code samples from this series
- Travel Assistant Notebook - Complete implementation
- Strands Agent Builder - Interactive development toolkit
- Amazon Bedrock - Managed foundation models service
- Amazon S3 Vectors - Scalable vector storage
- Getting Started Course - Free comprehensive learning path
¡Gracias! 🇻🇪🇨🇱
Dev.to | LinkedIn | GitHub | Twitter | Instagram | YouTube | Linktr.ee