Quick Start Guide
This guide will help you create your first browser automation agent with Browser-Use in just a few minutes. By the end of this guide, you'll have a working agent that can perform web tasks based on natural language instructions.
Prerequisites
Before you begin, make sure you have:
- Browser-Use installed (see the Installation Guide)
- A language model API key set up in your .env file (for example, OPENAI_API_KEY for the OpenAI model used below)
- Basic understanding of Python and async functions
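You can confirm that the key is visible to Python before you launch a browser with a small check like the one below. This is a sketch that assumes the key is stored under OPENAI_API_KEY, the variable ChatOpenAI reads by default. In a real script, call load_dotenv() first so the values from .env are copied into os.environ. The helper name require_env is hypothetical.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or fail with a clear message.

    Hypothetical helper: call it after load_dotenv() so values from .env are visible.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add a line like {name}=... to your .env file."
        )
    return value


if __name__ == "__main__":
    # Assumes the OpenAI key is stored under OPENAI_API_KEY.
    try:
        require_env("OPENAI_API_KEY")
        print("OPENAI_API_KEY found")
    except RuntimeError as err:
        print(err)
```

Failing early with a readable message is easier to debug than an authentication error raised halfway through a browser session.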
1. Create Your First Browser Agent
Let's create a simple script that uses Browser-Use to search for information on Wikipedia and extract a summary.
Create a new file named first_browser_agent.py and add the following code:
```python
import asyncio

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

from browser_use import Agent

# Load environment variables (such as OPENAI_API_KEY) from the .env file
load_dotenv()


async def main():
    # Initialize the language model
    llm = ChatOpenAI(model="gpt-4o", temperature=0)

    # Create a browser agent with a specific task
    agent = Agent(
        task="Search for 'machine learning' on Wikipedia, read the introduction, and provide a summary of what machine learning is.",
        llm=llm,
    )

    # Run the agent; run() returns the history of the whole run
    history = await agent.run()

    # Print only the agent's final answer, not the full step-by-step history
    print("Agent's response:")
    print(history.final_result())


if __name__ == "__main__":
    asyncio.run(main())
```

2. Run Your Agent
Save the file and run it using Python:
```
python first_browser_agent.py
```

A browser window should open. The agent reaches Wikipedia's machine learning article, either directly or through a search engine. It then reads the introduction and prints a summary of what machine learning is.
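Browser tasks can take a while, and a run that gets stuck keeps the browser busy. One standard-library way to bound a run is asyncio.wait_for. The sketch below uses a hypothetical FakeAgent stand-in so it runs without a browser or API key. With a real agent, you would pass the coroutine returned by agent.run() instead. Whether the browser shuts down cleanly when the run is cancelled depends on the library, so check that before relying on it.

```python
import asyncio


class FakeAgent:
    """Hypothetical stand-in for browser_use.Agent; it only simulates a slow run."""

    def __init__(self, task: str, delay: float) -> None:
        self.task = task
        self.delay = delay

    async def run(self) -> str:
        # Pretend to browse for `delay` seconds, then return an answer.
        await asyncio.sleep(self.delay)
        return f"Finished: {self.task}"


async def run_with_timeout(agent, seconds: float):
    """Await agent.run(), but cancel it and return None after `seconds`."""
    try:
        return await asyncio.wait_for(agent.run(), timeout=seconds)
    except asyncio.TimeoutError:
        return None


async def main() -> None:
    fast = FakeAgent("summarize machine learning", delay=0.01)
    slow = FakeAgent("find concert tickets", delay=5)
    print(await run_with_timeout(fast, seconds=1))     # Finished: summarize machine learning
    print(await run_with_timeout(slow, seconds=0.05))  # None


if __name__ == "__main__":
    asyncio.run(main())
```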
3. Understanding the Code
Let's break down what's happening in this example:
- We import the necessary modules:
  - asyncio for working with asynchronous code
  - dotenv to load our environment variables
  - ChatOpenAI as our language model
  - Agent from browser-use, the main class we'll work with
- We initialize a language model (in this case, OpenAI's GPT-4o), with temperature=0 to make its output more consistent between runs
- We create an Agent instance with:
  - task: a natural language description of what we want the agent to do
  - llm: the language model the agent uses for understanding and planning
- We call agent.run() to execute the task, which:
  - Opens a browser
  - Plans the steps needed to complete the task
  - Executes those steps by controlling the browser
  - Returns a history of the run, which includes the final result
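The steps that agent.run() goes through can be pictured as an observe, decide, act loop. The toy model below shows that control flow in plain Python. It is not Browser-Use's implementation: ToyBrowser, decide_next_action, run_agent, and the "goto"/"done" action names are hypothetical, and a scripted function stands in for the language model. The agent repeatedly looks at the current page and asks the "model" for the next action. It executes that action and stops when the model signals it is done or a step limit runs out.

```python
from dataclasses import dataclass


@dataclass
class ToyBrowser:
    """Hypothetical browser: a mapping of URL -> page text plus a current URL."""
    pages: dict
    url: str = "about:blank"

    def observe(self) -> str:
        return self.pages.get(self.url, "")

    def goto(self, url: str) -> None:
        self.url = url


def decide_next_action(task: str, page: str, history: list) -> tuple:
    """Scripted stand-in for the LLM: returns an (action, argument) pair."""
    if not history:
        return ("goto", "https://en.wikipedia.org/wiki/Machine_learning")
    # "Summarize" by returning the first sentence of the page.
    return ("done", page.split(". ")[0] + ".")


def run_agent(task: str, browser: ToyBrowser, max_steps: int = 5) -> tuple:
    history = []
    for _ in range(max_steps):
        page = browser.observe()                               # observe current state
        action, arg = decide_next_action(task, page, history)  # decide (an LLM call in a real agent)
        history.append((action, arg))
        if action == "done":                                   # model reports the task is complete
            return arg, history
        if action == "goto":
            browser.goto(arg)                                  # act on the browser
    return None, history                                       # step budget exhausted


if __name__ == "__main__":
    pages = {
        "https://en.wikipedia.org/wiki/Machine_learning":
            "Machine learning is a field of study in AI. It builds models from data."
    }
    answer, steps = run_agent("Summarize machine learning", ToyBrowser(pages))
    print(answer)      # Machine learning is a field of study in AI.
    print(len(steps))  # 2
```

The step limit matters because a language model can keep choosing actions indefinitely. Capping the loop guarantees that a run ends even when the model never reports that it is done.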
4. Customizing the Agent
You can customize your agent in several ways:
Specifying a Different Task
Simply change the task parameter to describe what you want the agent to do:
```python
agent = Agent(
    task="Find the current weather in Tokyo on weather.com and tell me if I need an umbrella today.",
    llm=llm,
)
```

Using a Different Browser
You can specify which browser to use:
```python
agent = Agent(
    task="Search for concert tickets on Ticketmaster",
    llm=llm,
    browser="firefox",  # Options: "chromium", "firefox", or "webkit"
)
```

Adding Memory
For more complex tasks that require remembering previous steps:
```python
from browser_use import Agent, Memory

agent = Agent(
    task="Research vacation options in Hawaii, find the top 3 beaches, and then find nearby hotels for each.",
    llm=llm,
    memory=Memory(),  # Enables the agent to remember information between steps
)
```

5. Next Steps
Now that you've created your first agent, you can explore more advanced features:
- Learn about Browser Actions that agents can perform
- Understand how to use Memory for complex multi-step tasks
- Explore Vision Support for image-based interactions
- See Complex Examples for more sophisticated use cases
Congratulations! You've successfully created your first browser automation agent with Browser-Use.