Quick Start Guide

This guide will help you create your first browser automation agent with Browser-Use in just a few minutes. By the end of this guide, you'll have a working agent that can perform web tasks based on natural language instructions.

Prerequisites

Before you begin, make sure you have:

Browser-Use installed (see the Installation Guide)
A language model API key set up in your .env file
Basic understanding of Python and async functions
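
Before running an agent, it can help to confirm that the API key is actually visible to Python. The following is a minimal sketch using only the standard library. It assumes the key is stored as OPENAI_API_KEY (the variable ChatOpenAI reads by default) and that load_dotenv() or your shell has already placed it in the environment. The helper name missing_keys is hypothetical and not part of Browser-Use.

```python
import os


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    # Assumption: the OpenAI key is the only one this guide needs.
    missing = missing_keys(["OPENAI_API_KEY"])
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required keys are set.")
```

If the script reports a missing variable, check that the .env file sits in the directory you run Python from and that the variable name is spelled exactly as expected.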

1. Create Your First Browser Agent

Let's create a simple script that uses Browser-Use to search for information on Wikipedia and extract a summary.

Create a new file named first_browser_agent.py and add the following code:

python

import asyncio
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from browser_use import Agent

# Load environment variables from .env file
load_dotenv()

async def main():
    # Initialize the language model
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    
    # Create a browser agent with a specific task
    agent = Agent(
        task="Search for 'machine learning' on Wikipedia, read the introduction, and provide a summary of what machine learning is.",
        llm=llm
    )
    
    # Run the agent and get the result
    result = await agent.run()
    
    # Print the result
    print("Agent's response:")
    print(result)

if __name__ == "__main__":
    asyncio.run(main())

2. Run Your Agent

Save the file and run it using Python:

bash

python first_browser_agent.py

A browser window should open and navigate to Wikipedia's machine learning article, possibly by way of a search engine. Once the agent has read the introduction section, the script prints a summary of what machine learning is.
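
Agent runs can take a while, so you may want an upper bound on how long the script waits. The sketch below shows one way to do that with asyncio.wait_for from the standard library. The coroutine slow_job is only a stand-in for agent.run(), and run_with_timeout is a hypothetical helper name. Whether a cancelled agent closes its browser cleanly is not covered in this guide, so treat this as a starting point.

```python
import asyncio


async def run_with_timeout(coro, seconds):
    """Await `coro`; return None if it does not finish within `seconds`."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        return None


async def slow_job(delay):
    # Stand-in for agent.run(); it only sleeps and returns a string.
    await asyncio.sleep(delay)
    return "finished"


if __name__ == "__main__":
    # First call completes in time, second one is cut off by the timeout.
    print(asyncio.run(run_with_timeout(slow_job(0.01), seconds=1)))
    print(asyncio.run(run_with_timeout(slow_job(1), seconds=0.01)))
```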

3. Understanding the Code

Let's break down what's happening in this example:

We import the necessary modules:
- asyncio for working with asynchronous code
- dotenv to load our environment variables
- ChatOpenAI as our language model
- Agent from browser-use, which is the main class we'll work with

We initialize a language model (in this case, OpenAI's GPT-4o).

We create an Agent instance with:
- task: A natural language description of what we want the agent to do
- llm: The language model the agent will use for understanding and planning

We call agent.run() to execute the task, which:
- Opens a browser
- Plans the steps needed to complete the task
- Executes those steps by controlling the browser
- Returns the final result
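
The plan-then-execute cycle described above can be pictured as a simple loop. The sketch below is a toy model only and not Browser-Use's actual implementation: FakeBrowser, plan_next_action, and run_task are hypothetical names, the "planner" follows a fixed script instead of calling a language model, and no real browser is opened.

```python
import asyncio


class FakeBrowser:
    """Stand-in for a real browser; it only records the actions it receives."""

    def __init__(self):
        self.log = []

    async def execute(self, action):
        self.log.append(action)
        return f"done: {action}"


async def plan_next_action(task, history):
    """Stand-in for the LLM call: walks a fixed script, then signals completion."""
    script = ["open wikipedia.org", "search 'machine learning'", "read introduction"]
    if len(history) < len(script):
        return script[len(history)]
    return None  # None means the planner considers the task finished


async def run_task(task, browser, max_steps=10):
    """Alternate planning and acting until the planner stops or max_steps is hit."""
    history = []
    for _ in range(max_steps):
        action = await plan_next_action(task, history)
        if action is None:
            break
        observation = await browser.execute(action)
        history.append((action, observation))
    return history


if __name__ == "__main__":
    steps = asyncio.run(run_task("Summarize machine learning", FakeBrowser()))
    for action, observation in steps:
        print(action, "->", observation)
```

The max_steps cap reflects a common design choice in agent loops: it keeps a confused planner from running indefinitely.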

4. Customizing the Agent

You can customize your agent in several ways:

Specifying a Different Task

Simply change the task parameter to describe what you want the agent to do:

python

agent = Agent(
    task="Find the current weather in Tokyo on weather.com and tell me if I need an umbrella today.",
    llm=llm
)
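
Because the task is an ordinary string, it can also be built programmatically, for example to reuse one instruction across several inputs. The sketch below assumes a hypothetical helper named build_weather_task. It only produces the text that would be passed as the task argument and does not call Browser-Use itself.

```python
def build_weather_task(city, site="weather.com"):
    """Compose a natural-language task string for the given city."""
    city = city.strip()
    if not city:
        raise ValueError("city must not be empty")
    return (
        f"Find the current weather in {city} on {site} "
        "and tell me if I need an umbrella today."
    )


if __name__ == "__main__":
    for city in ["Tokyo", "Lisbon"]:
        print(build_weather_task(city))
```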

Using a Different Browser

You can specify which browser to use:

python

agent = Agent(
    task="Search for concert tickets on Ticketmaster",
    llm=llm,
    browser="firefox"  # Options: "chromium", "firefox", or "webkit"
)

Adding Memory

For more complex tasks that require remembering previous steps, pass a Memory instance to the agent:

python

from browser_use import Agent, Memory

agent = Agent(
    task="Research vacation options in Hawaii, find the top 3 beaches, and then find nearby hotels for each.",
    llm=llm,
    memory=Memory()  # Enables the agent to remember information between steps
)

5. Next Steps

Now that you've created your first agent, you can explore more advanced features:

Learn about Browser Actions that agents can perform
Understand how to use Memory for complex multi-step tasks
Explore Vision Support for image-based interactions
See Complex Examples for more sophisticated use cases

Congratulations! You've successfully created your first browser automation agent with Browser-Use.
