Quick Start Guide

This guide walks you through creating your first browser automation agent with Browser-Use. By the end, you'll have a working agent that performs web tasks from natural language instructions.

Prerequisites

Before you begin, make sure you have:

  • Browser-Use installed (see the Installation Guide)
  • A language model API key in your .env file (the examples here use OpenAI, which reads OPENAI_API_KEY)
  • A basic understanding of Python and async functions
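
A missing or misnamed key is a common reason a first run fails, so it can help to check the environment before creating an agent. The sketch below assumes OpenAI is the provider, so it looks for OPENAI_API_KEY (the variable ChatOpenAI reads by default). The helper name missing_keys is hypothetical and not part of Browser-Use; other providers use different variable names.

```python
import os

# Assumption: OpenAI is the provider, so OPENAI_API_KEY is the variable to check.
REQUIRED_KEYS = ["OPENAI_API_KEY"]


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # If the keys live in a .env file, call load_dotenv() before this check.
    missing = missing_keys(REQUIRED_KEYS)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dictionary as env makes the check easy to try without touching the real environment.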

1. Create Your First Browser Agent

Let's create a simple script that uses Browser-Use to search for information on Wikipedia and extract a summary.

Create a new file named first_browser_agent.py and add the following code:

python
import asyncio
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from browser_use import Agent

# Load environment variables from .env file
load_dotenv()

async def main():
    # Initialize the language model
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    
    # Create a browser agent with a specific task
    agent = Agent(
        task="Search for 'machine learning' on Wikipedia, read the introduction, and provide a summary of what machine learning is.",
        llm=llm
    )
    
    # Run the agent; run() returns the agent's history
    result = await agent.run()

    # Print the final answer extracted from the history
    print("Agent's response:")
    print(result.final_result())

if __name__ == "__main__":
    asyncio.run(main())

2. Run Your Agent

Save the file and run it using Python:

bash
python first_browser_agent.py

You should see a browser window open, navigate to a search engine, and find Wikipedia's machine learning page. When the agent finishes, the script prints a summary of what machine learning is, based on the article's introduction.

3. Understanding the Code

Let's break down what's happening in this example:

  1. We import the necessary modules:

    • asyncio for working with asynchronous code
    • dotenv to load our environment variables
    • ChatOpenAI as our language model
    • Agent from browser-use, which is the main class we'll work with
  2. We initialize a language model (in this case, OpenAI's GPT-4o).

  3. We create an Agent instance with:

    • task: A natural language description of what we want the agent to do
    • llm: The language model the agent will use for understanding and planning
  4. We call agent.run() to execute the task, which:

    • Opens a browser
    • Plans the steps needed to complete the task
    • Executes those steps by controlling the browser
    • Returns the final result
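
The four steps that agent.run() performs can be pictured with a toy loop. This is only an illustration of the plan-then-act cycle, not Browser-Use's implementation: FakeBrowser, plan_next_action, and run_task are hypothetical stand-ins, and the planner is a hard-coded rule where a real agent would query the language model.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a real browser; it only records actions."""
    url: str = "about:blank"
    log: list = field(default_factory=list)

    def execute(self, action):
        self.log.append(action)
        if action.startswith("goto "):
            self.url = action[len("goto "):]


def plan_next_action(task, browser):
    """Hard-coded planner; a real agent would ask the language model here."""
    if browser.url == "about:blank":
        return "goto https://en.wikipedia.org"
    if not any(a.startswith("search ") for a in browser.log):
        return f"search {task}"
    return "done"


def run_task(task, max_steps=10):
    browser = FakeBrowser()                       # 1. open a browser
    for _ in range(max_steps):
        action = plan_next_action(task, browser)  # 2. plan the next step
        if action == "done":
            break
        browser.execute(action)                   # 3. execute it in the browser
    return browser.log                            # 4. return the result (here, the action log)


if __name__ == "__main__":
    for step in run_task("machine learning"):
        print(step)
```

The max_steps cap is a design choice in this sketch: it guarantees the loop ends even if the planner never reports that the task is done.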

4. Customizing the Agent

You can customize your agent in several ways:

Specifying a Different Task

Simply change the task parameter to describe what you want the agent to do:

python
agent = Agent(
    task="Find the current weather in Tokyo on weather.com and tell me if I need an umbrella today.",
    llm=llm
)
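
Because the task is an ordinary string, it can be built from a template when you want the same instruction for several inputs. The helper below is a minimal sketch; build_weather_task and its parameters are hypothetical names introduced for illustration.

```python
def build_weather_task(city, site="weather.com"):
    """Build the weather task string for a given city and site."""
    return (
        f"Find the current weather in {city} on {site} "
        "and tell me if I need an umbrella today."
    )


if __name__ == "__main__":
    for city in ["Tokyo", "Paris"]:
        # Each string can be passed as `task=` when constructing an Agent.
        print(build_weather_task(city))
```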

Using a Different Browser

You can specify which browser to use:

python
agent = Agent(
    task="Search for concert tickets on Ticketmaster",
    llm=llm,
    browser="firefox"  # Options: "chromium", "firefox", or "webkit"
)

Adding Memory

For more complex tasks that require remembering previous steps:

python
from browser_use import Agent, Memory

agent = Agent(
    task="Research vacation options in Hawaii, find the top 3 beaches, and then find nearby hotels for each.",
    llm=llm,
    memory=Memory()  # Enables the agent to remember information between steps
)

5. Next Steps

Now that you've created your first agent, you can explore more advanced features:

  • Learn about Browser Actions that agents can perform
  • Understand how to use Memory for complex multi-step tasks
  • Explore Vision Support for image-based interactions
  • See Complex Examples for more sophisticated use cases

Congratulations! You've successfully created your first browser automation agent with Browser-Use.