Web Scraping Domain.com.au with Golang

Ibrahim B.
5 min read

Web scraping enables developers to extract and process information from websites, supporting a range of uses such as data analysis, market research, and API building. For Australian real estate data, Domain.com.au is a valuable resource. This blog will walk you through scraping Domain.com.au using Golang, so you can gather real estate data with ease.

For non-technical users or those looking for a ready-made solution, I’ve already built an Apify actor for Domain.com.au. The actor automates the data extraction, so you can retrieve property details without writing any code. This blog will guide you through:

  1. Setting up a Golang project for scraping.

  2. Choosing libraries for web scraping.

  3. Writing code to extract data.

  4. Handling anti-bot measures.

  5. Storing and utilizing scraped data.

1. Project Setup

First, let's create a new project directory for our Golang code.

mkdir domain-scraper
cd domain-scraper
go mod init domain-scraper

This will initialize a new Golang project with a go.mod file. We’ll use colly, a popular scraping library in Golang, which is both efficient and feature-rich.

2. Choosing the Right Libraries

The Golang colly library is an excellent choice for web scraping due to its ease of use and support for handling cookies, headers, and sessions. We’ll also use goquery, which integrates with colly to simplify HTML parsing.

Install colly and goquery:

go get -u github.com/gocolly/colly
go get -u github.com/PuerkitoBio/goquery

3. Scraping Basic Data

Identify the Structure of Domain.com.au

Before diving into code, inspect the structure of Domain.com.au pages using browser developer tools. Typical data points on a listing might include:

  • Property Title

  • Price

  • Location

  • Description

  • Agent Details

Writing the Scraper Code

Let's write a basic scraper to extract these details.

package main

import (
    "fmt"
    "log"

    "github.com/gocolly/colly"
)

func main() {
    // Instantiate default collector
    c := colly.NewCollector(
        // Include both hosts: listing URLs live on the www subdomain,
        // and AllowedDomains matches hostnames exactly.
        colly.AllowedDomains("www.domain.com.au", "domain.com.au"),
    )

    // Set up error handling
    c.OnError(func(_ *colly.Response, err error) {
        log.Println("Request failed:", err)
    })

    // Extract property details
    c.OnHTML(".css-1kkm9qk", func(e *colly.HTMLElement) {
        title := e.ChildText(".css-164r41r")
        price := e.ChildText(".css-1rzse3v")
        address := e.ChildText(".css-t54e5i")
        agent := e.ChildText(".css-1gkcyyc")
        description := e.ChildText(".css-1yuhvjn")

        fmt.Printf("Title: %s\nPrice: %s\nAddress: %s\nAgent: %s\nDescription: %s\n", 
            title, price, address, agent, description)
    })

    // URL of the property listing
    err := c.Visit("https://www.domain.com.au/some-property-url")
    if err != nil {
        log.Fatal("Failed to scrape the page:", err)
    }
}

In this code:

  • We initialize a colly.Collector with AllowedDomains restricted to Domain.com.au, so the scraper never wanders off-site. List both domain.com.au and www.domain.com.au, since colly matches the hostname exactly and listing pages live on the www subdomain.

  • We define the CSS selectors to target specific elements, such as the title, price, and agent information.

  • The OnHTML function takes a CSS selector to extract content. Here, .css-1kkm9qk is the main container for listing details (you’ll need to adjust selectors based on the current structure of Domain.com.au).

  • The Visit function sends the HTTP request to the provided URL.
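If you plan to do more than print the fields, it can help to collect them into a struct before handing them off to storage or further processing. This is a minimal sketch; the `Property` type, its fields, and the sample values are my own naming, not anything Domain.com.au or colly prescribes:

```go
package main

import "fmt"

// Property groups the fields scraped from a single listing.
type Property struct {
	Title       string
	Price       string
	Address     string
	Agent       string
	Description string
}

// Summary renders a one-line summary, handy for logging progress.
func (p Property) Summary() string {
	return fmt.Sprintf("%s | %s | %s", p.Title, p.Price, p.Address)
}

func main() {
	// Sample values for illustration only.
	p := Property{
		Title:   "3 Bed House",
		Price:   "$750,000",
		Address: "12 Example St, Sydney",
	}
	fmt.Println(p.Summary()) // 3 Bed House | $750,000 | 12 Example St, Sydney
}
```

Inside the `OnHTML` callback you would populate a `Property` from the `ChildText` calls and append it to a slice or pass it to a save function.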

4. Handling Anti-Bot Measures

Domain.com.au, like many large websites, might use anti-bot mechanisms. Here are some ways to work around common issues:

User-Agent Spoofing

Changing the User-Agent string can prevent the server from blocking requests.

c.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"

Delay and Random Intervals

To avoid rate limiting, add random delays between requests (this uses the standard library's time package, so add "time" to your imports):

err := c.Limit(&colly.LimitRule{
    DomainGlob:  "*domain.com.au",
    RandomDelay: 2 * time.Second,
})
if err != nil {
    log.Fatal(err)
}

This code snippet sets a random delay of up to 2 seconds between requests to domain.com.au.

Proxy Rotation

Use a proxy pool to rotate IPs between requests, which is particularly useful if your IP is blocked frequently. Here’s how to set up a proxy:

c.SetProxy("http://yourproxy:port")

5. Saving Data to a Database

To store the scraped data, use a database like PostgreSQL or MongoDB. Here’s an example of saving data in SQLite for simplicity.

Install the SQLite driver:

go get -u github.com/mattn/go-sqlite3

Then, create a function to save the data.

import (
    "database/sql"
    "fmt"
    "log"

    _ "github.com/mattn/go-sqlite3"
)

func saveToDatabase(title, price, address, agent, description string) {
    database, err := sql.Open("sqlite3", "./domain.db")
    if err != nil {
        log.Fatal(err)
    }
    defer database.Close()

    // Create the table on first run; check the error instead of discarding it.
    _, err = database.Exec("CREATE TABLE IF NOT EXISTS properties (title TEXT, price TEXT, address TEXT, agent TEXT, description TEXT)")
    if err != nil {
        log.Fatal(err)
    }

    _, err = database.Exec("INSERT INTO properties (title, price, address, agent, description) VALUES (?, ?, ?, ?, ?)",
        title, price, address, agent, description)
    if err != nil {
        log.Println("Failed to insert data:", err)
    } else {
        fmt.Println("Data inserted successfully!")
    }
}

Call saveToDatabase() within your OnHTML callback.

saveToDatabase(title, price, address, agent, description)

6. Running the Scraper

Compile and run your scraper:

go run main.go

If everything is set up correctly, the script will visit the Domain.com.au page, extract the details, and save them to the SQLite database.

For Non-Technical Users: Apify Actor

For those who want to collect Domain.com.au data but aren’t comfortable with programming, I’ve built an Apify actor that automates the data extraction for Domain.com.au, allowing you to gather property details without any coding. You simply configure the settings on Apify, and the actor does the rest.

Conclusion

In this post, we built a simple but powerful scraper for Domain.com.au using Golang and colly. We also discussed techniques to handle anti-bot measures, like user-agent spoofing and request delays. Remember to scrape responsibly, and always check a website’s Terms of Service to ensure compliance.

This setup should serve as a foundation to expand your scraper with additional features, such as concurrent requests or broader database storage options. With Golang's efficiency, you’ll find scraping with it to be fast and reliable.
