Knapsack Maze with Partial Information

Krish Parekh
8 min read

Introduction

As part of my Algorithms and Analysis (COSC2123) course at RMIT, I was presented with a problem that was part pathfinding, part resource management, and part statistical guesswork. Task D of our assignment, the "Knapsack-Maze," was a classic algorithmic challenge wrapped in a layer of uncertainty. Big shout-out to Elham Naghizade, Edward Small, and the rest of the teaching team for creating this problem!

The Core Problem

Setup

  • You’re given a maze layout (walls, corridors, entrance, and exit). This is fully known and navigable.

  • Things we know:

    • The total number of treasures in the maze (|T|)

    • Their total combined value (v)

    • Their total combined weight (w)

  • Things we don’t know:

    • The locations of the treasures

    • The individual values or weights of any treasure

  • The objective was to maximize a "Reward", defined as:

Reward = (Total Value of Collected Treasure) - (Number of Unique Cells Explored)

Rules to Play By

For any implementation to be valid:

  • The path must start at the entrance and end at the exit.

  • Each step in the path must go to an adjacent, connected cell.

  • You cannot exceed the knapsack’s capacity.

  • Every unique cell explored costs you, and moving through already-visited cells is free.

  • You cannot use prior knowledge of treasure locations:

    • A cell can only be checked for treasure after it has been visited.

    • No simulation or assumptions about treasure positions are allowed beyond probabilistic reasoning.

How did I approach the problem?

When I first saw this problem, I was lost. How do you plan a path when you don’t know where the heck you’re going? It felt impossible. I got sucked into a rabbit hole, googling fancy algorithms like Dijkstra’s and A*, thinking they’d magically solve it. I even looked at Monte Carlo simulations. Total waste of time. None of them worked, because I didn’t know where the treasures were, and assuming I did would break the rules.

Then, during a consultation for the assignment, I had a major “duh” moment. The TA pointed out something I’d skimmed over in the assignment details: moving through places I’d already been didn’t cost me anything, but exploring a new cell (opening it) did. I felt so dumb for missing that, but it clicked.

That gave me my first real idea: use BFS to find the shortest path from the start to the exit, then walk that path, opening each cell along the way. After collecting all the treasures found on this path, I would use the 0/1 knapsack algorithm to decide which ones to keep. But it seemed too simple, like I wasn’t using the other info I had: how many treasures there were, their total value, and their total weight.

I knew I needed to think about probabilities. I spent a whole day stuck, super annoyed, and gave up for the night. Then, it hit me. Years ago, I watched some YouTube videos by 3Blue1Brown and this guy Josh Starmer about how machines can “guess” better as they get more info, like in Naive Bayes stuff. That was it. I didn’t need a perfect plan; I needed something that I could learn as I went.

So, my final plan was to stick to the shortest path from start to finish as my safe route, but take smart side trips based on what I learned about the maze. I called it my BeliefSystemBFSSolver, which sounds cringe, but I had to name something.

A Technical Breakdown

My solution came together in three steps: path finding, probabilistic heuristics, and dynamic programming.

Baseline pathfinding with BFS

First, I ran a simple search called BFS to map out the shortest route from the entrance to the exit. Since planning doesn’t cost anything, this was my go-to path. If checking out new spots didn’t seem worth it, I’d just stick to this route and head straight out.
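The baseline route can be sketched in a few lines of Python. This is a minimal illustration, not the assignment code: it assumes the maze is represented as a dict mapping each cell to its wall-free neighbors, and `get_bfs_path` is just a name for the helper.

```python
from collections import deque

def get_bfs_path(maze, entrance, exit_cell):
    """Shortest path from entrance to exit via BFS.
    `maze` maps each cell to the neighbors reachable from it."""
    parent = {entrance: None}
    queue = deque([entrance])
    while queue:
        cell = queue.popleft()
        if cell == exit_cell:
            # Walk the parent links backwards to rebuild the path
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for neighbor in maze[cell]:
            if neighbor not in parent:
                parent[neighbor] = cell
                queue.append(neighbor)
    return None  # exit unreachable

# Tiny 2x2 maze, cells as (row, col); a wall blocks (1,0)-(1,1)
maze = {
    (0, 0): [(0, 1), (1, 0)],
    (0, 1): [(0, 0), (1, 1)],
    (1, 0): [(0, 0)],
    (1, 1): [(0, 1)],
}
print(get_bfs_path(maze, (0, 0), (1, 1)))  # [(0, 0), (0, 1), (1, 1)]
```

Since BFS explores cells in order of distance from the entrance, the first time it pops the exit, the reconstructed path is guaranteed shortest.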

Probabilistic heuristics

With the main path sorted, I started at the entrance and moved along it, checking each cell and its unvisited neighbors. To decide if a new cell was worth exploring, I used what I knew about the maze: the number of treasures, their combined value, and their combined weight. From that I came up with a "gain" formula to quantify the potential of a detour. If the gain is positive, the algorithm takes the detour.

Expected Gain = (Expected Treasure Value x Probability Treasure Fits) - Exploration Cost

Expected Treasure Value: This estimates the average value of a treasure in an unvisited cell, assuming treasures are spread evenly across the maze. I used v_rem / (N-E), where:

  • v_rem: Total value of treasures not yet found.

  • N: Total number of cells in the maze.

  • E: Number of cells already explored.

For example, if there’s 10,000 worth of treasures left (v_rem = 10000), the maze has 100 cells (N = 100), and I’ve explored 20 (E = 20), then each of the 80 remaining cells might hold a treasure worth 10000 / 80 = 125 on average.

Probability Treasure Fits: This guesses the chance a treasure will fit in my knapsack, based on its weight. I used min(C_rem / w_max, 1), where:

  • C_rem: Remaining capacity in my knapsack.

  • w_max: Maximum possible weight of a single treasure.

Say my bag can hold 10 kg (C_rem = 10), and the heaviest treasure could be 20 kg (w_max = 20). I’d estimate a 10 / 20 = 0.5 chance it fits. If C_rem is high, this is closer to 1; if it’s low, it shrinks, so I avoid detours when my bag’s nearly full.

Exploration Cost: Each newly explored cell costs 1 unit of reward.
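The three pieces combine into one small function. A sketch, not the actual solver code; the name `expected_gain` and its parameters just mirror the formulas above.

```python
def expected_gain(v_rem, n_cells, explored, c_rem, w_max):
    """Expected gain of opening one new cell, assuming treasures
    are spread uniformly over the unvisited cells."""
    unexplored = n_cells - explored
    if unexplored <= 0 or w_max <= 0:
        return -1  # nothing left to find; only the exploration cost remains
    mu = v_rem / unexplored           # expected treasure value per cell
    p_fits = min(c_rem / w_max, 1)    # chance the treasure fits the bag
    return mu * p_fits - 1            # minus the 1-cell exploration cost

# Numbers from the examples above: 10,000 value left over 80 unexplored
# cells, a half-empty 10 kg bag against a 20 kg max treasure
print(expected_gain(10000, 100, 20, 10, 20))  # 125 * 0.5 - 1 = 61.5
```

A gain of 61.5 means the detour is overwhelmingly worth it here; as the bag fills up or the remaining value drops, the same formula starts returning negative numbers and the solver stays on the main path.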

Example: Two Iterations of the Belief System

Iteration 1:

  • Starting point: the maze has N = 100 cells and the heaviest possible treasure is w_max = 8. I’ve explored 10 cells (E = 10) and found 0 treasures, so v_rem = 500 (all treasures left), and my bag is empty with its full capacity C_rem = 20.

  • Expected Treasure Value: v_rem / (N-E) = 500 / (100 - 10) = 500 / 90 ≈ 5.56.

  • Probability Treasure Fits: min(C_rem / w_max, 1) = min(20 / 8, 1) = 1 (plenty of room).

  • Exploration Cost: 1.

  • Expected Gain: (5.56 × 1) - 1 = 4.56. Since it’s positive, I detour.

  • Result: I explore the cell, find a treasure worth $50, weighing 4 kg. I update: v_rem = 500 - 50 = 450, E = 10 + 1 = 11, C_rem = 20 - 4 = 16.

Iteration 2:

  • Now, at the next cell on the path, checking another neighbor.

  • Updated values: v_rem = 450, E = 11, C_rem = 16.

  • Expected Treasure Value: 450 / (100 - 11) = 450 / 89 ≈ 5.06.

  • Probability Treasure Fits: min(16 / 8, 1) = 1 (still enough room).

  • Exploration Cost: 1.

  • Expected Gain: (5.06 × 1) - 1 = 4.06. Still positive, so I detour again.

  • Result: This cell has no treasure, so I update: v_rem = 450 (no change), E = 11 + 1 = 12, C_rem = 16 (no change).
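The two iterations above can be replayed in a few lines, using the same numbers (N = 100, w_max = 8, and a 50-value, 4 kg treasure found on the first detour):

```python
# Replaying the two iterations of the belief system above
N, w_max = 100, 8
v_rem, E, C_rem = 500, 10, 20

# Each entry is the (value, weight) found on that detour, or None
for found in [(50, 4), None]:
    mu = v_rem / (N - E)             # expected treasure value
    p_fits = min(C_rem / w_max, 1)   # probability it fits
    gain = mu * p_fits - 1
    print(f"gain = {gain:.2f}")      # positive both times, so detour
    E += 1                           # opening the cell costs 1
    if found is not None:            # iteration 1 finds a treasure
        v_rem -= found[0]
        C_rem -= found[1]

print(v_rem, E, C_rem)  # 450 12 16
```

Note how the gain shrinks from 4.56 to 4.06: every treasure found lowers v_rem, and every cell opened raises E, so the belief system naturally gets more conservative over time.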

Dynamic Programming

After the exploration phase, I’d have a collection of found treasures. To ensure the final selection was truly optimal, I ran a standard 0/1 Knapsack Dynamic Programming algorithm over them, which gives the maximum possible value within the knapsack capacity limit.
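The selection step is textbook 0/1 knapsack. A minimal version, assuming integer weights and treasures as (value, weight) pairs:

```python
def knapsack_dp(treasures, capacity):
    """0/1 knapsack: pick the subset of treasures maximizing value
    within the weight capacity. Weights are assumed integer."""
    # dp[c] = best value achievable with capacity c
    dp = [0] * (capacity + 1)
    keep = [[False] * (capacity + 1) for _ in treasures]
    for i, (value, weight) in enumerate(treasures):
        # Iterate capacity downwards so each treasure is used at most once
        for c in range(capacity, weight - 1, -1):
            if dp[c - weight] + value > dp[c]:
                dp[c] = dp[c - weight] + value
                keep[i][c] = True
    # Backtrack through the keep table to recover the chosen treasures
    chosen, c = [], capacity
    for i in range(len(treasures) - 1, -1, -1):
        if keep[i][c]:
            chosen.append(treasures[i])
            c -= treasures[i][1]
    return chosen[::-1], dp[capacity]

found = [(60, 10), (100, 20), (120, 30)]  # (value, weight)
print(knapsack_dp(found, 50))  # ([(100, 20), (120, 30)], 220)
```

This runs in O(n × capacity) time, which is fine here since only a handful of treasures ever make it into the candidate pool.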

Pseudocode

def solver(entrance, exit):
    # Initialize knapsack and belief system
    knapsack = []
    E = 0                          # Number of explored cells
    k = 0                          # Number of treasures found
    T_rem = total_treasures        # Treasures not yet found
    v_rem = total_treasure_value   # Remaining total value
    C_rem = capacity               # Remaining knapsack capacity

    path = []
    explored_cells = []

    # Step 1: Get BFS path from entrance to exit
    bfs_path = get_bfs_path(entrance, exit)

    # Step 2: Traverse the BFS path
    for cell in bfs_path:
        path.append(cell)
        if not cell.visited:
            cell.visited = True
            explored_cells.append(cell)
            E += 1

            if cell.contains_treasure:
                knapsack.append(cell.treasure)
                update_belief_system()  # Update k, T_rem, v_rem, C_rem, etc.

        # Check neighbors not on the BFS path
        for neighbor in cell.unvisited_neighbors(not_in_bfs=True, no_wall=True):
            mu_rem = v_rem / (total_cells - E)
            p_fits = min(C_rem / max_treasure_weight(), 1)
            gain = mu_rem * p_fits - 1

            if gain > 0:
                visit(neighbor)
                path.append(neighbor)
                explored_cells.append(neighbor)
                E += 1
                if neighbor.contains_treasure:
                    knapsack.append(neighbor.treasure)
                update_belief_system()
                path.append(cell)  # Return to the original cell

    # Step 3: Select optimal treasures using 0/1 knapsack DP
    selected_treasures, value, reward = knapsack_dp(knapsack, capacity)

    # Step 4: Final output
    return path, explored_cells, value, reward

Assumptions

My approach assumes treasures are spread out evenly across the maze, and it worked pretty well for that. It struck a good balance between sticking to the safe path and taking side trips to find treasures.

I tested it with two treasures: when their values were low, like 5 at most, the math often said it wasn’t worth exploring, so it stuck to the main path and saved effort.

But when the treasures were worth a lot, like up to 1000, the potential payoff made it worth checking out nearby spots, and the system went for it.

That said, it’s not perfect. If treasures are more likely to be farther from the start or exit—like if they’re clustered at the maze’s edges—my approach might miss them. Since it only checks spots right next to the main path and assumes treasures are evenly spread, it doesn’t adjust for those far-off clusters. To fix this, I could tweak the formula to guess higher values for spots deeper in the maze, encouraging it to explore farther when it makes sense.
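One way that tweak could look, purely as a sketch: scale the expected value by the cell's BFS depth from the entrance. The `alpha` bias parameter and the linear scaling here are made up for illustration, not something I tested.

```python
def expected_gain_depth(v_rem, n_cells, explored, c_rem, w_max,
                        depth, max_depth, alpha=0.5):
    """Hypothetical depth-biased gain: cells deeper in the maze get a
    higher expected value, encouraging longer detours toward far-off
    clusters. `depth` is the cell's BFS distance from the entrance."""
    unexplored = n_cells - explored
    if unexplored <= 0 or w_max <= 0:
        return -1
    mu = v_rem / unexplored
    # 1.0 at the entrance, up to (1 + alpha) at the deepest cell
    depth_bonus = 1 + alpha * (depth / max_depth)
    p_fits = min(c_rem / w_max, 1)
    return mu * depth_bonus * p_fits - 1
```

With this, two otherwise identical cells no longer score the same gain: the deeper one wins, which is exactly the behavior the edge-clustered case needs.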

Conclusion

A quick nap cleared my head. I remembered some probability stuff from machine learning, and it just fit this problem perfectly. Ended up with a decent score on the assignment!
