Motivation

Introduce a way for people to learn reinforcement learning.
Encourage people to build more libraries like this one.
Learn RL myself.

Purpose

Allow users to train RL agents on a simple game so they can feel more comfortable tackling complex problems.

Game intro

Let’s say we have a board:

It contains the numbers 1 to 8, each appearing exactly once, and a blank space.

Puzzle goal: through a series of moves, arrange the numbers in ascending order from left to right, top to bottom, with the blank space in the bottom-right corner.
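The goal condition fits in a few lines. This is a minimal sketch, not code from Puzzle15: it assumes the board is flattened row by row and that the blank is encoded as -1, which is the encoding the gym observations use in the usage example later in this article. The function name is_solved is hypothetical.

```python
def is_solved(board):
    """Return True if a flattened board is in the goal state.

    Assumes the board is stored row by row (left to right, top to bottom)
    and that the blank space is encoded as -1.
    """
    n = len(board)
    goal = list(range(1, n)) + [-1]  # tiles 1..n-1 in order, blank last (bottom-right)
    return list(board) == goal


print(is_solved([1, 2, 3, 4, 5, 6, 7, 8, -1]))  # True
print(is_solved([2, 8, 6, 7, 1, 3, -1, 5, 4]))  # False
```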

Actions: slide a block adjacent to the blank space into it, swapping their positions.
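Sliding a tile into the blank is equivalent to moving the blank one cell in the opposite direction, which makes moves easy to compute. The sketch below is illustrative only: the direction names and the helpers valid_moves and apply_move are hypothetical, not the Puzzle15 API. For the example board used later in this article, it agrees with the library's report that only "up" and "right" are possible.

```python
# Offsets (row, col) for moving the blank. Moving the blank onto a neighbouring
# cell is the same as sliding that neighbour's tile into the blank.
DIRECTIONS = {"up": (-1, 0), "right": (0, 1), "down": (1, 0), "left": (0, -1)}


def valid_moves(board, width, height):
    """List the directions the blank (-1) can move without leaving the board."""
    row, col = divmod(board.index(-1), width)
    moves = []
    for name, (dr, dc) in DIRECTIONS.items():
        if 0 <= row + dr < height and 0 <= col + dc < width:
            moves.append(name)
    return moves


def apply_move(board, width, height, direction):
    """Return a new board with the blank swapped with its neighbour in `direction`."""
    if direction not in valid_moves(board, width, height):
        raise ValueError(f"invalid move: {direction}")
    blank = board.index(-1)
    row, col = divmod(blank, width)
    dr, dc = DIRECTIONS[direction]
    target = (row + dr) * width + (col + dc)
    new_board = list(board)  # leave the input board untouched
    new_board[blank], new_board[target] = new_board[target], new_board[blank]
    return new_board


board = [2, 8, 6, 7, 1, 3, -1, 5, 4]
print(valid_moves(board, 3, 3))       # ['up', 'right']
print(apply_move(board, 3, 3, "up"))  # [2, 8, 6, -1, 1, 3, 7, 5, 4]
```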

Solution:

Note that the game is non-trivial: even this 3x3 example took 16 moves. As the board dimensions grow, solutions get longer and the number of possible positions explodes.
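A breadth-first search makes the growth concrete. It always finds a shortest solution, but in the worst case it visits every reachable position: 9!/2 = 181,440 on a 3x3 board and 16!/2 (about 10^13) on a 4x4 board. This is a minimal sketch with hypothetical helper names, assuming the same flattened encoding with -1 as the blank; it is not part of Puzzle15 or Puzzle15Gym.

```python
from collections import deque


def neighbours(state, width, height):
    """Yield every state reachable from `state` by sliding one tile into the blank (-1)."""
    blank = state.index(-1)
    row, col = divmod(blank, width)
    for dr, dc in ((-1, 0), (0, 1), (1, 0), (0, -1)):
        r, c = row + dr, col + dc
        if 0 <= r < height and 0 <= c < width:
            target = r * width + c
            nxt = list(state)
            nxt[blank], nxt[target] = nxt[target], nxt[blank]
            yield tuple(nxt)


def shortest_solution_length(board, width, height):
    """Breadth-first search for the minimum number of moves to the goal.

    Returns None if the goal is unreachable. Only practical for tiny boards,
    since the search may have to visit every reachable state.
    """
    start = tuple(board)
    goal = tuple(range(1, width * height)) + (-1,)
    frontier = deque([(start, 0)])  # (state, moves made so far)
    seen = {start}
    while frontier:
        state, depth = frontier.popleft()
        if state == goal:
            return depth
        for nxt in neighbours(state, width, height):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None


print(shortest_solution_length([1, 2, 3, 4, 5, 6, 7, -1, 8], 3, 3))  # 1
```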

Puzzle15 library

The Puzzle15Gym library is built on top of the Puzzle15 library.

Puzzle15 provides everything needed to create random puzzles of a given width and height (or predefined ones), list the available moves, change the puzzle state by performing a move, and return the current state.
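Not every random arrangement can be solved, so a generator has to avoid the unsolvable half. The sketch below shows one common approach, shuffling and rejecting unsolvable boards with the inversion-parity test; it is an assumption for illustration, not necessarily how Puzzle15 does it, and the names is_solvable and random_puzzle are hypothetical. It assumes both dimensions are at least 2 and the goal places the blank (-1) last.

```python
import random


def is_solvable(board, width):
    """Check solvability for a goal with tiles in order and the blank (-1) last.

    Counts inversions among the tiles, ignoring the blank. With an odd width,
    the inversion count must be even. With an even width, the inversion count
    plus the blank's row counted from the bottom (starting at 1) must be odd.
    """
    tiles = [t for t in board if t != -1]
    inversions = sum(
        1
        for i in range(len(tiles))
        for j in range(i + 1, len(tiles))
        if tiles[i] > tiles[j]
    )
    if width % 2 == 1:
        return inversions % 2 == 0
    height = len(board) // width
    blank_row_from_bottom = height - board.index(-1) // width
    return (inversions + blank_row_from_bottom) % 2 == 1


def random_puzzle(width, height, rng=random):
    """Shuffle tiles 1..n-1 and a blank until the arrangement is solvable."""
    board = list(range(1, width * height)) + [-1]
    while True:  # about half of all shuffles are solvable, so this ends quickly
        rng.shuffle(board)
        if is_solvable(board, width):
            return board


print(random_puzzle(4, 3, random.Random(0)))  # a solvable 4-wide, 3-high board
```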

Puzzle15Gym - custom AI gym library for the puzzle

Puzzle15Gym is the library for training RL agents on the puzzle.

Git repository for the project: https://github.com/EvalVis/Puzzle15Gym.

The repository contains examples of how to initialize the environment and make moves.

Gym lib usage example

Let’s walk through some examples.

First, let’s make a move:

import gym
import puzzle15Gym

env_3x3_fixed = gym.make('Puzzle3x3Fixed-v0')

observation, info = env_3x3_fixed.reset()
print(observation)
# output: [ 2  8  6  7  1  3 -1  5  4]
# So the board is like this:
# 2 8 6
# 7 1 3
# -1 5 4
# -1 means blank space.
print(info)
# output: {'valid_actions': [0, 1]}
# Since the blank space is in the bottom-left corner, we can only move 0 (up) or 1 (right).

action = env_3x3_fixed.action_space.sample()
print(action)
# output: 3 # This is an invalid action.
# sample() does not guarantee that it returns a valid one.
# For a valid action, use: action = random.choice(info["valid_actions"]) (requires import random)

observation, reward, done, truncated, info = env_3x3_fixed.step(action)
print(observation)
# output: [ 2  8  6  7  1  3 -1  5  4] # An invalid action leaves the board unchanged.
print(reward)
# output: -2 # A reward of -2 for violating game rules.
print(done)
# output: False # Game is still not solved.
print(truncated)
# output: False # Only one move so far. The episode is truncated if it stays unsolved for too long.
print(info)
# output: {'valid_actions': [0, 1]} # Same position, same valid actions.
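Following the hint about random.choice in the example above, a random agent can avoid the -2 penalty entirely by sampling only from info["valid_actions"]. This is a minimal sketch; the function run_random_episode is hypothetical and relies only on the reset() and step() return values and the "valid_actions" key shown in the example.

```python
import random


def run_random_episode(env, max_steps=100, seed=None):
    """Play one episode, choosing uniformly among the valid actions.

    Assumes reset() returns (observation, info), step() returns
    (observation, reward, done, truncated, info), and info lists the
    legal moves under "valid_actions".
    Returns the total reward and the number of steps taken.
    """
    rng = random.Random(seed)
    observation, info = env.reset()
    total_reward, steps = 0, 0
    for _ in range(max_steps):
        action = rng.choice(info["valid_actions"])  # never an illegal move
        observation, reward, done, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
        if done or truncated:  # solved, or the environment cut the episode short
            break
    return total_reward, steps
```

With the imports from the example above, this could be called as run_random_episode(gym.make('Puzzle3x3Fixed-v0')).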

We can also render the board visually.

This code:

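import time
for _ in range(20):
    # Random actions may be invalid; invalid moves leave the board unchanged.
    action = env_3x3_fixed.action_space.sample()
    observation, reward, done, truncated, info = env_3x3_fixed.step(action)
    env_3x3_fixed.render()  # draw the current board
    time.sleep(0.5)  # pause so each frame stays visible
    if done or truncated:  # stop once the episode ends
        break
env_3x3_fixed.close()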

Results in this visual output:

Installation

If you want to code an RL agent yourself, grab a keyboard, install the package from https://pypi.org/project/puzzle15Gym/ and have a go!
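As a possible starting point, here is a minimal tabular Q-learning sketch. It is not part of Puzzle15Gym: it relies only on the reset()/step() signatures and the "valid_actions" key from the usage example, and the hyperparameter values are arbitrary assumptions. Both exploration and the bootstrapped maximum are restricted to valid actions, so the agent does not waste steps on illegal moves.

```python
import random
from collections import defaultdict


def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1, seed=None):
    """Tabular Q-learning restricted to the valid actions reported in info.

    States are keyed by tuple(observation), so this only scales to small
    boards with few distinct positions (for example, a fixed 3x3 start).
    Returns a dict mapping (state, action) to an estimated return.
    """
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen (state, action) pairs start at 0.0

    def best_action(state, actions):
        return max(actions, key=lambda a: q[(state, a)])

    for _ in range(episodes):
        observation, info = env.reset()
        state = tuple(observation)
        done = truncated = False
        while not (done or truncated):
            actions = info["valid_actions"]
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = best_action(state, actions)  # exploit
            observation, reward, done, truncated, info = env.step(action)
            next_state = tuple(observation)
            # A solved puzzle is terminal (value 0); a truncated episode is not,
            # so we still bootstrap from the best valid next action.
            if done:
                future = 0.0
            else:
                future = max(q[(next_state, a)] for a in info["valid_actions"])
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q
```

For example, q = q_learning(gym.make('Puzzle3x3Fixed-v0'), episodes=2000). Because the table grows with the number of distinct boards seen, this approach is only a warm-up for larger puzzles.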

Custom AI gym lib for sliding block puzzle