Custom AI gym lib for sliding block puzzle

This post introduces a new custom AI gym library: https://pypi.org/project/puzzle15Gym/.
It is a library for playing the classic sliding block puzzle (https://en.wikipedia.org/wiki/15_puzzle) with AI agents.
Motivation
Introduce a way for people to learn reinforcement learning.
Encourage people to build more libraries like these.
Learn RL myself.
Purpose
Allow users to train RL agents on a simple game so they can feel more comfortable tackling complex problems.
Game intro
Let’s say we have a board:
The board holds the unique numbers 1 to 8 and one blank space.
Puzzle goal: through a series of moves, arrange the numbers in ascending order from left to right, top to bottom. The blank space must end up in the bottom-right corner.
Actions: Slide a block that is next to the blank space into it. The block and the blank swap positions.
Solution:
Note that the game is non-trivial. Even this 3x3 example took 16 moves. As the dimensions increase, solutions get much longer and harder to find.
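To make the rules concrete, here is a minimal sketch of the puzzle in plain Python, independent of any library. It assumes the blank is encoded as -1 and names each move by the direction the blank travels; `goal`, `neighbours` and `shortest_solution` are hypothetical helper names chosen for this illustration. The breadth-first search is only practical for small boards: a 3x3 board has 9!/2 = 181,440 reachable states, while a 4x4 board has 16!/2, roughly 10^13.

```python
from collections import deque

BLANK = -1  # assumption for this sketch: the blank is encoded as -1

# Each move is named after the direction the blank travels: (row delta, column delta).
MOVES = {"up": (-1, 0), "right": (0, 1), "down": (1, 0), "left": (0, -1)}


def goal(width, height):
    """Solved board: 1..n-1 in reading order, blank in the bottom-right corner."""
    return tuple(range(1, width * height)) + (BLANK,)


def neighbours(board, width, height):
    """Yield (move name, resulting board) for every legal slide."""
    blank = board.index(BLANK)
    row, col = divmod(blank, width)
    for name, (d_row, d_col) in MOVES.items():
        r, c = row + d_row, col + d_col
        if 0 <= r < height and 0 <= c < width:
            target = r * width + c
            new_board = list(board)
            # Swap the blank with the block it slides past.
            new_board[blank], new_board[target] = new_board[target], new_board[blank]
            yield name, tuple(new_board)


def shortest_solution(board, width, height):
    """Breadth-first search. Returns the list of blank moves, or None if unsolvable."""
    start, target = tuple(board), goal(width, height)
    parents = {start: None}  # state -> (previous state, move that led here)
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            moves = []
            while parents[current] is not None:
                current, name = parents[current]
                moves.append(name)
            return moves[::-1]
        for name, nxt in neighbours(current, width, height):
            if nxt not in parents:
                parents[nxt] = (current, name)
                queue.append(nxt)
    return None


# The blank starts in the bottom-left corner and has to travel right twice.
print(shortest_solution((1, 2, 3, 4, 5, 6, -1, 7, 8), 3, 3))
```

Because the search visits states in order of distance from the start, the first solution it finds is also a shortest one. Half of all tile arrangements cannot be solved at all, which is why the function can return None.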
Puzzle15 library
The Puzzle15Gym library is built on top of the Puzzle15 library.
Puzzle15 contains all the methods needed to create puzzles (random ones with a given width and height, or predefined ones), list the available moves, change the puzzle state by performing a move, and return the puzzle state.
Puzzle15Gym - custom AI gym lib for the puzzle
Puzzle15Gym is the library for training RL agents on the puzzle.
Git repository for the project: https://github.com/EvalVis/Puzzle15Gym.
The repository contains examples of how to initialize the environment and make moves.
Gym lib usage example
Let’s walk through some examples.
First let’s make a move:
import gym
import puzzle15Gym
env_3x3_fixed = gym.make('Puzzle3x3Fixed-v0')
observation, info = env_3x3_fixed.reset()
print(observation)
# output: [ 2 8 6 7 1 3 -1 5 4]
# So the board is like this:
# 2 8 6
# 7 1 3
# -1 5 4
# -1 means blank space.
print(info)
# output: {'valid_actions': [0, 1]}
# The blank space is in the bottom-left corner, so it can only move 0 (up) or 1 (right).
action = env_3x3_fixed.action_space.sample()
print(action)
# example output: 3 # This is an invalid action in this position.
# sample() draws from the whole action space, so it may return an invalid action.
# For a valid action use (after `import random`): action = random.choice(info["valid_actions"])
observation, reward, done, truncated, info = env_3x3_fixed.step(action)
print(observation)
# output: [ 2 8 6 7 1 3 -1 5 4] # An invalid action leaves the board unchanged.
print(reward)
# output: -2 # A reward of -2 for violating game rules.
print(done)
# output: False # Game is still not solved.
print(truncated)
# output: False # This is the first move. The game is truncated if it stays unsolved for too long.
print(info)
# output: {'valid_actions': [0, 1]} # Same position, same valid actions.
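The walkthrough above suggests two small helpers: the flat observation reads more naturally as a grid, and `info["valid_actions"]` lists the moves that avoid the penalty for breaking the rules. Below is a minimal sketch that relies only on the `reset()` and `step()` return shapes and the `valid_actions` key shown above. `format_board` and `run_valid_random_episode` are hypothetical helper names, not part of puzzle15Gym, and the default step limit of 200 is an arbitrary choice.

```python
import random


def format_board(observation, width, blank=-1):
    """Return the flat observation as text rows, showing the blank as '_'."""
    cells = ["_" if int(v) == blank else str(int(v)) for v in observation]
    rows = [cells[i:i + width] for i in range(0, len(cells), width)]
    return "\n".join(" ".join(f"{cell:>2}" for cell in row) for row in rows)


def run_valid_random_episode(env, max_steps=200, seed=None):
    """Play one episode, choosing only from info['valid_actions'].

    Returns (steps taken, total reward, solved flag).
    """
    rng = random.Random(seed)
    observation, info = env.reset()
    total_reward = 0
    for step in range(1, max_steps + 1):
        # Unlike action_space.sample(), this never picks an invalid move.
        action = rng.choice(info["valid_actions"])
        observation, reward, done, truncated, info = env.step(action)
        total_reward += reward
        if done or truncated:
            return step, total_reward, done
    return max_steps, total_reward, False


print(format_board([2, 8, 6, 7, 1, 3, -1, 5, 4], width=3))
```

With the library installed, the second helper could be called as `run_valid_random_episode(gym.make('Puzzle3x3Fixed-v0'))`. A purely random policy is unlikely to solve the puzzle quickly, so most runs will probably end at the step limit or on truncation.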
We can also render visually.
This code:
import time
for i in range(20):
    action = env_3x3_fixed.action_space.sample()
    observation, reward, done, truncated, info = env_3x3_fixed.step(action)
    env_3x3_fixed.render()
    time.sleep(0.5)
env_3x3_fixed.close()
Results in this visual output:
Campaign start
I will be coding an RL agent for this puzzle.
That is probably several months away, since I still have some other side projects to finish.
Let’s consider this post a prequel.
If you want to code an RL agent yourself, grab a keyboard, install https://pypi.org/project/puzzle15Gym/ and have a go!
Written by Evaldas Visockas
