Learning to Win: An Algorithm for Rock-Paper-Scissors (RPS) – Part 1


Introduction
Rock, Paper, Scissors (RPS) is a classic hand-based, zero-sum game where two players face off to determine a winner. Each player selects one of three possible hand signs: Rock, Paper, or Scissors. A simple set of rules determines the game's outcome: Rock crushes Scissors, Scissors cut Paper, and Paper covers Rock. Importantly, players make their choices simultaneously, without knowing the opponent's selection. Each round results in a win, loss, or draw, and RPS is often used as a fun and quick way to resolve disputes or make decisions when parties are at an impasse.
Beyond its everyday utility, RPS has intrigued researchers, statisticians, and scientists, inspiring numerous studies. The game's principles have been applied to understand competitive dynamics in ecological systems, economic market cycles, and the development of rule-based and machine learning algorithms (Zhou, 2016; Bédard-Couture and Kharma, 2019). For instance, Ali, Nakao, and Chen (2000) created a genetic algorithm for RPS, leveraging past game outcomes to inform future moves. More recently, Wu, Tang, Mitchell, and Li (2023) explored the strategic capabilities of large language models (LLMs) in games like RPS. These studies highlight the diverse applications of machine learning techniques in RPS.
In this article, I present a simple, greedy algorithm for playing Rock-Paper-Scissors. The algorithm is evaluated in two distinct experiments, each involving 10,000 games:
One in which the opponent selected actions uniformly at random,
And another in which the opponent exhibited a consistent bias toward a specific action.
Experiment Design
Hypothesis
IF the algorithm can accurately estimate the probabilities of the opponent's action choices, and the opponent exhibits a consistent bias toward certain actions;
THEN the algorithm will achieve a higher win rate than loss or draw rates;
ULTIMATELY supporting the conclusion that the proposed approach is effective for playing Rock-Paper-Scissors against biased opponents.
This hypothesis is tested in both scenarios.
Metrics
The following metrics were used to measure the performance of the algorithm in each scenario:
Win Rate, Loss Rate, and Draw Rate: The proportion of games won, lost, and drawn. A higher win rate is desirable, as it supports the hypothesis stated above.
Cumulative Rewards: The sum of per-round rewards, with wins, draws, and losses encoded as 1, 0, and -1 respectively. A total greater than 0 is desirable; a total of 0 or less is undesirable.
Errors in bias estimation: The difference between the action probabilities that the algorithm estimated internally and the actual distribution of the opponent's actions observed after the games were concluded. Mathematically defined as:
$$\boldsymbol{e} =\hat{\boldsymbol{p}} - \boldsymbol{p}$$
$$\begin{array}{ll} \text{where:} & \\ \hat{\boldsymbol{p}} & = \text{estimated probabilities of the opponent's actions} \\ \boldsymbol{p} & = \text{actual probabilities of the opponent's actions (known only after the games are concluded)} \end{array}$$
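As a rough illustration, these metrics could be computed from a per-round log as follows. This is a sketch under assumed names (`summarise`, `outcomes`, `estimated_probs`), not the author's actual code:

```python
from collections import Counter

ACTIONS = ["Rock", "Paper", "Scissors"]

def summarise(outcomes, opponent_actions, estimated_probs):
    """Summarise a finished run.

    outcomes         : list of per-round rewards, +1 win, 0 draw, -1 loss
    opponent_actions : list of the opponent's action per round
    estimated_probs  : the algorithm's final probability estimates (p-hat)
    """
    n = len(outcomes)
    win_rate = outcomes.count(1) / n
    draw_rate = outcomes.count(0) / n
    loss_rate = outcomes.count(-1) / n
    cumulative_reward = sum(outcomes)

    # Actual distribution p, known only after the games are concluded
    counts = Counter(opponent_actions)
    actual_probs = {a: counts[a] / n for a in ACTIONS}

    # Error in bias estimation: e = p_hat - p
    errors = {a: estimated_probs[a] - actual_probs[a] for a in ACTIONS}
    return win_rate, draw_rate, loss_rate, cumulative_reward, errors
```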
Algorithm Description
The following describes the characteristics and behaviours of the algorithm:
$$\begin{align*} &\text{Initialize:} \\ &P(\text{Rock}) = P(\text{Paper}) = P(\text{Scissors}) = 0.33 \\ &\text{For each round:} \\ &\quad \text{If first round:} \\ &\quad \quad \text{Choose Rock} \\ &\quad \text{Else:} \\ &\quad \quad \text{Update } P(\text{action}) \text{ based on opponent's previous actions} \\ &\quad \quad \text{Determine opponent's most probable action:} \\ &\quad \quad \quad \text{action}^* = \arg\max_{\text{action}} P(\text{action}) \\ &\quad \quad \text{Select action that defeats } \text{action}^* \end{align*}$$
The algorithm is designed to predict the opponent's choice by examining their previous playing patterns, which are represented as probabilities.
Initially, without any information about the opponent's behavior, the algorithm assumes each action (Rock, Paper, and Scissors) is equally likely, assigning each a probability of 0.33. Rock is therefore chosen arbitrarily in the first round.
As the game progresses, the algorithm develops more knowledge about the opponent’s behaviours by updating action probabilities regularly.
At each round, the algorithm:
Infers the opponent’s most likely action
Knowing the rules, it chooses and plays the action that beats the opponent's predicted action
Thereafter, it observes the opponent's action and updates its internal estimates accordingly.
The probability of each opponent action is updated as follows:
$$p_a = \frac{h_a}{H}$$
$$\begin{array}{ll} \text{where:} & \\ p_a & = \text{Estimated probability that the opponent plays action } a \\ h_a & = \text{Number of times the opponent has played action } a \\ H & = \text{Total number of observed plays} \end{array}$$
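To make the procedure concrete, here is a minimal sketch of the agent in Python. The class and method names (`GreedyAgent`, `act`, `observe`) are illustrative assumptions, not the author's implementation:

```python
from collections import Counter

# Maps each action to the action that defeats it
BEATS = {"Rock": "Paper", "Paper": "Scissors", "Scissors": "Rock"}

class GreedyAgent:
    def __init__(self):
        # No observations yet; this corresponds to the uniform prior of ~0.33 per action
        self.counts = Counter()
        self.total = 0

    def act(self):
        if self.total == 0:
            return "Rock"  # first round: arbitrary choice
        # argmax over p_a = h_a / H; dividing by H does not change the argmax
        predicted = max(self.counts, key=self.counts.get)
        # Play the action that defeats the opponent's most probable action
        return BEATS[predicted]

    def observe(self, opponent_action):
        # Update the counts after the opponent's move is revealed
        self.counts[opponent_action] += 1
        self.total += 1
```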
Results
Experiment 1: Random Player Opponent
The Random Player opponent in this experiment has no bias toward any particular action, so over many rounds its choices should follow a uniform distribution. This was indeed the behaviour observed in this experiment.
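A sketch of how such an experiment could be wired up is shown below; the `play` helper and the `GreedyAgent` class from the earlier sketch are illustrative names, and the exact setup is an assumption rather than the author's code:

```python
import random

ACTIONS = ["Rock", "Paper", "Scissors"]
BEATS = {"Rock": "Paper", "Paper": "Scissors", "Scissors": "Rock"}

def play(agent, opponent_move, rounds=10_000):
    """Play `rounds` games and return the per-round rewards (+1, 0, -1)."""
    outcomes = []
    for _ in range(rounds):
        my_move = agent.act()
        opp_move = opponent_move()
        agent.observe(opp_move)
        if my_move == opp_move:
            outcomes.append(0)            # draw
        elif BEATS[opp_move] == my_move:
            outcomes.append(1)            # win: my move beats theirs
        else:
            outcomes.append(-1)           # loss
    return outcomes

# Experiment 1: an opponent with no bias, choosing uniformly at random
random_opponent = lambda: random.choice(ACTIONS)
# outcomes = play(GreedyAgent(), random_opponent)
```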
The results are summarized below:
Win Rate, Draw Rate, and Loss Rate
Win Rate: 33.05%
Draw Rate: 33.47%
Loss Rate: 33.48%
Cumulative Rewards
-43
Error in Bias Estimation
Rock: -0.000066566657
Paper: -0.000035333533
Scissors: -0.0000331233123
I also ran another set of 10,000 games against the same opponent; the results are interesting to observe as well.
Win Rate, Draw Rate, and Loss Rate
Win Rate: 33.59%
Draw Rate: 33.49%
Loss Rate: 32.92%
Cumulative Rewards
67
Error in Bias Estimation
Rock: -0.0000325832583
Paper: -0.0000665966597
Scissors: -0.0000340134013
Interpretation
The algorithm did not come out ahead in the first set of 10,000 games, even though its estimated action probabilities closely matched the actual distribution observed after the game. The opponent's behavior followed a uniform distribution, playing each action with approximately equal frequency. Despite this, short-term variations in move frequencies — inherent to any random process — created what appeared to be patterns. The algorithm responded to these fluctuations, adapting continuously as if tracking meaningful changes in strategy — a dynamic resembling a game of cat and mouse.
These “naive” adaptations resulted in suboptimal performance, leading to a net loss. In the second run, similar fluctuations happened to align more favorably, producing a cumulative win. This variation is expected and consistent with the stochastic nature of the opponent’s strategy.
Overall, these results demonstrate that a uniform distribution of actions is inherently difficult to exploit. Within the context of this algorithm, the lack of consistent bias in the opponent’s behavior means there is no persistent pattern to learn — only noise. As such, performance will naturally vary, depending on how the algorithm responds to incidental short-term deviations in the opponent's play.
What happens when there is a bias to exploit?
Experiment 2: Opponent with a Bias for a Certain Action
Win Rate, Draw Rate, and Loss Rate
Win Rate: 61.05%
Draw Rate: 19.67%
Loss Rate: 19.28%
Cumulative Rewards
4,138
Error in Bias Estimation
Rock: -0.0000389538954
Paper: -0.0000192819282
Scissors: -0.0000196719672
As before, I ran another set of 10,000 games; here are the results.
Win Rate, Draw Rate, and Loss Rate
Win Rate: 59.64%
Draw Rate: 20.39%
Loss Rate: 19.99%
Cumulative Rewards
3,925
Error in Bias Estimation
Rock: -0.0000403340334
Paper: -0.0000199219922
Scissors: -0.0000204120412
Interpretation
In this round of experiments, the opponent exhibited a clear bias toward the action Rock, selecting it approximately 61% of the time. The algorithm successfully detected and capitalized on this bias, significantly increasing its cumulative rewards. Notably, the algorithm also achieved very low error in its probability estimates, highlighting its ability to accurately model opponent behavior when a consistent bias is present. This confirms that the algorithm not only adapts effectively but also improves its prediction accuracy when the environment contains exploitable structure.
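For illustration, an opponent with this kind of bias could be simulated with a weighted random choice, roughly as below. This is a sketch: the exact weights are an assumption based on the observed ~61% preference for Rock, not the author's configuration:

```python
import random

ACTIONS = ["Rock", "Paper", "Scissors"]

def biased_opponent():
    # Weighted sampling: Rock is chosen roughly 61% of the time (assumed weights)
    return random.choices(ACTIONS, weights=[0.61, 0.20, 0.19])[0]
```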
Limitations of this approach
The algorithm relies on hardcoded rules, which limit its flexibility and generalization. As a result, it cannot be easily adapted to other games or dynamic environments without manual intervention.
The algorithm cannot learn the game mechanics autonomously. It does not improve its understanding of the environment through gameplay, making it dependent on prior knowledge and hardcoded rules rather than experiential learning.
Opponent modeling is limited to a single-step prediction. The algorithm only considers the most likely next action, without accounting for sequences or conditional behavior. This restricts its strategic depth and responsiveness to more complex opponent patterns.
The underlying decision logic is relatively simple and predictable, making it potentially exploitable by more intelligent or adaptive opponents who recognize its behavior and counter it effectively.
Conclusion
This exploration into a simple algorithm for Rock, Paper, Scissors (RPS) demonstrates the intriguing dynamics of strategic decision-making in a seemingly simple game. The experiments conducted reveal the challenges and opportunities inherent in predicting and exploiting opponent behavior. In scenarios where the opponent's actions are uniformly distributed, the algorithm's performance fluctuates around an expected value, highlighting the difficulty of exploiting randomness. However, when a bias is present, the algorithm effectively adapts and capitalizes on this predictable behavior, significantly improving its win rate and cumulative rewards. These findings underscore the value of detecting and exploiting patterns in competitive environments.
In a sequel to this piece, I will address the identified limitations and build a more robust algorithm.
References
Zhou, H. J. (2016). The rock–paper–scissors game. Contemporary Physics, 57(2), 151-163.
Bédard-Couture, R., & Kharma, N. N. (2019, September). Playing Iterated Rock-Paper-Scissors with an Evolutionary Algorithm. In IJCCI (pp. 205-212).
Ali, F. F., Nakao, Z., & Chen, Y. W. (2000, July). Playing the rock-paper-scissors game with a genetic algorithm. In Proceedings of the 2000 Congress on Evolutionary Computation. CEC00 (Cat. No. 00TH8512) (Vol. 1, pp. 741-745). IEEE.
Wu, Y., Tang, X., Mitchell, T. M., & Li, Y. (2023). SmartPlay: A benchmark for LLMs as intelligent agents. arXiv preprint arXiv:2310.01557.