Chapter 3 Building Models of Strategic Decision-Making
You’ve just played Matching Pennies and discussed strategies with your classmates. You probably noticed patterns in your opponent’s play, tried to be unpredictable, and maybe even changed your strategy mid-game. This chapter helps you translate those observations and intuitions into the language of cognitive modeling—preparing you to implement formal models in the next chapter.
3.1 Learning Goals
This chapter bridges the gap between observing behavior and developing testable theories. By the end of this chapter, using the Matching Pennies game as a case study, you will be able to:
- Identify Key Modeling Steps: Understand the process of moving from behavioral observations and participant reflections to formulating initial verbal theories of underlying cognitive strategies.
- Appreciate Theory Building Challenges: Recognize common issues in theory development, such as the participant vs. researcher perspective, the need for simplification, and incorporating known cognitive constraints.
- Conceptualize Learning Mechanisms: Propose distinct models (e.g., random choice, heuristics, RL) and organize them into a unified framework based on how they process prediction errors.
- Connect to Formalization: Understand the “Update Rule” (\(\text{New} = \text{Old} + \text{Learning Rate} \times \text{Error}\)) as a universal grammar of learning that connects simple heuristics to complex cognitive strategies.
3.2 Introduction: Observing Behavior to Theorize Mechanisms
Chapter 1 emphasized the importance of modeling underlying generative mechanisms. To do this for cognition, we first need a behavior to explain. This chapter uses the Matching Pennies game as our initial cognitive phenomenon. It’s a simple strategic interaction, yet rich enough to illustrate the process of developing and refining cognitive models.
Our goal here is not yet to build the final computational models, but to practice the crucial preceding steps:
Observing behavior in a specific task (through experiments and data exploration).
Reflecting on potential cognitive strategies and constraints (drawing on observations, participant reports, and cognitive science principles).
Formulating initial verbal theories or candidate models that describe the potential underlying mechanisms.
This process lays the groundwork for the next chapter, where we will translate these verbal ideas into precise, formal models ready for simulation and testing.
3.3 The Matching Pennies Game
In the Matching Pennies game, two players engage in a series of choices: one player attempts to match the other’s choice, while the other aims to achieve a mismatch, and the two play repeatedly against each other. This is a prototypical example of the interactive behaviors usually tackled by game theory, and it raises issues of theory of mind and recursivity.
For an introduction, see: Waade, P. T., et al. (2023). Introducing tomsup: Theory of mind simulations using Python. Behavior Research Methods, 55(5), 2197–2231.
For fun data involving different kinds of primates playing the game, see: Devaine, M., San-Galli, A., Trapanese, C., Bardino, G., Hano, C., Saint Jalme, M., … & Daunizeau, J. (2017). Reading wild minds: A computational assay of theory of mind sophistication across seven primate species. PLoS Computational Biology, 13(11), e1005833. The data is available in Assignment 2 (for the students at AU/CogSci).
3.4 Game Structure
The game proceeds as follows:
- Two players sit facing each other
- Each round, both players choose either “left” or “right” to indicate where they believe a penny is hidden
- The matcher wins by choosing the same hand as their opponent
- The hider wins by choosing the opposite hand
- Points are awarded: +1 for winning, -1 (or 0, depending on the version) for losing
- Repeat
This simple structure creates a rich environment for studying decision-making strategies, learning, and adaptation.
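To make the payoff structure concrete, here is a minimal R sketch of the scoring rule in the +1/-1 version (the function name matching_pennies_payoff and the left/right coding are just illustrative choices):

# Minimal sketch of the Matching Pennies payoff rule (+1 win / -1 loss version).
# choice_matcher and choice_hider are coded as "left" or "right".
matching_pennies_payoff <- function(choice_matcher, choice_hider) {
  matcher_wins <- choice_matcher == choice_hider   # the matcher wins on a match
  c(
    matcher = ifelse(matcher_wins, 1, -1),
    hider   = ifelse(matcher_wins, -1, 1)
  )
}

matching_pennies_payoff("left", "right")   # example round: the matcher loses, the hider wins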
3.5 Empirical Investigation
3.5.1 Data Collection Protocol
If you are attending my class you have been (or will be) asked to participate in a matching pennies game. This game provides the foundation for our modeling efforts. By observing gameplay and collecting data, we can develop models that capture the cognitive processes underlying decision-making in strategic situations.
Participants play 30 rounds as the matcher and 30 rounds as the hider, allowing us to observe behavior in both roles. While playing, participants track their scores, which provides quantitative data for later analysis. Participants are also asked to reflect on their own strategies and on the strategies they believe their opponents are using, as these reflections provide valuable material for building models.
3.5.2 Initial Observations
Through the careful observation and discussion of gameplay we do in class, several patterns typically emerge. For instance, players often demonstrate strategic adaptation, adjusting their choices based on their opponent’s previous moves. They may attempt to identify patterns in their opponent’s behavior while trying to make their own choices less predictable. The tension between exploitation of perceived patterns and maintenance of unpredictability creates fascinating dynamics for modeling.
3.6 Empirical explorations
Below you can observe how a previous year of CogSci students did against bots (computational agents) playing according to different strategies. Look at the plots below, where the x axis indicates trial, the y axis how many points the CogSci’ers scored (0 being chance, negative meaning being completely owned by the bots, positive owning the bot), and the different colors indicate the different strategies employed by the bots:
- Strategy “-2” was a Win-Stay-Lose-Shift bot: when it got a +1, it repeated its previous move (e.g., right if it had just played right); otherwise it performed the opposite move (e.g., left if it had just played right).
- Strategy “-1” was a biased Nash bot, playing “right” 80% of the time.
- Strategy “0” was a reinforcement learning bot.
- Strategy “1” was a bot assuming you were playing according to a reinforcement learning strategy and trying to infer your learning rate and temperature parameters.
- Strategy “2” was a bot assuming you were following strategy “1” and trying to infer your parameters accordingly.
library(tidyverse)

# --- 1. Data Loading / Generation ---
data_path <- file.path("data", "MP_MSc_CogSci22.csv")

if (file.exists(data_path)) {
  d <- read_csv(data_path)
} else {
  # Generate synthetic data if file is missing (Reproducibility check)
  set.seed(42)
  n_students <- 20
  n_trials <- 30
  strategies <- c(-2, -1, 0, 1, 2)

  d <- expand_grid(
    ID = factor(1:n_students),
    BotStrategy = strategies,
    Role = c(0, 1), # 0 = Matcher, 1 = Hider
    Trial = 1:n_trials
  ) %>%
    mutate(
      # Random payoffs for demonstration
      Payoff = sample(c(-1, 1), n(), replace = TRUE, prob = c(0.45, 0.55))
    )

  warning("Using synthetic data for demonstration.")
}

# --- 2. Data Cleaning (Crucial Step!) ---
# Map cryptic codes to human-readable labels
bot_labels <- c(
  "-2" = "WSLS Bot",
  "-1" = "Bias Bot (80%)",
  "0"  = "RL Bot",
  "1"  = "ToM-1 Bot",
  "2"  = "ToM-2 Bot"
)

d_clean <- d %>%
  mutate(
    # Make BotStrategy a factor with meaningful names
    BotStrategy = factor(BotStrategy,
                         levels = names(bot_labels),
                         labels = bot_labels),
    # Make Role a factor
    Role = factor(Role, levels = c(0, 1), labels = c("Matcher", "Hider"))
  )

# --- 3. Plot Collective Performance ---
ggplot(d_clean, aes(x = Trial, y = Payoff, color = BotStrategy)) +
  geom_smooth(se = FALSE, method = "loess", span = 0.5) +
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
  facet_wrap(~Role) +
  labs(
    title = "Human vs. Machine: Average Performance",
    subtitle = "Positive values indicate humans winning against bots",
    y = "Average Payoff",
    color = "Opponent Strategy"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
That doesn’t look too good, eh? What about individual variability? In the plot below we show the total score of each of the former students against the different bots.
# --- Plot 2: Individual Variability in Scores ---
# Calculate the total score for each student (ID) against each bot strategy.
# Note: we use d_clean so that bot strategies and roles carry their readable labels.
d_summary <- d_clean %>%
  group_by(ID, BotStrategy, Role) %>%                    # Group by student, bot, and role
  summarize(TotalScore = sum(Payoff), .groups = "drop")  # Calculate total score

# Visualize the distribution of total scores for each bot strategy.
# geom_boxplot shows the distribution, geom_jitter shows individual student scores.
print(
  ggplot(d_summary, aes(x = BotStrategy, y = TotalScore)) +
    geom_boxplot(aes(fill = Role), alpha = 0.3, outlier.shape = NA) +  # Boxplot showing distribution
    geom_jitter(aes(color = ID), width = 0.2, alpha = 0.7) +           # Individual student points
    facet_wrap(~Role) +                                                # Separate plots for Matcher and Hider
    labs(
      title = "Distribution of Total Scores Against Different Bots",
      subtitle = "Shows individual student variability",
      x = "Bot Strategy",
      y = "Total Score (Sum of Payoffs)",
      fill = "Player Role"
    ) +
    theme_classic() +
    theme(legend.position = "none")  # Hide legend for individual IDs
)
3.6.1 From Observation to Theory: Identifying Potential Mechanisms
The plots above reveal patterns: average performance changes over time, varies by opponent, and differs across individuals. Gameplay observations and participant reflections (from class discussion or collected data) add qualitative insights – perhaps players mention trying to be unpredictable, guessing opponent biases, or repeating winning moves.
The crucial next step is to distill these rich, complex observations into simplified, plausible mechanisms or strategies. This involves abstraction:
Identifying Core Patterns: What recurring behaviors seem most important? (e.g., reacting to wins/losses, tracking opponent frequencies).
Simplifying: Can we capture the essence of a strategy without modeling every detail of a player’s thought process or interaction? (e.g., modeling just previous behaviors as possible inputs, instead of the complex patterns in facial expressions or movements that real humans likely use when trying to guess which hand has the penny).
Drawing on Cognitive Principles: How do known cognitive constraints (like limited memory or processing errors) shape plausible strategies?
For instance, observing that players often change their choice after a loss might lead us to propose a “If Lose Then Shift” component as part of a candidate model. Observing that performance differs against biased vs. adaptive bots suggests players might be trying to learn or adapt, leading to memory-based or learning models.
This process generates verbal models – initial hypotheses about the strategies at play. Key modeling considerations guide this translation:
What information do players likely use? (Own past choices? Opponent’s choices? Payoffs?)
How far back does memory plausibly extend? (Last trial? Last 5 trials? Exponential decay?)
What is the role of randomness? (True randomness? Exploration? Implementation errors?)
How might strategies adapt over time or differ between individuals?
Answering these helps refine our verbal models, paving the way for formalization. The goal isn’t to capture everything, but to propose distinct, testable mechanisms.
3.6.2 The distinction between participant and researcher perspectives
As participants we might not be aware of the strategy we use, or we might believe something erroneous. The exercise here is to act as researchers: what are the principles underlying the participants’ behaviors, no matter what the participants know or believe? Note that talking to participants and being participants helps in developing ideas, but it is not the end point of the process. Also note that as cognitive scientists we can rely on what we have learned about cognitive processes (e.g., memory).
Another important component of the distinction is that participants live in a rich world: they rely on facial expressions and bodily posture, they switch strategies, etc. The researcher, on the other hand, is trying to identify one or at most a few “simple” strategies. Rich bodily interactions and mixtures or sequences of multiple strategies are a poor starting point for building your first model, and they are often quite difficult to fit to empirical data. Nevertheless, they are important intuitions that the researcher should (eventually?) accommodate.
3.7 Candidate Models: A Hierarchy of Cognitive Complexity
Based on behavioral observations and cognitive principles, the many class discussions over the years have produced a variety of models, which I here organize into a hierarchy based on their conception of learning. We start with simple agents that ignore history, move to learning agents that track average outcomes, then also uncertainty and volatility, and conclude with strategic agents that model the opponent’s mind. To make their conceptual continuity more explicit, I couldn’t resist adding some discussion of the math behind the verbal intuitions and pointing out how each model can be recast in terms of an update rule: \[\text{New Belief} = \text{Old Belief} + \text{Learning Rate} \cdot \text{Prediction Error}\] This format unifies the different models under a common mathematical framework, making it easier to understand their relationships and differences.
Also note that in the formula above, ‘Belief’ is a placeholder. If we are modeling a coin flipper, the belief is a probability (\(\theta\)). If we are modeling betting, it might be an expected reward value (\(V\)). If we are modeling playing strategies, it might be a prediction of the opponent’s move. As should be obvious, these are partially overlapping conceptualizations.
3.7.1 Level 0: The Static Agent (No Learning)
The simplest assumption is that the agent does not learn or adapt to the opponent at all. They simply act according to a fixed internal preference.
3.7.2 The Biased Agent (Random Choice):
Concept: The agent is a biased coin flipper. They have a static preference (e.g., “I generally prefer Right”) that does not change over time.
Traditional Formulation (Bernoulli Process): We model the choice \(y_t\) at trial \(t\) as a draw from a Bernoulli distribution with a fixed rate parameter \(\theta\): \[y_t \sim \text{Bernoulli}(\theta)\]
The “Update” Rule: Because this agent is static, there is no update; in other words, the learning rate is 0. The parameter \(\theta\) remains constant regardless of wins or losses. \[\theta_{t+1} = \theta_{t} + \text{Learning Rate} \cdot \text{Prediction Error}\] where \(\text{Learning Rate} = 0\), so the formula reduces to \[\theta_{t+1} = \theta_{t}\]
Variable Definitions:
- \(y_t \in \{0, 1\}\): The choice (e.g., Left/Right).
- \(\theta \in [0, 1]\): The fixed probability of choosing 1 (bias).
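As a first preview of the simulations we will build in the next chapter, a biased agent can be simulated in a couple of lines of R (theta = 0.7 and n_trials = 120 are arbitrary illustrative values):

set.seed(1)
n_trials <- 120
theta <- 0.7                                          # fixed bias towards choosing 1 ("Right")
choices <- rbinom(n_trials, size = 1, prob = theta)   # y_t ~ Bernoulli(theta), no updating
mean(choices)                                         # close to theta: the bias never changes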
Looking Ahead: This model might seem too simple to be useful, but never underestimate the power of null models! In a study using k-ToM models (see below), we realized that for k = 6 or above, the model behaved in a way that was practically indistinguishable from a biased agent. A simple model also allows us to build skills in reasonable steps. In the next chapter, we will write code to simulate data according to these models, and in the following chapter we will learn how to take a sequence of choices and mathematically work backwards to find the most likely values of the model parameters. In all these cases, being able to start with a simple model is crucial to build intuition and skills before moving to more complex models.
3.7.3 Level 1: Heuristics & The Immediate Past
The next step up is an agent that learns minimally: it reacts to feedback but has a “memory” of only one trial.
3.7.3.1 Deterministic Win-Stay-Lose-Shift (WSLS):
Concept: A rigid heuristic: “If I won, I do the same thing. If I lost, I switch.” This can be viewed as an agent with zero memory who effectively “re-learns” the world from scratch after every single piece of feedback.
Traditional Formulation (Conditional Probability): \[P(y_t = y_{t-1} | outcome_{t-1}) = \begin{cases} 1 & \text{if } outcome_{t-1} = \text{win} \\ 0 & \text{if } outcome_{t-1} = \text{loss} \end{cases}\]
The “Update” Rule: We can treat the “Probability of Staying” (\(P_{stay}\)) as a value that is updated trial-by-trial.\[P_{stay, t+1} = P_{stay, t} + \text{Learning Rate} \cdot (Outcome_t - P_{stay, t})\] Where \(\text{Learning Rate} = 1\). Because the agent learns instantly from the immediate past, the formula reduces to: \[P_{stay, t+1} = P_{stay, t} + 1 \cdot (Outcome_t - P_{stay, t})\]\[P_{stay, t+1} = Outcome_t\]
- Implication: If the outcome was a Win (\(1\)), the probability of staying becomes \(1\). If the outcome was a Loss (\(0\)), the probability of staying becomes \(0\). This mathematically formalizes the idea of “instant forgetting.”
Variable Definitions:
- \(outcome_{t-1} \in \{0, 1\}\): The feedback from the previous trial, where 1 = Win and 0 = Loss.
Note: The Physics View (Ising Formulation) [only for nerds]: While we typically code choices as 0/1 for logistic regression, physicists and network theorists often code binary states as “spins”: -1 (Left) and +1 (Right). If we also code the outcome as -1 (Loss) and +1 (Win), the WSLS update rule becomes mathematically elegant: \[y_t = outcome_{t-1} \cdot y_{t-1}\] Why does this work? If Win (+1): The equation becomes \(y_t = 1 \cdot y_{t-1}\). The sign stays the same (Stay). If Loss (-1): The equation becomes \(y_t = -1 \cdot y_{t-1}\). The sign flips (Shift). This formulation highlights that “shifting” is mathematically equivalent to sign inversion. This connects cognitive shifting to “spin glass” or Ising models in physics (e.g., Stephens & Bialek, 2010), and it is useful to know about because it sometimes simplifies the mathematical equations of your models.
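A minimal R sketch of the deterministic rule, assuming choices coded as 0/1 and outcomes coded as 1 = win, 0 = loss (the function name wsls_choice is just an illustrative label):

# Deterministic Win-Stay-Lose-Shift: repeat the previous choice after a win, flip it after a loss.
wsls_choice <- function(prev_choice, prev_outcome) {
  if (prev_outcome == 1) prev_choice else 1 - prev_choice
}

wsls_choice(prev_choice = 1, prev_outcome = 1)   # won playing 1 -> stay with 1
wsls_choice(prev_choice = 1, prev_outcome = 0)   # lost playing 1 -> shift to 0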
3.7.3.2 Probabilistic Win-Stay-Lose-Shift (Logistic Regression):
Concept: We relax the strict rule. The agent is more likely to stay after a win and shift after a loss, but real behavior is noisy (stochastic, probabilistic).
Traditional Formulation (GLM): We define the probability of “staying” (\(P_{stay}\)) as a function of the previous outcome using a logistic curve. \[P(stay_t) = \text{logit}^{-1}(\beta_0 + \beta_1 \cdot outcome_{t-1})\] Note that we can think of the reactions to feedback as symmetric (the probability of staying after a win equal to the probability of shifting after a loss) or asymmetric (e.g., the agent might be more sensitive to losses than to wins, or vice versa). In the latter case, we would have an equation like: \[P(stay_t) = \text{logit}^{-1}(\beta_0 + \beta_1 \cdot win_{t-1} + \beta_2 \cdot loss_{t-1})\]
The “Update” Rule: To fit our hierarchy, we treat the agent’s internal Decision Value (\(Q\)) as the state being updated. \[Q_{t+1} = Q_{t} + \text{Learning Rate} \cdot (\text{Target}_t - Q_{t})\] Where \(\text{Learning Rate} = 1\) (instant update) and the Target is the weighted outcome (\(\beta_0 + \beta_1 \cdot outcome_t\)). Reduction: \[Q_{t+1} = \beta_0 + \beta_1 \cdot outcome_t\] Implication: Just like the deterministic agent, this agent “forgets” the distant past instantly. However, a “Win” does not guarantee a Stay (\(P=1\)); it simply pushes the decision value to a high—but not infinite—number.
Variable Definitions:
- \(Q\): The internal log-odds of staying.
- \(\text{logit}^{-1}(x) = \frac{1}{1+e^{-x}}\): The sigmoid function mapping value to probability.
- \(\beta_1\): Sensitivity to feedback. High \(\beta_1\) approximates the deterministic heuristic; \(\beta_1 \approx 0\) implies the agent ignores feedback.
(See, e.g., Zhang & Lee, 2010; Worthy et al., 2013.)
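Here is a minimal sketch of the probabilistic version in R, using the 0/1 outcome coding above; the values beta0 = -2 and beta1 = 4 are arbitrary illustrative choices that happen to produce symmetric stay/shift probabilities:

# P(stay) = inverse-logit(beta0 + beta1 * outcome), with outcome coded 1 = win, 0 = loss.
prob_wsls_choice <- function(prev_choice, prev_outcome, beta0 = -2, beta1 = 4) {
  p_stay <- plogis(beta0 + beta1 * prev_outcome)   # plogis() is R's built-in inverse logit
  stay <- rbinom(1, 1, p_stay)                     # noisy implementation of the heuristic
  if (stay == 1) prev_choice else 1 - prev_choice
}

plogis(-2 + 4 * 1)   # P(stay | win)  is roughly 0.88
plogis(-2 + 4 * 0)   # P(stay | loss) is roughly 0.12
set.seed(2)
prob_wsls_choice(prev_choice = 1, prev_outcome = 0)   # most likely shifts to 0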
3.7.4 Level 2: Integrating History (The Moving Average)
Real agents usually care about more than just the last trial. They integrate history over time to form an expectation.
3.7.4.1 The Moving Average (Windowed Memory):
Concept: “I think the future will look like the average of the recent past.”
Traditional Formulation (Batch Average): \[V_{t} = \frac{1}{N} \sum_{i=0}^{N-1} Outcome_{t-i}\]
Transformation to Update Rule: To avoid recalculating the sum from scratch, we can update the average by adding the new outcome and removing the oldest outcome.\[V_{t} = V_{t-1} + \frac{1}{N} \cdot (Outcome_{t} - Outcome_{t-N})\]
Critique (The Memory Problem): This formulation reveals a massive cognitive cost. To perform this update, the agent cannot just store the current average (\(V_{t-1}\)); they must remember exactly what happened \(N\) trials ago (\(Outcome_{t-N}\)) to “remove” it from the sum. Further, a memory from 10 trials ago is recalled perfectly, but a memory from 11 trials ago vanishes instantly. This is biologically implausible.
- The Mismatch: This does not fit the standard “Prediction Error” update rule. It compares the new reality (\(Outcome_t\)) to the oldest reality (\(Outcome_{t-N}\)), rather than comparing reality to expectation.
Variable Definitions:
- \(V_t\): The estimated value (probability of winning/Right) at trial \(t\).
- \(N\): The window size (e.g., 10 trials).
- \(Outcome_t \in \{0, 1\}\): The outcome at trial \(t\).
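A minimal sketch contrasting the batch average with its incremental form, which makes the memory cost explicit (the window size N = 5 and the synthetic outcome sequence are illustrative):

set.seed(3)
outcomes <- rbinom(40, 1, 0.7)   # synthetic 0/1 outcomes
N <- 5                           # window size

# Batch form: recompute the mean of the last N outcomes at every trial.
v_batch <- sapply(N:length(outcomes), function(t) mean(outcomes[(t - N + 1):t]))

# Incremental form: V_t = V_{t-1} + (1/N) * (Outcome_t - Outcome_{t-N}).
# Note that we still need to store the raw outcome from N trials ago.
v_inc <- numeric(length(outcomes))
v_inc[N] <- mean(outcomes[1:N])
for (t in (N + 1):length(outcomes)) {
  v_inc[t] <- v_inc[t - 1] + (1 / N) * (outcomes[t] - outcomes[t - N])
}

all.equal(v_batch, v_inc[N:length(outcomes)])   # TRUE: the two forms are identical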
3.7.5 Level 3: A more elegant memory decay mechanism
Reinforcement Learning (RL) is the mathematically elegant solution to the Moving Average’s memory problem. It replaces the “perfect memory of the last \(N\) trials” with a “fading memory of everything”.
3.7.5.1 From Moving Average to RL
How do we fix the memory problem in Level 2? The agent cannot remember the specific outcome \(Outcome_{t-N}\). So, they make a guess: “The outcome \(N\) trials ago was probably similar to my current average value.” Mathematically, we substitute \(Outcome_{t-N} \approx V_{t-1}\). \[V_{t} = V_{t-1} + \frac{1}{N} (Outcome_{t} - \mathbf{V_{t-1}})\] If we rename \(\frac{1}{N}\) to \(\alpha\) (alpha), we get the standard RL rule.
3.7.5.2 Reinforcement Learning (Rescorla-Wagner):
Traditional Formulation (Exponentially Weighted Average): Instead of a sum over the last \(N\) trials, the value is a weighted sum of all past history, where weights decay geometrically. \[V_{t} = (1-\alpha) \cdot V_{t-1} + \alpha \cdot Reward_{t}\]
Transformation to Update Rule (The Delta Rule): By rearranging terms, we get the update format I used as a template for all models: \[V_{t} = V_{t-1} + \alpha \cdot (Reward_{t} - V_{t-1})\]
Why is this “Learning”? The term \((Reward_{t} - V_{t-1})\) is the Prediction Error (PE). The agent compares reality (\(Reward\)) to their expectation (\(V\)) and nudges their belief by a step size \(\alpha\).
Variable Definitions:
- \(V_t\): The estimated value (e.g., probability of win) at trial \(t\).
- \(\alpha \in [0, 1]\): The Learning Rate. High \(\alpha\) = fast forgetting; Low \(\alpha\) = long memory.
- \(PE\): The difference between what happened and what was expected.
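A minimal sketch of the delta rule in R, assuming an illustrative learning rate alpha = 0.1 and a synthetic reward sequence generated from a 70% source:

set.seed(4)
rewards <- rbinom(60, 1, 0.7)    # synthetic 0/1 rewards from a 70% source
alpha <- 0.1                     # learning rate
V <- numeric(length(rewards) + 1)
V[1] <- 0.5                      # initial expectation

for (t in seq_along(rewards)) {
  pe <- rewards[t] - V[t]        # prediction error: reality minus expectation
  V[t + 1] <- V[t] + alpha * pe  # V_t = V_{t-1} + alpha * PE
}

tail(V, 1)   # the estimate drifts towards the true rate of ~0.7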
3.7.6 Level 4: The Bayesian Update (Static Uncertainty)
In RL, the learning rate \(\alpha\) is a fixed number. In Bayesian inference, the “learning rate” becomes adaptive based on uncertainty.
Concept: The agent tracks not just the value, but the uncertainty in estimating that value. They weigh new evidence against their prior belief based on how noisy the evidence is versus how solid their prior is.
Traditional Formulation: We combine a Prior distribution (\(\mathcal{N}(\mu_{prior}, \sigma^2_{prior})\)) with Likelihood (\(\mathcal{N}(Outcome, \sigma^2_{noise})\)) to get a Posterior.\[\text{Posterior Mean} = \frac{\sigma^2_{noise}\mu_{t-1} + \sigma^2_{t-1}Outcome_t}{\sigma^2_{noise} + \sigma^2_{t-1}}\]
Transformation to Update Rule (Kalman Gain): With some algebra, we can rearrange the posterior mean into our standard “Update + Error” format. \[\mu_{t} = \mu_{t-1} + K_t \cdot (Outcome_t - \mu_{t-1})\] Here \(K_t\), which plays the role of the learning rate, is called the Kalman gain: \[K_t = \frac{\sigma^{2}_{t-1}}{\sigma^{2}_{t-1} + \sigma^{2}_{noise}}\]
Variable Definitions:
- \(\mu_t\): The estimated value (mean belief) at trial \(t\).
- \(\sigma^2_{t-1}\): The uncertainty of the agent’s belief before seeing the outcome (Prior Variance).
- \(\sigma^2_{noise}\): The uncertainty/noise of the environment (Likelihood Variance).
- \(Outcome_t\): The observation at trial \(t\).
Critique (The Stopping Problem):
- The Match: This looks exactly like RL, where \(K_t\) replaces \(\alpha\).
- The Difference: In this model, the uncertainty \(\sigma^2_{t-1}\) decreases after every observation (the agent gets more confident). As \(\sigma^2_{t-1} \to 0\), the gain \(K_t \to 0\).
- Implication: The agent eventually stops learning because they become “sure” they know the truth. This is optimal for static worlds, but disastrous if the opponent changes strategy.
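To see the stopping problem concretely, here is a minimal sketch of the static Gaussian update, tracking how the gain shrinks across trials (the prior, noise, and observation values are arbitrary illustrative choices):

set.seed(5)
outcomes <- rnorm(50, mean = 1, sd = 1)   # noisy observations of a fixed hidden value
sigma2_noise <- 1                         # likelihood (observation) variance
mu <- 0; sigma2 <- 10                     # prior mean and prior variance
K_trace <- numeric(length(outcomes))

for (t in seq_along(outcomes)) {
  K <- sigma2 / (sigma2 + sigma2_noise)   # Kalman gain (adaptive learning rate)
  mu <- mu + K * (outcomes[t] - mu)       # standard error-correction update
  sigma2 <- (1 - K) * sigma2              # posterior variance shrinks every trial
  K_trace[t] <- K
}

round(K_trace[c(1, 10, 50)], 3)   # the gain decays towards 0: learning slowly stops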
3.7.7 Level 5: The Kalman Filter (Dynamic Uncertainty)
Real environments change (volatility). The Kalman Filter extends the Bayesian update by adding a Prediction Step that prevents the agent from becoming “too sure.”
Concept: “The world is not static; it drifts. Even if I was sure yesterday, my knowledge has degraded by today.”
Traditional Formulation (State Space Model): We assume the true hidden value (\(x\)) drifts over time, and that our observations (\(y\)) are noisy versions of that drifting value. \[x_t = x_{t-1} + w_t, \quad w_t \sim \mathcal{N}(0, Q)\]\[y_t = x_t + v_t, \quad v_t \sim \mathcal{N}(0, R)\]
Transformation to Update Rule (Two-Step Process): Unlike the static Bayesian agent, the Kalman agent “inflates” their uncertainty before looking at the data.
- Predict (Inflate Uncertainty): Before seeing data, the agent assumes the world might have changed. They add Process Noise (\(Q\)) to their existing uncertainty. \[\sigma^2_{prediction} = \sigma^2_{previous} + Q\]
- Calculate Gain: The “Learning Rate” (\(K_t\)) is calculated using this inflated uncertainty.\[K_{t} = \frac{\sigma^2_{prediction}}{\sigma^2_{prediction} + R}\]
- Update (The Standard Rule): Finally, we apply the standard error-correction rule. \[\mu_{t} = \mu_{t-1} + K_t \cdot (Outcome_t - \mu_{t-1})\]
Variable Definitions:
- \(Q\): Process Noise (Volatility). The variance of the drift \(w_t\). High \(Q\) means the world changes a lot and/or frequently.
- \(R\): Measurement Noise (Observation Uncertainty). Equivalent to \(\sigma^2_{noise}\) in Level 4.
- \(\mu_t\): The agent’s estimate of the true state \(x_t\).
- \(\sigma^2_{prediction}\): The agent’s uncertainty after accounting for volatility but before seeing the new outcome.
Why this fixes the “Stopping Problem”: In Level 4, \(\sigma^2\) shrank to zero, causing learning to stop. Here, because we add \(Q\) at every step, \(\sigma^2_{prediction}\) never hits zero. Consequently, the learning rate \(K_t\) never hits zero. It settles at a dynamic equilibrium—a sweet spot where the agent stays permanently “alert” to changes without over-reacting to noise.
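The same loop as in the previous sketch, with the prediction step added; because the process noise Q is injected on every trial, the gain settles at a nonzero equilibrium instead of decaying to zero (the values of Q and R are illustrative):

set.seed(6)
outcomes <- rnorm(50, mean = 1, sd = 1)
Q <- 0.1     # process noise (volatility)
R <- 1       # measurement noise
mu <- 0; sigma2 <- 10
K_trace <- numeric(length(outcomes))

for (t in seq_along(outcomes)) {
  sigma2_pred <- sigma2 + Q              # Predict: inflate uncertainty before seeing data
  K <- sigma2_pred / (sigma2_pred + R)   # Gain computed on the inflated uncertainty
  mu <- mu + K * (outcomes[t] - mu)      # Update: standard error-correction rule
  sigma2 <- (1 - K) * sigma2_pred        # posterior variance
  K_trace[t] <- K
}

round(K_trace[c(1, 10, 50)], 3)   # the gain stabilises at a nonzero value: learning never stops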
3.7.8 Level 6: The Hierarchical Gaussian Filter (Meta-Learning)
The Kalman Filter assumes the volatility (\(Q\)) is constant. But what if the opponent plays steadily for 20 rounds, then suddenly starts switching wildly? The HGF introduces a hierarchy to track this.
Concept:
Level 1 (\(x_1\)): The value of the stimulus (e.g., probability of “Right”).
Level 2 (\(x_2\)): The volatility of Level 1.
N.B. we could build more levels (e.g., Level 3 would be the meta-volatility: the model’s belief about how fast to update its estimates of volatility).
Transformation to Update Rule (Volatile Learning Rates): The update rule for Level 1 looks standard, but the Learning Rate is now a function of Level 2.\[\mu_{1,t} = \mu_{1,t-1} + K_1 \cdot (Outcome_t - \mu_{1,t-1})\] Where the Gain \(K_1\) depends on the volatility estimate from Level 2:\[K_1 \approx \frac{\exp(\mu_{2})}{\exp(\mu_{2}) + \sigma^2_{noise}}\]
Variable Definitions:
- \(\mu_{1}\): Estimated value (Probability).
- \(\mu_{2}\): Estimated volatility (Log-volatility).
- \(\exp(\mu_2)\): The “phasic volatility”—how much the agent expects the world to change right now.
Implication:
- If Level 2 detects high volatility (\(x_2 \uparrow\)), the learning rate \(K_1\) spikes \(\to 1\).
- If Level 2 detects stability (\(x_2 \downarrow\)), the learning rate \(K_1\) drops \(\to 0\).
- Meta-Learning: The agent is not just learning the value; they are learning how fast to learn.
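A minimal numeric illustration of the gain formula above, showing how the effective learning rate K_1 moves with the log-volatility estimate mu_2 (the value sigma2_noise = 1 and the grid of mu_2 values are arbitrary):

sigma2_noise <- 1
mu2 <- c(-4, -2, 0, 2, 4)                      # estimated log-volatility (Level 2)
K1 <- exp(mu2) / (exp(mu2) + sigma2_noise)     # effective learning rate at Level 1
round(data.frame(mu2 = mu2, K1 = K1), 3)       # low volatility -> K1 near 0; high -> K1 near 1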
Looking Ahead: This model represents the current state of the art in “Ideal Observer” models and has had many interesting applications in cognitive neuroscience (REFS). While the math for inverting this model (variational Bayes) is complex, the intuition is simple: it is a dynamic learning-rate manager. Future chapters (not yet written) might explore the model and its implications and limitations.
3.7.9 Level 7: Recursive Strategies (Theory of Mind)
Finally, we move from learning about the environment to learning about the agent. This is Theory of Mind (ToM). In our framework, this is formally framed as Recursive Bayesian Learning.
Concept: The agent (\(k\)-ToM) does not simply track the pattern of outcomes; they attempt to track the opponent’s hidden strategy.
- A 0-ToM agent is a simple learner (e.g., the Biased Agent or RL).
- A 1-ToM agent simulates a 0-ToM opponent. They ask: “If I were a simple learner observing this history, what would I play next?”
- A 2-ToM agent simulates a 1-ToM opponent. They ask: “What does the opponent think I will do?”
Traditional Formulation (Recursive Simulation): The agent holds a “shadow model” of the opponent. If I am Level \(k\), I assume my opponent is Level \(k-1\). I feed the game history into my internal simulation of them to predict their next move probability (\(\hat{p}_{op}\)).
Update Rule (Bayesian Meta-Learning): Unlike RL, where we update a simple value \(V\), here we update the parameters of the simulated opponent. \[\text{OpponentModel}_{new} = \text{OpponentModel}_{old} + \text{Learning Rate} \cdot (\text{Prediction Error})\]
Specifics: If I am 1-ToM modeling a 0-ToM opponent (Bias), I use a Bayesian update to refine my estimate of their bias \(\theta\).
Decision Value: Once the opponent model is updated, the value of my action \(a\) is derived from their predicted move: \[V_{k}(a) \propto P(\text{Opponent plays } \hat{a}_{op} | \text{Opponent is } k-1)\]
Variable Definitions:
- \(k\): The sophistication level. \(k=0\) is the baseline; \(k=1\) is strategic; \(k=2\) is meta-strategic.
- \(\hat{a}_{op}\): The predicted action of the opponent (see Devaine et al., 2014; Waade et al., 2023).
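As an illustration only (not the exact machinery of the k-ToM literature), here is a minimal sketch of a 1-ToM agent refining its estimate of a 0-ToM opponent’s bias \(\theta\) with a conjugate Beta-Bernoulli update:

# Illustrative simplification: a 1-ToM agent tracking a 0-ToM (biased) opponent's
# rate of playing "Right" with a Beta-Bernoulli update.
set.seed(7)
opponent_moves <- rbinom(30, 1, 0.8)   # a 0-ToM opponent biased 80% towards "Right"

a <- 1; b <- 1                          # Beta(1, 1) prior on the opponent's bias theta
for (move in opponent_moves) {
  a <- a + move                         # count of "Right" moves observed
  b <- b + (1 - move)                   # count of "Left" moves observed
}

theta_hat <- a / (a + b)                # posterior mean estimate of the opponent's bias
theta_hat                               # a matcher would now choose "Right" if theta_hat > 0.5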
3.7.10 Handling Heterogeneity: Mixture Models
Real behavior is rarely pure. A player might be “Random” when tired but “Strategic” when focused. Or they might switch strategies to confuse the opponent.
Concept: We assume the data is not generated by one single model, but by a probabilistic blend of several.
Traditional Formulation (The Weighted Likelihood): The likelihood of the observed data \(D\) is the weighted sum of the likelihoods from \(M\) different candidate strategies. \[P(D | \Theta) = \sum_{m=1}^{M} w_m \cdot P(D | \text{Model}_m, \theta_m)\]
Variable Definitions:
- \(w_m\): The mixing weight (e.g., 70% RL, 30% Random). Note that \(\sum w_m = 1\).
- \(P(D|\text{Model}_m)\): How well the specific strategy (e.g., RL) explains the data.
Critique (The Complexity Trap): While realistic, these models are hard to fit. If we allow an agent to switch between 5 different strategies at any moment, the number of combinations of parameter values that explain the same patterns of behavior can explode, leaving the model underdetermined by the data. Sometimes one can be lucky and integrate the “mixed” models into a single equation (see an example here: https://betanalpha.github.io/assets/chapters_html/reading_times.html)
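A minimal numeric sketch of the weighted-likelihood idea for two toy components, a biased agent and a pure random agent; the weights and parameter values are arbitrary:

choices <- c(1, 1, 0, 1, 1, 1, 0, 1)                        # a short toy choice sequence

lik_bias   <- prod(dbinom(choices, size = 1, prob = 0.8))   # likelihood under a biased agent (theta = 0.8)
lik_random <- prod(dbinom(choices, size = 1, prob = 0.5))   # likelihood under pure random choice

w_bias <- 0.7; w_random <- 0.3                              # mixing weights, summing to 1
lik_mixture <- w_bias * lik_bias + w_random * lik_random
lik_mixture                                                 # P(D | Theta) as a weighted sum of component likelihoods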
3.7.11 Plausibility Check: Cognitive Constraints
You probably noticed you couldn’t remember every single move your opponent made. Maybe you lost track of older trials, and recent trials felt more important. These aren’t failures: they’re features of human cognition. Our models should reflect these constraints to be cognitively realistic. Let’s see what difference these constraints make for predictions.
3.7.11.1 Memory Limitations
- Constraint: Humans have limited working memory and exhibit forgetting, often approximated by exponential decay. Perfect recall of long trial sequences is unrealistic.
- Modeling Implication: This favors models incorporating memory decay or finite history windows (like imperfect memory models or RL with a learning rate < 1) over perfect memory models. It suggests that even bias-tracking models should discount older information.
3.7.11.2 Perseveration Tendencies
- Constraint: People sometimes exhibit perseveration – repeating a previous action, especially if it was recently successful or chosen, even if a different strategy might suggest otherwise. This can be distinct from rational “win-stay”.
- Modeling Implication: This might be incorporated as an additional bias parameter influencing the choice probability (e.g., a small added probability of repeating the last action \(a_{t-1}\) regardless of outcome) or interact with feedback processing (e.g., strengthening the ‘stay’ tendency after wins).
3.7.11.3 Noise and Errors
- Constraint: Human behavior is inherently noisy. People make mistakes, have attentional lapses, press the wrong button, or misunderstand feedback. Behavior rarely perfectly matches a deterministic strategy.
- Modeling Implication: Models should almost always include a “noise” component. This can be implemented in several ways:
- Lapse Rate: A probability (e.g., \(\epsilon\)) that on any given trial, the agent makes a random choice instead of following their primary strategy (as used in the Mixture Model chapter).
- Decision Noise (Softmax): In models where choices are based on comparing values (like RL), a ‘temperature’ parameter can control the stochasticity: high temperature leads to more random choices, low temperature to more deterministic choices based on values (see the sketch after this list).
- Imperfect Heuristics: Parameters within a strategy might reflect imperfect application (e.g., in WSLS, \(p_{stay\_win} < 1\) or \(p_{shift\_loss} < 1\)). This can also capture asymmetric responses to feedback (e.g., being more likely to shift after a loss than stay after a win).
- Exploration: Note that we talked about noise and errors, but random deviations can also be framed as adaptive exploration, allowing the agent to test actions that their current strategy deems suboptimal.
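Here is a minimal sketch of the softmax choice rule mentioned above; the action values and temperature values are arbitrary illustrative numbers:

# Softmax choice rule: converts action values into choice probabilities,
# with the temperature tau controlling how noisy the choice is.
softmax_choice_prob <- function(values, tau) {
  exp(values / tau) / sum(exp(values / tau))
}

values <- c(left = 0.3, right = 0.7)              # e.g. learned values from an RL model
round(softmax_choice_prob(values, tau = 0.1), 3)  # low temperature: nearly deterministic
round(softmax_choice_prob(values, tau = 2), 3)    # high temperature: close to random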
3.7.11.4 A Note on Ideal Observers
It is important to distinguish between Heuristic Models (like WSLS) and Ideal Observer Models (like the Kalman Filter or HGF). Heuristics attempt to describe the process a human uses. Ideal observers describe the optimal computation given the uncertainty. When we fit models like the HGF to human data, we are effectively asking: “In what specific ways does the human deviate from optimality?” (e.g., do they overestimate how volatile the opponent is?).
3.7.12 Relationships Between Models
It’s useful to note that these candidate models aren’t always entirely distinct. Often, simpler models emerge as special cases of more complex ones:
A Random Choice model is like a Memory-Based model where the influence of memory is zero.
WSLS can be seen as a specific type of RL model with a very high learning rate and sensitivity only to the immediately preceding trial’s outcome.
A 0-ToM model might resemble a bias-tracking model (moving average or RL).
Recognizing these connections can guide a principled modeling approach, starting simple and adding complexity only as needed and justified by data or theory.
This is what we will call “model nesting” in a future chapter. It is very elegant in that it allows us to directly compare several models in a non-exclusive fashion, simply based on parameter values. A good example is also provided in Betancourt’s case study on reading times: https://betanalpha.github.io/assets/chapters_html/reading_times.html
3.7.13 Handling Heterogeneity: Mixture Models (Revisited)
Sometimes the models cannot be nested, or we might want to capture the possibility that different participants (or even the same participant at different times) use different strategies. For instance, some players might predominantly use WSLS, while others rely more on bias tracking (moving average or RL). This is where the mixture models introduced above become relevant (and they are explored in detail in a future chapter).
Concept: Instead of assuming one model generated all the data, a mixture model assumes the data is a probabilistic blend from multiple candidate models (e.g., 70% of choices from WSLS, 30% from Random Bias).
Purpose: Allows capturing heterogeneity within or across individuals without needing to know a priori which strategy was used on which trial or by which person. The model estimates the probability that each data point came from each component strategy.
Challenge: Mixture models often require substantial data to reliably distinguish between components and estimate their mixing proportions.
3.7.14 Cognitive Modeling vs. Traditional Statistical Approaches (e.g., GLM)
How does this modeling approach differ from standard statistical analyses you might have learned, like ANOVAs or the General Linear Model (GLM)?
- Focus: GLM approaches typically focus on identifying statistical effects: Does factor X significantly influence outcome Y? (e.g., Does the opponent’s strategy affect the player’s win rate?). Cognitive modeling focuses on identifying the underlying process or mechanism: How does the opponent’s strategy lead to changes in the player’s choices via specific computations (like learning, memory updating, or strategic reasoning)?
- Theory: Cognitive models are usually derived from theories about mental processes. GLMs are more general statistical tools, often used agnostically regarding the specific cognitive mechanism.
- Parameters: Cognitive models estimate parameters that often have direct psychological interpretations (e.g., learning rate, memory decay, decision threshold, bias weight). GLM parameters represent statistical associations (e.g., regression coefficients).
- Data Level: Cognitive models often predict behavior at the trial level (e.g., predicting the choice on trial t based on history up to t-1). GLM analyses often aggregate data (e.g., comparing average win rates across conditions).
- Prediction vs. Explanation: While both aim to explain data, cognitive modeling often places a stronger emphasis on generating the observed behavior pattern from the hypothesized mechanism, allowing for simulation and prediction of fine-grained details.
Example Revisited: In the Matching Pennies game:
* A GLM approach might test if Payoff ~ BotStrategy * Role + (1|ID) shows a significant effect of BotStrategy.
* A cognitive modeling approach would fit different strategy models (WSLS, RL, etc.) to the choice data and compare them (using methods from Ch 7) to see which mechanism best explains the choices made against different bots, potentially revealing why performance differs (e.g., due to changes in estimated learning rates or strategy weights).
Both approaches are valuable, but cognitive modeling aims for a deeper, mechanistic level of explanation about the underlying cognitive processes.
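For the GLM side of this comparison, a minimal sketch using the d_clean data frame built earlier; this assumes the lme4 package (not otherwise introduced in this chapter) is installed:

library(lme4)

# Mixed-effects model: does payoff depend on the bot's strategy and the player's role,
# allowing each participant (ID) their own baseline?
m_glm <- lmer(Payoff ~ BotStrategy * Role + (1 | ID), data = d_clean)
summary(m_glm)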
3.8 Conclusion: From Observations to Verbal Theories
This chapter took us from observing behavior in a specific task – the Matching Pennies game – to the crucial stage of formulating initial theories about the cognitive processes involved. We explored how analyzing gameplay data, considering participant reports, applying cognitive principles (like memory limits and error proneness), and contrasting different potential strategies (Random, WSLS, Memory-based, RL, k-ToM) helps us generate plausible verbal models.
We saw that the path from raw behavior to a testable model involves significant abstraction and simplification. We also highlighted the importance of distinguishing between the participant’s experience and the researcher’s theoretical stance, and how cognitive modeling differs from traditional statistical approaches by focusing on underlying mechanisms.
You now have a conceptual map of candidate models and understand why cognitive constraints matter. But verbal descriptions like ‘win-stay-lose-shift’ hide crucial ambiguities: Does ‘stay’ mean always stay or usually stay? How do we handle the first trial? In the next chapter, you’ll implement these models in code, forcing you to make every assumption explicit. This is where modeling becomes rigorous and often reveals that our verbal intuitions were vaguer than we thought.
The next chapter, “From verbal descriptions to formal models,” tackles exactly this challenge. We will take some of the candidate models discussed here (like Random Choice and WSLS) and translate them into precise mathematical algorithms and R functions. This formalization will force us to be explicit about our assumptions and enable us to simulate agent behavior, setting the stage for fitting these models to data and evaluating their performance in later chapters.