Chapter 3 Building Models of Strategic Decision-Making

You’ve just played Matching Pennies and discussed strategies with your classmates. You probably noticed patterns in your opponent’s play, tried to be unpredictable, and maybe even changed your strategy mid-game. This chapter helps you translate those observations and intuitions into the language of cognitive modeling, preparing you to implement formal models in the next chapter.

3.1 Learning Goals

This chapter bridges the gap between observing behavior and developing testable theories. By the end of this chapter, using the Matching Pennies game as a case study, you will be able to:

  • Identify Key Modeling Steps: Understand the process of moving from behavioral observations and participant reflections to formulating initial verbal theories of underlying cognitive strategies.
  • Appreciate Theory Building Challenges: Recognize common issues in theory development, such as the participant vs. researcher perspective, the need for simplification, and incorporating known cognitive constraints.
  • Generate Candidate Models: Propose several distinct verbal models (e.g., random choice, simple heuristics, memory-based strategies) that could plausibly explain behavior in a strategic decision-making task.
  • Connect to Formalization: Understand why translating these verbal models into precise, formal models (covered in the next chapter) is a necessary step for rigorous testing and simulation.

3.2 Introduction: Observing Behavior to Theorize Mechanisms

Chapter 1 emphasized the importance of modeling underlying generative mechanisms. To do this for cognition, we first need a behavior to explain. This chapter uses the Matching Pennies game as our initial cognitive phenomenon. It’s a simple strategic interaction, yet rich enough to illustrate the process of developing and refining cognitive models.

Our goal here is not yet to build the final computational models, but to practice the crucial preceding steps:

  1. Observing behavior in a specific task (through experiments and data exploration).
  2. Reflecting on potential cognitive strategies and constraints (drawing on observations, participant reports, and cognitive science principles).
  3. Formulating initial verbal theories or candidate models that describe the potential underlying mechanisms.

This process lays the groundwork for the next chapter, where we will translate these verbal ideas into precise, formal models ready for simulation and testing.

3.3 The Matching Pennies Game

In the Matching Pennies game, two players engage in a repeated series of choices. One player (the matcher) attempts to match the other’s choice, while the other player (the hider) aims for a mismatch. This is a prototypical example of the interacting behaviors usually tackled by game theory, and it raises issues of theory of mind and recursivity.

For an introduction see the paper: Waade, Peter T., et al. “Introducing tomsup: Theory of mind simulations using Python.” Behavior Research Methods 55.5 (2023): 2197-2231.

3.4 Game Structure

The game proceeds as follows:

  1. Two players sit facing each other
  2. Each round, both players choose either “left” or “right” to indicate where they believe a penny is hidden
  3. The matcher wins by choosing the same hand as their opponent
  4. The hider wins by choosing the opposite hand
  5. Points are awarded: +1 for winning, -1 for losing
  6. Repeat

This simple structure creates a rich environment for studying decision-making strategies, learning, and adaptation.
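
To make the rules concrete, here is a minimal sketch of the payoff rule as an R function; the function name and the "left"/"right" coding are just illustrative conventions:

# The matcher wins on a match, the hider wins on a mismatch.
payoff <- function(matcher_choice, hider_choice) {
  if (matcher_choice == hider_choice) {
    c(matcher = 1, hider = -1)
  } else {
    c(matcher = -1, hider = 1)
  }
}

payoff("left", "left")   # matcher +1, hider -1
payoff("left", "right")  # matcher -1, hider +1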

3.5 Empirical Investigation

3.5.1 Data Collection Protocol

If you are attending my class you have been (or will be) asked to participate in a matching pennies game. This game provides the foundation for our modeling efforts. By observing gameplay and collecting data, we can develop models that capture the cognitive processes underlying decision-making in strategic situations.

Participants play 30 rounds as the matcher and 30 rounds as the hider, allowing us to observe behavior in both roles. While playing, participants track their scores, which can provide quantitative data for later analysis. Participants are also asked to reflect on their strategies and the strategies they believe their opponents are using, as that provides valuable materials to build models on.

3.5.2 Initial Observations

Through the careful observation and discussion of gameplay we do in class, several patterns typically emerge. For instance, players often demonstrate strategic adaptation, adjusting their choices based on their opponent’s previous moves. They may attempt to identify patterns in their opponent’s behavior while trying to make their own choices less predictable. The tension between exploitation of perceived patterns and maintenance of unpredictability creates fascinating dynamics for modeling.

3.6 Empirical Explorations

Below you can see how a previous year of CogSci students did against bots (computational agents) playing according to different strategies. Look at the plots below, where the x axis indicates the trial, the y axis how many points the CogSci’ers scored (0 being chance, negative meaning being completely owned by the bots, positive owning the bots), and the different colors indicate the different strategies employed by the bots. Strategy “-2” was a Win-Stay-Lose-Shift bot: when it got a +1, it repeated its previous move (e.g. right if it had just played right); otherwise it performed the opposite move (e.g. left if it had just played right). Strategy “-1” was a biased Nash bot, playing “right” 80% of the time. Strategy “0” indicates a reinforcement learning bot; “1” a bot assuming you were playing according to a reinforcement learning strategy and trying to infer your learning and temperature parameters; “2” a bot assuming you were following strategy “1” and trying to infer your parameters accordingly.

# Load necessary libraries
library(tidyverse)

# --- 1. Data Loading / Generation ---
data_path <- file.path("data", "MP_MSc_CogSci22.csv")

if (file.exists(data_path)) {
  d <- read_csv(data_path)
} else {
  # Generate synthetic data if file is missing (Reproducibility check)
  set.seed(42)
  n_students <- 20
  n_trials <- 30
  strategies <- c(-2, -1, 0, 1, 2)
  d <- expand_grid(
    ID = factor(1:n_students),
    BotStrategy = strategies,
    Role = c(0, 1), # 0=Matcher, 1=Hider
    Trial = 1:n_trials
  ) %>%
    mutate(
      # Random payoffs for demonstration
      Payoff = sample(c(-1, 1), n(), replace = TRUE, prob = c(0.45, 0.55))
    )
  warning("Using synthetic data for demonstration.")
}

# --- 2. Data Cleaning (Crucial Step!) ---
# Map cryptic codes to human-readable labels
bot_labels <- c(
  "-2" = "WSLS Bot",
  "-1" = "Bias Bot (80%)",
  "0"  = "RL Bot",
  "1"  = "ToM-1 Bot",
  "2"  = "ToM-2 Bot"
)

d_clean <- d %>%
  mutate(
    # Make BotStrategy a factor with meaningful names
    BotStrategy = factor(BotStrategy, 
                         levels = names(bot_labels), 
                         labels = bot_labels),
    # Make Role a factor
    Role = factor(Role, levels = c(0, 1), labels = c("Matcher", "Hider"))
  )

# --- 3. Plot Collective Performance ---
ggplot(d_clean, aes(x = Trial, y = Payoff, color = BotStrategy)) +
  geom_smooth(se = FALSE, method = "loess", span = 0.5) +
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
  facet_wrap(~Role) +
  labs(
    title = "Human vs. Machine: Average Performance",
    subtitle = "Positive values indicate humans winning against bots",
    y = "Average Payoff",
    color = "Opponent Strategy"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

That doesn’t look too good, eh? What about individual variability? In the plot below we show each former student’s total score against the different bots.

# --- Plot 2: Individual Variability in Scores ---
# Calculate the total score for each student (ID) against each bot strategy.
d_summary <- d_clean %>% # Use the cleaned data with labeled factors
  group_by(ID, BotStrategy, Role) %>% # Group by student, bot, and role
  summarize(TotalScore = sum(Payoff), .groups = "drop") # Calculate total score

# Visualize the distribution of total scores for each bot strategy.
# geom_boxplot shows the distribution, geom_point shows individual student scores.
print(
  ggplot(d_summary, aes(x = BotStrategy, y = TotalScore)) +
    geom_boxplot(aes(fill = Role), alpha = 0.3, outlier.shape = NA) + # Boxplot showing distribution
    geom_jitter(aes(color = ID), width = 0.2, alpha = 0.7) + # Individual student points
    facet_wrap(~Role) + # Separate plots for Matcher and Hider
    labs(
      title = "Distribution of Total Scores Against Different Bots",
      subtitle = "Shows individual student variability",
      x = "Bot Strategy",
      y = "Total Score (Sum of Payoffs)",
      fill = "Player Role"
    ) +
    theme_classic() +
    theme(legend.position = "none") # Hide all legends (roles are shown by facets, IDs are too many to label)
)

3.6.1 From Observation to Theory: Identifying Potential Mechanisms

The plots above reveal patterns: average performance changes over time, varies by opponent, and differs across individuals. Gameplay observations and participant reflections (from class discussion or collected data) add qualitative insights – perhaps players mention trying to be unpredictable, guessing opponent biases, or repeating winning moves.

The crucial next step is to distill these rich, complex observations into simplified, plausible mechanisms or strategies. This involves abstraction:

  • Identifying Core Patterns: What recurring behaviors seem most important? (e.g., reacting to wins/losses, tracking opponent frequencies).

  • Simplifying: Can we capture the essence of a strategy without modeling every detail of a player’s thought process or interaction? (e.g., modeling WSLS instead of complex pattern detection).

  • Drawing on Cognitive Principles: How do known cognitive constraints (like limited memory or processing errors, discussed below) shape plausible strategies?

For instance, observing that players often change their choice after a loss might lead us to propose a “Lose-Shift” component as part of a candidate model. Observing that performance differs against biased vs. adaptive bots suggests players might be trying to learn or adapt, leading to memory-based or learning models.

This process generates verbal models – initial hypotheses about the strategies at play. Key modeling considerations guide this translation:

  • What information do players likely use? (Own past choices? Opponent’s choices? Payoffs?)

  • How far back does memory plausibly extend? (Last trial? Last 5 trials? Exponential decay?)

  • What is the role of randomness? (True randomness? Exploration? Implementation errors?)

  • How might strategies adapt over time or differ between individuals?

Answering these helps refine our verbal models, paving the way for formalization. The goal isn’t to capture everything, but to propose distinct, testable mechanisms.

3.6.2 The Distinction Between Participant and Researcher Perspectives

As participants we might not be aware of the strategy we use, or we might believe something erroneous. The exercise here is to act as researchers: what are the principles underlying the participants’ behaviors, no matter what the participants know or believe? Note that talking to participants and being participants helps develop ideas, but it is not the end point of the process. Also note that as cognitive scientists we can rely on what we have learned about cognitive processes (e.g. memory).

Another important component of the distinction is that participants live in a rich world: they rely on facial expressions and bodily posture, they switch strategies, etc. The researcher, on the other hand, is trying to identify one or at most a few “simple” strategies. Rich bodily interactions and mixtures or sequences of multiple strategies are a poor starting point for building your first model, and they are often quite difficult to fit to empirical data. Nevertheless, they are important intuitions that the researcher should (eventually?) accommodate.

3.7 Candidate Verbal Models

Based on the behavioral patterns observed, participant discussions, and core cognitive principles, we can now propose several candidate verbal models for behavior in the matching pennies game. Each represents a different hypothesis about the underlying cognitive strategy. These are starting points, deliberately simplified:

  • Random Choice Model: Reflects the idea that players might try to be unpredictable or simply lack a clear strategy.
  • Win-Stay-Lose-Shift (WSLS): Captures the common heuristic of repeating successful actions and changing unsuccessful ones.
  • Bias Tracking (Memory Models): Addresses the observation that players seem to react to opponent patterns, potentially by estimating choice frequencies. Variations account for memory limits.
  • Reinforcement Learning: A more formal learning model capturing trial-by-trial value updates based on prediction errors, potentially explaining adaptation.
  • k-ToM (Theory of Mind): Accounts for the strategic nature of the game, where players model their opponent’s mind (or model the opponent modeling their mind).

These verbal models, derived from our initial analysis, are the hypotheses we will formalize and test in subsequent chapters. Let’s briefly describe the core idea of each (a short code sketch of some of these follows the list):

  • Random Choice Model:
    • Mechanism: Choices are made randomly, potentially with a fixed bias (e.g., 60% right, 40% left), independent of game history or opponent actions.
    • Rationale: Serves as the simplest baseline. It reflects the possibility that players might try to be deliberately unpredictable, haven’t figured out a strategy, or that their behavior appears random from the observer’s perspective. It introduces the basic concept of choice probability (the \(\theta\) parameter we’ll estimate later).
  • Win-Stay-Lose-Shift (WSLS):
    • Mechanism: A simple heuristic: repeat the last choice if it led to a win, switch to the other choice if it led to a loss.
    • Rationale: Captures the intuitive tendency to stick with success and abandon failure. This is a common heuristic observed in simple learning tasks.
    • Formalization Hint: This can be formalized using probabilities of staying/shifting conditional on the previous outcome. A simple version might be deterministic (always stay/shift), while a probabilistic version allows for occasional deviations: \[P(\text{stay} \mid \text{outcome}_{t-1}) = \begin{cases} p_{stay\_win} & \text{if win at } t-1 \\ p_{stay\_loss} & \text{if loss at } t-1 \end{cases}\] (with \(p_{shift} = 1 - p_{stay}\)). An equivalent parameterization uses \(p_w\) for the probability of staying after a win and \(1 - p_l\) for staying after a loss, so that \(p_l\) is the probability of shifting after a loss.
  • Memory-Based Bias Tracking:
    • Mechanism: Assumes players estimate the opponent’s choice bias (e.g., probability of choosing ‘right’) based on past observations and use this estimate to guide their own choices (e.g., predict and counter).
    • Rationale: Reflects the observation that players seem to react to opponent tendencies. Addresses the “what information do players use?” question.
    • Variations:
      • Perfect Memory: Uses all past trials equally to estimate the bias. (Less plausible cognitively).
      • Imperfect/Limited Memory: Uses only recent trials (e.g., last n trials) or gives more weight to recent trials (exponential decay), reflecting cognitive constraints.
  • Reinforcement Learning (RL):
    • Mechanism: Players learn the expected value of each choice (‘left’ vs. ‘right’) based on the rewards (wins/losses) received. Choices are made based on these learned values (e.g., choosing the option with the higher expected value more often). Learning occurs trial-by-trial based on prediction errors (difference between expected and actual reward) modulated by a learning rate.
    • Rationale: Provides a formal framework for learning from feedback, naturally incorporating imperfect memory (controlled by the learning rate – high learning rate means relying more on recent trials). Connects to broader theories of learning in psychology and neuroscience. (Preview: Formal RL models will be detailed in Chapter 12).
  • k-ToM (Theory of Mind):
    • Mechanism: Players explicitly model their opponent’s strategy. Level 0 (0-ToM) assumes the opponent is random/biased. Level 1 (1-ToM) assumes the opponent is using a 0-ToM strategy and tries to best respond. Level 2 (2-ToM) assumes the opponent is using a 1-ToM strategy, and so on.
    • Rationale: Directly addresses the strategic, interactive nature of the game, where predicting the opponent’s intentions or model of you might be crucial, going beyond simple pattern detection. (Preview: These models will be discussed in later chapters).
  • Combined/Switching Strategies:
    • Mechanism: Real behavior might involve combining elements of the above (e.g., WSLS with a baseline bias) or switching between strategies (e.g., tracking bias for a while, then switching to random). Generating complex, non-stationary sequences might also be a strategy to appear unpredictable.
    • Rationale: Generating random output is hard, so a player who wants to confuse the opponent could, for instance, choose tails 8 times, then switch to a WSLS strategy for 4 trials, then choose heads 4 times. Or they could implement any of the previous strategies and do the opposite “to mess with the opponent”. The combined-strategies model also acknowledges that single, simple models might be insufficient. It motivates concepts like mixture models (Chapter 8), where behavior is seen as a probabilistic blend of simpler strategies, or models where strategy parameters change over time. Yet such mixture models are hard to fit to data and can quickly spiral out of control (there are many possible combinations).
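
To preview the formalization work of the next chapter, here is a minimal sketch of how three of these verbal models could look as R functions. All function and parameter names, and the 0/1 choice coding (1 = “right”), are illustrative conventions rather than a fixed implementation:

# Choices coded as 1 ("right") and 0 ("left").

# Random Choice: pick "right" with a fixed probability theta.
random_agent <- function(theta = 0.5) {
  rbinom(1, size = 1, prob = theta)
}

# Probabilistic WSLS: stay with probability p_stay_win after a win,
# shift with probability p_shift_loss after a loss.
wsls_agent <- function(prev_choice, prev_win,
                       p_stay_win = 0.9, p_shift_loss = 0.9) {
  if (prev_win) {
    if (runif(1) < p_stay_win) prev_choice else 1 - prev_choice
  } else {
    if (runif(1) < p_shift_loss) 1 - prev_choice else prev_choice
  }
}

# Reinforcement learning: update the value of the chosen option
# in proportion to the prediction error.
rl_update <- function(value, reward, alpha = 0.3) {
  value + alpha * (reward - value)
}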

3.7.1 Plausibility Check: Cognitive Constraints

You probably noticed you couldn’t remember every single move your opponent made. Maybe you lost track of older trials, and recent trials felt more important. These aren’t failures: they’re features of human cognition. Our models should reflect these constraints to be cognitively realistic. Let’s see what difference these constraints make for predictions.

3.7.1.1 Memory Limitations

  • Constraint: Humans have limited working memory and exhibit forgetting, often approximated by exponential decay. Perfect recall of long trial sequences is unrealistic.
  • Modeling Implication: This favors models incorporating memory decay or finite history windows (like imperfect memory models or RL with a learning rate < 1) over perfect memory models. It suggests that even bias-tracking models should discount older information, as in the sketch below.
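
As an illustration, here is one way such discounting could be sketched in R; the decay parameter gamma is illustrative, not an empirical estimate:

# Exponentially discounted estimate of the opponent's bias:
# recent choices (coded 0/1) count more than older ones.
decayed_bias <- function(opponent_choices, gamma = 0.8) {
  n <- length(opponent_choices)
  weights <- gamma^((n - 1):0)  # most recent trial gets weight 1
  sum(weights * opponent_choices) / sum(weights)
}

decayed_bias(c(1, 1, 0, 0, 0))             # recent "lefts" dominate
decayed_bias(c(1, 1, 0, 0, 0), gamma = 1)  # gamma = 1: plain average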

3.7.1.2 Perseveration Tendencies

  • Constraint: People sometimes exhibit perseveration – repeating a previous action, especially if it was recently successful or chosen, even if a different strategy might suggest otherwise. This can be distinct from rational “win-stay”.
  • Modeling Implication: This might be incorporated as an additional bias parameter influencing the choice probability (e.g., a small added probability of repeating the last action \(a_{t-1}\) regardless of outcome) or interact with feedback processing (e.g., strengthening the ‘stay’ tendency after wins); see the sketch below.
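
A minimal sketch of how such a perseveration bias could be layered on top of any base strategy; the persev weight is an illustrative assumption:

# Nudge a strategy's choice probability toward repeating the
# previous action (coded 0/1), regardless of outcome.
with_perseveration <- function(p_strategy, prev_choice, persev = 0.1) {
  (1 - persev) * p_strategy + persev * prev_choice
}

with_perseveration(0.5, prev_choice = 1)  # slightly above 0.5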

3.7.1.3 Noise and Errors

  • Constraint: Human behavior is inherently noisy. People make mistakes, have attentional lapses, press the wrong button, or misunderstand feedback. Behavior rarely perfectly matches a deterministic strategy.
  • Modeling Implication: Models should almost always include a “noise” component. This can be implemented in several ways (the first two are sketched in code after this list):
    • Lapse Rate: A probability (e.g., \(\epsilon\)) that on any given trial, the agent makes a random choice instead of following their primary strategy (as used in the Mixture Model chapter).
    • Decision Noise (Softmax): In models where choices are based on comparing values (like RL), a ‘temperature’ parameter can control the stochasticity. High temperature leads to more random choices, low temperature leads to more deterministic choices based on values.
    • Imperfect Heuristics: Parameters within a strategy might reflect imperfect application (e.g., in WSLS, \(p_{stay\_win} < 1\) or \(p_{shift\_loss} < 1\)). This can also capture asymmetric responses to feedback (e.g., being more likely to shift after a loss than stay after a win).
    • Exploration: Random deviations can also be framed as adaptive exploration, allowing the agent to test actions that their current strategy deems suboptimal.
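
Here is a brief sketch of the first two implementations, with illustrative parameter values:

# Lapse rate: with probability epsilon, ignore the strategy and guess.
with_lapse <- function(p_strategy, epsilon = 0.05) {
  (1 - epsilon) * p_strategy + epsilon * 0.5
}

# Softmax over two option values: temperature tau controls stochasticity.
softmax_p_right <- function(v_left, v_right, tau = 1) {
  1 / (1 + exp(-(v_right - v_left) / tau))
}

softmax_p_right(0.2, 0.8, tau = 0.1)  # low temperature: near-deterministic
softmax_p_right(0.2, 0.8, tau = 5)    # high temperature: close to random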

3.7.2 Relationships Between Models

It’s useful to note that these candidate models aren’t always entirely distinct. Often, simpler models emerge as special cases of more complex ones:

  • A Random Choice model is like a Memory-Based model where the influence of memory is zero.
  • WSLS can be seen as a specific type of RL model with a very high learning rate and sensitivity only to the immediately preceding trial’s outcome.
  • A 0-ToM model might resemble a Bias Tracking model.

Recognizing these connections can guide a principled modeling approach, starting simple and adding complexity only as needed and justified by data or theory.

3.7.3 Handling Heterogeneity: Mixture Models

What if different participants use different strategies, or a single participant switches strategies during the game? This is where mixture models become relevant (explored in detail in Chapter 8).

  • Concept: Instead of assuming one model generated all the data, a mixture model assumes the data is a probabilistic blend from multiple candidate models (e.g., 70% of choices from WSLS, 30% from Random Bias); a small simulation sketch follows this list.
  • Purpose: Allows capturing heterogeneity within or across individuals without needing to know a priori which strategy was used on which trial or by which person. The model estimates the probability that each data point came from each component strategy.
  • Challenge: Mixture models often require substantial data to reliably distinguish between components and estimate their mixing proportions.
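
As a small simulation sketch, we can generate choices from such a blend, reusing the illustrative random_agent and wsls_agent functions sketched in Section 3.7; the mixing weight w and the random stand-in feedback are assumptions for demonstration only:

set.seed(1)
n_trials <- 10
w <- 0.7  # probability of following WSLS on a given trial
choices <- numeric(n_trials)
choices[1] <- rbinom(1, 1, 0.5)  # first trial: no history, so guess
for (t in 2:n_trials) {
  prev_win <- rbinom(1, 1, 0.5) == 1  # stand-in for real game feedback
  choices[t] <- if (runif(1) < w) {
    wsls_agent(choices[t - 1], prev_win)
  } else {
    random_agent(0.5)
  }
}
choices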

3.7.4 Cognitive Modeling vs. Traditional Statistical Approaches (e.g., GLM)

How does this modeling approach differ from standard statistical analyses you might have learned, like ANOVAs or the General Linear Model (GLM)?

  • Focus: GLM approaches typically focus on identifying statistical effects: Does factor X significantly influence outcome Y? (e.g., Does the opponent’s strategy affect the player’s win rate?). Cognitive modeling focuses on identifying the underlying process or mechanism: How does the opponent’s strategy lead to changes in the player’s choices via specific computations (like learning, memory updating, or strategic reasoning)?
  • Theory: Cognitive models are usually derived from theories about mental processes. GLMs are more general statistical tools, often used agnostically regarding the specific cognitive mechanism.
  • Parameters: Cognitive models estimate parameters that often have direct psychological interpretations (e.g., learning rate, memory decay, decision threshold, bias weight). GLM parameters represent statistical associations (e.g., regression coefficients).
  • Data Level: Cognitive models often predict behavior at the trial level (e.g., predicting the choice on trial t based on history up to t-1). GLM analyses often aggregate data (e.g., comparing average win rates across conditions).
  • Prediction vs. Explanation: While both aim to explain data, cognitive modeling often places a stronger emphasis on generating the observed behavior pattern from the hypothesized mechanism, allowing for simulation and prediction of fine-grained details.

Example Revisited: In the Matching Pennies game:

  • A GLM approach might test whether Payoff ~ BotStrategy * Role + (1|ID) shows a significant effect of BotStrategy (a code sketch follows below).
  • A cognitive modeling approach would fit different strategy models (WSLS, RL, etc.) to the choice data and compare them (using methods from Chapter 7) to see which mechanism best explains the choices made against different bots, potentially revealing why performance differs (e.g., due to changes in estimated learning rates or strategy weights).
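
Since the formula above uses lme4-style syntax, the GLM branch of this comparison could be sketched as follows, assuming the d_clean data frame built earlier in this chapter:

# Mixed-effects regression: fixed effects of bot strategy and role,
# random intercepts per participant. Reports effects, not mechanisms.
library(lme4)

m <- lmer(Payoff ~ BotStrategy * Role + (1 | ID), data = d_clean)
summary(m)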

Both approaches are valuable, but cognitive modeling aims for a deeper, mechanistic level of explanation about the underlying cognitive processes.

3.8 Conclusion: From Observations to Verbal Theories

This chapter took us from observing behavior in a specific task – the Matching Pennies game – to the crucial stage of formulating initial theories about the cognitive processes involved. We explored how analyzing gameplay data, considering participant reports, applying cognitive principles (like memory limits and error proneness), and contrasting different potential strategies (Random, WSLS, Memory-based, RL, k-ToM) helps us generate plausible verbal models.

We saw that the path from raw behavior to a testable model involves significant abstraction and simplification. We also highlighted the importance of distinguishing between the participant’s experience and the researcher’s theoretical stance, and how cognitive modeling differs from traditional statistical approaches by focusing on underlying mechanisms.

You now have a conceptual map of candidate models and understand why cognitive constraints matter. But verbal descriptions like ‘win-stay-lose-shift’ hide crucial ambiguities: Does ‘stay’ mean always stay or usually stay? How do we handle the first trial? In the next chapter, you’ll implement these models in code, forcing you to make every assumption explicit. This is where modeling becomes rigorous and often reveals that our verbal intuitions were vaguer than we thought.

The next chapter, “From verbal descriptions to formal models,” tackles exactly this challenge. We will take some of the candidate models discussed here (like Random Choice and WSLS) and translate them into precise mathematical algorithms and R functions. This formalization will force us to be explicit about our assumptions and enable us to simulate agent behavior, setting the stage for fitting these models to data and evaluating their performance in later chapters.