Section number: Figure 7.8 The minmax-Q algorithm.
Date: 5/2/2014
Name: Haden Lee
Email: haden[dot]lee[at]stanford[dot]edu
Content: A minor typo in the pseudo-code: "... \sum_{a'} (\Pi(s, a') * Q ..." should be "... \sum_{a'} (\Pi'(s, a') * Q ..." (change \Pi to \Pi' so that it matches the variable over which the argmax is taken).
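For illustration, a minimal Python sketch of the corrected value computation in minimax-Q; the point is that the inner expectation must be taken with respect to the candidate policy \Pi' being maximized over, not the current policy \Pi. Variable names are mine, and the maximization over mixed strategies (solved as a linear program in the actual algorithm) is only approximated here by enumerating a finite set of candidate policies.

def minimax_value(Q, s, actions, opp_actions, candidate_policies):
    # Approximates V(s) = max_{Pi'} min_{o'} sum_{a'} Pi'(s, a') * Q[s, a', o']
    # by searching over a finite set of candidate mixed policies Pi'.
    best = float("-inf")
    for pi_prime in candidate_policies:  # pi_prime: dict mapping each action a to its probability
        worst = min(
            sum(pi_prime[a] * Q[(s, a, o)] for a in actions)  # expectation uses Pi', as in the correction
            for o in opp_actions
        )
        best = max(best, worst)
    return best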
Page number: p.220 (e-book)
Section number: 7.5
Date: 4/30/2014
Name: Haden Lee
Email: haden[dot]lee[at]stanford[dot]edu
Content: A minor typo at the bottom of the page: "... and let \alpha^t(s_i) be ..." should be "\alpha^t(s)" to be consistent within the section (just remove the subscript "i").
Page number: p.221 (e-book)
Section number: 7.5, Definitions 7.5.1 and 7.5.2.
Date: 4/30/2014
Name: Haden Lee
Email: haden[dot]lee[at]stanford[dot]edu
Content: Definition 7.5.1 should be $R^t(s) = \alpha^t(s) - \alpha^t$ instead of $\alpha^t - \alpha^t(s)$, for two reasons. (A) The book mentions this definition's consistency with Definition 3.4.5, but with the definition as stated the opposite seems true. (B) With the definition in the book, the no-regret learning rule (Definition 7.5.2) seems wrong: if there is some pure strategy s that is 'better' than what the agent has been playing, then Pr([lim inf R^t(s)] <= 0) is always equal to 1, but this is exactly what we want to avoid. Either Definition 7.5.2 should be changed to Pr([lim inf R^t(s)] <= 0) = 0, or Definition 7.5.1 should be changed (just flip the sign); the latter change seems more reasonable. In addition, with this change, the bullet point on regret matching on page 221 must also be changed. The weight for pure strategy s is defined as $\sigma_i^{t+1}(s) = \frac{R^t(s)}{\sum_{s' \in S_i} R^t(s')}$, but note that (with the new definition of R^t(s)) this weight can be negative. One natural fix is $\sigma_i^{t+1}(s) = \frac{\max(0, R^t(s))}{\sum_{s' \in S_i} \max(0, R^t(s'))}$; that is, treat negative R^t(s) terms as zeros, because R^t(s) < 0 implies that what I have been playing is better than playing s, so I have no reason to place any weight on s. Instead, I should place weight on the strategies that gave me positive regret.
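To make the proposed fix concrete, here is a minimal Python sketch of regret matching under the suggested convention R^t(s) = \alpha^t(s) - \alpha^t, with negative regrets clipped to zero when forming the next mixed strategy. The names are illustrative, and the uniform fallback when no regret is positive is a common convention, not something taken from the book.

def regret_matching_strategy(cumulative_regret):
    # cumulative_regret: dict mapping each pure strategy s to R^t(s)
    positive = {s: max(0.0, r) for s, r in cumulative_regret.items()}
    total = sum(positive.values())
    if total == 0.0:
        # No strategy has positive regret; play uniformly (a common convention).
        n = len(cumulative_regret)
        return {s: 1.0 / n for s in cumulative_regret}
    return {s: p / total for s, p in positive.items()}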
The following errors are fixed in the second printing of the book and online PDF v1.1
Page number: 200
Section number: 7.3
Date: Feb 27, 2010
Name: Kevin
Content: changed the notation for other player's strategy and set of strategies for consistency. The paragraph now reads: As in fictitious play, each player begins the game with some prior beliefs. After each round, the player uses Bayesian updating to update these beliefs. Let $S_{-i}^i$ be the set of the opponent's strategies considered possible by player $i$, and $H$ be the set of possible histories of the game. Then we can use Bayes' rule to express the probability assigned by player $i$ to the event in which the opponent is playing a particular strategy $s_{-i}\in S_{-i}^i$ given the observation of history $h\in H$, as \[P_i (s_{-i} | h) = \frac{P_i (h | s_{-i}) P_i (s_{-i})}{\sum_{s_{-i}' \in S_{-i}^i} P_i (h | s_{-i}') P_i (s_{-i}') }.\]
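As a minimal illustration of the quoted formula, here is a Python sketch of the belief update; likelihood(h, s) stands in for $P_i(h | s_{-i})$, and the function and argument names are mine, not the book's.

def posterior(prior, h, likelihood):
    # prior: dict mapping each opponent strategy s_{-i} to the prior P_i(s_{-i}).
    # Returns the posterior P_i(s_{-i} | h) via Bayes' rule; assumes at least one
    # strategy in the prior's support assigns positive probability to history h.
    unnormalized = {s: likelihood(h, s) * p for s, p in prior.items()}
    z = sum(unnormalized.values())
    return {s: u / z for s, u in unnormalized.items()}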
Page number: 204
Section number: 7.4
Date: Feb 27, 2010
Name: Kevin
Content: Added footnote: "For consistency with the literature on reinforcement learning, in this section we use the notation $s$ and $S$ for a state and set of states respectively, rather than for a strategy profile and set of strategy profiles as elsewhere in the book."
Page number: 211
Section number: 7.6
Date: Feb 27, 2010
Name: Kevin
Content: Renamed the set of target opponent strategies from $S$ to $\tilde{S}$ for consistency with the rest of the book, in which $S$ denotes the set of all strategy profiles.
Page number: 213-218
Section number: 7.7
Date: Feb 27, 2010
Name: Nicolas Lambert
Content: All instances of S should be replaced by s for consistency with the rest of the book.
Page number: 215
Section number: 7.7.1
Date: 10.28.09
Name: Yoav
Email:
Content: Definitions 7.7.3 and 7.7.4 (stable steady state and asymptotically stable state) are missing "for sufficiently small $\epsilon$...".
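For reference, with the missing qualifier included, the intended statements read roughly as follows (this is a paraphrase, not the book's wording): a steady state is stable if, for sufficiently small $\epsilon > 0$, every trajectory that starts within distance $\epsilon$ of it remains within distance $\epsilon$ of it at all future times; it is asymptotically stable if, in addition, for sufficiently small $\epsilon > 0$, every trajectory that starts within distance $\epsilon$ of it converges to it.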