 Page number: p.220 (ebook)
 Section number: 7.5
 Date: 4/30/2014
 Name: Haden Lee
 Email: haden[dot]lee[at]stanford[dot]edu
 Content: A minor typo at the bottom of the page: "... and let \alpha^t(s_i) be ..." should be "\alpha^t(s)" to be consistent within the section (just remove the subscript "i").
 Page number: p.221 (ebook)
 Section number: 7.5, Definitions 7.5.1 and 7.5.2.
 Date: 4/30/2014
 Name: Haden Lee
 Email: haden[dot]lee[at]stanford[dot]edu
 Content: Definition 7.5.1 should read $R^t(s) = \alpha^t(s) - \alpha^t$ instead of $R^t(s) = \alpha^t - \alpha^t(s)$, for two reasons: (A) The book says this definition is consistent with Def 3.4.5, but as defined in the book the opposite seems to be true. (B) With the book's definition, the no-regret learning rule (Def 7.5.2) seems wrong: if some pure strategy s is 'better' than what the agent has been playing, then Pr([lim inf R^t(s)] <= 0) is always equal to 1, but this is exactly the situation we want to avoid. Either Def 7.5.2 should be changed to Pr([lim inf R^t(s)] <= 0) = 0, or Def 7.5.1 should be changed (just flip the sign). The latter change seems more reasonable.
In addition, with this change, the bullet point on regret matching on page 221 must also be changed. The weight for pure strategy s is defined as \sigma_i^{t+1}(s) = \frac{R^t(s)}{\sum_{s' \in S_i} R^t(s')}, but with the new definition of R^t(s) this weight can be negative. One natural fix is \sigma_i^{t+1}(s) = \frac{\max(0, R^t(s))}{\sum_{s' \in S_i} \max(0, R^t(s'))}. That is, treat negative R^t(s) terms as zeros: R^t(s) < 0 implies that what I have been playing is better than playing s, so I have no reason to place any weight on s. Instead, I should place weight on the strategies that gave me positive regret.
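To make the proposed fix concrete, here is a minimal Python sketch of the clipped weights, assuming the sign-flipped definition R^t(s) = \alpha^t(s) - \alpha^t. The function name regret_matching_weights and the dictionary representation are hypothetical, chosen only for illustration. The uniform fallback when no regret is positive is also an assumption, since the formula above is undefined in that case (the denominator is zero).

```python
def regret_matching_weights(regrets):
    """Return next-round weights sigma_i^{t+1}(s) from regrets R^t(s).

    `regrets` maps each pure strategy s to R^t(s) = alpha^t(s) - alpha^t
    (the sign-flipped definition proposed in this erratum).
    """
    # Treat negative regrets as zero: no weight on strategies that would
    # have done worse than what the agent actually played.
    positive = {s: max(0.0, r) for s, r in regrets.items()}
    total = sum(positive.values())

    if total == 0:
        # Assumption (not in the erratum): with no positive regret the
        # denominator is zero, so fall back to uniform weights.
        return {s: 1.0 / len(regrets) for s in regrets}

    return {s: p / total for s, p in positive.items()}


if __name__ == "__main__":
    # Strategy "b" has negative regret, so it receives zero weight.
    print(regret_matching_weights({"a": 3.0, "b": -1.0, "c": 1.0}))
```

With regrets of 3, -1, and 1, the clipped values are 3, 0, and 1, so the weights are 3/4, 0, and 1/4. Without the clipping, the same input would give weights 1, -1/3, and 1/3, which is not a valid probability distribution.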
