Chapter 7: Learning and Teaching

  • Page number: p.220 (e-book)
    • Section number: 7.5
    • Date: 4/30/2014
    • Name: Haden Lee
    • Email: haden[dot]lee[at]stanford[dot]edu
    • Content: A minor typo at the bottom of the page: "... and let $\alpha^t(s_i)$ be ..." should be "$\alpha^t(s)$", to be consistent with the rest of the section (just remove the subscript "i").
  • Page number: p.221 (e-book)
    • Section number: 7.5, Definitions 7.5.1 and 7.5.2.
    • Date: 4/30/2014
    • Name: Haden Lee
    • Email: haden[dot]lee[at]stanford[dot]edu
    • Content: Definition 7.5.1 should be $R^t(s) = \alpha^t(s) - \alpha^t$ instead of $\alpha^t - \alpha^t(s)$, for two reasons. (A) The book mentions this definition's consistency with Definition 3.4.5, but with the definition as printed the opposite seems to be true. (B) With the definition as printed, the no-regret learning rule (Definition 7.5.2) seems wrong: if there is some pure strategy s that is 'better' than what the agent has been playing, then $\Pr([\liminf R^t(s)] \le 0)$ is always equal to 1, but this is exactly the case we want to rule out. Either Definition 7.5.2 should be changed to $\Pr([\liminf R^t(s)] \le 0) = 0$, or Definition 7.5.1 should be changed (just flip the sign). The latter change seems more reasonable.
      In addition, with this change, the bullet point on regret matching on page 221 must also be changed. The weight for pure strategy s is defined as $\sigma_i^{t+1}(s) = \frac{R^t(s)}{\sum_{s' \in S_i} R^t(s')}$, but note that, with the new definition of $R^t(s)$, this weight can be negative. One natural fix is $\sigma_i^{t+1}(s) = \frac{\max(0, R^t(s))}{\sum_{s' \in S_i} \max(0, R^t(s'))}$; that is, treat negative $R^t(s)$ terms as zeros, because $R^t(s) < 0$ implies that what the agent has been playing is better than playing s, and thus there is no reason to place any weight on s. Instead, weight should go to the strategies with positive regret (see the restated definitions and the sketch below).
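
For reference, here is how the two definitions would read with the proposed sign flip applied; this is a restatement of the erratum above, not the book's wording.

    % Definition 7.5.1 with the sign flipped: the regret for not having
    % played pure strategy s through time t.
    R^t(s) = \alpha^t(s) - \alpha^t

    % Definition 7.5.2, kept as printed, now rules out the bad case:
    % if some pure strategy s were asymptotically better than the realized
    % play, \liminf R^t(s) would be positive and the condition would fail.
    \Pr\!\left( \left[ \liminf_{t \to \infty} R^t(s) \right] \le 0 \right) = 1
    \quad \text{for every pure strategy } s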
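
To make the corrected regret-matching update concrete, here is a minimal Python sketch assuming the flipped definition $R^t(s) = \alpha^t(s) - \alpha^t$; the function names and the uniform fallback when no regret is positive are my own choices, not the book's.

    import random

    def regret_matching_weights(regrets):
        """Map average regrets R^t(s) to the next mixed strategy sigma^{t+1}(s).

        regrets: dict mapping each pure strategy s to R^t(s) = alpha^t(s) - alpha^t.
        Negative regrets are clipped to zero: R^t(s) < 0 means the realized
        play already did better than s, so s gets no weight.
        """
        clipped = {s: max(0.0, r) for s, r in regrets.items()}
        total = sum(clipped.values())
        if total == 0.0:
            # No strategy has positive regret, so the formula above is 0/0;
            # falling back to uniform play is one common convention (an assumption).
            return {s: 1.0 / len(regrets) for s in regrets}
        return {s: r / total for s, r in clipped.items()}

    def sample_strategy(mixed):
        """Draw one pure strategy according to the mixed strategy."""
        u, acc = random.random(), 0.0
        for s, p in mixed.items():
            acc += p
            if u <= acc:
                return s
        return s  # guard against floating-point rounding at the boundary

    # Example: with regrets {a: 0.4, b: -0.1, c: 0.2}, the negative regret on b
    # is clipped and sigma^{t+1} = {a: 2/3, b: 0, c: 1/3}.
    print(regret_matching_weights({"a": 0.4, "b": -0.1, "c": 0.2}))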
 

The following errors are fixed in the second printing of the book and in the online PDF v1.1:

  • Page number: 200
 