The Hot Stove Effect: Why AI Learns to Be a Pessimist


Why learning algorithms turn pessimistic: the Hot Stove Effect shows that a negativity bias persists even in Bayesian and sampling-based learning.

:::info
Author: (1) Jerker Denrell, University of Warwick ([email protected])
:::

Table of Links

- Abstract and 1. Introduction
- 2. Illustration
- 3. Intuition
- 4. General Theorem about Biased Averages
- 5. Alternative Learning Models
- 6. Bayesian Updating
- 7. Implications for Understanding Learning
- Appendix and References

Abstract

The Hot Stove Effect is a negativity bias resulting from the adaptive character of learning. The mechanism is that learning algorithms that pursue alternatives with positive estimated values, but avoid alternatives with negative estimated values, will correct errors of overestimation but fail to correct errors of underestimation. Here, we generalize the theory behind the Hot Stove Effect to settings in which negative estimates do not necessarily lead to avoidance but to a smaller sample size (i.e., a learner selects alternative B less often if B is believed to be inferior, but does not avoid B entirely). We formally demonstrate that the negativity bias remains in this set-up.

We also show that there is a negativity bias for Bayesian learners in the sense that most such learners underestimate the expected value of an alternative.

1. Introduction

Learning from experience does not necessarily generate unbiased beliefs, partly because of psychological biases but also because of biases in the information learners sample and are exposed to.

One important bias in sampling is the so-called "hot stove effect" (Denrell & March, 2001), which refers to the asymmetry in error correction generated by adaptive learning processes. The key idea is that the tendency to avoid alternatives with unfavorable past outcomes generates a biased set of experiences. Alternatives that are underestimated (believed to be worse than they are) are unlikely to be tried and sampled again, which implies that errors of underestimation are unlikely to be corrected.

Alternatives that are overestimated (believed to be better than they are) are likely to be tried and sampled again, which implies that errors of overestimation are likely to be corrected. This asymmetry in error correction generates a biased set of experiences that, in turn, can give rise to biased judgments (Denrell, 2005), including in-group bias and apparently risk-averse behavior (Denrell, 2005, 2007).

There is good experimental support for the hot stove effect at the individual level, and researchers in psychology have relied on it to explain regularities in risk taking in experimental studies (Erev & Roth, 2014) and why people underestimate the trustworthiness of others (Fetchenhauer & Dunning, 2014).
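The asymmetry described above can be illustrated with a short simulation. The horizon, payoff distribution, and resampling rule below are illustrative assumptions, not taken from the paper: a learner resamples a risky alternative with true mean zero only while its running average is positive, so negative impressions are never corrected.

```python
# Minimal sketch of the Hot Stove Effect (parameters are arbitrary choices).
import random

random.seed(1)

def final_estimate(periods=20, mu=0.0, sigma=1.0):
    payoffs = [random.gauss(mu, sigma)]  # forced first trial
    for _ in range(periods - 1):
        if sum(payoffs) / len(payoffs) > 0:
            # only resample while the impression is positive
            payoffs.append(random.gauss(mu, sigma))
        # a negative impression is never resampled, so it persists
    return sum(payoffs) / len(payoffs)

estimates = [final_estimate() for _ in range(20000)]
mean_belief = sum(estimates) / len(estimates)
share_negative = sum(e < 0 for e in estimates) / len(estimates)
print(f"mean final belief: {mean_belief:.3f}")        # well below the true mean 0
print(f"share of negative beliefs: {share_negative:.2%}")
```

Overestimation gets corrected (positive impressions trigger more sampling), underestimation does not, so the average final belief falls below the true mean of zero and most learners end up pessimistic.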

Researchers in finance (Dittmar & Duchin, 2016) have used field data to demonstrate that the hot stove effect can explain the risk-taking behavior of executives. The hot stove effect also has important implications for information aggregation and online reviews: if consumers avoid products with poor reviews, and consumers review the products they buy, negative reviews will be more persistent than positive reviews, generating biased averages (Le Mens et al., 2018).

Past theoretical work on the hot stove effect has assumed that negative experiences may lead to avoidance, that is, to the alternative not being tried at all (Denrell, 2005, 2007). Clearly, if no more information becomes available, a negative impression will persist. In many settings, however, a negative belief or impression may not lead to avoidance of the alternative, but rather to a smaller sample size.

An animal with a more favorable impression of the energy content of plants of type A than of plants of type B may search for plants of type A. During this search, some plants of type B may be found incidentally. The result is that the animal samples more plants of type A (because the search focuses on them) than of type B; it does not avoid plants of type B, it simply samples fewer of them.

Similarly, a firm may prefer to hire graduates from university A, but may hire some, although fewer, graduates from university B if not enough graduates from A accept its offers.

In this paper, I generalize the theory behind the hot stove effect and show that it holds even if a negative impression leads only to a reduction in sample size, not necessarily to avoidance. Specifically, I show that the final belief will be biased for a broad class of learning algorithms in which the sample size is a function of the past belief.

If the sample size is higher when the past belief is more positive, there is a negativity bias: the final belief will be lower than the expected value of the random variable the learner is learning about. This result also applies to taking averages: the average of a sample will be biased if the total sample size is a function of the average of an initial subset. The bias is eliminated in the long run as the number of samples increases, but in the short run it can be significant.
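A minimal sketch of this averaging setup, under assumed sample sizes (three initial draws, then twenty further draws after a positive first-period average but only one after a negative one):

```python
# Adaptive sample-size policy: the second-period sample size depends on the
# first-period average. All parameter values here are illustrative assumptions.
import random

random.seed(2)

def adaptive_average(n1=3, n2_high=20, n2_low=1, mu=0.0, sigma=1.0):
    first = [random.gauss(mu, sigma) for _ in range(n1)]
    n2 = n2_high if sum(first) / n1 > 0 else n2_low  # larger follow-up after a good start
    second = [random.gauss(mu, sigma) for _ in range(n2)]
    return (sum(first) + sum(second)) / (n1 + n2)    # simple average of all draws

avgs = [adaptive_average() for _ in range(20000)]
bias = sum(avgs) / len(avgs)
print(f"mean of final averages: {bias:.3f}")  # negative, despite E[x] = 0
```

The learner processes information in an unbiased way (a plain average), yet the expected final average is below the true mean, because positive first-period averages are diluted by a large second sample while negative ones keep most of their weight.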

I also examine whether the bias remains for a Bayesian learner. I show that there is no bias on average for a Bayesian learner: the average belief is equal to the expected value of the random variable the learner is learning about. However, I also show that most Bayesian learners will underestimate the variable they are learning about if the sample size is an increasing function of the initial belief.

These results imply that a large class of sensible and adaptive learning processes can be expected to generate biased beliefs, even if decision-makers process the available information in a seemingly unbiased way (i.e., taking averages).

Indeed, even rational Bayesian learners will tend to underestimate the expected value of an alternative (i.e., most will do so) if the total sample size is higher when the initially observed payoffs are high.

Adaptive sampling processes are common and often necessary to reduce search costs. It is not sensible, for example, to continue to sample an alternative a fixed number of times if initial trials reveal that this alternative has a much lower payoff than other available alternatives. The results in this paper show that even unbiased processing of information generated by such sampling policies can generate seemingly biased beliefs.

Adaptive sampling policies thus offer an alternative explanation of biases in beliefs, such as a tendency to underestimate the extent to which others are trustworthy.

2. Illustration

(Figures and equations for this section are omitted.)

3. Intuition

As this example shows, adaptive sampling implies that negative first-period averages are more "persistent": they are given a larger weight than positive first-period averages. Because negative and positive first-period averages of the same magnitude are equally likely when the payoff distribution is normal with mean zero (the distribution is symmetric around zero), the greater persistence of negative first-period averages explains the overall negative bias.

The impact of variance can be understood in a similar way.

If the payoff distribution is more variable, first-period averages will tend to differ more from zero, in both the positive and negative directions. Positive first-period averages will tend to regress more to the mean (zero) than negative averages, because they are followed by larger samples. When negative first-period averages are more extreme because the variance is greater, the result is a stronger bias.
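The variance effect can be checked with the same kind of illustrative sampling rule at different payoff standard deviations (all parameter values are assumptions for the sketch, not from the paper):

```python
# The same adaptive sampling rule produces a larger negative bias
# when payoffs are noisier. Sigmas and sample sizes are arbitrary.
import random

random.seed(3)

def adaptive_average(sigma, n1=3, n2_high=20, n2_low=1):
    first = [random.gauss(0.0, sigma) for _ in range(n1)]
    n2 = n2_high if sum(first) > 0 else n2_low       # more draws after a good start
    second = [random.gauss(0.0, sigma) for _ in range(n2)]
    return (sum(first) + sum(second)) / (n1 + n2)

biases = {}
for sigma in (0.5, 1.0, 2.0):
    biases[sigma] = sum(adaptive_average(sigma) for _ in range(20000)) / 20000
    print(f"sigma={sigma}: mean final average = {biases[sigma]:.3f}")
```

More variance makes negative first-period averages more extreme, and since those are the persistent ones, the simulated bias grows with sigma.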

Of course, the bias will be eliminated in the long run as the number of samples increases, since the average of n samples of a random variable X converges to its expected value. In the short run, however, the bias can be significant.

4. General Theorem about Biased Averages

The bias generated by an adaptive sample-size policy does not hold only for the normal distribution and the specific binary sampling policy considered above (a high sample size above zero, a low one below), but for any distribution and a large class of adaptive sampling policies.

(Theorem omitted; informally: if the second-period sample size is a non-decreasing function of the first-period average, the expected value of the overall average falls below E[x] = u.)

Proof: See the Appendix.

For unimodal symmetric distributions centered on the expected value E[x] = u, it can also be shown that a majority of the averages after the second period are below u if higher averages in the first period lead to larger sample sizes in the second period:

(Theorem omitted.)

Proof: See the Appendix.

5. Alternative Learning Models

So far we have assumed that the learner computes the average of the observed payoffs, but a similar bias holds for several other learning models. Suppose, for example, that the learner gives more weight to the most recently observed payoff.
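As an illustration, here is a recency-weighted learner: an exponentially weighted average with an assumed weight a = 0.3, whose second-period sample size depends on its first-period belief. The parameters are illustrative, not from the paper.

```python
# Recency-weighted learning: belief = (1 - a) * belief + a * latest payoff,
# with an adaptive second-period sample size. Parameters are arbitrary.
import random

random.seed(4)

def recency_belief(a=0.3, n1=3, n2_high=20, n2_low=1):
    belief = random.gauss(0.0, 1.0)             # belief after a forced first draw
    for _ in range(n1 - 1):                     # rest of period one
        belief = (1 - a) * belief + a * random.gauss(0.0, 1.0)
    n2 = n2_high if belief > 0 else n2_low      # more samples after a good impression
    for _ in range(n2):                         # period two
        belief = (1 - a) * belief + a * random.gauss(0.0, 1.0)
    return belief

beliefs = [recency_belief() for _ in range(20000)]
mean_belief = sum(beliefs) / len(beliefs)
print(f"mean recency-weighted belief: {mean_belief:.3f}")
```

Because a negative first-period belief triggers only one further update, the few initial negative payoffs keep most of their weight, while a positive belief is almost entirely overwritten by the twenty later draws.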

This results in a similar bias:

(Theorem omitted.)

Proof: See the Appendix.

6. Bayesian Updating

(Set-up equations omitted.)

We now show that Bayesian updating remains unbiased in this latter sense, even if the sample size in period two depends on the belief after period one.

This is true for any distribution:

(Theorems omitted.)

Proof: See the Appendix.

Thus, even if Bayesian learners are learning about a symmetric distribution and have a prior that is symmetric around zero (m = 0), most learners will have a negative belief after sampling. This is due to the adaptive sampling policy.

The intuition is similar to why the averaging learner is biased: negative initial beliefs are more persistent because the learner then does not take many additional samples, and the initial few negative observations are weighted heavily. Note that such a bias would not occur if the learner followed a fixed sampling policy and decided at the outset how many samples to take. With a fixed sample size, the belief would be equally likely to be positive or negative when learning about a normally distributed payoff whose mean is drawn from a normal prior with mean zero.

Note also that when sampling is adaptive, all Bayesian learners know that only 50% of the means are negative. Still, most Bayesian learners believe that the mean they observe is negative.

That a majority of Bayesian learners end up with a negative belief may seem paradoxical, since on average there is no bias: E[b2] = 0 when m = 0, i.e., the average belief, averaging over all Bayesian learners, is equal to the mean of the prior. The paradox is resolved by noting that the learners who believe that the mean is negative are less confident in this estimate than the learners who believe that the mean is positive and therefore took a larger sample in the second period.

This results in a skewed distribution of beliefs after the second period: most beliefs are below m = 0, but those below m = 0 are less extreme than those above it, since they are based on smaller samples and are therefore shrunk more toward the prior mean. This can be seen in Figure 1.
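The paradox can be reproduced in a small conjugate-normal simulation (a standard normal prior on the mean and unit payoff noise; the sample sizes are illustrative assumptions): the average posterior mean comes out close to zero, yet more than half of the simulated learners end up with a negative belief.

```python
# Bayesian learner: prior N(0, 1) over the mean theta, payoffs N(theta, 1).
# With unit variances, the posterior mean after n draws is sum(x) / (n + 1).
# Sample sizes are illustrative assumptions.
import random

random.seed(5)

def posterior_mean(n1=2, n2_high=20, n2_low=1):
    theta = random.gauss(0.0, 1.0)                  # true mean, drawn from the prior
    xs = [random.gauss(theta, 1.0) for _ in range(n1)]
    b1 = sum(xs) / (len(xs) + 1)                    # posterior mean after period one
    n2 = n2_high if b1 > 0 else n2_low              # adaptive second-period sample
    xs += [random.gauss(theta, 1.0) for _ in range(n2)]
    return sum(xs) / (len(xs) + 1)                  # posterior mean after period two

beliefs = [posterior_mean() for _ in range(40000)]
avg = sum(beliefs) / len(beliefs)
share_neg = sum(b < 0 for b in beliefs) / len(beliefs)
print(f"average belief: {avg:.3f}")            # close to 0: no bias on average
print(f"share of beliefs below zero: {share_neg:.2%}")  # a majority of learners
```

Averaged over learners the posterior mean is unbiased, but the distribution is skewed: negative beliefs rest on small samples and stay shrunk near zero, while positive beliefs rest on large samples and spread out, so the median learner is a pessimist.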

7. Implications for Understanding Learning

The bias resulting from adaptive sampling policies implies that even seemingly unbiased learning algorithms, such as averaging or Bayesian updating, can produce biased beliefs, at least in the sense that most learners underestimate (or overestimate) an alternative. This offers an alternative explanation of some judgment biases, one that does not require any bias in information processing.

For example, suppose a firm is observed to underestimate the productivity of graduates from certain universities. Such a negativity bias can be explained by an adaptive sampling policy (the firm tends to hire fewer people from universities with which it has had worse experiences) and does not require motivated reasoning or a cognitive bias.

Appendix

Proof of Theorem 2: Without loss of generality, we consider only the case u = 0. A unimodal random variable symmetric around the mode k can always be represented as x = k + ε, where ε has a unimodal distribution symmetric around zero. We first prove the following lemma:

(Lemma omitted.)

The proof now follows by induction, because the sum of k independent random variables with a unimodal distribution symmetric around u is also unimodal and symmetric, centered on u (Shaked & Shanthikumar, 2007, page 173).

Consider a fixed value of r, and suppose r > 0. The average after the second period becomes less than zero whenever

(equation omitted)

The probability that the average is below zero after the second period is thus

(equation omitted)

Altogether, the probability that the average is below zero after the second period equals

(equation omitted)

Altogether, the probability that a positive belief after period one turns into a negative belief after period two equals

(equation omitted)

where f(z) is the density of the sum of the payoffs in the first period, which is a normal distribution with mean zero.

Altogether, the probability that a negative belief after period one turns into a positive belief after period two, Pr(− → +), equals

(equation omitted)

After the variable substitution z = −y, this integral equals

(equation omitted)

where we have used the fact that f(−y) = f(y). We wish to show that Pr(+ → −) > Pr(− → +), which requires us to show that

(equation omitted)

over the positive domain of z. To do so, note first that the derivative of

(equation omitted)

with respect to w is

(equation omitted)

References

Denrell, J. (2005). Why most people disapprove of me: Experience sampling in impression formation. Psychological Review, 112, 951–978.

Denrell, J. (2007). Adaptive learning and risk taking. Psychological Review, 114, 177–187.

Denrell, J., & March, J. G. (2001). Adaptation as information restriction: The hot stove effect. Organization Science, 12, 523–538.

Dittmar, A., & Duchin, R. (2016). Looking in the rear-view mirror: The effect of managers’ professional experiences on corporate financial policy. Review of Financial Studies, 29, 565–602.

Erev, I., & Roth, A. (2014). Maximization, learning, and economic behavior. Proceedings of the National Academy of Sciences, 111, 10818–10825.

Fetchenhauer, D., & Dunning, D. (2014). Why so cynical?: Asymmetric feedback underlies misguided skepticism regarding the trustworthiness of others. Psychological Science, 21, 189–193.

Le Mens, G., Kovács, B., Avrahami, J., & Kareev, Y. (2018). How endogenous crowd formation undermines the wisdom of the crowd in online ratings. Psychological Science, 29, 1475–1490.

Shaked, M., & Shanthikumar, J. G. (2007). Stochastic orders. New York, NY: Springer.

Williams, D. (1991). Probability with martingales. Cambridge, UK: Cambridge University Press.

:::info
This paper is available on arxiv under CC BY 4.0 DEED license.
:::