A reading list on Bayesian methods

This list is intended to introduce some of the tools of Bayesian statistics and machine learning that can be useful to computational research in cognitive science. The first section mentions several useful general references, and the others provide supplementary readings on specific topics. If you would like to suggest some additions to the list, contact Tom Griffiths.

The sections covered in this list are:

General introduction
Classics on the interpretation of probability
Model selection and model averaging
The EM algorithm
Monte Carlo methods
Graphical models
Hidden Markov models and DBNs
Bayesian methods and neural networks

General introduction

There are no comprehensive treatments of the relevance of Bayesian methods to cognitive science. However, Trends in Cognitive Sciences recently ran a special issue (Volume 10, Issue 7) on probabilistic models of cognition that has a number of relevant papers. You can also check out the IPAM graduate summer school on probabilistic models of cognition at which many of the authors of these papers gave presentations.

The slides from three tutorials on Bayesian methods presented at the Annual Meeting of the Cognitive Science Society might also be of interest:

The 2004 tutorial by Josh Tenenbaum and Tom Griffiths (384 slides, 10.5MB, PowerPoint format).
The 2006 tutorial by Tom Griffiths, Josh Tenenbaum, and Charles Kemp (Part I) (Part II) (Part III) (Part IV)
The 2008 tutorial by Tom Griffiths, Josh Tenenbaum, and Charles Kemp (Part I, PPT) (Part I, PDF) (Part II, PPT) (Part II, PDF) (Part III, PPT) (Part III, PDF) (Part IV, PPT) (Part IV, PDF)
The 2010 tutorial by Tom Griffiths, Josh Tenenbaum, and Charles Kemp (Part I, PPT) (Part I, PDF) (Part II, PPT) (Part II, PDF) (Part III, PPT) (Part III, PDF) (Part IV, PPT) (Part IV, PDF)

The 2006, 2008, and 2010 tutorials were based on material appearing in three papers:

Griffiths, T. L., & Yuille, A. (2006). A primer on probabilistic inference. Trends in Cognitive Sciences, 10, (online supplement to issue 7). (pdf)
Tenenbaum, J. B., Griffiths, T. L., & Kemp, C. (2006). Theory-based Bayesian models of inductive learning and reasoning. Trends in Cognitive Science, 10, 309-318. (pdf)
Griffiths, T. L., Kemp, C., and Tenenbaum, J. B. (2008). Bayesian models of cognition. In Ron Sun (ed.), The Cambridge handbook of computational cognitive modeling. Cambridge University Press. (manuscript pdf)

Modern artificial intelligence uses a lot of statistical notions, and one of the best places to learn about some of these ideas and their relevance to topics related to cognition is

Russell, S., & Norvig, P. (2002). Artificial Intelligence: A Modern Approach (2nd ed.). Englewood Cliffs, NJ: Prentice Hall.

Radford Neal gave a tutorial presentation at NIPS 2004 on Bayesian machine learning, which outlines some of the philosophy of Bayesian inference, its relevance to the study of learning, and some fundamental methods.

David Mackay has written an excellent introduction to information theory and statistical inference which covers many topics relevant to cognitive science:

Mackay, D. J. C. (2003). Information theory, inference, and learning algorithms. Cambridge, UK: Cambridge University Press.

Information about the book is available on his website, where you can also download a copy for online viewing.

Two introductory books on Bayesian statistics (as statistics, rather than the basis for AI, machine learning, or cognitive science) that assume only a basic background, are

Sivia, D. S. (1996). Data analysis: A Bayesian tutorial. Oxford: Oxford University Press.
Lee, P. M. (1997). Bayesian statistics. New York: Wiley.

There are several advanced texts on Bayesian statistics motivated by statistical decision theory:

Berger, J. O. (1993). Statistical decision theory and Bayesian analysis. New York: Springer.
Robert, C. P. (2001). The Bayesian choice: From decision-theoretic foundations to computational implementation. New York: Springer.

The latter is more recent and covers computational methods relevant to Bayesian statistics. The relevance of statistical decision theory to human and machine learning is illustrated in the early chapters of

Duda, R. O., and Hart, P. E. (1973). Pattern classification and scene analysis. New York: Wiley.

which are largely reproduced in the second edition

Duda, R. O., Hart, P. E., and Stork, D. G. (2000). Pattern classfication. New York: Wiley.

The subjective interpretation of probability motivates other advanced texts:

Bernardo, J. M., & Smith, M. F. A. (1994). Bayesian theory. New York: Wiley.
Jaynes, E. T. (1994). Probability theory: The logic of science. (now available as a bound book)

The former builds on the work of De Finetti, exploring its consequences in a range of situations. The latter comes out of the approach taken by E. T. Jaynes in statistical physics.

Finally, there are also several advanced texts motivated by statistical applications and data analysis:

Box, G. E. P., and Tiao, G. C. (1992). Bayesian inference in statistical analysis. New York: Wiley.
Gelman, A., Carlin, J. B., Stern, H. S., Rubin, D. B. (1995). Bayesian data analysis. London: Chapman and Hall.

The former is a classic, illustrating how frequentist methods can be understood from a Bayesian perspective and then going far beyond them. The latter considers the practical problems that can be addressed using Bayesian models, and has chapters on modern computational techniques.

Tom Minka has a number of tutorial papers that apply these ideas in several important cases, including inferring a gaussian distribution, inference about the uniform distribution, and Bayesian linear regression.

Classics on the interpretation of probability

De Finetti gives a detailed account of the structure and consequences of subjective probability. Jeffreys discusses the idea of uninformative priors, and defines the approach to choosing priors that bears his name. Savage is the classic text on the decision-theoretic approach to probability.

De Finetti, B. (1992). Theory of probability. New York: Wiley.
Jeffreys, H. (1939/1998). Theory of probability. Oxford: Oxford University Press.
Savage, L. J. (1954). The foundations of statistics. New York: Wiley.

Model selection and model averaging

A number of papers on model selection and model averaging by Raftery and colleagues are available here. There is also a webpage listing research on Bayesian model averaging. Some good reviews of both topics are:

Kass, R. E., and Raftery, A. E. (1994). Bayes factors.Technical Report No. 254, Department of Statistics, University of Washington.
Hoeting, J. A., Madigan, D., and Raftery, A. E. (1999). Bayesian model averaging: A tutorial. Statistical Science, 14, 382-401.
Wasserman, L. (1997). Bayesian model selection and model averaging. Technical Report No. 666, Statistics Department, Carnegie Mellon University.

Mackay gives a detailed account of how these methods can be applied in artificial neural networks:

MacKay, D. J. C. (1995) Probable networks and plausible predictions - A review of practical Bayesian methods for supervised neural networks. Neuron, 6, 469-505.

The EM algorithm

A general introduction to the EM algorithm and its applications is given by Ghahramani and Jordan. Some of the motivation behind EM is explored by Neal and Hinton and in a tutorial by Minka.

Ghahramani, Z. and Jordan, M. I. (1994). Learning from incomplete data. Technical Report No. 1509, AI Lab, MIT.
Neal, R. M., and Hinton, G. E. (1998). A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan (ed.) Learning in graphical models, pp. 355-368. Cambridge, MA: MIT Press.

Monte Carlo methods

Mackay motivates and explains several Monte Carlo methods. Neal gives a detailed introduction to Markov chain Monte Carlo. The other two books give examples of how these methods can be used in Bayesian models.

Mackay, D. J. C. (1998). Introduction to Monte Carlo methods. In M. I. Jordan (ed.) Learning in graphical models, pp. 175-204. Cambridge, MA: MIT Press.
Neal, R. M. (1993). Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto.
Gilks, W. R. , Richardson, S., and Spiegelhalter, D. J. (1995). Markov chain Monte Carlo in practice. London:Chapman and Hall.
Gelman, A., Carlin, J. B., Stern, H. S., Rubin, D. B. (1995). Bayesian data analysis. London: Chapman and Hall.

Graphical models

The classic reference on graphical models in artificial intelligence is

Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Francisco, CA: Morgan Kaufmann.

This is supplemented by Pearl's more recent book, which considers how graphical models can be used to understand causality. In both books, the first two chapters introduce and motivate the ideas involved, while the later chapters explore the consequences of these ideas.

Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge University Press.

Kevin Murphy has both a toolbox for simulating Bayesian networks in Matlab and a detailed tutorial on the subject, including an extensive reading list. Introductions to inference and learning in Bayesian networks are provided by Jordan and Weiss and Heckerman.

Jordan, M. I., and Weiss, Y. (2002). Graphical models: Probabilistic inference. In M. Arbib (Ed.), The Handbook of Brain Theory and Neural Networks, 2nd edition. Cambridge, MA: MIT Press.
Heckerman, D. (1995). A tutorial on learning with Bayesian networks. Technical Report MSR-TR-95-06, Microsoft Research.

Hidden Markov models and DBNs

Kevin Murphy has an excellent toolbox for HMMs, as well as a recently written chapter on dynamic Bayesian networks. The classic reference on HMMs is:

Rabiner, L. (1989). A tutorial on Hidden Markov Models and selected applications in speech recognition. Proc. IEEE, 77, 257-286.

Bayesian methods and neural networks

MacKay has written a number of papers integrating Bayesian methods with artifical neural networks. Some of the connections between neural networks and probability are explored by Jordan and Neal.

Back