We’re often interested in the conjunction of more than two events (such as a hurricane passing through campus, your car not working, and your friends being unable or unwilling to help you). Remember that the conjunction of any three events A, B, and C can be no more likely than the conjunction of any two of them.
In thinking about the probability of the conjunction of multiple events, remember that each event A, B, etc., can represent a set of outcomes. Say that A is “useless friends,” B is “dead car,” and C is “hurricane.” The probability of useless friends and a dead car is

$$P(A \cap B) = P(B)\,P(A \mid B)$$
Call “useless friends and a dead car” X. Then the probability of all three events co-occurring is

$$P(X \cap C) = P(C)\,P(X \mid C)$$
This tells us that the probability of all three events co-occurring is the probability of a hurricane, times the probability that your friends and car will both be useless given that there is a hurricane. We can get further than this using the chain rule:

$$P(A \cap B \cap C) = P(C)\,P(B \mid C)\,P(A \mid B \cap C)$$
that is, the probability of all three events co-occurring is equal to the probability of a hurricane hitting us, times the likelihood of your car being dead given that there is a hurricane, times the likelihood of your friends being useless given that you are stuck without a car in a hurricane. The last of these is, I expect, small.
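As a rough numerical sketch of the chain rule, the snippet below multiplies the three terms together. The function name and all three probabilities are hypothetical values chosen only for illustration; none of them come from the text.

```python
def chain_rule_conjunction(p_c, p_b_given_c, p_a_given_bc):
    """Return P(A and B and C) = P(C) * P(B|C) * P(A|B and C)."""
    for p in (p_c, p_b_given_c, p_a_given_bc):
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
    return p_c * p_b_given_c * p_a_given_bc


# Hypothetical inputs (assumptions for illustration only).
p_hurricane = 0.05                   # P(C)
p_dead_car_given_hurricane = 0.30    # P(B|C)
p_useless_friends_given_both = 0.10  # P(A|B and C)

p_all_three = chain_rule_conjunction(
    p_hurricane, p_dead_car_given_hurricane, p_useless_friends_given_both
)

# The conjunction of two of the events, for comparison: P(B and C) = P(C) * P(B|C).
p_car_and_hurricane = p_hurricane * p_dead_car_given_hurricane

print(f"P(dead car and hurricane)  = {p_car_and_hurricane:.4f}")
print(f"P(all three events)        = {p_all_three:.4f}")
```

Because every factor is at most 1, adding a third event can only hold the probability steady or shrink it, so the three-way conjunction is never more likely than the two-way one.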
As you can see, problems in probability can become complex and cumbersome. But because we live in an uncertain world, an understanding of probability can help us make more sound, more rational decisions.
So. We’re interested in the probability of a hypothesis in light of some data, or P(H|D).
This depends on the prior probability of the hypothesis, or how likely it was before the data were collected, or P(H). It also depends upon how likely the data are given two states of the world, that is, the probability of the data given that the hypothesis is true, or P(D|H), and the probability of the data given that the hypothesis is not true, or P(D|not H).
Bayes’ theorem matters because everyone, not just scientists, is concerned with hypotheses. We live in an uncertain world, and constantly form hypotheses about it. We form these hypotheses as questions…
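Bayes’ theorem is important because it describes how we should revise our estimates of the probability of a hypothesis in light of data. We can rewrite the above equation, substituting H (hypothesis) for A, and D (data) for B:

$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)} = \frac{P(D \mid H)\,P(H)}{P(D \mid H)\,P(H) + P(D \mid \text{not } H)\,P(\text{not } H)}$$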
Let’s look at the soap-film hypothesis as an example. I begin with an initial P(H), my prior probability (often a base rate) that soap could dry out pens. Say it’s .3.
I try a new pen on the board, write with it for 3 minutes, and ruin it. I think that, if my hypothesis is true, this is likely (P(D|H) = .7). If my hypothesis is false, it is unlikely (P(D|not H) = .2). What is my posterior probability, or the revised estimate of the probability of the hypothesis in light of data?
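The arithmetic can be checked with a short sketch. The function name `posterior` is a hypothetical helper introduced here, and the calculation assumes that H and not H are the only two possibilities, so that P(not H) = 1 - P(H). The three inputs are the values given above.

```python
def posterior(prior, p_d_given_h, p_d_given_not_h):
    """Return P(H|D) from P(H), P(D|H), and P(D|not H) using Bayes' theorem."""
    # Numerator: how probable the data and the hypothesis are together.
    numerator = p_d_given_h * prior
    # Denominator: total probability of the data, P(D), summed over H and not H.
    evidence = numerator + p_d_given_not_h * (1.0 - prior)
    if evidence == 0:
        raise ValueError("the data have zero probability under both hypotheses")
    return numerator / evidence


# Values from the soap-film example: P(H) = .3, P(D|H) = .7, P(D|not H) = .2
p_h_given_d = posterior(prior=0.3, p_d_given_h=0.7, p_d_given_not_h=0.2)
print(round(p_h_given_d, 2))
```

Written out, the numerator is .7 × .3 = .21, and the denominator is .21 + (.2 × .7) = .35.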
My new subjective probability is .6. Apply Bayes’ theorem to the following questions to examine how probability estimates should change in light of data: