References

Multiple events

We’re often interested in the conjunction of more than two events (such as the likelihood that a hurricane will pass through campus, your car won’t work, and your friends will be unable or unwilling to help you). Remember that the conjunction of any three events A, B, and C can be no more likely than the conjunction of any two of them.

Which of the following is more likely?
A: A star athlete becomes a drug addict, enters a treatment program, and wins a championship.
or B: A star athlete becomes a drug addict and wins a championship.
(Hastie & Dawes, p. 351)
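One way to see why the second option cannot be the less likely one is to write the comparison out. The labels here are introduced only for illustration: D for “becomes a drug addict,” T for “enters a treatment program,” and W for “wins a championship.”

\[ P(D \text{ and } T \text{ and } W) = P(D \text{ and } W)\,P(T \mid D \text{ and } W) \le P(D \text{ and } W) \]

The inequality holds because a conditional probability is never greater than 1, so adding a further event to a conjunction can only leave its probability unchanged or lower it.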

In thinking of the probability of the conjunction of multiple events, remember that each event A, B, etc., can represent a set of outcomes. Say that A is “useless friends,” B is “dead car” and C is “hurricane.” The probability of useless friends and a dead car is

P (A and B) = P (A) P (B|A)

Call “useless friends and a dead car” X. Then the probability of all three events co-occurring is

P (X and C) = P (C) P (X|C)

This tells us that the probability of all three events co-occurring is the probability of a hurricane, times the probability that your friends and car will both be useless given that there is a hurricane. We can get further than this using the chain rule:

P (A and B and C) = P (C) P (A and B|C)

= P (C) P (B|C) P (A|B and C)

that is, the probability of all three events co-occurring is equal to the probability of a hurricane hitting us, times the likelihood of your car being dead given that there is a hurricane, times the likelihood of your friends being useless given that you are stuck without a car in a hurricane. The last of these is, I expect, small.
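The chain rule above can be sketched in a few lines of Python. The three input probabilities are made-up values chosen only for illustration (they do not come from the text), and the function name `joint_probability` is likewise hypothetical.

```python
def joint_probability(p_c, p_b_given_c, p_a_given_b_and_c):
    """Chain rule: P(A and B and C) = P(C) * P(B|C) * P(A|B and C)."""
    for p in (p_c, p_b_given_c, p_a_given_b_and_c):
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie between 0 and 1")
    return p_c * p_b_given_c * p_a_given_b_and_c


# Hypothetical values, assumed for this sketch only
p_hurricane = 0.05                    # P(C)
p_dead_car_given_hurricane = 0.4      # P(B|C)
p_useless_friends_given_both = 0.1    # P(A|B and C)

p_all_three = joint_probability(
    p_hurricane, p_dead_car_given_hurricane, p_useless_friends_given_both
)
p_hurricane_and_dead_car = p_hurricane * p_dead_car_given_hurricane

print(f"P(hurricane and dead car) = {p_hurricane_and_dead_car:.3f}")
print(f"P(all three events)       = {p_all_three:.3f}")
```

With these assumed inputs, the three-way conjunction comes out smaller than the two-way conjunction, as the conjunction rule requires.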

As you can see, problems in probability can become complex and cumbersome. But because we live in an uncertain world, an understanding of probability can help us make more sound, more rational decisions.

Bayes’ theorem is a tool for estimating one conditional probability from another. It’s often expressed in terms of Hypotheses (H) and Data (D).

We’re interested in the probability of a hypothesis in light of some data, or P(H|D).

This depends on the prior probability of the hypothesis, or how likely it was before the data were collected, or P(H). It also depends upon how likely the data are given two states of the world, that is, the probability of the data given that the hypothesis is true, or P (D|H), and the probability of the data given that the hypothesis is not true, or P (D|not H).

If we have these three quantities, we can compute P (H|D). The formula may look complicated at first:

\[ P(H \mid D) = \frac{P(H)\,P(D \mid H)}{P(H)\,P(D \mid H) + (1 - P(H))\,P(D \mid \text{not } H)} \]
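For readers who want to see where this formula comes from, here is a brief sketch. It uses only the product rule applied earlier to P (A and B), together with the fact that the data must occur either with H or with not H:

\[
\begin{aligned}
P(H \text{ and } D) &= P(H)\,P(D \mid H) = P(D)\,P(H \mid D) \\
P(H \mid D) &= \frac{P(H)\,P(D \mid H)}{P(D)} \\
P(D) &= P(H)\,P(D \mid H) + (1 - P(H))\,P(D \mid \text{not } H)
\end{aligned}
\]

Substituting the last line for P(D) in the second line gives the formula above.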

Bayes’ theorem matters because everyone, not only scientists, is concerned with hypotheses. We live in an uncertain world, and constantly form hypotheses about it. We often pose these hypotheses as questions:

Do pens get trashed on the whiteboard because of a soap film?
Am I allergic to shellfish?
Does studying at the last minute help my exam performance?
Does she love me?
Is my car safe in that parking lot?

Bayes’ theorem is important because it describes how we should revise our estimates of the probability of a hypothesis in light of data. We can rewrite the equation above, writing P(not H) in place of 1 − P(H):

\[ P(H \mid D) = \frac{P(H)\,P(D \mid H)}{P(H)\,P(D \mid H) + P(\text{not } H)\,P(D \mid \text{not } H)} \]

Let’s look at the soap-film hypothesis as an example. I begin with an initial P(H), my prior probability (often a base rate) that soap could dry out pens. Say it’s .3:

P(H) = P(soap dries out the pen) = .3

I try a new pen on the board, write with it for 3 minutes, and ruin it. I think that if my hypothesis is true, this is likely (P (D|H) = .7). If my hypothesis is false, it is unlikely (P (D|not H) = .2). What is my posterior probability, that is, my revised estimate of the probability of the hypothesis in light of the data?

\[ P(H \mid D) = \frac{(.3)(.7)}{(.3)(.7) + (.7)(.2)} = \frac{.21}{.35} = .6 \]

My new subjective probability is .6. Apply Bayes’ theorem to the following questions to examine how probability estimates should change in light of data:

| Initial hypothesis | P (H) | New data | P (D\|H) | P (D\|not H) | P (H\|D) |
|---|---|---|---|---|---|
| Pens don’t last because of soap film on the whiteboard | .3 | A pen is quickly trashed | .7 | .2 | .6 |
| I am allergic to shellfish | .02 | Rash after eating clams | .9 | .01 | ? |
| Studying at the last minute helps my exam performance | .6 | I failed | .2 | .7 | ? |
| She loves me | .3 | She gave me a present | .6 | .3 | ? |
| My car is safe in that parking lot | .3 | I just heard a car alarm | .4 | .4 | ? |
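The soap-film calculation can be checked with a short Python sketch of Bayes’ theorem. The function name `posterior` and its parameter names are assumptions made for this illustration. The inputs are the three quantities described above: P(H), P(D|H), and P(D|not H).

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Return P(H|D) from P(H), P(D|H), and P(D|not H) using Bayes' theorem."""
    numerator = prior * p_data_given_h
    # P(D): the data can occur with H or with not H
    evidence = numerator + (1 - prior) * p_data_given_not_h
    if evidence == 0:
        raise ValueError("P(D) is zero, so P(H|D) is undefined")
    return numerator / evidence


# Soap-film example from the text: P(H) = .3, P(D|H) = .7, P(D|not H) = .2
soap_film = posterior(0.3, 0.7, 0.2)
print(round(soap_film, 2))  # 0.6
```

The same function can be applied to the remaining rows of the table by passing in each row’s P (H), P (D|H), and P (D|not H).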

Bayes’ theorem is a normative model: it describes how we should revise probability estimates, not how we actually do. That is, the research suggests that people don’t typically revise probability estimates in this way, even though we arguably should.

Implications: Can the principles of probability help us solve these problems?

A cab was involved in a hit-and-run accident at night and two cab companies, the Green and the Blue, operate in the neighborhood in which the accident occurred. Of the cabs in the neighborhood, 85% are Green and 15% are Blue. A witness identified the cab as Blue. The court tested the accuracy of the witness under the same circumstances that existed on the night of the accident and the witness correctly identified each of the two colors of cabs 80% of the time.
1. What is the probability that the cab was Blue?

Imagine two giant jars are each filled with thousands of jelly beans. In the first jar, 70% of the jelly beans are red and the rest are blue. In the second jar, 70% are blue and the rest are red. Suppose one jar is chosen, at random, and 12 jelly beans are taken from it: 8 blue jelly beans and 4 red jelly beans. What are the chances that the 12 jelly beans were taken from the jar with mostly red jelly beans?

___________%.

You are the manager of a baseball team. It is the bottom of the ninth inning, there are two outs, and you are losing by one run. You will lose the game if the next batter makes an out. But because there are base runners, you will win the game if the batter gets a hit. You can choose one of two batters:
1. Smith has an overall batting average of .320 in 400 times at bat, but has batted only .250 in 20 plate appearances against this pitcher.
2. Jones has an overall batting average of only .250 in 400 times at bat, but has batted .320 in 20 plate appearances against this pitcher.

James grew up in a Bohemian family. His father was a musician, and his mother was a painter. They lived together for 40 years and never got married. James was a very talented child with a special gift for comedy, but he turned into a rebellious troublemaker in his youth. He dropped out of college after two years and traveled to Asia to learn crafts. James is now 35 years old. Of 100 people like James, how many are
1. Republicans?
2. Artists?
3. Republican Artists?

“Steve is very shy and withdrawn, invariably helpful, but with little interest in people, or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.” What is the probability that Steve is
1. a farmer?
2. a salesman?
3. an airline pilot?
4. a librarian?
5. a physician?

Anscombe, FJ. 1973. “Graphs in Statistical Analysis.” American Statistician 27 (1): 17–21.

Apicella, Coren L., Frank W. Marlowe, James H. Fowler, and Nicholas A. Christakis. 2012. “Social Networks and Cooperation in Hunter-Gatherers.” Nature 481 (7382): 497–501. https://doi.org/10/fz3v4v.

Baker, Monya. 2016. “Is There a Reproducibility Crisis?” Nature 533: 26.

Benjamin, Daniel J., James O. Berger, Magnus Johannesson, Brian A. Nosek, E.-J. Wagenmakers, Richard Berk, Kenneth A. Bollen, et al. 2018. “Redefine Statistical Significance.” Nature Human Behaviour 2 (1): 6–10. https://doi.org/10/cff2.

Blumenthal, Arthur L. 1975. “A Reappraisal of Wilhelm Wundt.” American Psychologist 30 (11): 1081.

Bond, Robert M., Christopher J. Fariss, Jason J. Jones, Adam D. I. Kramer, Cameron Marlow, Jaime E. Settle, and James H. Fowler. 2012. “A 61-Million-Person Experiment in Social Influence and Political Mobilization.” Nature 489 (7415): 295–98. https://doi.org/10/f3689v.

Bonomi, Flavio, Rodolfo Milito, Jiang Zhu, and Sateesh Addepalli. 2012. “Fog Computing and Its Role in the Internet of Things.” In Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing, 13–16. ACM. https://doi.org/10/gft9b9.

Boyd, Ryan L, Ashwini Ashokkumar, Sarah Seraj, and James W Pennebaker. 2022. “The Development and Psychometric Properties of LIWC-22.” Austin, TX: University of Texas at Austin 10: 1–47.

Boyd, Ryan L., Paola Pasca, and Kevin Lanning. 2020. “The Personality Panorama: Conceptualizing Personality Through Big Behavioural Data: The Personality Panorama.” Edited by John Rauthmann. European Journal of Personality, April. https://doi.org/10/gg4t8j.

Broman, Karl W., and Kara H. Woo. 2018. “Data Organization in Spreadsheets.” The American Statistician 72 (1): 2–10. https://doi.org/10.1080/00031305.2017.1375989.

Buckheit, Jonathan B., and David L. Donoho. 1995. “Wavelab and Reproducible Research.” In Wavelets and Statistics, 55–81. Springer.

Cartwright, Dorwin, and Frank Harary. 1956. “Structural Balance: A Generalization of Heider’s Theory.” Psychological Review 63 (5): 277.

Christakis, Nicholas A., and James H. Fowler. 2007. “The Spread of Obesity in a Large Social Network over 32 Years.” New England Journal of Medicine 357 (4): 370–79. https://doi.org/10/dmrgt6.

———. 2013. “Social Contagion Theory: Examining Dynamic Social Networks and Human Behavior.” Statistics in Medicine 32 (4): 556–77. https://doi.org/10/ck2j.

Clarke, Russell, David Dorwin, and Rob Nash. 2009. “Is Open Source Software More Secure?” Homeland Security/Cyber Security.

Cleveland, William S., and Robert McGill. 1985. “Graphical Perception and Graphical Methods for Analyzing Scientific Data.” Science, New Series 229 (4716): 828–33. https://www.jstor.org/stable/1695272.

Cox, Jonathan, and Michael Lindell. 2013. “Visualizing Uncertainty in Predicted Hurricane Tracks.” International Journal for Uncertainty Quantification 3 (2). https://doi.org/10/gjjsfw.

Deane, Claudia. 2024. “Americans’ Deepening Mistrust of Institutions.”

Donoho, David. 2017. “50 Years of Data Science.” Journal of Computational and Graphical Statistics 26 (4): 745–66. https://doi.org/10.1080/10618600.2017.1384734.

Donoho, David L. 2010. “An Invitation to Reproducible Computational Research.” Biostatistics 11 (3): 385–88. https://doi.org/10/bxwkns.

Erdelyi, Matthew H. 1974. “A New Look at the New Look: Perceptual Defense and Vigilance.” Psychological Review 81 (1): 1–25. https://doi.org/10/cs5c5q.

FitzGerald, Ben, Peter L Levin, and Jacqueline Parziale. 2016. Open Source Software & the Department of Defense. Center for a New American Security.

Gandrud, Christopher. 2013. Reproducible Research with R and R Studio. CRC Press.

Garcia, David, Mansi Goel, Amod Kant Agrawal, and Ponnurangam Kumaraguru. 2018. “Collective Aspects of Privacy in the Twitter Social Network.” EPJ Data Science 7: 1–13. https://doi.org/10/cjhr.

Gleibs, Ilka H. 2014. “Turning Virtual Public Spaces into Laboratories: Thoughts on Conducting Online Field Studies Using Social Network Sites.” Analyses of Social Issues and Public Policy 14 (1): 352–70. https://doi.org/10/f6t7gd.

Grange, JA, D Lakens, F Adolfi, C Albers, F Anvari, M Apps, S Argamon, et al. 2018. “Justify Your Alpha.” Nature Human Behavior.

Harary, Frank. 1959. “On the Measurement of Structural Balance.” Behavioral Science 4 (4): 316–23. https://doi.org/10/cp9nfp.

Hastie, Reid, and Robyn M Dawes. 2010. Rational Choice in an Uncertain World: The Psychology of Judgment and Decision Making. Sage.

Healy, Kieran. 2017. “Data Visualization for Social Science: A Practical Introduction with R and ggplot2.”

Henrich, Joseph, Steven J. Heine, and Ara Norenzayan. 2010. “The Weirdest People in the World?” Behavioral and Brain Sciences 33 (2-3): 61–83. https://doi.org/10/c9j35b.

Hicks, Stephanie C., and Rafael A. Irizarry. 2018. “A Guide to Teaching Data Science.” The American Statistician 72 (4): 382–91. https://doi.org/10/gfr5tf.

Hornik, Kurt, and The R Core Team. 2022. “R FAQ: The Comprehensive R Archive Network.”

Hullman, Jessica, Paul Resnick, and Eytan Adar. 2015. “Hypothetical Outcome Plots Outperform Error Bars and Violin Plots for Inferences about Reliability of Variable Ordering.” Edited by Elena Papaleo. PLOS ONE 10 (11): e0142444. https://doi.org/10.1371/journal.pone.0142444.

Hvitfeldt, Emil, and Julia Silge. 2021. Supervised Machine Learning for Text Analysis in R. Chapman and Hall/CRC.

Ioannidis, John PA. 2005. “Why Most Published Research Findings Are False.” PLoS Medicine 2 (8): e124. https://doi.org/10/chhf6b.

Isaacson, Walter. 2014. The Innovators: How a Group of Hackers, Geniuses, and Geeks Created the Digital Revolution. Simon and Schuster.

Jackson, Dan. 2017. “The Netflix Prize: How a $1 Million Contest Changed Binge-Watching Forever.” Thrillist.com.

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning. Vol. 103. Springer Texts in Statistics. New York, NY: Springer New York. https://doi.org/10.1007/978-1-4614-7138-7.

Kelly, Janice R., Nicole E. Iannone, and Megan K. McCarty. 2016. “Emotional Contagion of Anger Is Automatic: An Evolutionary Explanation.” British Journal of Social Psychology 55 (1): 182–91. https://doi.org/10/gf6mn3.

Kondo, Marie. 2016. Spark Joy: An Illustrated Master Class on the Art of Organizing and Tidying up. Ten Speed Press.

Kumar, Devinder, Alexander Wong, and Graham W Taylor. 2017. “Explaining the Unexplained: A Class-Enhanced Attentive Response (Clear) Approach to Understanding Deep Neural Networks.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 36–44.

Lakatos, Imre. 1969. “Falsification and the Methodology of Scientific Research Programmes.” Criticism and the Growth of Knowledge. Cambridge University Press: Cambridge.

Lanning, Kevin. 1987. “Some Reasons for Distinguishing Between ‘Non-normative Response’ and ‘Irrational Decision’.” The Journal of Psychology 121 (2): 109–17. https://doi.org/10/fv4hh5.

———. 1994. “Dimensionality of Observer Ratings on the California Adult Q-set.” Journal of Personality and Social Psychology 67 (July): 151–60. https://doi.org/10/drnkvm.

———. 1996. “Robustness Is Not Dimensionality: On the Sensitivity of Component Comparability Coefficients to Sample Size.” Multivariate Behavioral Research 31 (1): 33–46. https://doi.org/10/dt6gb3.

———. 2017. “What Is the Relationship Between ‘Personality’ and ‘Social’ Psychologies? Network, Community, and Whole Text Analyses of the Structure of Contemporary Scholarship.” Collabra: Psychology 3 (1): 8.

———. 2018. “Data Visualizations in Personality and Social Psychology: Challenges in Representing Taxonomic, Community, and Developmental Structures.” Association of Psychological Science Annual Convention, San Francisco, May 25.

Lanning, Kevin, and Ari Rosenberg. 2009. “The Dimensionality of American Political Attitudes: Tensions Between Equality and Freedom in the Wake of September 11.” Behavioral Sciences of Terrorism and Political Aggression 1 (2): 84–100. https://doi.org/10/fckr37.

Leek, Jeffrey T, and Roger D Peng. 2015. “Statistics: P Values Are Just the Tip of the Iceberg.” Nature 520 (7549): 612. https://doi.org/10/gfb8jm.

Loevinger, Jane. 1957. “Objective Tests as Instruments of Psychological Theory.” Psychological Reports 3 (3): 635–94. https://doi.org/10/b27jpk.

Loukides, Mike, Hilary Mason, and DJ Patil. 2018. Ethics and Data Science. O’Reilly.

Matz, S. C., M. Kosinski, G. Nave, and D. J. Stillwell. 2017. “Psychological Targeting as an Effective Approach to Digital Mass Persuasion.” Proceedings of the National Academy of Sciences 114 (48): 12714–19. https://doi.org/10.1073/pnas.1710966114.

McShane, Blakeley B., David Gal, Andrew Gelman, Christian Robert, and Jennifer L. Tackett. 2017. “Abandon Statistical Significance.” arXiv:1709.07588 [Stat], September. https://arxiv.org/abs/1709.07588.

Merton, Robert K. 1936. “The Unanticipated Consequences of Purposive Social Action.” American Sociological Review 1 (6): 894–904. https://doi.org/10/fjg8hf.

Miguel, Edward, Colin Camerer, Katherine Casey, Joshua Cohen, Kevin M Esterling, Alan Gerber, Rachel Glennerster, et al. 2014. “Promoting Transparency in Social Science Research.” Science 343 (6166): 30–31. https://doi.org/10/gdrcpz.

Milgram, Stanley. 1967. “The Small World Problem.” Psychology Today 2 (1): 60–67.

Munafò, Marcus R., Brian A. Nosek, Dorothy V. M. Bishop, Katherine S. Button, Christopher D. Chambers, Nathalie Percie du Sert, Uri Simonsohn, Eric-Jan Wagenmakers, Jennifer J. Ware, and John P. A. Ioannidis. 2017. “A Manifesto for Reproducible Science.” Nature Human Behaviour 1 (1): 0021. https://doi.org/10.1038/s41562-016-0021.

Narayanan, Arvind, and Vitaly Shmatikov. 2008. “Robust De-anonymization of Large Sparse Datasets.” In 2008 IEEE Symposium on Security and Privacy (Sp 2008), 111–25. Oakland, CA, USA: IEEE. https://doi.org/10.1109/SP.2008.33.

Nikzad, Afshin, Mohammad Akbarpour, Michael A. Rees, and Alvin E. Roth. 2021. “Global Kidney Chains.” Proceedings of the National Academy of Sciences of the United States of America 118 (36): e2106652118. https://doi.org/10.1073/pnas.2106652118.

Ondaatje, Michael, and Walter Murch. 2002. The Conversations: Walter Murch and the Art of Editing Film. A&C Black.

Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716–16. https://doi.org/10/68c.

Page, Lawrence, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. “The PageRank Citation Ranking: Bringing Order to the Web.”

Peng, Roger. 2018. “Teaching R to New Users - from Tapply to the Tidyverse.”

Peng, Roger D. 2014. R Programming for Data Science.

Pennebaker, James W., Cindy K. Chung, Joey Frazee, Gary M. Lavergne, and David I. Beaver. 2014. “When Small Words Foretell Academic Success: The Case of College Admissions Essays.” Edited by Qiyong Gong. PLoS ONE 9 (12): e115844. https://doi.org/10/f6z8q5.

Phillips, Nathaniel D., Hansjörg Neth, Jan K. Woike, and Wolfgang Gaissmaier. 2017. “FFTrees: A Toolbox to Create, Visualize, and Evaluate Fast-and-Frugal Decision Trees.” Judgment and Decision Making 12 (4): 344–68.

Poulin, Michael J., and Claudia M. Haase. 2015. “Growing to Trust: Evidence That Trust Increases and Sustains Well-Being Across the Life Span.” Social Psychological and Personality Science 6 (6): 614–21. https://doi.org/10.1177/1948550615574301.

Reinsel, David, John Gantz, and John Rydning. 2025. “The Digitization of the World.”

Shattuck, Roger. 1997. Forbidden Knowledge: From Prometheus to Pornography. Houghton Mifflin Harcourt.

Silge, Julia, and David Robinson. 2017. Text Mining with R: A Tidy Approach. O’Reilly Media, Inc.

Slovic, Paul, David Zionts, Andrew K Woods, Ryan Goodman, and Derek Jinks. 2013. “Psychic Numbing and Mass Atrocity.” The Behavioral Foundations of Public Policy, 126–42. https://doi.org/10/gk4945.

Sternberg, Robert J. 1999. “The Theory of Successful Intelligence.” Review of General Psychology 3 (4): 292–316. https://doi.org/10/cqrkxh.

Sternberg, Robert J. 2018. “Theories of Intelligence.” In APA Handbook of Giftedness and Talent, edited by Steven I. Pfeiffer, Elizabeth Shaunessy-Dedrick, and Megan Foley-Nicpon, 145–61. American Psychological Association. https://doi.org/10.1037/0000038-010.

Sullivan, J. L., and J. E. Transue. 1999. “The Psychological Underpinnings of Democracy: A Selective Review of Research on Political Tolerance, Interpersonal Trust, and Social Capital.” Annual Review of Psychology 50 (1): 625–50. https://doi.org/10/cmthvk.

Sweeney, Latanya. 2005. “Privacy-Enhanced Linking.” ACM SIGKDD Explorations Newsletter 7 (2): 72–75. https://doi.org/10/bjvpjh.

Szucs, Denes, and John Ioannidis. 2017. “When Null Hypothesis Significance Testing Is Unsuitable for Research: A Reassessment.” Frontiers in Human Neuroscience 11: 390. https://doi.org/10/gc6vws.

Thies, Justus, Michael Zollhöfer, Matthias Nießner, Levi Valgaerts, Marc Stamminger, and Christian Theobalt. 2015. “Real-Time Expression Transfer for Facial Reenactment.” ACM Trans. Graph. 34 (6): 183–81. https://doi.org/10/f7wqz7.

Tufte, Edward R. 2001. The Visual Display of Quantitative Information. 2nd ed. Cheshire, CT: Graphics Press.

Tukey, John W. 1962. “The Future of Data Analysis.” The Annals of Mathematical Statistics 33 (1): 1–67. https://doi.org/10/d48nqg.

———. 1977. “EDA: Exploratory Data Analysis.” Reading, Mass.

Tversky, Amos, and Daniel Kahneman. 1974. “Judgment Under Uncertainty: Heuristics and Biases.” Science 185 (4157): 1124–31. https://doi.org/10/gwh.

Wainer, Howard. 2007. “The Most Dangerous Equation.” American Scientist 95 (3): 249. https://doi.org/10.1511/2007.65.249.

Wainer, H, and D Thissen. 1981. “Graphical Data Analysis,” 51.

Watts, Duncan J. 2004. “The ‘New’ Science of Networks.” Annual Review of Sociology 30: 243–70. https://www.jstor.org/stable/29737693.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science 2e. O’Reilly Media, Inc.

Wu, Tim. 2019. “How Capitalism Betrayed Privacy.” The New York Times, 5.