Sunday, 22 May 2022

The Art of Statistics - Learning from Data - David Spiegelhalter

  • The numbers have no way of speaking for themselves. We speak for them. We imbue them with meaning.
  • PPDAC problem-solving cycle
    • Problem
      • Understanding and defining the problem
    • Plan
      • What to measure and how?
    • Data
      • Collection
      • Management
      • Cleaning
    • Analysis
      • Sort, table, graphs
      • Look for patterns
      • Hypothesis generation
    • Conclusion
      • Interpretation
      • Conclusions
      • New ideas
      • Communication
  • None of the data sources could be considered "the truth"
  • Framing
    • 5% mortality sounds worse than 95% survival
  • Nearly everyone has greater than the average number of legs (1.99999)
    • And people have on average one testicle
  • average-house price (median) vs average house-price (mean)
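The two points above (the legs, and median versus mean house prices) can be checked with a few lines of Python. This is a minimal sketch: the population of 100,000 people and the list of house prices are made-up numbers chosen for illustration, not data from the book.

```python
from statistics import mean, median

# Hypothetical population: 99,999 people with two legs, one person with one leg.
legs = [2] * 99_999 + [1]
avg_legs = mean(legs)

# Share of people whose leg count is above the mean.
share_above_mean = sum(n > avg_legs for n in legs) / len(legs)
print("mean number of legs:", avg_legs)
print("share above the mean:", share_above_mean)

# Hypothetical house prices (in thousands); one expensive house skews the mean.
prices = [180, 200, 210, 230, 250, 2500]
mean_price = mean(prices)      # the "average house-price"
median_price = median(prices)  # the price of the "average house"
print("mean price:", mean_price)
print("median price:", median_price)
```

A single extreme value pulls the mean far from what a typical house costs, while the median barely moves. That is why the choice of "average" matters for skewed data.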
  • We cannot conclude that the higher survival rates were in any sense caused by the increased number of cases - in fact it could even be the other way round: better hospitals simply attracted more patients
  • Alberto Cairo's four common features of a good data visualization
    • 1) It contains reliable information
    • 2) The design has been chosen so that relevant patterns become noticeable
    • 3) It is presented in an attractive manner, but appearance should not get in the way of honesty, clarity and depth.
    • 4) When appropriate, it is organized in a way that enables some exploration
  • The first rule of communication is to shut up and listen, so that you can get to know about the audience for your communication
  • The second rule of communication is to know what you want to achieve, such as encouraging open debate and informed decision-making. We have to acknowledge that we are telling a story.
  • Hans Rosling: "These facts are not up for discussion. I am right, and you are wrong"
  • After someone from the Royal Statistical Society criticized their survey methods, a spokesman for Ryanair's boss Michael O'Leary said, "Ninety-five per cent of Ryanair customers haven't heard of the Royal Statistical Society, 97 per cent don't care what they say and 100 per cent said it sounds like their people need to book a low-fare Ryanair holiday."
  • Regression to the mean: if we believe that runs of good or bad fortune represent a constant state of affairs, then we will wrongly attribute the reversion to normal to whatever intervention we have made
    • Football managers who get sacked after a string of losses, only to find their successors getting credit for the return to normal
    • Active fund managers whose performance drops after they have been tipped on the strength of a couple of good years
    • The "Curse of Sports Illustrated" in which athletes get featured on the cover following a series of achievements, only to subsequently have their performance plummet
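The effect behind these examples can be reproduced in a small simulation. This is only a sketch under stated assumptions: each performer has a fixed skill, each season's result is skill plus independent luck, and both are drawn from a standard normal distribution. The names `skill`, `season1` and `season2` are made up for this illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

n = 10_000
# Assumed model: observed performance = fixed skill + fresh luck each season.
skill = [rng.gauss(0, 1) for _ in range(n)]
season1 = [s + rng.gauss(0, 1) for s in skill]
season2 = [s + rng.gauss(0, 1) for s in skill]

# The "cover stars": the top 10% of performers in season 1.
top = sorted(range(n), key=lambda i: season1[i], reverse=True)[: n // 10]

before = sum(season1[i] for i in top) / len(top)
after = sum(season2[i] for i in top) / len(top)

print("top group, season 1 average:", round(before, 2))
print("same group, season 2 average:", round(after, 2))
```

Nothing happens to the selected group between the two seasons, yet its average falls back towards the overall mean. The group was chosen partly for its good luck, and that luck does not repeat.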
  • A model is a map, rather than the territory itself.
  • All models are wrong, some are useful
  • Bootstrapping the data - the magical idea of pulling oneself up by one's own bootstraps
    • If we resample from the collected data (with replacement), say 1,000 times, we get 1,000 possible estimates of the mean.
    • You can also fit a regression line to each bootstrapped sample
      • This shows the variability in the gradient
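The resampling idea above can be sketched with the standard library alone. The data here are synthetic (y is roughly 2x plus noise) and the helper names `slope` and `bootstrap` are hypothetical. The percentile interval at the end is one common way to summarise the resampled gradients, not necessarily the one used in the book.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Synthetic sample, assumed for illustration: y is about 2*x plus noise.
xs = [float(i) for i in range(30)]
ys = [2.0 * x + rng.gauss(0, 5) for x in xs]

def slope(x, y):
    """Least-squares gradient of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

def bootstrap(n_resamples=1000):
    """Resample (x, y) pairs with replacement; return the means of y and the gradients."""
    n = len(xs)
    means, slopes = [], []
    for _ in range(n_resamples):
        idx = rng.choices(range(n), k=n)  # indices drawn with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        means.append(mean(by))
        slopes.append(slope(bx, by))
    return means, slopes

means, slopes = bootstrap(1000)

# Rough 95% percentile interval for the gradient.
ordered = sorted(slopes)
k = len(ordered) // 40  # 2.5% of the resamples
lo, hi = ordered[k], ordered[-k - 1]
print("gradient from the original sample:", round(slope(xs, ys), 3))
print("bootstrap 95% interval:", round(lo, 3), "to", round(hi, 3))
```

Each resample gives a slightly different mean and gradient. The spread of those 1,000 values indicates how uncertain the original estimates are, without assuming a particular distribution for the data.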
  • The American Statistical Association's six principles about P-values:
    1. P-values can indicate how incompatible the data are with a specified statistical model
    2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone
    3. Scientific conclusions and business or policy decisions should not be based only on whether a P-value passes a specific threshold.
    4. Proper inference requires full reporting and transparency
    5. A P-value, or statistical significance, does not measure the size of an effect or the importance of a result.
    6. By itself, a P-value does not provide a good measure of evidence regarding a model or hypothesis. For example, a P-value near 0.05 taken by itself offers only weak evidence against the null hypothesis.
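Principle 5 can be illustrated with a little arithmetic. The sketch below assumes a two-sided z-test for the difference in means between two equal-sized groups with a known standard deviation. The function name `two_sided_p_value` and all the numbers are made up for this example.

```python
import math

def two_sided_p_value(diff, sd, n_per_group):
    """Two-sided z-test P-value for an observed difference in group means."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = abs(diff) / standard_error
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for the standard normal.
    return math.erfc(z / math.sqrt(2.0))

tiny_effect = 0.01  # a difference of one hundredth of a standard deviation

p_small_study = two_sided_p_value(tiny_effect, sd=1.0, n_per_group=100)
p_huge_study = two_sided_p_value(tiny_effect, sd=1.0, n_per_group=1_000_000)

print("P-value with 100 per group:", p_small_study)
print("P-value with 1,000,000 per group:", p_huge_study)
```

The observed effect is identical and negligible in both cases, but the P-value changes from unremarkable to extremely small purely because of the sample size. It measures how incompatible the data are with the null model, not how large or important the effect is.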
  • HARKing - inventing the Hypotheses After the Results are Known