Thoughts on Probability and Statistics

Total deaths from epidemics since 1900 have been dominated by a very few of the most severe epidemics. This teaches us about the true definition of extreme values in statistics.

What is a black swan event, and where do they come from?

The idea of a black swan dates as far back as the time of the Roman poet Juvenal.

rara avis in terris nigroque simillima cygno — From The Satires, Line 6.165

Translation: A rare bird in these lands, very much like a black swan.

Historically, it was presumed that black swans did not exist. Thus, when Dutch explorers became the first Europeans to see a black swan, it was a big shock — something thought to be impossible suddenly became possible.

The idea was later generalized in a book series and became more commonly known. However, it is still not…

Common measurements of volatility and correlation can be highly volatile and misleading, drastically underestimating the true risk of an investment.

Investing is a risky endeavour. Take a wrong step, and you could find your cash burned up. So how do you pick the right investments? Should you trust news? Should you trust the brokers? Should you do your own research? Or should you pay someone else to manage your investments?

Among the more sophisticated investors are those who rely on techniques from mathematics and statistics. These people compare different investment options in terms of return and risk, in order to construct the optimal portfolio.

Reducing the volatility of an investment portfolio

It’s obvious why people would want to maximise their returns. But why do people try to…

Trying to compute something? It might be too slow. Drop your math textbooks. Optimize your code with complexity theory instead.

Calculating a running average

In my last post on insurance, I wanted to calculate a running mean (the mean from the beginning up to the current time t). This was for a time series with 1,000,000 time-steps. In other words, I had to calculate 1,000,000 means. How did I do this?

Algorithm 1: straight out of a math (and not a CS) textbook

If you look inside any standard math textbook, you would find something like this.

Definition of the mean gives us the mean from time 1 up to time t.
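The formula image does not survive in this excerpt. Assuming the caption refers to the standard arithmetic mean of the first t observations X_1, ..., X_t (the symbol \bar{X}_t is an assumed notation, not necessarily the one used in the post), it would read:

```latex
\bar{X}_t = \frac{1}{t} \sum_{i=1}^{t} X_i
```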

Using this formula directly, you might then have an algorithm like this:

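# X is a time-series with length T.
runningMean = [X[0]]
for t in range(1, T):
    # Assumed loop body (the excerpt cuts off here): recompute the mean from scratch.
    runningMean.append(sum(X[:t + 1]) / (t + 1))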

But when you actually run it, the speed of the execution…
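Applying the definition directly means re-adding the first t values at every step, which costs roughly T²/2 additions in total. A common single-pass alternative keeps a running total instead, so each step costs one addition and one division. The sketch below is an illustration of that idea, not necessarily the algorithm the post goes on to present; the function name `running_mean` is hypothetical.

```python
def running_mean(X):
    """Return the means of X[0..t] for every t, computed in a single pass."""
    means = []
    total = 0.0
    for t, x in enumerate(X, start=1):
        total += x               # running sum of the first t values
        means.append(total / t)  # mean of the first t values
    return means


print(running_mean([1, 2, 3, 4]))  # [1.0, 1.5, 2.0, 2.5]
```

For a series of 1,000,000 time-steps this does about a million additions, rather than the several hundred billion needed when every mean is recomputed from the start.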

Mathematics, risk and society. How a theorem in probability shows that pro-social behaviour (insurance) mitigates individual risk and supports long-term survival.


Introduction — A simple model of savings
Part I — The benefits of an insurance scheme
Part II — The law of large numbers
Part III — When insurance schemes fail
Appendix — Math for the simulation

Inspired by Ole Peters (lecture) and Nassim Nicholas Taleb.

This article strings together a surprising series of thoughts which I’ve had over the past year. Expected values, ensemble averages, time averages, empirical vs theoretical mean. Insurance, cooperation, sharing, culture, tradition, conservatism and politics. Risk, correlation, contagion and catastrophe. Portfolio diversification and market delusion. Survival, elimination, and evolution. …

The experts have great tools for science, but not for real life. How do we keep a distance of 1.5 metres? It’s not easy — but ancient wisdom can tell us what to do.

Social Distancing: “Is this far enough, officer?”

Which genius decided on the rules for social distancing?

This is the official advice from the Australian government on public gatherings during this COVID-19 coronavirus outbreak.

Stay 1.5 metres away from others

I don’t know how long 1.5 metres is, and I don’t have a ruler on me. Neither do my neighbours, and who knows if the other people around me know the number of centimetres in a metre.

Imagine standing in a queue, and you are standing too close to the person in front. The police catch you, and now you find yourself being questioned and lectured about the importance of the public health measures.

Even worse: no more than 1 person per 4 square metres.

An addition to testing guidelines for COVID-19 could help detect potential super-spreaders, protect healthcare workers and support the healthcare system against the surging onslaught of COVID-19.

A COVID-19 Dilemma for the Doctor

A doctor has 1 reliable diagnostic test. Two patients show up to the clinic.

  1. The patient has a dry cough and fever. A travel history reveals that the patient had returned four days ago from Milan, Italy.
  2. The patient has a runny nose, and reports having a mild cold over the past few days, which is mostly resolved. A travel history reveals that the patient had returned four days ago from San Francisco, US.

Who should get the test?

The doctor looks at the official criteria for determining who to test.


From Coronavirus Disease 2019 (COVID-19) CDNA National…

The 2020 Novel Coronavirus Outbreak

Widespread fear in the public compels authorities to act with urgency and increases the chances of collective survival.

Some people can’t estimate epidemic severity

If you want to know the severity of an outbreak, you should be careful not to naively look at the final figures of past outbreaks. These final figures are influenced by a variety of factors:

  • Virus transmissibility (important)
  • Virus lethality (important)
  • Containment effectiveness (misleading)
  • Healthcare quality and access (misleading)

It is important to account for the effect of human intervention when considering severity. Otherwise, a highly lethal and contagious virus, which is brilliantly contained at an early stage, would appear not severe at all.
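To make the point concrete, here is a toy sketch in which the same hypothetical virus produces very different final death counts depending only on containment. The model (each case infects a fixed number of others per generation), the function name `final_deaths`, and every parameter value are illustrative assumptions, not figures from any real outbreak.

```python
def final_deaths(r0, lethality, containment, generations=10, seed_cases=1):
    """Toy outbreak: each case infects r0 * (1 - containment) others per generation."""
    r_eff = r0 * (1.0 - containment)  # transmissibility after human intervention
    cases = seed_cases
    total_cases = seed_cases
    for _ in range(generations):
        cases = cases * r_eff         # new cases in the next generation
        total_cases += cases
    return total_cases * lethality    # expected deaths among all cases


# Same virus (same r0 and lethality), different containment effort.
uncontained = final_deaths(r0=2.5, lethality=0.1, containment=0.0)
contained = final_deaths(r0=2.5, lethality=0.1, containment=0.8)
print(f"uncontained: {uncontained:.1f} deaths, contained: {contained:.1f} deaths")
```

In this sketch the contained outbreak ends with a negligible death count even though transmissibility and lethality are identical, which is why the final figures alone say little about how dangerous the virus was.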

Some people say: No need to fear — past epidemic severity has been overestimated by the irrational public.

A Feb 19 Bloomberg opinion piece, titled “The Economic Hit From Coronavirus Is All in Your…

The 2020 Novel Coronavirus Outbreak

How false-negatives in diagnostic testing are leading to the release of infected people, motivating extreme containment measures. The COVID-19 outbreak, explained with Bayes’ Rule.

If you are reading this after 2020, please keep in mind that this post was written during the early stages of the COVID-19 pandemic, and hence, may not reflect a reality beyond this time.

We are currently in February 2020. Over the past month, a deadly virus has been spreading throughout China and the world, sending the infected to the ICU and trapping others in their homes. As authorities try to manage this crisis, they face the challenging issue of containment — sending the infected to quarantine, while allowing the non-infected to go free.
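The false-negative problem can be illustrated with Bayes' Rule: given a negative test, what is the probability that the person is infected anyway? The sketch below is a minimal illustration; the function name and the numbers (prior, sensitivity, specificity) are hypothetical and are not taken from the post or from any real test.

```python
def prob_infected_given_negative(prior, sensitivity, specificity):
    """P(infected | negative test) by Bayes' Rule.

    prior:       P(infected) before testing
    sensitivity: P(positive | infected)
    specificity: P(negative | not infected)
    """
    false_negative_rate = 1.0 - sensitivity
    p_neg_and_infected = false_negative_rate * prior
    p_neg_and_healthy = specificity * (1.0 - prior)
    return p_neg_and_infected / (p_neg_and_infected + p_neg_and_healthy)


# Hypothetical numbers: a suspected case (50% prior) and a test that misses 30% of infections.
p = prob_infected_given_negative(prior=0.5, sensitivity=0.7, specificity=0.99)
print(f"P(infected | negative) = {p:.3f}")
```

With these assumed numbers, roughly one in four people released on a negative result would still be infected, which is the kind of outcome that pushes authorities towards stricter containment.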

The Problem With Epidemics That Plagues The Authorities

Here is the scenario. You have…

The 2020 Novel Coronavirus Outbreak

Why the deceptively good 1% case-fatality rate of novel coronavirus is no reason for optimism.

My previous post is highly related to this post: Why the mortality rate of novel coronavirus is miscalculated, and not important.

Incoming News

A new study just came out, and I’m sure it’s going to be reported in the media soon.

It will state that the mortality rate of novel coronavirus is:

  • 18% for severe cases.
  • 1–5% for mild to severe cases.
  • 1% in total.

On the surface, it looks reassuring for the general public. …

Thoughts on Probability and Statistics

What is the “average” and how do we find it? Forget the formula — and get better at math.

How to Understand the Idea of “Average”

Formulas Are For Calculation, Not For Understanding

When people teach you about statistical concepts, you usually get an equation which is a formula for some quantity, like the arithmetic mean. Formulas are fine, but they are designed with calculation in mind. Usually, the equation will put the unknown on one side, and all known quantities on the other.

Like this.

Arithmetic and geometric mean formula, optimised for calculation
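The figure itself is not reproduced in this excerpt. Assuming it shows the textbook forms for n values x_1, ..., x_n (the labels AM and GM are assumed notation), the two formulas would be:

```latex
\text{AM} = \frac{x_1 + x_2 + \cdots + x_n}{n}
\qquad
\text{GM} = \sqrt[n]{x_1 \, x_2 \cdots x_n}
```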

Unfortunately, this view does not help students develop a good understanding of concepts like the “average”. And as a result, it is not difficult to find people misapplying statistics, for example, using the arithmetic mean on financial returns data when the geometric mean makes more sense.
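A small example shows why the choice matters for returns. The sketch below compares the two means on a made-up two-period return series (a 50% gain followed by a 50% loss); the function names and the data are hypothetical. For returns, the geometric mean is taken over the growth factors 1 + r.

```python
import math


def arithmetic_mean_return(returns):
    """Plain average of the per-period returns."""
    return sum(returns) / len(returns)


def geometric_mean_return(returns):
    """Constant per-period return that gives the same final wealth."""
    growth = math.prod(1.0 + r for r in returns)  # total growth factor
    return growth ** (1.0 / len(returns)) - 1.0


returns = [0.5, -0.5]  # hypothetical: +50% then -50%
arith = arithmetic_mean_return(returns)
geo = geometric_mean_return(returns)
print(f"arithmetic: {arith:.3f}, geometric: {geo:.3f}")
```

The arithmetic mean is zero, suggesting the investor broke even, yet the portfolio is actually worth 75% of its starting value. The geometric mean, about -13% per period, reflects that loss.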


Andy Chen

Math, stats, data. Influenced by the complex systems perspective. I prefer to take the critical view.
