Thoughts on Probability and Statistics

Total deaths from epidemics since 1900 have been dominated by the very few most severe epidemics. This teaches us what extreme values really mean in statistics.

What is a black swan event, and where do they come from?

The idea of a black swan dates as far back as the time of the Roman poet Juvenal.

rara avis in terris nigroque simillima cygno — From The Satires, Line 6.165

Translation: A rare bird in these lands, very much like a black swan.

Historically, it was presumed that black swans did not exist. Thus, when Dutch explorers became the first Europeans to see a black swan, it was a big shock — something thought to be impossible suddenly became possible.

The idea was later generalized in a book series and became more commonly known. However, it is still not…


Common measurements of volatility and correlation can be highly volatile and misleading, drastically underestimating the true risk of an investment.

Investing is a risky endeavour. Take a wrong step, and you could find your cash burned up. So how do you pick the right investments? Should you trust the news? Should you trust the brokers? Should you do your own research? Or should you pay someone else to manage your investments?

Among the more sophisticated investors are those who rely on techniques from mathematics and statistics. These people compare different investment options in terms of return and risk, in order to construct the optimal portfolio.

Reducing the volatility of an investment portfolio

It’s obvious why people would want to maximise their returns. But why do people try to…


Trying to compute something? It might be too slow. Drop your math textbooks. Optimize your code with complexity theory instead.

Calculating a running average

In my last post on insurance, I wanted to calculate a running mean (the mean from the beginning up to the current time t). This was for a time series with 1,000,000 time-steps. In other words, I had to calculate 1,000,000 means. How did I do this?

Algorithm 1: straight out of a math (and not a CS) textbook

If you look inside any standard math textbook, you would find something like this.

Definition of the mean gives us the mean from time 1 up to time t.
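
For reference, the standard definition of the mean that this caption describes can be written as below. The symbols are notation assumed here, not taken from the post: $X_i$ is the value at time $i$ and $\bar{X}_t$ is the mean from time 1 up to time $t$.

```latex
\bar{X}_t = \frac{1}{t} \sum_{i=1}^{t} X_i
```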

Using this formula directly, you might then have an algorithm like this:

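# X is a time series of length T.
# Direct use of the definition: the sum is recomputed from scratch at every step.
runningMean = [X[0]]
for t in range(1, T):
    # Mean of the first t + 1 values, X[0] through X[t].
    runningMean.append(sum(X[0:t + 1]) / (t + 1))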

But when you actually run it, the speed of the execution…
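
One common way to avoid re-summing the whole prefix at every step is to carry a running total forward, so each new mean costs a single addition and division. The sketch below is only an illustration of that idea, not necessarily the approach the post goes on to describe; the function name `running_mean` is hypothetical.

```python
def running_mean(X):
    """Return the mean of X[0..t] for every t, computed in a single pass."""
    means = []
    total = 0.0
    for t, x in enumerate(X):
        total += x                     # running sum of X[0] through X[t]
        means.append(total / (t + 1))  # t + 1 values have been seen so far
    return means


print(running_mean([2, 4, 6, 8]))  # [2.0, 3.0, 4.0, 5.0]
```

The work per step no longer grows with t, so the total cost scales linearly with the length of the series rather than quadratically.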


Mathematics, risk and society. How a theorem in probability shows that pro-social behaviour (insurance) mitigates individual risk and supports long-term survival.

Contents

Introduction — A simple model of savings
Part I — The benefits of an insurance scheme
Part II — The law of large numbers
Part III — When insurance schemes fail
Appendix — Math for the simulation

Inspired by Ole Peters (lecture) and Nassim Nicholas Taleb.

This article strings together a surprising series of thoughts which I’ve had over the past year. Expected values, ensemble averages, time averages, empirical vs theoretical mean. Insurance, cooperation, sharing, culture, tradition, conservatism and politics. Risk, correlation, contagion and catastrophe. Portfolio diversification and market delusion. Survival, elimination, and evolution. …


The experts have great tools for science, but not for real life. How do we keep a distance of 1.5 metres? It’s not easy — but ancient wisdom can tell us what to do.

Social Distancing: “Is this far enough, officer?”

Which genius decided on the rules for social distancing?

This is the official advice from the Australian government on public gatherings during this COVID-19 coronavirus outbreak.

Stay 1.5 metres away from others

I don’t know how long 1.5 metres is, and I don’t have a ruler on me. Neither do my neighbours, and who knows if the other people around me know the number of centimetres in a metre.

Imagine standing in a queue, and you are standing too close to the person in front. The police catch you, and now you find yourself being questioned and lectured about the importance of the public health measures.

Even worse: no more than 1 person per 4 square metres.


An addition to testing guidelines for COVID-19 could help detect potential super-spreaders, protect healthcare workers and support the healthcare system against the surging onslaught of COVID-19.

A COVID-19 Dilemma for the Doctor

A doctor has one reliable diagnostic test. Two patients show up at the clinic.

  1. The first patient has a dry cough and fever. A travel history reveals that the patient returned four days ago from Milan, Italy.
  2. The second patient has a runny nose, and reports having a mild cold over the past few days, which is mostly resolved. A travel history reveals that the patient returned four days ago from San Francisco, US.

Who should get the test?

The doctor looks at the official criteria for determining who to test.

OFFICIAL GUIDELINES

From Coronavirus Disease 2019 (COVID-19) CDNA National…


The 2020 Novel Coronavirus Outbreak

A March 5 article suggests that the case-fatality ratio of coronavirus is closer to 0.6%, based on data from South Korea. However, the calculation ignores the huge problem of time lag.
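
The time-lag problem can be made concrete with a small sketch. Deaths reported today come from patients who were confirmed some time earlier, so dividing today's deaths by today's cases understates fatality while case counts are still rising. Everything below is an assumption made for illustration: the toy numbers are invented (they are not South Korean data), the fixed two-step delay is a simplification, and the function names are hypothetical.

```python
def naive_cfr(cum_cases, cum_deaths, day):
    """Cumulative deaths divided by cumulative cases on the same day."""
    return cum_deaths[day] / cum_cases[day]


def lagged_cfr(cum_cases, cum_deaths, day, lag):
    """Cumulative deaths divided by the cases confirmed `lag` steps earlier."""
    return cum_deaths[day] / cum_cases[day - lag]


# Invented toy series: cases double at each step, and 2% of cases
# die exactly two steps after being confirmed.
cum_cases = [100, 200, 400, 800, 1600]
cum_deaths = [0, 0, 2, 4, 8]

print(naive_cfr(cum_cases, cum_deaths, day=4))          # 0.005
print(lagged_cfr(cum_cases, cum_deaths, day=4, lag=2))  # 0.02
```

In this toy outbreak the same-day ratio reports 0.5% even though the fatality rate built into the data is 2%. The lag-adjusted figure recovers it only because the delay is known and fixed here, which real data would not guarantee.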

The Article

A March 5 SCMP article writes:

Coronavirus: South Korea’s aggressive testing gives clues to true fatality rate

With 140,000 people tested, the country’s mortality rate is just over 0.6 per cent compared to the 3.4 per cent global average reported by the WHO

Various factors can influence this percentage, but scientists agree that all things being equal, it is more accurate when more people are tested

The key points are:

  • South Korea is detecting the milder cases through widespread testing, while other countries are missing these.
  • Therefore, South Korea’s 0.6% case-fatality rate is more accurate than the 3.4%…


The 2020 Novel Coronavirus Outbreak

Widespread fear in the public compels authorities to act with urgency and increases the chances of collective survival.

Some people can’t estimate epidemic severity

If you want to know the severity of an outbreak, you should be careful not to naively look at the final figures of past outbreaks. These final figures are influenced by a variety of factors:

  • Virus transmissibility (important)
  • Virus lethality (important)
  • Containment effectiveness (misleading)
  • Healthcare quality and access (misleading)

It is important to account for the effect of human intervention when considering severity. Otherwise, a highly lethal and contagious virus, which is brilliantly contained at an early stage, would appear not severe at all.

Some people say: No need to fear — past epidemic severity has been overestimated by the irrational public.

A Feb 19 Bloomberg opinion piece, titled “The Economic Hit From Coronavirus Is All in Your…


The 2020 Novel Coronavirus Outbreak | Thoughts on Probability and Statistics

How false-negatives in diagnostic testing are leading to the release of infected people, motivating extreme containment measures. The COVID-19 outbreak, explained with Bayes’ Rule.

Wuhan coronavirus. Novel coronavirus. COVID-19.

We are currently in February 2020. Over the past month, a deadly virus has been spreading throughout China and the world, sending the infected to the ICU and trapping others in their homes. As authorities try to manage this crisis, they face the challenging issue of containment — sending the infected to quarantine, while allowing the non-infected to go free.

The Problem With Epidemics That Plagues The Authorities

Here is the scenario. You have a cough and a fever. There is a chance that you have caught COVID-19 — the virus spreading throughout the world. …
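
The false-negative problem can be sketched with Bayes' Rule: given a negative result, how likely is it that the patient is infected anyway? The prior, sensitivity and specificity below are hypothetical placeholders, not measured properties of any COVID-19 test, and `prob_infected_given_negative` is a name chosen here for illustration.

```python
def prob_infected_given_negative(prior, sensitivity, specificity):
    """P(infected | negative test), computed with Bayes' Rule."""
    false_negative = (1.0 - sensitivity) * prior  # infected and tests negative
    true_negative = specificity * (1.0 - prior)   # not infected and tests negative
    return false_negative / (false_negative + true_negative)


# Hypothetical inputs: 30% chance of infection before testing,
# 70% sensitivity, 99% specificity.
p = prob_infected_given_negative(prior=0.30, sensitivity=0.70, specificity=0.99)
print(round(p, 3))  # 0.115
```

Under these assumed numbers, roughly one in nine patients released after a negative test would still be infected.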


The 2020 Novel Coronavirus Outbreak

Why the deceptively good 1% case-fatality rate of novel coronavirus is no reason for optimism.

My previous post is highly related to this post: Why the mortality rate of novel coronavirus is miscalculated, and not important.

Incoming News

A new study just came out, and I’m sure it’s going to be published in the media soon.

It will state that the mortality rate of novel coronavirus is:

  • 18% for severe cases.
  • 1–5% for mild to severe cases.
  • 1% in total.

On the surface, it looks reassuring for the general public. …

Andy Chen

Math, stats, data. Influenced by the complex systems perspective. I prefer to take the critical view.
