About Beer
Why ¶
相信很多统计学相关专业的同学都知道 Box-Cox 变换,这里 Box 就是统计学巨擘 George E. P. Box. 在其讣告Renowned statistician George Box dies at 93 中有这么一段关于 Monday night beer and statistics sessions 的介绍 (这在其传记中也有相关的介绍)。
Monday night beer and statistics sessions
Best known on campus were the Monday night beer and statistics sessions Box hosted at his home for students and other researchers. The gatherings sparked lively discussions about how statistics could help solve research problems posed by speakers from a wide range of disciplines.
这种积极开放,共同探讨统计学如何用于解决问题的方式着实令人神往。 即使是在读书的时候我也很少有机会和大家像这样一起讨论有趣的问题。
后来读研有时会和室友@SY 一起解一些概率统计/算法题目,有些是游戏相关面试题 (1),有些是自己遇到的一些中大复试考研题 (2) , 有些来自Tomorrow学长的博客, 这期间我也自己写过一些相关的文章(3)。
- 游戏中的概率问题——有趣的概率问题(一): 之前和@SY 讨论过的一些问题感觉很有意思,就拜托他找时间整理了下来。
- A Fair Game Problem: 中大考研复试的一道题,@SY 给我讲了两遍没听懂,又找 Tomorrow 学长帮忙写的一个解析:)
- Three doors and three prisoners
这些问题的本身就趣味十足,解决的方法更是灵活多变:可以用朴素的概率统计思想求解,也可以用很“取巧”(1) 的方式来快速得到答案, 也可以写代码来模拟问题得到数值解。对我来说,当一个有趣的问题通过多种截然不同的方式得到相同的结果时,那种喜悦是难以言表的。
- 只是看似取巧而已,其能快速解决问题还是因为这种解法更加接近于问题的本质。
这是读书时的一些经历,算是 Beer 项目的萌芽阶段。
Past/Now/Future¶
Past¶
后来工作,从另外的角度的见识到了概率统计的作用,基于概率统计的方法一般简洁优雅,复杂度低且具有很强的可解释性——这些 特质使得其成为算法落地时很好的选择。
这种在真实世界发挥作用的方式让我对概率统计的认识得到了很大的拓展。读书时候学到的那些抽象的概念:帕累托分布,游程检验, Wilcoxon-Mann-Whitney statistic 等等这些都变得鲜活起来——这同样让我感觉非常地快乐。
我和同事提过说有机会的话我也想试试搞下类似 Monday night beer and statistics sessions 的东西, 他说很感兴趣,让我到时候喊他一起——可惜由于种种原因,此事一直未能成行:)
Now¶
后来偶然看到Fifty challenging problems in probability这本书,发现了书中记录了很多有趣的问题。 这一次我决定真正开始品尝这杯:beer,于是就有了现在的Beer, 目前主要是 Fifty challenging problems in probability书中几个问题的解答:Beer/Fifty.
目前对于每个问题都提供了详细的描述,提示,解析解,数值模拟代码,下面是第一道题的题干和对应的答案。 这里重点提下,模拟程序的代码是运行在浏览器的,也就是说每次我们刷新网页,这个模拟就会执行一次全新的模拟。
The Sock Drawer¶
Problem¶
Question
A drawer contains red socks and black socks. When two socks are drawn at random, the probability that both are red is \(\frac{1}{2}\)
(a) How small can the number of socks in the drawer be?
(b) How small if the number of black socks is even?
一个抽屉里装有红袜子和黑袜子。 当随机抽取两只袜子时,两只袜子都是红色的概率为\(\frac{1}{2}\)
(a) 抽屉里的袜子数量最少可以是多少?
(b) 如果黑袜子的数量是偶数,最少可以是多少?
Tip¶
Tip
Use probability equations to solve for radicals, Pell's equation
概率等式用于根式求解,Pell方程
Solutions¶
Solution1: Analysis¶
Solution1: Analysis
Let the number of red socks be denoted as \(R \in \mathbb{Z}^{+}\), and the total number of socks be denoted as \(N \in \mathbb{Z}^{+}\), where \(N > R\). The probability of randomly picking a red sock on the first draw is \(\frac{R}{N}\), and the probability of picking another red sock after picking a red sock on the first draw is \(\frac{R-1}{N-1}\). Therefore, we have:
Simplifying this equation, we get:
Since our goal is to find the minimum value of \(N\), combining the above conditions, the original problem can be formulated as the following integer programming problem:
As for the canonical solution to this integer programming problem... I'm not quite familiar with it (feel free to submit a pull request to supplement it). Here, we'll use a relatively simple approach.
Based on \(N^2 - N + 2R(1-R) = 0\), since our goal is to find \(N\), we can treat it as an unknown and use the quadratic formula to derive the corresponding values. Therefore, we have:
Considering the conditions that \(N\) and \(R\) are both positive integers, we can calculate:
And it must hold that \(\sqrt{1 - 8R(1-R)}\) is an odd number, which requires \(1 - 8R(1-R)\) to be a perfect square.
We can find values that satisfy these conditions through an iterative approach:
import math
def is_square(n: int) -> bool:
for i in range(n // 2 + 1):
if i * i == n:
return True
return False
def is_odd(n: int) -> bool:
return n % 2 == 1
def find_numbers(r_limit: int = 50) -> None:
for r in range(1, r_limit):
under_sqrt = 1 - 8 * r * (1 - r)
if is_square(under_sqrt) and is_odd(int(math.sqrt(under_sqrt))):
n = int((1 + int(math.sqrt(under_sqrt))) / 2)
print(f"R = {r}, N = {n}, B = N - R = {n - r}\n")
find_numbers()
R = 3, N = 4, B = N - R = 1
R = 15, N = 21, B = N - R = 6
Obtaining the final answers:
(a): At least 4 socks are needed (3 red and 1 black), in this case, we have \(\frac{R}{N}\frac{R-1}{N-1} = \frac{3}{4}\frac{2}{3} = \frac{1}{2}\)
(b): At least 21 socks are needed (15 red and 6 black), in this case, we have \(\frac{R}{N}\frac{R-1}{N-1} = \frac{15}{21}\frac{14}{20} = \frac{1}{2}\)
记红色袜子的个数为\(R \in \mathbb{Z}^{+}\),袜子总数为\(N \in \mathbb{Z}^{+}\), 且\(N > R\), 那么第一次随机拿到红袜子的概率\(\frac{R}{N}\),在第一次拿到红袜子后第二次又拿到红色的概率为\(\frac{R-1}{N-1}\),由此得
化简得,
因为我们的目的是求\(N\)的最小值,综合以上条件,原问题可以整理为如下整数规划问题
至于这个整数规划问题的规范解法...我还不太会(欢迎提PR补充). 这里采用一种相对简单的解法。
根据\(N^2 - N + 2R(1-R) = 0\),因为我们的目的是求\(N\),这里可以将其看作未知数,然后用求根公式来接出对应的值,所以有
根据\(N\), \(R\)均为正整数的条件,可以算出
且必须有\(\sqrt{1 - 8R(1-R)}\)为奇数,这就要求\(1 - 8R(1-R)\)必须为平方数。
可以通过遍历的方法来求出找出符合条件的值:
import math
def is_square(n: int) -> bool:
for i in range(n // 2 + 1):
if i * i == n:
return True
return False
def is_odd(n: int) -> bool:
return n % 2 == 1
def find_numbers(r_limit: int = 50) -> None:
for r in range(1, r_limit):
under_sqrt = 1 - 8 * r * (1 - r)
if is_square(under_sqrt) and is_odd(int(math.sqrt(under_sqrt))):
n = int((1 + int(math.sqrt(under_sqrt))) / 2)
print(f"R = {r}, N = {n}, B = N - R = {n - r}\n")
find_numbers()
R = 3, N = 4, B = N - R = 1
R = 15, N = 21, B = N - R = 6
得到最终答案:
(a): 最少有4只袜子(3红1黑),此时有\(\frac{R}{N}\frac{R-1}{N-1} = \frac{3}{4}\frac{2}{3} = \frac{1}{2}\)
(b): 最少有21只袜子(15红6黑),此时有\(\frac{R}{N}\frac{R-1}{N-1} = \frac{15}{21}\frac{14}{20} = \frac{1}{2}\)
Solution2: Simulation¶
Solution2: Simulation
The above method is to analyze the simplified problem first and then solve it. We can also make full use of the powerful computing power of the computer to directly solve this problem. Let’s take a look at the "violent aesthetics" of computers!
def is_close(f1: float, f2: float) -> bool:
return abs(f1 - f2) < 1e-6
def simulation(r_limit: int = 50, n_limit: int = 50) -> None:
for r in range(1, r_limit):
for n in range(r + 1, n_limit):
if is_close(1 / 2, (r / n) * ((r - 1) / (n - 1))):
print(f"R = {r}, N = {n}, B = N - R = {n - r}\n")
simulation()
R = 3, N = 4, B = N - R = 1
R = 15, N = 21, B = N - R = 6
Obtaining the final answers:
(a): At least 4 socks are needed (3 red and 1 black), in this case, we have \(\frac{R}{N}\frac{R-1}{N-1} = \frac{3}{4}\frac{2}{3} = \frac{1}{2}\)
(b): At least 21 socks are needed (15 red and 6 black), in this case, we have \(\frac{R}{N}\frac{R-1}{N-1} = \frac{15}{21}\frac{14}{20} = \frac{1}{2}\)
上面的方法是先分析简化问题之后再求解,我们也可以充分利用计算机强大的计算能力来直接求解本问题。 让我们一起来看下计算机的“暴力美学”吧!
def is_close(f1: float, f2: float) -> bool:
return abs(f1 - f2) < 1e-6
def simulation(r_limit: int = 50, n_limit: int = 50) -> None:
for r in range(1, r_limit):
for n in range(r + 1, n_limit):
if is_close(1 / 2, (r / n) * ((r - 1) / (n - 1))):
print(f"R = {r}, N = {n}, B = N - R = {n - r}\n")
simulation()
R = 3, N = 4, B = N - R = 1
R = 15, N = 21, B = N - R = 6
得到相同的答案:
(a): 最少有4只袜子(3红1黑),此时有\(\frac{R}{N}\frac{R-1}{N-1} = \frac{3}{4}\frac{2}{3} = \frac{1}{2}\)
(b): 最少有21只袜子(15红6黑),此时有\(\frac{R}{N}\frac{R-1}{N-1} = \frac{15}{21}\frac{14}{20} = \frac{1}{2}\)
Future¶
关于的未来,现在的计划主要是分成三部分:类似 Fifty 这种习题集单独一个部分,各类记录在各个地方的无出处可考的题目作为一个部分, 最后一部分是工业界一些的实际的案例。 目前是专注在第一部分,后续的 Roadmap 会更新到Issue。
Help Wanted¶
个人的力量终究是有限的,欢迎大家一起参与建设,一起感受概率统计的乐趣。 欢迎各类贡献:提建议,提供题目,提供解答等都可以。
其实只要了解概率统计基础知识和基本的编程知识就可以试着做一些代码贡献了, 如果大家在 PR 过程中遇到问题欢迎在Discussion提问, 我会尽量帮助大家解决!