Probability Basics
Core probability theory for modeling uncertainty in Machine Learning.
1. Core Probability Theory
Probability theory provides a framework for modeling uncertainty. In Machine Learning, we deal with uncertain events (noisy data, stochastic processes, model predictions), making probability the language of ML.
"The probability of an event E is between 0 and 1, inclusive."
2. Probability Rules
Addition Rule
For any two events A and B, the probability that at least one of them occurs is:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
If A and B are mutually exclusive, this reduces to P(A ∪ B) = P(A) + P(B).
Multiplication Rule
For any two events A and B, the probability that both occur is:
P(A ∩ B) = P(A | B) · P(B)
If A and B are independent, this reduces to P(A ∩ B) = P(A) · P(B).
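Both rules can be checked by enumerating the outcomes of a single die roll. The events A ("roll is even") and B ("roll is greater than 3") are illustrative choices, not from the text above:

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event: roll is even
B = {4, 5, 6}   # event: roll is greater than 3

def p(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event), len(omega))

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
# Both sides equal 2/3 here
assert p(A | B) == p(A) + p(B) - p(A & B)

# Multiplication rule: P(A and B) = P(A | B) * P(B)
# P(A | B) = 2/3, P(B) = 1/2, so both sides equal 1/3
p_a_given_b = Fraction(len(A & B), len(B))
assert p(A & B) == p_a_given_b * p(B)
```

Working with exact fractions instead of floats avoids rounding noise in the equality checks.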
Bayes’ Theorem
Describes the probability of an event based on prior knowledge of conditions that might be related to the event:
P(A | B) = P(B | A) · P(A) / P(B)
"Probability of A given B equals probability of B given A times probability of A, divided by probability of B."
Why is this used in ML?
Bayes’ Theorem is the foundation of Naive Bayes Classifiers and Bayesian Inference. It allows us to update model beliefs as we acquire new data.
Code Implementation
# Calculate P(Disease | Positive Test) with Bayes' Theorem
# Given: P(D)=0.01, P(Pos|D)=0.99, P(Pos|~D)=0.05
p_d = 0.01
p_pos_given_d = 0.99
p_pos_given_not_d = 0.05
# Law of total probability: P(Pos) = P(Pos|D)P(D) + P(Pos|~D)P(~D)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)
# Result: 0.0594
p_d_given_pos = (p_pos_given_d * p_d) / p_pos
# Result: 0.1667
3. Conditional Probability
The measure of the probability of an event A occurring given that another event B has already occurred:
P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0.
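A minimal sketch of this formula using a standard 52-card deck; the events (A = "king", B = "face card") are illustrative choices, not from the text above:

```python
from fractions import Fraction

# Standard 52-card deck: 12 face cards (J, Q, K), 4 of which are kings
p_b = Fraction(12, 52)        # P(B): card is a face card
p_a_and_b = Fraction(4, 52)   # P(A and B): card is a king (every king is a face card)

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b
# Result: 1/3
```

Intuitively, conditioning on B shrinks the sample space to the 12 face cards, 4 of which are kings.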
4. Independence
Two events A and B are independent if the occurrence of one does not affect the probability of occurrence of the other; equivalently, P(A ∩ B) = P(A) · P(B), or P(A | B) = P(A).
Why is this used in ML?
The Naive Bayes assumption is that features are conditionally independent given the class label, which simplifies computation significantly.
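Independence can be illustrated by simulation: for two coin flips generated separately, the empirical P(A ∩ B) should approach P(A) · P(B) as the sample grows. This is a sketch with arbitrary choices (seed, sample size, tolerance):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Two fair coin flips (1 = heads), simulated independently of each other
a = rng.integers(0, 2, size=n)
b = rng.integers(0, 2, size=n)

p_a = (a == 1).mean()
p_b = (b == 1).mean()
p_a_and_b = ((a == 1) & (b == 1)).mean()

# For independent events, P(A and B) ~ P(A) * P(B);
# the gap shrinks as n grows (sampling error is O(1/sqrt(n)))
gap = abs(p_a_and_b - p_a * p_b)
```

In Naive Bayes this factorization is applied per class: P(x1, ..., xk | y) is modeled as the product of the individual P(xi | y).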
5. Random Variables
A variable whose possible values are numerical outcomes of a random phenomenon.
Discrete Random Variables
Can take on a countable number of distinct values (e.g., the outcome of a die roll). Defined by a Probability Mass Function (PMF).
Continuous Random Variables
Can take on any value in a continuous range (e.g., height, time). Defined by a Probability Density Function (PDF).
Code Implementation
from scipy import stats
# Library: SciPy
# 1. Normal Distribution (Continuous)
# PDF at x=0 for Standard Normal (mean=0, std=1)
norm_pdf_0 = stats.norm.pdf(0)
# Result: 0.3989
# CDF at x=0 (Probability that X <= 0)
norm_cdf_0 = stats.norm.cdf(0)
# Result: 0.5
# 2. Binomial Distribution (Discrete)
# PMF: 10 trials, p=0.5, prob of exactly 5 heads
binom_pmf_5 = stats.binom.pmf(5, n=10, p=0.5)
# Result: 0.2461
6. Expectation, Variance, & Covariance
Expectation (Expected Value)
The long-run average value of the random variable over many repetitions of the experiment. For a discrete random variable: E[X] = Σ x · P(X = x).
Variance
Measures the spread of a random variable around its mean: Var(X) = E[(X − E[X])²] = E[X²] − (E[X])².
Covariance
A measure of the joint variability of two random variables: Cov(X, Y) = E[(X − E[X])(Y − E[Y])]. A positive value means the variables tend to increase together; a negative value means one tends to decrease as the other increases.
Code Implementation
import numpy as np
# Library: NumPy
# Rolling a fair die (1-6), p=1/6
val = np.array([1, 2, 3, 4, 5, 6])
prob = np.array([1/6] * 6)
# Expected Value E[X] = sum(x * p(x))
ev = np.sum(val * prob)
# Result: 3.5
# Variance Var(X) = E[X^2] - (E[X])^2
var = np.sum((val**2) * prob) - ev**2
# Result: 2.92
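The covariance definition above is not exercised by the snippet. A minimal sketch with two illustrative arrays (chosen so the relationship is obvious, not taken from the text):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])  # y = 2x, so the two move together exactly

# Population covariance: Cov(X, Y) = E[(X - E[X]) * (Y - E[Y])]
cov_manual = np.mean((x - x.mean()) * (y - y.mean()))
# Result: 2.5

# Same value from NumPy's covariance matrix (bias=True divides by N, not N-1)
cov_np = np.cov(x, y, bias=True)[0, 1]
```

Note that np.cov defaults to the sample covariance (dividing by N−1); bias=True matches the population definition used in the manual calculation.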
7. Joint & Marginal Distributions
Joint Probability Distribution
Gives the probability that two or more random variables fall within a particular range or discrete set of values simultaneously.
Marginal Probability Distribution
The probability distribution of a subset of the collection of random variables, obtained by summing (discrete) or integrating (continuous) over the other variables.
Why is this used in ML?
Understanding joint and marginal distributions is crucial for Generative Models (like GANs and VAEs) which try to learn the joint probability distribution of the data.
Code Implementation
import pandas as pd
# Library: Pandas
# Scenario: Weather vs Commute Mode
data = {
'Weather': ['Sunny', 'Sunny', 'Rainy', 'Sunny', 'Rainy', 'Rainy', 'Sunny', 'Rainy', 'Sunny', 'Rainy'],
'Commute': ['Walk', 'Bus', 'Bus', 'Walk', 'Car', 'Bus', 'Walk', 'Car', 'Bus', 'Car']
}
df = pd.DataFrame(data)
# Joint Probability Table
joint_probs = pd.crosstab(df['Weather'], df['Commute'], normalize=True)
# Commute   Bus  Car  Walk
# Weather
# Rainy     0.2  0.3  0.0
# Sunny     0.2  0.0  0.3
# Marginal Probability (Weather)
# Sum across columns (axis=1)
marginal_weather = joint_probs.sum(axis=1)
# Result:
# {'Rainy': 0.5, 'Sunny': 0.5}
# Marginal Probability (Commute)
# Sum across rows (axis=0)
marginal_commute = joint_probs.sum(axis=0)
# Result:
# {'Bus': 0.4, 'Car': 0.3, 'Walk': 0.3}
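A conditional distribution also falls out of the same table: normalizing each row of the joint table by its weather marginal gives P(Commute | Weather). This sketch reuses the same data; crosstab's normalize='index' does the row normalization directly:

```python
import pandas as pd

data = {
    'Weather': ['Sunny', 'Sunny', 'Rainy', 'Sunny', 'Rainy',
                'Rainy', 'Sunny', 'Rainy', 'Sunny', 'Rainy'],
    'Commute': ['Walk', 'Bus', 'Bus', 'Walk', 'Car',
                'Bus', 'Walk', 'Car', 'Bus', 'Car']
}
df = pd.DataFrame(data)

# Conditional distribution P(Commute | Weather):
# each row of the joint table divided by its marginal P(Weather)
cond = pd.crosstab(df['Weather'], df['Commute'], normalize='index')
# Commute   Bus  Car  Walk
# Weather
# Rainy     0.4  0.6  0.0
# Sunny     0.4  0.0  0.6
```

For example, P(Car | Rainy) = 0.3 / 0.5 = 0.6, matching the conditional probability formula from Section 3.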