Bayesian Multivariate Regression with Regularization
In the Bayesian context, regularization is performed by placing an appropriate prior distribution on the regression coefficients. Some common priors include:
Laplace prior
Spike-and-slab
Horseshoe and hierarchical shrinkage
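As an aside on the first of these, the Laplace prior is the Bayesian counterpart of the lasso: its negative log-density is an L1 penalty on the coefficients, up to a normalizing constant. A minimal NumPy sketch (the scale `b` and the coefficient values are illustrative assumptions, not from the example data):

```python
import numpy as np

# Laplace prior links Bayesian regularization to the lasso: its negative
# log-density is an L1 penalty on the coefficients plus a constant.
b = 1.0                               # assumed Laplace scale
beta = np.array([-2.0, 0.0, 3.5])     # assumed illustrative coefficients
# -log p(beta) for independent Laplace(0, b) priors
neg_log_prior = np.abs(beta).sum() / b + beta.size * np.log(2 * b)
l1_penalty = np.abs(beta).sum() / b   # lasso penalty with strength 1/b
# The two differ only by the normalizing constant beta.size * log(2b)
print(neg_log_prior - l1_penalty)
```

Maximizing the posterior under this prior is therefore the same optimization as lasso regression with penalty strength 1/b.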
Traditionally, the horseshoe prior is defined as

$$\beta_i \mid \lambda_i, \tau \sim \mathcal{N}\left(0, \tau^2 \lambda_i^2\right), \qquad \lambda_i \sim \text{HalfCauchy}(1),$$

where $\tau$ is the global shrinkage parameter and $\lambda_i$ is the local shrinkage parameter.
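To see where the name comes from, one can simulate the shrinkage factor $\kappa_i = 1/(1 + \lambda_i^2)$ implied by the Half-Cauchy local prior (holding the global scale at 1): its distribution is Beta(1/2, 1/2), which piles up mass at both 0 (no shrinkage) and 1 (total shrinkage) in a horseshoe shape. A quick NumPy check:

```python
import numpy as np

rng = np.random.default_rng(0)
# HalfCauchy(1) draws for the local shrinkage parameter lambda_i
lam = np.abs(rng.standard_cauchy(100_000))
# Shrinkage factor kappa_i = 1 / (1 + lambda_i^2):
# kappa near 0 means the coefficient is left alone, near 1 means shrunk to zero
kappa = 1.0 / (1.0 + lam**2)
# Mass concentrates at BOTH ends -- the "horseshoe" shape of Beta(1/2, 1/2)
frac_near_zero = np.mean(kappa < 0.1)
frac_near_one = np.mean(kappa > 0.9)
print(frac_near_zero, frac_near_one)
```

Both fractions come out around 0.2, far more than a flat prior on $\kappa$ would give, which is why the horseshoe can leave large signals nearly untouched while shrinking noise coefficients hard toward zero.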
# Data preparation
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import pytensor.tensor as at
test_scores = pd.read_csv(pm.get_data("test_scores.csv"), index_col=0)
X = test_scores.dropna().astype(float)
y = X.pop("score")
# Standardization
X -= X.mean()
X /= X.std()
N, D = X.shape
D0 = int(D / 2)  # guessed number of non-zero coefficients
The regularized horseshoe prior for each regression coefficient is

$$\beta_i \sim \mathcal{N}\left(0, \tau^2 \tilde{\lambda}_i^2\right), \qquad \tilde{\lambda}_i^2 = \frac{c^2 \lambda_i^2}{c^2 + \tau^2 \lambda_i^2},$$

where:

$\sigma$ is the prior on the error standard deviation.

$\tau$, the global shrinkage parameter, follows a Half-StudentT distribution (just one option among heavy-tailed distributions with non-zero density near zero).

$D_0$ represents the guessed number of non-zero coefficients. We set it to half of the overall number of predictors; this doesn't have to be a very good guess.

$c^2$ needs to be strictly positive, but not necessarily long-tailed, hence the inverse-gamma prior.

Finally, to make the sampling process more efficient in PyMC, we re-parameterize the coefficients in non-centered form as

$$\beta_i = z_i \, \tau \, \lambda_i \sqrt{\frac{c^2}{c^2 + \tau^2 \lambda_i^2}}, \qquad z_i \sim \mathcal{N}(0, 1).$$
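As a sanity check on this re-parameterization, a NumPy-only simulation (with illustrative fixed values for $\tau$, $\lambda$, and $c^2$, not values fitted to the data) confirms that the transformed standard-normal draws have the same standard deviation as a direct draw from the regularized horseshoe:

```python
import numpy as np

rng = np.random.default_rng(42)
tau, lam, c2 = 0.5, 2.0, 1.0  # assumed illustrative values
# Regularized local scale: lambda_tilde^2 = c^2 lam^2 / (c^2 + tau^2 lam^2)
lam_tilde = np.sqrt(c2 * lam**2 / (c2 + tau**2 * lam**2))
# Non-centered parameterization: transform standard-normal draws
z = rng.normal(0.0, 1.0, 200_000)
beta = z * tau * lam * np.sqrt(c2 / (c2 + tau**2 * lam**2))
# beta should behave like N(0, (tau * lam_tilde)^2), the centered version
print(np.isclose(beta.std(), tau * lam_tilde, atol=0.01))
```

Sampling $z_i$ and deterministically transforming it decouples the coefficient's location from its scale parameters, which typically removes the funnel geometry that makes centered hierarchical models hard for NUTS to explore.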
# Model specification
with pm.Model(coords={"predictors": X.columns.values}) as test_score_model:
    # Prior on the error standard deviation
    sigma = pm.HalfNormal("sigma", 25)
    # Global shrinkage prior
    tau = pm.HalfStudentT("tau", 2, D0 / (D - D0) * sigma / np.sqrt(N))
    # Local shrinkage prior
    lam = pm.HalfStudentT("lam", 5, dims="predictors")  # scale defaults to 1
    c2 = pm.InverseGamma("c2", 1, 1)
    z = pm.Normal("z", 0.0, 1.0, dims="predictors")
    # Shrunken coefficients (non-centered parameterization)
    beta = pm.Deterministic(
        "beta", z * tau * lam * at.sqrt(c2 / (c2 + tau**2 * lam**2)), dims="predictors"
    )  # dims="predictors" works because we specified coordinates in the pm.Model call
    # No shrinkage on the intercept
    beta0 = pm.Normal("beta0", 100, 25.0)
    scores = pm.Normal("scores", beta0 + at.dot(X.values, beta), sigma, observed=y.values)
pm.model_to_graphviz(test_score_model) # check model graph
# Check whether samples from the prior, specifically the scores, match reality
with test_score_model:
    prior_samples = pm.sample_prior_predictive(100)
az.plot_dist(
    test_scores["score"].values,
    kind="hist",
    color="C1",
    hist_kwargs=dict(alpha=0.6),
    label="observed",
)
az.plot_dist(
    prior_samples.prior_predictive["scores"],
    kind="hist",
    hist_kwargs=dict(alpha=0.6),
    label="simulated",
)
plt.xticks(rotation=45);
# Fit the model
with test_score_model:
    idata = pm.sample(1000, tune=2000, random_seed=42)
# checking energy plot
az.plot_energy(idata)
# checking trace plot
az.plot_trace(idata, var_names=["tau", "sigma", "c2"])
# Forest plot of the coefficient posteriors
az.plot_forest(idata, var_names=["beta"], combined=True, hdi_prob=0.95, r_hat=True)