Weeks 5, 6 & 7 GSoC - 2024

Rohan Babbar

This blog post covers my contributions to the project "Prior Elicitation (PreliZ)" with ArviZ during the fifth, sixth, and seventh weeks of GSoC 2024.

Work Done

  1. Mid-Term Evaluation

  2. Add posterior_to_prior as a new predictive function to the library

Mid-Term Evaluation

Well, the mid-term evaluation went pretty well. We were on track with the progress of the project, and I received positive feedback from the mentors on my work, which was really encouraging for me.

Posterior to Prior (P2P)

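I started working on a new predictive function, posterior_to_prior, which takes a fitted model and its posterior and generates a new model whose priors are fitted to those posterior samples.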

The steps followed while implementing this are:

  1. Read the model and extract variable names and their families.

  2. Fit the posterior samples to the corresponding families using maximum likelihood estimation (MLE).

  3. Generate samples from the prior predictive distribution of the new model.

  4. Compare the posterior samples with the samples from the new model.
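The four steps above can be sketched with NumPy alone for the two-variable model used in the next section. This is a minimal illustration under stated assumptions, not PreliZ's implementation: the posterior draws are simulated stand-ins rather than real inference data, the `FITTERS`/`SAMPLERS` tables are hypothetical, only the Normal and HalfNormal families (which have closed-form MLEs) are covered, and step 3 is simplified to drawing from the new priors instead of the full prior predictive distribution.

```python
import numpy as np

def fit_normal_mle(x):
    """Normal MLE: sample mean and the ddof=0 standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x.mean(), x.std(ddof=0))

def fit_halfnormal_mle(x):
    """HalfNormal (location 0) MLE: sigma = sqrt(mean(x**2))."""
    x = np.asarray(x, dtype=float)
    return (np.sqrt(np.mean(x ** 2)),)

rng = np.random.default_rng(2945)

# Hypothetical lookup tables: family name -> fitter / sampler
FITTERS = {"Normal": fit_normal_mle, "HalfNormal": fit_halfnormal_mle}
SAMPLERS = {
    "Normal": lambda p, n: rng.normal(p[0], p[1], size=n),
    "HalfNormal": lambda p, n: np.abs(rng.normal(0.0, p[0], size=n)),
}

# Step 1: variable names and their families (hard-coded here, read from the model in PreliZ)
families = {"a": "Normal", "b": "HalfNormal"}

# Simulated stand-ins for posterior draws (not real idata)
posterior = {
    "a": rng.normal(0.0, 0.07, size=2000),
    "b": np.abs(rng.normal(0.0, 1.0, size=2000)),
}

# Step 2: fit each variable's draws to its family by maximum likelihood
params = {name: FITTERS[fam](posterior[name]) for name, fam in families.items()}

# Step 3 (simplified): draw from the new priors themselves
new = {name: SAMPLERS[fam](params[name], 2000) for name, fam in families.items()}

# Step 4: compare a few quantiles of the posterior draws and the new draws
qs = [0.05, 0.5, 0.95]
for name in families:
    print(name, np.quantile(posterior[name], qs).round(3), np.quantile(new[name], qs).round(3))
```

Comparing quantiles is just one simple way to check step 4; a visual comparison of the two sets of draws would serve the same purpose.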

How to use posterior_to_prior

First, we prepare the model and the inference data.

import preliz as pz
import pymc as pm

data = pz.Normal(0, 1).rvs(200)
with pm.Model() as model:
    a = pm.Normal("a", mu=0, sigma=1)
    b = pm.HalfNormal("b", sigma=1)
    y = pm.Normal("y", mu=a, sigma=b, observed=data)
    idata = pm.sample(tune=200, draws=500, random_seed=2945)

We pass the model and the inference data to the function. We also introduce a parameter called alternative, which controls the set of distributions, besides the original prior, that the posterior samples are fit to. If set to "auto", the method evaluates the fit to the original prior as well as to a set of predefined distributions. To specify the alternative distributions yourself, use a list of PreliZ distributions. Alternatively, use a dictionary whose keys are variable names from the model and whose values are lists of PreliZ distributions; this allows you to specify different alternative distributions for each variable.

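When alternative is set to None, no alternative distributions are considered: each variable's posterior samples are fit only to the family of its original prior, so a stays Normal and b stays HalfNormal.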
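One way to picture the four accepted forms of alternative is as a normalization step that turns the argument into a per-variable list of candidate families. The helper below is hypothetical, not PreliZ code: families are plain strings instead of PreliZ distribution objects, the `AUTO_POOL` contents are assumed, and it assumes the original prior is always kept as the first candidate (the post states this only for "auto").

```python
# Hypothetical pool for alternative="auto"; PreliZ's actual list may differ
AUTO_POOL = ["Normal", "HalfNormal", "Gamma", "LogNormal", "StudentT"]

def resolve_candidates(alternative, original_families):
    """Map each variable name to the family names that would be tried when fitting.

    original_families: dict of variable name -> family of its original prior.
    """
    resolved = {}
    for name, family in original_families.items():
        if alternative is None:
            extra = []  # only the original prior family
        elif isinstance(alternative, str) and alternative == "auto":
            extra = AUTO_POOL  # predefined set of common families
        elif isinstance(alternative, list):
            extra = alternative  # same alternatives for every variable
        elif isinstance(alternative, dict):
            extra = alternative.get(name, [])  # per-variable alternatives
        else:
            raise ValueError("alternative must be None, 'auto', a list or a dict")
        # Original family first; dict.fromkeys drops duplicates while keeping order
        resolved[name] = list(dict.fromkeys([family, *extra]))
    return resolved

families = {"a": "Normal", "b": "HalfNormal"}
print(resolve_candidates(None, families))              # {'a': ['Normal'], 'b': ['HalfNormal']}
print(resolve_candidates({"b": ["Gamma"]}, families))  # {'a': ['Normal'], 'b': ['HalfNormal', 'Gamma']}
```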

>>> pz.posterior_to_prior(model, idata, alternative=None)
with pm.Model() as model:
    a = pm.Normal("a", mu=-0.0104, sigma=0.0725)
    b = pm.HalfNormal("b", sigma=1.03)

When we set alternative="auto", the function also considers a set of common alternative distributions during fitting. In this run, b is described by a Gamma distribution instead of a HalfNormal.

>>> pz.posterior_to_prior(model, idata, alternative="auto")
with pm.Model() as model:
    a = pm.Normal("a", mu=0.184, sigma=0.0697)
    b = pm.Gamma("b", alpha=423, beta=426)
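This post does not describe how "auto" chooses among the candidates. One plausible criterion, assumed here purely for illustration, is to fit every candidate by MLE and keep the one with the highest maximized log-likelihood, skipping families whose support cannot contain the draws (a LogNormal cannot describe zero or negative values, for example). The sketch uses only families with closed-form MLEs (HalfNormal, LogNormal, Exponential), so Gamma, which the real output above selected, is left out; `CANDIDATES` and `pick_family` are hypothetical names, and the draws for b are simulated.

```python
import numpy as np

def loglik_halfnormal(x):
    # Closed-form MLE sigma = sqrt(mean(x**2)), then the summed log-density
    sigma = np.sqrt(np.mean(x ** 2))
    return np.sum(0.5 * np.log(2 / np.pi) - np.log(sigma) - x ** 2 / (2 * sigma ** 2))

def loglik_lognormal(x):
    # Closed-form MLE on log(x): mu = mean, sigma = std with ddof=0
    logx = np.log(x)
    mu, sigma = logx.mean(), logx.std(ddof=0)
    return np.sum(-logx - np.log(sigma) - 0.5 * np.log(2 * np.pi)
                  - (logx - mu) ** 2 / (2 * sigma ** 2))

def loglik_exponential(x):
    # Closed-form MLE rate = 1 / mean(x)
    rate = 1.0 / np.mean(x)
    return np.sum(np.log(rate) - rate * x)

# Hypothetical table: family name -> (maximized log-likelihood, support check)
CANDIDATES = {
    "HalfNormal": (loglik_halfnormal, lambda x: np.all(x >= 0)),
    "LogNormal": (loglik_lognormal, lambda x: np.all(x > 0)),
    "Exponential": (loglik_exponential, lambda x: np.all(x >= 0)),
}

def pick_family(samples, names):
    """Return the supported candidate with the highest maximized log-likelihood."""
    x = np.asarray(samples, dtype=float)
    scores = {}
    for name in names:
        loglik, in_support = CANDIDATES[name]
        if in_support(x):  # skip families that cannot describe these draws
            scores[name] = loglik(x)
    if not scores:
        raise ValueError("no candidate family supports these samples")
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(0)
post_b = rng.normal(1.0, 0.05, size=2000)  # simulated stand-in for draws of "b"
best, scores = pick_family(post_b, ["HalfNormal", "LogNormal", "Exponential"])
print(best)  # LogNormal
```

Comparing raw log-likelihoods favors families with more parameters (LogNormal has two, the others one); an information criterion such as AIC would add a penalty for that.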

When we pass a list of PreliZ distributions to alternative, we choose which distributions are considered as alternatives during fitting. Here, b is fit with a LogNormal, while a remains Normal.

>>> pz.posterior_to_prior(model, idata, alternative=[pz.LogNormal()])
with pm.Model() as model:
    a = pm.Normal("a", mu=-0.0934, sigma=0.0685)
    b = pm.LogNormal("b", mu=-0.0175, sigma=0.0491)

We can also pass a dictionary to alternative, where the keys are variable names and the values are lists of PreliZ distributions, to specify different alternatives for each variable. Here, only b receives an alternative, and the fitted Gamma is reported with mu and sigma.

>>> pz.posterior_to_prior(model, idata, alternative={"b": [pz.Gamma(mu=0)]})
with pm.Model() as model:
    a = pm.Normal("a", mu=-0.00377, sigma=0.0696)
    b = pm.Gamma("b", mu=1.03, sigma=0.0519)
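The "auto" and dictionary examples report b's Gamma prior in two parameterizations, alpha/beta and mu/sigma. For a Gamma with shape alpha and rate beta (the convention PyMC uses), the mean is alpha/beta and the standard deviation is sqrt(alpha)/beta, so converting between the two is a short calculation. The helper names below are hypothetical, not part of PreliZ. The two outputs come from separate runs, so the converted "auto" result need not match the dictionary result exactly.

```python
import math

def gamma_to_mu_sigma(alpha, beta):
    # Mean and standard deviation of a Gamma with shape alpha and rate beta
    return alpha / beta, math.sqrt(alpha) / beta

def gamma_to_alpha_beta(mu, sigma):
    # Inverse mapping: alpha = mu**2 / sigma**2, beta = mu / sigma**2
    return mu ** 2 / sigma ** 2, mu / sigma ** 2

mu, sigma = gamma_to_mu_sigma(423, 426)  # the "auto" result for b above
print(round(mu, 3), round(sigma, 4))     # 0.993 0.0483
```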

To do in the coming weeks

Next, I plan to add support for Bambi models, write thorough documentation for the new function, and include appropriate tests.

Thank you for reading my blog :)
