Welcome to p2prev’s documentation!

Here, you can find documentation for p2prev’s model classes. Check out our GitHub repository and example notebooks for more information.

Model classes and p-curve log-likelihoods

class p2prev.model.PCurveMixture(pvals, effect_size_prior=1.5, **sampler_kwargs)

A user-friendly wrapper for fitting a p-curve mixture model.

Parameters:
  • pvals (np.array of size (n_observations,)) – The observed p-values

  • effect_size_prior (float) – Mean of the exponential distribution used as an effect size prior. You can use PCurveMixture.prior_predictive_power(alpha) to see how this parameter translates to a prior over Type II error for a given false positive rate alpha.

  • **sampler_kwargs – Any valid argument to pymc.sample, for changing the Monte Carlo sampling settings. By default, 5 chains of 1000 samples each are drawn from the posterior, distributed across 5 CPUs if available. The same random seed is used each time.
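
To make the shape of the expected input concrete, here is a minimal sketch that simulates p-values from a null/alternative mixture. It is not part of p2prev: the function name simulate_pvals and its arguments are hypothetical, and it assumes the one-tailed Z-test setting described for p_curve_loglik (test statistic Normal(delta, 1) under the alternative, Normal(0, 1) under the null).

```python
import random
from statistics import NormalDist


def simulate_pvals(n_subjects, prevalence, delta, seed=0):
    """Draw one p-value per subject from a null/alternative mixture.

    Assumption (not taken from p2prev): each subject runs a one-tailed
    Z-test with unit variance, so the statistic is Normal(delta, 1) when
    the effect is present and Normal(0, 1) when it is absent.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    pvals = []
    for _ in range(n_subjects):
        has_effect = rng.random() < prevalence  # subject expresses the effect?
        z = rng.gauss(delta if has_effect else 0.0, 1.0)
        pvals.append(1.0 - std_normal.cdf(z))  # one-tailed p-value
    return pvals


pvals = simulate_pvals(n_subjects=2000, prevalence=0.5, delta=2.0)
frac_significant = sum(p < 0.05 for p in pvals) / len(pvals)
```

A list like pvals, converted to a NumPy array, has the (n_observations,) shape that the pvals parameter expects.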

compare()

Performs model comparison of the mixture model against the all-null and all-alternative models. This method can’t be called until you’ve called the .fit_alternative() method.

property effect_size

posterior samples for abstract effect size

effect_size_hdi()

HDI (default 95%) for abstract effect size

Parameters:

hdi_prob (float) – Width of highest-density interval to return.
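
For readers unfamiliar with highest-density intervals, the sketch below shows one common way to compute an HDI from a set of posterior samples: take the narrowest window that contains a fraction hdi_prob of the sorted draws. The function hdi_from_samples is hypothetical and only illustrates the concept; the library's own computation may differ.

```python
import math


def hdi_from_samples(samples, hdi_prob=0.95):
    """Narrowest interval containing a fraction hdi_prob of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(hdi_prob * n))  # number of draws inside the interval
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    widths = [xs[i + k - 1] - xs[i] for i in range(n - k + 1)]
    start = widths.index(min(widths))
    return xs[start], xs[start + k - 1]


draws = [0, 10, 11, 12, 13, 14, 30, 40, 50, 60]
low, high = hdi_from_samples(draws, hdi_prob=0.5)
```

Unlike an equal-tailed interval, this window shifts toward wherever the samples are densest, which matters for skewed posteriors such as a prevalence near 0 or 1.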

fit()

Fits model. Must be called before doing anything else (except prior predictive simulation).

fit_alternative()

Fits alternative models. Must be called before model comparison can be performed.

property map

maximum a-posteriori estimate of prevalence

property mixture

The trace of the fit mixture model.

plot_compare(**plot_kwargs)

Plots model comparison

plot_trace(**kwargs)

Plots posterior traces.

posterior_predictive_power(alpha)

Posterior samples for within-subject power, given that the alternative hypothesis is true. Returned as a pandas DataFrame together with prevalence samples, so the two can be plotted as a joint distribution.

posterior_predictive_power_hdi(alpha, hdi_prob=0.95)

Posterior HDI for within-subject power at a given significance level.

Parameters:
  • alpha (float) – Significance level for which to get posterior power.

  • hdi_prob (float) – Width of highest-density interval to return.

property prevalence

posterior samples for prevalence parameters

prevalence_hdi(hdi_prob=0.95)

HDI (default 95%) for prevalence parameters

Parameters:

hdi_prob (float) – Width of highest-density interval to return.

prior_predictive_power(alpha, random_seed=0)

Prior samples for within-subject power, given that the alternative hypothesis is true. Returned as a pandas DataFrame together with prevalence samples, so the two can be plotted as a joint distribution.
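
The link between effect_size_prior and a prior over power can be sketched without the library. The code below assumes a one-tailed Z-test with unit variance (the setting described for p_curve_loglik), for which power at effect size delta is 1 - Φ(z_crit - delta). The function names are hypothetical, and the actual method may use a different test model or sampler.

```python
import random
from statistics import NormalDist


def power_one_tailed_z(delta, alpha):
    """Power of a one-tailed Z-test (unit variance) at effect size delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)  # rejection threshold
    return 1.0 - std_normal.cdf(z_crit - delta)


def prior_power_samples(alpha, effect_size_prior=1.5, n_samples=5000, seed=0):
    """Push draws from an exponential effect-size prior through the power curve."""
    rng = random.Random(seed)
    # random.expovariate takes the rate, i.e. 1 / mean.
    deltas = [rng.expovariate(1.0 / effect_size_prior) for _ in range(n_samples)]
    return [power_one_tailed_z(d, alpha) for d in deltas]


samples = prior_power_samples(alpha=0.05)
mean_prior_power = sum(samples) / len(samples)
```

Type II error is one minus power, so the same draws also describe the implied prior over Type II error. Under this assumed test model, power never falls below alpha because the effect size is non-negative.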

prior_predictive_power_hdi(alpha, hdi_prob=0.95)

HDI under the prior for within-subject power at a given significance level.

Parameters:
  • alpha (float) – Significance level for which to get prior predictive power.

  • hdi_prob (float) – Width of highest-density interval to return.

summary(**summary_kwargs)

Gives summary of posteriors for model parameters.
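
Putting the methods above together, a typical session might look like the following sketch. It uses only the calls documented on this page, in the order their descriptions require (fit() first, fit_alternative() before compare()). The wrapper function fit_and_summarize is hypothetical, and the return types of the individual methods are not specified here, so they are passed through untouched.

```python
def fit_and_summarize(pvals, alpha=0.05, hdi_prob=0.95):
    """Run the documented PCurveMixture workflow on an array of p-values."""
    # Imported lazily so the function can be defined without p2prev installed.
    from p2prev.model import PCurveMixture

    model = PCurveMixture(pvals)  # default effect_size_prior=1.5
    model.fit()                   # required before any posterior quantity
    results = {
        "summary": model.summary(),
        "prevalence_hdi": model.prevalence_hdi(hdi_prob=hdi_prob),
        "power_hdi": model.posterior_predictive_power_hdi(alpha, hdi_prob=hdi_prob),
    }
    model.fit_alternative()       # required before compare()
    # Whatever compare() returns (possibly None) is stored as-is.
    results["comparison"] = model.compare()
    return results
```

Calling fit_and_summarize(pvals) with a NumPy array of per-subject p-values runs the full sampling, so expect it to take noticeably longer than the prior-only methods.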

class p2prev.model.PCurveWithinGroupDifference(pvals1, pvals2, effect_size_prior=1.5, **sampler_kwargs)

Fits p-curve mixture model for two within-subject hypothesis tests applied to the SAME group of subjects. Estimates the difference in prevalence of the two effects tested, again in the SAME subjects. (If instead you want to compare prevalence of the same effect in two different groups of subjects, i.e. a “between group” difference, you can just fit a PCurveMixture to each group individually and subtract posterior samples from the two models to get samples from the posterior of the difference.)

We assume that the effect size for each test is fixed, i.e. no effect size information is explicitly pooled between tests. However, we account for the possibility that expressing H1 makes a subject more likely to express H2 or the reverse. (The degree of such covariation is something the model learns from the data.)
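
The between-group comparison mentioned above (fitting one PCurveMixture per group and subtracting posterior samples) can be sketched as follows. The helper difference_summary is hypothetical, and the Beta draws are stand-ins for prevalence samples. With real models you would pass the flattened draws from each model's prevalence property instead.

```python
import random


def difference_summary(samples_a, samples_b):
    """Posterior draws of (b - a) and the share of draws in which b > a.

    The two models are fit independently, so pairing draws by position
    is arbitrary but valid; both sequences must have the same length.
    """
    if len(samples_a) != len(samples_b):
        raise ValueError("sample sequences must have equal length")
    diffs = [b - a for a, b in zip(samples_a, samples_b)]
    prob_b_greater = sum(d > 0 for d in diffs) / len(diffs)
    return diffs, prob_b_greater


# Stand-in posterior draws (NOT real model output).
rng = random.Random(0)
group_a = [rng.betavariate(8, 12) for _ in range(4000)]
group_b = [rng.betavariate(14, 6) for _ in range(4000)]
diffs, prob_b_greater = difference_summary(group_a, group_b)
```

This shortcut is only appropriate for different groups of subjects. For two tests on the same subjects, the draws are not independent, which is the case this class handles.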

Parameters:
  • pvals1 (np.array of size (n_subjects,)) – The observed p-values for within-subject hypothesis test of H0 vs. H1.

  • pvals2 (np.array of size (n_subjects,)) – The observed p-values for within-subject hypothesis test of H0 vs. H2. Subject order should be the same as in pvals1.

  • effect_size_prior (float) – Mean of the exponential distribution used as an effect size prior.

  • **sampler_kwargs – Any valid argument to pymc.sample, for changing the Monte Carlo sampling settings. By default, 5 chains of 1000 samples each are drawn from the posterior, distributed across 5 CPUs if available. The same random seed is used each time.

Notes

I had to do some PyMC trickery to get PyMC to account for covariation between H1 and H2 prevalence the way we need it to, at the cost of being able to use built-in model comparison techniques as in the main PCurveMixture class. If you want to compare to the H0 only model or H1/H2 only models, then you should do that for H1 and H2 individually using PCurveMixture.

property effect_size_H1

posterior samples for relative effect size of H1

property effect_size_H2

posterior samples for relative effect size of H2

property effect_size_diff

posterior samples for H2 - H1 effect sizes

effect_size_diff_hdi(hdi_prob=0.95)

HDI for H2 - H1 effect sizes

Parameters:

hdi_prob (float) – Width of highest-density interval to return.

fit()

fits model

property mixture

the trace of the fit model

plot_trace(**kwargs)

plots the traces for parameters

power_diff(alpha)

posterior samples for power given H2 minus power given H1 at a given significance level alpha.

Parameters:

alpha (float) – Significance level for which to get posterior power.

power_diff_hdi(alpha, hdi_prob=0.95)

HDI for difference in within-subject power

Parameters:
  • alpha (float) – Significance level for which to get posterior power.

  • hdi_prob (float) – Width of highest-density interval to return.

property prevalence_H1

posterior samples for H1 prevalence

property prevalence_H2

posterior samples for H2 prevalence

property prevalence_diff

posterior samples for H2 prevalence minus H1 prevalence

prevalence_diff_hdi(hdi_prob=0.95)

HDI for H2 prevalence minus H1 prevalence

Parameters:

hdi_prob (float) – Width of highest-density interval to return.

property prob_H1_given_H0

posterior samples for H1 prevalence given that H2 is false

property prob_H1_given_H2

posterior samples for H1 prevalence given that H2 is true

property prob_H2_effect_size_greater

posterior probability H2 effect minus H1 effect is positive

property prob_H2_given_H0

posterior samples for H2 prevalence given that H1 is false

property prob_H2_given_H1

posterior samples for H2 prevalence given that H1 is true; that is, the prevalence of H2 being true among the subpopulation in which H1 is true
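
The conditional and marginal prevalences are tied together by the law of total probability and Bayes' rule. The sketch below illustrates those identities for a single posterior draw. The function names are hypothetical, and how the model parameterizes the joint distribution internally is not documented here.

```python
def marginal_prevalence_H2(prev_H1, prob_H2_given_H1, prob_H2_given_H0):
    """P(H2) = P(H2 | H1) P(H1) + P(H2 | not H1) (1 - P(H1))."""
    return prob_H2_given_H1 * prev_H1 + prob_H2_given_H0 * (1.0 - prev_H1)


def prob_H1_given_H2(prev_H1, prob_H2_given_H1, prob_H2_given_H0):
    """Bayes' rule: P(H1 | H2) = P(H2 | H1) P(H1) / P(H2)."""
    prev_H2 = marginal_prevalence_H2(prev_H1, prob_H2_given_H1, prob_H2_given_H0)
    return prob_H2_given_H1 * prev_H1 / prev_H2


# Example draw: 40% of subjects express H1; H2 is three times as common
# among them (75%) as among subjects who do not express H1 (25%).
prev_H2 = marginal_prevalence_H2(0.4, 0.75, 0.25)
h1_among_h2 = prob_H1_given_H2(0.4, 0.75, 0.25)
```

When prob_H2_given_H1 and prob_H2_given_H0 are equal, the two effects are independent across subjects and both equal the marginal H2 prevalence.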

property prob_H2_prev_greater

posterior probability H2 prevalence minus H1 prevalence is positive

summary(**summary_kwargs)

returns a summary of the posterior

p2prev.model.p_curve_loglik(p, delta)

log-likelihood of p-values from a one-tailed Z-test with known unit variance and one observation
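
One way to derive such a log-likelihood, shown here as a sketch rather than as the library's implementation: with z = Φ⁻¹(1 - p) and a test statistic distributed Normal(delta, 1) under the alternative, the density of p is φ(z - delta) / φ(z), whose logarithm simplifies to delta·z - delta²/2. The function name p_curve_loglik_sketch is hypothetical.

```python
from statistics import NormalDist


def p_curve_loglik_sketch(p, delta):
    """Log-density of a one-tailed Z-test p-value at effect size delta.

    Assumes unit variance and a single observation, and requires 0 < p < 1
    (NormalDist.inv_cdf raises StatisticsError at the endpoints).
    """
    z = NormalDist().inv_cdf(1.0 - p)  # Z statistic implied by the p-value
    # log[phi(z - delta) / phi(z)] = delta * z - delta**2 / 2
    return delta * z - 0.5 * delta ** 2


null_value = p_curve_loglik_sketch(0.3, 0.0)     # delta = 0: uniform density
small_p = p_curve_loglik_sketch(0.01, 2.0)
large_p = p_curve_loglik_sketch(0.60, 2.0)
```

At delta = 0 the expression is zero for every p, recovering the uniform distribution of p-values under the null. For delta > 0 it is larger for small p-values, which is the right-skewed p-curve the mixture model exploits.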

p2prev.model.p_loglik_overdisp(p, d, nu)

log-likelihood for p-values of a two-tailed t-test with effect size d and degrees of freedom nu.