Welcome to p2prev’s documentation!¶
Here, you can find documentation for p2prev’s model classes. Check out our GitHub repository and example notebooks for more information.
Model classes and p-curve log-likelihoods¶
- class p2prev.model.PCurveMixture(pvals, effect_size_prior=1.5, **sampler_kwargs)¶
A user-friendly wrapper for fitting a p-curve mixture model.
- Parameters:
pvals (np.array of size (n_observations,)) – The observed p-values
effect_size_prior (float) – Mean of the exponential distribution used as an effect size prior. You can use PCurveMixture.prior_predictive_power(alpha) to see how this parameter translates to a prior over Type II error for a given false positive rate alpha.
**sampler_kwargs – Any valid argument to pymc.sample, for changing the Monte Carlo sampling settings. By default, 5 chains of 1000 samples are drawn from the posterior, distributed across 5 CPUs if available, and the same random seed is used each time.
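To build intuition for how effect_size_prior translates into a prior over power, the sketch below draws effect sizes from an exponential distribution with the given mean and converts each draw to the power of a one-tailed Z-test. It assumes the effect size is the mean shift of a unit-variance Z statistic (the setting described for p_curve_loglik); the helper names are hypothetical, and this is not necessarily how PCurveMixture.prior_predictive_power is implemented.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_test_power(delta, alpha):
    """Power of a one-tailed Z-test (unit variance, one observation) at level alpha."""
    critical_z = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(critical_z - delta)


def prior_power_samples(effect_size_prior=1.5, alpha=0.05, n_draws=5000, seed=0):
    """Draw effect sizes from an exponential prior and convert each to power."""
    rng = random.Random(seed)
    # random.expovariate takes the rate (1 / mean), not the mean.
    deltas = [rng.expovariate(1.0 / effect_size_prior) for _ in range(n_draws)]
    return [z_test_power(d, alpha) for d in deltas]


samples = prior_power_samples()
mean_power = sum(samples) / len(samples)
print(f"prior mean power at alpha=0.05: {mean_power:.3f}")
```

Type II error is one minus power, so the same draws also describe the implied prior over Type II error.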
- compare()¶
Performs model comparison of the mixture model against the all-null and all-alternative models. This method can’t be called until you’ve called the .fit_alternative() method.
- property effect_size¶
posterior samples for abstract effect size
- effect_size_hdi()¶
HDI (default 95%) for abstract effect size
- Parameters:
hdi_prob (float) – Probability mass that the highest-density interval should contain (e.g. 0.95).
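For readers unfamiliar with highest-density intervals, the following is a minimal sketch of the usual sample-based estimate: the narrowest interval that contains a fraction hdi_prob of the draws. The function name is hypothetical, and whether p2prev computes HDIs itself or delegates to another library is not stated here.

```python
import math
import random


def hdi(samples, hdi_prob=0.95):
    """Narrowest interval containing a fraction hdi_prob of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    # Number of samples each candidate interval must contain.
    n_inside = max(1, math.ceil(hdi_prob * n))
    # Slide a window of n_inside consecutive samples and keep the narrowest one.
    best_start = min(
        range(n - n_inside + 1),
        key=lambda i: ordered[i + n_inside - 1] - ordered[i],
    )
    return ordered[best_start], ordered[best_start + n_inside - 1]


# Stand-in draws from a skewed distribution (not real model output).
rng = random.Random(0)
draws = [rng.expovariate(1.0) for _ in range(4000)]
low, high = hdi(draws, hdi_prob=0.95)
print(f"95% HDI: ({low:.3f}, {high:.3f})")
```

For a skewed posterior such as this one, the HDI sits closer to the mode than an equal-tailed interval with the same coverage.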
- fit()¶
Fits model. Must be called before doing anything else (except prior predictive simulation).
- fit_alternative()¶
Fits alternative models. Must be called before model comparison can be performed.
- property map¶
maximum a-posteriori estimate of prevalence
- property mixture¶
The trace of the fit mixture model.
- plot_compare(**plot_kwargs)¶
Plots model comparison
- plot_trace(**kwargs)¶
Plots posterior traces.
- posterior_predictive_power(alpha)¶
Posterior samples for within-subject power given alternative hypothesis is true. Returned as pandas DataFrame with prevalence samples, so can be plotted as a joint distribution.
- posterior_predictive_power_hdi(alpha, hdi_prob=0.95)¶
Posterior HDI for within-subject power at a given significance level.
- Parameters:
alpha (float) – Significance level for which to get posterior power.
hdi_prob (float) – Probability mass that the highest-density interval should contain.
- property prevalence¶
posterior samples for prevalence parameters
- prevalence_hdi(hdi_prob=0.95)¶
HDI (default 95%) for prevalence parameters
- Parameters:
hdi_prob (float) – Probability mass that the highest-density interval should contain.
- prior_predictive_power(alpha, random_seed=0)¶
Prior samples for within-subject power given alternative hypothesis is true. Returned as pandas DataFrame with prevalence samples, so can be plotted as a joint distribution.
- prior_predictive_power_hdi(alpha, hdi_prob=0.95)¶
HDI under the prior for within-subject power at a given significance level.
- Parameters:
alpha (float) – Significance level for which to get prior power.
hdi_prob (float) – Probability mass that the highest-density interval should contain.
- summary(**summary_kwargs)¶
Gives summary of posteriors for model parameters.
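As a conceptual illustration of what a p-curve mixture is, the sketch below treats each p-value as coming either from the null (uniform on (0, 1)) or from an alternative p-curve, with the prevalence as the mixing weight. It assumes the alternative is the one-tailed Z-test p-curve described for p_curve_loglik and holds the effect size fixed; all names and data are hypothetical, and the class itself samples the full posterior with PyMC rather than scanning a grid.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def alt_density(p, delta):
    """Assumed alternative p-curve: one-tailed Z-test with mean shift delta."""
    z = STD_NORMAL.inv_cdf(1.0 - p)
    return math.exp(delta * z - 0.5 * delta ** 2)


def mixture_loglik(pvals, prevalence, delta):
    """Log-likelihood when a fraction `prevalence` of subjects show the effect."""
    # Under the null, p-values are uniform on (0, 1), so their density is 1.
    return sum(
        math.log((1.0 - prevalence) * 1.0 + prevalence * alt_density(p, delta))
        for p in pvals
    )


pvals = [0.001, 0.004, 0.02, 0.03, 0.2, 0.45, 0.6, 0.8]  # made-up data
delta = 2.0  # fixed here for simplicity; the real model infers it
grid = [i / 100 for i in range(101)]
best = max(grid, key=lambda g: mixture_loglik(pvals, g, delta))
print(f"grid value of prevalence with the highest likelihood: {best:.2f}")
```

Setting prevalence to 0 or 1 in this sketch gives the all-null and all-alternative likelihoods, which are the two reference models that compare() is described as testing the mixture against.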
- class p2prev.model.PCurveWithinGroupDifference(pvals1, pvals2, effect_size_prior=1.5, **sampler_kwargs)¶
Fits p-curve mixture model for two within-subject hypothesis tests applied to the SAME group of subjects. Estimates the difference in prevalence of the two effects tested, again in the SAME subjects. (If instead you want to compare prevalence of the same effect in two different groups of subjects, i.e. a “between group” difference, you can just fit a PCurveMixture to each group individually and subtract posterior samples from the two models to get samples from the posterior of the difference.)
We assume that the effect size for each test is fixed, i.e. no effect size information is explicitly pooled between tests. However, we account for the possibility that expressing H1 makes a subject more likely to express H2 or the reverse. (The degree of such covariation is something the model learns from the data.)
- Parameters:
pvals1 (np.array of size (n_subjects,)) – The observed p-values for within-subject hypothesis test of H0 vs. H1.
pvals2 (np.array of size (n_subjects,)) – The observed p-values for within-subject hypothesis test of H0 vs. H2. Subject order should be the same as in pvals1.
effect_size_prior (float) – Mean of the exponential distribution used as an effect size prior.
**sampler_kwargs – Any valid argument to pymc.sample, for changing the Monte Carlo sampling settings. By default, 5 chains of 1000 samples are drawn from the posterior, distributed across 5 CPUs if available, and the same random seed is used each time.
Notes
I had to do some PyMC trickery to get PyMC to account for covariation between H1 and H2 prevalence the way we need it to, at the cost of being able to use built-in model comparison techniques as in the main PCurveMixture class. If you want to compare to the H0 only model or H1/H2 only models, then you should do that for H1 and H2 individually using PCurveMixture.
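The between-group comparison mentioned in the class description (fit a PCurveMixture to each group, then subtract posterior samples) can be sketched as follows. The draws here are synthetic stand-ins generated with random.betavariate; in practice they would come from the .prevalence property of two separately fitted models, which this sketch assumes can be flattened to 1-D sequences. The helper name is hypothetical, and the interval shown is equal-tailed, not an HDI.

```python
import random


def difference_summary(samples_a, samples_b):
    """Summarise group B minus group A from two sets of posterior draws."""
    n = min(len(samples_a), len(samples_b))
    # The two fits are independent, so pairing draws by index is arbitrary but valid.
    diffs = sorted(b - a for a, b in zip(samples_a[:n], samples_b[:n]))
    prob_positive = sum(d > 0 for d in diffs) / n
    # Equal-tailed 95% interval (simpler than an HDI, and not the same thing).
    lower = diffs[int(0.025 * (n - 1))]
    upper = diffs[int(0.975 * (n - 1))]
    return {
        "mean": sum(diffs) / n,
        "prob_positive": prob_positive,
        "interval": (lower, upper),
    }


# Synthetic stand-ins for posterior prevalence draws from two groups.
rng = random.Random(0)
group_a = [rng.betavariate(8, 12) for _ in range(5000)]
group_b = [rng.betavariate(14, 6) for _ in range(5000)]
result = difference_summary(group_a, group_b)
print(result)
```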
- property effect_size_H1¶
posterior samples for relative effect size of H1
- property effect_size_H2¶
posterior samples for relative effect size of H2
- property effect_size_diff¶
posterior samples for H2 - H1 effect sizes
- effect_size_diff_hdi(hdi_prob=0.95)¶
HDI for H2 - H1 effect sizes
- Parameters:
hdi_prob (float) – Probability mass that the highest-density interval should contain.
- fit()¶
fits model
- property mixture¶
the trace of the fit model
- plot_trace(**kwargs)¶
plots the traces for parameters
- power_diff(alpha)¶
posterior samples for power given H2 minus power given H1 at a given significance level alpha.
- Parameters:
alpha (float) – Significance level for which to get posterior power.
- power_diff_hdi(alpha, hdi_prob=0.95)¶
HDI for difference in within-subject power
- Parameters:
alpha (float) – Significance level for which to get posterior power.
hdi_prob (float) – Probability mass that the highest-density interval should contain.
- property prevalence_H1¶
posterior samples for H1 prevalence
- property prevalence_H2¶
posterior samples for H2 prevalence
- property prevalence_diff¶
posterior samples for H2 prevalence minus H1 prevalence
- prevalence_diff_hdi(hdi_prob=0.95)¶
HDI for H2 prevalence minus H1 prevalence
- Parameters:
hdi_prob (float) – Probability mass that the highest-density interval should contain.
- property prob_H1_given_H0¶
posterior samples for H1 prevalence given that H2 is false
- property prob_H1_given_H2¶
posterior samples for H1 prevalence given that H2 is true
- property prob_H2_effect_size_greater¶
posterior probability H2 effect minus H1 effect is positive
- property prob_H2_given_H0¶
posterior samples for H2 prevalence given that H1 is false
- property prob_H2_given_H1¶
posterior samples for H2 prevalence given that H1 is true; that is, the prevalence of H2 being true among the subpopulation in which H1 is true
- property prob_H2_prev_greater¶
posterior probability H2 prevalence minus H1 prevalence is positive
- summary(**summary_kwargs)¶
returns a summary of the posterior
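The conditional and marginal prevalence properties above are tied together by the law of total probability and Bayes' rule. The sketch below shows those relationships for a single hypothetical posterior draw; the function names and numbers are made up for illustration, and nothing here is a claim about how the class parameterizes the model internally.

```python
def marginal_h2(prev_h1, h2_given_h1, h2_given_not_h1):
    """Law of total probability: P(H2) from P(H1) and the two conditionals."""
    return h2_given_h1 * prev_h1 + h2_given_not_h1 * (1.0 - prev_h1)


def h1_given_h2(prev_h1, h2_given_h1, h2_given_not_h1):
    """Bayes' rule: P(H1 | H2). Undefined if P(H2) is zero."""
    return h2_given_h1 * prev_h1 / marginal_h2(prev_h1, h2_given_h1, h2_given_not_h1)


# One hypothetical posterior draw (made-up numbers).
prev_h1, h2_h1, h2_h0 = 0.4, 0.9, 0.2
prev_h2 = marginal_h2(prev_h1, h2_h1, h2_h0)
print(round(prev_h2, 2))                              # 0.48
print(round(prev_h2 - prev_h1, 2))                    # 0.08
print(round(h1_given_h2(prev_h1, h2_h1, h2_h0), 2))   # 0.75
```

When the two conditionals are equal, expressing H1 carries no information about H2, and the marginal H2 prevalence equals that shared value.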
- p2prev.model.p_curve_loglik(p, delta)¶
log-likelihood of p-values from a one-tailed Z-test with known unit variance and one observation
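For reference, the density of a p-value from a one-tailed Z-test has a simple closed form. If z ~ N(delta, 1) and p = 1 - Φ(z), then the density of p is φ(z - delta) / φ(z), whose logarithm is delta * z - delta² / 2 with z = Φ⁻¹(1 - p). The sketch below implements that formula with the standard library and checks numerically that the density integrates to one; it is an independent derivation under those assumptions (with a hypothetical function name), not a copy of the library's code.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_p_loglik(p, delta):
    """Log-density of a one-tailed Z-test p-value given mean shift delta."""
    z = STD_NORMAL.inv_cdf(1.0 - p)  # Z statistic that produces this p-value
    return delta * z - 0.5 * delta ** 2


# Under the null (delta = 0) p-values are uniform, so the log-density is 0.
print(z_test_p_loglik(0.03, 0.0))

# Midpoint-rule check that the density integrates to roughly 1 for delta = 1.
n_cells = 20000
area = sum(
    math.exp(z_test_p_loglik((i + 0.5) / n_cells, 1.0)) for i in range(n_cells)
) / n_cells
print(f"approximate integral of the density: {area:.4f}")
```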
- p2prev.model.p_loglik_overdisp(p, d, nu)¶
log-likelihood for p-values of a two-tailed t-test with effect size d and degrees of freedom nu.