Homework

  1. Install the package LearnBayes and re-do the analysis of the cancer data
  2. Obtain the mean and the variance of the beta-binomial distribution. Show that it addresses the overdispersion problem. Hint: use the formulas for conditional expectations and variances.
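As a pointer for the hint: with Y | p ~ Binomial(n, p) and p ~ Beta(a, b), the iterated expectation and variance identities read

```latex
\mathbb{E}(Y) = \mathbb{E}\bigl[\mathbb{E}(Y \mid p)\bigr] = n\,\mathbb{E}(p),
\qquad
\operatorname{Var}(Y) = \mathbb{E}\bigl[\operatorname{Var}(Y \mid p)\bigr]
                      + \operatorname{Var}\bigl[\mathbb{E}(Y \mid p)\bigr]
                      = n\,\mathbb{E}\bigl[p(1-p)\bigr] + n^{2}\operatorname{Var}(p).
```

The second variance term is absent when p is fixed, which is where the extra dispersion enters.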
  3. Obtain the Laplace approximation for the posterior expectation of logit(mu) and log(tau) in the cancer mortality rate example (data available from the package LearnBayes). See the original Laplace approximation article.
  4. The failure time of a pump follows a two-parameter exponential distribution, f(y|b,m) = (1/b) exp(-(y-m)/b), for y >= m.
    1. Obtain the likelihood for b and m based on an i.i.d. sample of size n
    2. Consider a suitable transformation that maps the parameters b and m to the plane
    3. Consider a sample of 8 pumps, where all pumps failed at some time. The smallest failure time was 23721 minutes and the total testing time for all pumps was 15962989 minutes. Assume a sensible prior for the transformed parameters and explore the contours of the posterior distribution.
    4. Find a normal approximation to the posterior distribution of the transformed parameters
    5. Use rejection sampling and SIR to approximate the posterior distribution. Compare.
    6. Use importance sampling as well as a Laplace approximation to estimate the posterior mean and variance of the transformed parameters.
    7. Define the reliability at time t_0 as R(t_0) = exp(-(t_0 - m)/b). Describe the posterior moments and the posterior distribution of R(10^6).
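One possible starting point for parts 2 and 3 (a sketch, not the only choice): take the transformed parameters to be theta = log(b) and m itself, so the parameter space is the plane, with the support constraint m <= y_(1) absorbed into the likelihood, and put a flat prior on (theta, m). Note that the likelihood depends on the data only through n, the smallest failure time, and the total testing time.

```r
# Contours of the log-posterior for the pump data, using the summary
# statistics given in the problem.  The flat prior on (theta, m) with
# theta = log(b) is an assumption; other sensible priors are possible.
n <- 8; y1 <- 23721; S <- 15962989
logpost <- function(theta, m) {
  if (m > y1) return(-Inf)                # likelihood is zero unless m <= y_(1)
  -n * theta - (S - n * m) * exp(-theta)
}
theta_grid <- seq(13.5, 16, length.out = 200)
m_grid <- seq(-8e5, y1, length.out = 200)
z <- outer(theta_grid, m_grid, Vectorize(logpost))
contour(theta_grid, m_grid, z - max(z), levels = -c(1, 2, 4, 8),
        xlab = "log(b)", ylab = "m")
```

The same `logpost` function can then be reused as the target for the normal approximation, rejection sampling, SIR, and importance sampling in parts 4-6.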
  5. Problems 3.10.1, 3.10.2, 3.10.5; 4.7.1, 4.7.2
  6. Problems 5.9.7, 5.9.8, 5.9.10, 5.9.12
  7. Repeat the SAT example with:
    1. Direct sampling from the posterior
    2. Gibbs sampling from the posterior
    3. Abrams and Sansó '98 approximations for the posterior moments
  8. Problems 5.9.14, 5.9.15
  9. Covid19 Data
  10. Consider the SAT example; use the DIC to compare the models with no pooling, total pooling and partial pooling based on a hierarchical model with unknown variance.
  11. Write the Bayes factor, BIC, DIC and Gelfand and Ghosh criterion to compare a model where n observations are assumed to be sampled from a Poisson distribution with a gamma prior, to a model where the observations are sampled from a binomial distribution, with a fixed, large number of trials and a beta prior for the probability of success.
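For reference, and writing p for the number of parameters in BIC and k for the Gelfand and Ghosh weight, the criteria in their standard forms are:

```latex
B_{12} = \frac{\int p_1(y \mid \theta_1)\,\pi_1(\theta_1)\,d\theta_1}
              {\int p_2(y \mid \theta_2)\,\pi_2(\theta_2)\,d\theta_2},
\qquad
\mathrm{BIC} = -2\log p(y \mid \hat\theta) + p\log n,
\\[4pt]
\mathrm{DIC} = \bar{D} + p_D, \quad p_D = \bar{D} - D(\bar\theta), \quad
D(\theta) = -2\log p(y \mid \theta),
\\[4pt]
D_k = \sum_i \operatorname{Var}\bigl(y_i^{\mathrm{rep}} \mid y\bigr)
    + \frac{k}{k+1}\sum_i \Bigl(\mathbb{E}\bigl(y_i^{\mathrm{rep}} \mid y\bigr) - y_i\Bigr)^2 .
```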
  13. Problems 6.7.2, 6.7.6; 7.8.4, 7.8.5
  14. Problems 8.10.11, 8.10.14
  15. The North American Breeding Bird Survey provides information about the abundance of the different species of birds in North America. Download the data for Red-tailed Hawks in California for the years 1966-2018. Let n(y) be the count for year y and c(y) the route count for year y. Assume that n(y) follows a Poisson distribution with mean l(y)*c(y).
    1. Perform an exploratory analysis of the data
    2. Propose a hierarchical model to estimate l(y) that borrows information from all the years.
    3. Fit the proposed model with a sample-based approach.
    4. Validate the model by exploring whether the results are compatible with the observed data. Discuss possible elaborations of the model that may be needed.
    5. Estimate the probability of observing more than 450 hawks in a year with route count of 120.
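A minimal sketch for parts 2-3, using simulated (hypothetical) counts in place of the actual BBS download. One simple hierarchical choice, among several possible, is n(y) | l(y) ~ Poisson(l(y) c(y)) with l(y) | b ~ Gamma(a, b), a fixed (an assumption) and a gamma hyperprior on b; both full conditionals are then conjugate and Gibbs sampling is direct.

```r
# Gibbs sampler for a Poisson-gamma hierarchical model.  The data below
# are simulated stand-ins; replace them with the downloaded BBS counts.
set.seed(2)
Y <- 53                                    # years 1966-2018
c_y <- rpois(Y, 100) + 20                  # hypothetical route counts
n_y <- rpois(Y, 3.5 * c_y)                 # hypothetical hawk counts
a <- 2; c0 <- 1; d0 <- 1                   # fixed hyperparameters (assumptions)
b <- 1; l <- n_y / c_y                     # initial values
niter <- 2000
keep <- matrix(NA, niter, Y)
for (i in 1:niter) {
  l <- rgamma(Y, a + n_y, b + c_y)         # l(y) | rest ~ Gamma(a + n_y, b + c_y)
  b <- rgamma(1, c0 + Y * a, d0 + sum(l))  # b | rest ~ Gamma(c0 + Y a, d0 + sum l)
  keep[i, ] <- l
}
colMeans(keep[-(1:500), ])[1:5]            # posterior means of l(y), first 5 years
```

Part 5 can then be approached by drawing l for a new year from Gamma(a, b) at each retained iteration and simulating a Poisson count with mean l * 120.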
  16. For each of the examples considered in class regarding the censored and truncated weights data, develop an approach based on MCMC with auxiliary variables and write the full conditionals.
  17. Consider the following data on the heights, in inches, of male students at a college: less than 66: 14; 66 to 68: 30; 68 to 70: 49; 70 to 72: 70; 72 to 74: 33; greater than 74: 15. Assume that the heights are normally distributed, and assume a non-informative prior for the parameters of the normal.
    1. Use Metropolis-Hastings to estimate the parameters of the normal using a multinomial likelihood.
    2. Introduce latent variables and use Gibbs sampling to do the estimation. Compare.
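A sketch for part 1 (assuming a flat prior on (mu, log sigma), which is a choice): the grouped-data likelihood is multinomial with cell probabilities given by differences of the normal CDF, and a random-walk Metropolis step explores the posterior.

```r
# Grouped heights from problem 17; bins (-Inf,66], (66,68], ..., (74,Inf)
breaks <- c(-Inf, 66, 68, 70, 72, 74, Inf)
counts <- c(14, 30, 49, 70, 33, 15)
# Multinomial log-likelihood with normal cell probabilities
loglik <- function(mu, sigma) sum(counts * log(diff(pnorm(breaks, mu, sigma))))
lp <- function(par) loglik(par[1], exp(par[2]))  # flat prior on (mu, log sigma)
set.seed(1)
niter <- 5000
draws <- matrix(NA, niter, 2)
cur <- c(70, log(2.5)); lpcur <- lp(cur)         # starting values
for (i in 1:niter) {
  prop <- cur + rnorm(2, 0, c(0.2, 0.1))         # random-walk proposal
  lpprop <- lp(prop)
  if (log(runif(1)) < lpprop - lpcur) { cur <- prop; lpcur <- lpprop }
  draws[i, ] <- cur
}
colMeans(draws[-(1:1000), ])                     # posterior means of (mu, log sigma)
```

For part 2, the latent variables are the individual (unobserved) heights, constrained to their bins, which restores a standard normal model with truncated-normal full conditionals.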
  18. Write an R function to fit a linear model using a Bayesian approach with a non-informative prior for both the regression parameters and the variance. The function should produce as output samples of the regression coefficients and the variance.
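One way to structure such a function (a sketch, not a definitive implementation), using the standard results for the non-informative prior p(beta, sigma^2) proportional to 1/sigma^2: sigma^2 | y follows (n-p) s^2 over a chi-square with n-p degrees of freedom, and beta | sigma^2, y is normal around the least-squares estimate.

```r
# Sample-based fit of y = X beta + e, e ~ N(0, sigma^2 I), under the
# non-informative prior p(beta, sigma^2) proportional to 1/sigma^2.
blm <- function(y, X, nsim = 1000) {
  n <- nrow(X); p <- ncol(X)
  XtX <- crossprod(X)
  beta_hat <- drop(solve(XtX, crossprod(X, y)))
  s2 <- sum((y - X %*% beta_hat)^2) / (n - p)
  sigma2 <- (n - p) * s2 / rchisq(nsim, n - p)     # draw sigma^2
  U <- chol(solve(XtX))                            # t(U) %*% U = (X'X)^{-1}
  Z <- matrix(rnorm(nsim * p), nsim, p)
  beta <- sweep(Z %*% U, 1, sqrt(sigma2), "*") +   # draw beta | sigma^2
          matrix(beta_hat, nsim, p, byrow = TRUE)
  list(beta = beta, sigma2 = sigma2)
}

# Quick check on simulated data
set.seed(3)
X <- cbind(1, rnorm(50)); y <- drop(X %*% c(1, 2) + rnorm(50))
fit <- blm(y, X)
colMeans(fit$beta)   # should be near c(1, 2)
```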
  19. Find the marginal distribution of the regression coefficients in a linear normal model.
  20. Show that the posterior predictive distribution of a new observation from a regression model is a Student-t distribution, as indicated in the slides.
  21. Consider the data available as "birthweight" from the package "LearnBayes". Fit a linear regression that considers age and gender as explanatory variables for birth weight. Describe the posterior distribution of the regression parameters using a sample-based approach. Explore the posterior predictive distribution for the birth weight of children in the following four cases: (a) 36 week female/male; (b) 40 week female/male. Compare.
  22. Consider a conditional linear model with a design matrix and an error covariance matrix specified by a set of unknown parameters. Find an explicit expression for the posterior distribution of those parameters.
  23. Show that the posterior distribution of the regression parameters using an informative prior is the same when the quadratic forms are completed directly and when the prior is considered as additional data.
  24. Consider a normal linear model where the errors have a covariance matrix that is a multiple of the identity. Is it possible to obtain a conjugate prior (informative) for the regression coefficients and the variance? If so, find the posterior distribution.
  25. Obtain the expressions for the marginal distributions of the data and the Bayes factors for linear models using g-priors.
  26. Problems 14.10.1 and 14.10.10
  27. Write the full conditionals for a robust regression where the errors are assumed to follow a Student-t distribution with known degrees of freedom.
  28. Modification of Problem 17. Suppose that a second class of students, of the same size as the first, is considered, with the results: less than 66: 18; 66 to 68: 28; 68 to 70: 49; 70 to 72: 73; 72 to 74: 33; greater than 74: 10. Write the full conditionals for a model that considers both sets of data, accounting for potential differences between the two classes, using normal latent variables. Run the resulting MCMC. Then modify your model so that the latent variables are distributed as Student-t with 5 degrees of freedom.
  29. Obtain the full conditionals for a Bayesian Lasso regression model assuming that lambda, the scale of the penalization term, is randomly distributed according to a gamma distribution (see Section 20.2).
  30. Consider data corresponding to a mixture of M exponential densities. Consider appropriate conjugate priors for the parameters of the model. Find the full conditionals needed to explore the posterior distribution using a Gibbs sampler.
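As a check on the shape of the answer for the mixture-of-exponentials problem: assuming rates theta_m with Gamma(a_m, b_m) priors, weights w with a Dirichlet(alpha) prior, and latent component indicators z_i, and writing n_m for the number of observations allocated to component m, the full conditionals take the form

```latex
P(z_i = m \mid \cdots) \propto w_m\,\theta_m e^{-\theta_m y_i},
\qquad
\theta_m \mid \cdots \sim \mathrm{Gamma}\Bigl(a_m + n_m,\; b_m + \sum_{i:\,z_i = m} y_i\Bigr),
\qquad
w \mid \cdots \sim \mathrm{Dirichlet}(\alpha_1 + n_1, \dots, \alpha_M + n_M).
```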
  31. Write the steps of the EM algorithm to estimate the parameters of a mixture of M normals with conjugate priors for the parameters of each of the model components.