import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t, norm

devs = np.random.normal(size=100)
tdevs = t.ppf(norm.cdf(devs), df=1)

plt.scatter(devs, tdevs)
plt.xlabel('devs'); plt.ylabel('tdevs')
plt.plot([min(devs), max(devs)], [min(devs), max(devs)], color='red')
plt.show()
Simulation
Overview
References:
- Gentle: Computational Statistics
- Monahan: Numerical Methods of Statistics
Many (most?) statistical papers include a simulation (i.e., Monte Carlo) study. Many papers on machine learning methods also include a simulation study. The basic idea is that closed-form mathematical analysis of the properties of a statistical or machine learning method/model is often hard to do. Even if possible, it usually involves approximations or simplifications. A canonical situation in statistics is that we have an asymptotic result and we want to know what happens in finite samples, but often we do not even have the asymptotic result. Instead, we can estimate mathematical expressions using random numbers. So we design a simulation study to evaluate the method/model or compare multiple methods. The result is that the researcher carries out an experiment (on the computer, sometimes called in silico), generally varying different factors to see what has an effect on the outcome of interest.
The basic strategy generally involves simulating data and then using the method(s) on the simulated data, summarizing the results to assess/compare the method(s).
Most simulation studies aim to approximate an integral, generally an expected value (mean, bias, variance, MSE, probability, etc.). In low dimensions, methods such as Gaussian quadrature are best for estimating an integral, but these methods don't scale well, so in higher dimensions (the usual situation in statistics, where the integral is over the distribution of an entire dataset) we often use Monte Carlo techniques instead.
To be more concrete:
- If we have a method for estimating a model parameter (including estimating uncertainty), such as a regression coefficient, what properties do we want the method to have and what criteria could we use?
- If we have a prediction method (including prediction uncertainty), what properties do we want the method to have and what criteria could we use?
- If we have a method for doing a hypothesis test, what criteria would we use to assess the hypothesis test? What properties do we want the test to have?
- If we have a method for finding a confidence interval or a prediction interval, what criteria would we use to assess the interval?
1. Monte Carlo considerations
Motivating example
Let's consider linear regression, with observations $Y = (y_1, \ldots, y_n)$ and an $n \times p$ matrix of covariates $X$, where $\hat{\beta} = (X^\top X)^{-1} X^\top Y$. If we assume $E(Y) = X\beta$ and $\text{Var}(Y) = \sigma^2 I$, then standard results tell us that $E(\hat{\beta}) = \beta$ and $\text{Var}(\hat{\beta}) = \sigma^2 (X^\top X)^{-1}$.
But suppose that we're interested in the properties of standard regression estimation when in reality the mean is not linear in $X$ or the errors do not have constant variance. In general we cannot work this out analytically.
Instead we decide to use a Monte Carlo estimate. To keep the notation simple, let's just consider one element of the vector $\beta$ (i.e., one regression coefficient) and call its estimator $\hat{\beta}$.
Next let’s think about Monte Carlo methods in general.
Monte Carlo (MC) basics
Monte Carlo overview
The basic idea is that we often want to estimate $\phi \equiv E_f(h(Y))$ for some random variable (or random dataset) $Y \sim f$ and some function $h$. Note that if $h$ is an indicator function, this includes estimation of probabilities, e.g., for scalar $Y$, $P(Y \leq y) = E_f(I(Y \leq y))$; taking $h$ to involve squared terms gives variances and MSEs.
We get an MC estimate of $\phi$ based on an iid sample of a large number of values of $Y$ from $f$: $\hat{\phi} = \frac{1}{m}\sum_{i=1}^{m} h(Y_i)$, which is justified by the law of large numbers, since $\frac{1}{m}\sum_{i=1}^{m} h(Y_i) \to E_f(h(Y))$ as $m \to \infty$.
Note that in most simulation studies, $Y$ is an entire simulated dataset (observations, covariates, etc.), and the "iid sample" consists of $m$ different simulated datasets, $Y_1, \ldots, Y_m$, not $m$ individual data points within a single dataset.
Back to the regression example
Let's relate that back to our regression example. In that particular case, if we're interested in whether the regression estimator is biased, we want to know the bias, $E(\hat{\beta}) - \beta$, where $\phi = E(\hat{\beta})$. Here $h(Y)$ is the function that computes $\hat{\beta}$ from a simulated dataset $Y$, so the MC estimate of the bias is $\frac{1}{m}\sum_{i=1}^{m}\hat{\beta}_i - \beta$, with $\hat{\beta}_i$ the estimate from the $i$th simulated dataset.
If we are interested in the variance of the regression estimator, we have $\phi = \text{Var}(\hat{\beta}) = E_f\big((\hat{\beta} - E(\hat{\beta}))^2\big)$, which we estimate by $\widehat{\text{Var}}(\hat{\beta}) = \frac{1}{m}\sum_{i=1}^{m}\big(\hat{\beta}_i - \hat{E}(\hat{\beta})\big)^2$.
Finally, note that in that expression we also need to use the Monte Carlo estimate of $E(\hat{\beta})$, namely $\hat{E}(\hat{\beta}) = \frac{1}{m}\sum_{i=1}^{m}\hat{\beta}_i$.
We might also be interested in the coverage of a confidence interval. In that case we have $\phi = P\big(\beta \in CI(\hat{\beta})\big)$, with $h(Y) = I\big(\beta \in CI(Y)\big)$, estimated by the proportion of the $m$ simulated intervals that contain the true $\beta$.
Note that coverage can be too low for two reasons: (1) the estimated uncertainty is too low, or (2) the estimator is biased.
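To make this concrete, here is a small sketch (my own illustration, not part of the original example; the quadratic mean function, $n = 50$, $m = 1000$, and noise level are arbitrary choices) that uses Monte Carlo to estimate the bias, variance, and 95% confidence-interval coverage of the least-squares slope when the true mean is not linear in $x$:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(1)
m, n = 1000, 50                       # number of replicates, sample size
beta = 2.0                            # coefficient on x used to generate the data
x = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), x])
XtXinv = np.linalg.inv(X.T @ X)
tcrit = t.ppf(0.975, df=n - 2)

betahats = np.empty(m)
covered = np.empty(m, dtype=bool)
for i in range(m):
    # The true mean is mildly nonlinear in x, so the linear model is misspecified.
    y = 1.0 + beta * x + 0.5 * x**2 + rng.normal(scale=0.3, size=n)
    bhat = XtXinv @ X.T @ y
    sigma2 = np.sum((y - X @ bhat)**2) / (n - 2)
    se = np.sqrt(sigma2 * XtXinv[1, 1])
    betahats[i] = bhat[1]
    covered[i] = bhat[1] - tcrit * se <= beta <= bhat[1] + tcrit * se

bias_hat = betahats.mean() - beta                     # MC estimate of bias
bias_se = betahats.std(ddof=1) / np.sqrt(m)           # its simulation standard error
var_hat = betahats.var(ddof=1)                        # MC estimate of Var(betahat)
coverage = covered.mean()                             # MC estimate of CI coverage
coverage_se = np.sqrt(coverage * (1 - coverage) / m)  # its simulation standard error
print(bias_hat, bias_se, var_hat, coverage, coverage_se)
```

The reported simulation standard errors are exactly the simulation (Monte Carlo) uncertainty discussed next.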
Simulation uncertainty (i.e., Monte Carlo uncertainty)
Since $\hat{\phi}$ is simply an average of $m$ iid draws, $h(Y_1), \ldots, h(Y_m)$, its variance behaves like that of any sample mean.
The simulation variance is $\text{Var}(\hat{\phi}) = \sigma^2/m$, where $\sigma^2 = \text{Var}(h(Y))$. We estimate it as $\widehat{\text{Var}}(\hat{\phi}) = \hat{\sigma}^2/m$ with $\hat{\sigma}^2 = \frac{1}{m-1}\sum_{i=1}^{m}\big(h(Y_i) - \hat{\phi}\big)^2$, so the Monte Carlo standard error decreases like $1/\sqrt{m}$ and can be made as small as we like by increasing $m$ (at the cost of more computation).
Note that in the simulation setting, the randomness in the system is very well-defined (as it is in survey sampling, but unlike in most other applications of statistics), because it comes from the random number generation that we perform as part of our attempt to estimate $\phi$.
::: {.callout-warning title="Simulation (Monte Carlo) uncertainty is not statistical uncertainty"}
This is the uncertainty in our simulation-based estimate of some quantity (expectation) of interest. It is NOT the statistical uncertainty in a problem.
:::
Back to the regression example
Some examples of simulation variances we might be interested in for the regression example include:
- Uncertainty in our estimate of bias: $\widehat{\text{Var}}\big(\hat{E}(\hat{\beta}) - \beta\big)$.
- Uncertainty in the estimated variance of the estimated coefficient: $\widehat{\text{Var}}\big(\widehat{\text{Var}}(\hat{\beta})\big)$.
- Uncertainty in the estimated mean square prediction error: $\widehat{\text{Var}}\big(\widehat{\text{MSPE}}\big)$.
In all cases we have to estimate the simulation variance, hence the hats on the $\widehat{\text{Var}}$ terms above.
Final notes
Sometimes the $Y_i$ are generated in a dependent fashion (e.g., sequential MC or MCMC), in which case the simple variance expression above no longer holds because the draws are not independent, but $\hat{\phi}$ is still a valid estimator of $\phi$.
Variance reduction (optional)
There are some tools for variance reduction in MC settings. One is importance sampling (discussed later in this unit). Others are the use of control variates and antithetic sampling. I haven't personally run across the latter two in practice, so I'm not sure how widely used they are and won't go into them here.
In some cases we can set up natural strata for which we know the probability of being in each stratum. Then we would estimate $\phi$ separately within each stratum and combine the stratum-specific estimates, weighting by the known stratum probabilities.
Another strategy that comes up in MCMC contexts is Rao-Blackwellization. Suppose we want to know $\phi = E(h(X))$ where $X = \{X_1, X_2\}$. Iterated expectation tells us that $E(h(X)) = E\big(E(h(X) \mid X_2)\big)$. If we can compute the inner conditional expectation $E(h(X) \mid X_2)$ analytically, we can average those conditional expectations over draws of $X_2$ instead of averaging $h(X_1, X_2)$ over draws of both variables; by the law of total variance, this "Rao-Blackwellized" estimator has simulation variance no larger than the standard one.
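As a toy illustration (my own, with an arbitrary bivariate normal example): suppose we want $P(X_1 > 1)$ where $(X_1, X_2)$ is bivariate standard normal with correlation $\rho$. The conditional probability given $X_2$ is available in closed form, so we can average it instead of the indicator:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
m, rho = 100_000, 0.7

# Draw (X1, X2) bivariate standard normal with correlation rho.
x2 = rng.normal(size=m)
x1 = rho * x2 + np.sqrt(1 - rho**2) * rng.normal(size=m)

# Goal: phi = P(X1 > 1); the true value is norm.sf(1) since X1 is standard normal.
standard = (x1 > 1).mean()

# Rao-Blackwellized estimator: average the exact conditional probability
# P(X1 > 1 | X2) = 1 - Phi((1 - rho*X2) / sqrt(1 - rho^2)) instead of the indicator.
rb = norm.sf((1 - rho * x2) / np.sqrt(1 - rho**2)).mean()

print(norm.sf(1), standard, rb)
# The RB estimator has lower simulation variance (law of total variance).
```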
2. Design of simulation studies
Consider the paper that is part of PS5. We can think about designing a simulation study in that context.
First, what are the key issues that need to be assessed to evaluate their methodology?
Second, what do we need to consider in carrying out a simulation study to address those issues? I.e., what are the key decisions to be made in setting up the simulations?
Basic steps of a simulation study
1. Specify what makes up an individual experiment (i.e., the individual simulated dataset) given a specific set of inputs: sample size, distribution(s) to use, parameter values, statistic of interest, etc. In other words, exactly how would you generate one simulated dataset?
2. Often you'll want to see how your results vary if you change some of the inputs, e.g., sample sizes, parameter values, data generating mechanisms; so determine what factors you'll want to vary. Each unique combination of input values will be a scenario.
3. Write code to carry out the individual experiment and return the quantity of interest, with the arguments to your code being the inputs that you want to vary.
4. For each combination of inputs you want to explore (each scenario), repeat the experiment $m$ times. Note this is an easily parallelized calculation (in both the data-generating dimension and the inputs dimension(s)).
5. Summarize the results for each scenario, quantifying simulation uncertainty.
6. Report the results in graphical or tabular form.
Often a simulation study will compare multiple methods, so you’ll need to do steps 3-6 for each method.
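Here is a minimal sketch of these steps in Python; the function names, the toy linear-regression experiment, and the particular inputs varied are all just placeholders for illustration.

```python
import itertools
import numpy as np

def simulate_dataset(n, sigma, rng):
    """Step 1: generate one simulated dataset for a given scenario."""
    x = rng.uniform(size=n)
    y = 1.0 + 2.0 * x + rng.normal(scale=sigma, size=n)
    return x, y

def run_experiment(n, sigma, rng):
    """Step 3: apply the method to one dataset and return the quantity of interest."""
    x, y = simulate_dataset(n, sigma, rng)
    return np.polyfit(x, y, 1)[0]        # least-squares slope

# Step 2: the factors to vary; each combination is a scenario.
ns = [20, 100]
sigmas = [0.5, 1.0]
m = 500                                  # replicates per scenario

rng = np.random.default_rng(1)
results = {}
for n, sigma in itertools.product(ns, sigmas):             # step 4
    est = np.array([run_experiment(n, sigma, rng) for _ in range(m)])
    results[(n, sigma)] = {"mean": est.mean(),              # step 5
                           "mc_se": est.std(ddof=1) / np.sqrt(m)}

for scenario, summary in results.items():                   # step 6 (crudely)
    print(scenario, summary)
```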
Various considerations
Since a simulation study is an experiment, we should use the same principles of design and analysis that we would recommend when advising a practitioner on setting up a scientific experiment.
These include efficiency, reporting of uncertainty, reproducibility and documentation.
In generating the data for a simulation study, we want to think about what structure real data would have that we want to mimic in the simulation study: distributional assumptions, parameter values, dependence structure, outliers, random effects, sample size ($n$), etc.
All of these may become input variables in a simulation study. Often we compare two or more statistical methods conditioning on the data context and then assess whether the differences between methods vary with the data context choices. E.g., if we compare an MLE to a robust estimator, which is better under a given set of choices about the data generating mechanism and how sensitive is the comparison to changing the features of the data generating mechanism? So the “treatment variable” is the choice of statistical method. We’re then interested in sensitivity to the conditions (different input values).
Often we can afford a large number of replicates ($m$), since each replicate is just a computation on the computer, so the simulation uncertainty in our estimates can usually be made quite small.
We might denote the data, which could be the statistical estimator under each of two methods, as $Y_{ij}$, where $i$ indexes the simulation replicate and $j$ indexes the method (with additional subscripts for any other input factors being varied).
One can think about choosing $m$ so that the simulation standard error, $\hat{\sigma}/\sqrt{m}$, is small relative to the differences one hopes to detect; a small pilot run can give a rough estimate of $\hat{\sigma}$.
When comparing methods, it’s best to use the same simulated datasets for each level of the treatment variable and to do an analysis that controls for the dataset (i.e., for the random numbers used), thereby removing some variability from the error term. A simple example is to do a paired analysis, where we look at differences between the outcome for two statistical methods, pairing based on the simulated dataset.
One can even use the "same" random number generation for the replicates under different conditions. E.g., in assessing sensitivity to a parameter value, one can generate each replicate's dataset under the different parameter values from the same underlying uniform random numbers, which induces positive correlation between the paired results and reduces the variance of the estimated differences between conditions (the method of common random numbers).
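As a sketch of this idea (my own toy example comparing the mean of $t$-distributed data with 3 vs. 10 degrees of freedom), one can generate the data under both conditions from the same underlying uniforms in each replicate and then analyze the paired differences:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(1)
m, n = 1000, 50

diffs = np.empty(m)
for i in range(m):
    u = rng.uniform(size=n)         # common random numbers for this replicate
    x3 = t.ppf(u, df=3)             # dataset under condition 1 (df = 3)
    x10 = t.ppf(u, df=10)           # same uniforms, condition 2 (df = 10)
    diffs[i] = x3.mean() - x10.mean()

# Paired analysis of the difference between conditions; the common uniforms
# induce positive correlation and so reduce the Monte Carlo variance.
print(diffs.mean(), diffs.std(ddof=1) / np.sqrt(m))
```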
Experimental Design (optional)
A typical context is that one wants to know the effect of multiple input variables on some outcome. Often scientists, and even statisticians doing simulation studies, will vary one input variable at a time. As we know from standard experimental design, this is inefficient.
The standard strategy is to discretize the inputs, each into a small number of levels. If we have a small enough number of inputs and of levels, we can do a full factorial design (potentially with replication). For example, if we have three inputs and three levels each, we have $3^3 = 27$ scenarios (treatment combinations).
As the number of inputs and/or levels increases to the point that we can’t carry out the full factorial, a fractional factorial is an option. This carefully chooses which treatment combinations to omit. The goal is to achieve balance across the levels in a way that allows us to estimate lower level effects (in particular main effects) but not all high-order interactions. What happens is that high-order interactions are aliased to (confounded with) lower-order effects. For example you might choose a fractional factorial design so that you can estimate main effects and two-way interactions but not higher-order interactions.
In interpreting the results, I suggest focusing on the decomposition of sums of squares and not on statistical significance. In most cases, we expect the inputs to have at least some effect on the outcome, so the null hypothesis is a straw man. Better to assess the magnitude of the impacts of the different inputs.
When one has a very large number of inputs, one can use the Latin hypercube approach to sample the input space in a uniform way, spreading the points out so that each input is sampled uniformly. Assume that each input has been rescaled to $\mathcal{U}(0,1)$. A Latin hypercube design with $m$ points divides the range of each input into $m$ equal intervals and places the points so that each interval of each input contains exactly one point, with the pairings across inputs randomized.
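If you want to generate such a design in Python, scipy provides a Latin hypercube sampler in scipy.stats.qmc (in reasonably recent scipy versions); the input ranges below are arbitrary examples:

```python
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=3, seed=1)   # three inputs, each on (0, 1)
unit_design = sampler.random(n=10)          # 10 design points

# Rescale each input from (0, 1) to its desired range (ranges are arbitrary here).
design = qmc.scale(unit_design, l_bounds=[10, 0.1, 0.0], u_bounds=[1000, 1.0, 0.9])
print(design)
```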
Even amongst statisticians, taking an experimental design approach to a simulation study is not particularly common, but it’s worth considering.
3. Implementation of simulation studies
Luke Miratrix (a UCB Stats PhD alum) has prepared a nice tutorial on carrying out a simulation study, including helpful R code. So if the discussion here is not concrete enough or you want to see how to effectively implement such a study, see simulation_tutorial_miratrix.pdf and the similarly named R code file.
Computational efficiency
Parallel processing is often helpful for simulation studies. The reason is that simulation studies are embarrassingly parallel: we can send each replicate to a different processor and then collect the results back, and the speedup should scale directly with the number of processors we use. Since we often need to do some sort of looping, writing code in C/C++ and compiling and linking to it from Python may also be a good strategy, albeit one not covered in this course.
A handy function in Python is itertools.product, which gives all combinations of a set of vectors (their Cartesian product):
import itertools

thetaLevels = ["low", "med", "hi"]
n = [10, 100, 1000]
tVsNorm = ["t", "norm"]
levels = list(itertools.product(thetaLevels, tVsNorm, n))
Analysis and reporting
Often results are reported simply in tables, but it can be helpful to think through whether a graphical representation is more informative (sometimes it's not, or is even worse, but in some cases it may be much better). Since you'll often have a variety of scenarios to display, trellis (faceted) plots are often a good way to show how results vary as a function of multiple inputs: in R, via the facet_wrap function in ggplot2; in Python, seaborn's faceting (e.g., FacetGrid or the col/row arguments of relplot/catplot) and plotly's facet arguments play a similar role.
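For example, in Python one option is seaborn's faceting; the results table below is a placeholder with randomly generated values standing in for the summaries you would compute from an actual simulation:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Placeholder results table: one row per scenario x method, with random
# numbers standing in for the summaries computed from a real simulation.
rng = np.random.default_rng(1)
results = pd.MultiIndex.from_product(
    [[10, 100, 1000], ["t", "norm"], ["mle", "robust"]],
    names=["n", "dist", "method"]).to_frame(index=False)
results["bias"] = rng.normal(scale=0.1, size=len(results))

g = sns.catplot(data=results, x="n", y="bias", hue="method",
                col="dist", kind="point")
g.set_axis_labels("sample size (n)", "estimated bias")
plt.show()
```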
You should set the seed when you start the experiment, so that it’s possible to replicate it. It’s also a good idea to save the current value of the seed whenever you save interim results, so that you can restart simulations (this is particularly helpful for MCMC) at the exact point you left off, including the random number sequence.
To enhance reproducibility, it's good practice to post your simulation code (and potentially simulated data) on GitHub, on your website, or as supplementary material with the journal. Another person should be able to fully reproduce your results, including the exact random number generation that you did (e.g., you should provide code for how you set the random seed for your random number generator).
Many journals are requiring increasingly detailed documentation of the code and data used in your work, including code and data for simulations. Here are the American Statistical Association’s requirements on documenting computations in its journals:
“The ASA strongly encourages authors to submit datasets, code, other programs, and/or extended appendices that are directly relevant to their submitted articles. These materials are valuable to users of the ASA’s journals and further the profession’s commitment to reproducible research. Whenever a dataset is used, its source should be fully documented and the data should be made available as on online supplement. Exceptions for reasons of security or confidentiality may be granted by the Editor. Whenever specific code has been used to implement or illustrate the results of a paper, that code should be made available if possible. [.…snip.…] Articles reporting results based on computation should provide enough information so that readers can evaluate the quality of the results. Such information includes estimated accuracy of results, as well as descriptions of pseudorandom-number generators, numerical algorithms, programming languages, and major software components used.”
4. Random number generation (RNG)
At the core of simulations is the ability to generate random numbers, and based on that, random variables. On a computer, our goal is to generate sequences of pseudo-random numbers that behave like random numbers but are replicable. The reason that replicability is important is so that we can reproduce the simulation.
Generating random uniforms on a computer
Generating a sequence of random standard uniforms is the basis for all generation of random variables, since random uniforms (either a single one or more than one) can be used to generate values from other distributions. Most random numbers on a computer are pseudo-random. The numbers are chosen from a deterministic stream of numbers that behave like random numbers but are actually a finite sequence (recall that both integers and real numbers on a computer are actually discrete and there are finitely many distinct values), so it’s actually possible to get repeats. The seed of a RNG is the place within that sequence where you start to use the pseudo-random numbers.
Sequential congruential generators
Many RNG methods are sequential congruential methods. The basic idea is that the next value is $x_i = f(x_{i-1}, \ldots, x_{i-k}) \bmod m$ for some function $f$ and positive integer modulus $m$ (often $k = 1$); the value returned to the user is $u_i = x_i / m \in [0, 1)$.
Given the construction, such sequences are periodic if the subsequence $(x_{i-1}, \ldots, x_{i-k})$ ever reappears, which is of course guaranteed because there is a finite number of possible subsequence values given that all the $x_i$ are integers in $\{0, 1, \ldots, m-1\}$.
An example of a sequential congruential method is a basic linear congruential generator, $x_i = (a\,x_{i-1} + c) \bmod m$, here with increment $c = 0$:
n = 100
a = 171
m = 30269

x = np.empty(n)
x[0] = 7306

for i in range(1, n):
    x[i] = (a * x[i-1]) % m

u = x / m

rng = np.random.default_rng(seed=1)
uFromNP = rng.uniform(size=n)

plt.figure(figsize=(10, 8))

plt.subplot(2, 2, 1)
plt.plot(range(1, n+1), u)
plt.title("manual")
plt.xlabel("Index"); plt.ylabel("Value")

plt.subplot(2, 2, 2)
plt.plot(range(1, n+1), uFromNP)
plt.title("numpy")
plt.xlabel("Index"); plt.ylabel("Value")

plt.subplot(2, 2, 3)
plt.hist(u, bins=25)
plt.xlabel("Value"); plt.ylabel("Frequency")

plt.subplot(2, 2, 4)
plt.hist(uFromNP, bins=25)
plt.xlabel("Value"); plt.ylabel("Frequency")

plt.tight_layout()
plt.show()
A wide variety of different RNGs have been proposed. Many have turned out to have substantial defects based on tests designed to assess whether the behavior of the RNG mimics true randomness. Some of the behavior we want to ensure is uniformity of each individual random deviate, independence of sequences of deviates, and multivariate uniformity of subsequences. One test that many RNGs don't perform well on is to assess the properties of $k$-tuples of consecutive deviates viewed as points in $k$-dimensional space: for simple congruential generators, these points fall on a limited number of parallel hyperplanes (a lattice) rather than filling the hypercube uniformly.
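A classic illustration is RANDU, an old LCG with $a = 65539$ and $m = 2^{31}$, whose consecutive triples fall on just 15 planes in the unit cube. A quick sketch (you may need to rotate the 3-d axes to see the planes edge-on):

```python
import numpy as np
import matplotlib.pyplot as plt

# RANDU: x_{i+1} = 65539 * x_i mod 2^31. Its consecutive triples
# (u_i, u_{i+1}, u_{i+2}) fall on only 15 planes in the unit cube.
n, a, m = 20000, 65539, 2**31
x = np.empty(n)
x[0] = 1
for i in range(1, n):
    x[i] = (a * x[i - 1]) % m
u = x / m

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(u[:-2], u[1:-1], u[2:], s=1)
# Rotate the axes interactively (or tweak view_init) to see the planes edge-on.
ax.view_init(elev=20, azim=-60)
plt.show()
```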
Combining generators can yield better generators. The Wichmann-Hill generator, an option in R, is a combination of three linear congruential generators with $a = (171, 172, 170)$ and $m = (30269, 30307, 30323)$; the uniform deviate is $u_i = (x_i/30269 + y_i/30307 + z_i/30323) \bmod 1$, where $x$, $y$, and $z$ are the states of the three component generators. Here we verify that we can reproduce R's Wichmann-Hill values by hand:
RNGkind("Wichmann-Hill")
set.seed(1)
<- .Random.seed
saveSeed <- runif(10)
uFromR <- c(171, 172, 170)
a <- c(30269, 30307, 30323)
m <- rep(0, 10)
u <- matrix(NA, nr = 10, nc = 3)
xyz 1, ] <- (a * saveSeed[2:4]) %% m
xyz[1] <- sum(xyz[1, ]/m) %% 1
u[for(i in 2:10) {
<- (a * xyz[i-1, ]) %% m
xyz[i, ] <- sum(xyz[i, ]/m) %% 1
u[i]
}for(i in 1:10)
print(c(uFromR[i], u[i]))
[1] 0.1297134 0.1297134
[1] 0.9822407 0.9822407
[1] 0.8267184 0.8267184
[1] 0.242355 0.242355
[1] 0.8568853 0.8568853
[1] 0.8408788 0.8408788
[1] 0.3421633 0.3421633
[1] 0.7062672 0.7062672
[1] 0.6212432 0.6212432
[1] 0.6537663 0.6537663
## we should be able to recover the current value of the seed
xyz[10, ]
[1] 24279 14851 10966
.Random.seed[2:4]
[1] 24279 14851 10966
PCG generators
Somewhat recently, O'Neill (2014) proposed a new approach to using the linear congruential generator in a way that gives much better performance than the basic versions of such generators described above. This approach is now the default random number generator in numpy (see numpy.random.default_rng()), called the PCG-64 generator. 'PCG' stands for permuted congruential generator and encompasses a family of such generators.
The idea of the PCG approach goes like this:
- Linear congruential generators (LCGs) are simple and fast, but for small values of the modulus $m$ they don't perform all that well statistically, in particular having values on a lattice as discussed above.
- Using a large value of $m$ can actually give good statistical performance.
- Applying a technique called permutation functions to the state of the LCG in order to produce the output at each step (the random value returned to the user) can improve the statistical performance even further.
In the PCG approach, the state is usually a 64- or 128-bit integer. Instead of the relatively small values of $m$ seen above, the modulus is correspondingly huge ($2^{64}$ or $2^{128}$), and only a subset of the bits of the state is used to produce each output value; the simplest choice would be to discard the lower-order bits, which are the least random.
O'Neill then goes further; instead of simply discarding bits, she proposes to either shift bits by a random amount or rotate bits by a random amount, where the random amount is determined by a small number of the initial bits. This improves the statistical performance of the generator. The choice of how to do this gives the various members of the PCG family of generators. The details are fairly complicated (the PCG paper is 50-odd pages) and not important for our purposes here.
Mersenne Twister
A commonly used generator (including in both R and Python) is the Mersenne Twister. It’s the default in R and the old default in numpy (see next section for what I mean by “old default”).
The Mersenne Twister has some theoretical support, has performed reasonably on standard tests of pseudorandom numbers, and has been used without evidence of serious failure. (But note that O'Neill criticizes it in her technical report.) Plus it's fast (because bitwise operations are fast). The particular Mersenne Twister used has a period of $2^{19937}-1$, and its state is a set of 624 32-bit integers plus a position within that set; the position is .Random.seed[2] in R and (I think) np.random.get_state()[2] in Python.
The Mersenne Twister is in the class of generalized feedback shift registers (GFSR). The basic idea of a GFSR is to come up with a deterministic generator of bits (i.e., a way to generate sequences of 0s and 1s), $B_i$, $i = 1, 2, 3, \ldots$, such that the bits behave as if they were independent random draws; blocks of bits are then converted into uniform deviates.
numpy provides access to the Mersenne Twister via the MT19937
generator; more on this below. It looks like PCG-64 only became available as of numpy version 1.17.
The period versus the number of unique values generated
The output of the PCG-64 is 64 bits while for the Mersenne Twister the output is 32 bits. The result is that the generators produce fewer unique values than their periods. This means you could get duplicated values in long runs, but this does not contradict the statement that the periods of PCG-64 and the Mersenne Twister are much longer than $2^{64}$ and $2^{32}$: the period is a property of the full internal state, which contains many more bits than a single output value.
The seed and the state
Setting the seed picks a position in the periodic sequence of the RNG, i.e., in the state of the RNG. The state can be a single number or something much more complicated. As mentioned above, the state for the Mersenne Twister is a set of 624 32-bit integers plus a position in the set. For the PCG-64 in numpy, the state is two numbers – the actual state and the increment (c
above). This means that when the user passes a single number as the seed, there needs to be a procedure that deterministically sets the state based on that single number seed. The details of this are not usually well-documented or viewable by the user.
Ideally, nearby seeds generally should not correspond to getting sequences from the RNG stream that are closer to each other than far-away seeds. According to Gentle (CS, p. 327), the input to set.seed() in R should be an integer in $\{0, \ldots, 1023\}$, with each of these values corresponding to widely separated positions in the RNG stream. I don't see this discussed in the R documentation for set.seed(), and furthermore, you can pass integers larger than 1023 to set.seed(), so I'm not sure how much to trust Gentle's claim. More on generating parallel streams of random numbers below.
When one invokes a RNG without a seed, RNG implementations generally have a method for choosing a seed (often based on the system clock). The numpy documentation says that it “mixes sources of entropy in a reproducible way” to do this.
Generators should give you the same sequence of random numbers, starting at a given seed, whether you ask for a bunch of numbers at once, or sequentially ask for individual numbers.
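For example (a quick check, not from the original notes):

```python
import numpy as np

rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)
all_at_once = rng1.normal(size=5)
one_at_a_time = np.array([rng2.normal() for _ in range(5)])
print(np.allclose(all_at_once, one_at_a_time))  # True
```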
Additional notes
There have been some attempts to generate truly random numbers based on physical randomness. One that is based on quantum physics is http://www.idquantique.com/true-random-number-generator/quantis-usb-pcie-pci.html. Another approach is based on lava lamps!
RNG in Python
Choosing a generator
In numpy, the default_rng RNG is PCG-64. It has a period of $2^{128}$ and supports advancing the state by an arbitrary number of steps (jumping ahead).
However, while the default is PCG-64, simply using the functions available via np.random
to generate random numbers in the fashion of older versions of numpy actually uses the Mersenne Twister.
I think that this text from help(np.random)
explains what is going on:
Legacy
------
For backwards compatibility with previous versions of numpy before 1.17, the
various aliases to the global `RandomState` methods are left alone and do not
use the new `Generator` API.
We can change to a specific RNG using syntax (the Generator
API) like this:
rng = np.random.Generator(np.random.MT19937(seed = 1))  # Mersenne Twister
rng = np.random.Generator(np.random.PCG64(seed = 1))    # PCG-64
but below note that there is a simpler way to change to the PCG-64.
Then to use that generator when doing operations that generate random numbers, we need to use methods accessed via the Generator
object (rng
here):
rng.normal(size = 3)          # Now generate based on the chosen generator.
## np.random.normal(size = 3) # This will NOT use the chosen generator.
In R, the default RNG is the Mersenne twister (?RNGkind
).
Using PCG64
To use the PCG-64, we need to explicitly create and make use of the Generator
object (rng
here), which is the new numpy approach to handling RNG.
We set the seed when setting up the generator via np.random.default_rng(seed)
(or np.random.Generator(np.random.PCG64(seed = 1))
).
rng = np.random.default_rng(seed = 1)
rng.normal(size = 5)
array([ 0.34558419, 0.82161814, 0.33043708, -1.30315723, 0.90535587])
rng = np.random.default_rng(seed = 1)
rng.normal(size = 5)
array([ 0.34558419, 0.82161814, 0.33043708, -1.30315723, 0.90535587])
saved_state = rng.bit_generator.state
rng.normal(size = 5)
array([ 0.44637457, -0.53695324, 0.5811181 , 0.3645724 , 0.2941325 ])
tmp = rng.choice(np.arange(1, 51), size=2000, replace=True)
rng.bit_generator.state = saved_state
rng.normal(size = 5)
array([ 0.44637457, -0.53695324, 0.5811181 , 0.3645724 , 0.2941325 ])
saved_state
{'bit_generator': 'PCG64', 'state': {'state': 216676376075457487203159048251690499413, 'inc': 194290289479364712180083596243593368443}, 'has_uint32': 0, 'uinteger': 0}
saved_state['state']['state']  # actual state
216676376075457487203159048251690499413
saved_state['state']['inc']  # increment ('c')
194290289479364712180083596243593368443
saved_state
contains the actual state and the value of c
, the increment.
Question: how many bits does saved_state['state']['state']
correspond to?
Using the Mersenne Twister (optional)
If we simply start using numpy or scipy to generate random numbers without explicitly creating a Generator object (e.g., via np.random.default_rng), we'll be using the Mersenne Twister. I believe this is what the documentation mentioned above means by "aliases to the global RandomState methods".
We get replicability by setting the seed to a specific value at the beginning of our simulation. We can then set the seed to that same value when we want to replicate the simulation.
np.random.seed(1)
np.random.normal(size = 5)
array([ 1.62434536, -0.61175641, -0.52817175, -1.07296862, 0.86540763])
np.random.seed(1)
np.random.normal(size = 5)
array([ 1.62434536, -0.61175641, -0.52817175, -1.07296862, 0.86540763])
We can also save the state of the RNG and pick up where we left off. So this code will pick up where you had left off, ignoring what happened in between saving to saved_state and resetting.
np.random.seed(1)
np.random.normal(size = 5)
array([ 1.62434536, -0.61175641, -0.52817175, -1.07296862, 0.86540763])
saved_state = np.random.get_state()
np.random.normal(size = 5)
array([-2.3015387 , 1.74481176, -0.7612069 , 0.3190391 , -0.24937038])
Now we’ll do some arbitrary work with random numbers, and see that if we use the saved state we can pick up where we left off above.
tmp = np.random.choice(np.arange(1, 51), size=2000, replace=True)  # arbitrary work

## Restore the state.
np.random.set_state(saved_state)
np.random.normal(size = 5)
array([-2.3015387 , 1.74481176, -0.7612069 , 0.3190391 , -0.24937038])
If we look at saved_state
, we can confirm it actually corresponds to the Mersenne Twister.
RNG in parallel
We can generally rely on the RNG in Python and R to give a reasonable set of pseudo-random values. One time when we want to think harder is when doing work with RNGs in parallel on multiple processors. The worst thing that could happen is that one sets things up in such a way that every process is using the same sequence of random numbers. This could happen if you mistakenly set the same seed in each process, e.g., using np.random.seed(1)
on every process. Numpy now provides some nice functionality for parallel RNG, with more details given in the SCF parallelization tutorial.
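For example, numpy's SeedSequence can spawn child seeds that give statistically independent streams, one per worker (a minimal sketch; see the numpy documentation on parallel random number generation for the full story):

```python
import numpy as np

# One parent seed; spawn one child SeedSequence per worker/process.
parent = np.random.SeedSequence(20240101)
children = parent.spawn(4)
rngs = [np.random.default_rng(child) for child in children]

# Each worker would use its own generator; the spawned streams are designed
# to be statistically independent of one another.
print([rng.normal() for rng in rngs])
```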
5. Generating random variables
There are a variety of methods for generating from common distributions (normal, gamma, beta, Poisson, t, etc.). Since these tend to be built into Python and R and presumably use good algorithms, we won’t go into them. A variety of statistical computing and Monte Carlo books describe the various methods. Many are built on the relationships between different distributions - e.g., a beta random variable (RV) can be generated from two gamma RVs.
Multivariate distributions
The mvtnorm package in R supplies code for working with the density and CDF of multivariate normal and t distributions; in Python, scipy.stats provides multivariate_normal and multivariate_t.
To generate a multivariate normal, in Unit 10, we’ll see the standard method based on the Cholesky decomposition:
L = np.linalg.cholesky(covMat)  # L is lower-triangular
x = L @ np.random.normal(size = covMat.shape[0])
Side note: for a singular covariance matrix we can use the Cholesky with pivoting, setting as many rows to zero as the rank deficiency. Then when we generate the multivariate normals, they respect the constraints implicit in the rank deficiency. However, you’ll need to reorder the resulting vector because of the reordering involved in the pivoted Cholesky.
Inverse CDF
Most of you know the inverse CDF method. To generate $X \sim F$, where $F$ is an invertible CDF, we generate $U \sim \mathcal{U}(0,1)$ and set $X = F^{-1}(U)$. This works because $P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)$.
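For example, for an exponential distribution with rate $\lambda$ we have $F^{-1}(u) = -\log(1-u)/\lambda$, so a quick sketch is:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0                       # rate of the target exponential distribution
u = rng.uniform(size=100_000)
x = -np.log(1 - u) / lam        # X = F^{-1}(U) for the Exponential(rate=lam) CDF

print(x.mean(), 1 / lam)        # sample mean should be close to 1/lam
```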
Rejection sampling
The basic idea of rejection sampling (RS) relies on the introduction of an auxiliary variable, $u$: the target density $f(y)$ is the marginal density of $y$ when $(y, u)$ is uniformly distributed over the region $\{(y, u): 0 < u < f(y)\}$ under the density curve.
To implement this we draw from a larger set and then only keep draws for which $u < f(y)$:
- generate $y$ from a proposal density $g$ that we can easily sample from and for which $c\,g(y) \geq f(y)$ for all $y$ (i.e., $c\,g$ envelopes, or majorizes, $f$);
- generate $u \sim \mathcal{U}(0, c\,g(y))$;
- if $u \leq f(y)$ then use $y$; otherwise go back to step 1.
The intuition here is graphical: we generate points uniformly under a curve, $c\,g(y)$, that is always above $f(y)$, and keep only those that also fall under $f(y)$.
If $f$ and $g$ are normalized densities, the overall acceptance probability is $1/c$, so we want the envelope to be as tight as possible (i.e., $c$ close to one); a loose envelope wastes most of the proposals.
One example of RS is to sample from a truncated normal. Of course we can just sample from the normal and then reject draws outside the truncation region, but this can be inefficient, particularly if the truncation is far in the tail (a case in which the inverse CDF approach suffers from numerical difficulties). Suppose the truncation point is greater than zero. Working with the standardized version of the normal, you can use a translated exponential with lower end point equal to the truncation point as the majorizing density (Robert 1995; Statistics and Computing). For truncation less than zero, just make the values negative.
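Here is a sketch of such a sampler for the tail region $[a, \infty)$ of a standard normal, using a translated exponential proposal. For simplicity I use rate $\lambda = a$ rather than the optimal rate derived by Robert (1995), which keeps the acceptance probability simple ($e^{-(y-a)^2/2}$):

```python
import numpy as np

def rtruncnorm_tail(a, size, rng):
    """Standard normal truncated to [a, inf), a > 0, via rejection sampling
    with a translated exponential envelope (rate lambda = a for simplicity;
    Robert (1995) gives the optimal rate)."""
    out = np.empty(size)
    n_accepted = 0
    while n_accepted < size:
        y = a + rng.exponential(scale=1.0 / a)          # proposal draw on [a, inf)
        if rng.uniform() <= np.exp(-0.5 * (y - a)**2):  # accept w.p. exp(-(y-a)^2/2)
            out[n_accepted] = y
            n_accepted += 1
    return out

rng = np.random.default_rng(1)
draws = rtruncnorm_tail(a=3.0, size=10_000, rng=rng)
print(draws.min(), draws.mean())   # all draws >= 3; mean a bit above 3
```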
Adaptive rejection sampling (optional)
The difficulty of RS is finding a good enveloping function. Adaptive rejection sampling refines the envelope as the draws occur, in the case of a continuous, differentiable, log-concave density. The basic idea considers the log of the density and involves using tangents or secants to define an upper envelope and secants to define a lower envelope for a set of points in the support of the distribution. The result is that we have piecewise exponentials (since we are exponentiating from straight lines on the log scale) as the bounds. We can sample from the upper envelope based on sampling from a discrete distribution and then the appropriate exponential. The lower envelope is used for squeezing. We add points to the set that defines the envelopes whenever we accept a point that requires us to evaluate $f(x)$ itself.
Importance sampling
Importance sampling (IS) allows us to estimate expected values. It's an extension of the simple Monte Carlo sampling we saw at the beginning of the unit, with some commonalities with rejection sampling. Importance sampling comes up in a wide variety of contexts, so we'll try to develop some intuition for how it works and how to reduce variance when using IS. Suppose we want $\phi = E_f(h(X))$. Writing $\phi = \int h(x)\,\frac{f(x)}{g(x)}\,g(x)\,dx$, we can sample $x_1, \ldots, x_m \sim g$ instead of $f$ and use $\hat{\phi} = \frac{1}{m}\sum_{i=1}^{m} h(x_i)\,w_i$, where the weights $w_i = f(x_i)/g(x_i)$ correct for sampling from $g$ rather than $f$.
Here we don't require the majorizing property, just that the densities have common support, but things can be badly behaved if we sample from a density with lighter tails than the density of interest, because a few very large weights can then dominate the estimator. So in general we want $g$ to have heavier tails than $f$.
This suggests we can reduce variance in an IS context by oversampling $x$ values for which $|h(x)|\,f(x)$ is large and undersampling those for which it is small, since the variance of the estimator is driven by $E_g\big((h(X)\,f(X)/g(X))^2\big)$; the weights then correct for the over- and undersampling.
What if we actually want a sample from $f$, rather than just an estimate of $\phi$? We can resample (with replacement) from the draws $x_1, \ldots, x_m$ with probabilities proportional to the weights $w_i$; this is known as sampling importance resampling (SIR).
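As a sketch, consider estimating the tail probability $\phi = P(X > 4)$ for $X \sim \mathcal{N}(0,1)$ (an arbitrary example). Naive Monte Carlo essentially never samples the tail, while IS with a proposal concentrated in the tail does much better:

```python
import numpy as np
from scipy.stats import norm, expon

rng = np.random.default_rng(1)
m = 100_000
truth = norm.sf(4)                       # phi = P(X > 4) for X ~ N(0, 1)

# Naive MC: essentially no draws land beyond 4.
x = rng.normal(size=m)
naive = np.mean(x > 4)

# IS with a proposal concentrated in the tail: g = 4 + Exponential(1).
y = 4 + rng.exponential(size=m)
w = norm.pdf(y) / expon.pdf(y, loc=4)    # weights f(y) / g(y)
h = (y > 4).astype(float)                # indicator (always 1 here, kept explicit)
is_est = np.mean(h * w)
is_se = np.std(h * w, ddof=1) / np.sqrt(m)

print(truth, naive, is_est, is_se)
```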
Ratio of uniforms (optional)
If $(U, V)$ is uniformly distributed over the set $C = \{(u, v): 0 \leq u \leq \sqrt{f^{*}(v/u)}\}$, where $f^{*}$ is proportional to the density of interest, then $X = V/U$ has density proportional to $f^{*}$. In practice one encloses $C$ in a bounding rectangle, draws uniformly from the rectangle, keeps only the draws that fall in $C$ (a form of rejection sampling), and returns $v/u$ for the accepted pairs.
One can also consider truncating the rectangular region, depending on the features of $f^{*}$, to reduce the rejection rate.
Monahan recommends the ratio of uniforms, particularly a version for discrete distributions (p. 323 of the 2nd edition).