This notebook combines my own homework with Duncan's answers and the standard solutions.
Before we move on to any real data, let us take a step back and look at a couple ways to solve for parameters from a distribution.
First let us draw 1000 samples from a Gaussian distribution and look at a histogram of the result.
As a reminder, a Gaussian distribution for parameter $x$ with mean $\mu$ and standard deviation $\sigma$ is given by
\begin{equation} p(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \end{equation}
import numpy as np
import matplotlib.pyplot as plt
ndraws = 1000
mu = 10.
sigma = 1.
data = np.zeros(ndraws)
for i in range(ndraws):
    data[i] = np.random.normal(mu,sigma)
plt.hist(data,bins=30)
plt.xlabel('Amplitude',size=15)
plt.ylabel('Bin Count',size=15)
plt.axvline(mu,color='k',linestyle='--')
plt.axvline(mu-sigma,color='k',linestyle='--')
plt.axvline(mu+sigma,color='k',linestyle='--')
plt.show()
Now let us suppose we only have the data vector defined above, and want to find the mean value of that data. How do we go about that?
The first step is always to craft a data model. Let us switch to an observational standpoint. We start by assuming that the data have a single true value $\mu'$ whose observed value has been obscured by some underlying Gaussian noise, i.e. $x = \mu' + n$, where $x$ is the data and $n$ is some Gaussian noise. We will generalize this to the following equation, which ought to be baked into your memory,
\begin{equation} d = s + n \end{equation}
where $d$ is the observed data, $s$ is the true underlying signal, and $n$ is again a noise term. This formalism is the basis of our component separation methods and will be expanded upon further down the road.
Now that we have a simple data model, how does one go about actually determining the true underlying signal $s$? A good starting point is to find what is called the maximum-likelihood solution, using maximum likelihood estimation (MLE). A cursory Google search for maximum likelihood estimation will turn up a mix of math-heavy resources (like Wikipedia) and some practical examples (like towardsdatascience).
Here we will stick with a Gaussian assumption, i.e. that the noise inherent in the data has a Gaussian form.
For any observed point $x_i$, the probability of observing that data point, given the underlying mean ($\mu$) and standard deviation ($\sigma$) of the distribution is given by
$p(x_i|\mu,\,\sigma) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$,
and the probability of observing say 3 data points $x_1$, $x_2$, and $x_3$ is given by
$p(x_1,x_2,x_3|\mu,\,\sigma) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x_1-\mu)^2}{2\sigma^2}}\times\frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x_2-\mu)^2}{2\sigma^2}}\times\frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x_3-\mu)^2}{2\sigma^2}}$.
Let us take all of our observations into account and condense our probability expression (the likelihood) to be
\begin{equation} p(\vec{x}|\mu,\,\sigma) = {\displaystyle \prod_{i=1}^{1000}}\frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} \end{equation}
From our calculus experience, we may note that a sure-fire way to determine the maximum or minimum of a function is to take its derivative. However, we can tell by looking at the above equation that taking the derivative of this thing will be a cumbersome operation.
Those versed in MLE will be quick to point out that taking the natural logarithm of the above expression gives us something nice to work with!
To be explicit, finding the maximum likelihood solution involves finding $\mu$ where
\begin{equation} \frac{\partial \ln (p(\vec{x}|\mu,\,\sigma))}{\partial \mu} = 0. \end{equation}
Before we move on, let us touch on a couple of final pieces of information which will be important for the future. First, let's consider the expression we are working with here, $p(\vec{x}|\mu,\,\sigma)$. It reads "the probability of the data given the parameters $\mu$ and $\sigma$". Since we already know the data, and don't know the values of $\mu$ and $\sigma$ a priori, it appears that this expression by itself isn't very useful. However, the likelihood of having $\mu$ and $\sigma$ given some data $\vec{x}$ is defined as
\begin{equation} L(\mu,\,\sigma|\vec{x}) = p(\vec{x}|\mu,\,\sigma). \end{equation}
Though the two sides are equal, they ask different questions. The left-hand side is concerned with the likelihood of the parameters given the data, while the right-hand side asks for the probability of observing the data, given the parameters. Hence the phrasing Maximum Likelihood Solution.
Finally, as we will see often later on, let us clearly define the log-likelihood as
\begin{equation} \mathcal{L}(\mu,\,\sigma|\vec{x}) = \ln\,p(\vec{x}|\mu,\,\sigma) = -\frac{N}{2}\ln\left(2\pi\sigma^2\right) - \sum_{i=1}^{N}\frac{(x_i-\mu)^2}{2\sigma^2}, \end{equation}
where $N=1000$ (hereafter). Compiled with help from towardsdatascience.com.
Setting $\partial\mathcal{L}/\partial\mu = 0$, and since $\sigma$ is a non-zero constant, the maximum-likelihood estimate of the mean of the data can be expressed as
$$ \mu = \frac{1}{N}\sum_{i=1}^N x_i. $$
Create a function which takes in the data and a noise ($\sigma$) estimate and returns the log-likelihood as a function of the mean. Plot the result.
Create a function which solves for the ML solution given the data, and compare your answer to the value of $\mu$ given in the first cell.
def return_log_likelihood(data,sigma):
    samples = np.shape(data)[0]
    # Make a grid of uniform possible mean values
    mu_grid = np.linspace(np.min(data),np.max(data),1000)
    # Calculate the log-likelihood given the data for each mu in mu_grid
    lnL = np.zeros_like(mu_grid)
    lnL -= np.array([ np.sum((data - mu) ** 2) for mu in mu_grid ]) / (2*sigma**2)
    lnL -= np.log(2 * np.pi * sigma**2) * samples / 2
    return mu_grid,lnL
# Get the values
x, lnL = return_log_likelihood(data,sigma)
# Plot
plt.plot(x,lnL)
plt.xlabel(r'$\mu$',size=15)
plt.ylabel(r'$\mathcal{L}(\mu)$',size=15)
plt.show()
def solve_for_ML(data):
    samples = np.shape(data)[0]
    amplitude = np.sum(data)/samples
    return amplitude
print(solve_for_ML(data))
10.016836442176967
As we will see further on, determining the mean of an observable can be aided by (or in practice requires) the addition of multiple data points. As we move towards a more realistic example, we will say that we have observations at multiple frequencies. For a simple example, let's consider some observable which scales linearly with frequency.
Let's create a new data set, where our observable has a mean value $a_{obs} = \mu$, and each observation frequency has its own noise characteristic $n_{\nu}$.
When carrying out this multi-frequency analysis, we need to determine a reference frequency $\nu_{\rm ref}$, at which we evaluate the recovered amplitude $a_{obs}$.
Therefore, our new data model will be written as
$d_{\nu} = a_{obs}(\nu/\nu_{\rm ref}) + n_{\nu}$.
# New variables
nfreq = 5
nus = np.asarray([10., 20., 30., 40., 50.])
sigmas = np.asarray([1., 1.5, 1.25, 2., 0.75])
# Remake our data array
data = np.empty((ndraws,nfreq))
# Reference frequency
nu_ref = 20.
# Define a frequency scaling function for the data
def gen_spectrum(nu, nu_ref):
    return (nu/nu_ref)
# Simulate some data
for i in range(ndraws):
    for j in range(nfreq):
        data[i][j] = mu*gen_spectrum(nus[j],nu_ref) + np.random.normal(0.0,sigmas[j])
What does this data look like for us?
data_mean = np.empty(nfreq)
data_error = np.empty(nfreq)
for j in range(nfreq):
    data_mean[j] = np.mean(data.T[j])
    data_error[j] = np.std(data.T[j])
plt.errorbar(nus,data_mean,yerr=data_error,fmt='.',color='k')
plt.xlabel('"Frequency"', size=15)
plt.ylabel('Amplitude', size=15)
plt.show()
We can see that it would be simple to draw a straight line through the points, though many straight lines may fit through all of the error bars. It is vital that we make an estimate of the best fit to the data, not just a fit.
Generalize the return_log_likelihood function to work for the multi-frequency data and plot $\mathcal{L}(a_{obs}|\boldsymbol d)$ as a function of $a_{obs}$. Then generalize the solve_for_ML function to work for the multi-frequency data and find the ML value of $a_{obs}$.
The likelihood for the multi-frequency data is sharper, which indicates a more precise measurement of $a_{obs}$.
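As a short, added justification of that last remark (using the Gaussian log-likelihoods defined above): the curvature of the log-likelihood sets the width of its peak, so the variance of the ML estimate in the two cases is
\begin{equation} \mathrm{Var}(\hat{\mu}) = \frac{\sigma^2}{N} \quad\text{(single data set)}, \qquad \mathrm{Var}(\hat{a}_{obs}) = \frac{1}{N}\Biggl[\sum_{\nu}\frac{(\nu/\nu_{\rm ref})^2}{\sigma_{\nu}^2}\Biggr]^{-1} \quad\text{(multi-frequency)}, \end{equation}
and summing the inverse-variance weights over the five bands makes the second expression smaller here, i.e. the likelihood narrower.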
# Task 2: Multi-frequency generalization @Jinyi
def return_log_likelihood(nus, data, sigma, spectrum):
    samples, nfreq = data.shape
    # Make a grid of uniform possible mean values
    mu_grid = np.linspace(np.min(data), np.max(data), 1000)
    # Calculate the log-likelihood given the data for each mu in mu_grid
    lnL = np.zeros_like(mu_grid)
    for i, mu in enumerate(mu_grid):
        chisq = (data - mu*np.ones_like(data)*spectrum(nus,nu_ref))**2*sigma**-2
        lnL[i] -= np.sum(chisq) / 2
        lnL[i] -= np.sum(np.log(2*np.pi*sigma**2)) * samples / 2
    return mu_grid, lnL
# Get the values
x, lnL = return_log_likelihood(nus, data, sigmas, gen_spectrum)
# Plot
plt.plot(x,lnL)
plt.xlabel(r'$\mu$',size=15)
plt.ylabel(r'$\mathcal{L}(\mu)$',size=15)
plt.show()
def solve_for_ML(nus, data, sigma, spectrum):
    x, lnL = return_log_likelihood(nus, data, sigma, spectrum)
    amplitude = x[lnL==lnL.max()]
    return amplitude
print(solve_for_ML(nus, data, sigmas, gen_spectrum))
[10.00078908]
Up until now we have taken a very simple example, where we have a data vector containing samples, and we find the ML solution. Before we move to drawing samples from noisy data, we will define some more general nomenclature. Our original data model looked something like
\begin{equation} d = s + n. \end{equation}
We then expanded the actual signal into an amplitude and a frequency scaling
\begin{equation} s = a_{obs}(\nu/\nu_{\rm ref}). \end{equation}
This expansion of the signal term can be generalized in the following way
\begin{equation} s = Ta, \end{equation}
where $T$ is a matrix containing the scaling relation information (in this simple case $T=(\nu/\nu_{\rm ref})$), and $a$ is a vector corresponding to the amplitudes we are looking for, $a=a_{obs}$.
In essence, by finding the ML solution we are minimizing $\chi^2 = (d-Ta)^t N^{-1}(d-Ta)$, which is equivalent to solving the equation
\begin{equation} (T^t N^{-1} T)a = T^t N^{-1} d, \end{equation}
where $N$ is the noise covariance matrix (so $N^{-1}$ is its inverse).
To be explicit, through the remainder of this notebook we will be looking at multi-frequency observations (vital in cosmological component separation) and aim to solve for the true sky signal $s_{\nu}$ with the data model
\begin{equation} d_{\nu} = s_{\nu} + n_{\nu}, \end{equation}
by solving the linear equation
\begin{equation} {\displaystyle \sum_{\nu}(T_{\nu}^t N_{\nu}^{-1} T_{\nu})a = \sum_{\nu}T_{\nu}^t N_{\nu}^{-1} d_{\nu}}. \end{equation}
This equation will be expanded upon to include sampling and prior terms throughout this notebook.
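As a minimal sketch of this (added here; it reuses the nus, sigmas, nu_ref, and multi-frequency data variables from the cells above), the single amplitude can be read off directly from the normal equation:
# Sketch: solve (sum_nu T^t N^-1 T) a = sum_nu T^t N^-1 d for one amplitude a.
# T is the per-band frequency scaling and N is diagonal, so the sums reduce to dot products.
T = nus / nu_ref                               # scaling "matrix" (a single column here)
Ninv = 1.0 / sigmas**2                         # inverse of the diagonal noise covariance
lhs = np.sum(T**2 * Ninv)                      # T^t N^-1 T accumulated over bands
rhs = np.sum(T * Ninv * data.mean(axis=0))     # T^t N^-1 d, using the per-band mean of the draws
print(rhs / lhs)                               # should land close to the input amplitude mu = 10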
Note that when we move to more complex systems (many data sets with many pixels), filling and multiplying these matrices can be very expensive both computationally and in memory usage. An interested student should read "An Introduction to the Conjugate Gradient Method without the Agonizing Pain" by Jonathan Shewchuk. It is excellent. Additionally, Numerical Recipes is always your friend when trying to solve complex linear systems computationally.
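For illustration only (this sketch is not part of the original notebook, and it uses SciPy, which is not used elsewhere here): a minimal conjugate-gradient solve of a symmetric positive-definite system, with a random matrix standing in for $T^t N^{-1} T$.
# Sketch: conjugate-gradient solution of A x = b (A must be symmetric positive-definite).
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
M = rng.normal(size=(50, 50))
A = M @ M.T + 50 * np.eye(50)   # build a symmetric positive-definite stand-in matrix
b = rng.normal(size=50)
x, info = cg(A, b)              # info == 0 means the solver converged
print(info, np.max(np.abs(A @ x - b)))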
Before getting too far into the weeds here, we will introduce Bayes' Theorem as well as a practical definition for our purposes. Given a set of parameters $\vec{\omega} = \{\omega_1,...,\omega_k\}$, we will define the posterior distribution of $\boldsymbol \omega$ with respect to the data as
\begin{equation} P(\vec{\omega}\mid\boldsymbol{d}) = \frac{P(\boldsymbol{d} \mid \vec{\omega}) P(\vec{\omega})}{P(\boldsymbol{d})} \propto \mathcal{L}(\vec{\omega}) P(\vec{\omega}), \end{equation}
where $ P(\boldsymbol{d}\mid\vec{\omega}) \equiv \mathcal{L}(\vec{\omega})$ is the likelihood function (as we have talked about above), $P(\vec{\omega})$ is the prior, and $P(\boldsymbol{d})$ is a normalization factor which we disregard here as it is independent of the parameters $\vec{\omega}$.
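As a small worked example (added here, using the Gaussian setting of Chapter 1): if we place a Gaussian prior $\mu \sim \mathcal{N}(\mu_0,\tau^2)$ on the mean, the posterior is again Gaussian,
\begin{equation} P(\mu\mid\vec{x}) \propto \exp\left[-\sum_{i=1}^{N}\frac{(x_i-\mu)^2}{2\sigma^2} - \frac{(\mu-\mu_0)^2}{2\tau^2}\right] \;\Longrightarrow\; \mu\mid\vec{x} \sim \mathcal{N}\left(\frac{N\bar{x}/\sigma^2 + \mu_0/\tau^2}{N/\sigma^2 + 1/\tau^2},\; \Bigl(\frac{N}{\sigma^2}+\frac{1}{\tau^2}\Bigr)^{-1}\right), \end{equation}
so the prior simply contributes an extra inverse-variance term, in the same way the $S^{-1}$ prior terms will enter the amplitude sampling equation later on.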
In Chapter 1 we simply drew Gaussian samples of the data using numpy, having already known the mean and standard deviation of our data. In map-space analysis, we work with sky maps that are characterized by a total sky signal amplitude $s_{\nu}$, with an estimate of the noise $n_{\nu}$, giving us our data
\begin{equation} d_{\nu} = s_{\nu} + n_{\nu}. \end{equation}
This has been stated before and will be stated again in hopes of making you see the above equation when you close your eyes. We now wish to sample for $s_{\nu}$, the true sky signal, seeing as we know the data $d_{\nu}$ and have an estimate of the noise $n_{\nu}$.
Here we introduce three methods which we can use for sampling parameters. The Metropolis and Metropolis-Hastings methods are broadly used in MCMC and are introduced first; they are usable in the general sense. Inversion sampling is also introduced, which is used in the Commander/Cosmoglobe framework for sampling non-linear parameters such as spectral indices. Finally, sampling directly from a Gaussian distribution is introduced. This is a core method to learn for sampling linear parameters and will be expanded upon in the next chapter to the multivariate case. In the Cosmoglobe/Commander framework this method is used to sample the Galactic foreground amplitudes.
The Metropolis method is by far the most well-known of all of the MCMC sampling methods due to its ease of implementation, tunability, and effectiveness. The Metropolis and Metropolis-Hastings algorithms are effective because all samples with a higher likelihood are accepted, while some samples with lower likelihoods will be accepted from time to time (this can be helpful to avoid becoming stuck around a local likelihood peak).
Let us say we wish to sample some parameter $\omega$. Let $\omega_j$ represent the $j$th sample of $\omega$ in a Markov chain. We also need to define a stochastic proposal probability distribution, $T(\omega_{j+1}\mid \omega_{j})$, which we tune in order to help with parameter convergence. In the Metropolis method, the proposal distribution is symmetric, i.e., $T(\omega_{j+1}\mid \omega_{j})=T(\omega_{j}\mid \omega_{j+1})$. In the Metropolis-Hastings method, this proposal distribution need not be symmetric.
The next thing we need to define is the acceptance criterion, which tells us which samples we will accept and which ones we will not. This acceptance criterion is defined by the ratio of the densities, which is given by the ratio of the posterior probabilities as defined by Bayes' theorem ($\frac{P(\boldsymbol d \mid \omega_{j+1})}{P(\boldsymbol d \mid \omega_{j})}$). If a prior is used to inform our sampling, this ratio is weighted by the ratio of the prior probabilities. Therefore the acceptance probability $q$ is defined as
\begin{equation} q = \frac{P(\boldsymbol d \mid \omega_{j+1})}{P(\boldsymbol d \mid \omega_{j})}\frac{P(\omega_{j+1})}{P(\omega_j)}. \end{equation}
With these defined, we can now outline the algorithm itself:
1. Starting from the current sample $\omega_j$, draw a proposal $\omega_{j+1}$ from the proposal distribution $T(\omega_{j+1}\mid\omega_j)$.
2. Compute the acceptance probability $q$ defined above.
3. Draw a uniform random number $u \in [0,1]$; if $q \geq u$, accept $\omega_{j+1}$ as the next sample, otherwise reject it (the implementation below simply proposes again until a sample is accepted).
4. Repeat until the chain contains the desired number of samples.
Now that we have sampled the two amplitude parameters, we want to sample spectral parameters, which do not depend linearly on the data $d$, and are not Gaussian distributions, hence demand a different approach. To sample these two conditional distributions, we employ the "inversion sampler", which is completely general and works for all univariate distributions. In our case, we have the distribution \begin{align} P(\beta\mid d, a) &\propto P(d\mid a, \beta) P(\beta)\\ &\propto \left[\prod_{\nu} \mathrm \exp\left(-\frac{1}{2}\left(d_{\nu}-T(\beta)a\right)^t N_{\nu}^{-1}\left(d_{\nu}-T(\beta)a\right)\right)\right]P(\beta). \end{align}
Using this for $P(x)$ we can sample with the inversion sampler by:
1. Computing $P(x)$ over a grid in $x$.
2. Computing the cumulative distribution $F(x)$ as the normalized cumulative sum of $P(x)$.
3. Drawing a uniform random number $\eta\in[0,1]$.
4. Solving $F(x)=\eta$ for $x$ (e.g. by interpolation); the resulting $x$ is a sample from $P(x)$.
We may write $d = Ta+n = f(\omega)\cdot A + n$, where $\omega$ is the set of spectral indices (such as $\beta_{\rm s}$ for synchrotron or $T_{\rm d}$ for dust). If we rewrite this and assume that the noise $n = d - Ta$ is Gaussian, we can express the distribution as \begin{align} P(a\mid d, \omega\setminus a) &\propto P(d\mid\omega)P(a)\\ &\propto P(d\mid a)P(a)\\ &\propto \left[\prod_{\nu} \exp\left(-\frac{1}{2}(d_{\nu}-T_{\nu}a)^t N_{\nu}^{-1}(d_{\nu}-T_{\nu}a)\right)\right]\cdot P(a) \end{align} which can be sampled by solving for $a$ in $$\biggl(S^{-1} + \sum_{\nu}T^t_{\nu}N_{\nu}^{-1}T_{\nu}\biggr)\,a = \sum_\nu T_{\nu}^t N_\nu^{-1}d_{\nu}+\sum_\nu T_{\nu}^t N_\nu^{-1/2}m_{\nu} + \sum_{\nu}T_{\nu}^tN_{\nu}^{-1/2}\eta_{\nu} + S^{-1/2}\eta_{0},$$ where $S$ is the prior covariance, $N$ is the noise covariance, $m$ is the prior mean, and the $\eta$ are standard normal random draws. For a full explanation, see BeyondPlanck 1, appendix 2.
Note here that the simplified case (no prior)
\begin{equation} \biggl(\sum_{\nu}T^t_{\nu}N_{\nu}^{-1}T_{\nu}\biggr)\,a = \sum_\nu T_{\nu}^t N_\nu^{-1}d_{\nu} + \sum_{\nu}T_{\nu}^tN_{\nu}^{-1/2}\eta_{\nu}, \end{equation}
is the same as the equation at the end of the previous section, with an additional sampling term.
We can also think of the above equation as
\begin{equation} a = \biggl(\sum_{\nu}T^t_{\nu}N_{\nu}^{-1}T_{\nu}\biggr)^{-1}\sum_\nu T_{\nu}^t N_\nu^{-1}d_{\nu} + \biggl(\sum_{\nu}T^t_{\nu}N_{\nu}^{-1}T_{\nu}\biggr)^{-1}\sum_{\nu}T_{\nu}^tN_{\nu}^{-1/2}\eta_{\nu}, \end{equation}
which is convenient in the simplest cases.
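As an added note on the structure of this expression: since the $\eta_{\nu}$ are independent standard normal draws, the first term is just the ML solution and the fluctuation term has exactly the ML covariance,
\begin{equation} \langle a \rangle = \biggl(\sum_{\nu}T^t_{\nu}N_{\nu}^{-1}T_{\nu}\biggr)^{-1}\sum_\nu T_{\nu}^t N_\nu^{-1}d_{\nu} = a_{\rm ML}, \qquad \mathrm{Cov}(a) = \biggl(\sum_{\nu}T^t_{\nu}N_{\nu}^{-1}T_{\nu}\biggr)^{-1}, \end{equation}
so repeated draws of $a$ scatter about the ML value with the posterior width rather than collapsing onto a single best-fit number.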
Update the return_log_likelihood and solve_for_ML functions accordingly (Hint: for our new version of solve_for_ML, let's solve the equation $\biggl(\sum_{\nu}T^t_{\nu}N_{\nu}^{-1}T_{\nu}\biggr)\,a = \sum_\nu T_{\nu}^t N_\nu^{-1}d_{\nu}$).
Some notes for the code. There are $m$ frequencies, so
$$ P(\boldsymbol{d}|a,\beta)=\frac{1}{(2\pi)^{m/2}|N|^{1/2}}\exp\left[-\frac{1}{2}(\boldsymbol{d}-Ta)^tN^{-1}(\boldsymbol{d}-Ta) \right] $$
$$ \mathcal{L}(\boldsymbol{d}|a,\beta)=-\frac{1}{2}\ln (2\pi)^m |N|-\frac{1}{2}(\boldsymbol{d}-Ta)^tN^{-1}(\boldsymbol{d}-Ta). $$
Since $T$ and $N$ are both diagonal here, I treat them as vectors in later calculations.
nbands = 4
newamp = 7.5
frequencies = np.asarray([10.,40.,70.,100.])
ref_freq = 40.
noise = np.asarray([0.75, 1.25, 2.0, 0.5])
data = newamp*gen_spectrum(frequencies,ref_freq)+np.random.normal(0.0,noise)
spectrum = lambda freq: freq/ref_freq
def eval_loglikelihood(data,noise,spectrum,amplitude,frequencies):
    nbands = len(frequencies)
    T = spectrum(frequencies)
    N = noise ** 2
    lnL = -0.5*np.log((2*np.pi)**nbands*np.prod(N)) - 0.5*np.sum((data - T*amplitude)**2*N**-1)
    return lnL
amps = np.linspace(6, 9, 1000)
lnLs = np.zeros_like(amps)
for i in range(len(amps)):
    lnLs[i] = eval_loglikelihood(data, noise, spectrum, amps[i], frequencies)
plt.plot(amps, np.exp(lnLs))
def solve_for_ML(data,noise,spectrum,frequencies):
    T = spectrum(frequencies) # d = T*a + n
    N = noise ** 2
    b = sum(T*N**-1*data) # Ax = b
    A = sum(T**2*N**-1)
    amp_ML = b/A
    return amp_ML
amp_ML = solve_for_ML(data,noise,spectrum,frequencies)
plt.axvline(amp_ML, c='r')
<matplotlib.lines.Line2D at 0x7f843ae79e48>
from tqdm import tqdm
def metropolis_sample(data,noise,spectrum,amplitude,frequencies,niter):
    samples = np.zeros(niter)
    amp_sample = amplitude
    # Define the step length of the proposal distribution
    step_size = 0.5
    # Evaluate the initial likelihood P(data|amp_sample)
    lnL_old = eval_loglikelihood(data,noise,spectrum,amp_sample,frequencies)
    # We will adopt a Gaussian distribution for our proposal distribution
    for i in tqdm(range(niter)):
        q = -1
        # If q >= U[0,1] accept, else, try again
        while q < np.random.rand():
            sample = amp_sample + np.random.normal(0.0, step_size)
            # Evaluate the probability of the new sample
            lnL_new = eval_loglikelihood(data,noise,spectrum,sample,frequencies)
            # Calculate the ratio of the likelihoods, i.e. the acceptance probability
            q = np.exp(lnL_new - lnL_old)
        lnL_old = lnL_new
        amp_sample = sample
        # Save the sample so we can look at the chain afterwards
        samples[i] = amp_sample
    return samples
def inversion_sample(data,noise,spectrum,xs,frequencies,niter):
    samples = np.zeros(niter)
    # Compute P(x) over a grid in x (xs)
    Px = np.exp(np.asarray([ eval_loglikelihood(data, noise, spectrum, x, frequencies) for x in xs ]))
    # Compute the cumulative probability distribution
    Fx = np.cumsum(Px) / np.sum(Px)
    for i in tqdm(range(niter)):
        # Draw a random number
        eta = np.random.uniform(0,np.max(Fx))
        # Find the value of the parameter which corresponds to the random number (i.e. solve for x where eta = F(x))
        # It's a simple interpolation to avoid importing other modules, e.g. scipy. @Jinyi
        # As long as we use enough grid points, there will be no problem here. @Jinyi
        samples[i] = np.interp(eta, Fx, xs)
    return samples
def gaussian_sample(data,noise,spectrum,frequencies,niter):
    # In this single pixel, single foreground case, the matrix T_nu is given
    # by the spectrum at each frequency, as we attempted to describe previously
    T = spectrum(frequencies)
    N = noise**2
    nbands = len(frequencies)
    # Compute the static part of the equation (no eta)
    amp_ML = solve_for_ML(data,noise,spectrum,frequencies)
    # For each iteration draw a random normal for each entry of eta
    samples = np.zeros(niter)
    for i in range(niter):
        eta = np.random.normal(size=nbands)
        # Compute the sampling portion and save the sample
        samples[i] = amp_ML + np.sum(T**2 * N**-1)**-1 * np.sum(T * N**-0.5 * eta)
    return amp_ML, samples
# Sampling
niter = 100000
fig, axes = plt.subplots(nrows=2, sharex=True, figsize=(6, 12))
amp_ML, samples = gaussian_sample(data,noise,spectrum,frequencies,niter)
axes[0].set_xlim([6.8, 8.2])
axes[0].plot(samples, np.arange(len(samples)), 'k.', ms=1, alpha=0.2)
_=axes[1].hist(samples, 200)
axes[0].axvline(amp_ML, label="ML", c='magenta')
axes[0].axvline(newamp, label="Input", c='lime')
axes[0].legend()
fig, axes = plt.subplots(nrows=2, sharex=True, figsize=(6, 12))
samples = metropolis_sample(data,noise,spectrum,4,frequencies,niter)
axes[0].set_xlim([6.8, 8.2])
axes[0].plot(samples, np.arange(len(samples)), 'k.', ms=1, alpha=0.3)
_=axes[1].hist(samples, 200)
axes[0].axvline(amp_ML, label="ML", c='magenta')
axes[0].axvline(newamp, label="Input", c='lime')
axes[0].legend()
fig, axes = plt.subplots(nrows=2, sharex=True, figsize=(6, 12))
samples = inversion_sample(data,noise,spectrum,np.linspace(6,9,1000),frequencies,niter)
axes[0].set_xlim([6.8, 8.2])
axes[0].plot(samples, np.arange(len(samples)), 'k.', ms=1, alpha=0.3)
_=axes[1].hist(samples, 200)
axes[0].axvline(amp_ML, label="ML", c='magenta')
axes[0].axvline(newamp, label="Input", c='lime')
axes[0].legend()
100%|██████████| 100000/100000 [00:06<00:00, 16296.67it/s] 100%|██████████| 100000/100000 [00:00<00:00, 137573.92it/s]
<matplotlib.legend.Legend at 0x7f843c351748>
In this chapter we will simulate a simplified example of the Microwave Sky. We will consider a single-pixel example that includes the CMB, synchrotron, and thermal dust radiation, with frequencies similar to that of the Planck satellite.
Let's begin by first importing a couple of useful functions and constants. We will also introduce an array of frequencies corresponding to the Planck satellite bands' central frequencies, as well as their corresponding polarization sensitivities. We will use these for the remainder of the exercises. You are of course free to add your own frequencies and sensitivities if you wish.
# Constants
kb = 1.38e-23 # Boltzmann's constant
Tcmb = 2.7255 # CMB temperature
h = 6.626e-34 # Planck's constant
# Pixel size. Small number => large pixels => little noise per pixel
nside = 64
# Frequency bands for Planck
nbands = 9
freq = np.array([30., 44., 70., 100., 143., 217., 353., 545., 857.])
# Noise level per pixel, per frequency
sigma = np.array([1.2, 2.7, 3.2, 5.6, 2.1, 3.8, 1.3, 2.9, 1.7])*nside/512
# Unit conversion between Rayleigh–Jeans units and CMB units
def Kcmb_to_Krj(frequency):
    xx = h*frequency*1.0e9/(kb*Tcmb)
    return xx**2*np.exp(xx)/((np.exp(xx)-1.)**2)
Now we can simulate our data using this sky model for a given set of frequencies. Below we set the model parameters, the set of frequencies, the level of noise for each frequency, and which frequency we use as the pivot frequency (reference frequency) for dust and synchrotron.
We have added all the Planck frequencies to illustrate the coverage of the experiment. We will begin with an example similar to our last one. Synchrotron radiation is, with both physical and observational support, assumed to have a power-law frequency scaling (in $\mu K_{\rm RJ}$ units, see the Synchrotron lecture). This is essentially the same model we just ran with, with just an additional term. Explicitly, the synchrotron sky signal at frequency $\nu$, $s_{\rm s,\,\nu}$, is given by
\begin{equation} s_{\rm s,\,\nu} = A_{\rm s} \Big(\frac{\nu}{\nu_{\rm 0,\,s}}\Big)^{\beta_{\rm s}}, \end{equation}
where $\beta_{\rm s}$ has typical values between -3.0 and -3.2. Let us begin by defining a simple class to store our synchrotron information.
class synch_comp:
    def __init__(self,nu_ref,amplitude,beta):
        self.par = np.empty(2)
        self.nu_ref = nu_ref
        self.par[0] = amplitude
        self.par[1] = beta
    def spectrum(self,nu,beta=None):
        beta = self.par[1] if beta is None else beta
        return (nu/self.nu_ref)**beta
    def model(self,nu):
        # self.model = self.par[0]*self.spectrum(self.nu_ref)
        return self.par[0]*self.spectrum(nu)
Now we will simulate the synchrotron sky signal across the Planck frequency channels, and sample the corresponding parameters (i.e. $A_{\rm s}$ and $\beta_{\rm s}$).
# We will adopt a reference frequency of 30 GHz as this is where we have the highest synchrotron signal-to-noise
# We will also take a fiducial value of 12. muK_RJ for the amplitude
# Finally we will also assume that beta_s = -3.1
synch = synch_comp(30.0,12.,-3.1)
d_synch = np.empty(nbands)
s_synch = np.empty(nbands)
for j in range(nbands):
    s_synch[j] = synch.par[0]*synch.spectrum(freq[j],synch.par[1])
    d_synch[j] = synch.par[0]*synch.spectrum(freq[j],synch.par[1]) + np.random.normal(0.0,sigma[j])
plt.scatter(freq,s_synch)
plt.scatter(freq,d_synch)
plt.yscale('symlog')
plt.xscale('log')
print(d_synch)
[11.77644852 3.88449349 0.26025146 -0.3637787 0.24193864 0.6130427 0.12752236 -0.43809018 -0.28245745]
If we print out the data we can see that we quickly hit the "noise floor", where the magnitude of the synchrotron emission is lower than the uncertainty in our measurement. Gathering high signal-to-noise observations is vital in CMB experiments, especially in polarization, as the CMB signal is very weak. Hopefully we will see some of the difficulties this can cause us through these exercises.
# ML solution for As
As_ML = solve_for_ML(d_synch, sigma, synch.spectrum, freq)
print('As_ML = %f muK' % As_ML)
As_ML = 11.787656 muK
# Gaussian sampling.
niter = 100000
def chisq(data, model, sigma):
    return np.sum((data - model) ** 2 * sigma ** -2)
# Plot chain of samples
fig, axes = plt.subplots(nrows=3, sharex=True, figsize=(6, 16))
amp_ML, samples = gaussian_sample(d_synch,sigma,synch.spectrum,freq,niter)
axes[0].plot(samples, np.arange(len(samples)), 'k.', ms=1, alpha=0.3)
_=axes[1].hist(samples, 100)
axes[0].axvline(amp_ML, label="ML", c='magenta')
axes[0].axvline(12., label="Input", c='lime')
axes[0].legend()
As_array = np.linspace(samples.min(), samples.max(), 100)
axes[2].plot(As_array, np.asarray([chisq(d_synch, As * synch.spectrum(freq), sigma) for As in As_array]), lw=10, alpha=0.3)
axes[2].scatter(samples, np.asarray([chisq(d_synch, As * synch.spectrum(freq), sigma) for As in samples]), s=3, alpha=0.1, c='k')
<matplotlib.collections.PathCollection at 0x7f843aeb9588>
fig, ax = plt.subplots()
ax.scatter(freq, s_synch, label='s_synch')
ax.scatter(freq, sigma, label='sigma')
ax.legend()
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xticks(freq)
ax.set_xticklabels([ '%d' % _ for _ in freq ])
plt.show()
For the data cuts, I will only keep the [30, 44, 70, 100, 143] GHz bands for the $A_s$ estimation, where the noise is not significantly larger than the synchrotron signal.
# Data cuts
d_synch_cut = d_synch[:5]
sigma_cut = sigma[:5]
freq_cut = freq[:5]
# ML solution for As
As_ML = solve_for_ML(d_synch_cut, sigma_cut, synch.spectrum, freq_cut)
print('As_ML = %f muK' % As_ML)
# Gaussian sampling.
niter = 100000
# Plot chain of samples
fig, axes = plt.subplots(nrows=3, sharex=True, figsize=(6, 16))
amp_ML, samples = gaussian_sample(d_synch_cut,sigma_cut,synch.spectrum,freq_cut,niter)
axes[0].plot(samples, np.arange(len(samples)), 'k.', ms=1, alpha=0.3)
_=axes[1].hist(samples, 100)
axes[0].axvline(amp_ML, label="ML", c='magenta')
axes[0].axvline(12., label="Input", c='lime')
axes[0].legend()
As_array = np.linspace(samples.min(), samples.max(), 100)
axes[2].plot(As_array, np.asarray([chisq(d_synch, As * synch.spectrum(freq), sigma) for As in As_array]), lw=10, alpha=0.3)
axes[2].scatter(samples, np.asarray([chisq(d_synch, As * synch.spectrum(freq), sigma) for As in samples]), s=3, alpha=0.1, c='k')
As_ML = 11.787496 muK
<matplotlib.collections.PathCollection at 0x7f843b692eb8>
The maximum-likelihood $A_s$ doesn't change much after the data cuts, and the sample chain doesn't seem to change a lot either. So the data cuts didn't improve the $A_s$ estimation in my results.
I think that's because the noise realization in the higher frequency bands didn't stray too far from the typical noise level, and when we did the ML estimation of $A_s$, the matrix $N^{-1}$ had already given these data an appropriately smaller weight, so the data in the higher frequency bands didn't affect my results much.
If the noise realization were far from the typical noise level, I believe it would be better to make data cuts, especially when we have only a few data points.
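A quick check of this argument (an added sketch, reusing the synch, freq, and sigma variables defined above): the contribution of each band to the information on $A_s$ scales as $T_\nu^2/\sigma_\nu^2$, and the high-frequency bands carry almost none of it.
# Added sketch: relative weight of each band in the ML estimate of A_s.
# With a_ML = sum(T d / sigma^2) / sum(T^2 / sigma^2), band nu contributes
# information proportional to T_nu^2 / sigma_nu^2.
T = synch.spectrum(freq)
weights = T**2 / sigma**2
for nu, w in zip(freq, weights / weights.sum()):
    print('%6.1f GHz: %.4f' % (nu, w))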
# Keep this function simple
def eval_loglikelihood(data, model, noise):
    N = noise ** 2
    lnL = -0.5*np.log((2*np.pi)**len(data)*np.prod(N)) - 0.5*np.sum((data - model)**2 * N**-1)
    return lnL
def inversion_sample(data, comp_class, noise, frequencies, niter, *args):
    ''' About
    -----
    * args enter into comp_class to get the model
    * args[0] = ref frequency
    * args[1:] = foreground model parameters, e.g. As and beta
    * type(arg) in args = int/float/list/numpy.ndarray
    Example
    -------
    To sample beta in the synchrotron model:
    >>> nu_ref = 30.0
    >>> amp, betas = 12.0, np.linspace(-4, -1, 1000)
    >>> samples = inversion_sample(data, synch_class, noise, freq, niter, nu_ref, amp, betas)
    To sample As in the synchrotron model:
    >>> nu_ref = 30.0
    >>> amps, beta = np.linspace(10, 14, 1000), -3.1
    >>> samples = inversion_sample(data, synch_class, noise, freq, niter, nu_ref, amps, beta)
    '''
    samples = np.zeros(niter)
    # Search for xs (the grid argument) @Jinyi
    for arg in args:
        if type(arg) in [list, np.ndarray]:
            xs, nx = np.array(arg), len(arg)
            break
    # NOTE speed-up needed, there must be a smarter way @Jinyi
    pars = np.array([ list(_) if type(_) in [list, np.ndarray] else [_]*nx for _ in args ]).T
    Px = np.empty(nx)
    for i,par in enumerate(pars):
        # Might be time-consuming here by creating a new instance in the loop. @Jinyi
        # A simple function would save nearly half of the time here. @Jinyi
        # However in my analysis, I use at most 1000 grid points, so it won't be expensive. @Jinyi
        model = comp_class(*par).model(frequencies)
        Px[i] = np.exp(eval_loglikelihood(data, model, noise))
    # Compute the cumulative probability distribution
    Fx = np.cumsum(Px) / np.sum(Px)
    for i in range(niter):
        # Draw a random number
        eta = np.random.uniform(0,np.max(Fx))
        # Find the value of the parameter which corresponds to the random number (i.e. solve for x where eta = F(x))
        # It's a simple interpolation to avoid importing other modules, e.g. scipy. @Jinyi
        # As long as we use enough grid points, there will be no problem here. @Jinyi
        samples[i] = np.interp(eta, Fx, xs)
    return samples
# Synchrotron parameters
nu_ref = 30.
amp = 12.
freq = np.array([30., 44., 70., 100., 143., 217., 353., 545., 857.])
sigma = np.array([1.2, 2.7, 3.2, 5.6, 2.1, 3.8, 1.3, 2.9, 1.7]) * nside / 512
spectrum = lambda nu: nu/nu_ref
betas = np.linspace(-4, -2, 1000)
niter = 100000
samples = inversion_sample(d_synch, synch_comp, sigma, freq, niter, nu_ref, amp, betas)
# Plot chain of samples
fig, axes = plt.subplots(nrows=3, sharex=True, figsize=(6, 16))
lnL = np.array([eval_loglikelihood(d_synch, amp*spectrum(freq)**beta, sigma) for beta in betas])
axes[0].plot(betas, lnL)
beta_s_ML = betas[lnL==max(lnL)]
print(beta_s_ML)
axes[0].axvline(beta_s_ML, c='magenta', label="ML")
axes[0].axvline(-3.1, c='lime', label='Input')
axes[0].legend()
axes[1].plot(samples, np.arange(len(samples)), 'k.', ms=1, alpha=0.3)
_=axes[2].hist(samples, 100)
[-3.1011011]
We can see that the distribution of the spectral index is not Gaussian.
# Datacut
samples = inversion_sample(d_synch[:2], synch_comp, sigma[:2], freq[:2], niter, nu_ref, amp, betas)
# Plot chain of samples
fig, axes = plt.subplots(nrows=3, sharex=True, figsize=(6, 16))
lnL = np.array([eval_loglikelihood(d_synch, amp*spectrum(freq)**beta, sigma) for beta in betas])
axes[0].plot(betas, lnL)
beta_s_ML = betas[lnL==max(lnL)]
print(beta_s_ML)
axes[0].axvline(beta_s_ML, c='magenta', label="ML")
axes[0].axvline(-3.1, c='lime', label='Input')
axes[0].legend()
axes[1].plot(samples, np.arange(len(samples)), 'k.', ms=1, alpha=0.3)
_=axes[2].hist(samples, 100)
[-3.1011011]
Now that we have sampled two parameters separately, each while holding the other constant, we can introduce Gibbs sampling. Gibbs sampling is the core of the $\texttt{Commander}$ and Cosmoglobe frameworks, and is a useful tool for mapping out multi-dimensional parameter spaces. Consider a data model with some set of parameters $\vec{\omega} = \{\omega_0,..,\omega_k\}$. We utilize Bayes' theorem, as defined earlier, to explore this parameter space. Gibbs sampling works by sampling each parameter $\omega_{i}$ individually, while holding all of the other parameters constant. Therefore, in each Gibbs iteration there are $k$ steps, such that we draw samples in the following way
\begin{align} \omega_{0} &\leftarrow P(\omega_{0}\mid \vec{d},\,\omega_{1},\,\omega_{2},\cdots,\,\omega_{k})\\ \omega_{1} &\leftarrow P(\omega_{1}\mid \vec{d},\,\omega_{0},\,\omega_{2},\,\cdots,\,\omega_{k})\\ &\vdots\\ \omega_{k} &\leftarrow P(\omega_{k}\mid \vec{d},\,\omega_{0},\,\omega_{1},\,\cdots,\,\omega_{k-1}). \end{align}
Let's try a simple example here with our two synchrotron parameters. Seeing as we have simulated the data ourselves, we already know what the correct answer should be, but in practice we don't know what the true sky signal looks like a priori. Let's say that we think that the real synchrotron amplitude is $A_{\rm s} = 10.0$, and that the synchrotron spectral index is $\beta_{\rm s} = -3.0$.
# Simulate data
nu_ref = 30.0
A_s_real = 10.0
beta_s_real = -3.0
synch = synch_comp(nu_ref, A_s_real, beta_s_real)
np.random.seed(2021)
d_synch = synch.model(freq) + np.random.normal(size=nbands) * sigma
# Initialize
A_s = [5.0]
beta_s = [-2.5]
chisq_s = [-2*eval_loglikelihood(d_synch, A_s[-1]*spectrum(freq)**beta_s[-1], sigma)]
ngibbs = 250
niter = 3
A_s_grid = np.linspace(8, 12, 1000)
beta_s_grid = np.linspace(-4, -2, 1000)
for i in tqdm(range(ngibbs)):
    A_s.append(gaussian_sample(d_synch,sigma,synch.spectrum,freq,niter)[1][-1])
    beta_s.append(inversion_sample(d_synch,synch_comp,sigma,freq,niter,nu_ref,A_s[-1],beta_s_grid)[-1])
    chisq_s.append(-2*eval_loglikelihood(d_synch, A_s[-1]*spectrum(freq)**beta_s[-1], sigma))
100%|██████████| 250/250 [00:07<00:00, 32.57it/s]
fig, ax = plt.subplots()
ax.scatter(A_s, beta_s, s=2, c='k')
ax.set_xlim([9.6, 11])
ax.set_ylim([-4.2, -2.1])
A_s_mesh, beta_s_mesh = np.meshgrid(A_s_grid, beta_s_grid)
chisq_s_mesh = np.array([ -2*eval_loglikelihood(d_synch, A_s*spectrum(freq)**beta_s, sigma)
for A_s,beta_s in zip(A_s_mesh.flatten(), beta_s_mesh.flatten()) ]).reshape(A_s_mesh.shape) # NOTE speed-up needed
ax.contour(A_s_mesh, beta_s_mesh, chisq_s_mesh - chisq_s_mesh.min(), levels=np.array([2.3, 6.17, 11.8]), linewidths=2)
<matplotlib.contour.QuadContourSet at 0x7f843cfcbda0>
The lines are the $1\sigma$, $2\sigma$, and $3\sigma$ constraints.
The red contour lines are computed on a grid and agree reasonably well with the black sample points, but in general one should estimate the $1\sigma$, $2\sigma$, and $3\sigma$ levels from the samples themselves. This is a somewhat flawed demonstration, since at this point in the homework I did not yet know how to draw proper confidence-contour plots.
Now we're really getting some practical stuff done! Hopefully you've gained some understanding of the theory behind sampling for parameters, and how it is done in practice as well. Now that we have the base infrastructure built, we want to expand to more and more realistic examples. Let's introduce some more foregrounds which you can create classes for and simulate a more complex data set. Let's add two more components, thermal dust emission, and the CMB. These are two good components to introduce as they are prevalent in total intensity, as well as in polarization. Recall that we are working in Rayleigh-Jeans units, and will introduce the parametric models for these components in such units. And as a consistent reminder, details on microwave sky foregrounds can be found in the online lectures.
For the CMB radiation we have $$s_{\mathrm{CMB},\,\nu} = a_{\mathrm{CMB}}\, \frac{x^2e^{x}}{\left(e^{x}-1\right)^2}$$ where $a_{\mathrm{CMB}}$ is the strength (amplitude) of the signal and the rest is a conversion factor between Rayleigh-Jeans units and CMB units, with $x=h\nu/kT_0$.
Thermal dust dominates the higher frequencies. Dust is modelled with a modified blackbody function, $$s_{\mathrm{d},\,\nu} = a_{\mathrm{d}}\,\left(\frac{\nu}{\nu_{0,\mathrm{d}}}\right)^{\beta_{\mathrm{d}}+1}\frac{e^{h\nu_{\mathrm{0,\mathrm{d}}}/kT_{\mathrm{d}}}-1}{e^{h\nu /kT_{\mathrm{d}}}-1}$$ where $a_{\mathrm{d}}$ is the strength of the signal at the reference frequency $\nu_{0,\mathrm{d}}$, $\beta_{\mathrm{d}}$ is the spectral index and $T_{\mathrm{d}}$ is the dust temperature.
The sky model $s_{\mathrm{RJ},\,\nu}$ that we will use is thus given by $$s_{\mathrm{RJ},\,\nu} =s_{\mathrm{CMB},\,\nu} + s_{\mathrm{s},\,\nu}+ s_{\mathrm{d},\,\nu},$$ containing a total of six parameters to be sampled: three amplitudes ($a_{\mathrm{CMB}}$, $a_{\mathrm{s}}$ and $a_{\mathrm{d}}$), two spectral indices ($\beta_{\mathrm{s}}$ and $\beta_{\mathrm{d}}$), and the thermal dust temperature ($T_{\mathrm{d}}$). Given the total degrees of freedom, we need at least six frequency bands to constrain this model.
# Constants
kb = 1.38e-23 # Boltzmann's constant
h = 6.626e-34 # Planck's constant
Tcmb = 2.7255 # CMB temperature
def Kcmb_to_Krj(nu):
    x = h * nu * 1.e9 / (kb * Tcmb)
    return x**2 * np.exp(x) / ((np.exp(x) - 1.)**2)
def mbb(nu, beta, T):
    # Modified blackbody: the intensity is proportional to this mbb function
    # nu in GHz, T in K
    x = h * nu * 1.e9 / (kb * T)
    return nu**(beta + 1) / (np.exp(x) - 1)
# General component class
class comp:
    def __init__(self, *args):
        self.nu_ref = args[0]
        self.par = np.array(args[1:])
    def spectrum(self, nu):
        pass
    def model(self, nu):
        return self.par[0] * self.spectrum(nu)
# CMB component
class cmb_comp(comp):
    def __init__(self, *args):
        super(__class__, self).__init__(*args)
    def spectrum(self, nu):
        return Kcmb_to_Krj(nu)
# Synchrotron component
class synch_comp(comp):
    def __init__(self, *args):
        super(__class__, self).__init__(*args)
    def spectrum(self, nu):
        return (nu / self.nu_ref) ** self.par[1]
# Dust component
class dust_comp(comp):
    def __init__(self, *args):
        super(__class__, self).__init__(*args)
    def spectrum(self, nu):
        return mbb(nu, self.par[1], self.par[2]) / mbb(self.nu_ref, self.par[1], self.par[2])
# Sum up the components
class signal():
    def __init__(self):
        self.comps = []
    def add_comp(self, comp): # here comp is an instance
        self.comps.append(comp)
    def model(self, nu):
        ret = np.zeros_like(nu)
        for comp in self.comps:
            ret += comp.model(nu)
        return ret
# Params
A_cmb = 2.
A_s = 12.
A_d = 8.
beta_s = -3.1
beta_d = 1.6
T_d = 19.6
nu_ref_d = 353.
nu_ref_s = 30.
# Instantiate
cmb = cmb_comp(None, A_cmb)
synch = synch_comp(nu_ref_s, A_s, beta_s)
dust = dust_comp(nu_ref_d, A_d, beta_d, T_d)
# Planck Data simulation
s = signal()
s.add_comp(cmb)
s.add_comp(synch)
s.add_comp(dust)
d = s.model(freq) + np.random.normal(size=nbands) * sigma
# SED plot
fig, ax = plt.subplots(figsize=(8,6))
ax.plot(freq, synch.model(freq), color='green', label='Synchrotron')
ax.plot(freq, dust.model(freq), color='red', label='Thermal Dust')
ax.plot(freq, cmb.model(freq), color='orange', label='CMB')
freqs = np.linspace(freq.min(), freq.max(), 1000)
ax.plot(freqs, s.model(freqs), 'k:')
ax.errorbar(freq, d, yerr=sigma, fmt='.', color='k', label='Data')
for i in range(nbands):
    ax.axvline(freq[i],color='grey',alpha=0.5)
plt.yscale('log')
plt.xscale('log')
ax.set_ylabel(r'$\mu K_{\rm RJ}$',size=15)
ax.set_xlabel(r'GHz',size=15)
ax.set_ylim([1e-2,1e2])
plt.legend()
plt.show()
Now that we have simulated a more complex data model, something that better represents the real microwave sky, we can Gibbs sample the components!
When sampling the different parameters, we should choose the sampling method according to the character of each parameter: the amplitudes $A$ should be drawn with the Gaussian sampler, while $\beta$ and $T_\mathrm{d}$ should be drawn with the more general inversion sampler.
def gibbs_sample(data, sigma, freq, cmb, synch, signal, dust, synch_class, dust_class):
    '''
    data  : data vector
    sigma : noise rms per band
    freq  : frequencies
    cmb   : instance of cmb_comp class
    synch : instance of synch_comp class
    dust  : instance of dust_comp class
    '''
    # Synchrotron sampling
    # --------------------
    d_to_sample = data - cmb.model(freq) - dust.model(freq)
    # A_s sampling
    synch.par[0] = gaussian_sample(d_to_sample, sigma, synch.spectrum, freq, 1)[1][-1]
    # beta_s sampling
    xs = np.linspace(-4.0, -1.0, 1000)
    synch.par[1] = inversion_sample(d_to_sample, synch_class, sigma, freq, 1, nu_ref_s, synch.par[0], xs)[-1]
    # Save
    A_s, beta_s = synch.par
    # Dust sampling
    # -------------
    d_to_sample = data - cmb.model(freq) - synch.model(freq)
    # A_d sampling (from the CMB- and synchrotron-subtracted data)
    dust.par[0] = gaussian_sample(d_to_sample, sigma, dust.spectrum, freq, 1)[1][-1]
    # beta_d sampling
    xs = np.linspace(1.4, 1.8, 1000)
    dust.par[1] = inversion_sample(d_to_sample, dust_class, sigma, freq, 1, nu_ref_d, dust.par[0], xs, dust.par[2])[-1]
    # T_d sampling
    xs = np.linspace(15., 25., 1000)
    dust.par[2] = inversion_sample(d_to_sample, dust_class, sigma, freq, 1, nu_ref_d, dust.par[0], dust.par[1], xs)[-1]
    # Save
    A_d, beta_d, T_d = dust.par
    # CMB sampling
    # ------------
    d_to_sample = data - synch.model(freq) - dust.model(freq)
    # A_cmb sampling
    cmb.par[0] = gaussian_sample(d_to_sample, sigma, cmb.spectrum, freq, 1)[1][-1]
    # Save
    A_cmb = cmb.par[0]
    # Residual
    # --------
    model = cmb.model(freq) + synch.model(freq) + dust.model(freq)
    res = sum(data - model)
    # Chi square
    # ----------
    chisq = sum((data - model)**2 * sigma**-2) / len(freq)
    return A_cmb, A_s, A_d, beta_s, beta_d, T_d, res, chisq
A_cmb_ = 1.9
A_s_ = 12.0
A_d_ = 9.0
beta_s_ = -3.0
beta_d_ = 1.6
T_d_ = 19.6
cmb = cmb_comp(None, A_cmb_)
synch = synch_comp(nu_ref_s, A_s_, beta_s_)
dust = dust_comp(nu_ref_d, A_d_, beta_d_, T_d_)
model = cmb.model(freq) + synch.model(freq) + dust.model(freq)
res = sum(d - model)
chisq = sum((d - model)**2 * sigma**-2) / len(freq)
ngibbs = 10000
# Columns: A_cmb, A_s, A_d, beta_s, beta_d, T_d, res, chisq
samples = np.zeros([ngibbs + 1, 8])
samples[0] = np.asarray([A_cmb_, A_s_, A_d_, beta_s_, beta_d_, T_d_, res, chisq])
for _ in tqdm(range(ngibbs)):
    samples[_+1] = gibbs_sample(d, sigma, freq, cmb, synch, signal, dust, synch_comp, dust_comp)
100%|██████████| 10000/10000 [18:12<00:00, 9.15it/s]
fig, axes = plt.subplots(nrows=6, figsize=(6, 18))
truths = [A_cmb, A_s, A_d, beta_s, beta_d, T_d]
labels = ['A_cmb', 'A_s', 'A_d', 'beta_s', 'beta_d', 'T_d']
for ax, sample, input_par, label in zip(axes, samples.T[:6], truths, labels):
    ax.plot(sample, np.arange(len(sample)), 'k.', ms=1, alpha=0.3)
    ax.axvline(input_par, c='lime', label='Input')
    ax.set_xlabel(label)
    ax.legend(loc='lower right')
from corner import corner
fig = corner(samples[100:, :6], labels=labels, truths=truths)
We can see the strong degeneracy between $A_\mathrm{cmb}$ and $A_\mathrm{s}$.