4. Derivation of an unfalsifiable energy distribution

Published: J. Stat. Mech. (2024) 093209

This section presents a brief summary of the theoretical setup that is used in Sec. 4.2 and Sec. 5 to derive the Maxwell-Boltzmann distribution and the Bose-Einstein distribution, respectively.

It is assumed that it is possible for a measurement to determine which element of a cover of a DOF's phase space, comprising disjoint subsets of area ${h_?}$, contains the DOF's microstate. Although this assumption is not compatible with the uncertainty principle discussed in Sec. 1.1.1, its use in derivations as a working assumption was justified in Sec. 3.2.

4.1 Theoretical setup

Consider an arbitrary continuously-evolving deterministic system whose microstate can be specified by $\mathbf{\Gamma}\equiv(\mathbf{P},\mathbf{Q})$, where $\mathbf{Q}\equiv (Q_{1},Q_{2},\ldots)$ is some set of generalized coordinates and $\mathbf{P}\equiv (P_{1},P_{2},\ldots)$, where $P_\eta$ is the momentum conjugate to $Q_\eta$. In this coordinate system, let $\mathcal{H}(\mathbf{\Gamma})$ denote the system's Hamiltonian, and, as before, $\mathbb{G}\equiv\mathbb{Q}\times\mathbb{P}\ni\mathbf{\Gamma}$, $\mathbb{Q}\ni\mathbf{Q}$, and $\mathbb{P}\ni\mathbf{P}$ denote the system's phase space, configuration space, and momentum space, respectively.

Let us begin by partitioning $\mathbb{G}$ into nonoverlapping subsets of equal measure (phase space `volume') as follows: We choose a countable set $\mathcal{G}$ of evenly-spaced points (microstates) in $\mathbb{G}$ and define a neighbourhood $\mathcal{N}_\mathbf{\Gamma}\subset\mathbb{G}$ of each point $\mathbf{\Gamma}\in\mathcal{G}$ such that $\mathbb{G} = \bigcup_{\mathbf{\Gamma}\in\mathcal{G}}\mathcal{N}_\mathbf{\Gamma}$, and such that, if $\mathbf{\Gamma},\mathbf{\Gamma}'\in\mathcal{G}$ are any two different points ($\mathbf{\Gamma}\neq\mathbf{\Gamma}'$), then $\abs{\mathcal{N}_\mathbf{\Gamma}\cap\mathcal{N}_{\mathbf{\Gamma}'}}=0$ and $\abs{\mathcal{N}_\mathbf{\Gamma}}=\abs{\mathcal{N}_{\mathbf{\Gamma}'}}$, where $\abs{\mathcal{N}_\mathbf{\Gamma}}$ denotes the measure of $\mathcal{N}_\mathbf{\Gamma}$ in $\mathbb{G}$. For simplicity, let us assume that if $\mathbf{\Gamma}_t\in\mathcal{N}_\mathbf{\Gamma}$, then $\mathbf{\Gamma}_t$ is closer to $\mathbf{\Gamma}$ than to any other element of $\mathcal{G}$. Therefore the interior of $\mathcal{N}_\mathbf{\Gamma}$ is the set of all points in $\mathbb{G}$ that are closer to $\mathbf{\Gamma}$ than to any other element of $\mathcal{G}$.

Now let $p_\mathbf{\Gamma}$, where $\mathbf{\Gamma}\in\mathcal{G}$, denote the probability, $\Pr(\mathbf{\Gamma}_t\in\mathcal{N}_\mathbf{\Gamma})$, that $\mathbf{\Gamma}_t$ is within $\mathcal{N}_\mathbf{\Gamma}$. The probability distribution for the point $\mathbf{\Gamma}$ that identifies the region $\mathcal{N}_\mathbf{\Gamma}$ containing $\mathbf{\Gamma}_t$ is $p:\mathcal{G}\to [0,1]; \mathbf{\Gamma}\mapsto p_\mathbf{\Gamma}$.

Now let us suppose, momentarily, that $\mathbf{\Gamma}_t$ is known to be in region $\mathcal{N}_\mathbf{\Gamma}$, and that $\mathcal{N}_\mathbf{\Gamma}$ is partitioned into $W_\mathbf{\Gamma}$ nonoverlapping subsets of equal measure $v\equiv\abs{\mathcal{N}_\mathbf{\Gamma}}/W_\mathbf{\Gamma}$. Then, as Shannon demonstrated [Shannon, 1948], we can quantify the amount of information that must be revealed to determine which of these subsets $\mathbf{\Gamma}_t$ is in by $\log W_\mathbf{\Gamma} = \log \abs{\mathcal{N}_\mathbf{\Gamma}}-\log v$. In the limit $W_\mathbf{\Gamma}\to \infty,\; v\to 0$, the quantity of information required becomes infinite. However, as discussed in Sec. 1.1.1, we are assuming that $v$ has a lower bound, which means that $W_\mathbf{\Gamma}$ has an upper bound.
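The accounting in the paragraph above can be sketched numerically. In the toy sketch below, the values of $\abs{\mathcal{N}_\mathbf{\Gamma}}$ and $v$ are purely illustrative, and information is measured in nats; the information needed to locate $\mathbf{\Gamma}_t$ within one of the $W_\mathbf{\Gamma}$ subsets is $\log W_\mathbf{\Gamma}=\log\abs{\mathcal{N}_\mathbf{\Gamma}}-\log v$, and it grows without bound as $v\to 0$:

```python
import math

# Illustrative values: the measure of the region known to contain Gamma_t,
# and the (assumed lower-bounded) measure v of each sub-cell.
area = 2.0    # |N_Gamma| (hypothetical)
v = 0.001     # sub-cell measure (hypothetical)
W = area / v  # number of equal-measure sub-cells, W_Gamma

# Information, in nats, needed to determine which sub-cell holds Gamma_t:
info = math.log(W)

# This equals log|N_Gamma| - log v ...
assert math.isclose(info, math.log(area) - math.log(v))
# ... and diverges as v -> 0 (here: shrinking v tenfold increases it):
assert math.log(area / (v / 10)) > info
```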

Without loss of generality, let us assume that these bounds are $\abs{\mathcal{N}_\mathbf{\Gamma}}$ and $1$, respectively. In other words, let us assume that when we originally partitioned $\mathbb{G}$, we chose the set $\mathcal{G}$ such that the following is true:

Given any microstate $\mathbf{\Gamma}\in\mathcal{G}$, and any microstate $\mathbf{\Gamma}'\in\mathbb{G}$ that is closer to $\mathbf{\Gamma}$ than to any other element of $\mathcal{G}$, it is theoretically possible to distinguish between $\mathbf{\Gamma}'$ and any element of $\mathcal{G}\setminus\{\mathbf{\Gamma}\}$ by empirical means; and it is impossible to distinguish between $\mathbf{\Gamma}'$ and $\mathbf{\Gamma}$ by empirical means.

I will refer to $\mathcal{G}$ as a maximal set of mutually-distinguishable microstates; I will refer to a sampling of $\mathbb{G}$ with such a set as a maximal sampling; and I will use $\mathbf{h}\equiv \abs{\mathcal{N}_\mathbf{\Gamma}}$ to denote the measure of each neighbourhood $\mathcal{N}_\mathbf{\Gamma}$ in a maximal sampling of phase space.

4.2 Maxwell-Boltzmann statistics

This section draws heavily from the works of Jaynes [Jaynes, 1957a] and Shannon [Shannon, 1948].

Let us add the assumption that we know that the expectation value of the system's energy is $\mathscr{E}$. For example, the system might be a classical crystal whose average energy is determined by a heat bath to which it is coupled.

The system's state of thermal equilibrium can be defined as the probability distribution $p$ that maximises the Shannon entropy [Shannon, 1948], subject to the constraint that the Hamiltonian's expectation value,

\[\begin{aligned}\expval{\mathcal{H}}[p]\equiv \sum_{\mathbf{\Gamma}\in\mathcal{G}} p_\mathbf{\Gamma}\mathcal{H}(\mathbf{\Gamma}),\end{aligned}\]
is equal to $\mathscr{E}$, and subject to the normalization constraint $\sum_{\mathbf{\Gamma}\in\mathcal{G}} p_\mathbf{\Gamma} = 1$. The Shannon entropy is
\[\begin{aligned}\expval{S}[p] &\equiv \sum_{\mathbf{\Gamma}\in\mathcal{G}} p_\mathbf{\Gamma} \mathfrak{I}(p_\mathbf{\Gamma}), \end{aligned}\]
where $ \mathfrak{I}(p_\mathbf{\Gamma}) \equiv-\log p_\mathbf{\Gamma}$ is the Shannon information [Shannon, 1948] of $p$ at $\mathbf{\Gamma}$. From now on it will be implicit that $\sum_\mathbf{\Gamma}$ means $\sum_{\mathbf{\Gamma}\in\mathcal{G}}$.

The Shannon information, $\mathfrak{I}(p_\mathbf{\Gamma})$, quantifies how much would be learned, that is, by how much the uncertainty in the location of $\mathbf{\Gamma}_t$ would be reduced, if it were discovered that $\mathbf{\Gamma}_t\in\mathcal{N}_\mathbf{\Gamma}$. The functions $k\mathfrak{I}(p_\mathbf{\Gamma})$, for any $k\in\mathbb{R}^+$, are the only functions that satisfy the following three conditions: (i) they would vanish if it were known that $\mathbf{\Gamma}_t$ was in $\mathcal{N}_\mathbf{\Gamma}$ prior to `discovering' it there, i.e., if $p_\mathbf{\Gamma}=1$; (ii) they increase as the discovery that $\mathbf{\Gamma}_t\in\mathcal{N}_\mathbf{\Gamma}$ becomes more surprising, i.e., as $p_\mathbf{\Gamma}$ decreases; and (iii) they are additive. Additivity means that if, for example, it was discovered that $\mathbf{\Gamma}_t$ was in $\mathcal{N}_\mathbf{\Gamma}$ and that the microstate $\mathbf{\Gamma}'_t$ of another, independently-prepared, statistically independent system was in $\mathcal{N}_{\mathbf{\Gamma}'}$, the quantity of unknown information about the locations of $\mathbf{\Gamma}_t$ and $\mathbf{\Gamma}'_t$ would decrease by $\mathfrak{I}(p_\mathbf{\Gamma})+\mathfrak{I}(p_{\mathbf{\Gamma}'})$.
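Conditions (i)-(iii) can be checked directly against $\mathfrak{I}(p)=-\log p$; the probabilities in the sketch below are illustrative:

```python
import math

def shannon_info(p):
    """Shannon information I(p) = -log p, in nats."""
    return -math.log(p)

p1, p2 = 0.3, 0.05  # illustrative probabilities

# (i) A certain discovery reveals nothing:
assert shannon_info(1.0) == 0.0
# (ii) Rarer (more surprising) discoveries reveal more:
assert shannon_info(p2) > shannon_info(p1)
# (iii) Additivity: for statistically independent systems, the joint
# discovery has probability p1*p2 and information I(p1) + I(p2):
assert math.isclose(shannon_info(p1 * p2), shannon_info(p1) + shannon_info(p2))
```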

Any probability distribution, $p$, is a state of knowledge that an observer could be in. The Shannon information, $\mathfrak{I}(p_\mathbf{\Gamma})$, of $p_\mathbf{\Gamma}$, quantifies the information that would be revealed by the discovery that $\mathbf{\Gamma}_t\in \mathcal{N}_\mathbf{\Gamma}$, and the Shannon entropy is the expectation value of the quantity of information that would be revealed by discovering which point $\mathbf{\Gamma}$ in the maximal set of mutually-distinguishable microstates $\mathcal{G}$ the true microstate $\mathbf{\Gamma}_t$ is closest to. Therefore $\expval{S}[p]$ quantifies the incompleteness of distribution $p$, as a state of knowledge, when the identity of the element of $\mathcal{G}$ that is closest to $\mathbf{\Gamma}_t$ is regarded as complete knowledge.

Whether or not $\expval{S}[p]$ is satisfactory as a quantification of uncertainty in all contexts is probably irrelevant in the present context, because we will be maximising its value subject to the stated constraints. Therefore what is relevant is that it increases monotonically as the location of $\mathbf{\Gamma}_t$ in $\mathbb{G}$ becomes more uncertain.

We can express the stationarity of $\expval{S}[p]$ subject to constraints $\expval{\mathcal{H}}[p]=\mathscr{E}$ and $\sum_{\mathbf{\Gamma}} p_\mathbf{\Gamma}=1$ as

\[\begin{aligned}\delta\left\{ \expval{S}[p] - \beta\left(\expval{\mathcal{H}}[p]-\mathscr{E}\right) -\beta\lambda\left(\sum_{\mathbf{\Gamma}} p_\mathbf{\Gamma} -1\right)\right\} = 0,\end{aligned}\]
where $\beta$ and $\beta\lambda$ are Lagrange multipliers. If we divide through by $-\beta$ and define the constant $T\equiv (k_B\beta)^{-1}$, where $k_B$ is the Boltzmann constant, this can be expressed as $\delta \left(\tilde{\mathcal{F}}[p]+\lambda\sum_{\mathbf{\Gamma}}p_\mathbf{\Gamma}\right) = 0$, where $\tilde{\mathcal{F}}[p]\equiv\expval{\mathcal{H}}[p]-k_B T\expval{S}[p]$. By taking a partial derivative of $\tilde{\mathcal{F}}[p]+\lambda\sum_{\mathbf{\Gamma}}p_\mathbf{\Gamma}$ with respect to $p_\mathbf{\Gamma}$ and setting it equal to zero, we find that
\[\begin{aligned}p_\mathbf{\Gamma} & = e^{-(\mathcal{H}(\mathbf{\Gamma})-\mathcal{F})/k_B T} = \mathcal{Z}^{-1} e^{-\mathcal{H}(\mathbf{\Gamma})/k_B T}, \end{aligned}\]
where $\mathcal{Z}\equiv \exp\left(-\mathcal{F}/k_B T\right)$ is known as the partition function and we refer to the quantity $\mathcal{F} = -k_B T\log \mathcal{Z}$, which is the value taken by $\tilde{\mathcal{F}}[p]$ when it is stationary with respect to normalization-preserving variations of $p$, as the free energy.
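A minimal numerical sketch of this result, for a hypothetical three-microstate system with illustrative energies and units in which $k_B=1$: the distribution $p_\mathbf{\Gamma}=e^{-\beta\mathcal{H}(\mathbf{\Gamma})}/\mathcal{Z}$ has a higher Shannon entropy than any competing distribution with the same normalization and mean energy.

```python
import math

# Toy system: three distinguishable microstates (illustrative energies).
energies = [0.0, 1.0, 2.5]
beta = 0.7  # inverse temperature, beta = 1/(k_B T), with k_B = 1

# Maxwell-Boltzmann distribution p_Gamma = exp(-beta*H(Gamma))/Z:
weights = [math.exp(-beta * e) for e in energies]
Z = sum(weights)
p = [w / Z for w in weights]

def entropy(q):
    return -sum(qi * math.log(qi) for qi in q if qi > 0)

def mean_energy(q):
    return sum(qi * e for qi, e in zip(q, energies))

# Perturb p along the unique direction that preserves both normalization
# (components sum to zero) and mean energy (orthogonal to the energies):
E = energies
v = [E[1] - E[2], E[2] - E[0], E[0] - E[1]]
eps = 1e-3
q = [pi + eps * vi for pi, vi in zip(p, v)]

assert abs(sum(q) - 1.0) < 1e-12
assert abs(mean_energy(q) - mean_energy(p)) < 1e-12
assert entropy(q) < entropy(p)  # the Boltzmann p is the entropy maximizer
```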

This is the familiar Maxwell-Boltzmann distribution, and $T$ is the temperature. The derivation above, based on the premises that precede it and those stated within it, yields the only empirically-unfalsifiable probability distribution for the true microstate. It is unfalsifiable because it explicitly rejects bias by maximising uncertainty subject to a single physical constraint, which is the only thing that we know about the state of the system; namely, that a heat bath ensures that its average energy is $\mathscr{E}$.

As discussed in Sec. 2, the absence of bias guarantees us that if we had enough independent replicas of the physical system, and if the only thing we knew about each one was that its average energy was $\mathscr{E}$, and if we could determine by measurement which element $\mathcal{N}_\mathbf{\Gamma}$ of the phase space partition the microstate of each one was in, the fraction of those whose microstate was in $\mathcal{N}_\mathbf{\Gamma}$ would be $p_\mathbf{\Gamma} = e^{-\beta\mathcal{H}(\mathbf{\Gamma})}/\mathcal{Z}$.
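This frequency interpretation can be illustrated by simulation. The sketch below uses illustrative energies, $k_B=1$, and a seeded pseudo-random number generator standing in for the independent replicas; the fraction of replicas found in each cell approaches $p_\mathbf{\Gamma}$:

```python
import math
import random

random.seed(0)  # reproducible illustration

energies = [0.0, 1.0, 2.5]  # toy microstate energies (illustrative)
beta = 0.7
weights = [math.exp(-beta * e) for e in energies]
Z = sum(weights)
p = [w / Z for w in weights]

# Assign each of N independent replicas a cell N_Gamma, drawn from the
# unbiased distribution p_Gamma = exp(-beta*H)/Z:
N = 200_000
samples = random.choices(range(len(p)), weights=weights, k=N)

# The observed fraction in each cell matches p_Gamma to sampling accuracy:
for i, p_i in enumerate(p):
    fraction = samples.count(i) / N
    assert abs(fraction - p_i) < 0.01
```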

Now let us make the simplifying assumption under which the Bose-Einstein distribution is valid within quantum mechanics: The total energy is a sum of the energies of independent DOFs. Within quantum mechanics these DOFs are often interpreted as particles.

With the Hamiltonian of DOF $\eta$ denoted by $\mathcal{H}_\eta(\Gamma_\eta)$, where $\Gamma_\eta\equiv(Q_\eta,P_\eta)$, we can express the Hamiltonian of the set of all DOFs as

\[\begin{aligned}\mathcal{H}(\mathbf{\Gamma}) = \sum_\eta \mathcal{H}_\eta(\Gamma_\eta),\end{aligned}\]
and we can express the partition function as
\[\begin{aligned}\mathcal{Z}&\equiv \sum_{\mathbf{\Gamma}} e^{-\beta\mathcal{H}(\mathbf{\Gamma})} = \sum_{\mathbf{\Gamma}} \prod_{\eta} e^{-\beta\mathcal{H}_{\eta}(\Gamma_\eta)}, \end{aligned}\]
where the product $\prod_\eta$ is over all DOFs.

Now let us choose the maximal set of mutually-distinguishable microstates, $\mathcal{G}$, to be a lattice, which is the direct product $\prod^\times_\eta \mathcal{G}_\eta$, where $\mathcal{G}_{\eta}$ is both a two-dimensional lattice and a maximal set of mutually-distinguishable points in the phase space $\mathbb{G}_{\eta}$ of DOF $\eta$. The area of the nonoverlapping neighbourhoods $\mathcal{N}_{\Gamma_\eta}$ of $\Gamma_\eta$ whose union is $\mathbb{G}_\eta$ is ${h_?}\equiv \abs{\mathcal{N}_{\Gamma_\eta}}=\Delta Q_{\eta}\Delta P_{\eta}=\Delta_{\mathrm{Q}}\Delta_{\mathrm{P}}$, where $\frac{1}{2}\Delta P_\eta$ is the smallest difference in momentum $P_\eta$ between mutually-distinguishable microstates of $\eta$ with the same coordinate; and $\frac{1}{2}\Delta Q_\eta$ is the smallest difference in coordinate $Q_\eta$ between mutually-distinguishable microstates with the same momentum.

These choices and definitions allow us to swap the order of the sum and the product in the expression for $\mathcal{Z}$ above, thereby expressing it as $\mathcal{Z}=\prod_{\eta} \mathcal{Z}_{\eta}$, where

\[\begin{aligned}\mathcal{Z}_{\eta}\equiv \sum_{\Gamma_{\eta}} e^{-\beta\mathcal{H}_{\eta}(\Gamma_{\eta})}, \end{aligned}\]
and where $\sum_{\Gamma_\eta}$ denotes $\sum_{\Gamma_\eta\in\mathcal{G}_\eta}$. If we know the partition function $\mathcal{Z}_{\eta}$ of each DOF $\eta$, we can calculate the partition function $\mathcal{Z}$ of the system as a whole.
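The factorization $\mathcal{Z}=\prod_\eta\mathcal{Z}_\eta$ can be verified on a toy example with two DOFs, each with a handful of illustrative energy levels:

```python
import math
from itertools import product

beta = 1.3  # inverse temperature (illustrative value)

# Per-DOF energies evaluated on each DOF's phase-space lattice (toy values).
levels = [
    [0.0, 0.5, 1.0],   # H_1 on the lattice G_1
    [0.2, 0.9],        # H_2 on the lattice G_2
]

# Z computed as a single sum over the product lattice G = G_1 x G_2 ...
Z_joint = sum(math.exp(-beta * sum(e)) for e in product(*levels))

# ... equals the product of the per-DOF partition functions Z_eta:
Z_factored = math.prod(sum(math.exp(-beta * e) for e in lv) for lv in levels)

assert math.isclose(Z_joint, Z_factored)
```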

In Sec. 5 we will explore other ways to calculate $\mathcal{Z}$ by transforming away from $(\mathbf{P},\mathbf{Q})$ and $(P_\eta,Q_\eta)$ to different sets of variables. To avoid a proliferation of new symbols, I will recycle the symbols $\mathbb{G}$, $\mathbb{G}_{\eta}$, $\mathcal{H}$, $\mathcal{H}_{\eta}$, $\mathbf{\Gamma}$, $\Gamma_{\eta}$, $\mathcal{N}_\mathbf{\Gamma}$, $\mathcal{G}$, $\mathcal{G}_{\eta}$, $\mathbf{h}$, $p_\mathbf{\Gamma}$, and $p$. They will have the same meanings in the new coordinates as they do for coordinates $(\mathbf{P},\mathbf{Q})$.
