3. Probability domain quantization
Published: J. Stat. Mech. (2024) 093209
The purpose of this section is to explain why one consequence of the uncertainty principle is that the most informative statistically unfalsifiable probability distribution for the location of a physical system's microstate in its phase space $\mathbb{G}$ is not a probability density function whose domain is $\mathbb{G}$, but a probability mass function whose domain is a partition of $\mathbb{G}$. In other words, the uncertainty principle quantizes the domain of any empirically testable probability distribution for the location of a classical dynamical system's microstate.
The derivations of the Maxwell-Boltzmann distribution in Sec. 4 and the Bose-Einstein distribution in Sec. 5 are reasonably self-contained, and reading this section is unnecessary to understand the gist of these derivations. However, skipping this section makes the derivations' logical foundations appear simpler than they are.
It also makes it appear that the derivations are built on an unjustified assumption: Namely, that an observer is capable of determining which point $\mathbf{\Gamma}$ on a lattice in phase space the microstate $\mathbf{\Gamma}_t$ of a physical system is closest to. This section will make clear that such an assumption is not made. It is important that it is not made because, as I now explain, the uncertainty principle, $\Delta_{\mathrm{Q}}\Delta_{\mathrm{P}}>{h_?}$, implies that an observer would not be capable of such a determination. Therefore that assumption would be a false premise.
If $\Gamma$ and $\Gamma+\Delta\Gamma$ are adjacent points of the lattice in the phase space $\mathbb{G}$ of a single DOF, and if $\mathcal{N}_{\Gamma}$ and $\mathcal{N}_{\Gamma+\Delta\Gamma}$ are the sets of points in $\mathbb{G}$ that are closer to $\Gamma$ and $\Gamma+\Delta\Gamma$, respectively, than to any other points of the lattice, then $\mathcal{N}_{\Gamma}$ and $\mathcal{N}_{\Gamma+\Delta\Gamma}$ share a border. The limit ${h_?}$ on microstate measurement precision implies that it is impossible for an observer to determine which side of their shared border the DOF's microstate $\Gamma_t$ is on. Furthermore, as discussed in Sec. 1.1, the result of a measurement of $\Gamma_t$ is the identification of an element $\Gamma$ of $\mathbb{G}$ and a ratio $\mathfrak{r}=\Delta_{\mathrm{Q}}/\Delta_{\mathrm{P}}\in\mathbb{R}^+$, such that $\Gamma_t\in\mathfrak{R}(\Gamma,\mathfrak{r})$. The value of $\Gamma$ is not restricted to a point on a lattice, and the probability is zero that, by chance, it turns out to be one of the points of a particular lattice, because the measure of a lattice in $\mathbb{G}$ is zero. Therefore it is impossible for an observer to determine which point on a specific lattice $\Gamma_t$ is closest to.
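This impossibility can be illustrated numerically. The following sketch is a toy model, not part of the paper's formalism: it assumes units in which ${h_?}=1$ and a square lattice of unit spacing, and checks that a maximally precise measurement region $\mathfrak{R}(\Gamma,\mathfrak{r})$, modelled as an axis-aligned rectangle of area ${h_?}$ with aspect ratio $\mathfrak{r}$, always overlaps the neighbourhoods $\mathcal{N}_{\Gamma}$ of at least two lattice points:

```python
import math
import random

H = 1.0  # stand-in for the precision limit h_?; lattice cells have this area

def n_cells(lo, hi):
    """Number of unit-width Voronoi cells (centred on the integers) that the
    interval (lo, hi) overlaps."""
    return math.floor(hi + 0.5) - math.floor(lo + 0.5) + 1

def cells_overlapped_by_measurement(q, p, r):
    """Lattice cells overlapped by the rectangle R((q, p), r) with sides
    dq = sqrt(H*r) and dp = sqrt(H/r), i.e. of area exactly H."""
    dq, dp = math.sqrt(H * r), math.sqrt(H / r)
    return n_cells(q - dq / 2, q + dq / 2) * n_cells(p - dp / 2, p + dp / 2)

random.seed(0)
for _ in range(10_000):
    q, p = random.uniform(-50, 50), random.uniform(-50, 50)
    r = math.exp(random.uniform(-2, 2))  # random aspect ratio Delta_Q/Delta_P
    # The region always straddles a cell boundary (up to measure-zero cases):
    assert cells_overlapped_by_measurement(q, p, r) >= 2
```

Because the region's area equals the cell area, at least one of its sides is no shorter than the lattice spacing, so a generically placed region necessarily straddles a boundary between neighbourhoods.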
The purpose of Sec. 3.1 is to discuss probability spaces that are capable of satisfying Kolmogorov's probability axioms [Kolmogorov, 1960][Borovkov, 2013][Varadhan, 2001][Jaynes, 2003]; and, in particular, some difficulties that arise when deriving a probability distribution for the location of $\Gamma_t$ in $\mathbb{G}$. It is the uncertainty principle that causes the difficulties, and which forces us to confront certain subtleties in the definitions of probability spaces.
In order to resolve the difficulties, while ensuring the empirical testability of the probability distributions that will be derived in Secs. 4 and 5, a detail will be added in Sec. 3.2 to the infinite set of measurements ($M$ measurements in the limit $M\to\infty$) performed on independently prepared physical systems that we imagined in Sec. 2.
This detail is a filtration of the $M$ results of those measurements: We will imagine defining an infinite set $\{p^{(\mathcal{C})}\}$ of different probability mass functions, $p^{(\mathcal{C})}$, each of which is consistent with a different subset of the $M\to\infty$ measurements, and each of whose domains is a different partition of $\mathbb{G}$. The introduction of this detail will clarify the true meaning of the apparently-false premise on which the derivations in Secs. 4 and 5 are built.
A brief clarification is that we can imagine that each function $p^{(\mathcal{C})}$ is assigned to a different agent, or `statistician', and each probability mass function $p$ that is derived in later sections is the function $p^{(\mathcal{C})}$ that has been assigned to one of those statisticians. This construction allows us to imagine calculating each distribution $p^{(\mathcal{C})}$ in two ways: The first is from the statistics gathered by the statistician to whom $p^{(\mathcal{C})}$ has been assigned (`Statistician $\mathcal{C}$'). The second is by using Jaynes' approach, as discussed in Sec. 2, and as will be used in Sec. 4, to theoretically derive the distribution that Statistician $\mathcal{C}$ would be unable to falsify.
3.1 Probability spaces
To develop a probabilistic description of the location of a microstate in its phase space, we must construct one or more probability spaces, each of which satisfies Kolmogorov's axioms of probability [Jaynes, 2003][Kolmogorov, 1960][Borovkov, 2013][Varadhan, 2001]. For simplicity, let us consider a physical system with a single DOF, whose microstate is $\Gamma_t\in\mathbb{G}$.
A probability space $(S,\Sigma,P)$ consists of a sample space $S$, a $\sigma$-algebra $\Sigma$, and a probability measure, $P:\Sigma\to[0,1]$. The sample space $S$ is the set of all mutually-exclusive outcomes or results of a trial or measurement, and the $\sigma$-algebra $\Sigma$ is the set of all events to which $P$ assigns probabilities.
$\Sigma$ is a cover of $S$, meaning that it is a collection of subsets whose union is the whole set. However, it is not necessary for the elements of $S$, which are the mutually exclusive outcomes, to be elements of $\Sigma$. The only properties of $\Sigma$ that are required for the probability space to satisfy the axioms of probability are that it is a set of subsets of $S$, which includes $S$ itself, and which is closed under countable unions ($A_1,A_2,\cdots\in\Sigma\implies \bigcup_{i=1}^\infty A_i\in\Sigma$), closed under countable intersections ($\bigcap_{i=1}^\infty A_i\in\Sigma$), and closed under complements ($A\in\Sigma\implies S\setminus A\in\Sigma$).
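In the finite case these closure properties can be verified mechanically. The following sketch, an illustrative toy rather than part of the formalism above, checks that the power set of a three-element sample space satisfies them; for a finite $\Sigma$, closure under countable unions and intersections reduces to closure under pairwise ones:

```python
from itertools import chain, combinations

def power_set(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(s, k) for k in range(len(s) + 1))}

S = frozenset({'a', 'b', 'c'})
sigma = power_set(S)  # the largest sigma-algebra on S

assert S in sigma                                         # contains S itself
assert all(S - A in sigma for A in sigma)                 # closed under complements
assert all(A | B in sigma for A in sigma for B in sigma)  # closed under unions
assert all(A & B in sigma for A in sigma for B in sigma)  # closed under intersections
```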
In Sec. 3.1.1, we will consider the construction of a probability space for the outcome of a measurement of a microstate, in order to show that constructing it is straightforward.
In Sec. 3.1.2 we will consider the construction of a probability space for the location of $\Gamma_t$ in $\mathbb{G}$ in order to show that the uncertainty principle makes an unbiased construction of a single probability space impossible. To avoid introducing bias, it is necessary to introduce an infinite number of probability spaces.
In Sec. 3.2, a logical construction will be outlined which resolves some conceptual difficulties that arise when describing the location of $\Gamma_t$ in $\mathbb{G}$ with an infinite number of probability distributions. This lays the logical foundations for the derivations of the Maxwell-Boltzmann and Bose-Einstein distributions presented in Sec. 4.2 and Sec. 5, respectively.
3.1.1 Probability space for the outcome of a measurement of $\Gamma_t$
The assumption made in Sec. 1.1 was that the outcome of an accurate and maximally precise measurement of the location of $\Gamma_t$ in $\mathbb{G}$ would be the identification of an element $(\Gamma,\mathfrak{r})$ of the set
$$\Omega\equiv\mathbb{G}\times\mathbb{R}^+.$$
Let $\mathfrak{M}:S\to\Omega$ denote a random variable that maps the outcome $o$ of a measurement of $\Gamma_t$ to an element $\mathfrak{M}(o)$ of $\Omega$; and let $\text{`}\Gamma_t\in\mathfrak{R}(\Gamma,\mathfrak{r})\text{'}$ represent the outcome of the measurement that $\mathfrak{M}$ would map to the point $(\Gamma,\mathfrak{r})\in\Omega$. Quotes are placed around $\Gamma_t\in\mathfrak{R}(\Gamma,\mathfrak{r})$ to indicate that $\text{`}\Gamma_t\in\mathfrak{R}(\Gamma,\mathfrak{r})\text{'}$ represents the measurement outcome, which is a revelation and a piece of information, rather than the location of $\Gamma_t$ that the information revealed implies.
The sample space for the measurement outcome is
$$\text{`}\Omega\text{'}\equiv\left\{\text{`}\Gamma_t\in\mathfrak{R}(\Gamma,\mathfrak{r})\text{'}:(\Gamma,\mathfrak{r})\in\Omega\right\}.$$
The elements of $\text{`}\Omega\text{'}$ are mutually exclusive because, assuming that $\Gamma_1\neq\Gamma_2$ and/or $\mathfrak{r}_1\neq\mathfrak{r}_2$, the outcome of the measurement can be $\text{`}\Gamma_t\in\mathfrak{R}(\Gamma_1,\mathfrak{r}_1)\text{'}$ or $\text{`}\Gamma_t\in\mathfrak{R}(\Gamma_2,\mathfrak{r}_2)\text{'}$, but not both. It cannot be both because the result of each measurement is the revelation of a single imprecisely specified location. Since $\mathfrak{R}(\Gamma_1,\mathfrak{r}_1)\neq\mathfrak{R}(\Gamma_2,\mathfrak{r}_2)$, $\text{`}\Gamma_t\in\mathfrak{R}(\Gamma_1,\mathfrak{r}_1)\text{'}$ and $\text{`}\Gamma_t\in\mathfrak{R}(\Gamma_2,\mathfrak{r}_2)\text{'}$ are two different revelations, and both cannot occur.
The mutual exclusivity of elements of $\text{`}\Omega\text{'}$ makes it straightforward to define a probability space $(\text{`}\Omega\text{'},\wp(\text{`}\Omega\text{'}),P_o)$ for the measurement outcome, whose $\sigma$-algebra is the power set $\wp(\text{`}\Omega\text{'})$ of $\text{`}\Omega\text{'}$. A probability density function $\rho_o:\Omega\to\mathbb{R}^+$ can also be defined such that, for any $A\subseteq\Omega$,
$$P_o\left(\left\{o\in\text{`}\Omega\text{'}:\mathfrak{M}(o)\in A\right\}\right)=\int_A\rho_o(\Gamma,\mathfrak{r})\,\mathrm{d}\Gamma\,\mathrm{d}\mathfrak{r}.$$
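Because the elements of $\text{`}\Omega\text{'}$ are mutually exclusive, such a probability space behaves like an ordinary discrete one. As a toy illustration (a discretisation introduced here only for concreteness; the weights are arbitrary stand-ins for $\rho_o$), consider finitely many outcomes $(\Gamma,\mathfrak{r})$ with probabilities given by a measure on the power set:

```python
import random

random.seed(1)
# Hypothetical discretisation of Omega: finitely many (Gamma, r) outcomes.
outcomes = [(g, r) for g in range(4) for r in (0.5, 1.0, 2.0)]
w = {o: random.random() for o in outcomes}  # unnormalised stand-in for rho_o
Z = sum(w.values())

def P_o(event):
    """Probability measure defined on the power set of the outcome set."""
    return sum(w[o] for o in event) / Z

whole = frozenset(outcomes)
A = frozenset(o for o in outcomes if o[0] < 2)
B = frozenset(o for o in outcomes if o[0] >= 2)
assert abs(P_o(whole) - 1.0) < 1e-12                # normalisation: P(S) = 1
assert abs(P_o(A | B) - (P_o(A) + P_o(B))) < 1e-12  # additivity on disjoint events
```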
Therefore, were it not for the uncertainty principle, it would not be necessary to draw attention to the distinction between the outcome $o\in \text{`}\Omega\text{'}$ of a measurement, and the element $\mathfrak{M}(o)$ of the measurable space $\Omega$ to which it is mapped by $\mathfrak{M}$. The uncertainty principle makes this distinction important, because it is the location of $\Gamma_t$ that we wish to model statistically, and we cannot directly use the range $\Omega$ of $\mathfrak{M}$ as the domain of a probability distribution for its location. We will explore the reasons for this next.
3.1.2 Probability space for the location of $\Gamma_t$ in $\mathbb{G}$
To understand why it is not straightforward to define a probability space for the location of $\Gamma_t$, consider that, although the elements of $\text{`}\Omega\text{'}$ are mutually exclusive, the elements of the set
$$\mathfrak{R}(\Omega)\equiv\left\{\mathfrak{R}(\Gamma,\mathfrak{r}):(\Gamma,\mathfrak{r})\in\Omega\right\}$$
are not: two regions $\mathfrak{R}(\Gamma_1,\mathfrak{r}_1)$ and $\mathfrak{R}(\Gamma_2,\mathfrak{r}_2)$ can overlap, in which case $\Gamma_t$ can be in both.
The fact that the elements of $\mathfrak{R}(\Omega)$ are not mutually exclusive means that it cannot be treated as a sample space for the purpose of building a probability space. However, the problem is more serious than this: $\mathfrak{R}(\Omega)$ cannot even be a subset of a probability space's $\sigma$-algebra, because probabilities cannot be assigned to intersections of elements of $\mathfrak{R}(\Omega)$, and because a $\sigma$-algebra must be closed under intersections of countable numbers of its elements.
For example, a probability cannot be assigned to the event $\Gamma_t\in\mathfrak{R}(\Gamma_1,\mathfrak{r}_1)\cap\mathfrak{R}(\Gamma_2,\mathfrak{r}_2)$, despite the fact that there is an intuitively clear sense in which $\Gamma_t\in\mathfrak{R}(\Gamma_1,\mathfrak{r}_1)\cap\mathfrak{R}(\Gamma_2,\mathfrak{r}_2)$ is a possibility. It cannot be assigned a probability because whether or not this possibility has been realised is unknowable. It is unknowable because the area of $\mathfrak{R}(\Gamma_1,\mathfrak{r}_1)\cap\mathfrak{R}(\Gamma_2,\mathfrak{r}_2)$ is less than ${h_?}$, which means that to know that $\Gamma_t\in\mathfrak{R}(\Gamma_1,\mathfrak{r}_1)\cap\mathfrak{R}(\Gamma_2,\mathfrak{r}_2)$ would imply a violation of the uncertainty principle.
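The claim that the overlap is always too small to resolve can be checked numerically. In the following sketch (a toy in units where ${h_?}=1$; axis-aligned rectangles stand in for the regions $\mathfrak{R}(\Gamma,\mathfrak{r})$), any two distinct regions of area ${h_?}$ intersect in an area strictly below ${h_?}$:

```python
import math
import random

H = 1.0  # stand-in for the precision limit h_?

def rect(q, p, r):
    """Rectangle R((q, p), r) of area H, as (qlo, qhi, plo, phi)."""
    dq, dp = math.sqrt(H * r), math.sqrt(H / r)
    return (q - dq / 2, q + dq / 2, p - dp / 2, p + dp / 2)

def intersection_area(a, b):
    """Area of the overlap of two axis-aligned rectangles."""
    qo = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    po = max(0.0, min(a[3], b[3]) - max(a[2], b[2]))
    return qo * po

random.seed(2)
for _ in range(10_000):
    r1 = rect(random.random(), random.random(), math.exp(random.uniform(-1, 1)))
    r2 = rect(random.random(), random.random(), math.exp(random.uniform(-1, 1)))
    # The overlap of two distinct maximal-precision regions is always sub-h_?:
    assert intersection_area(r1, r2) < H
```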
Therefore, in the context of defining the $\sigma$-algebra of a probability space, the probability of the event $\Gamma_t\in\mathfrak{R}(\Gamma_1,\mathfrak{r}_1)\cap\mathfrak{R}(\Gamma_2,\mathfrak{r}_2)$ must be left undefined.
One illustration of the problems that would arise if $\Gamma_t\in\mathfrak{R}(\Gamma_1,\mathfrak{r}_1)\cap\mathfrak{R}(\Gamma_2,\mathfrak{r}_2)$ were regarded as an event that could be assigned a nonzero probability is that the probability measure assigning it would be inconsistent with statistics gathered from an infinite number of measurements: the fraction of the measurements that would discover that the event $\Gamma_t\in\mathfrak{R}(\Gamma_1,\mathfrak{r}_1)\cap\mathfrak{R}(\Gamma_2,\mathfrak{r}_2)$ had occurred would be zero.
Therefore a probability space whose probability measure would be consistent with an infinite number of measurements must be built from a sample space $\mathcal{C}$ that is a cover of $\mathbb{G}$ whose elements are mutually disjoint subsets of $\mathbb{G}$ of area no less than ${h_?}$. I use the term disjoint in the unconventional weaker sense that sets $A$ and $B$ are disjoint if the measure $|A\cap B|$ of their intersection is zero. Therefore elements of $\mathcal{C}$ may share boundaries.
Unfortunately, there are an infinite number of covers of $\mathbb{G}$ that meet these specifications. Therefore there are an infinite number of probability spaces that could be built for the location of $\Gamma_t$ in $\mathbb{G}$, and choosing any one of them as the statistical model that describes $\Gamma_t$ would introduce bias. For example, in general, the expectation value of an observable calculated with the distribution defined on one cover differs from the expectation value of the same observable calculated with the distribution defined on another cover.
To avoid bias, we must define, or be aware of the existence of, an infinite number of probability spaces: There is one probability space, $(\mathcal{C},\wp(\mathcal{C}),P_\mathcal{C})$, and one probability distribution, $p^{(\mathcal{C})}:\mathcal{C}\to[0,1]; c\mapsto p^{(\mathcal{C})}(c)\equiv P_\mathcal{C}(c)$, for each cover $\mathcal{C}$.
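The bias can be seen in a toy computation (a 1-D stand-in with a hypothetical underlying density; not from the paper): two admissible covers of the same 'phase space' induce genuinely different coarse-grained distributions, so privileging either one would be an unjustified choice:

```python
import math
import random

random.seed(3)
# Toy 1-D stand-in: draw 'microstates' from a hypothetical known density.
samples = [random.gauss(0.0, 1.5) for _ in range(100_000)]

def coarse_distribution(shift):
    """p^(C) for the cover C whose cells are the intervals [shift+k, shift+k+1)."""
    counts = {}
    for x in samples:
        k = math.floor(x - shift)       # index of the cell containing x
        counts[k] = counts.get(k, 0) + 1
    return {k: n / len(samples) for k, n in counts.items()}

p1 = coarse_distribution(0.0)   # one admissible cover...
p2 = coarse_distribution(0.5)   # ...and a shifted one
# The two covers assign visibly different probabilities to 'the same' region:
assert abs(p1[0] - p2[0]) > 0.01
```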
The next step is to understand how each element of the infinite set $\{p^{(\mathcal{C})}\}$ of probability distributions could, in principle, be validated or invalidated by statistics from an infinite set of measurements of $\Gamma_t$, each of whose outcomes is an element of $\text{`}\Omega\text{'}$. If it is not possible to imagine calculating a distribution from statistics, rather than deriving it theoretically, it cannot be claimed that the theoretically derived distribution is empirically unfalsifiable.
3.2 An infinitude of statisticians
Let us restrict attention to the most informative probability distributions possible. Therefore, let us disregard covers whose elements are larger than necessary, and only consider probability distributions $p^{(\mathcal{C})}$ whose domains are covers $\mathcal{C}$ whose elements all have areas of exactly ${h_?}+\delta{h_?}$. Let us also take the limit $\delta{h_?}\to 0^+$, so that $\delta{h_?}$ can be regarded as both finite and arbitrarily small. Let $\Lambda$ denote the set of all covers that meet these specifications.
Now let us assume that the results of the $M\to \infty$ measurements are distributed among an infinite number of statisticians, such that there is exactly one statistician (`Statistician $\mathcal{C}$') for each $\mathcal{C}\in \Lambda$. Then let us imagine that each measurement whose outcome is $\text{`}\Gamma_t\in\mathfrak{R}(\Gamma,\mathfrak{r})\text{'}$ is communicated to all of the statisticians whose covers contain an element of which $\mathfrak{R}(\Gamma,\mathfrak{r})$ is a subset, and is not communicated to the rest of the statisticians.
Clearly, every measurement of $\Gamma_t$ determines that $\Gamma_t\in\mathbb{G}$. Therefore, by definition of a cover, every measurement determines that $\Gamma_t$ is in some element of every statistician's cover. However, we are supposing that each statistician learns the result of each measurement of $\Gamma_t$ if and only if the measurement has determined which element of their cover contains $\Gamma_t$. This can only be the case if the set $\mathfrak{R}(\Gamma,\mathfrak{r})$ that the measurement discovers $\Gamma_t$ to be in is a subset of an element of their cover.
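A minimal simulation of this filtration is sketched below (a 1-D toy with hypothetical parameters; the gap between the measured-interval size and the cell size is exaggerated so that a visible fraction of results gets communicated). Measurement results are distributed among two statisticians with shifted covers, each receiving only the results that fit inside a single cell of their cover:

```python
import math
import random

random.seed(4)
CELL = 1.0    # cell size, a stand-in for h_? + delta h_? (toy 1-D units)
WIDTH = 0.8   # measured-interval size; gap exaggerated for illustration

M = 100_000
measurements = []
for _ in range(M):
    x = random.gauss(0.0, 1.2)                     # true microstate (1-D stand-in)
    c = x + random.uniform(-WIDTH / 2, WIDTH / 2)  # reported centre; x lies inside
    measurements.append((c - WIDTH / 2, c + WIDTH / 2))

def statistician(shift):
    """p^(C) for the cover whose cells are [shift + k*CELL, shift + (k+1)*CELL),
    computed only from measurements whose interval fits inside one cell."""
    counts, total = {}, 0
    for lo, hi in measurements:
        k = math.floor((lo - shift) / CELL)   # cell containing the interval's start
        if hi <= shift + (k + 1) * CELL:      # whole interval inside cell k
            counts[k] = counts.get(k, 0) + 1
            total += 1
    return {k: n / total for k, n in counts.items()}, total

p_A, m_A = statistician(0.0)   # Statistician for one cover
p_B, m_B = statistician(0.5)   # Statistician for a shifted cover
assert m_A < M and m_B < M     # each statistician learns only a subset of results
```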
For each element $c$ of $\mathcal{C}$, Statistician $\mathcal{C}$ calculates the fraction, $p^{(\mathcal{C})}(c)$, of the total number $M_{\mathcal{C}}$ of measurement results communicated to them which determine that $\Gamma_t\in c$.

The next question to address is the following: If one of the $M$ measurements was chosen at random, is $p^{(\mathcal{C})}(c)$ the probability that $c$ contains the microstate of the sample being measured in that measurement?
In other words, is $\Pr(\Gamma_t\in c)=p^{(\mathcal{C})}(c)$? The first thing to note is that $\Pr(\Gamma_t\in c)$ is an unknowable probability, for the same reason that, in general, $\Gamma_t\in c$ is an untestable proposition: $\mathcal{P}_{\textrm{certain}}\equiv P_o(\{\text{`}\Gamma_t\in c\text{'}\})$ is the fraction of the $M$ measurements in which it is known that $\Gamma_t\in c$, and it is smaller than the fraction of measurements in which $\Gamma_t\in c$, because a measurement can establish that $\Gamma_t\in c$ only when the region $\mathfrak{R}(\Gamma,\mathfrak{r})$ that it identifies is a subset of $c$.
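The gap between 'known to be in $c$' and 'actually in $c$' can be made concrete with a toy simulation (a 1-D stand-in with hypothetical parameters): a measurement certifies that the microstate is in a cell only when its entire reported interval lies inside that cell, so the certain fraction undercounts the true fraction:

```python
import random

random.seed(5)
WIDTH = 0.8   # measured-interval length, below the cell size 1.0 (toy units)
M = 100_000

known, true = 0, 0
for _ in range(M):
    x = random.gauss(0.0, 1.2)                     # true microstate (1-D stand-in)
    c = x + random.uniform(-WIDTH / 2, WIDTH / 2)  # reported centre; x lies inside
    lo, hi = c - WIDTH / 2, c + WIDTH / 2
    if 0.0 <= x < 1.0:                             # microstate actually in the cell
        true += 1
    if 0.0 <= lo and hi <= 1.0:                    # outcome certifies it is there
        known += 1

assert known < true   # 'known to be in c' is strictly rarer than 'actually in c'
```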
To shed more light on the empirically-unanswerable question of whether $\Pr(\Gamma_t\in c)=p^{(\mathcal{C})}(c)$, let us consider the possibility that two covers, $\mathcal{C}_1$ and $\mathcal{C}_2$, share an element $c$, but that $p^{(\mathcal{C}_1)}(c)\neq p^{(\mathcal{C}_2)}(c)$.
In other words (and for clarity I will use the unjustifiable and unphysical assumption that it is possible to know the probabilities $\{\Pr(\Gamma_t\in g): g\subset \mathbb{G}\}$), $p^{(\mathcal{C}_{1})} (c)\neq p^{(\mathcal{C}_{2})} (c)$ would imply that there does not exist a constant $K$ such that $P_o(\text{`}\Gamma_t\in\mathfrak{R}(\Gamma,\mathfrak{r})\text{'})=K\Pr(\Gamma_t\in\mathfrak{R}(\Gamma,\mathfrak{r}))$ for all $(\Gamma,\mathfrak{r})\in\Omega$.
Not only can we not rule out the possibility that $K$ is not constant, it would be surprising if it were constant: It was mentioned in Sec. 1.1 that the measurement precisions $\Delta_{\mathrm{Q}}$ and $\Delta_{\mathrm{P}}$ depend in part on $\Gamma_t$ and in part on how the measurement of $\Gamma_t$ is performed. Therefore, if the measurement outcome is $\text{`}\Gamma_t\in\mathfrak{R}(\Gamma,\mathfrak{r})\text{'}$, the location of $\Gamma_t$ in $\mathbb{G}$ has played a part in determining $\mathfrak{r}$, in general. The fact that it has also played a part in determining $\Gamma$ is obvious.
However, the dependence of $K$ on the microstate of a physical system implies that $K$ depends on the system's Hamiltonian, which implies that it depends on what the physical system is. In other words, this dependence cannot be a universal limitation on the act of measuring a DOF's microstate.
Therefore, instead of abandoning the prospect of devising a universally-applicable statistical model, such as Bose-Einstein statistics, this dependence should be treated as one of the peculiarities of individual physical systems, or methods of measurement, that were discussed in Sec. 2.2.2, and whose effects on statistics must be accounted for before those statistics can be compared with the predictions of universally-applicable statistical models. When deriving a statistical model that is universally applicable, not only is it reasonable to assume that $K$ is the same for every $(\Gamma,\mathfrak{r})\in\Omega$; making that assumption appears to be unavoidable.
In other words, while bearing in mind that $p^{(\mathcal{C})}(c)\propto P_o(\text{`}\Gamma_t\in c\text{'})\propto\Pr(\Gamma_t\in c)$ is an empirically untestable proposition, let us use it as a rough approximation to a more nuanced and precise interpretation of $p^{(\mathcal{C})}(c)$. Then, so that we can derive a universally-applicable statistical model, we purposely neglect peculiarities of individual physical systems, and of samples of those systems. This entails assuming that the fraction of the $M_{\mathcal{C}}$ measurements revealed to Statistician $\mathcal{C}$ for which $\Gamma_t\in c$ equals the fraction of all $M$ measurements for which $\Gamma_t\in c$.
3.2.1 Justification of a working assumption used in the derivations
As discussed above, $p^{(\mathcal{C})}(c)=\Pr(\Gamma_t\in c)$ is an empirically untestable proposition, but is also the only reasonable assumption to make when deriving a generally-applicable unfalsifiable probability distribution. It is equivalent to the assumption that the number of measurements whose outcome is $\text{`}\Gamma_t\in\mathfrak{R}(\Gamma,\mathfrak{r})\text{'}$ is proportional to the number of measurements in which $\Gamma_t\in\mathfrak{R}(\Gamma,\mathfrak{r})$, with the same constant of proportionality for every $\Gamma$ and every $\mathfrak{r}$.
Under the assumption that $p^{(\mathcal{C})}(c)=\Pr(\Gamma_t\in c)$, we can justify the working assumption that it is possible to determine which element of cover $\mathcal{C}$ contains $\Gamma_t$ as follows: From the perspective of Statistician $\mathcal{C}$, the revelation of a measurement outcome to them can be regarded as their `measurement' of $\Gamma_t$. Therefore, from their perspective, each of their measurements determines which element of $\mathcal{C}$ contains $\Gamma_t$.
Then we can imagine that Statistician $\mathcal{C}$ calculates $p^{(\mathcal{C})}(c)$ from the results of their `measurements', and that if we are told the macrostate $\mathcal{M}$ that defines the measurements, we can theoretically derive a probability distribution whose domain is $\mathcal{C}$, and which agrees perfectly with $p^{(\mathcal{C})}(c)$, by eliminating all bias subject to the constraint that information $\mathcal{M}$ is true.
Each of the distributions derived in Sec. 4 and Sec. 5 can be interpreted as this theoretically-derived statistically-unfalsifiable probability distribution, where the statistics that fail to falsify it are those gathered by Statistician $\mathcal{C}$.