17. Summary

This work concludes now with three summaries, followed by a brief discussion of context and outlook. Each of the three summaries addresses a different one of the three objectives stated in Sec. 1.

17.1 Structure homogenization

This work lays some foundations of a theory of the relationship between a microstructure and its macrostructure. The microstructure is assumed to consist of one or more differentiable fields, ${\nu_i:\realone^n\to\realone}$, which fluctuate on the microscale $a$. The microstructure’s macrostructure is the set of all observable manifestations on the macroscale, ${L\ggg a}$, of the microstructure, ${\{\nu_i\}}$.

For the purposes of this summary, it will be assumed that the microstructure comprises a single scalar field, ${\nu:\realone^3\to\realone}$.

17.1.1 Premises of structure homogenization theory

Structure homogenization theory, in its most basic form, is founded on two premises.

The first premise is that the microstructure fluctuates on the microscale $a$ and the macroscale $L$, but there exists an intermediate mesoscale $l$, where ${a\ll l \ll L}$, on which its fluctuations are negligible. Therefore, on any mesoscopic domain, the average of the microscopic fluctuations of $\nu$ almost vanishes, and non-linear contributions to its macroscopic variations are negligible.

A more mathematical statement of this premise is that the Fourier transform ${\ftsnu\equiv\fourierspace{\nu}}$ of ${\nu}$ satisfies

\begin{align*} \int_{\abs{\kvec}<k_L} \abs{\ftsnu(\kvec)}^2\ddpow{3}{\kappa} \gg &\int_{\abs{\kvec}\in(k_L,k_a)} \abs{\ftsnu(\kvec)}^2\ddpow{3}{\kappa} \\ &\ll \int_{\abs{\kvec}>k_a} \abs{\ftsnu(\kvec)}^2\ddpow{3}{\kappa}, \tag{97} \end{align*}
where ${k_a\equiv 2\pi/\amax}$ and ${k_L\equiv 2\pi/\Lmin}$; where the microscale $a$ and the macroscale $L$ are defined by ${\epsilon\sim a\iff \epsilon<\amax}$ and ${\epsilon\sim L\iff \epsilon>\Lmin}$; and where ${\amax}$ is a property of the microstructure ($\nu$) and ${\Lmin}$ is determined by both the microstructure and the scale on which the microstructure is observed.
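The content of Eq. (97) can be checked numerically. The following sketch is one-dimensional and uses an assumed synthetic field (two pure sinusoids standing in for the macroscopic and microscopic fluctuations), so the band edges `k_L` and `k_a` are hypothetical; it verifies that the mesoscopic band ${\abs{\kvec}\in(k_L,k_a)}$ carries negligible spectral power relative to the two outer bands:

```python
# 1-D sketch of the scale-separation premise of Eq. (97): spectral power is
# concentrated below k_L and above k_a, with a quiet mesoscopic band between.
import numpy as np

N = 4096
x = np.linspace(0.0, 1.0, N, endpoint=False)

# Synthetic microstructure: slow macroscopic variation plus fast microscopic
# fluctuations, with (by construction) almost nothing in between.
nu = np.sin(2 * np.pi * 2 * x) + 0.5 * np.sin(2 * np.pi * 700 * x)

nu_hat = np.fft.rfft(nu)
k = np.fft.rfftfreq(N, d=1.0 / N)      # wavenumbers in cycles per unit length
power = np.abs(nu_hat) ** 2

k_L, k_a = 10.0, 300.0                 # assumed band edges, k_L << k_a
macro = power[k < k_L].sum()
meso = power[(k >= k_L) & (k <= k_a)].sum()
micro = power[k > k_a].sum()

# Eq. (97): both outer bands dominate the mesoscopic band.
print(macro > 100 * meso, micro > 100 * meso)
```

Because the two sinusoids are commensurate with the grid, all of the spectral power falls at the two chosen wavenumbers, and the mesoscopic band contains only floating-point noise.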

The second premise of structure homogenization theory is that when $\nu$ is observed or measured with a probe of macroscopic dimensions, such as the pupil of an eye, what is observed is a weighted spatial average of $\nu$ on a mesoscopic domain.
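A minimal sketch of this second premise, assuming a Gaussian probe of mesoscopic width $l$ (the field and all numbers below are hypothetical): the probe-weighted average of $\nu$ retains the macroscopic variation while suppressing the microscopic fluctuations.

```python
# Sketch of the second premise: observing nu with a probe of mesoscopic width l
# returns a weighted spatial average that keeps the macroscopic variation but
# averages away the microscopic fluctuations.
import numpy as np

N = 4096
x = np.linspace(0.0, 1.0, N, endpoint=False)
macro_part = np.sin(2 * np.pi * 2 * x)
nu = macro_part + 0.5 * np.sin(2 * np.pi * 700 * x)

l = 0.02                                 # mesoscopic probe width: a << l << L
kernel = np.exp(-0.5 * ((x - 0.5) / l) ** 2)
kernel /= kernel.sum()                   # normalized weighting function

# Periodic convolution via FFT: Nu(x) is the probe-weighted average of nu.
Nu = np.fft.irfft(np.fft.rfft(nu) * np.fft.rfft(np.fft.ifftshift(kernel)), N)

residual = np.abs(Nu - macro_part).max()
print(residual)                          # small: micro fluctuations averaged out
```

The raw field differs from its macroscopic part by fluctuations of amplitude 0.5; after the mesoscopic average the difference is reduced by more than an order of magnitude.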

17.1.2 Observable artefacts of structure homogenization

It turns out that the homogenization transformation introduces qualitative differences between a simple microstructure $\nu$ and any macrostructure defined by it. These mathematical peculiarities and observable artefacts of structure homogenization have some very important physical consequences.

They exist because perfect homogenization, meaning a total elimination of $\nu$’s microscopic fluctuations, is only possible in the limit in which the macrostructure is the average of $\nu$ over all points in its domain. This is the limit in which the counterpart of $\nu$ at the macroscale, ${\Nu:\realone^3\to\realone}$, is flat and featureless. For example, the Earth’s surface macrostructure is close to this limit in Voyager 1’s famous ‘pale blue dot’ photograph [Sagan, 1994].

Away from this limit, homogenization is imperfect, and spatial averages of $\nu$ are not uniquely defined: the average of $\nu$ on a mesoscopic domain is only defined to a finite precision. Therefore $\Nu$ itself can only be defined to a finite precision, $\precNu$.

The fact that $\Nu$ is only defined to a finite precision means that if the only way to distinguish between two points ${\bx_1, \bx_2 \in\realone^3}$ is to observe the difference between the values of $\Nu$ at those points, ${\Nu(\bx_1)-\Nu(\bx_2)}$, then there is a limit, $\prectheo$, to the precision with which positions and displacements can be measured or observed. An approximate relationship between $\precNu$, $\prectheo$, and the uncertainty $\precmom$ in the gradient of $\Nu$ is ${\prectheo\precmom\propto\precNu}$.
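The positional limit can be illustrated with a simplified numerical example in which the gradient of $\Nu$ is treated as exactly known, so that its local magnitude plays the role of $\precmom$'s scale; the profile and all numbers below are hypothetical.

```python
# Numeric illustration of the positional limit: if points can only be
# distinguished through Nu's values, and Nu is known only to precision prec_Nu,
# then position is fixed only to roughly prec_Nu / |grad Nu|.
import numpy as np

x = np.linspace(0.0, 1.0, 10001)
Nu = np.tanh(4 * (x - 0.5))             # smooth macroscopic profile (assumed)
prec_Nu = 0.01                          # finite precision of Nu

x0 = 0.55                               # true position
target = np.tanh(4 * (x0 - 0.5))

# All positions whose Nu-value is indistinguishable from the target value:
compatible = x[np.abs(Nu - target) < prec_Nu]
prec_x = compatible.max() - compatible.min()

grad = 4 / np.cosh(4 * (x0 - 0.5)) ** 2
print(prec_x, 2 * prec_Nu / grad)       # prec_x tracks prec_Nu / |grad Nu|
```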

The uncertainty in positions and displacements implies a one-to-many relationship between points $\bx$ at the macroscale and points ${x}$ at the microscale. Effectively, microstructure homogenization is a compression of space which causes all microscopic distances to vanish. This spatial compression causes surfaces and interfaces, which are ill-defined at the microscale because their widths are indeterminate, to become well-defined and locally planar (zero width) at the macroscale: Since the domain of $\nu$ is $\realone^3$, they become two-dimensional manifolds which, in general, carry excess fields. For example, homogenizing a material’s microscopic volumetric charge density $\rho$ not only defines a macroscopic analogue $\Rho$ of $\rho$ within the material, it also turns the material’s boundary into a two-dimensional manifold (the surface), which carries an areal charge density, $\bsigma$.

I have derived expressions that relate boundary excess fields to the microscopic fields whose homogenization created them. For example, I have derived an expression ${\bsigma[\rho]}$ relating the areal charge density at a surface to the microscopic volumetric charge density, $\rho$. This expression generalizes Finnis’s expression [Finnis, 1998] for the surface charge density of a crystal to amorphous microstructures.

17.2 Electrical macrostructure

I used the basic elements of the theory of structure homogenization to deduce how the microscopic fields $\rho$, $\me$, and $\phi$, that appear in Maxwell’s vacuum theory of electricity manifest as macroscopic fields, and to deduce the relationships between those macroscopic fields.

The set ${\{\rho,\me,\phi\}}$ not only defines a set ${\{\Rho,\E, \bphi\}}$ of macroscopic-counterpart fields; it also defines macroscopic excess fields on lower-dimensional manifolds, such as surfaces, interfaces, edges, line defects, and point defects. These manifolds and fields are created by the spatial compression that is intrinsic to structure homogenization.

The linearity of the spatial averaging operation that turns microstructure into macrostructure means that the relationships between $\rho$, $\phi$, and $\me$ are preserved by the homogenization transformation. Therefore ${\Rho = -\epsilon_0\laplacian\bphi}$ and ${\E=-\grad\bphi}$.

It is a well-known and obvious stability requirement that ${\Rho=0}$ in the bulk of every material. It follows that, in the bulk of a macroscopically-uniform material whose surfaces are charge-neutral, either $\bphi$ is constant and ${\E=0}$ or $\bphi$ is a linear function of position and $\E$ is constant.

Both the $\pp$ and $\D$ fields that appear in macroscopic electromagnetic theory have been interpreted, and their existence justified, in multiple mutually-inconsistent ways since Maxwell introduced them in the 19th century. I have pointed out that none of these interpretations or justifications is valid and that $\pp$ and $\D$ appear within physical theory for historical reasons only: $\pp$ is not observable, is not a necessary element of electromagnetic theory, cannot be defined uniquely, and is prohibited by macroscale symmetry. Furthermore, it continues to cause a great deal of confusion without adding to the utility of electromagnetic theory. Scrapping it removes the distinction between $\E$ and the electric displacement $\D$, so $\D$ should also be scrapped.

The only volumetric fields that are required at the macroscale are $\bphi$ and its derivatives $\E$ and $\Rho$; but the linearity of their interrelationships facilitates the decomposition of each one into components with distinct origins and effects. For example, when studying dielectric response it might be useful to write ${\E=\Eext + \DE}$ and ${\Rho=\Rho_0 + \DRho}$, where $\Eext$ is an externally-applied electric field, ${\DRho}$ is the change that it induces in the charge density, and $\DE$ is the field emanating from ${\DRho}$.

When studying the long-wavelength electric fields that emanate from modulations of the structure by optical lattice vibrations, it makes more sense to express these modulations directly as changes in charge density (${\Delta \rho}$ and/or ${\Delta\Rho}$) than as a diverging polarization field. The electric field can be calculated directly from the former, whereas the latter must be translated into a charge density before its field can be deduced. Furthermore, expressing these modulations as a charge density makes the qualitative difference between the long-wavelength limit (${\bk\to 0}$) and a rigid relative displacement of sublattices (${\bk=0}$) clearer: a longitudinal optical phonon of wavelength ${\lambda_L\sim L}$ creates an electric field of wavelength ${\lambda_L}$; but if the material’s surfaces are earthed, a rigid relative displacement of sublattices does not create any macroscopic field.

Therefore, if a material is at equilibrium, ${\Rho=0}$ implies that no macroscopic electric field can emanate from its bulk. The electric potential also vanishes unless it has a source. Therefore, if all surfaces of an electromagnetically-isolated (${\Eext=0}$) material are neutral, ${\E}$ and ${\bphi}$ both vanish in its bulk. This has important implications for materials physics.

The absence of a macroscopic field can also be understood as a demand of symmetry: symmetry is scale-dependent, and the bulks of all compositionally- and structurally-uniform materials are isotropic at the macroscale, regardless of their microstructures. A vector field that has a linear relationship with ${\Rho}$ cannot exist if ${\Rho}$ is uniform, because all directions are equivalent. Therefore if ${\E}$ does not vanish in the bulk of a homogeneous material, it is either externally applied or it emanates from an accumulation of charge at surfaces, interfaces, or other macroscopic heterogeneities.

On the macroscale, a material’s response to an external field ${\Eext}$ is the changing of the areal densities of charge at all points on surfaces and interfaces whose tangent planes are not parallel to ${\Eext}$. When ${\Eext}$ is perpendicular to two opposing surfaces, and parallel to all others, the net change in the macroscopic field in the material at equilibrium is ${\Eext-\Delta\bsigma/\epsilon_0}$ where ${\Delta\bsigma>0}$ is the magnitude of the changes in the surface charges induced by ${\Eext}$.

When a crystal possesses a spontaneous polarization field, by which I mean only that its microstructure lacks inversion symmetry, any surface perpendicular to an axis of anisotropy carries an areal charge density $\bsigma$ unless it is neutralized by extrinsic charges. A charged surface is unstable unless it is stabilized by another source of potential, such as an oppositely-charged surface.

The definition of surface charge density $\bsigma$ as the integral of $\Rho$ across the surface is equivalent to Finnis’s definition, which I have generalized to non-crystalline microstructures. By relating currents to changes of surface charge, Finnis’s result can be used to calculate the normal component of the current density $\J$ at any interface if the time dependence of $\rho$ at the interface is known.
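This definition can be illustrated with a one-dimensional sketch. The charge-density profile below is hypothetical: a dipolar microscopic part that integrates to zero, plus a net accumulation of charge `q_s`; the integral across the surface yields a well-defined areal density.

```python
# Sketch of sigma as the integral of the charge density across the surface:
# a model interface at z = 0 whose volumetric charge density integrates to a
# well-defined areal charge density.
import numpy as np

z = np.linspace(-5.0, 5.0, 20001)       # coordinate normal to the surface
dz = z[1] - z[0]

# Model profile: dipolar part (integrates to zero) plus a normalized Gaussian
# accumulation carrying total areal charge q_s.
q_s = 0.3
rho = -z * np.exp(-z ** 2) + q_s * np.exp(-z ** 2) / np.sqrt(np.pi)

sigma = rho.sum() * dz                  # areal charge density of the surface
print(sigma)
```

The dipolar part contributes nothing to $\bsigma$, however large its amplitude; only the net accumulation survives the integration.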

The current density in an insulator can also be calculated using the main practical result of the Modern Theory of Polarization (MTOP), which is a definition of the polarization current density $\Jconv$ in terms of the time-dependent microstructure of a material’s bulk. I have shown that this result follows from Finnis’s result and that quantum mechanics is not required to derive it. My derivations make clear that the original MTOP definition of polarization current [Resta, 1993; King-Smith and Vanderbilt, 1993; Vanderbilt and King-Smith, 1993] is exact: The fact that $\Jconv$ is expressed in terms of single particle states does not constitute an approximation.

17.3 Mathematical representations of classical microstructures

I have shown that all features of the mathematical structure of the quantum mechanical theory of electricity in materials that are relevant to this work are compatible with, or required features of, an internally-consistent statistical theory of a deterministic dynamical system of charged particles.

If a classical material comprised a large number of particles whose charges and masses were comparable in magnitude to those of electrons and nuclei, those particles would move so rapidly, and respond so sensitively to the act of observing them, that it would be impossible to observe their instantaneous positions or to follow their individual trajectories.

Therefore, as in quantum mechanics, the observable microstructure would not be the particles’ instantaneous configuration (set of positions), but their joint position probability distribution, ${\pdf=\pdf(\rvecsub{1}\cdots\rvecsub{\Nelec})}$. If a subset of the fast-moving classical particles were identical, the impossibility of following their trajectories would make them indistinguishable. Therefore any empirically-unfalsifiable pdf $\pdf$ would be symmetric with respect to the exchange of any two of the particles’ positions.

A coincidence point in the particles’ configuration space $\configspace$ is a configuration in which two or more particles occupy the same point in space. Since two massive particles cannot occupy precisely the same point in space, $\pdf$ must vanish at all coincidence points. This requirement implies that $\pdf$ has non-differentiable cusps at coincidence points unless its derivatives with respect to the coincident particles’ positions vanish sufficiently rapidly as coincidence points are approached. If $\pdf$ is non-differentiable at coincidence points, the information it possesses can be specified by an exchange-antisymmetric function ${\Psi=\sqrt{\pdf}e^{i\theta}\in\lebesgue(\realone^{3\Nelec})}$, which is differentiable for all practical purposes because it changes sign at coincidence points.

Any set of single-particle functions ${\{\varphi_i\}}$, which is a complete orthonormal basis of ${\lebesgue(\realone^3)}$, defines an infinite set of ${\Nelec}$-particle Slater determinants. The set of Slater determinants is a complete orthonormal basis of the subspace of ${\lebesgue(\realone^{3\Nelec})}$ that comprises its antisymmetric elements. Therefore the statistical state $\Psi$ of the microstructure of a set of mutually-repulsive and indistinguishable classical particles can be expressed exactly as a weighted sum of an infinite number of Slater determinants, or approximated by a weighted sum of a finite number of them.
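A minimal sketch of this representation, for two particles and an assumed sinusoidal orthonormal basis on ${[0,1]}$, exhibits the two defining properties of a Slater determinant: antisymmetry under exchange of positions, and vanishing at coincidence points.

```python
# Two-particle Slater determinant built from two orthonormal single-particle
# functions; it changes sign under exchange and vanishes at coincidence points.
import numpy as np

def phi(i, x):
    """Orthonormal single-particle basis functions on [0, 1] (assumed)."""
    return np.sqrt(2.0) * np.sin((i + 1) * np.pi * x)

def slater(xs):
    """Normalized 2-particle Slater determinant Psi(x1, x2)."""
    mat = np.array([[phi(i, x) for i in range(2)] for x in xs])
    return np.linalg.det(mat) / np.sqrt(2.0)

x1, x2 = 0.2, 0.7
print(slater([x1, x2]), slater([x2, x1]))   # equal magnitude, opposite sign
print(slater([x1, x1]))                      # zero at a coincidence point
```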

I have discussed various kinds of single-particle statistical states, or orbitals, in a many-particle system. Orbitals play important roles in both the MTOP and in commonly-used and commonly-taught models of chemical bonding.

If a material’s statistical microstructure changes as some stimulus $\zeta$ varies, it may be possible to express the number density of each of its constituent sets of identical particles exactly as ${n(\rvec;\zeta)=\sum_{i=1}^{\Nelec} n_i(\rvec;\zeta)}$, where ${n_i(\rvec;\zeta)\equiv\abs{\varphi_i(\rvec;\zeta)}^2}$, and ${\{\varphi_i(\zeta)\in\lebesgue(\realone^3)\}_{i=1}^{\Nelec}}$ is an orthonormal set whose elements vary continuously as $\zeta$ changes. Without invoking quantum mechanics I have shown that when ${n(\zeta)}$ admits such a representation, the polarization current that flows as $\zeta$ changes is given exactly by the MTOP expression [Resta, 1993; King-Smith and Vanderbilt, 1993; Vanderbilt and King-Smith, 1993],

\begin{align*} \Jconv=q\dot{\zeta}\sum_{i=1}^{\Nelec}\dvone{\zeta}\left(\int_{\realone^3}\ddpow{3}{r}\rvec n_i(\rvec;\zeta)\right), \end{align*}
where $q$ is the charge of each particle in the identical set. If the number density and its continuous dependence on $\zeta$ can be represented by one $\Nelec$-fold set of single-particle states, it can be represented by an infinite number of such sets, which are related to one another by rotations within the ${\Nelec}$-dimensional subspace of ${\lebesgue(\realone^3)}$ that they span.
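A one-dimensional sketch of this expression, for a single orbital assumed to be a normalized Gaussian translated rigidly by $\zeta$: the $\zeta$-derivative of the orbital's dipole moment is evaluated by finite differences, and the current is ${q\dot{\zeta}}$ times that derivative. The charge and rate below are hypothetical.

```python
# 1-D, single-orbital sketch of the MTOP polarization current: J_conv is
# q * zeta_dot times the zeta-derivative of the orbital dipole moment.
import numpy as np

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
q, zeta_dot = -1.0, 2.0                 # hypothetical charge and rate

def dipole(zeta):
    """First moment of n(x; zeta) for a normalized Gaussian centred at zeta."""
    n = np.exp(-(x - zeta) ** 2) / np.sqrt(np.pi)
    return np.sum(x * n) * dx

# Finite-difference derivative of the dipole moment with respect to zeta:
zeta, h = 0.3, 1e-4
dP_dzeta = (dipole(zeta + h) - dipole(zeta - h)) / (2 * h)
J_conv = q * zeta_dot * dP_dzeta
print(J_conv)                            # for a rigid shift, dipole = zeta
```

Because the orbital translates rigidly, its dipole moment equals $\zeta$ and the current reduces to ${q\dot{\zeta}}$, as expected for a rigidly-transported charge distribution.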

When the bulk of a crystal is represented in a torus, delocalized sets whose elements have the crystal’s periodicity, such as the eigenfunctions of single-particle effective Hamiltonians, are known as Bloch functions. Each set of Bloch functions can be transformed into any number of sets of localized Wannier functions. The most localized set is known as the set of maximally localized Wannier functions (MLWFs) [Ferreira and Parada, 1970; Marzari and Vanderbilt, 1997; Souza et al., 2001; Marzari et al., 2012]. I point out that MLWFs do not have an obvious physical interpretation, and that using them to analyse the natures of chemical bonds is not justified theoretically.

However, I point out that good reasons exist to attach physical meaning to so-called natural states and natural orbitals. Many of these reasons were summarized by Coleman [Coleman, 1963] and others [Löwdin, 1955; Davidson, 1972; McWeeny, 1960; Ando, 1963; Helbig et al., 2010; Goedecker and Umrigar, 1998; Giesbertz and van Leeuwen, 2013]; further reasons are illustrated by results derived in Appendix I and discussed in Sec. 14.

For example, I show that the expected energy, $E$, of a classical or quantum mechanical system of $\Nelec$ identical particles in a pure state can be expressed exactly as

\begin{align*} E&=\sum_\alpha \occ_\alpha \energy_{\alpha} + \sum_\alpha \sum_{\beta\geq\alpha}\sqrt{\occ_\alpha\occ_\beta} \, w_{\alpha\beta}, \tag{98} \end{align*}
where the sums are over the set ${\{\varphi_\alpha\}}$ of all natural orbitals; the orbital ‘occupations’ satisfy ${\occ_\alpha\in[0,1]}$ and ${\sum_\alpha\occ_\alpha=\Nelec}$; ${\energy_\alpha\equiv\expvaltwo{\hamsmall}{\varphi_\alpha}}$ is the expectation value of a single-particle Hamiltonian $\hamsmall$ for a particle in natural orbital ${\varphi_\alpha}$; and ${w_{\alpha\beta}}$ is the energy of a mediated coupling between particles in natural orbitals ${\varphi_\alpha}$ and ${\varphi_\beta}$.

If $E$ is expressed in terms of an arbitrary orthonormal set of orbitals, the expression is much more complicated than Eq. (98) and does not give a clear meaning to the concept of orbitals being occupied. Eq. (98) shows that, when interactions between particles are sufficiently weak, $E$ is approximately an occupation-weighted sum of the expected energies of particles that are statistically almost independent of one another.

The approximation ${E\approx\sum_\alpha\occ_\alpha\energy_\alpha}$ becomes an equality in the limit of infinitesimally weak interactions; and it is shown in Sec. 14.2 that, in that limit, the relationship ${\occ_\alpha\equiv\occ(\energy_\alpha)}$ between a natural orbital’s occupation number at thermal equilibrium and its energy ${\energy_\alpha}$ is a Fermi-Dirac distribution. Quantum mechanics is not invoked to draw this conclusion, so it applies equally to classical and quantum mechanical systems of indistinguishable particles.
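This limit can be sketched numerically: for a hypothetical set of natural-orbital energies, the equilibrium occupations follow a Fermi-Dirac distribution, with the chemical potential fixed by the particle-number constraint ${\sum_\alpha\occ_\alpha=\Nelec}$.

```python
# Sketch of the weak-interaction limit: natural-orbital occupations at thermal
# equilibrium are Fermi-Dirac, with the chemical potential mu fixed by the
# constraint sum(occ) = N_elec. All energies below are hypothetical.
import numpy as np

energies = np.linspace(0.0, 2.0, 40)    # assumed natural-orbital energies
N_elec = 10
kT = 0.05

def occupations(mu):
    arg = np.clip((energies - mu) / kT, -60.0, 60.0)   # avoid exp overflow
    return 1.0 / (np.exp(arg) + 1.0)

# Bisect on mu: total occupation increases monotonically with mu.
lo, hi = energies.min() - 10.0, energies.max() + 10.0
for _ in range(100):
    mu = 0.5 * (lo + hi)
    if occupations(mu).sum() < N_elec:
        lo = mu
    else:
        hi = mu

occ = occupations(mu)
print(occ.sum())                         # constraint satisfied; occ in [0, 1]
```

The occupations decrease monotonically with orbital energy and lie in ${[0,1]}$, as required of natural-orbital occupation numbers.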

17.4 Outlook

The main body of this work began in Sec. 4 with a brief description of the gross features of Maxwell’s theory of a luminiferous aether that occupies space and pervades all matter. The purpose of that description was to contrast Maxwell’s understanding of matter and materials with the current state of physical theory; and, on the basis of that contrast, to encourage the abandonment of unobservable and redundant features of his theory; namely, $\pp$ and $\M$.

Although Sec. 4 may have appeared critical of Maxwell’s work, the sequel to this work will argue that, in some important and fundamental ways, the physical picture that Maxwell espoused is a better likeness of reality than the one painted by modern textbooks. However, his is a theory of fields whose common domain is three-dimensional space rather than four-dimensional spacetime, because he developed it before Poincaré, Minkowski, Einstein and others had elevated time’s status from a parameter to one of the dimensions of spacetime [Walter, 2014; Miller, 1981; Darrigol, 2006].

Sec. 8 showed that qualitative differences can exist between a macrostructure and a base microstructure, such as those illustrated by Fig. 5, which are consequences of homogenization bringing about an uncertainty principle. The sequel to this work will attempt to generalize Maxwell’s ideas to a four-dimensional spacetime, and to further examine the observable artefacts of large differences in scale between an observer and their subject.

It will suggest that the observed structure of the microscopic world might not be what a microscopic observer would observe; and it will examine the possibility that point particles are not components of a base microstructure, but components of a macrostructure that possesses a submicrostructure. The microstructure $\nu$ considered in Sec. 8 is the submicrostructure of the macrostructure,

\begin{align*} \{\Nu\}\cup\{\bsigmaNuind{i}\}\cup\{\linechargeNuind{j}\}\cup\{\pointchargeNuind{k}\}, \end{align*}
and that macrostructure is the supermacrostructure of $\nu$.

It is shown in [Tangney, 2024], Sec. 14, and Appendices C, D, E, and F that an inviolable uncertainty principle, and large differences between the characteristic time scales of an observer and their subject, can explain most or all of quantum mechanics. Those two ingredients would imply that most or all ‘quantum mechanical’ aspects of our theory of electricity in materials are consistent with electrons and nuclei being as classical as billiard balls. In light of this, it is interesting to wonder how science might have progressed had physics not been diverted quite so radically by the experimental discovery of quantum mechanics early in the 20th century.

For generations, physicists and chemists have sought to understand the quantum world from within quantum mechanics. Relatively little effort has been put into the development of classical statistical mechanics; and even less effort has been put into trying to understand quantum mechanics from within the classical realm. It seems valid to question the wisdom of this imbalance of effort, because the common belief that quantum mechanics is somehow more general than classical statistical mechanics lacks a rigorous theoretical justification.

When Bohr coined the term correspondence principle [Bohr, 1920], he is unlikely to have anticipated it becoming mistaken for a reference to a rigorously-derived result or set of results [Falkenburg, 2009; Messiah, 1961]. Nevertheless, it is often used as an assertion that purports to summarize physical theory near the classical/quantum boundary, an underdeveloped region of physical theory that is not ripe for such brief and breezy summarization.

If it turns out that differences in time and length scales explain some or all of the qualitative differences between what humans observe of the microscopic world (point particles and quantized fields) and what we see around us on human scales, it will be important to examine the possibility that observations of a supermacrostructure contain artefacts of the observer occupying scales that are orders of magnitude smaller than their subject. In other words, it will raise the question of whether cosmologists draw their conclusions from observations of the apex macrostructure; and the question of whether they account for all artefacts of their observables occupying such large time and length scales.

