(The “if” part is trivial.) As is the case for many historical results, Wilson’s Theorem was *not* proven by Wilson. Instead, it was Joseph Lagrange who provided the first proof.

The proof, as we see it today, might be phrased as follows:

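*Proof:* Suppose that $p$ is prime. Then each of the nonzero residues modulo $p$ is a unit, so that $(p-1)!$ represents the product over all units in $\mathbb{Z}/p\mathbb{Z}$. If

$$a \not\equiv a^{-1} \pmod{p}, \quad \text{i.e. } a^2 \not\equiv 1 \pmod{p},$$

then $a$ and its inverse each show up in our list of units. We cancel out such terms in pairs, and conclude that

$$(p-1)! \equiv \prod_{a^2 \equiv 1} a \pmod{p}.$$

We have $a^2 \equiv 1 \pmod{p}$ if and only if $p \mid (a-1)(a+1)$, which by primality of $p$ forces $a \equiv 1$ or $a \equiv -1 \pmod{p}$. In other words, $a \equiv \pm 1$. It follows that

$$(p-1)! \equiv -1 \pmod{p}.$$

When $n$ is composite, the direct translation of Wilson’s problem gives

$$(n-1)! \equiv 0 \pmod{n} \quad (n > 4).$$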

The problem, here, is that we’ve multiplied a number of zero divisors together, which can be avoided by only multiplying across the units, $(\mathbb{Z}/n\mathbb{Z})^\times$, of $\mathbb{Z}/n\mathbb{Z}$. In this post, we consider the product

$$\prod_{a \in (\mathbb{Z}/n\mathbb{Z})^\times} a,$$

determine its value, give credit to Gauss for doing so over two centuries ago, and discuss a few generalizations.
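To make the contrast concrete, here is a minimal Python sketch comparing the full factorial $(n-1)!$ modulo $n$ with the product taken over the units only. The helper name `unit_product` is ours, chosen for illustration.

```python
from math import factorial, gcd

def unit_product(n):
    """Product of all residues coprime to n, reduced modulo n."""
    prod = 1
    for a in range(1, n):
        if gcd(a, n) == 1:
            prod = prod * a % n
    return prod

# Compare (n-1)! mod n with the product over the units of Z/nZ.
for n in (5, 7, 8, 9, 12, 15):
    print(n, factorial(n - 1) % n, unit_product(n))
```

For prime $n$ the two columns agree (every nonzero residue is a unit); for composite $n$ only the second column remains interesting.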

**— WILSON’S THEOREM FOR CYCLIC GROUPS —**

One particularly tedious proof from elementary number theory is the classification of cyclic unit groups. That is,

**Question:** *When is $(\mathbb{Z}/n\mathbb{Z})^\times$ a cyclic group?*

It turns out that $(\mathbb{Z}/n\mathbb{Z})^\times$ is cyclic if and only if $n$ takes one of the following forms:

$$n = 1, \; 2, \; 4, \; p^k, \; 2p^k$$

for an odd prime $p$ and $k \geq 1$. (This can also be proven using the structure theorem for finite abelian groups, which is a bit of overkill but is certainly easier in practice.) If $(\mathbb{Z}/n\mathbb{Z})^\times$ is cyclic, we have a cute proof of Wilson’s Theorem that extends beyond the case $n$ prime:
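The classification can be checked by brute force for small moduli. The sketch below assumes the standard list of forms ($2$, $4$, $p^k$, $2p^k$ with $p$ an odd prime) and compares it against a direct search for a generator; both function names are hypothetical.

```python
from math import gcd

def is_cyclic_unit_group(n):
    """Brute force: does some unit modulo n have order phi(n)?"""
    units = [a for a in range(1, n) if gcd(a, n) == 1]
    for g in units:
        x, order = g, 1
        while x != 1:
            x = x * g % n
            order += 1
        if order == len(units):
            return True
    return False

def has_classified_form(n):
    """Is n one of 2, 4, p^k, 2p^k for an odd prime p and k >= 1?"""
    if n in (2, 4):
        return True
    if n % 2 == 0:
        n //= 2
    if n % 2 == 0:          # n was divisible by 4 but not equal to 4
        return False
    p = 3                   # n is now odd and > 1: find its smallest prime factor
    while n % p:
        p += 2
    while n % p == 0:
        n //= p
    return n == 1           # true exactly when n was a power of p

print(all(is_cyclic_unit_group(n) == has_classified_form(n) for n in range(2, 200)))
```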

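**Theorem:** *If $(\mathbb{Z}/n\mathbb{Z})^\times$ is cyclic, then*

$$\prod_{a \in (\mathbb{Z}/n\mathbb{Z})^\times} a \equiv -1 \pmod{n}.$$

*Proof:* If $(\mathbb{Z}/n\mathbb{Z})^\times$ is cyclic, it admits a generator $g$. Since $g$ generates all of $(\mathbb{Z}/n\mathbb{Z})^\times$, we have

$$\prod_{a \in (\mathbb{Z}/n\mathbb{Z})^\times} a = \prod_{k=1}^{\varphi(n)} g^k = g^{\varphi(n)(\varphi(n)+1)/2}.$$

We recall that $g^m = 1$ if and only if $\varphi(n) \mid m$. In the case above, this divisibility condition amounts to whether or not $\varphi(n)+1$ is even, i.e. if $\varphi(n)$ is odd. This fails except in the cases $n \leq 2$ (where $1 \equiv -1$ anyway), whereby we see that the product in question cannot be $1$.

On the other hand, it is clear that our product is a square root of $1$. (Just look at the expression at right in the line above.) But a cyclic group only admits two square roots of $1$ (again, think about generators), so our product over the units must be the other square root of $1$, i.e. $-1$.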

**— GAUSS’ GENERALIZATION TO ACYCLIC GROUPS —**

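**Question:** *What about when $(\mathbb{Z}/n\mathbb{Z})^\times$ is not cyclic?*

This first occurs in the case $n = 8$, in which we calculate

$$1 \cdot 3 \cdot 5 \cdot 7 \equiv 1 \pmod{8}.$$

The next case is $n = 12$, which gives

$$1 \cdot 5 \cdot 7 \cdot 11 \equiv 1 \pmod{12}.$$

It’s not too hard to prove that the product over the units in $\mathbb{Z}/n\mathbb{Z}$ should be a square root of $1$. Unfortunately, this doesn’t pin down our answer very much in general. For example, *every* unit in $\mathbb{Z}/8\mathbb{Z}$ is a square root of $1$, and the same holds for $\mathbb{Z}/12\mathbb{Z}$, too! (It’s not always this bad, however.)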
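How badly "square root of $1$" fails to pin down the answer is easy to see numerically. A short sketch (the function name is ours) listing the square roots of $1$ for a few moduli:

```python
from math import gcd

def square_roots_of_one(n):
    """All residues a modulo n with a^2 = 1; each is automatically a unit."""
    return [a for a in range(1, n) if a * a % n == 1]

for n in (7, 8, 12, 15, 24):
    units = [a for a in range(1, n) if gcd(a, n) == 1]
    print(n, square_roots_of_one(n), len(units))
```

For $n = 8$, $12$, and $24$ the list of square roots coincides with the full list of units, while for $n = 15$ only four of the eight units qualify.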

In his *Disquisitiones Arithmeticae*, Gauss proved that the pattern emerging above continues. That is,

**Theorem (Gauss):** *The product over the units in $\mathbb{Z}/n\mathbb{Z}$ is $-1$ if $(\mathbb{Z}/n\mathbb{Z})^\times$ is cyclic. Otherwise, this product is $1$.*

*Proof:* Our proof begins with the fundamental theorem of finite abelian groups. If $n = p_1^{e_1} \cdots p_r^{e_r}$ as a factored product of distinct primes, then

It follows that

For , let be a generator of . Let denote the elements of the acyclic group . We check that the product over all units is thus

These powers of will vanish under the assumption that

which is met if and only if is even. This occurs so long as has a prime factor or a factor of with multiplicity at least . In other words, this occurs whenever is not cyclic. When is not cyclic, the product over all units in may therefore be written

It thus suffices to show our claim in the case , with . That is, that

We leave it as an Exercise that . If we let and represent generators for these two subgroups, respectively, then the product over units in may be written

Since is acyclic, it has exponent strictly less than its order, . Thus the exponent divides , hence the product above is , which completes our proof.

*Remarks — *I did not examine Gauss’ proof of the preceding Theorem (DA, art. 78) before deriving it independently. I had assumed that Gauss proved this fact using a direct, elementary approach. The opposite is true – Gauss’ proof mirrors ours given above, insofar as both require the structure theorem for finite abelian groups. Need I mention that the structure theorem for finite abelian groups is likewise due to Gauss?

**— WILSON’S THEOREM IN FINITE ABELIAN GROUPS —**

Nothing stops us from generalizing the proof in the previous section to the case of $G$, an arbitrary finite abelian group. In this case, our Theorem takes the following form:

**Theorem:** *If $G$ has a unique element $g$ of order $2$, then the product over the elements in $G$ is exactly $g$. Otherwise, this product is $1$.*

*Proof:* Recall that in our very first proof of Wilson’s Theorem, we have characterized this product as the product over all elements in $G$ of order exactly $2$. If there is a unique such element, say $g$, the product over all elements in $G$ is clearly $g$.

The “otherwise” claim is far more interesting. Note that the elements of $G$ with order at most $2$ form a subgroup $H$, which, by the structure theorem of finite abelian groups, is necessarily isomorphic to $(\mathbb{Z}/2\mathbb{Z})^k$ for some integer $k$. Fixing generators $g_1, \ldots, g_k$ for these components, we recognize the product over $H$ as

$$\prod_{h \in H} h = \prod_{i=1}^{k} g_i^{2^{k-1}} = 1,$$

in which we’ve used that the exponent of $H$ is $2$ (and that $k \neq 1$, so that $2^{k-1}$ is even whenever $H$ is non-trivial).
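The general statement is easy to test on products of cyclic groups. In the sketch below, a finite abelian group is modeled additively as $\mathbb{Z}/m_1 \times \cdots \times \mathbb{Z}/m_k$ (so the "product" over all elements becomes a sum and the identity is the zero tuple); this modeling and the function names are assumptions made for illustration.

```python
from itertools import product

def sum_of_all_elements(moduli):
    """Sum of every element of Z/m1 x ... x Z/mk, written additively."""
    total = [0] * len(moduli)
    for g in product(*(range(m) for m in moduli)):
        total = [(t + x) % m for t, x, m in zip(total, g, moduli)]
    return tuple(total)

def elements_of_order_two(moduli):
    """Non-identity elements g with g + g = 0."""
    return [g for g in product(*(range(m) for m in moduli))
            if any(g) and all((2 * x) % m == 0 for x, m in zip(g, moduli))]

for moduli in [(4,), (6,), (2, 2), (3, 5), (2, 4, 3)]:
    print(moduli, sum_of_all_elements(moduli), elements_of_order_two(moduli))
```

When exactly one element of order two exists, the printed sum equals it; in every other case the sum is the identity.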

*Remark:* To recover our earlier results from this very general theorem, we need only prove that $(\mathbb{Z}/n\mathbb{Z})^\times$ has a unique element of order $2$ if and only if $(\mathbb{Z}/n\mathbb{Z})^\times$ is cyclic. This follows from the classification of cyclic unit groups and careful consideration of the Euler phi function; navigating this was the major obstruction in the theorem we credited to Gauss. (Of course, Gauss would not have considered the problem in this way.)

**— EXERCISES —**

**Exercise:** Prove Wilson’s Theorem by calculating the number of $p$-Sylow subgroups of the symmetric group $S_p$.

**Exercise:** Deduce Wilson’s Theorem from Burnside’s Lemma by considering the following group action: cyclic permutation of the set of $p$-cycles in $S_p$.

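**Exercise:** Prove that $(\mathbb{Z}/2^k\mathbb{Z})^\times \cong \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2^{k-2}\mathbb{Z}$ for $k \geq 2$. *Hint: What is the order of $5$ in this group? Look at $(1+4)^{2^j}$ via the binomial theorem.*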


**Question:** Let $S$ be a set of positive integers totaling 20. What is the maximum value of

$$\prod_{s \in S} s \, ?$$

It’s a fun problem, so don’t rush past the spoiler tags too fast. When you’re ready, I’ll spoil the solution to the question above, and discuss a “continuous” version of the question above. Namely, what happens when $S$ is allowed to include positive real numbers?

As promised, here’s one solution to the question posed before the break:

*Solution:* Suppose that $S$ is a set of positive integers totaling 20. If for some $s \in S$ we have $s > 4$, then we increase our product over terms in $S$ by replacing $s$ by the two elements $2$ and $s-2$. We may therefore conclude that each $s$ is at most 4. In fact, we may assume without loss of generality that each $s$ is at most 3, since any occurrence of 4 may just as easily be the pair {2,2}.

From the other direction, we may assume that $s \geq 2$ for all $s \in S$, since

$$1 \cdot s < s + 1.$$

In other words, the set that maximizes our product consists (without loss of generality) of just 2’s and 3’s. Moreover, the inequality

$$2 \cdot 2 \cdot 2 < 3 \cdot 3$$

tells us that 2 occurs in our maximal product at most twice. The set is then determined uniquely, as

$$S = \{2, 3, 3, 3, 3, 3, 3\}.$$

For this set, our product is $2 \cdot 3^6 = 1458$, and this product is maximal.
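The value 1458 can be confirmed independently of the argument above. The following sketch uses a memoized recursion over the first summand, which is a different technique from the exchange argument in the solution; the function name is ours.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def max_product(total):
    """Largest product of positive integers summing to `total` (total >= 1)."""
    best = total  # the single-element collection {total}
    for first in range(1, total):
        best = max(best, first * max_product(total - first))
    return best

print(max_product(20))
```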

Older readers may recognize this solution from an earlier post, where it was used to produce a crude upper bound on the maximal orders of elements in the permutation group $S_n$. We won’t be heading back down that path today. Rather, let’s turn to the following, continuous analogue of our earlier question:

**Question:** Let $S$ be a set of positive real numbers totaling $N$. What is the maximal product of the set $S$, as $S$ varies?

If we want to follow the discrete case, we quickly prove that in a maximal set we may assume that implies

.

However, unlike in the discrete case, this leaves a spectrum of possibilities. Is it obvious, for example, that our product should be bounded? Could an optimal solution have infinitely many terms in $S$?

As it happens, neither of these pathologies arises, because an infinite product either has

- Product equal to zero, or
- Infinitely many terms that do not limit to zero, which contradicts our restriction on the sum of $S$.

With that worry at ease, we present our solution to the continuous case:

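*Solution:* Our solution uses the general idea of perturbation theory: if $a, b$ are both in $S$, let’s consider what happens when we replace the pair $\{a, b\}$ with its arithmetic mean $(a+b)/2$, occurring twice. Since

$$ab \leq \left(\frac{a+b}{2}\right)^2 \iff \sqrt{ab} \leq \frac{a+b}{2},$$

and this latter inequality follows from the AM-GM inequality, it follows that we can always increase our product by averaging out the terms in $S$. In a set with maximal product, we may assume that every term in $S$ is equal. If $S$ has $k$ terms, then

$$s = \frac{N}{k} \quad \text{for all } s \in S,$$

and the product over $S$ is $(N/k)^k$.

For which integer $k$ is this maximized? Let $x = k/N$; then we seek to maximize

$$f(x) = \left(\frac{1}{x}\right)^{Nx} = \left(x^{-x}\right)^N,$$

with $x$ an integer multiple of $1/N$. If we relax our assumptions on $x$ and merely assume that $x$ is real, then $x^{-x}$ (and hence $f$) is uniquely maximized for $x = 1/e$, increasing before this point and decreasing after it. Therefore, restricting to $x \in \frac{1}{N}\mathbb{Z}$, our maximum is attained at one of the following two points:

$$x = \frac{\lfloor N/e \rfloor}{N}, \qquad x = \frac{\lceil N/e \rceil}{N},$$

which in either case gives us the maximal product

$$\max\left\{ \left(\frac{N}{\lfloor N/e \rfloor}\right)^{\lfloor N/e \rfloor}, \; \left(\frac{N}{\lceil N/e \rceil}\right)^{\lceil N/e \rceil} \right\}. \qquad (1)$$

So, which one of the terms above is largest? Based on the work that got us this far, it stands to reason that our answer should correlate (if not correspond) to whichever one of the two approximations

$$\frac{\lfloor N/e \rfloor}{N}, \qquad \frac{\lceil N/e \rceil}{N}$$

is closer to $1/e$. Indeed, if the function we were maximizing, namely $x^{-x}$, was symmetric about its unique maximum, our answer would be this simple: take the choice that corresponds to the better approximation. While this claim fails, our intuition in the matter nevertheless carries through.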
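A quick numerical check of the reduction to two candidates: the sketch below assumes the product of $k$ equal terms summing to `total` is $(\text{total}/k)^k$, works with its logarithm to avoid overflow, and compares a brute-force maximizer against the floor and ceiling of `total / e`. The function names are hypothetical.

```python
import math

def log_product(total, k):
    """Logarithm of (total / k) ** k, the product of k equal terms."""
    return k * math.log(total / k)

def best_term_count(total):
    """Brute-force search for the number of equal terms maximizing the product."""
    candidates = range(1, 2 * math.ceil(total) + 1)
    return max(candidates, key=lambda k: log_product(total, k))

for total in (10, 20, 100):
    k = best_term_count(total)
    lo, hi = math.floor(total / math.e), math.ceil(total / math.e)
    print(total, k, (lo, hi))
```

In each case the brute-force answer lands on one of the two neighbors of `total / e`, but not always on the same side.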

We can prove the following result, which says that each choice in the maximum from line (1) occurs with equal probability:

**Theorem:** The set of integers $N$ such that the maximum in line (1) occurs with the first term has natural density 1/2.

*Proof: *For any , there exists a positive such that

for all integers . Let , in which denotes the fractional part of the real number , and let denote the set of integers

It follows that

for all . Expanding in a power series at , we find

Therefore, there exists an interval such that for all , we have

Provided that is taken large enough so that both and both lie in the interval , we find that

Thus (2) holds on a set of natural density equal to the natural density of the set. But has natural density , by the equidistribution theorem, which gives (2) on a set of density for any . Conversely, one may show that

(see the Exercises), which implies that (2) fails for a set of natural density 1/2. Since (2) holds for $N$ if and only if the first term in line (1) is maximal, this gives our result.
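The density statement can be probed empirically. This sketch assumes the "first term" refers to the floor of $N/e$ and that the product of $k$ equal terms is $(N/k)^k$; it counts how often the floor beats the ceiling and is an illustration only, not a substitute for the proof.

```python
import math

def floor_wins(total):
    """True when floor(total / e) equal terms give a larger product than ceil(total / e)."""
    lo = math.floor(total / math.e)
    hi = lo + 1  # total / e is never an integer for integer total >= 1
    return lo * math.log(total / lo) > hi * math.log(total / hi)

def empirical_density(limit):
    """Fraction of integers 3 <= N <= limit for which the floor wins."""
    wins = sum(floor_wins(n) for n in range(3, limit + 1))
    return wins / (limit - 2)

print(empirical_density(20000))
```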

**— EXERCISES —**

**Exercise:** If $S$ is a set of positive rationals totaling $N$, what is the maximal product over the elements in $S$?

**Exercise: **Use the degree two Taylor expansion of about the point to show that the claim in line (3) holds.

**Exercise: **Show that the interval can be explicitly bounded from inside, and use this (plus the effective equidistribution theorem) to give an effective version of our final theorem. *(Hard.)*


1. The group of biholomorphic maps (those that respect the structure of as a Riemann surface). It is well-known that such maps are given by Möbius transformations, i.e. rational functions of the form

satisfying . The group of Möbius transformations (also known as the *Möbius Group* and herein denoted ) is naturally isomorphic to , the projective (special) linear group, via:

2. The group of *conformal* maps , denoted for brevity. To be clear, here we refer to those maps which preserve **unsigned** angle measure. *(In contrast, some authors require conformal maps to preserve orientation as well.)* We recall the fundamental result that such maps contain the Möbius group as a subgroup of index two. To be specific, any conformal self-map on is either biholomorphic (returning to case (1)), or bijective and *anti-holomorphic*: a biholomorphic function of the complex conjugate .
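The identification of Möbius transformations with $2 \times 2$ matrices up to scaling, mentioned in item 1, rests on the fact that composing maps corresponds to multiplying matrices. A minimal numerical sketch follows; the tuple encoding `(a, b, c, d)` for $z \mapsto (az+b)/(cz+d)$ is our own convention.

```python
def mobius(M):
    """Return the map z -> (a z + b) / (c z + d) for M = (a, b, c, d)."""
    a, b, c, d = M
    return lambda z: (a * z + b) / (c * z + d)

def mat_mul(A, B):
    """Product of 2x2 matrices stored as flat tuples (a, b, c, d)."""
    a, b, c, d = A
    e, f, g, h = B
    return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)

A = (1, 2, 3, 4)
B = (2, -1, 1, 1)
z = 0.3 + 0.7j

composed = mobius(A)(mobius(B)(z))       # apply B, then A
via_matrix = mobius(mat_mul(A, B))(z)    # single map from the matrix product
scaled = mobius(tuple(5 * x for x in A))(z)

print(abs(composed - via_matrix), abs(scaled - mobius(A)(z)))
```

Both printed differences are at the level of rounding error: the second reflects that scalar multiples of a matrix define the same map, which is why the projective group appears.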

After the fold, we begin a two-part program to calculate the maximal $n$ such that the symmetric group $S_n$ injects into the Möbius group (resp. the group of conformal maps). Along the way, we study injections of the alternating group $A_n$ into the Möbius group, and highlight some exceptional cases in which our injections can be attached to group actions on a finite invariant set.

**— PART I **(Injections )** —**

Our first goal will be to verify the existence of injective maps . While it suffices here to provide a single example, it is infinitely more enlightening to show how such an example may be found. As a happy consequence, we’ll end up classifying all possible images up to inner automorphism. To be specific,

**Theorem 1.1:** *There exists an injection of $S_4$ into the Möbius group. Moreover, this injection is unique (up to inner automorphism on the Möbius group).*

*Proof: *Recall the group presentation

Now, suppose that is given, and let denote the images of , respectively. Because is elliptic, we may assume (up to inner automorphism on ), in which is primitive. Since has order two if and only if , takes the form

Secondly, the relation (with plenty of algebra!) forces

as a polynomial in . Moreover, since has order *exactly* four, we find (else has order dividing two). Next, because, we can rewrite this as , hence . In particular, is non-zero, by a determinant calculation. We may thus projectivize , such that

(1)

in which we have used that . Because the centralizer contains the subgroup of diagonal (projective) matrices, we may freely conjugate by diagonal elements. In particular, we note that

hence is conjugate (via ) to a Möbius transformation mapping. In other words, we may assume in line (1) – again, up to inner automorphism on . In this sense, is uniquely determined by , whereby there exist at most two injections modulo inner automorphism: one for each choice of among the roots of .

As it happens, each choice of induces a valid injection (this is a finite computation). As a final claim, we leave it to the reader to show that conjugation by the Möbius transformation fixes while interchanging the choices for associated to and . That is, *any* two injections differ by an inner automorphism of , precisely as claimed.
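The existence half of Theorem 1.1 can be sanity-checked by machine. The sketch below does not use the generators from the proof above; instead it takes two quarter-turn rotations of the Riemann sphere, $z \mapsto iz$ and $z \mapsto (z+1)/(1-z)$ (our own choice, permuting the points $0, \infty, \pm 1, \pm i$), and closes them up under composition. These generate the rotation group of the octahedron, which is isomorphic to $S_4$.

```python
from itertools import combinations

def mat_mul(A, B):
    """Product of 2x2 matrices stored as flat tuples (a, b, c, d)."""
    a, b, c, d = A
    e, f, g, h = B
    return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)

def same_mobius(A, B, tol=1e-9):
    """Two matrices give the same Möbius map iff they are proportional,
    i.e. every 2x2 minor of the pair of coefficient vectors vanishes."""
    return all(abs(A[i] * B[j] - A[j] * B[i]) < tol
               for i, j in combinations(range(4), 2))

def generate(gens):
    """Close a set of Möbius maps under composition (breadth-first)."""
    identity = (1, 0, 0, 1)
    elems, frontier = [identity], [identity]
    while frontier:
        new = []
        for g in frontier:
            for s in gens:
                h = mat_mul(g, s)
                if not any(same_mobius(h, e) for e in elems):
                    elems.append(h)
                    new.append(h)
        frontier = new
    return elems

sigma = (1j, 0, 0, 1)   # z -> i z
tau = (1, 1, -1, 1)     # z -> (z + 1) / (1 - z)
group = generate([sigma, tau])
print(len(group))
```

The count confirms a subgroup of the expected order; it does not address the uniqueness claim, which is the substance of the proof.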

Next, we claim that the constant taken in Theorem 1.1 is maximal. Here, two avenues of proof seem promising:

- A direct proof (along the lines of Theorem 1.1), built from a manageable group presentation for .
- A proof built upon Theorem 1.1, noting that any injection restricts to a map on of the type studied in Theorem 1.1.

Both work well. The first is given below, while the second is sketched in the Exercises:

**Theorem 1.2:** *There exists no injection of $S_5$ into the Möbius group.*

*Proof: *We begin with the group presentation

known at the time of Burnside’s article *Note on the Symmetric Group* (1897). If exists, let denote the images of , respectively. For definiteness, we may assume that (up to inner automorphism). For, we recall that if and only if . Thus takes the form

and some computer-assisted algebra quickly gives us the following relations:

The first of these implies , whereas the second yields

Since , we obtain , which contradicts that .

**— PART II **(Injections ) **—**

The questions answered in the previous section admit natural analogues with in place of . Answering them, however, will be a bit harder than before, owing to the greater complexity of the group structure on . For this reason, we try when possible to reduce questions regarding to questions of alone.

For example, the identification induces a semi-direct product structure on ; namely,

,

in which the -action is given by complex conjugation. It follows that any injective map induces a restricted map , i.e. an injection of the alternating group into the Möbius group. To see how this might benefit us, consider the following:

**Proposition 2.1:** *There exists no injection of $A_6$ into the Möbius group. It follows that $S_6$ does not inject into the group of conformal maps.*

*Proof: *See the Exercises.

As it turns out, there *does* exist an injection of $A_5$ into the Möbius group, which is maximal in the sense of the preceding Proposition. Unfortunately, this fact does not a priori *imply* that $S_5$ injects into the group of conformal maps (so Proposition 2.1 won’t serve a major role in our classification of symmetric subgroups). In fact, the opposite is true:

**Theorem 2.2:** *There exists no injection of $S_5$ into the group of conformal maps.*

*Proof: *As in Theorem 1.2, we begin with the group presentation

.

Suppose that exists. Since has cycle type , it follows that, hence lies in . Regarded as an element of , is torsion (hence elliptic), so we may assume that , in which (up to inner automorphism). As by Theorem 1.2, we may write

(in general form). The relation holds if and only if

(2) , , and .

(Case 1): If , we may scale ; then and . It follows that , in which or . If , we then have, which implies that , a contradiction. Thus , and we may write

On the other hand, that forces , an impossibility. This case is therefore untenable, and we turn to:

(Case 2): If , we may projectivize to assume ; then and . This last expression is both real and (purely) imaginary, hence . Thus , and takes the form

As for the relation , it follows that . Since, this implies . After substituting this into our expression for , we find that the relation forces

Coupled with , we find , but this contradicts that has order five. Thus there exists no injection .

We note that Theorem 2.2 recovers Theorem 1.2. Regardless, Theorem 2.2 can’t be used to prove Theorem 1.2, because we have invoked that first theorem in assuming . Avoiding such circularity would require — in essence — reproving Theorem 1.2 as an early claim.

For free, we obtain

**Corollary 2.3:** *There exists an injection of $S_n$ into the group of conformal maps if and only if $n \leq 4$.*

**— PART III **(Group Actions and Invariant Sets) **—**

We’ve seen that contains no symmetric subgroups beyond those that inject into , despite the fact that contains properly as an index two subgroup. Why, then, might we be interested in subgroups not contained in ?

For starters, let be any injection, and consider the sequence of maps

in which denotes the quotient of by its normal subgroup . As mentioned previously, we have . (In particular, we cannot have, Klein’s four group, despite the fact that is normal in .) Thus if and only if . When these equivalent conditions hold, membership in is detected by the sign character on . In this one sense, the theory of symmetric subgroups in is more robust than the analogue theory in .

There is a second, much more compelling reason to study symmetric subgroups in , which begins with the following construction:

Let be any finite set, , and suppose that is a subgroup of conformal mappings such that each permutes the points of . (In other words, is an *invariant set* for the action of .) This gives an induced map , known as the *permutation representation (associated to the group action )*. Our interest in such maps is simple: if bijects, then is exactly the sort of map we’ve been looking for.

If this permutation representation bijects, then our general theory implies. Of these , the extremal case naturally presents the greatest interest. Here, there are two cases to consider:

- All points in lie on a circle in . Up to inner automorphism, we may assume that , with .
- The points in lie on no common circle, but we may assume after conjugation that is given by , with .

If case (1) holds, note that we may assume , by modding out by the trivial action of . As it turns out, however, case (1) cannot actually occur:

**Example 3.1: **Suppose that surjects, and that case (1) holds. We may assume . If corresponds to the 3-cycle , brief computation shows that

Since fixes , we must have . But this polynomial has no real roots, which contradicts that . It follows that case (1) cannot occur.

Note: If case (2) *can* be realized, this gives strong motive to consider injections of into , as opposed to injections into the smaller group . As it happens, case (2) *is* realized, in an essentially unique way.

To prove this result, it will be advantageous to present first a general Lemma. For the moment, let’s relax our assumptions and suppose that the permutation representation merely surjects. By the previous Example, it follows that the invariant set has (after inner automorphism). As the following (general) Lemma shows, the map injects without further hypotheses:

**Lemma 3.2: ***Suppose that contains four points not lying on a circle in . If fixes each point in , then . In other words, the permutation representation injects.*

*Proof: *Suppose that a non-identity element fixes . Conjugating by an element of , we may assume that contains , with. We note that , as acts sharply 3-transitively on . Thus, writing with , it follows that fixes the three points , and . Because acts sharply 3-transitively on , we have. Thus , which fixes (and nothing else). This contradicts that fixes , whence no such exists.

And now, our final Theorem:

**Theorem 3.3: ***There exists a subgroup and an invariant set ,, such that the permutation representation bijects. If and are as and above, there exists an element such that and . It follows that the associated injection is unique up to inner automorphism.*

*Proof: *Suppose that and exist as above. Up to inner automorphism by, we may assume that , with . Moreover, Example 3.1 implies that is a root of . In the group presentation

we may take , wherein is as in Example 3.1. Since the image of (, say) is anti-holomorphic, hence a simple transposition (by order). Moreover, can be chosen to fix while transposing , as is complete. Thus takes the form , for which we note that surjects.

By Lemma 3.2, we have . As such, is uniquely determined by, and is determined by choice of (among the roots of ). These two choices of are related by conjugation by , which gives uniqueness of (in the sense of this Theorem). It follows that is unique up to inner automorphism, as claimed.

**— EXERCISES —**

**Exercise: **Herein, we outline a second proof of Theorem 1.2. Assuming that injects, let denote any subgroup isomorphic to . By Theorem 1.1, we may assume that is generated by

and ,

in which is some primitive cube root of unity. Show that contains an element that commutes with such that and. Use these relations to find a contradiction.

**Exercise: **Using the group presentation

,

show that contains a subgroup isomorphic to (which is unique up to inner automorphism). Then, using the group presentation

,

show that no injection exists.


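**Theorem (Product Rule):** *Let $f$ and $g$ be differentiable on the open set $U$. Then $fg$ is differentiable on $U$, and we have*

$$(fg)'(x) = f'(x)g(x) + f(x)g'(x)$$

*for all $x \in U$.*

*Proof:* For $x \in U$, we have (by definition of the derivative)

$$(fg)'(x) = \lim_{h \to 0} \frac{f(x+h)g(x+h) - f(x)g(x+h) + f(x)g(x+h) - f(x)g(x)}{h} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}\, g(x+h) + \lim_{h \to 0} f(x)\, \frac{g(x+h) - g(x)}{h},$$

under the assumption that each of these last two limits exists. This of course holds, as these limits are $f'(x)g(x)$ and $f(x)g'(x)$, respectively.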

All in all, then, the product rule is easy to prove and easy to use. But, and this is of utmost pedagogical importance, *is the product rule intuitive?* By this proof alone, I would argue not; the manipulation of the numerator is weakly-motivated and our result falls out without reference to more general phenomena.

In this post, we’ll explore the merits of a second proof of the product rule, one that I hope presents a motivated and compelling argument as to **why** the product rule should look the way it does.

**— PART I (PRODUCTS AND CHAINS) —**

In what sense, if any, should the product rule be natural? As suggested in the introduction, the derivative is — fundamentally — a linear operator. What business, then, does the derivative have in respecting products?

In one sense, very little. To hash this thought out more fully, let $A$ be an algebra over the ring $R$. An $R$-linear operator $D: A \to A$ is called a *derivation* if $D$ satisfies the product rule, i.e.

$$D(ab) = D(a)\,b + a\,D(b)$$

for all $a, b \in A$. Derivations can be thought of as formal counterparts of the derivative, and in this light we make two observations: firstly, that the product rule shines as the *characteristic property* of derivations; secondly, that this holds because (and only because!) we have prescribed it.
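As a concrete instance of a derivation, consider the formal derivative on polynomials with integer coefficients. The sketch below stores a polynomial as a coefficient list (constant term first), which is our own encoding, and checks the defining identity exactly on one pair of inputs.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (constant term first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Add polynomials, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def derive(p):
    """Formal derivative: sum a_i x^i  ->  sum i a_i x^(i-1)."""
    return [i * a for i, a in enumerate(p)][1:] or [0]

def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

p = [1, 2, 3]       # 1 + 2x + 3x^2
q = [0, 1, 0, 4]    # x + 4x^3
lhs = derive(poly_mul(p, q))
rhs = poly_add(poly_mul(derive(p), q), poly_mul(p, derive(q)))
print(trim(lhs) == trim(rhs))
```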

To put some of this into perspective, I’d like to compare the product rule to a second mainstay of differential calculus: the familiar chain rule. Frequently, this is taught long after the product rule, in part for the following:

- While the quotient rule is perhaps most naturally a corollary of the product and chain rules, it can be (and often is) derived independently. By circumventing the chain rule, one can differentiate the trigonometric functions sooner (i.e. before returning to the chain rule). This is done in Stewart, for example.
- In a curriculum that focuses on differentiating each of the so-called “elementary functions”, the chain rule is only required insofar as it is used to derive the differentiation laws for inverse functions (e.g. the inverse trigonometric functions and either the logarithm or the exponential).

There’s also the question about proof: on a moral level, the chain rule follows from the factorization

$$\frac{f(g(x+h)) - f(g(x))}{h} = \frac{f(g(x+h)) - f(g(x))}{g(x+h) - g(x)} \cdot \frac{g(x+h) - g(x)}{h},$$

in which (as $h \to 0$) the first term is recognized as $f'(g(x))$ and the latter as $g'(x)$. Unfortunately, it may be the case that $g$ fails to inject in any neighborhood of $x$, in which case our “moral proof” falls short.

*Remark: This is no more than a technical obstruction: for $h$ such that $g(x+h) = g(x)$, we simply replace our left-most difference quotient by $f'(g(x))$. (This all works by continuity of $g$.)*

Despite this obstruction, our moral proof of the chain rule is elegant in form and obvious in execution. As one might expect, this simplicity has categorical significance: the chain rule encodes precisely the fact that the derivative (and generalizations) give functors from the category of differentiable manifolds to the category of tangent bundles.

**— PART II (LOGARITHMS) —**

By now, I hope that this post has made two opinions clear: that the derivative is fundamentally a *linear *object, and that the chain rule respects this linearity in ways that the product rule does not. This motivates our present interest in logarithms, as a method to turn products into sums. As it turns out, we’ll need just one Lemma:

**Lemma:** *Let $\log$ be defined on an open set $U$. Then $\log'(x) = 1/x$ for all $x \in U$, and $\log(xy) = \log(x) + \log(y)$ for $x, y \in U$ (provided that $xy \in U$).*

*Remark:* In most usual definitions of the logarithm, one of these statements will be obvious. If the logarithm is defined as an anti-derivative of $1/x$, for example (making our first assertion tautological), then a result due to Saint-Vincent (1647) implies that $\log(xy) = \log(x) + \log(y)$. On the other hand, it is also common to first define the logarithm as inverse to the exponential (which gives the stated functional equation), and prove that $e^x$ equals its own derivative. *(This, in turn, can be used to define $e$.)*

We are now primed to present a second proof of the product rule. Regrettably, we must finally break the symmetry we’ve created between the product rules for differentiable (resp. complex-differentiable, i.e. holomorphic) functions defined between subsets of $\mathbb{R}$ (resp. subsets of $\mathbb{C}$).

**Proposition:** *Suppose that $f$ and $g$ are differentiable and non-vanishing on the open set $U$. Then $fg$ is differentiable on $U$, and we have*

$$(fg)'(x) = f'(x)g(x) + f(x)g'(x)$$

*for all $x \in U$.*

*Proof:* Let $V$ be a connected component of $U$. If $f$ and $g$ are functions of a real variable, we may assume by continuity that $f, g > 0$ on $V$ (negating $f$ or $g$ if necessary). Then $\log(fg) = \log(f) + \log(g)$ on $V$, and implicit differentiation gives

$$\frac{(fg)'}{fg} = \frac{f'}{f} + \frac{g'}{g}, \qquad (1)$$

in which we have used the chain rule and our Lemma. Our result follows by clearing denominators.

In the complex case, the fact that $f$ and $g$ are non-vanishing throughout $V$ gives the existence of local branches to the logarithm. With these branches, our proof carries through as in the real case.
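The logarithmic form of the product rule is easy to test numerically. The sketch below uses central differences on two positive sample functions of our own choosing; it illustrates the identity at a single point and proves nothing.

```python
import math

def derivative(func, x, h=1e-6):
    """Central-difference approximation to func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

f = lambda x: x * x + 1.0        # positive everywhere
g = lambda x: math.exp(x) + 2.0  # positive everywhere
fg = lambda x: f(x) * g(x)

x0 = 0.7
# Logarithmic derivative of the product versus the sum of logarithmic derivatives.
lhs = derivative(lambda x: math.log(fg(x)), x0)
rhs = derivative(f, x0) / f(x0) + derivative(g, x0) / g(x0)
# Clearing denominators recovers the usual product rule.
gap = derivative(fg, x0) - (derivative(f, x0) * g(x0) + f(x0) * derivative(g, x0))

print(abs(lhs - rhs), abs(gap))
```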

As it stands, this version of the product rule has been artificially weakened by the hypothesis that $f$ and $g$ be non-vanishing on $U$. In this sense, I would compare it to our (somewhat incomplete) proof of the chain rule: an elegant proof with some technical holes pushed under the rug.

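On the other hand, this gap is not so hard to fill: borrowing some intuition from perturbation theory, we are led to consider functions of the form $f + \epsilon$ and $g + \epsilon$, in which the (constant) perturbation $\epsilon$ is chosen such that $f + \epsilon$ and $g + \epsilon$ become locally non-vanishing (about a fixed point in the domain of differentiability of $f$ and $g$). Then

$$\big((f+\epsilon)(g+\epsilon)\big)' = f'\,(g+\epsilon) + (f+\epsilon)\,g'$$

by our Proposition. On the other hand, linearity of the differential gives

$$\big((f+\epsilon)(g+\epsilon)\big)' = (fg)' + \epsilon f' + \epsilon g'.$$

It follows that $(fg)' = f'g + fg'$, after cancellation, i.e. the product rule.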

**— PART III (THE PROBLEM WITH RINGS) —**

And now, for some last-minute abstract nonsense:

Having seen these two proofs, it’s obvious why our first dominates the classroom, despite the haunting simplicity of line (1). Less obvious — and far more troubling — is the inherent difficulty in relating additive and multiplicative constructs (cf. the Goldbach and *abc* Conjectures), a thorn in the side of number theorists and algebraists the world over.

When multiplication and addition *do* behave (in some predetermined context), it is frequently because there exist sufficiently well-behaved analogues of the logarithm and exponential functions. In the case at hand (asking how a certain* linear* operator respects *multiplication* of functions), it has been enough to know that the logarithm satisfies a characteristic functional equation and has a well-understood derivative.

In the case of formal group laws over a ring of positive characteristic, for example, the non-existence of logarithms/exponentials is central to the field’s depth. In certain cases, these formal group laws give rise to actual group laws, e.g. on the completion of the base ring with respect to the -adic topology. In particular, -adic convergence of the logarithm affords us — in a small but tangible way — a better understanding of the group structure on an elliptic curve.

**— EXERCISES —**

**Exercise: **The product rule trivializes if we assume some multivariable calculus. Let and , and define . Calculate

using the multivariate chain rule.

**Exercise: **Given a matrix Lie group , let denote the set of matrices such that , where denotes the matrix exponential. Then is a * Lie algebra*, known as the Lie algebra associated to . Find the Lie algebras associated to and .

**Exercise:** If $k$ is a field of characteristic zero, prove that the additive and multiplicative formal group laws are isomorphic over $k$.


Unfortunately, few students see more than two or three explicit (i.e. closed form) group laws before stumbling into the deep end of abstract nonsense. In this article, we’ll see in a rigorous sense why this **must** be the case, providing along the way a complete classification of polynomial and rational formal group laws (over any reduced ring).

**— PART I —**

Following Bochner, a *formal group law* over a commutative ring $R$ (with unity) is a bivariate power series $F(x,y) \in R[[x,y]]$ such that the following two properties hold:

- $F(x,y) = x + y + O\big((x,y)^2\big)$;
- $F(x, F(y,z)) = F(F(x,y), z)$,

in which we borrow the O-notation to denote an element of the ideal $(x,y)^2$. On occasion, we’ll stress the fact that $F$ is a formal group law by writing $x +_F y$ for $F(x,y)$. Then (2) clearly implies that $+_F$ is an associative binary operation, while (1) states that $+_F$ acts locally (i.e. to first order) like “normal” addition on $R$.

The reader may note that (1-2) fall a few axioms short of the well-known group axioms. Actually, though, no further axioms are needed: the existence of inverses (and identity) are consequences of (1-2). To be precise,

**Proposition:** *Let $F$ be a formal group law. Then*

- *$F(x, 0) = x$ and $F(0, y) = y$;*
- *There exists a power series $\iota(x) \in R[[x]]$ such that $F(x, \iota(x)) = 0$.*

*Proof: *An exercise in the (formal) Implicit Function Theorem.

Although we won’t need (or prove) this result, we note that all formal group laws over a ring of characteristic are commutative. (This result is sometimes known as *Lazard’s Theorem.*)

By one measure, the simplest of all formal group laws are those lying inside $R[x,y]$, i.e. those given by polynomials (versus formal power series). Here, two common examples come to mind:

**Example 1:** The simplest of all formal group laws is the *additive formal group law*, given by $F(x,y) = x + y$. Less obvious is the *multiplicative formal group law*, given by $F(x,y) = x + y + xy$.

These two examples have something in common: each is a member of the one-dimensional family

$$F_a(x,y) = x + y + axy$$

of formal group laws. And, as our first Theorem shows, these often exhaust the polynomial formal group laws over $R$.

**Theorem:** *Suppose that $R$ is reduced (i.e. has trivial nilradical), and let $F$ be a polynomial formal group law over $R$. Then $F(x,y) = x + y + axy$ for some $a \in R$.*

*Proof: *Suppose that is given by the polynomial

,

and let , the degree of in . Then is at most , with the coefficient of equal to

.

Now, suppose that this coefficient is . Taking , it follows that is nilpotent. By the Exercises, it follows that for all . The nilradical of is trivial by hypothesis, so that for all . This contradicts that , so that is exactly. On the other hand, a quick calculation shows that is at most . Condition (2) forces , and condition (1) gives exactly.

Likewise, the identity yields to us that . Thus fits the form

,

whereupon condition (1) gives our result.

Thus, we see that *polynomial *formal group laws are unavoidably plain in the case of reduced rings. *(The Exercises contain further examples of polynomial formal group laws, over non-reduced rings.)*

**— PART II (RATIONAL GROUP LAWS) —**

Ever on the hunt for simple examples, we now turn our attention to formal group laws defined by rational functions. I mentioned previously that few students see more than three formal group laws expressed in closed form. This third example is often the following:

**Example 2: **Let be a commutative ring and define

,

considered as formal power series in . *(If is a field, we may view as an element of without incident. In general, though, we are studying the localization of with respect to the multiplicatively closed set .)*

It can be shown that defines a formal group law over . Readers may recognize two special cases of , in that

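$$\tan(x+y) = \frac{\tan x + \tan y}{1 - \tan x \tan y};$$

$$\tanh(x+y) = \frac{\tanh x + \tanh y}{1 + \tanh x \tanh y}.$$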

(These may be obtained from each other via Osborne’s Rule.) As a remark, also gives a law for adding velocities (with unit , the speed of light) in the framework of special relativity. *(More information can be found here.)*
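For readers who like to see numbers, here is a small sketch of the two classical addition formulas and of relativistic velocity addition. The function name `add_velocities` and the sample values are our own choices, and the velocity formula $(u + v)/(1 + uv/c^2)$ is the standard one from special relativity, not something taken from the post.

```python
import math
from fractions import Fraction

def add_velocities(u, v, c=1):
    """Relativistic velocity addition: (u + v) / (1 + u*v / c**2)."""
    return (u + v) / (1 + u * v / c**2)

# tanh(a + b) equals the c = 1 law applied to tanh(a) and tanh(b).
a, b = 0.3, 1.1
assert math.isclose(math.tanh(a + b), add_velocities(math.tanh(a), math.tanh(b)))

# tan(a + b) has the same shape with the sign in the denominator flipped.
assert math.isclose(
    math.tan(a + b),
    (math.tan(a) + math.tan(b)) / (1 - math.tan(a) * math.tan(b)),
)

# Exact arithmetic: two speeds of 9/10 (in units of c) compose to less than 1.
u = Fraction(9, 10)
print(add_velocities(u, u))  # 180/181
```

Using `Fraction` for the last line keeps the arithmetic exact, which makes it easy to see that the composite speed stays strictly below the unit.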

Actually, the one-dimensional family of formal group laws in Example 2 (as well as the family given after Example 1) belongs to a two-parameter family

$$F_{a,b}(x,y) = \frac{x + y + axy}{1 + bxy},$$

defined over any commutative ring $R$. As our next Theorem shows, these often exhaust the rational group laws over $R$:

**Theorem:** *Let $R$ be a reduced (commutative) ring. If $F$ is a rational formal group law over $R$, then $F(x,y) = \dfrac{x + y + axy}{1 + bxy}$ for some constants $a, b \in R$.*
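Before the proof, it may be reassuring to test associativity for the two-parameter shape $(x + y + axy)/(1 + bxy)$. The sketch below does so in exact rational arithmetic; the names `F` and `associative_on`, the sample points, and the parameter pairs are all illustrative assumptions, and a finite check is of course no substitute for the algebra.

```python
from fractions import Fraction
from itertools import product

def F(a, b, x, y):
    """Two-parameter candidate law (x + y + a*x*y) / (1 + b*x*y)."""
    return (x + y + a * x * y) / (1 + b * x * y)

def associative_on(a, b, samples):
    """Exact check of F(x, F(y, z)) == F(F(x, y), z) on all sample triples."""
    return all(
        F(a, b, x, F(a, b, y, z)) == F(a, b, F(a, b, x, y), z)
        for x, y, z in product(samples, repeat=3)
    )

# Small rational points keep every denominator 1 + b*x*y away from zero.
samples = [Fraction(k, 7) for k in range(-2, 3)]
for a, b in [(0, 0), (1, 0), (0, 1), (0, -1), (2, 3)]:
    print((a, b), associative_on(a, b, samples))
```

Expanding by hand, both $F(x, F(y,z))$ and $F(F(x,y), z)$ reduce to the same quotient, with numerator $x + y + z + a(xy + yz + zx) + (a^2 + b)xyz$ and denominator $1 + b(xy + yz + zx) + ab\,xyz$, which is symmetric in the three variables.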

*Note:* This Theorem dates from 1976, in *Rational Formal Group Laws, *the doctoral dissertation of Robert Bismuth [1]. Unfortunately, his proof clocks in at around 30 pages, and – in my opinion – fails to address the material in a conceptual way.

Here, we present instead (a significantly expanded version of) a later proof, due to R. Coleman and F. McGuinness [3]. This proof, published under the by-now-familiar title *Rational Formal Group Laws*, holds when is a field of characteristic . *(The full proof may be thereafter obtained using techniques from [1].)*

*Proof: *For the moment, let us assume that (a ring of characteristic ) is algebraically closed. Let (a rational function), and define

.

With this, we define the -form , which – as a claim – satisfies . To see this, we recall the definition of the pullback:

To simplify this last expression, recall that . Applying at and setting , it follows by the chain rule that. Then, as , we obtain. The chain rule gives , so that

,

as claimed. Next, let (resp. ) denote the set of poles (resp. zeros) of . With the equation , it follows that and . Moreover, viewing as a branched cover , we have

.

The right-hand side is bounded above by ; summing over gives

.

The double sum at left is simply , since preserves the order of poles/zeros. It follows that the inequality in the previous line is *equality*, i.e.

for all . This equality can be written suggestively as

, (1)

in which we note that our left-hand side is non-positive and our right-hand side is non-negative (as surjects). Thus each is , and it follows that either or for all (i.e. admits only simple poles).

In this first case, is a linear fractional transformation fixing (of infinite order, because ), so each power of has unique non-zero fixed point. As is finite and , there exist integers such that, hence is fixed by . On the other hand, any fixed point of is fixed by , so that fixes by uniqueness. It follows that is the unique pole of . Likewise, we may show that , since . That is, has one pole (of order at most , by our condition on ) and no zeros, so that , in which is some linear fractional transformation of fixing .

In the second case, fix . Then consists of a single point (by (1)), say . A local calculation gives

. (2)

As in the previous case, there exists such that (for some ). For this , induction on (2) gives , hence (because and ). If is not a branch point, then . Then , a contradiction. Thus (and similarly, ) is contained in the branch locus of , so that in particular . As is non-empty (as ), (by the Residue Theorem), so that has two (distinct) poles and no zeros. It follows that , in which is a linear fractional transformation fixing and (as is algebraically closed).

In either case, we may write : in the first case with ; in the second, with . *(I.e. is isomorphic (see the Exercises) to either the additive or multiplicative formal group law on .)* Regardless, simplification yields

, (3)

with . This concludes our proof when is algebraically closed. In the general case, fix an embedding of into an algebraic closure, and consider which rational functions of the form (3) lie in .

**— PART III (ALGEBRAIC GROUP LAWS) —**

At the rate we’re going, a better name for this post might be *Formal Groups (And Where Not to Find Them)*. And so, to find the examples we seek, we shift our attention one final time: from rational formal group laws to

.

**Example 3: **Using the addition formula for sine, we see that the function

$$F(x,y) = x\sqrt{1-y^2} + y\sqrt{1-x^2},$$

defines a formal group law over the ring . Moreover, satisfies the following polynomial relation:

.

Thus is an algebraic formal group law over .
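A numerical sketch of Example 3 follows. We assume the law read off from the sine addition formula, $F(x,y) = x\sqrt{1-y^2} + y\sqrt{1-x^2}$, and test one polynomial relation obtained by isolating the square root and squaring. That relation is our own derivation and may differ in form from the one displayed in the original post.

```python
import math

def F(x, y):
    """Group law read off from sin(a + b) = sin a cos b + cos a sin b."""
    return x * math.sqrt(1 - y * y) + y * math.sqrt(1 - x * x)

def relation(x, y, w):
    """Polynomial P(x, y, w) that vanishes when w = F(x, y)."""
    left = (w * w - x * x - y * y + 2 * x * x * y * y) ** 2
    right = 4 * x * x * y * y * (1 - x * x) * (1 - y * y)
    return left - right

a, b = 0.4, 0.7  # angles small enough that cos a and cos b are positive
x, y = math.sin(a), math.sin(b)
print(math.isclose(F(x, y), math.sin(a + b)))   # addition formula
print(abs(relation(x, y, F(x, y))) < 1e-12)     # algebraic relation
```

The relation comes from $F^2 - x^2 - y^2 + 2x^2y^2 = 2xy\sqrt{(1-x^2)(1-y^2)}$; squaring both sides removes the radical.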

**Example 4: **Fix a ring , and consider the set

.

For , multiplication (in the complex sense) gives a group operation on, which makes into a Lie group with identity . Near the identity, we obtain a chart for of the form . Under these coordinates, the group law () takes the form

,

i.e. , where is as in Example 3. That is, we have seen that **this law arises as the group law of an algebraic group** (with respect to local coordinates). Consistent with the terminology of Bochner, a formal group law of this form is said to be a **formal algebraic group** (cf. formal Lie group).

These two Examples paint a picture which is indicative of a general rule, established by R. Coleman in 1986 [2]:

**Theorem:** *Let $K$ be an algebraically closed field of characteristic zero. If $F$ is an algebraic formal group law over $K$, then $F$ is algebraically isomorphic to a formal algebraic group.*

*Proof:* Can be found in [2].

That is, not only do algebraic groups give rise to algebraic formal groups, **all such formal groups (up to isomorphism) appear in this form.** *(The caveat “up to isomorphism” is necessary, as a formal power series need not converge.)*

And so, we finally have an answer to our question: if you’re looking for “simple” (in an algebraic sense) examples of formal group laws, look to the theory of algebraic groups. Not only will you find some great examples (e.g. matrix groups and elliptic curves), you’d be hard-pressed to find *anything* quite as simple.

**— EXERCISES —**

**Exercise:** A *homomorphism* of formal groups over is a power series such that . Show that a homomorphism of polynomial formal groups over exists if . It follows that associate elements define isomorphic formal groups. Show that the converse need not hold.

**Exercise:** If is a homomorphism of polynomial formal groups over and is a polynomial, we’ll say that is a *p-homomorphism* of polynomial formal groups. Similarly, is a *p-isomorphism* if admits a polynomial inverse. Show that the p-isomorphism classes of polynomial formal groups are classified by the equivalence classes of associate elements in .

**Exercise:** Let $R$ be a commutative ring. Show that the nilradical of $R[x]$ is $\mathrm{Nil}(R)[x]$. *Hint: the inclusion “$\supseteq$” is easy; for the converse, proceed by induction on degree.* (Solution can be found on *Project Crazy Project*.)

**Exercise: **Fix a nonzero integer , and let denote the ring . Show that

defines a polynomial formal group law, which is not of the form . Find a polynomial formal group law over the ring , not of the form .

**Exercise: **Given the result of the main Theorem of Part II, find necessary and sufficient conditions for each of the two cases therein to occur. Are rational formal group laws in general (rationally) isomorphic to the additive formal group law, or the multiplicative formal group law?

**— REFERENCES —**

[1] R. Bismuth, *Rational Formal Group Laws,* (1976).

[2] R. Coleman, *One-Dimensional Algebraic Formal Groups*, Pacific J. Math. **122**, (1986), no. 1, 35-41.

[3] R. Coleman and F. McGuinness, *Rational Formal Group Laws*, Pacific J. Math. **147**, (1991), no.1, 25-27.

**perfect information**, in which each player knows the moves carried out by all players. Such games laid the foundation for early economic models of perfect competition, wherein consumers have full knowledge of both market conditions and each other’s consumer tendencies.

Of course, perfect competition, taken literally, cannot exist, and the failure of this model and others prompted economic theorists to study a more general class of games, games of *limited information* (also known as games of *imperfect information*).

In this post, we’ll look at one-player games of limited information (sometimes classified as puzzles, not games) through a topological lens, and create for each game a poset of topologies under which topologically indistinguishable points correspond to outcomes that are indiscernible in a limited-information context. Expanding this dictionary, we’ll describe a topology on the outcome space under which the “safe” or “warranted” extension of one’s limited information relates to the continuity of certain maps.

**— PART I (THE DICTIONARY) —**

Our discussion will be modeled around Minesweeper-type games, a game archetype that includes in addition many grid-based logic puzzles. In this setting, the “limited information” is encoded in the set of revealed grid tiles, which — depending on the rules of the game — may (or may not) conform to particular patterns. In this sense, a round of the game is a map

,

in which is the game-space (e.g. the set of grid tiles) and is the space of outcomes (e.g. the numbers 0-9, union “the mine”), and is consistent with the rules of . Let

denote the set of valid rounds to the game . For each subset , we define an equivalence relation , such that precisely when identically. Following this, we define (for fixed ) a topology on , with basis the cosets of . *(In other words, the open sets are precisely the (possibly empty) unions of cosets in .)*

**Example 1: **With all notation as before, we find that forms the indiscrete (trivial) topology on . In contrast, yields the discrete topology.

In general, two points in a topological space are said to be **topologically indistinguishable** if they are contained in exactly the same open sets (equivalently, in the same closed sets). One motivating property of our construction is that two games are topologically indistinguishable with respect to if and only if they agree on the tiles of . *(You may want to pause and verify this.)*

**Example 2:** (A continuation of Example 1.) In the trivial topology, each pair of points is topologically indistinguishable (hence no two games are differentiated). On the other hand, all points are distinguished in the discrete topology, i.e. this space is $T_0$. (Of course, discrete spaces satisfy not just one, but *all* of the separation axioms.)

A second, subtler advantage to our construction is its naturality: an inclusion of sets induces an “inclusion” of topologies, in the following exact sense:

For two topologies $\tau_1, \tau_2$ over a single set, we say that $\tau_1 \subseteq \tau_2$ if all open sets in $\tau_1$ are open in $\tau_2$. If so, $\tau_1$ is said to be **coarser** than $\tau_2$, while $\tau_2$ is said to be **finer** than $\tau_1$. We remark that the relation “$\subseteq$” defines a partial order on the topologies over a given set.

We have the following:

**Proposition: ***The surjection*

*given by is inclusion-preserving.*

*Proof: *If are subsets of , let denote the coset class of under the equivalence . To show that , it suffices to prove that can be written as a union of cosets in . But this is clear, as

,

in which the union runs over all cosets such that .
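The Proposition can be watched in action on a toy game. In the sketch below, which is an illustration under our own assumptions, there are three tiles, outcomes are 0 or 1, and every map is a valid round; the helper names `classes` and `refines` are hypothetical. Revealing more tiles splits each coset into smaller cosets, which is exactly the statement that the topology gets finer.

```python
from itertools import product

def classes(rounds, S):
    """Partition the rounds by their restriction to the revealed tiles S."""
    blocks = {}
    for f in rounds:
        key = tuple(f[i] for i in sorted(S))
        blocks.setdefault(key, set()).add(f)
    return [frozenset(block) for block in blocks.values()]

def refines(fine, coarse):
    """True if each block of `fine` lies inside some block of `coarse`."""
    return all(any(block <= big for big in coarse) for block in fine)

# Toy game: 3 tiles, outcomes 0/1, every map counts as a valid round.
rounds = list(product((0, 1), repeat=3))
S, T = {0}, {0, 2}
print(len(classes(rounds, S)), len(classes(rounds, T)))  # 2 4
print(refines(classes(rounds, T), classes(rounds, S)))   # True
```

Taking `S = set()` gives a single coset (the indiscrete topology), while revealing all three tiles gives eight singleton cosets (the discrete topology), matching Example 1.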

Whereas is by construction surjective, it will rarely inject. Minesweeper, as a typical example, yields a non-injective map . *(See the Exercises.) *Nevertheless, we give now (for completeness) a criterion equivalent to the injectivity of :

**Theorem: ***The map is an inclusion-preserving bijection if and only if the following condition holds:*

*()* *For all , there exists a coset that splits non-trivially (i.e. into multiple cosets) under .*

*Proof: *Necessity of the condition () is clear: if for two sets this condition does not hold, then all open sets under are open under , giving . Thus by our Proposition, i.e. .

On the other hand, suppose that fails to inject, and take for two sets . Then , so is a non-empty union of cosets under . Any coset in this union is open in (hence a union of -cosets). But the only open set (with respect to ) contained in is , so we have for all . Suppose by symmetry that . It follows from definition alone that . Taking and , we find a contradiction to criterion ().

**— PART II (REFINEMENT) —**

In Part I, we discussed at length the ways in which an increase in limited information can be modeled by a refinement of topologies over the space of rounds . In this section, we focus on the process of *extending *one’s limited information, in relation to the existence of continuous maps into (and its powers).

To set the stage for our work this section, suppose that a game has been determined mod (i.e. the outcomes of on are known). The simplest of all scenarios occurs when the map fails to inject; specifically, when for some . Here, it’s simple to show that (regardless of ), hence defines a unique function mod .

More follows if we specify beforehand, as would happen in practice. In this case, we obtain a unique extension of to containing if and only if; that is, if forms a *single* coset mod . *(To borrow from algebraic number theory, one might say that is “inert” with respect to .)*

Inertness has an equivalent formulation in the continuity of certain maps. To keep things simple, let us first assume that is obtained from with the inclusion one additional element . If is given the discrete topology, then the map

given by evaluation at is continuous if and only if it is constant (i.e. when). In greater generality, we have (for ) if and only if

is continuous. (Here is given the product topology.)

It turns out that this second formulation is more useful. To see why, let’s consider again the case of Minesweeper:

**Example 3: **In the familiar case of Minesweeper, our outcome set consists of the integers 0-9, as well as the mine . If our game is known on and lies outside of , we need not have full knowledge of on to expand our limited information. Rather, it suffices to know whether or not is a mine, i.e. which component of the partition

of that lies in. If this component is fixed as varies over , then we are justified in expanding our limited information to include the value of on . But this happens precisely when the map is continuous with respect to the partition topology on .

All together, we’ve stumbled upon an “algorithm” for justifiable and/or safe play:

- Partition into the minimal number of components required to inform safe play under general conditions.
- Given knowledge of on , let be maximal such that is continuous with respect to the partition topology on (whence the product topology on ).
- This justifies knowledge of on ; now repeat step 2.
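The three steps above can be sketched in code on a deliberately tiny model. Everything specific here is an assumption made for illustration: the board is a single row of tiles, every mine layout is a valid round, the partition of outcomes has two blocks (mine versus number), and the names `make_round`, `consistent` and `safe_play` are hypothetical.

```python
from itertools import product

MINE = "*"

def make_round(mines):
    """Build a round on a 1-D board: each tile shows MINE or its count of adjacent mines."""
    n = len(mines)
    return tuple(
        MINE if mines[i] else sum(mines[j] for j in (i - 1, i + 1) if 0 <= j < n)
        for i in range(n)
    )

def consistent(rounds, known):
    """Rounds agreeing with the known values, i.e. the coset of the true round."""
    return [f for f in rounds if all(f[i] == v for i, v in known.items())]

def safe_play(true_round, rounds, start):
    """Repeatedly extend `known` to every tile whose mine/safe status is forced."""
    known = {start: true_round[start]}
    while True:
        coset = consistent(rounds, known)
        forced = [
            i for i in range(len(true_round))
            if i not in known and len({f[i] == MINE for f in coset}) == 1
        ]
        if not forced:
            return known
        for i in forced:
            # Forced-safe tiles may be revealed; forced mines are known to be MINE.
            known[i] = true_round[i]

n = 5
rounds = [make_round(m) for m in product((0, 1), repeat=n)]
truth = make_round((0, 0, 1, 0, 0))
print(safe_play(truth, rounds, start=0))  # {0: 0, 1: 1, 2: '*'}
```

In this run the procedure stalls with two tiles undetermined, so we land in the second of the two cases described next: any further progress requires a guess.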

If is finite, the procedure above will terminate (in finite time) with knowledge of on some subset . Dependent on we have two cases (here, we assume finite):

- , in which case we have won our game.
- , in which case we reveal some such that the Bayesian probability of winning (given knowledge of on weighed against the risk of guessing ) is maximized.

Combined with our earlier procedure, this gives an optimal (probabilistic) strategy for , which need not be a winning strategy. *(This result, we remind the reader, relies upon the finiteness of .)*

**— PART III (GENERALIZATIONS) —**

If our game space is infinite, far less can be said. For instance, it could very well be that our “algorithm” above may never terminate (although these “infinitely long” games still hold interest for certain game theorists; see determinacy).

A second means of generalization comes from relaxing our assumptions on (while keeping finite). Here, our algorithm for careful play proceeds without a hitch, while admitting the novel possibility of an infinite number of different rounds . Indeed, the only sticking point in this more general setting is our reliance upon Bayesian probability.

Working over a finite set of rounds allows us to (implicitly in the previous) make use of a uniform distribution on . Without this standard, invocation of Bayesian probability requires a priori knowledge of the distribution of rounds in . Alternatively — if the distribution of rounds is not known — we could employ a frequentist approach, hopefully converging upon optimal play (using a machine-learning algorithm, say).

Simple examples of this more general behavior are included in the Exercises.

**— EXERCISES —**

**Exercise: **Show that the function given by associated to the game of Minesweeper is not injective. *(Hint: consider at which point the round is determined.)*

**Exercise: **If the space is infinite, is there any difference (as regards the continuity of our maps ) between the product topology and the box topology on ?

**Exercise: **Let be a region, and let be a closed, measurable set with finitely many components. We define

,

in which is the volume of the -dimensional unit ball and is the indicator function of . Fix some . We define a game of limited information as follows: the player chooses such that . If, the player loses (immediately). Otherwise, play continues until the player can ascertain the number of connected components of . Prove that this game has a winning strategy if and only if

- We do not lose on the first turn.
- is pre-compact (equivalently, totally bounded).
- The connected components of are simply connected.

Prove that (2) can be dropped if we are allowed to take countably many turns. *(This might be called a “-winning strategy.)*

**Exercise (The Secretary Problem): **Let be a list of real numbers (unknown to the player). We define a game of limited information as follows: the player views the elements of sequentially, and may stop at any point. If the player stops at , the round ends and the player wins if and only if . What is the optimal (probabilistic) strategy, as ? How does this problem change if is known to be sampled from a given distribution?


in which the sum exhausts the rational primes at most . At this point, it becomes quite elementary to derive the two inequalities

Results of this flavor remained essentially unimproved for over a century, until Chebyshev presented the following landmark theorem in 1852:

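**Theorem (Chebyshev):** *There exist positive constants $A, B$ such that*

$$A\,\frac{x}{\log x} \leq \pi(x) \leq B\,\frac{x}{\log x}.$$

Thus Chebyshev’s Theorem shows that $x/\log x$ represents the growth rate (up to constants) of $\pi(x)$; stated equivalently in Bachmann-Landau notation, we have $\pi(x) = \Theta(x/\log x)$. Yet more is true: the constants in Chebyshev’s proof are therein made effective, and can be taken as

$$A = \log\!\left(\frac{2^{1/2}\, 3^{1/3}\, 5^{1/5}}{30^{1/30}}\right) \approx 0.92129, \qquad B = \frac{6A}{5} \approx 1.10555.$$

As a corollary to Chebyshev’s Theorem, we have $\pi(2x) > \pi(x)$ for $x$ sufficiently large. By making this implicit bound on $x$ precise, Chebyshev was able to prove Bertrand’s Postulate (thereafter known as the Bertrand-Chebyshev Theorem).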

In this post, we’ll prove a variant of Chebyshev’s Theorem in great generality, and discuss some historically competitive bounds on the constants and given above. Lastly, we’ll discuss how Chebyshev’s Theorem relates to proposed “elementary” proofs of the PNT.

**— PART I (LEMMAS) —**

To begin, let $\Lambda$ denote the von Mangoldt function, and let $\psi$ denote the second Chebyshev function

$$\psi(x) = \sum_{n \leq x} \Lambda(n).$$

We recall that $\pi(x) \sim \psi(x)/\log x$ (by elementary methods), whereby it suffices in the context of Chebyshev’s Theorem to study the asymptotic growth of $\psi(x)$. This (given the complexity of the PNT) is understandably difficult, and so our first step towards Chebyshev’s result is to approximate $\psi$ by “simpler” functions.

For what follows, let be a multi-set of non-zero integers. We define the two functions

If , we’ll say that is ** balanced**. In this case, the following Lemma provides the asymptotic growth of :

**Lemma 1: ***If is balanced, then*

*Proof: *To begin, consider the identity

which expresses Legendre’s formula (1808) in terms of the von Mangoldt function. Extending by linearity, we obtain

i.e. an expression of as the logarithm of a ratio of factorials. To continue, we now recall Stirling’s approximation (in its weak form), that $\log n! = n \log n - n + O(\log n)$. Accordingly, we obtain

As is balanced, the first and third sums vanish and our result follows.

*(Note: In the proof of Lemma 1 we remarked that arises as the logarithm of a ratio of factorials. On the other hand, , and we may view our general technique as an estimation of as a ratio of factorials.)*
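To make the “ratio of factorials” remark concrete, the sketch below checks Legendre’s formula in its von Mangoldt form, $\log n! = \sum_{d \leq n} \Lambda(d) \lfloor n/d \rfloor$, and the analogous expression for the central binomial coefficient. The identity in the text may be written in a different but equivalent shape, and the function names are our own.

```python
import math

def von_mangoldt_table(n):
    """Return a list L with L[k] = log p if k is a power of a prime p, else 0."""
    table = [0.0] * (n + 1)
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for multiple in range(2 * p, n + 1, p):
                is_prime[multiple] = False
            power = p
            while power <= n:
                table[power] = math.log(p)
                power *= p
    return table

def log_factorial_via_lambda(n):
    """Sum of Lambda(d) * floor(n / d) over d <= n, which should equal log(n!)."""
    table = von_mangoldt_table(n)
    return sum(table[d] * (n // d) for d in range(1, n + 1))

n = 50
print(math.isclose(log_factorial_via_lambda(n), math.lgamma(n + 1)))

# log C(2n, n) = log (2n)! - 2 log n!, written with the same weights.
table = von_mangoldt_table(2 * n)
lhs = sum(table[d] * ((2 * n) // d - 2 * (n // d)) for d in range(1, 2 * n + 1))
print(math.isclose(lhs, math.log(math.comb(2 * n, n))))
```

The second check is the simplest instance of the general technique: the weights $\lfloor 2n/d \rfloor - 2\lfloor n/d \rfloor$ only take the values 0 and 1, which is what makes the central binomial coefficient useful for bounding sums over primes.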

For future use, we define .

With Lemma 1 in hand, we seek to relate the growth of and . To do so, we first note that is periodic (with period dividing ), provided that is balanced. In particular, is bounded; let (resp. ) denote the maximum (resp. minimum) of . Define , if such an exists (i.e. for ). Lemma 2 gives an upper bound for :

**Lemma 2: ***Let be balanced, and suppose that . Then*

*Proof: *By definition of , we have for . As , it follows that

This last expression telescopes, and we obtain (after iterated substitution)

in which denotes the cube and represents the multinomial coefficient. With Lemma 1, this implies

Now, it follows from the multinomial theorem that

and this proves Lemma 2.

In our final lemma, we derive a lower bound for . Recall that is bounded (with maximum dependent on ) provided that is balanced. For , we define , and then set . We then have:

**Lemma 3:** *Let be balanced, and suppose that . Then*

*Proof: *By definition of (and our hypothesis on ), we have

as the previous expression telescopes. Let be a lower bound for as (these lower bounds exist by positivity of ). Then for large , and thus

In particular, this forces for all lower bounds ; Lemma 3 follows in passing to the liminf.

**— PART II —**

Now, we present some applications of the lemmas in Part I. Our first example concerns the simplest of all balanced multi-sets:

**Example 1: **In this example, we take . Then satisfies

in particular, (with ), and (with ). Noting that , we conclude (via Lemmas 2-3) that

The relationship between and the central binomial coefficients (cf. Lemma 1) is far from coincidental: a more-precise study of these coefficients yields a proof of Bertrand’s Postulate (an observation due to Erdős). Yet this is likely the extent to which the central binomial coefficients encapsulate the density of primes, as the upper and lower bounds on differ here by a factor of two (cf. for all ).

**Example 2: **Our second example comes from Chebyshev (1852), and is generated by the (balanced) multi-set . We readily find (with ) and (with ), which yields the bounds

(exact values for are given in the introduction). As remarked above, these bounds suffice to give an elementary proof of Bertrand’s Postulate, which may explain why Chebyshev makes no attempt to further improve them (at least, not in his original manuscript, *Mémoire sur les nombres premiers*).
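For a feel of how tight these bounds are, the sketch below evaluates Chebyshev’s classical constants, $A = \log(2^{1/2} 3^{1/3} 5^{1/5} / 30^{1/30})$ and $B = 6A/5$, and compares them with the ratio of the second Chebyshev function to $x$ at a few sample points. The function name `psi` and the sample points are our own choices, and a handful of values says nothing rigorous about the limiting behaviour.

```python
import math

# Chebyshev's lower constant, log(2^(1/2) * 3^(1/3) * 5^(1/5) / 30^(1/30)).
A = math.log(2) / 2 + math.log(3) / 3 + math.log(5) / 5 - math.log(30) / 30
B = 6 * A / 5  # Chebyshev's upper constant
print(round(A, 5), round(B, 5))  # 0.92129 1.10555

def psi(x):
    """Second Chebyshev function: sum of log p over prime powers p^k <= x."""
    total = 0.0
    is_prime = [True] * (x + 1)
    for p in range(2, x + 1):
        if is_prime[p]:
            for multiple in range(p * p, x + 1, p):
                is_prime[multiple] = False
            # p contributes log p once for each of its powers up to x.
            power = p
            while power <= x:
                total += math.log(p)
                power *= p
    return total

for x in (10**3, 10**4, 10**5):
    print(x, round(psi(x) / x, 4))
```

Since $B < 2A$, the two constants are already close enough to force a prime between $x$ and $2x$ for large $x$, which is the route to Bertrand’s Postulate mentioned above.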

On the other hand, it’s quite possible that Chebyshev could not find other multi-sets that improved upon these bounds. Indeed, insofar as required to prove Bertrand’s Postulate (for all , not just asymptotically), the multi-set affords a similar — yet far shorter — proof.

The torch that Chebyshev lit was carried in subsequent decades by Sylvester, who in 1892 published the bounds

In a sense, these bounds were penultimate: within four years two proofs of the PNT were published (due to Hadamard and de la Vallée-Poussin).

**Example 3: **Now, to find for ourselves some competitive bounds on , we embrace that which Chebyshev could not: brute force search over short multi-sets . In a few hundred hours of CPU time (in Mathematica), I’ve found the following:

which induce the lower (resp. upper) bounds and on the infimum and supremum of , respectively. Unfortunately, I have yet to find multi-sets that improve upon the elementary bounds of Sylvester (which — credit to him — require far more delicate approximations). Nevertheless, the lower bound which stems from lies within spitting distance () of Sylvester’s bound, a non-trivial feat in itself.

**— PART III —**

After 1896, the Chebyshev Theorem (and extensions thereof) became little more than a historical/pedagogical note. But the nagging question remains: could Chebyshev (in theory) have proven the PNT with this approach?

In short, maybe. ** Under the assumption of the PNT**, three proofs emerged (circa 1937; due independently to Erdős, Kalmár, and Rosser) that showed that the constants in Chebyshev’s proof could be forced arbitrarily close to one.

I confess that the techniques presented hitherto in this article are too *ad hoc *to reach this result. That is, in building upon the weak foundation of Egyptian fractions to create multi-sets with desirable properties, we find ourselves limited by the unpredictability of Egyptian fraction representations.

Morally, then, our multi-sets form great examples but remain too unwieldy for use in a theoretic capacity. All the same, it takes only a small tweak of our previous methods to reach the result above, which we prove now as a fitting end to this article:

*Proof: *Consider the multi-set

Then is not necessarily balanced. (In fact, Bertrand’s Postulate implies that is *never* balanced; see the Exercises.) Nevertheless, if we set , the modified function

is periodic — with period dividing — and is consequently bounded. Next, we introduce the “indicator function”

in which the arithmetic function denotes the Kronecker Delta . For brevity, set (as a Dirichlet convolution), which we note satisfies

Our proof now commences in earnest. With the convolution identity* * (the so-called *Chebyshev Identity), *we derive

The right-hand side of may be rewritten as

Using Stirling’s Approximation (just as in Lemma 1), we obtain the asymptotic

in which denotes the constant

Now turning to the left-hand side of , an application of the Dirichlet Hyperbola method gives

By construction, on the interval . In particular, the rightmost term above is nothing more than , which is by our estimates in Part II. Moreover, since represents the summary function of, it follows that on . And so, the second sum in the line above simplifies to , and we obtain at last:

wherein the final simplification comes from direct analogy with Lemma 1. Coupled with lines and , this implies

(provided that for simplicity). For fixed , we obtain

for all sufficiently large (e.g. ).

At this point in our proof, we need only show that as . For this, it suffices to establish the following two claims:

**a.** tends to as , i.e.

**b.** tends to as .

For (a), define the functions

*(Here, is the familiar Mertens Function.)* Abel Summation implies that

The two estimates and — both equivalent to the PNT (e.g. Apostol, Thm 4.16) — yield

whereby . Then, because the classical error bound on the PNT (due to Hadamard and de la Vallée-Poussin) provides

for some , we obtain . Thus (after some calculus), which proves our claim from (a).

To establish (b), we begin with the identity

which follows from the Abel summation formula (using the smooth weight function ). As — by the classical PNT error bound — each term on the right-hand side of converges as . If denotes this limit, then as and

in the context of line . Our early estimates in Part II give , whereby. It is an old result of Chebyshev that this forces (see the Exercises), and this concludes our proof.

**— EXERCISES —**

**Exercise: **Suppose that for some constant . Show that this implies . Then, use Euler’s estimate on the harmonic sum of primes (given in the introduction) to prove that . (*While this was first noticed by Chebyshev in 1852, our work here shows that his result was well within the grasp of Euler, over a century beforehand.)*

**Exercise: **Let , and use the closed form for to show that

Use this estimate to give a second proof that implies . *Hint: use the Taylor Series for .*

**Exercise: **Use Bertrand’s Postulate to show that is never balanced, i.e.

for all . Similarly, show that the th harmonic number is never an integer. How does Möbius inversion connect these two results? *Hint: For the first part, multiply by and reduce modulo a large prime.*

**— REFERENCES —**

[1] T. Apostol, *Introduction to Analytic Number Theory, *Springer (1976).

[2] P. L. Chebyshev, *Mémoire sur les Nombres Premiers*, J. Math. Pures Appl., **17**, (1852).

[3] H. Diamond and P. Erdős, *On Sharp Elementary Prime Number Estimates*, L’Enseignement Mathématique, **26**, (1980).


**Theorem: ***Let be a subring (with unity), and suppose that has finite spectrum. Then is dense in , the topological closure of the fraction field of .*

In particular — assuming a finite spectrum — it follows that the rank of (considered as an abelian group) is bounded below by two. Unfortunately, this is as far as our previous methods will take us, even when (see the Exercises).

The primary objective of this post is an extension to this result. Specifically, we would like to capture *in a purely algebraic way* (e.g. without mention of the topology on ) the fact that becomes quite large in certain rings with finite spectrum. After the fold we’ll accomplish exactly that, with the following Theorem and its generalizations to a wide class of commutative rings:

**Theorem 1: ***Suppose that is a subring (with unity). If has finite spectrum, then has infinite rank as an abelian group.*

**— PART I —**

Before we proceed to our proof, some remarks are in order. First off — if we continue our count from the previous article — this provides a third proof of the infinitude of prime ideals in the ring of integers over a number field. Indeed, Dirichlet’s unit theorem implies that the units in such a ring have finite rank.

Secondly, and perhaps more importantly, the conclusion of this theorem makes sense in any unital ring, so we may ask whether or not our theorem holds in greater generality (as will be done in Part II).

As one final reminder, we recall the definition of the ideal

in which the sum runs over the non-zero prime ideals in . Here, if is a domain with finite spectrum. Having recalled this, we present the proof of Theorem 1:

**Theorem 1: ***Suppose that is a subring (with unity). If has finite spectrum, then has infinite rank as an abelian group.*

*Proof: *This proof will proceed in two cases, dependent on whether or not contains transcendental elements. In our first case, take transcendental over ; then (an earlier Proposition) implies that . Now, suppose that is a set of irreducible polynomials. If is not independent (as a multiplicative set), then there exist integers such that

at which point the fact that is a UFD implies that for all . Thus is an independent set, whence has rank at least . It thus suffices (for our first case) to evince infinitely many irreducible polynomials in, and such an example is provided by the cyclotomic polynomials with (or the linear polynomials).

Our second case is slightly more involved. Take algebraic; we may assume without loss of generality that is an algebraic integer. Now, suppose that the (multiplicative) abelian group generated by has finite rank. The field norm restricts to a homomorphism, and the image of this map has finite rank. Let

Finiteness of rank implies that the set of prime divisors of is finite. Yet if represents the product of these primes, it follows that, so our hypothesis forces for all integers . *(This is an extension of Euclid’s proof of the infinitude of the primes to a statement about the prime divisors of a polynomial.)* This gives a contradiction, and so has infinite rank. As , it follows that has infinite rank as well.
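The Euclid-style step, that the values of a non-constant integer polynomial are divisible by infinitely many primes, is easy to observe experimentally. The sketch below uses the sample polynomial $k^2 + 1$, which is our own choice and not the polynomial from the proof, and simply counts the primes that show up as the range grows.

```python
def prime_factors(n):
    """Set of primes dividing the nonzero integer n (trial division)."""
    n = abs(n)
    factors = set()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def primes_dividing_values(f, bound):
    """Union of the prime divisors of f(1), ..., f(bound), skipping zeros of f."""
    primes = set()
    for k in range(1, bound + 1):
        value = f(k)
        if value != 0:
            primes |= prime_factors(value)
    return primes

f = lambda k: k * k + 1  # sample polynomial, chosen only for illustration
for bound in (10, 100, 1000):
    print(bound, len(primes_dividing_values(f, bound)))
```

The count keeps climbing as the bound increases, which is the behaviour the proof exploits: a finite set of prime divisors is incompatible with the polynomial taking infinitely many distinct values.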

**— PART II —**

From here on, we direct our attention to extensions of our Theorem 1 to more general rings. To begin, let be a ring with unity . Let be the kernel of the ring homomorphism generated by ; then for some , and we define the *characteristic* of to be (see here for more information). In particular, a ring of characteristic contains a subring isomorphic to , and if is a domain, then contains a subfield isomorphic to .

The advantage of this definition is obvious when we realize that our *purely algebraic *proof of Theorem 1 cannot distinguish between (resp. ) and their embedded copies in (resp. ). In other words,

**Theorem 2: ***Let be a unital domain of characteristic . If has finite spectrum, then has infinite rank as an abelian group.*

Note: the shift from Theorem 1 to Theorem 2 represents the paradigmatic change between two characterizations of : first as a subring of ; then as a “*super-ring*” of . To emphasize our new-found generality, we present the following:

**Corollary 3: ***Let be a discrete valuation ring of characteristic . Then has infinite rank.*

*Proof: *A discrete valuation ring is a PID with a unique non-zero prime ideal. It is therefore a domain with finite spectrum, and Theorem 2 applies.

It’s easy to construct counter-examples to Theorem 2 if we drop either of the hypotheses that

- have characteristic (e.g. , with prime); or
- have finite spectrum (e.g. ).

What is less clear is the dependence on being a domain, which runs deep in our assumptions (e.g. that the intersection of any two non-zero ideals be non-zero). As some small consolation when is *not* a domain, we find that

the nilradical of (in which the intersections exhaust the prime ideals of ), which offers a nice reinterpretation of our earlier construction.

**Non-Example: **Take to be the dual numbers over the integers, i.e. . Then , and a short computation shows that the units group of is precisely

so that in particular has finite rank. Next, recall that is a prime ideal if and only if is an integral domain. Thus is a prime ideal for any prime , so has infinite spectrum, which is consistent with the general form of Theorem 2.
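The unit computation in this Non-Example is easy to check by machine. The sketch below models the dual numbers over the integers in Python, assuming (as the computation above suggests) that the units are exactly the elements with constant term 1 or -1; the class name `Dual` and its methods are hypothetical helpers introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """An element a + b*eps of the dual numbers over the integers (eps**2 = 0)."""
    a: int
    b: int

    def __mul__(self, other):
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps, since eps**2 = 0.
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    def is_unit(self):
        # The constant term must be a unit of the integers, i.e. +1 or -1.
        return self.a in (1, -1)

    def inverse(self):
        if not self.is_unit():
            raise ValueError("not a unit")
        # With a*a == 1 we get (a + b eps)(a - b eps) = 1.
        return Dual(self.a, -self.b)


ONE = Dual(1, 0)
u = Dual(-1, 7)
print(u * u.inverse() == ONE)
```

Powers of `Dual(1, 1)` give `Dual(1, n)`, so a single element generates the units up to sign, in line with the finite rank noted above.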

This non-example gives some hope to the idea of extending Theorem 2 to certain rings that are not domains. And indeed — such extensions do exist, under hypotheses that require us to recall one more definition: given a short exact sequence

of modules over a commutative ring, is said to **split** if (under a natural isomorphism for which and is projection to ).

With these preparations, we make our final refinement of Theorem 1:

**Theorem 4: ***Let be a commutative ring of characteristic (with unity), such that is prime and the short exact sequence*

*splits. If has finite spectrum, then has infinite rank.*

*Proof: *Let . The lattice theorem for commutative rings gives a bijection between the ideals of containing and the ideals , given by . For , we have

(a domain) and so restricts to a map . Moreover, for any , the preimage lies in ; this is a general property of (unital) ring homomorphisms. Thus induces a bijection

and in particular is finite. Next, we claim that has characteristic . Indeed, if denotes the canonical (characteristic) injection, suppose that . Then for some , a contradiction to injectivity. Thus is a unital domain of characteristic with finite spectrum, and it follows by Theorem 2 that has infinite rank.

As splits by hypothesis, we obtain an injective section , which restricts to an injective group homomorphism . So has infinite rank, as claimed.

Our work in generalizing Theorem 2 deserves a few remarks. Firstly, while we have used Theorem 2 in the proof of Theorem 4, we note that Theorem 4 contains Theorem 2 as a special case: if (taken as in Theorem 4) is a domain, then , and the short exact sequence splits trivially.

Secondly, we note that the condition implies that all zero divisors in are nilpotent (and these conditions are equivalent). On the other hand, the condition on is less transparent, but we can nevertheless force it through various tricks; e.g. if is a -module and is a free -module (as in our Non-Example).

The obstruction to variants of Theorem 4 in more general rings is outlined in the Exercises. Suffice to say that the relationship between and is less clear-cut in general.

**— EXERCISES —**

**Exercise: **Let . Prove that the multiplicative group is dense in if and only if (1) and are multiplicatively independent and (2) at least one of has argument incommensurable with . *(Consider the lattice of logarithms.)*

**Exercise: **Suppose that is a commutative ring with unity, such that the set

forms an ideal. Give an example of a ring of characteristic such that the quotient has positive characteristic (which is necessarily prime).

**Exercise: **If is a commutative ring (with unity) such that is an ideal, prove that is a domain and that

Use this to prove that has infinite rank provided that (1) , (2) has finite spectrum, and that (3) the short exact sequence

splits.


For the sake of explicit analogy, we include a (needlessly abstracted) version of Euclid’s result now:

**Theorem 1 (Euclid): ** *The integers have infinite spectrum.*

*Proof: *If not, let exhaust the list of prime ideals. As the integers form a PID (in fact, they form a Euclidean domain), we may associate to each given ideal a prime generator such that . Let ; as is not a unit (replacing with if necessary), it admits a prime factor which equals some . But then divides both and , whence as well. This is a contradiction, and our result follows.
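Euclid's construction is short enough to run. The following Python sketch takes a finite list of primes, forms their product plus one, and returns a prime factor of the result; the function names are hypothetical, and the code illustrates the classical argument over the integers rather than the ideal-theoretic phrasing used above.

```python
from math import prod


def smallest_prime_factor(m):
    """Return the smallest prime factor of an integer m >= 2."""
    p = 2
    while p * p <= m:
        if m % p == 0:
            return p
        p += 1
    return m


def new_prime(primes):
    """Given a finite list of primes, return a prime missing from it."""
    n = prod(primes) + 1  # n leaves remainder 1 on division by each listed prime
    return smallest_prime_factor(n)


listed = [2, 3, 5, 7, 11, 13]
q = new_prime(listed)
print(q, q in listed)
```

Note that the product plus one need not itself be prime (here it is 30031 = 59 * 509); the argument only needs one of its prime factors.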

It is worth remarking that the “PID-ness” of the integers is not needed in the derivation of Theorem 1. Indeed, if were not principal, we might instead take to be *any* non-zero element of . Thus, we see that the PID-ness of is *not* the crucial property of the integers (viewed as a subring of ) upon which our proof stands. Rather, as we’ll see after the fold, Euclid’s proof frames the infinitude of primes as a consequence of the *finiteness of the units group!*

Where is a subring (with unity) as before, we define

in which the intersection runs over all non-zero prime ideals in . This is not quite the nilradical of , as our intersection omits the prime ideal . Nor is the Jacobson radical (the intersection of all maximal ideals), except when the Krull dimension of is (e.g. for a PID or a Dedekind domain).

**Proposition 2: ***Let be a subring (with unity). Then .*

*Proof: *Let and . If for some , then as . Yet is a unit, so this contradiction forces for all prime ideals . Yet all non-units of lie in maximal ideals, so this implies that is a unit. The reverse inclusion is obvious.
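To see Proposition 2 in a concrete case, we can take the localization of the integers at a prime, which is a domain with exactly one non-zero prime ideal. The Python sketch below assumes (as the proof above suggests) that the Proposition asserts that a unit plus an element of the ideal is again a unit; the prime `P = 5` and the helper names are hypothetical choices.

```python
from fractions import Fraction

P = 5  # hypothetical choice of prime for the localization Z_(P)


def in_ring(x):
    """x lies in Z_(P) when P does not divide its reduced denominator."""
    return x.denominator % P != 0


def is_unit(x):
    """x is a unit of Z_(P) when P divides neither numerator nor denominator."""
    return in_ring(x) and x.numerator % P != 0


def in_ideal(x):
    """x lies in the unique non-zero prime ideal P * Z_(P)."""
    return in_ring(x) and x.numerator % P == 0


units = [Fraction(a, b) for a in range(-6, 7) for b in range(1, 7)
         if is_unit(Fraction(a, b))]
ideal = [P * Fraction(a, b) for a in range(-6, 7) for b in range(1, 7)
         if in_ring(Fraction(a, b))]

# Every sampled sum (unit + ideal element) should again be a unit.
print(all(is_unit(u + x) for u in units for x in ideal))
```

The check runs over a finite sample only, so it illustrates the statement rather than proving it.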

**Corollary 3:** *If has finitely many units, then has infinite spectrum.*

*Proof:* If has finite unit group, then the identity forces. Yet if , then must be infinite, as the intersection of finitely many non-zero ideals (in a domain) is itself a non-zero ideal.

As an application of Corollary 3, we may reestablish Euclid’s result on the infinitude of primes. Moreover — if we assume the Dirichlet unit theorem — it follows that for all imaginary quadratic fields. Yet, as the following Theorem shows, we have only begun to tap the power of our Proposition:

**Theorem 4: ***If has finite spectrum, then is dense in , the topological closure of the fraction field of .*

*Proof: *Suppose that has finite spectrum, so that as in Corollary 3. If , then any non-zero generates a lattice over . If , let be non-zero and choose not real. Then and are independent over and so generate a lattice in . In each case, we shall denote this lattice by .

Fix , and choose a sequence tending to infinity. There exists some dependent only on such that for some choice of . Then

and it’s not hard to show that this rightmost sum is . Thus the sequence tends to , while Proposition 2 implies that each term is a unit.

Once again, we can use the Dirichlet unit theorem to furnish examples of number fields with infinite spectrum:

- All quadratic fields.
- Cubic and quartic fields that are not totally real.
- Select fields of degree five and six.

Actually — and in some sense with a lot less work — we can show that *all* number fields have infinite spectrum. To do so, we’ll need just one more Proposition:

**Proposition 5:** *Let be a number field of degree . Then the length of the longest arithmetic progression in is uniformly bounded as a function of .*

*Proof:* Following Newman in [1], suppose that the units

form an arithmetic progression. Let , which lies in as is a unit. Thus , for . If we take as well, then, a consequence of the fact that (whereby ). Thus the polynomial (as a degree norm form) has roots at . Then , which gives our uniform bound.

Coupled with Proposition 2, this implies the following:

**Corollary 6: ***Let be a number field, with ring of integers . Then has infinite spectrum.*

*Proof: *If our result does not hold, then is non-zero; let be non-zero, and consider the (infinite) arithmetic progression . Each element lies in by Proposition 2, but this contradicts Proposition 5 and so our result must hold.

*(In particular, our previous appeals to the Dirichlet unit theorem have been unnecessary.)*

**— PART II —**

Of all proofs of the infinitude of the primes, it is without a doubt that of Euler that has led to the greatest generalizations. In summary, the fundamental theorem of arithmetic (that the integers form a UFD) allows us to write

i.e. an expression of the zeta function as an Euler product. In this way, the divergence of the harmonic series implies that the right-hand product contains infinitely many terms. However, this method (as written above) fails to generalize to number fields whose rings of integers are not UFDs. This setback admits a simple remedy: if one shows that the ring of integers of every number field is a Dedekind domain (i.e. that the non-zero ideals of always factor uniquely into prime ideals), then the factorization

holds, in which (resp. ) exhausts the non-zero ideals (resp. prime ideals) in . Once again, we find that our infinite sum diverges at (it is minorized by the harmonic series), which implies that the right-hand side constitutes an *infinite* product. Yet this is to say that has infinite spectrum (our result in Corollary 6).
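For the rational integers, the Euler product can be checked numerically. The Python sketch below compares a partial sum and a partial Euler product of the Riemann zeta function at `s = 2`, and then evaluates the partial product at `s = 1`; the truncation bounds and function names are arbitrary illustrative choices.

```python
from math import pi


def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, is_p in enumerate(sieve) if is_p]


def zeta_partial_sum(s, terms):
    """Sum of n**(-s) for 1 <= n <= terms."""
    return sum(n ** -s for n in range(1, terms + 1))


def euler_partial_product(s, bound):
    """Product of 1 / (1 - p**(-s)) over the primes p <= bound."""
    result = 1.0
    for p in primes_up_to(bound):
        result *= 1.0 / (1.0 - p ** -s)
    return result


s = 2.0
print(zeta_partial_sum(s, 10_000), euler_partial_product(s, 10_000), pi ** 2 / 6)

# At s = 1 the partial products keep growing as more primes are included.
print(euler_partial_product(1.0, 100), euler_partial_product(1.0, 10_000))
```

At `s = 1` every additional prime contributes a factor larger than 1, and the growth of the partial products mirrors the divergence of the harmonic series.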

Nevertheless, I contend that our generalizations to Euclid’s method have interest in their own right. For one, they highlight the tension between the size of the units group and the cardinality of the spectrum in *any* ring of characteristic 0. While our best example of this phenomenon — Theorem 4 — relies heavily on the topology of , it is not difficult to imagine other results that capture this sentiment. In particular, I conjecture (but have not been able to prove) the following:

**Conjecture 7: ***Let be a domain with characteristic 0, and suppose that has finite rank as an abelian group. Then *

- *the arithmetic progressions in have uniformly bounded length, so*
- *has infinite spectrum (à la Corollary 6).*

**— EXERCISES —**

**Exercise: **Show that the ring

satisfies , so the converse to Theorem 4 is false.

The following Exercise gives conditions under which our methodology is sharp.

**Exercise: **Let be a domain (with unity). We have already shown that implies that has infinite spectrum.

- Prove that these conditions are equivalent if is a Noetherian domain with finite Krull dimension.
- We call a -**PID** if all ideals in can be written as modules over with at most generators. Use the Krull height theorem to show that -PIDs are Noetherian domains of finite Krull dimension, so that if and only if in a -PID.
- Show that the ring of integers in a number field (of degree ) is an -PID.

*(Hard.)*

**— REFERENCES —**

[1] M. Newman, *Units in Arithmetic Progression in an Algebraic Number Field*, Proc. Amer. Math. Soc. **43**(2), 1974.


For , the ring of integers is just the integers , in which case we recall the Fundamental Theorem of Arithmetic: that every integer may be written as a finite product

in which the are prime and uniquely determined (up to permutation). Domains for which this holds are known in general as **unique factorization domains** (UFDs). For — with square-free — the ring of integers will in general

Far less is known in the case (in which case *k* is known as a *real quadratic field*), although an unproven conjecture dating back to Gauss suggests that there should be infinitely many real quadratic fields whose rings of integers are UFDs. More recently, some heuristics stemming from Cohen suggest that the ring of integers in should be a UFD with probability as on the square-free integers.

Here, we’ll focus on a more tractable variant of this problem:

**Question:** What can be said about the number of distinct real quadratic fields with for which is **not** a UFD?

For a weak answer to the question above, we devote the rest of this article to the establishment of the following bound:

**Theorem: **As , we have

in which the implied constant is made effective (e.g. greater than ).

For a high-level perspective, our plan is to identify a “large” infinite family of *d* for which the (images of the) norm forms in are both uniformly well-behaved and restricted, in the sense that they can be chosen to uniformly avoid the norms of some select low-lying primes. To control the distribution of said primes (i.e. to ensure that they remain small), we trade power for control and restrict our study to the primes that ramify in as opposed to those that split (which are far more numerous).

**— PART I —**

As always, we require a few Lemmas:

**Lemma 1:** Let be of degree 2. Then the set

has natural density 0.

*Proof: *For any prime , we note that for some *n* iff admits a root in the finite field , which occurs precisely when

in which denotes the polynomial discriminant of *f* and denotes the Legendre symbol. Of course, this implies that for all integers *k*, and for it follows that is composite. Thus, on a set of density , and we have

in which *A* denotes the set of primes such that (1) holds. Now, let, in which is odd. *(If , then is reducible over and our theorem holds trivially.)* We have for all primes in a coset of , and quadratic reciprocity implies for all primes in a coset of . Thus *A* contains all primes in some arithmetic progression, and thus

which tends to 0 provided that the sum diverges. This fact follows from the Chebotarev density theorem (or a sufficiently strong version of Dirichlet’s theorem on arithmetic progressions), and the well-known estimate .
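The first step of this proof, which ties roots modulo a prime to the discriminant, can be tested by brute force. Condition (1) is not legible in this copy, so the Python sketch below assumes the standard criterion: for an odd prime not dividing the leading coefficient, a quadratic has a root modulo the prime exactly when its discriminant is a square (possibly zero) modulo the prime. The sample polynomial and function names are hypothetical.

```python
def legendre(a, p):
    """Legendre symbol (a|p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def has_root_mod_p(a, b, c, p):
    """Brute-force test for a root of a*x^2 + b*x + c modulo p."""
    return any((a * x * x + b * x + c) % p == 0 for x in range(p))


def criterion_holds(a, b, c, p):
    """Compare the brute-force answer with the discriminant criterion."""
    disc = b * b - 4 * a * c
    return has_root_mod_p(a, b, c, p) == (legendre(disc, p) != -1)


# Hypothetical example: f(x) = x^2 + x + 41, with discriminant -163.
odd_primes = [3, 5, 7, 11, 13, 41, 43, 47, 163]
print(all(criterion_holds(1, 1, 41, p) for p in odd_primes))
```

The prime 2 and primes dividing the leading coefficient are excluded on purpose, since completing the square needs `2a` to be invertible.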

Our second lemma concerns the density of square-free values for the polynomial (as above). We define

If is assumed, we denote by the set of roots of over the ring .

**Lemma 2: **Let be of degree 2, with leading coefficient . Then

In particular, the density is positive iff the content of is square-free and .

*Proof: *Let be prime. As in Lemma 1, implies . If and , then Hensel’s Lifting Lemma implies that admits exactly two roots over . In particular, we have with probability as . If for some , we find (likewise) that with probability , and so the formula in (2) holds. As

wherein the last inequality follows from absolute convergence of the series, we have iff for one of the finite primes dividing . If , then admits roots over . For , this forces (as is a field and ). Then, and we repeat this argument for to show . In the case , finite computation gives the stated exception, and these conditions are clearly sufficient for to hold.
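Lemma 2 lends itself to a numerical sanity check. Formula (2) is not legible in this copy, so the Python sketch below assumes the usual local-density product, with one factor `1 - rho(p^2)/p^2` per prime, where `rho(m)` counts the roots of the polynomial modulo `m`. The polynomial `n*n + 1`, the truncation points and the helper names are all hypothetical choices.

```python
def rho(f, m):
    """Number of residues x modulo m with f(x) divisible by m."""
    return sum(1 for x in range(m) if f(x) % m == 0)


def is_squarefree(n):
    """Trial-division test: no square of a prime divides n (n >= 1)."""
    p = 2
    while p * p <= n:
        if n % (p * p) == 0:
            return False
        if n % p == 0:
            n //= p
        p += 1
    return True


def f(n):
    # Hypothetical sample polynomial of degree 2.
    return n * n + 1


# Truncated product over the primes below 50 (assumed form of the density).
small_primes = [p for p in range(2, 50) if all(p % q for q in range(2, p))]
predicted = 1.0
for p in small_primes:
    predicted *= 1.0 - rho(f, p * p) / (p * p)

# Observed proportion of square-free values f(n) for 1 <= n <= 1000.
N = 1000
observed = sum(1 for n in range(1, N + 1) if is_squarefree(f(n))) / N

print(predicted, observed)
```

The two numbers should be close but not equal, since the product is truncated and the count runs over a finite range.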

Our final Lemma can be viewed as a strengthening of Lemma 1:

**Lemma 3: **Let be of degree 2, and let be a finite set of primes. Let be given by

Then .

*Proof: *It suffices to show that the set

has density 1. Let be prime such that there exists an with divisible by . Define such that for all (which exists as ). As ranges across the primes in (where is as in Lemma 1), we have

As in Lemma 1, it follows that .

**— PART II —**

In this section, we’ll begin to see how our Lemmas apply to the construction of real quadratic number fields without unique factorization.

Let be of degree 2. Then is a quadratic irrational for any fixed , and so

in which the expression at right denotes the (periodic!) continued fraction expansion of . If the terms can be taken as integer polynomials in , and if can be taken independent of (for sufficiently large), then we say that has a *uniform root*. For example, satisfies

for and so admits a uniform root.
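The notion of a uniform root is easy to experiment with. The Python sketch below computes the periodic continued fraction of a square root by the standard recurrence and checks the classical family `n*n + 1`, whose square root has expansion `[n; 2n, 2n, ...]`. This family is used purely as an illustration (the article's own example is not legible in this copy), and the function name is hypothetical.

```python
from math import isqrt


def sqrt_continued_fraction(d):
    """Return (a0, period) with sqrt(d) = [a0; period repeated], d not a square."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, q, a = 0, 1, a0
    period = []
    # The period of sqrt(d) ends at the first partial quotient equal to 2*a0.
    while a != 2 * a0:
        m = q * a - m
        q = (d - m * m) // q
        a = (a0 + m) // q
        period.append(a)
    return a0, period


for n in range(1, 6):
    print(n * n + 1, sqrt_continued_fraction(n * n + 1))
```

Here the period length is 1 for every `n` and the partial quotients are polynomials in `n`, which matches the two requirements in the definition above.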

**Theorem 1: **Suppose that of degree 2 admits a uniform root. As , the ring of integers in fails to be a UFD for all *n* in a set of density .

*Proof: *Take square-free and set . Let be a ramified prime ideal, lying over the rational prime . If is a UFD, then is principal (with generator of norm ). For , the norm form in is given by , hence there exist integers such that

If we suppose further that , then (3) has a solution if and only if with one of the “pre-periodic” approximants to (i.e. it suffices to check successive approximants up to ; see here for more information).
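The convergent test described here can be sketched in a few lines of Python. The code below lists the values of the form `x*x - d*y*y` along the convergents of the square root of `d`, through two periods, and uses them to decide whether a small prime is represented up to sign. It relies on the classical fact that a coprime solution of `|x*x - d*y*y| = t` with `t` below the square root of `d` must come from a convergent; the sample values `d = 10` and `d = 7` and the function names are illustrative assumptions.

```python
from math import isqrt


def convergent_norms(d):
    """Values p*p - d*q*q over the convergents p/q of sqrt(d), through two periods."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, den, a = 0, 1, a0
    p_prev, p_cur = 1, a0   # numerators of successive convergents
    q_prev, q_cur = 0, 1    # denominators of successive convergents
    norms = [p_cur * p_cur - d * q_cur * q_cur]
    periods_seen = 0
    while periods_seen < 2:
        m = den * a - m
        den = (d - m * m) // den
        a = (a0 + m) // den
        if a == 2 * a0:      # a partial quotient 2*a0 marks the end of a period
            periods_seen += 1
        p_prev, p_cur = p_cur, a * p_cur + p_prev
        q_prev, q_cur = q_cur, a * q_cur + q_prev
        norms.append(p_cur * p_cur - d * q_cur * q_cur)
    return norms


def small_norm_is_represented(d, t):
    """For 0 < t < sqrt(d): is x*x - d*y*y = +t or -t solvable in coprime integers?"""
    if not 0 < t * t < d:
        raise ValueError("the convergent criterion needs 0 < t < sqrt(d)")
    return any(abs(v) == t for v in convergent_norms(d))


print(convergent_norms(10))
print(small_norm_is_represented(10, 2), small_norm_is_represented(7, 2))
```

For `d = 10` the convergents only produce the values 1 and -1, so the ramified prime 2 is not a norm up to sign; this is the familiar reason that unique factorization fails in the integers of the field generated by the square root of 10.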

If admits a uniform root and , the approximants to appear as a rational function in , and the norm form of the *i*-th approximation to takes the form

Let denote the (finite) set of primes which arise when is constant. Otherwise, , and we have as . For sufficiently large, let be a prime divisor of . Then the norm form fails to surject onto , and will not be a UFD by the remarks above. This proves our Theorem in the case , and a proof for the case is outlined in the Exercises. The result for general follows by consideration of the four polynomials , for (each of which satisfies one of our conditions on ).

With this result in hand, we derive a (preliminary) lower bound on the function . This is presented in the following example, which moreover sets the stage for our work in Part III.

**Example: **For , define . As

for all , it follows that admits a uniform root. Let . As , we note that if and only if is square-free. In either case, Theorem 1 implies that

To evaluate , we note that , so

after some simplification. If we take square-free, then for all (just consider the reduction into ). Now, if we define and , it follows that

For , we get the weak estimate . *(With more care, this constant may be raised to .)*

**— PART III —**

We now approach our final step in the proof that

which is achieved by (delicately!) adding the contributions of each to our lower bound on . We require one more Lemma:

**Lemma 4: **If for any two , then and.

*Proof: *We first note that for , since

for (and the lower bound is obvious). So implies that these values lie between the same perfect squares, i.e. . Yet for all , which implies (whence ).

This “injectivity” result implies that we need not worry about double-counting our contributions to our estimate on . We note that provided that , whereby

*(This step requires some uniformity in the rate at which tends to, but this is not difficult, as for .) *By introducing an error term (and slightly reducing our constants), this implies that

With this in hand, we are ready to prove our final estimate on :

**Theorem 2: **As , we have

*Proof: *To estimate the sum above in (4), we define the Dirichlet series

where the introduction of the Möbius function is used to restrict our sum to the square-free integers. For real, we see that

in which denotes the infinite product

We recognize the rightmost product in (5) as an Euler product relating to the Möbius function, and so

as along the positive axis. It follows that the *N-*th partial sum to satisfies

(by a consideration of the residue of at ), which yields the estimate

using (4).
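The role of the Möbius function in this proof, namely that its square picks out the square-free integers, can be illustrated directly. The Python sketch below sieves the Möbius function and compares the proportion of square-free integers with the classical density `6 / pi**2`; the function names and the cutoff are hypothetical, and the code does not reproduce the weighted sum in (4).

```python
from math import pi


def mobius_sieve(limit):
    """Return a list mu with mu[n] the Möbius function for 0 <= n <= limit (mu[0] = 0)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for k in range(p, limit + 1, p):
                if k > p:
                    is_prime[k] = False
                mu[k] = -mu[k]          # one sign change per prime factor
            for k in range(p * p, limit + 1, p * p):
                mu[k] = 0               # multiples of p*p are not square-free
    return mu


def squarefree_count(limit):
    """Count the square-free integers in [1, limit] as the sum of mu(n)**2."""
    return sum(m * m for m in mobius_sieve(limit))


limit = 100_000
print(squarefree_count(limit) / limit, 6 / pi ** 2)
```

The same indicator `mu(n)**2` is what restricts the Dirichlet series above to square-free arguments.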

To get a handle on the constant appearing in the final estimate of this proof, we can unravel all of our infinite products. The resulting constant is then

Thus the implied constant in the statement of Theorem 2 may be taken in (slight) excess of , as earlier claimed. Without a doubt, this constant can be improved (perhaps with a more careful treatment of ).

**— EXERCISES —**

**Exercise: **As varies over the polynomials of degree 2, show that can be made both arbitrarily close to 1 (while less than 1) and arbitrarily close to 0 (while greater than 0). Is it the case that

is dense in ?

**Exercise: **When is of degree 2, what are the possible values of ? Give necessary and sufficient conditions for each to occur. (Take care in the case .)

**Exercise: **Complete the proof of Theorem 1 in the case , using the fact that

**Exercise: **What is the bound on which follows from the observation

for (i.e. that admits a uniform root)? *(This exceeds the bound which we established at the end of Part II.)*
