Proof sketch of Thue’s theorem

In this post I sketch a proof of the following theorem, following the treatment in Borevich and Shafarevich’s Number Theory. The sketch is by no means meant to be highly detailed; I am writing it mostly for my own purposes, so I omit proofs of some facts, even when they aren’t entirely straightforward.

Thue’s theorem: Suppose f(x,y)=a_0x^n+a_1x^{n-1}y+\dots+a_ny^n is a binary form which has degree n\geq 3, is irreducible (i.e. f(x,1) is an irreducible polynomial in x) and f(x,1) has at least one nonreal root in \mathbb C. Then for any nonzero integer c the equation f(x,y)=c has only finitely many integral solutions.

Proof: Suppose otherwise…

Setup

First of all, we may suppose a_0=1, for otherwise, we replace f(x,y) with a_0^{n-1}f(\frac{1}{a_0}x,y), which still has integer coefficients. Write

f(x,1)=(x+\theta_1)(x+\theta_2)\dots(x+\theta_n).

The numbers \theta_1,\dots,\theta_n are all conjugates of \theta=\theta_1, since we assumed f(x,1) is irreducible. It’s then easy to see

f(x,y)=(x+y\theta_1)(x+y\theta_2)\dots(x+y\theta_n)=N(x+y\theta) \qquad (1)

where N is the norm of the field k=\mathbb Q(\theta). Also put K=\mathbb Q(\theta_1,\dots,\theta_n). Hence we are interested in the solutions of N(\alpha)=c, where \alpha is in the module (i.e. the additive subgroup) M generated by 1,\theta. Extend this two-element set to a basis \mu_1=1,\mu_2=\theta,\mu_3,\dots,\mu_n of k and denote by \overline{M} the module generated by these. To recover the elements of M inside \overline{M}, we use the dual basis, i.e. elements \mu_1^*,\dots,\mu_n^* such that T(\mu_i\mu_j^*)=0 for i\neq j and T(\mu_i\mu_i^*)=1, where T is the trace. The trace of \alpha\mu_i^* then recovers the coefficient of \mu_i in \alpha, hence we want

T(\alpha\mu_i^*)=0 for i=3,\dots,n.
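To make (1) concrete, here is a toy illustration (my own example, not from the book): for f(x,y)=x^3-2y^3 the numbers \theta_j are the negatives of the roots of x^3-2, and the product of the conjugate factors recovers the form. A quick numerical check:

```python
import numpy as np

# f(x, y) = x^3 - 2y^3, so f(x, 1) = x^3 - 2 = (x + t_1)(x + t_2)(x + t_3)
# with t_i the negatives of the roots of x^3 - 2
thetas = -np.roots([1, 0, 0, -2])

def norm_form(x, y):
    """N(x + y*theta) = prod_j (x + y*theta_j), as in equation (1)."""
    return np.prod([x + y * t for t in thetas])

# the product of the conjugate factors recovers f(x, y)
x, y = 5, 3
assert abs(norm_form(x, y) - (x ** 3 - 2 * y ** 3)) < 1e-8
```

The tiny imaginary parts come only from floating-point roots; algebraically the product is exactly f(x,y).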

A general result (Theorem 1, Section 5.2, Chapter 2 in Borevich-Shafarevich, slightly rephrased) about elements of fixed norm in a module states the following.

Theorem 1: For a module \overline{M} of rank n in a field k of degree n there are elements \gamma_1,\dots,\gamma_m\in\overline{M} and \varepsilon_1,\dots,\varepsilon_r\in k such that every solution of N(\alpha)=c, \alpha\in\overline{M}, can be uniquely written as

\alpha=\gamma_a\varepsilon_1^{u_1}\dots\varepsilon_r^{u_r}

Moreover, r=s+t-1, where s is the number of real embeddings of k into \mathbb C and 2t is the number of complex embeddings.
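For a toy illustration of where the \varepsilon_i come from (my own example, not from the book): in k=\mathbb Q(\sqrt[3]{2}) we have s=1 real and 2t=2 complex embeddings, so r=s+t-1=1, and \sqrt[3]{2}-1 is a unit of norm 1 whose powers all have norm 1 as well — this is the kind of infinite family Theorem 1 packages up. A rough numerical check:

```python
import numpy as np

# k = Q(cbrt(2)): s = 1 real and 2t = 2 complex embeddings, so r = s + t - 1 = 1
etas = np.roots([1, 0, 0, -2])    # the three conjugates of cbrt(2)
unit = etas - 1                   # conjugates of the unit cbrt(2) - 1

# N(cbrt(2) - 1) = prod_j (eta_j - 1) = 1, and the norm is multiplicative,
# so every power of this unit has norm 1 as well
for m in range(1, 6):
    assert abs(np.prod(unit ** m) - 1) < 1e-8
```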

Therefore \alpha as above lies in M precisely when it satisfies the system of equations

T(\gamma_a\mu_i^*\varepsilon_1^{u_1}\dots\varepsilon_r^{u_r})=0 for i=3,\dots,n.

Since we assume there are infinitely many \alpha solving the above system, and \gamma_a ranges over a finite set, we can choose some \gamma such that infinitely many solutions of the above have \gamma_a=\gamma. We can now write this system as

\displaystyle\sum_{j=1}^n\sigma_j(\gamma\mu_i^*)\sigma_j(\varepsilon_1)^{u_1}\dots\sigma_j(\varepsilon_r)^{u_r}=0 for i=3,\dots,n\qquad (2),

where \sigma_j are embeddings of k into K ordered so that \sigma_j(\theta)=\theta_j.

So now we want to derive a contradiction from the assumption that (2) has infinitely many solutions in rational integers u_1,\dots,u_r.

Entering the \frak P-adic world

The idea now is to prove that (2) has only finitely many solutions not just in rational integers, but even in \frak P-adic integers, where \frak P is some prime of K. More precisely, we take any prime (= prime ideal in the ring of integers) \frak P and the corresponding valuation \nu=\nu_{\frak P}. Then we construct the completion K_{\frak P} of K with respect to this valuation. By a “\frak P-adic number” we mean any element of K_{\frak P}, and the ones with nonnegative valuation will be called “\frak P-adic integers”.

We now want to make sense of equations (2) for a_i not necessarily integers, but also \frak P-adic integers. The problem reduces to making sense of a^b for a a fixed \frak P-adic number and b a \frak P-adic integer, which is meant to vary. For this, we employ exponential and logarithmic functions: we will write a^b=\exp(b\log a). \exp and \log are defined using their power series:

\displaystyle\exp x=\sum_{n=0}^\infty\frac{x^n}{n!},

\displaystyle\log(1+x)=\sum_{n=1}^\infty(-1)^{n+1}\frac{x^n}{n}.

These two functions are each other’s inverses, that is,

\exp\log(1+x)=1+x,\log\exp x=x.
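These identities can be checked directly on truncated power series; here is a small sketch of such a check (my own illustration), verifying \exp\log(1+x)=1+x coefficient by coefficient up to order x^7:

```python
from fractions import Fraction
from math import factorial

N = 8  # work with power series modulo x^N

def mul(a, b):
    """Multiply two truncated power series (coefficient lists of length N)."""
    c = [Fraction(0)] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
    return c

def compose(f, g):
    """f(g(x)) mod x^N, assuming g has no constant term."""
    result = [Fraction(0)] * N
    power = [Fraction(0)] * N
    power[0] = Fraction(1)               # g^0 = 1
    for coeff in f:
        for i in range(N):
            result[i] += coeff * power[i]
        power = mul(power, g)
    return result

# exp(x) and log(1+x), truncated, with the same coefficients as in the post
exp_s = [Fraction(1, factorial(n)) for n in range(N)]
log_s = [Fraction(0)] + [Fraction((-1) ** (n + 1), n) for n in range(1, N)]

# exp(log(1+x)) = 1 + x, coefficient by coefficient up to x^7
assert compose(exp_s, log_s) == [1, 1] + [0] * (N - 2)
```

Since g has no constant term, g^n contributes nothing below degree n, so the truncation at order N is legitimate.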

There are many ways to justify this, the most straightforward one being that we know these equalities hold for complex numbers, hence they are formal identities of power series, hence they must also hold for \frak P-adic numbers. However, these functions are not defined everywhere; nevertheless, they can be shown to have a positive radius of convergence. More precisely:

Lemma 1: There is a rational integer \kappa such that both \exp x and \log(1+x) are defined for \nu(x)\geq\kappa. Moreover, \nu(\log(1+x))\geq\kappa, so (1+x)^b=\exp(b\log(1+x)) is defined for any \frak P-adic integer b.
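In the simplest case K=\mathbb Q, \frak P=(p), this is the familiar convergence of the p-adic exponential and logarithm (for odd p one can take \kappa=1). Here is a rough numerical sketch (my own illustration) of \log\exp 5=5 in \mathbb Z_5, to precision 5^6:

```python
from fractions import Fraction
from math import factorial

def reduce_mod(q, p, k):
    """Reduce a Fraction which is a p-adic integer modulo p^k."""
    # in lowest terms, the denominator of a p-adic integer is prime to p
    return q.numerator * pow(q.denominator, -1, p ** k) % p ** k

p, k = 5, 6
# exp(5), truncated: the term 5^n/n! has 5-adic valuation n - v_5(n!),
# which tends to infinity, so 25 terms are ample for precision 5^6
E = sum(Fraction(5 ** n, factorial(n)) for n in range(25))
# log(1+y) at y = exp(5) - 1, which has 5-adic valuation 1
y = E - 1
L = sum(Fraction((-1) ** (n + 1), n) * y ** n for n in range(1, 41))
# log and exp are mutually inverse, so log(exp(5)) = 5 in Z_5
assert reduce_mod(L, p, k) == 5
```

The point of working modulo 5^6 is exactly Lemma 1: both series converge on the region of valuation \geq 1, and the truncation errors all have valuation far above 6.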

Unfortunately, there is no reason to expect that the numbers \varepsilon_i suit our purposes. However, we can change them so that this is the case. First of all, we may suppose that \frak P is such that all of the \sigma_j(\varepsilon_i) have valuation zero (there are finitely many of these numbers, and each has nonzero valuation with respect to only finitely many prime ideals). Now we look at reduction modulo \frak P^\kappa (or, more precisely, modulo any element of valuation \kappa). The group of units of the quotient ring is finite, say of order d. Then \varepsilon_i^d is always congruent to 1 modulo \frak P^\kappa, i.e. \varepsilon_i^d=1+x for some x of valuation at least \kappa, so we may replace each \varepsilon_i by \varepsilon_i^d.
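In the toy case K=\mathbb Q, \frak P=(p), this step is just Euler’s theorem: if d is the order of the unit group of \mathbb Z/p^\kappa, then \varepsilon^d\equiv 1\pmod{p^\kappa} for every unit \varepsilon. A minimal check (my own illustration):

```python
p, kappa = 7, 2
mod = p ** kappa                      # reduction modulo P^kappa = 49
d = p ** kappa - p ** (kappa - 1)     # order of the unit group of Z/49: 42
for eps in [2, 3, 5, 10]:             # a few units modulo 49
    # eps^d is congruent to 1, i.e. eps^d = 1 + x with v_7(x) >= kappa
    assert pow(eps, d, mod) == 1
```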

Moreover, we can replace the set of the \gamma_a by the products of the \gamma_a with suitable powers of the \varepsilon_i; we only need to multiply by powers between 0 and d-1. To avoid introducing more notation, we will just assume that the \varepsilon_i, and hence also the \sigma_j(\varepsilon_i), are of the form which allows us to speak of their exponentials.

Analytic fluff

The exponential function on \frak P-adic numbers satisfies all the familiar properties. Thanks to this, equations (2) can be rewritten as

\displaystyle\sum_{j=1}^nA_{ij}\exp L_j(u_1,\dots,u_r)=0 for i=3,\dots,n,\qquad (3)

where A_{ij}=\sigma_j(\gamma\mu_i^*) and L_j(u_1,\dots,u_r)=\displaystyle\sum_{k=1}^ru_k\log\sigma_j(\varepsilon_k). Note that the involved functions are all continuous functions of u_k.

Now we use the fact that \frak P-adic integers are compact (under the topology induced by the valuation). Since we assumed (3) has infinitely many (\frak P-adic) integral solutions, there must be a subsequence of these solutions which converges to some tuple (u_1^*,\dots,u_r^*). By continuity, this tuple constitutes another solution to (3). By a change of variables v_i=u_i-u_i^*, we get a system of equations

\displaystyle\sum_{j=1}^nA_{ij}^*\exp L_j(v_1,\dots,v_r)=0 for i=3,\dots,n,\qquad (4)

where A_{ij}^*=A_{ij}\exp L_j(u_1^*,\dots,u_r^*), which by the above has a sequence of solutions converging to the origin. Let us point out that the equations in (4) are linearly independent, i.e. the matrix (A_{ij}^*) of coefficients has rank n-2. This is because A_{ij}^* is the product of \exp L_j(u_1^*,\dots,u_r^*)\sigma_j(\gamma) and \sigma_j(\mu_i^*), and the matrix of all the \sigma_j(\mu_i^*) is invertible, since the square of its determinant is the discriminant of a linearly independent tuple, hence nonzero.
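The invertibility of such an embedding matrix can be seen concretely in a toy case (my own illustration, using a power basis rather than the \mu_i^* above): for 1,\theta,\theta^2 in \mathbb Q(\sqrt[3]{2}) the matrix (\sigma_j(\theta^{i-1})) is Vandermonde, and the square of its determinant is the discriminant -108\neq 0:

```python
import numpy as np

t = np.roots([1, 0, 0, -2])       # conjugates of theta = cbrt(2)
# embedding matrix (sigma_j(mu_i)) for the power basis mu_i = theta^(i-1)
M = np.array([[tj ** i for i in range(3)] for tj in t])
# its determinant squared is the discriminant of x^3 - 2, namely -108 != 0
disc = np.linalg.det(M) ** 2
assert abs(disc - (-108)) < 1e-6
```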

We consider the local analytic manifold V of (4), i.e. the set of solutions of this system in some small neighbourhood of the origin. Since a sequence of solutions converges to the origin, this manifold consists of more than one point. Hence, by a general theorem, it must contain an analytic curve – there is a system of r (formal) power series \omega_1(t),\dots,\omega_r(t), not all identically zero and all with no constant term, which satisfy (4) when plugged in for the v_k. Equivalently, if we put P_j(t)=L_j(\omega_1(t),\dots,\omega_r(t)), we get

\displaystyle\sum_{j=1}^nA_{ij}^*\exp P_j(t)=0 for i=3,\dots,n, \qquad (5)

where P_j(t) are power series with no constant terms.

Finishing the proof

We have the system (5) of equations involving the exponentials of the P_j(t). However, each P_j(t) is also a linear combination of the r power series \omega_k(t). Therefore, by linear algebra, we can find a system of n-r independent linear equations

\displaystyle\sum_{j=1}^nB_{ij}P_j(t)=0 for i=1,\dots,n-r\qquad (6)

satisfied by these power series. We will now use the assumption we haven’t used yet: that f(x,1) has a nonreal root. Recall this implies that the field k has at least one complex embedding, i.e. t\geq 1 (see the statement of Theorem 1). Therefore n-r=s+2t-(s+t-1)=t+1\geq 2. Using (5) and (6) we can therefore apply the following lemma:

Lemma 2: Suppose formal power series (over some field of characteristic zero) P_1(t),\dots,P_n(t) with no constant term satisfy a system of n-2 equations of the form

\displaystyle\sum_{j=1}^nA_{ij}^*\exp P_j(t)=0

and also a system of two equations of the form

\displaystyle\sum_{j=1}^nB_{ij}P_j(t)=0.

Then P_j(t)=P_k(t) for some j\neq k.

Before we provide a proof of this lemma, we will show why it helps us complete the proof. Recalling the definition of P_j(t), the lemma implies that any analytic curve contained in the manifold V is also contained in the manifold W defined by the equation

\displaystyle\prod_{1\leq j<k\leq n}(L_j(v_1,\dots,v_r)-L_k(v_1,\dots,v_r))=0.

It follows (though not immediately) that V\subseteq W. We will obtain a contradiction as soon as we show that W contains only finitely many points (v_1,\dots,v_r) corresponding to solutions of (3), since we assumed that V contains infinitely many such points. Equivalently, since the product in the definition of W has only finitely many factors, we need to show that only finitely many of these tuples can satisfy

L_j(v_1,\dots,v_r)=L_k(v_1,\dots,v_r)

for j\neq k.

Let (u_1,\dots,u_r) be a solution of (3) coming from \alpha=x+y\theta, x,y\in\mathbb Z, and write u_i=u_i^*+v_i. We have

\sigma_j(\alpha)=\sigma_j(\gamma)\sigma_j(\varepsilon_1)^{u_1}\dots\sigma_j(\varepsilon_r)^{u_r}=\sigma_j(\gamma)\sigma_j(\varepsilon_1)^{u_1^*}\dots\sigma_j(\varepsilon_r)^{u_r^*}\sigma_j(\varepsilon_1)^{v_1}\dots\sigma_j(\varepsilon_r)^{v_r}=c_j\exp L_j(v_1,\dots,v_r)

where c_j is a constant independent of \alpha. Similarly,

\sigma_k(\alpha)=c_k\exp L_k(v_1,\dots,v_r).

Assuming L_j(v_1,\dots,v_r)=L_k(v_1,\dots,v_r), this implies

\displaystyle\frac{\sigma_j(\alpha)}{c_j}=\frac{\sigma_k(\alpha)}{c_k},\frac{\sigma_j(\alpha)}{\sigma_k(\alpha)}=\frac{c_j}{c_k}.

Taking \alpha'=x'+y'\theta to be a different such solution, this implies

\displaystyle\frac{\sigma_j(\alpha)}{\sigma_k(\alpha)}=\frac{\sigma_j(\alpha')}{\sigma_k(\alpha')},\frac{x+y\theta_j}{x+y\theta_k}=\frac{x'+y'\theta_j}{x'+y'\theta_k}

and hence (xy'-x'y)(\theta_j-\theta_k)=0. Since f(x,1) is irreducible, its roots are distinct, so \theta_j\neq\theta_k and xy'=x'y, i.e. \frac{x}{x'}=\frac{y}{y'} (x',y' can’t both be zero, and likewise x,y). It follows that \alpha' is a rational multiple of \alpha, say \alpha'=d\alpha. But recall that \alpha,\alpha' have the same norm, so d has norm d^n=1, hence d=\pm 1. Therefore \alpha,\alpha' are equal or opposite. Hence there are only two possible values of \alpha, which is certainly a finite amount! As explained above, this gives us a contradiction. \square

Proof of lemma 2

Since the n power series \exp P_j satisfy n-2 independent linear equations, we can express all of them in terms of just two, say \exp P_{n-1} and \exp P_n. Put

\exp P_i=a_i\exp P_{n-1}+b_i\exp P_n for i=1,\dots,n-2\qquad (7).

Suppose a_i=0. Then \exp P_i and b_i\exp P_n are equal. Their constant terms are, respectively, 1 and b_i (since the P_i have no constant term), so b_i=1 and \exp P_i=\exp P_n, from which we can deduce (computing coefficients one by one) that P_i=P_n. Hence we may assume a_i\neq 0 (as otherwise we are already done). Putting Q_i=P_i-P_n we then have

\exp Q_i=a_i\exp Q_{n-1}+b_i

and we may also assume Q_i are nonzero. Differentiation gives

Q_i'\exp Q_i=a_iQ_{n-1}'\exp Q_{n-1}.

The previous two equations combined give

\displaystyle Q_i'=Q_{n-1}'\exp Q_{n-1}\frac{1}{c_i+\exp Q_{n-1}}\qquad (8)

with c_i=\frac{b_i}{a_i} for i=1,\dots,n-2. We now use the other pair of assumed equations. By subtracting suitable multiples of P_n from them we find

\displaystyle\sum_{j=1}^{n-1} B_{ij}Q_j=k_iP_n for i=1,2.

If either k_i is zero, this gives us a nontrivial linear relation between Q_j. Otherwise, subtracting suitable multiples and using independence we again get a nontrivial linear relation. In either case, we get

\displaystyle\sum_{j=1}^{n-1}d_jQ_j=0

for d_j not all zero. Differentiation and (8) give us

Q_{n-1}'\exp Q_{n-1}\left(\displaystyle\sum_{i=1}^{n-2}\frac{d_i}{c_i+\exp Q_{n-1}}+\frac{d_{n-1}}{\exp Q_{n-1}}\right)=Q_{n-1}'\exp Q_{n-1}\left(\sum_{i=1}^{n-1}\frac{d_i}{c_i+\exp Q_{n-1}}\right)=0

(setting c_{n-1}=0). As Q_{n-1}',\exp Q_{n-1}\neq 0 we deduce

\displaystyle\sum_{i=1}^{n-1}\frac{d_i}{c_i+\exp Q_{n-1}}=0.

Hence we get that the rational function

\displaystyle\sum_{i=1}^{n-1}\frac{d_i}{c_i+z}

vanishes when we substitute z=\exp Q_{n-1}. Unless this function vanishes identically, this would imply that \exp Q_{n-1} is algebraic over its field of coefficients. But no nonconstant power series over a field is algebraic over the constants, and Q_{n-1}\neq 0, so this cannot happen. Thus the rational function is identically zero. This forces some two of the c_i to be equal (otherwise the function would have a pole at z=-c_i for any i with d_i\neq 0). Therefore c_j=c_k for some j\neq k.

Since \frac{b_j}{a_j}=c_j=c_k=\frac{b_k}{a_k}, (7) gives us

\frac{1}{a_j}\exp P_j=\frac{1}{a_k}\exp P_k.

Comparing the constant coefficients and then the remaining ones, we get P_j=P_k with j\neq k. \square

Ultrabrief summary

The proof goes roughly as follows:

  1. Suppose otherwise.
  2. Using (a variation of) Dirichlet’s unit theorem and general results on modules, reduce the problem to showing that a certain exponential equation in many variables has only finitely many solutions.
  3. Generalize the context of the question to \frak P-adic-analytic setting so that we can speak of exponentials of (some) non-rational-integers.
  4. Using some difficult words like “local analytic manifold” reduce (a big part of) the problem to (essentially) showing it cannot contain an analytic curve.
  5. Use a fancy lemma to deduce the manifold is too algebraically constrained to contain infinitely many integral points.
  6. Write an ultrabrief summary.

Clearly two of these steps are (arguably) the most ingenious and crucial ones: passing from a number field to its completion, and then reducing the problem to an analogous problem in the functional setting (i.e. there is no formal power series blah blah). Both complete fields (more precisely called local fields) and functional questions have many times in mathematics proven much easier to work with than number fields. The former’s main advantage is the ability to use analytic tools (and difficult words), while in the functional setting we have an incredibly useful tool – differentiation.

You can see the simplicity of working in the functional setting e.g. in the proof of the Riemann hypothesis for function fields. In the future I will probably make more posts showcasing local methods like this one, possibly less difficult ones (or perhaps more).
