
5 posts tagged with "maths"

Mathematics-related posts.


Continuity of a Vector Norm Mapping

· 5 min read

The Context

In a mathematical proof of the existence of the singular-value decomposition (SVD)¹ for real matrices, which I have recently come across, there is a (small) statement that is applied without proof, which goes like this:-

Let $A \in \mathbb{R}^{m \times n}$ be a real matrix and $x \in \mathbb{R}^n$ a real vector; then the map $f : \mathbb{R}^n \to \mathbb{R}$, defined by $f(x)=\Vert Ax \Vert_2$, where $\Vert\cdot\Vert_2$ represents the 2-norm (Euclidean norm), is continuous.

This statement is used to show the existence of the largest singular value of a matrix $A \in \mathbb{R}^{m \times n}$, usually denoted as $\sigma_1$, when defining $\sigma_1 := \sup_{x \in S} \Vert Ax \Vert_2$, where $S$ represents the unit hypersphere centred at the origin.

This article describes a proof of the statement above, which makes use of the induced matrix norm (also known as the operator norm).

The Good Old Epsilon-Delta

We employ the $\varepsilon$-$\delta$ definition of a continuous function for this proof, i.e.

Let $(X,d_X)$ and $(Y,d_Y)$ be metric spaces. A function $f : X \to Y$ is continuous in a set $A \subseteq X$ if for any point $a \in A$ and any $\varepsilon > 0$, there exists some $\delta>0$, which depends on $\varepsilon$ and/or $a$, such that whenever $d_X(x,a)<\delta$ where $x \in X$, we have $d_Y(f(x),f(a))<\varepsilon$.

This definition provides rigour to the notion that a continuous function leaves no 'gaps' in its graph, i.e. the distance between two points on the graph becomes arbitrarily small as we 'zoom in' indefinitely, so that the distance between the corresponding points in the domain becomes arbitrarily small.

In our case here, $X = \mathbb{R}^n$, $Y=\mathbb{R}$ and $d_X = d_Y = \Vert\cdot\Vert_2$.

The induced matrix 2-norm, whose definition is stated as follows, also comes into play here.

$$\Vert A \Vert_2= \sup_{x \in \mathbb{R}^n \backslash \{0\}}\frac{\Vert Ax \Vert_2}{\Vert x \Vert_2} \text{, where } A \in \mathbb{R}^{m \times n}.$$

With these in place, let's start the proof:-

Let $x \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$ and $\varepsilon > 0$ be arbitrary.

If $A = 0$, i.e. $A$ is the zero matrix, then $f$ is essentially the function that maps everything to zero, which is clearly continuous².

Next, we consider nonzero $A$, and set $\delta=\dfrac{\varepsilon}{\Vert A \Vert_2}$, which is well defined since $\Vert A \Vert_2 > 0$. We first note that from the definition of the induced matrix 2-norm, for any $x,y \in \mathbb{R}^n$ where $x \neq y$, we have that

$$\Vert A \Vert_2=\sup_{x-y \in \mathbb{R}^n \backslash \{0\}}\frac{\Vert Ax-Ay \Vert_2}{\Vert x-y \Vert_2} \geqslant \frac{\Vert A(x-y) \Vert_2}{\Vert x-y \Vert_2} \implies \Vert Ax-Ay \Vert_2 \leqslant \Vert A \Vert_2 \Vert x-y \Vert_2. \tag{*}$$

Therefore, it follows that for $y \in \mathbb{R}^n$ such that $\Vert x-y \Vert_2<\delta$,²

$$\begin{align*} \Vert f(x)-f(y) \Vert_2 = \left\Vert \Vert Ax \Vert_2-\Vert Ay \Vert_2 \right\Vert_2 & \leqslant \Vert Ax-Ay \Vert_2 \\ & \leqslant \Vert A \Vert_2 \Vert x-y \Vert_2 \\ & < \Vert A \Vert_2 \cdot \delta \\ & = \Vert A \Vert_2 \cdot \frac{\varepsilon}{\Vert A \Vert_2} \\ & = \varepsilon. \end{align*}$$

Note that the first $\leqslant$ is due to the reverse triangle inequality, and the second $\leqslant$ comes from (*).

This thus concludes the proof that the map $x \mapsto \Vert Ax \Vert_2$ is continuous.
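As an informal aside (not part of the proof above), the Lipschitz bound from (*) can be probed numerically. The sketch below is my own illustration with an arbitrarily chosen diagonal matrix, for which the induced 2-norm is simply the largest absolute diagonal entry:

```python
import math
import random

# Illustrative sanity check (not part of the proof): for a diagonal matrix,
# ||A||_2 is the largest |diagonal entry|, so the bound
# ||Ax - Ay||_2 <= ||A||_2 ||x - y||_2 can be tested directly.
diag = [3.0, -1.0, 0.5]              # A = diag(3, -1, 0.5), so ||A||_2 = 3
op_norm = max(abs(a) for a in diag)

def apply_A(v):
    return [a * vi for a, vi in zip(diag, v)]

def norm2(v):
    return math.sqrt(sum(vi * vi for vi in v))

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in diag]
    y = [random.uniform(-1, 1) for _ in diag]
    lhs = norm2([a - b for a, b in zip(apply_A(x), apply_A(y))])
    rhs = op_norm * norm2([xi - yi for xi, yi in zip(x, y)])
    assert lhs <= rhs + 1e-12        # (*) holds for every sampled pair
```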

The Nicer Things

In fact, the statement above is a special case and combination of a number of nice results, which are stated as follows:-

The proof of each of these statements is linked as above.

Footnotes

  1. I am linking the explainer article written by Gregory Gundersen here as I find it straightforward to understand, thanks to its use of plain language, lack of jargon, and illustrations that help readers visualise the concept. If you want to learn more about SVDs, in particular an alternative existence proof for the SVDs of real matrices, do refer to this article as well.

  2. The case when $x = y$ is immediate: we have that $\left\Vert \Vert Ax \Vert_2-\Vert Ay \Vert_2 \right\Vert_2=0 < \varepsilon$, since $\varepsilon > 0$ by definition. The exact same argument can be applied for the case when $A = 0$.

  3. The Wikipedia article about Lipschitz continuity is linked here.

Generalising the Geometric Series

· 13 min read

Level 0: Geometric Series

A series that looks like the following:

$$\frac{1}{c}+\frac{1}{c^2}+\frac{1}{c^3}+\cdots=\sum_{k=1}^{\infty}\frac{1}{c^k}, \quad c \text{ is a constant with } |c|>1$$

is a convergent geometric series, which can be evaluated using the following infinite sum formula:

$$\frac{a}{1-r}$$

where $a$ is the first term of the series and $r$ is the common ratio. In the case of the series above, $a = \frac{1}{c}$ and $r = \frac{1}{c}$. Plugging them in, we obtain the sum, $S_0$, for the above series:

$$S_0=\frac{1}{c-1}.$$
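As a quick numerical sanity check (an illustrative sketch of my own, with an arbitrary sample value of $c$), the partial sums of the series can be compared against $\frac{1}{c-1}$:

```python
# Illustrative check: partial sums of the geometric series sum_{k>=1} 1/c^k
# approach the closed form 1/(c - 1).
def geometric_partial_sum(c, terms=200):
    return sum(1 / c**k for k in range(1, terms + 1))

c = 3.0
assert abs(geometric_partial_sum(c) - 1 / (c - 1)) < 1e-12
```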

This series is useful in various applications, from distance calculations in physics, compound interest in finance and population growth in biology to fascinating topics like fractal geometry. Indeed, this series is good and fun, but as most mathematicians would ask, "Can we generalise it even further, and how?"

This article demonstrates one way this series can be generalised, which is by allowing variables in the numerator of the summed over terms.

Level 1: Arithmetico-geometric Series

Now, what if instead of the numerator being a constant number, it is the index of summation $k$?

$$\sum_{k=1}^{\infty}\frac{k}{c^k}, \quad c \text{ is a constant with } |c|>1.$$

It somewhat resembles the previous series but is no longer a geometric one. In fact, it is now an arithmetico-geometric series, with each term in the sum being the product of its respective term of an arithmetic sequence and that of a geometric sequence.

It seems like the infinite sum formula above is not really useful currently. How should we tackle this problem then?

Scaling and Shifting

Since the variable $k$ in the numerator is of degree $1$, let's call the desired sum $S_1$, and list the first few terms of the series.

$$S_1=\frac{1}{c}+\frac{2}{c^2}+\frac{3}{c^3}+\cdots$$

Next, let's multiply all of the terms by $\dfrac{1}{c}$. We then obtain $\dfrac{1}{c} S_1$, with the terms becoming the following.

$$\frac{1}{c}S_1=\frac{1}{c^2}+\frac{2}{c^3}+\frac{3}{c^4}+\cdots$$

Now, observe that when we match terms with the same power of $c$, each term of the former series exceeds its counterpart in the latter by exactly $\dfrac{1}{c^k}$ (the $k=1$ term of the former having no counterpart at all). With this in mind, let's see what happens if we subtract the latter sum from the former.

$$\frac{c-1}{c}S_1=\frac{1}{c}+\frac{1}{c^2}+\frac{1}{c^3}+\cdots=\sum_{k=1}^{\infty}\frac{1}{c^k}=S_0$$

We recover the geometric series at the beginning! The key here is that we have managed to reduce the problem down to what we've seen before, and we can now apply the geometric infinite sum formula!

How can we express this in terms of $S_1$ then? Recalling what we have previously done, we can in fact write it as $S_1-\frac{1}{c} S_1$. Doing some simplification and applying the infinite sum formula gives us the following equation:

$$\frac{c-1}{c}S_1=\frac{1}{c-1}$$

Solving this equation finally gives us the desired answer, also known as Gabriel's staircase:

$$S_1=\frac{c}{(c-1)^2}.$$
note

The infinite sum formula has an application in probability theory: it gives us the expected value of a discrete random variable defined by a geometric distribution.

Explanations of some of the jargon involved.

  • A random variable is a representation of a collection of possible outcomes of an experiment, each of which is assigned a probability value.

    There are two types of random variables: discrete and continuous. Roughly speaking, a discrete random variable is "defined over whole (natural) numbers like 1, 2 and 3" whereas a continuous random variable is "defined over decimals (real numbers) like 0.2, 1.67 or even π." Formally, a discrete random variable takes on a countable set of possible outcomes while a continuous random variable takes on an uncountable set of possible values.

    A common example of a discrete random variable, labelled here as $X$, is the roll of a six-sided fair die 🎲. There are six possible results of a die roll: $1,2,3,4,5,6$, each occurring with a probability of $\frac{1}{6}$, thus $X=\{1,2,3,4,5,6\}$, and the probabilities can be tabulated as follows:

    | $k$ | $P(X=k)$ |
    | --- | --- |
    | $1$ | $\frac{1}{6}$ |
    | $2$ | $\frac{1}{6}$ |
    | $3$ | $\frac{1}{6}$ |
    | $4$ | $\frac{1}{6}$ |
    | $5$ | $\frac{1}{6}$ |
    | $6$ | $\frac{1}{6}$ |
  • The expected value, also called expectation, of a random variable is the average value of all possible outcomes of an experiment, weighted by their respective probabilities. Its formula is

    $$E[X]=\sum_{k \in X}k \cdot P(X=k)$$

    which is the sum of the products of all possible values of a random variable XX and their respective probabilities.

    Using the previous die 🎲 example, the expected value obtained from a single die roll is then

    $$1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = \frac{7}{2} = 3.5.$$
  • A geometric distribution is a probability distribution that gives the number of Bernoulli trials (i.e. you either "succeed" or "fail") needed until achieving the first success.

    If the probability of success on each trial is $p$, then the probability of having the $n$-th trial as the first success is

    $$P(X=n)=(1-p)^{n-1}p.$$

    Using this result, the expected value formula and the fact that the number of trials can theoretically be infinite, the expectation of this distribution is then found to be

    $$E[X]=p \sum_{k=1}^{\infty}{k \cdot (1-p)^{k-1}}$$

    which explains why it is related to the arithmetico-geometric series discussed in this section. In fact, this infinite sum corresponds to $S_1$ with $c$ substituted by $\frac{1}{1-p}$ (so that it lies within the interval of convergence), up to the multiplicative constant $p$ and an index shift.

    Upon further simplification, accounting for the multiplicative constant $p$ as well as the index shift via an extra factor of $\frac{1}{1-p}$, we obtain this surprisingly simple result:

    $$p \cdot \cfrac{\frac{1}{1-p}}{\left(\frac{1}{1-p}-1\right)^2} \cdot \frac{1}{1-p}=\frac{1}{p}.$$
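The expectation $\frac{1}{p}$ can also be checked by simulation. The following is an illustrative sketch of my own, with an arbitrarily chosen $p$ and number of runs:

```python
import random

# Illustrative simulation: the average number of Bernoulli trials until the
# first success should be close to 1/p.
def first_success_trial(p, rng):
    n = 1
    while rng.random() >= p:   # failure with probability 1 - p
        n += 1
    return n                   # index of the first successful trial

rng = random.Random(42)
p = 0.25
runs = 200_000
estimate = sum(first_success_trial(p, rng) for _ in range(runs)) / runs
assert abs(estimate - 1 / p) < 0.05    # 1/p = 4 for p = 0.25
```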

This is helpful in calculating the average number of trials needed before attaining the first success. Here are a few examples.

note

For the $c=10$ case, this sum also provides a proof that the infinite sum $0.1+0.02+0.003+0.0004+\cdots=\sum_{k=1}^{\infty}\frac{k}{10^k}=\frac{10}{81}$ is in fact rational, which may not be obvious at first glance.

warning

This method only works if the series is indeed convergent. In other words, it won't work for a series like $1 - 1 + 1 - 1 + \cdots$ (which has an oscillating sum) or $1 + \frac{1}{2} + \frac{1}{3} + \cdots$ (which, surprisingly, blows up to positive infinity).

As a sidenote, this also means that the use of the same method covered in the infamous Numberphile -1/12 video to "evaluate" the sum of natural numbers is actually wrong (as explained with more detail in this Mathologer video), because the series mentioned in that video are clearly not even convergent to begin with!

A proof to show that $\sum_{k=1}^{\infty}\frac{k}{c^k}$ is indeed convergent.

The ratio test will come in handy here, i.e. we need to evaluate the limit $L = \lim\limits_{k \to \infty}\left\vert{\frac{(k+1)\text{-th term}}{k\text{-th term}}}\right\vert = \lim\limits_{k \to \infty}\left\vert{\cfrac{\frac{k+1}{c^{k+1}}}{\frac{k}{c^k}}}\right\vert = \lim\limits_{k \to \infty}\frac{k+1}{k\,\vert c\vert}$, which gives us $L=\frac{1}{\vert c\vert} < 1$, so the series is convergent.

In fact, it exhibits a stronger form of convergence: it is absolutely convergent, i.e. even if we replace each term in the series with its absolute value, the series will still converge!

Level 2: Quadratic-geometric Series

At this point, one may ask, "Why stop here? Why not replace the numerator $k$ with $k^2$ and find the general formula of the resulting new series? It will still be convergent (proof below) anyway."

$$\sum_{k=1}^{\infty}{\frac{k^2}{c^k}}$$

Let's try to apply the same technique we used to obtain $S_1$. Here, we call the desired sum $S_2$.

$$S_2=\frac{1}{c}+\frac{4}{c^2}+\frac{9}{c^3}+\cdots$$

Multiplying throughout by 1c\dfrac{1}{c} yields

$$\frac{1}{c}S_2=\frac{1}{c^2}+\frac{4}{c^3}+\frac{9}{c^4}+\cdots$$

Finally, subtracting the latter from the former, we obtain

$$\frac{c-1}{c}S_2=\frac{1}{c}+\frac{3}{c^2}+\frac{5}{c^3}+\cdots=\sum_{k=1}^{\infty}\frac{2k-1}{c^k}=2S_1-S_0.$$

Once again, we've managed to utilise the fact that the pairwise differences of consecutive perfect squares produce the sequence of odd numbers to reduce this series down to simpler results.

Applying the previous results obtained so far, we finally have

$$\frac{c-1}{c}S_2=\sum_{k=1}^{\infty}\frac{2k-1}{c^k}=2 \cdot \sum_{k=1}^{\infty}\frac{k}{c^k}-\sum_{k=1}^{\infty}\frac{1}{c^k}=2 \cdot \frac{c}{(c-1)^2}-\frac{1}{c-1} \implies S_2=\frac{c(c+1)}{(c-1)^3}.$$

Level 3: Cubic-geometric Series

Now, what can we get if the numerator is a cubic term?

$$\sum_{k=1}^{\infty}{\frac{k^3}{c^k}}$$

We will call this sum $S_3$ and try the same method again.

$$S_3=\frac{1}{c}+\frac{8}{c^2}+\frac{27}{c^3}+\cdots \implies \frac{1}{c}S_3=\frac{1}{c^2}+\frac{8}{c^3}+\frac{27}{c^4}+\cdots \implies \frac{c-1}{c}S_3=\frac{1}{c}+\frac{7}{c^2}+\frac{19}{c^3}+\frac{37}{c^4}+\cdots \tag{\#}$$

Notice that in (#), the numerators are of the form $k^3-(k-1)^3$, with $k = 1,2,3,\ldots$ This simplifies to $3k^2-3k+1$, which means that (#) can be rewritten as:

$$\frac{c-1}{c}S_3=3\sum_{k=1}^{\infty}{\frac{k^2}{c^k}}-3\sum_{k=1}^{\infty}{\frac{k}{c^k}}+\sum_{k=1}^{\infty}{\frac{1}{c^k}}=3S_2-3S_1+S_0$$

Finally, we then arrive at:

$$\frac{c-1}{c}S_3=3 \cdot \frac{c(c+1)}{(c-1)^3}-3 \cdot \frac{c}{(c-1)^2}+\frac{1}{c-1} \implies S_3=\frac{c^3+4c^2+c}{(c-1)^4}.$$

You may start to notice a pattern here, and this will be discussed in the next section.

Level p: Polynomial-geometric Series

All of these naturally lead to the question: what happens if we generalise the power of $k$ in the numerator to be any positive integer, i.e. what is the infinite sum of the following series?

$$\sum_{k=1}^{\infty}\frac{k^p}{c^k}, \quad c \text{ is a constant with } |c|>1,\; p>1 \text{ is an integer}.$$

We know from previous sections that such series are in fact convergent, so we could try to find the respective general formulae for them.

A proof to show that this series is convergent.

Using the ratio test again like the previous proof, we find that $L=\lim\limits_{k \to \infty}\left\vert{\cfrac{\frac{(k+1)^p}{c^{k+1}}}{\frac{k^p}{c^k}}}\right\vert=\frac{1}{\vert c\vert} < 1$ as well, so it is indeed convergent.

Let's call this sum $S_p$. Inductively, one can see that if we obtain $\dfrac{c-1}{c}S_p$, its numerator terms are then of the form $k^p-(k-1)^p$, where $k=1,2,3,\ldots$

Not only is this where the binomial theorem comes into play; you can also observe that the highest-power term $k^p$ gets cancelled out while the terms of lower powers remain intact. This means that this method gives a recurrence relation for $S_p$ in terms of the lower-power sums, i.e. $S_{p-1}$, $S_{p-2}$ and so on, which can help us evaluate its general formula.

Since $k^p-(k-1)^p=k^p-\sum_{i=0}^{p}\binom{p}{i}k^{p-i}(-1)^i=\sum_{i=1}^{p}\binom{p}{i}k^{p-i}(-1)^{i+1}$, we then arrive at this recurrence relation:

$$S_p=\frac{c}{c-1} \cdot \sum_{i=1}^{p}(-1)^{i+1}\binom{p}{i}S_{p-i}.$$
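This recurrence can be turned into a short program. The sketch below is an illustration of my own, using exact rational arithmetic, and checks the recurrence against the closed forms of $S_1$, $S_2$ and $S_3$ derived earlier:

```python
from fractions import Fraction
from math import comb

# Sketch of the recurrence S_p = c/(c-1) * sum_{i=1}^{p} (-1)^(i+1) C(p,i) S_{p-i},
# with base case S_0 = 1/(c - 1), in exact rational arithmetic.
def S(p, c):
    c = Fraction(c)
    if p == 0:
        return 1 / (c - 1)
    total = sum((-1) ** (i + 1) * comb(p, i) * S(p - i, c) for i in range(1, p + 1))
    return c / (c - 1) * total

c = 5
assert S(1, c) == Fraction(c, (c - 1) ** 2)              # c/(c-1)^2
assert S(2, c) == Fraction(c * (c + 1), (c - 1) ** 3)    # c(c+1)/(c-1)^3
assert S(3, c) == Fraction(c**3 + 4 * c**2 + c, (c - 1) ** 4)
```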

In fact, this derivation is very similar to that in this short paper by Alan Gorfin.

A Segue into Combinatorics

Interestingly, the closed form of this general formula has connections with combinatorics. According to a short paper by Tom Edgar, $S_p$ has the following formula, due to Carlitz:

$$S_p=\frac{c \cdot A_{p}(c)}{(c-1)^{p+1}}$$

where $A_p(c)=\sum_{k=0}^{p-1}A(p,k)c^k$ denotes the $p$-th order Eulerian polynomial (following the Wikipedia notation).

The coefficients of the Eulerian polynomial, denoted $A(p,k)$ as above, are called Eulerian numbers. We can construct a triangle similar to Pascal's triangle using Eulerian numbers, with $p$ starting from $1$ and incrementing by one as we traverse down the rows, whereas $k$ ranges from $0$ to $p-1$ as we go from left to right within each row. This triangle is then usually called Euler's triangle.

$$\begin{array}{lc} p=1: & 1 \\ p=2: & 1 \quad 1 \\ p=3: & 1 \quad 4 \quad 1 \\ p=4: & 1 \quad 11 \quad 11 \quad 1 \\ p=5: & 1 \quad 26 \quad 66 \quad 26 \quad 1 \\ p=6: & 1 \quad 57 \quad 302 \quad 302 \quad 57 \quad 1 \\ & \vdots \end{array}$$
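The triangle above can be generated with the standard Eulerian-number recurrence $A(p,k)=(k+1)A(p-1,k)+(p-k)A(p-1,k-1)$, which also lets us test Carlitz's formula numerically. This is an illustrative sketch of my own, not code from the papers cited:

```python
# Illustrative sketch: Euler's triangle via the standard recurrence
# A(p, k) = (k+1) A(p-1, k) + (p-k) A(p-1, k-1), then Carlitz's formula
# S_p = c * A_p(c) / (c-1)^(p+1) checked against a long partial sum.
def eulerian_row(p):
    row = [1]                                    # the p = 1 row
    for n in range(2, p + 1):
        prev = row
        row = []
        for k in range(n):
            a = prev[k] if k < len(prev) else 0            # A(n-1, k)
            b = prev[k - 1] if 0 <= k - 1 < len(prev) else 0  # A(n-1, k-1)
            row.append((k + 1) * a + (n - k) * b)
    return row

assert eulerian_row(5) == [1, 26, 66, 26, 1]     # matches the triangle above

def S_carlitz(p, c):
    A_p = sum(A * c**k for k, A in enumerate(eulerian_row(p)))
    return c * A_p / (c - 1) ** (p + 1)

c = 2.0
partial = sum(k**4 / c**k for k in range(1, 200))
assert abs(S_carlitz(4, c) - partial) < 1e-9
```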

Eulerian numbers have an application in combinatorics. To quote Wikipedia,

the Eulerian number is the number of permutations of the numbers $1$ to $n$ in which exactly $k$ elements are greater than the previous element (permutations with $k$ "ascents").

For example, $A(3,1) = 4$ gives us the number of permutations of $1,2,3$ with exactly one element being greater than the previous element, i.e. $132, 213, 231, 312$.

This means that this extension of the geometric series provides us a connection to a combinatorial pattern! Isn't that cool?

Level s: Complex-geometric Series

As a sidenote, what if we generalise even further, so that the numerator variable $k$ could be raised to the power of any complex number?

This brings us beyond the realm (heh) of real numbers and opens us up to the world of Dirichlet series and the polylogarithm function, which are common research topics in analytic number theory.

A Dirichlet series is defined as a series of the form

$$\sum_{k=1}^{\infty}{\frac{a_k}{k^s}}$$

where $s$ is a complex number and $(a_k)$ is a complex sequence.

The polylogarithm function is then a particular collection of Dirichlet series with $a_k = z^k$, making each a power series in $z$ (the free variable appears with successively increasing integer powers), where $|z|<1$ and $z$ is allowed to be complex (hence the absolute value here is in fact the modulus). Here, $s$ is called the order or weight of the function.

$$\operatorname{Li}_{s}(z)=\frac{z}{1^s}+\frac{z^{2}}{2^{s}}+\frac{z^{3}}{3^{s}}+\cdots=\sum_{k=1}^{\infty}\frac{z^{k}}{k^{s}}$$

One can then notice that the $S_p$ discussed previously is in fact $\operatorname{Li}_{-p}\left(\frac{1}{c}\right)$, with $|c| > 1$.

Unfortunately, even allowing analytic continuation, obtaining a value or an explicit expression for $\operatorname{Li}_{-s}\left(\frac{1}{c}\right)$ for even just positive rational values of $s$, let alone a non-real number, is difficult. For example, even a closed-form expression for $\operatorname{Li}_{-\frac{3}{2}}\left(\frac{1}{c}\right)=\sum_{k=1}^{\infty}\frac{k\sqrt{k}}{c^k}$ is currently unknown.

That said, the polylogarithm function is still a topic with various directions of research, including its integral representations, dilogarithms (i.e. when $s = 2$) and polylogarithm ladders.

To read more about the subject, as well as other topics covered here, feel free to search online using the keywords or navigate to the references/further reading section below.

References, further reading

Prime Numbers and Harmonic Series

· 3 min read

The Statement

There is a prime-number-related result, adapted from a tutorial question in a course about number theory, which is stated as follows.

For any odd prime $p$, the numerator of $1+\frac{1}{2}+\frac{1}{3}+\cdots+\frac{1}{p-1}$, written in lowest terms, is divisible by $p$.

The Proof

A proof of this result is fairly straightforward, when observed correctly.

Since $p$ is an odd prime, the sum above has an even number of terms. Now here's the key part: note that the terms of the sum can be rearranged so that the first and last terms are next to each other, as well as the second and second-last terms, and so on. When we sum each of the pairs together, using $\frac{1}{k}+\frac{1}{p-k}=\frac{p}{k(p-k)}$, we then obtain

$$\frac{p}{p-1}+\frac{p}{2(p-2)}+\cdots+\cfrac{p}{\left(\frac{p-1}{2}\right)\left(\frac{p-1}{2}+1\right)}.$$

We can then see that we are able to factorise $p$ out of the sum. Moreover, each denominator $k(p-k)$ is a product of integers smaller than $p$, so none of them shares a factor with the prime $p$; hence the numerator of the sum in lowest terms is indeed divisible by $p$.
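As an illustrative check of my own (not part of the proof), exact rational arithmetic confirms the statement for the first few odd primes:

```python
from fractions import Fraction

# Illustrative check: p divides the numerator of 1 + 1/2 + ... + 1/(p-1)
# written in lowest terms, for a few odd primes p.
for p in (3, 5, 7, 11, 13, 17):
    H = sum(Fraction(1, k) for k in range(1, p))
    assert H.numerator % p == 0
```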

In fact, using this result together with the fact that there are infinitely many prime numbers (so that the set of primes is unbounded above), it follows that the series $\sum_{n=1}^{\infty}\frac{1}{n}$ diverges to positive infinity. This provides an alternative to the ingenious classical method that bounds the series below by an infinite sum of one halves.

How about the other way around?

This gives rise to another interesting question: can we prove that there are infinitely many primes, assuming the divergence of the harmonic series?

In fact, this is true. If we break down the denominator of each term of the harmonic series into its prime factorisation, then apply the distributive law and the geometric infinite sum formula, we obtain the following result due to Euler, where $\mathbb{P}$ denotes the set of prime numbers:

$$\sum_{i=1}^{\infty}\frac{1}{i}=\prod_{p\in \mathbb{P}}\left(1+\frac{1}{p}+\frac{1}{p^{2}}+\cdots \right)=\prod_{p\in \mathbb{P}}\frac{1}{1-1/p}.$$

What Euler did next was to take logarithms on both sides and apply the Taylor series of the logarithm, which gives the following:

$$\ln \prod_{p\in \mathbb{P}}\frac{1}{1-1/p}=\sum_{p\in \mathbb{P}}\ln \frac{1}{1-1/p}=\sum_{p\in \mathbb{P}}\left(\frac{1}{p}+\frac{1}{2p^{2}}+\frac{1}{3p^{3}}+\cdots \right)=\sum_{p\in \mathbb{P}}\frac{1}{p}+\text{convergent terms}.$$

Since this is equal to the divergent harmonic series, whereas a sum of finitely many finite terms must be finite, it follows that there must be infinitely many primes. This result also implies that the sum of reciprocals of the prime numbers diverges, which was rigorously verified by Franz Mertens.
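To see Mertens' result in action, here is an illustrative sketch of my own (with arbitrary cut-offs) that sums the reciprocals of the primes up to $N$ and compares the total with $\ln \ln N$; the very slow growth is the point:

```python
import math

# Illustrative sketch: partial sums of 1/p over primes up to N grow roughly
# like ln ln N (Mertens' theorem), using a simple sieve of Eratosthenes.
def prime_reciprocal_sum(N):
    sieve = [True] * (N + 1)
    sieve[0:2] = [False, False]
    total = 0.0
    for n in range(2, N + 1):
        if sieve[n]:
            total += 1.0 / n
            for m in range(n * n, N + 1, n):
                sieve[m] = False
    return total

for N in (10**3, 10**4, 10**5):
    print(N, round(prime_reciprocal_sum(N), 4), round(math.log(math.log(N)), 4))
```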

Higher-order Functions and Set Theory

· 3 min read

A higher-order function (HOF) is a function that offers a higher level of abstraction than ordinary functions: instead of taking primitives or variables as arguments (parameters) and outputting plain objects, an HOF takes in and/or outputs functions.

The subtle difference between function composition and HOF application can be shown through the syntax of Python, where closing the parentheses differently results in different behaviours.

A classic example is demonstrated as follows.

def thrice(f):
    return lambda x: f(f(f(x)))

add1 = lambda x: x + 1

x = thrice(thrice(add1))(0)  # 9
y = thrice(thrice)(add1)(0)  # 27

In the example above, x will be 9 because thrice(thrice(add1)) is evaluated from the inside out (like function composition): applying thrice to add1 produces a function equivalent to lambda x: x + 3, and applying thrice to that result produces a function equivalent to lambda x: x + 9, thus 0 + 9 = 9.

As for y, it will be 27 because it is evaluated 'left-to-right': thrice(thrice) composes the thrice function with itself three times, resulting in an HOF that effectively composes an input function $3^3=27$ times; this then takes in the function add1 to produce a function equivalent to lambda x: x + 27, and finally we get 0 + 27 = 27.

Higher-order functions are closely related to mathematics, in the sense that they enable the implementation of mathematical operators, as well as families of functions, in code.

For example¹, in set theory, we can define a family of functions $F$, which is itself also a function from $\mathbb{N}^{\mathbb{N}}$ to $\mathbb{N}^{\mathbb{N}}$. This operator $F$ takes in a function $f : \mathbb{N} \to \mathbb{N}$ and outputs another function $F(f) : \mathbb{N} \to \mathbb{N}$ that takes in whatever $f$ takes in and outputs whatever $f$ outputs, plus one.

Phrased in a set-theoretic language as 'pure' as possible, $F$ is defined as

$$F=\{(f,\{(n,f(n)+1) : n \in \mathbb{N}\}) : f \in \mathbb{N}^{\mathbb{N}}\}.$$

An HOF implementation of FF in Python can then look like the following:

# The following type hints are to indicate that the functions are in N^N, and are unnecessary for practical purposes.
from typing import Annotated, Callable
from annotated_types import Ge  # requires additional pip install: https://pypi.org/project/annotated-types/

Nat = Annotated[int, Ge(0)]

def F(f: Callable[[Nat], Nat]) -> Callable[[Nat], Nat]:
    def output(n: Nat) -> Nat:
        return f(n) + 1
    return output

f = lambda x: x + 1
F(f)(0)  # 2

Footnotes

  1. This example is taken from the lecture notes of the Set Theory course conducted during the Fall 2022 semester by Dilip Raghavan at the National University of Singapore.