
How often do invertible matrices appear?

· 14 min read

Introduction

Invertible matrices are fundamental in linear algebra. They represent reversible linear transformations, which are pleasant to work with because:-

  • Invertible matrices give us a unique solution to a system of linear equations.
  • Invertible matrices represent linear transformations that are of full rank, i.e. those that do not 'squish' the space in a way that reduces its dimension, and injective (one-to-one).
  • The columns of an ($n \times n$ real) invertible matrix are linearly independent, and thus span the full space $\mathbb{R}^n$.
  • Invertible matrices have nonzero determinants.

– some of the logically equivalent statements in the Invertible Matrix Theorem (IMT)

As nice mathematical structures usually do not occur frequently, this makes one wonder: in the universe of square matrices of an arbitrarily fixed¹ size, say $n \times n$, what is the probability of finding an invertible matrix?

Surprisingly (at least to me), it turns out to be a very technical problem that has very different answers depending on various factors with respect to the entries of the matrix, including their probability distribution, allowable types of numbers (discrete or continuous, etc.) and even the underlying measure. In other words, there isn't a blanket answer and certain conditions need to be specified in order to reach an answer.

It is way more often than you may expect

Let's first consider the scenario that most people would consider 'default' or 'standard': the probability of finding an invertible matrix in the space of $n \times n$ matrices of real numbers².

In this scenario, the answer turns out to be a surprising one (again, at least to me): we almost surely find an invertible matrix, i.e. the probability is 1, but this does not mean all real matrices are invertible (there are indeed singular, or non-invertible, matrices). This is a phenomenon that can happen when we consider infinite sample spaces: even though there are infinitely many singular matrices, invertible matrices still outnumber them so much that the singular ones are probabilistically negligible (literally zero probability). To quote the Wikipedia article about invertible matrices,

Singular matrices are rare in the sense that if a square matrix's entries are randomly selected from any bounded region on the number line or complex plane, the probability that the matrix is singular is 0, that is, it will "almost never" be singular.

There are two main approaches to prove this result, both of which make use of linear algebra and measure theory: one involves determinants, whereas the other involves subspaces and their dimensions.

Determinant Argument

The argument involving determinants, as found in this post and this post from Mathematics Stack Exchange, applies the fact that invertible matrices have nonzero determinant, or in other words, singular matrices have zero determinant. The determinant of a real $n \times n$ matrix (with $n^2$ entries) is also a polynomial function, as shown below ($\sigma$ represents a permutation and $\operatorname{sgn}(\sigma)$ is either $1$ or $-1$):

$$M := \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \implies \det(M)=\sum_{\sigma \in S_{n}}\operatorname{sgn}(\sigma)\,a_{1,\sigma(1)}\cdots a_{n,\sigma(n)}$$

Therefore, since $\det(M) = 0$ if and only if $M$ is singular, the collection of singular matrices is exactly the zero set of the determinant viewed as a polynomial function from $\mathbb{R}^{n^2}$ to $\mathbb{R}$.

There is a mathematical result which states that the zero set of a polynomial function has measure zero (a proof of this can be found here or here). It's fine if you don't understand what 'measure zero' means: what matters here is that an event represented by a (sub)set of measure zero has zero probability.

Thus, there is zero probability that a random real square matrix is singular, so the probability to obtain an invertible matrix is 1 (in the almost surely sense).

Subspace Dimension Argument

Another argument, as found in this Reddit comment, this blog post and this Mathematics Stack Exchange post, involves the fact that the collection of singular matrices forms a set whose dimension is one less than that of the space of all $n \times n$ matrices.

Why is that so? A less rigorous reasoning goes as follows: note that for an $n \times n$ matrix to be singular, it must have at least one column that is a linear combination of the other $n-1$ columns.

Now, for a single nonzero vector $\vec{v}$ in $\mathbb{R}^n$, the collection of vectors in $\mathbb{R}^n$ that are linearly dependent on it consists of its scalar multiples, i.e. $\{k\vec{v} : k \in \mathbb{R}\}$, and this is a one-dimensional subspace of $\mathbb{R}^n$. For two linearly independent vectors $\vec{v_1}$ and $\vec{v_2}$, the collection of vectors that are linearly dependent on them is $\{a\vec{v_1}+b\vec{v_2} : a,b \in \mathbb{R}\}$, which forms a 2-dimensional subspace. Following this pattern, if we have $n-1$ linearly independent vectors in $\mathbb{R}^n$, the collection of vectors that are linearly dependent on them is exactly their linear span, which is an $(n-1)$-dimensional subspace.

This implies that the column space of a singular matrix is an at most $(n-1)$-dimensional subspace of $\mathbb{R}^n$, so the space of singular matrices is analogous to an $(n-1)$-dimensional hypercube sitting inside the $n$-dimensional hypercube formed by the space of all square matrices. Thus, the probability of finding a singular matrix is:

$$\frac{n\text{-dimensional volume of a cube in } \mathbb{R}^{n-1}}{n\text{-dimensional volume of a cube in } \mathbb{R}^{n}} = 0.$$

An alternative but very similar argument, as described here, states that since singular matrices have zero determinant, which gives one constraint equation, the set of singular $n \times n$ matrices forms an $(n^2-1)$-dimensional hypersurface, which similarly has measure zero with respect to the $n^2$-dimensional space of all $n \times n$ matrices over the real numbers.

Either way, it follows that the probability of finding an invertible real matrix is, surprisingly, 1. In other words, whatever real matrix you come up with at random is extremely likely to have an inverse!
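As a quick sanity check, here is a small numerical experiment (a sketch using NumPy; the matrix size, sample count and uniform entry distribution are arbitrary choices on my part) that samples random real matrices and counts the singular ones:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n, trials = 5, 10_000

singular = 0
for _ in range(trials):
    # i.i.d. entries drawn from a bounded region, as in the Wikipedia quote
    M = rng.uniform(-1.0, 1.0, size=(n, n))
    # A sampled matrix is essentially never *exactly* singular:
    # a determinant of exactly zero is a measure-zero event.
    if np.linalg.det(M) == 0.0:
        singular += 1

print(f"{singular} singular matrices out of {trials} samples")  # expect 0
```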

Matrices with discrete entries

Things start to get interesting when we restrict the entries of a square matrix to be discrete values, in particular binary (i.e. 0 or 1), finite field and integer values.

Binary matrices

For binary matrices, the probability trend depends on the underlying ring (in layman's terms: number system) of the entries of the matrix. For binary matrices over the real numbers, this was a long-standing research problem stemming from the 1960s, with contributions by multiple mathematicians, and was settled by Tikhomirov in his 2018 paper: the probability that an $n \times n$ binary matrix $M_n$ is invertible is given by

$$P(M_n \text{ is invertible})=1-\left(\frac{1}{2}+o_n(1)\right)^n$$

where $o_n(1)$ denotes a quantity that tends to $0$ as $n$ tends to infinity.

We can see that as $n$ gets larger, the probability gets closer to 1.
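To get a feel for this, here is a rough Monte Carlo sketch (the sample sizes are my own arbitrary choices; since the determinant of a 0/1 matrix is an integer, rounding the floating-point value gives an exact singularity test at these small sizes):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def singular_fraction(n: int, trials: int = 20_000) -> float:
    count = 0
    for _ in range(trials):
        M = rng.integers(0, 2, size=(n, n))  # uniform random 0/1 entries
        if round(np.linalg.det(M)) == 0:     # exact test: det is an integer
            count += 1
    return count / trials

for n in (2, 4, 6, 8):
    print(n, singular_fraction(n))
# The fraction of singular matrices shrinks as n grows,
# consistent with the formula above.
```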

As for binary matrices over the finite field $\mathbb{F}_2$, this corresponds to the case $q = 2$ in the next section of this article, where matrices over finite fields are discussed in more detail. One thing to note is that the trend as $n$ gets larger is different: the probability decreases and tends to a limiting number strictly between 0 and 1.

Matrices over finite fields

There is a pretty neat argument, which can be found in several sources including Wikipedia, that gives us the probability of finding an invertible matrix in the space of square matrices over a finite field $\mathbb{F}_q$³.

In the lingo of abstract algebra, the number of invertible $n \times n$ matrices over the finite field $\mathbb{F}_q$ is the order of the general linear group of degree $n$ over $\mathbb{F}_q$, denoted $\mathrm{GL}_n(\mathbb{F}_q)$.

For a square matrix of size $n$ whose entries are taken from a finite field $\mathbb{F}_q$ of order $q$ (i.e. containing $q$ elements), which we denote here by $A$, there are $q$ possibilities for each entry. Since there are $n^2$ available slots, we have a total of $q^{n^2}$ such matrices. If we consider just one of its columns, then as there are $n$ entries in a column, there are $q^n$ possibilities.

Note that $A$ is invertible if and only if, for $j = 1,\ldots,n$, the $j$-th column cannot be expressed as a linear combination of the preceding $j-1$ columns. For the first column $v_1$, there are $q^n-1$ possibilities, as only the zero vector is excluded. For the second column $v_2$, since we need to exclude the $q$ scalar multiples of the first column, there are $q^n-q$ possibilities. For the third column $v_3$, we need to exclude the linear combinations of the first two columns, i.e. vectors of the form $av_1+bv_2$ where $a,b \in \mathbb{F}_q$, of which there are $q \cdot q = q^2$, so in the end we have $q^n-q^2$ choices. Continuing this pattern, the last column has $q^n-q^{n-1}$ choices. Thus, the number of invertible matrices over $\mathbb{F}_q$, i.e. the order of $\mathrm{GL}_n(\mathbb{F}_q)$, is given by

$$\prod_{k=0}^{n-1}(q^{n}-q^{k})=(q^{n}-1)(q^{n}-q)(q^{n}-q^{2})\cdots(q^{n}-q^{n-1}).$$

It follows that the invertibility probability is

$$\frac{\prod_{k=0}^{n-1}(q^{n}-q^{k})}{q^{n^2}}=\prod_{k=0}^{n-1}\left(1-q^{k-n}\right) < 1-\frac{1}{q}.$$

We then see that as the order $q$ increases, the probability tends to 1, whereas as the matrix size $n$ increases, the probability decreases and tends to a limiting number strictly between 0 and 1; for the case of binary matrices over $\mathbb{F}_2$, the limit is approximately $0.288788$.
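To see both trends concretely, the product formula can be evaluated directly (a small sketch; the chosen values of $q$ and $n$ are arbitrary):

```python
def p_invertible(n: int, q: int) -> float:
    """Probability that a uniformly random n x n matrix over F_q
    is invertible, computed from the product formula above."""
    prob = 1.0
    for k in range(n):
        prob *= 1 - q ** (k - n)
    return prob

for q in (2, 3, 5):
    print(q, [round(p_invertible(n, q), 6) for n in (1, 2, 5, 20)])
# q=2: 0.5, 0.375, 0.298004, 0.288788  <- decreasing towards ~0.288788
# Larger q pushes every probability closer to 1.
```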

Random integer matrices

For the case of $n \times n$ matrices with entries over the integers, with an appropriate probability distribution and measure chosen², since there are infinitely many such integer matrices, the result is analogous to the real-matrix case discussed above, i.e. the probability of finding an invertible matrix is 1, in the almost-surely sense.

Sidenote: diagonalisable matrices

There is another matrix-related property that is nice to work with and highly sought after for its usefulness in improving computational efficiency: being diagonalisable. Sadly, not all matrices are diagonalisable, even over complex numbers, but what is the probability of finding one, or more accurately speaking: the "density" of diagonalisable matrices?

There is a paper about this topic that quotes or features some important results, but they are mainly about integer matrices instead of real matrices. They are only presented briefly as a list here, but one can consult the paper for more details.

  • The probability of an $n \times n$ matrix, where $n$ is arbitrary but fixed, with real entries being diagonalisable over the complex numbers (i.e. the eigenvalues are allowed to be non-real) is 1.
  • If the entries are only allowed to be integers, the probability of such an $n \times n$ matrix being diagonalisable over the complex numbers is also 1.
  • It turns out that for $n \times n$ matrices over the real or rational numbers, the question becomes remarkably difficult; even for $2 \times 2$ matrices the result is not straightforward: techniques from multivariable calculus and mathematical analysis are employed in the said paper to prove the results. Nonetheless, here are some key results concerning $2 \times 2$ matrices provided by the paper:
    • The probability of finding a $2 \times 2$ matrix with only integer entries that is diagonalisable over the real numbers (i.e. the eigenvalues can only be real numbers) among all $2 \times 2$ matrices with only integer entries is $\dfrac{49}{72}$. Not a particularly elegant number, but this seems to show that diagonalisable matrices might be denser than we expect; a Monte Carlo estimate of this density is sketched after this list.
    • The probability of obtaining a $2 \times 2$ matrix with only integer entries that is diagonalisable over the rational numbers (i.e. the eigenvalues can only be rational) among all $2 \times 2$ matrices with only integer entries is $0$, implying that diagonalisable integer matrices with only rational eigenvalues are so sparse as to be virtually negligible.
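Here is the promised Monte Carlo sketch for the $\frac{49}{72}$ result (my own illustration, not taken from the paper: entries are drawn uniformly from a large symmetric range, and the measure-zero repeated-eigenvalue cases are ignored):

```python
import random

random.seed(0)
N, trials = 10**6, 200_000
hits = 0
for _ in range(trials):
    a, b, c, d = (random.randint(-N, N) for _ in range(4))
    # A 2x2 matrix with positive discriminant has two distinct real
    # eigenvalues and is therefore diagonalisable over the reals.
    if (a + d) ** 2 - 4 * (a * d - b * c) > 0:
        hits += 1

print(hits / trials, 49 / 72)  # both come out to roughly 0.68
```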

References, further reading

Footnotes

  1. Funnily enough, this almost sounds like an oxymoron, yet it's just mutually understood in the mathematics literature (and beyond) to mean 'anything, but not everything'. If you're confused, see here for an explanation.

  2. Technically, we are considering the space of $n \times n$ matrices whose entries are allowed to be any real number, so that they act as absolutely continuous random variables. We also require the entries to be independent and identically distributed (i.i.d.) so that they do not influence one another and exhibit the same trend. To ensure that this question of looking for a probability is well-posed, we need to specify a probability measure; in this case, the Lebesgue measure (or Borel measure) suffices.

  3. For those who are puzzled by the term "finite field", one can think about counting the numbers on a clock face that has only 11 numbers instead of the usual 12. The number pointed to by the hands cycles back from 11 to 1 as time passes. This gives us an example of a cyclic group of order 11: the integers modulo 11, i.e. $\mathbb{Z}_{11}$. Since 11 is prime, this also gives us an example of a finite field: the field of integers modulo $p$ where $p$ is prime. Indeed, this is just a relatively simple example of finite fields. For those who want to know more about the subject, which is central in the study of Galois theory, I recommend this excellent expository paper by Keith Conrad.

Continuity of a Vector Norm Mapping

· 5 min read

The Context

In a mathematical proof of the existence of the singular-value decomposition (SVD)¹ for real matrices, which I have recently come across, there is a (small) statement applied without proof, which goes like this:-

Let $A \in \mathbb{R}^{m \times n}$ be a real matrix and $x \in \mathbb{R}^n$ a real vector. Then the map $f : \mathbb{R}^n \to \mathbb{R}$, defined by $f(x)=\Vert Ax \Vert_2$, where $\Vert\cdot\Vert_2$ represents the 2-norm (Euclidean norm), is continuous.

This statement is used to show the existence of the largest singular value of a matrix $A \in \mathbb{R}^{m \times n}$, usually denoted $\sigma_1$, when defining $\sigma_1 := \sup_{x \in S} \Vert Ax \Vert_2$, where $S$ represents the unit hypersphere centred at the origin.

This article describes a proof of the statement above, which makes use of the induced matrix norm (also known as the operator norm).

The Good Old Epsilon-Delta

We employ the $\varepsilon$-$\delta$ definition of a continuous function for this proof, i.e.

Let $(X,d_X)$ and $(Y,d_Y)$ be metric spaces. A function $f : X \to Y$ is continuous on a set $A \subseteq X$ if for any point $a \in A$ and any $\varepsilon > 0$, there exists some $\delta>0$, which may depend on $\varepsilon$ and/or $a$, such that whenever $d_X(x,a)<\delta$ where $x \in X$, we have $d_Y(f(x),f(a))<\varepsilon$.

This definition makes rigorous the notion that a continuous function leaves no 'gaps' in its graph: the distance between two points on the graph becomes arbitrarily small as we 'zoom in' indefinitely, so that the distance between the corresponding points in the domain becomes arbitrarily small.

In our case here, $X = \mathbb{R}^n$, $Y=\mathbb{R}$, and both $d_X$ and $d_Y$ are the metrics induced by the respective 2-norms, i.e. $d(x,y) = \Vert x-y \Vert_2$.

The induced matrix 2-norm, whose definition is stated as follows, also comes into play here.

$$\Vert A \Vert_2= \sup_{x \in \mathbb{R}^n \setminus \{0\}}\frac{\Vert Ax \Vert_2}{\Vert x \Vert_2}, \quad \text{where } A \in \mathbb{R}^{m \times n}.$$

With these in place, let's start the proof:-

Let $x \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$ and $\varepsilon > 0$ be arbitrary.

If $A = 0$, i.e. $A$ is the zero matrix, then $f$ is essentially the function that maps everything to zero, which is clearly continuous².

Next, we consider nonzero $A$, and set $\delta=\dfrac{\varepsilon}{\Vert A \Vert_2}$, which is well-defined since $\Vert A \Vert_2 \neq 0$. We first note that, from the definition of the induced matrix 2-norm, for any $x,y \in \mathbb{R}^n$ where $x \neq y$, we have that

$$\Vert A \Vert_2=\sup_{x-y \in \mathbb{R}^n \setminus \{0\}}\frac{\Vert Ax-Ay \Vert_2}{\Vert x-y \Vert_2} \geqslant \frac{\Vert A(x-y) \Vert_2}{\Vert x-y \Vert_2} \implies \Vert Ax-Ay \Vert_2 \leqslant \Vert A \Vert_2 \Vert x-y \Vert_2. \tag{*}$$

Therefore, it follows that for $y \in \mathbb{R}^n$ such that $\Vert x-y \Vert_2<\delta$,²

$$\begin{align*} \vert f(x)-f(y) \vert = \left\vert \Vert Ax \Vert_2-\Vert Ay \Vert_2 \right\vert &\leqslant \Vert Ax-Ay \Vert_2 \\ &\leqslant \Vert A \Vert_2 \Vert x-y \Vert_2 \\ &< \Vert A \Vert_2 \cdot \delta \\ &= \Vert A \Vert_2 \cdot \frac{\varepsilon}{\Vert A \Vert_2} \\ &= \varepsilon. \end{align*}$$

Note that the first $\leqslant$ is due to the reverse triangle inequality, and the second $\leqslant$ comes from $(*)$.

This thus concludes the proof that the map $x \mapsto \Vert Ax \Vert_2$ is continuous.
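As a sanity check, the Lipschitz bound $(*)$ at the heart of the proof can be tested numerically (a sketch using NumPy; the matrix shape and random samples are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
A = rng.standard_normal((4, 6))
op_norm = np.linalg.norm(A, 2)  # induced 2-norm = largest singular value

for _ in range(1000):
    x, y = rng.standard_normal(6), rng.standard_normal(6)
    lhs = abs(np.linalg.norm(A @ x) - np.linalg.norm(A @ y))  # |f(x) - f(y)|
    rhs = op_norm * np.linalg.norm(x - y)                     # ||A||_2 ||x - y||_2
    assert lhs <= rhs + 1e-12

print("inequality (*) held on all samples")
```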

The Nicer Things

In fact, the statement above is a special case and combination of a number of nice results, which are stated as follows:-

The proof of each of these statements is linked as above.

Footnotes

  1. I am linking the explainer article written by Gregory Gundersen here as I find it straightforward to understand, thanks to its use of plain language, lack of jargon, and illustrations that help readers visualise the concept. If you want to learn more about SVDs, in particular an alternative existence proof for the SVDs of real matrices, do refer to this article as well.

  2. The case when $x = y$ is immediate: we have $\left\vert \Vert Ax \Vert_2-\Vert Ay \Vert_2 \right\vert=0 < \varepsilon$, by the very definition of $\varepsilon$. The exact same argument applies to the case when $A = 0$.

  3. The Wikipedia article about Lipschitz continuity is linked here.

Generalising the Geometric Series

· 13 min read

Level 0: Geometric Series

A series that looks like the following:

$$\frac{1}{c}+\frac{1}{c^2}+\frac{1}{c^3}+\cdots=\sum_{k=1}^{\infty}\frac{1}{c^k}, \quad c \text{ is a constant with } |c|>1$$

is a convergent geometric series, which can be evaluated using the following infinite sum formula:

$$\frac{a}{1-r}$$

where $a$ is the first term of the series and $r$ is the common ratio. In the case of the series above, $a = \frac{1}{c}$ and $r = \frac{1}{c}$. Plugging them in, we obtain the sum, $S_0$, for the above series:

$$S_0=\frac{1}{c-1}.$$

This series is useful in various applications, from distance calculations in physics, compound interest in finance and population growth in biology, to fascinating topics like fractal geometry. Indeed, this series is good and fun, but as most mathematicians would ask, "Can we generalise it even further, and how?"

This article demonstrates one way this series can be generalised, which is by allowing variables in the numerator of the summed over terms.

Level 1: Arithmetico-geometric Series

Now, what if instead of the numerator being a constant, it is the index of summation $k$?

$$\sum_{k=1}^{\infty}\frac{k}{c^k}, \quad c \text{ is a constant with } |c|>1.$$

It somewhat resembles the previous series but is no longer a geometric one. In fact, it is now an arithmetico-geometric series, with each term in the sum being the product of its respective term of an arithmetic sequence and that of a geometric sequence.

It seems like the infinite sum formula above is not really useful currently. How should we tackle this problem then?

Scaling and Shifting

Since the variable $k$ in the numerator is of degree $1$, let's call the desired sum $S_1$, and list the first few terms of the series.

$$S_1=\frac{1}{c}+\frac{2}{c^2}+\frac{3}{c^3}+\cdots$$

Next, let's multiply all of the terms by $\frac{1}{c}$. We then obtain $\frac{1}{c} S_1$, with the terms becoming the following.

$$\frac{1}{c}S_1=\frac{1}{c^2}+\frac{2}{c^3}+\frac{3}{c^4}+\cdots$$

Now, observe that, with the exception of the $k=1$ term, there is a difference of $\frac{1}{c^k}$ between corresponding terms of the former and latter series. With this in mind, let's see what happens if we subtract the latter sum from the former.

$$\frac{c-1}{c}S_1=\frac{1}{c}+\frac{1}{c^2}+\frac{1}{c^3}+\cdots=\sum_{k=1}^{\infty}\frac{1}{c^k}=S_0$$

We recover the geometric series at the beginning! The key here is that we have managed to reduce the problem down to what we've seen before, and we can now apply the geometric infinite sum formula!

How can we express this in terms of $S_1$ then? Recalling what we have previously done, we can in fact write it as $S_1-\frac{1}{c} S_1$. Doing some simplification and applying the infinite sum formula gives us the following equation:

$$\frac{c-1}{c}S_1=\frac{1}{c-1}$$

Solving this equation finally gives us the desired answer, also known as Gabriel's staircase:

$$S_1=\frac{c}{(c-1)^2}.$$
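A quick numerical check of this formula (a sketch; the choice $c = 10$ and the truncation point are arbitrary):

```python
c = 10
partial = sum(k / c**k for k in range(1, 200))  # truncated series
print(partial, c / (c - 1) ** 2)  # both ≈ 0.123456790... = 10/81
```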
note

The infinite sum formula has an application in probability theory: it gives us the expected value of a discrete random variable defined by a geometric distribution.

Explanations of some of the jargon involved.

  • A random variable is a representation of a collection of possible outcomes of an experiment, each of which is assigned a probability value.

    There are two types of random variables: discrete and continuous. Roughly speaking, a discrete random variable is "defined over whole (natural) numbers like 1, 2 and 3" whereas a continuous random variable is "defined over decimals (real numbers) like 0.2, 1.67 or even π." Formally, a discrete random variable takes on a countable set of possible outcomes while a continuous random variable takes on an uncountable set of possible values.

    A common example of a discrete random variable, labelled here as $X$, is the roll of a six-sided fair die 🎲. There are six possible results of a die roll: $1,2,3,4,5,6$, each occurring with a probability of $\frac{1}{6}$; thus $X=\{1,2,3,4,5,6\}$, and each of the probabilities can be tabulated as follows:

    | $k$ | $P(X=k)$ |
    | --- | --- |
    | 1 | $\frac{1}{6}$ |
    | 2 | $\frac{1}{6}$ |
    | 3 | $\frac{1}{6}$ |
    | 4 | $\frac{1}{6}$ |
    | 5 | $\frac{1}{6}$ |
    | 6 | $\frac{1}{6}$ |
  • The expected value, also called expectation, of a random variable is the average value of all possible outcomes of an experiment, weighted by their respective probabilities. Its formula is

    $$E[X]=\sum_{k \in X}k \cdot P(X=k)$$

    which is the sum of the products of all possible values of the random variable $X$ and their respective probabilities.

    Using the previous die 🎲 example, the expected value obtained from a single die roll is then

    $$1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = \frac{7}{2} = 3.5.$$
  • A geometric distribution is a probability distribution that gives the number of Bernoulli trials (i.e. you either "succeed" or "fail") needed until achieving the first success.

    If the probability of success on each trial is $p$, then the probability of having the $n$-th trial as the first success is

    $$P(X=n)=(1-p)^{n-1}p.$$

    Using this result, the expected value formula and the fact that the number of trials can theoretically be infinite, the expectation of this distribution is then found to be

    $$E[X]=p \sum_{k=1}^{\infty}k \cdot (1-p)^{k-1}$$

    which explains why it is related to the arithmetico-geometric series discussed in this section. In fact, this infinite sum corresponds to the formula with $c$ substituted by $\frac{1}{1-p}$ (so that it lies within the interval of convergence), up to the multiplicative constant $p$ and an index shift.

    Upon further simplification, accounting for the multiplicative constant as well as the index shift by multiplying by a factor of $\frac{1}{1-p}$, we obtain this surprisingly simple result:

    $$p \cdot \cfrac{\frac{1}{1-p}}{\left(\frac{1}{1-p}-1\right)^2} \cdot \frac{1}{1-p}=\frac{1}{p}.$$

This is helpful in calculating the average number of trials needed before attaining the first success: for example, rolling a six on a fair die ($p = \frac{1}{6}$) takes $6$ rolls on average.
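A quick simulation (a sketch; the fair-die success probability $p = \frac{1}{6}$ is just one example) illustrates the $\frac{1}{p}$ average:

```python
import random

random.seed(0)

def trials_until_success(p: float) -> int:
    """Number of Bernoulli(p) trials up to and including the first success."""
    count = 1
    while random.random() >= p:
        count += 1
    return count

p = 1 / 6  # e.g. waiting for a six on a fair die
samples = [trials_until_success(p) for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to 1/p = 6
```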

note

For the $c=10$ case, this sum also provides a proof that the infinite sum $0.1+0.02+0.003+0.0004+\cdots=\sum_{k=1}^{\infty}\frac{k}{10^k}=\frac{10}{81}$ is in fact rational, which may not be obvious at first glance.

warning

This method only works if the series is indeed convergent. In other words, it won't work for a series like $1 - 1 + 1 - 1 + \cdots$ (whose partial sums oscillate) or $1 + \frac{1}{2} + \frac{1}{3} + \cdots$ (which, surprisingly, blows up to positive infinity).

As a sidenote, this also means that the use of the same method covered in the infamous Numberphile -1/12 video to "evaluate" the sum of natural numbers is actually wrong (as explained with more detail in this Mathologer video), because the series mentioned in that video are clearly not even convergent to begin with!

A proof to show that $\sum_{k=1}^{\infty}\frac{k}{c^k}$ is indeed convergent.

The ratio test will come in handy here, i.e. we need to evaluate the limit $L = \lim\limits_{k \to \infty}\left\vert\frac{(k+1)\text{-th term}}{k\text{-th term}}\right\vert = \lim\limits_{k \to \infty}\left\vert\cfrac{\frac{k+1}{c^{k+1}}}{\frac{k}{c^k}}\right\vert$, which gives us $L=\frac{1}{\vert c \vert} < 1$, so the series is convergent.

In fact, it exhibits a stronger form of convergence: it is absolutely convergent, i.e. even if we replace each term in the series with its absolute value, the series will still converge!

Level 2: Quadratic-geometric Series

At this point, one may ask, "Why stop here? Why not replace the numerator $k$ with $k^2$ and find the general formula of the resulting new series? It will still be convergent (proof below) anyway."

$$\sum_{k=1}^{\infty}\frac{k^2}{c^k}$$

Let's try to apply the same technique we used to obtain the general formula for the case $p=1$. Here, we call the desired sum $S_2$.

$$S_2=\frac{1}{c}+\frac{4}{c^2}+\frac{9}{c^3}+\cdots$$

Multiplying throughout by $\frac{1}{c}$ yields

$$\frac{1}{c}S_2=\frac{1}{c^2}+\frac{4}{c^3}+\frac{9}{c^4}+\cdots$$

Finally, subtracting the former by the latter, we obtain

$$\frac{c-1}{c}S_2=\frac{1}{c}+\frac{3}{c^2}+\frac{5}{c^3}+\cdots=\sum_{k=1}^{\infty}\frac{2k-1}{c^k}=2S_1-S_0.$$

Once again, we've managed to utilise the fact that the pairwise differences of consecutive perfect squares produce the sequence of odd numbers to reduce this series down to simpler results.

Applying the previous results obtained so far, we finally have

$$\frac{c-1}{c}S_2=\sum_{k=1}^{\infty}\frac{2k-1}{c^k}=2 \cdot \sum_{k=1}^{\infty}\frac{k}{c^k}-\sum_{k=1}^{\infty}\frac{1}{c^k}=2 \cdot \frac{c}{(c-1)^2}-\frac{1}{c-1} \implies S_2=\frac{c(c+1)}{(c-1)^3}.$$

Level 3: Cubic-geometric Series

Now, what can we get if the numerator is a cubic term?

$$\sum_{k=1}^{\infty}\frac{k^3}{c^k}$$

We will call this sum $S_3$, and let's try the same method again.

$$S_3=\frac{1}{c}+\frac{8}{c^2}+\frac{27}{c^3}+\cdots \implies \frac{1}{c}S_3=\frac{1}{c^2}+\frac{8}{c^3}+\frac{27}{c^4}+\cdots \implies \frac{c-1}{c}S_3=\frac{1}{c}+\frac{7}{c^2}+\frac{19}{c^3}+\frac{37}{c^4}+\cdots \tag{\#}$$

Notice that in $(\#)$, the numerators are of the form $k^3-(k-1)^3$, with $k = 1,2,3,\ldots$ This simplifies to $3k^2-3k+1$, which means that $(\#)$ can be rewritten as:

$$\frac{c-1}{c}S_3=3\sum_{k=1}^{\infty}\frac{k^2}{c^k}-3\sum_{k=1}^{\infty}\frac{k}{c^k}+\sum_{k=1}^{\infty}\frac{1}{c^k}=3S_2-3S_1+S_0$$

Finally, we then arrive at:

$$\frac{c-1}{c}S_3=3 \cdot \frac{c(c+1)}{(c-1)^3}-3 \cdot \frac{c}{(c-1)^2}+\frac{1}{c-1} \implies S_3=\frac{c^3+4c^2+c}{(c-1)^4}.$$

You may start to notice a pattern here, and this will be discussed in the next section.

Level p: Polynomial-geometric Series

All of these naturally lead to the question: what happens if we generalise the power of kk on the numerator to be any positive integer, i.e. what is the infinite sum of the following series?

$$\sum_{k=1}^{\infty}\frac{k^p}{c^k}, \quad c \text{ is a constant with } |c|>1,\; p>1 \text{ is an integer}.$$

We know from previous sections that such series are in fact convergent, so we could try to find the respective general formulae for them.

A proof to show that this series is convergent.

Using the ratio test again like the previous proof, we find that $L=\lim\limits_{k \to \infty}\left\vert\cfrac{\frac{(k+1)^p}{c^{k+1}}}{\frac{k^p}{c^k}}\right\vert=\frac{1}{\vert c \vert} < 1$ as well, so it is indeed convergent.

Let's call this sum $S_p$. Inductively, one can see that if we form $\frac{c-1}{c}S_p$, its numerators are of the form $k^p-(k-1)^p$, where $k=1,2,3,\ldots$

This is where the binomial theorem comes into play: the highest-power term $k^p$ gets cancelled out while the terms of lower powers remain intact. This means the method yields a recurrence relation for $S_p$ in terms of the lower-power sums $S_{p-1}$, $S_{p-2}$, and so on, which helps us evaluate its general formula.

Since $k^p-(k-1)^p=k^p-\sum_{i=0}^{p}\binom{p}{i}k^{p-i}(-1)^i=\sum_{i=1}^{p}\binom{p}{i}k^{p-i}(-1)^{i+1}$, we then arrive at this recurrence relation:

$$S_p=\frac{c}{c-1} \cdot \sum_{i=1}^{p}(-1)^{i+1}\binom{p}{i}S_{p-i}.$$

In fact, this derivation is very similar to that in this short paper by Alan Gorfin.
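The recurrence is easy to turn into code. Here is a sketch using exact rational arithmetic (the function name and the spot checks are my own; $S_0 = \frac{1}{c-1}$ serves as the base case):

```python
from fractions import Fraction
from math import comb

def S(p: int, c: int) -> Fraction:
    """Compute S_p = sum_{k>=1} k^p / c^k exactly via the recurrence."""
    if p == 0:
        return Fraction(1, c - 1)  # the geometric series S_0
    return Fraction(c, c - 1) * sum(
        (-1) ** (i + 1) * comb(p, i) * S(p - i, c) for i in range(1, p + 1)
    )

print(S(1, 10))  # 10/81    = c/(c-1)^2
print(S(2, 10))  # 110/729  = c(c+1)/(c-1)^3
print(S(3, 10))  # 470/2187 (= 1410/6561 = (c^3+4c^2+c)/(c-1)^4, reduced)
```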

A Segue into Combinatorics

Interestingly, the closed form of this general formula has connections with combinatorics. According to a short paper by Tom Edgar, $S_p$ has the following formula, due to Carlitz:

$$S_p=\frac{c \cdot A_{p}(c)}{(c-1)^{p+1}}$$

where $A_p(c)=\sum_{k=0}^{p-1}A(p,k)c^k$ denotes the $p$-th order Eulerian polynomial (following the Wikipedia notation).

The coefficients of the Eulerian polynomial, denoted $A(p,k)$ as above, are called Eulerian numbers. We can construct a triangle similar to Pascal's triangle using Eulerian numbers, with $p$ starting from $1$ and incrementing by one as we traverse down the rows, whereas $k$ ranges from $0$ to $p-1$ as we go from left to right within each row. This triangle is then usually called Euler's triangle.

$$\begin{array}{lc} p=1: & 1 \\ p=2: & 1 \quad 1 \\ p=3: & 1 \quad 4 \quad 1 \\ p=4: & 1 \quad 11 \quad 11 \quad 1 \\ p=5: & 1 \quad 26 \quad 66 \quad 26 \quad 1 \\ p=6: & 1 \quad 57 \quad 302 \quad 302 \quad 57 \quad 1 \\ & \vdots \end{array}$$

Eulerian numbers have an application in combinatorics. To quote Wikipedia,

the Eulerian number is the number of permutations of the numbers $1$ to $n$ in which exactly $k$ elements are greater than the previous element (permutations with $k$ "ascents").

For example, $A(3,1) = 4$ gives us the number of permutations of $1,2,3$ with exactly one element being greater than the previous element, i.e. $132, 213, 231, 312$.

This means that this extension of the geometric series provides us a connection to a combinatorial pattern! Isn't that cool?

Level s: Complex-geometric Series

As a sidenote, what if we generalise even further, so that the power of $k$ in the numerator could be any complex number?

This brings us beyond the realm (heh) of real numbers and opens us up to the world of Dirichlet series and the polylogarithm function, which are common research topics in analytic number theory.

A Dirichlet series is defined as a series of the form

$$\sum_{k=1}^{\infty}\frac{a_k}{k^s}$$

where $s$ is a complex number and $(a_k)$ is a complex sequence.

The polylogarithm function is then a particular family of such series in which $a_k = z^k$, making it a power series in $z$ (the free variable takes successively increasing integer powers), where $|z|<1$ and $z$ is allowed to be complex (hence the absolute value here is in fact the modulus). Here, $s$ is called the order or weight of the function.

$$\operatorname{Li}_{s}(z)=\frac{z}{1^s}+\frac{z^{2}}{2^{s}}+\frac{z^{3}}{3^{s}}+\cdots=\sum_{k=1}^{\infty}\frac{z^{k}}{k^{s}}$$

One can then notice that the $S_p$ discussed previously is in fact $\operatorname{Li}_{-p}\!\left(\frac{1}{c}\right)$, with $|c| > 1$.

Unfortunately, even allowing analytic continuation, obtaining a value or an explicit expression for $\operatorname{Li}_{-s}\!\left(\frac{1}{c}\right)$ for even just positive rational values of $s$, let alone non-real ones, is difficult. For example, even a closed-form expression for $\operatorname{Li}_{-\frac{3}{2}}\!\left(\frac{1}{c}\right)=\sum_{k=1}^{\infty}\frac{k\sqrt{k}}{c^k}$ is currently unknown.

That said, the polylogarithm function is still a topic with various directions of research, including its integral representations, dilogarithms (i.e. the case $s = 2$) and polylogarithm ladders.

To read more about the subject, as well as other topics covered here, feel free to search online using the keywords or navigate to the references/further reading section below.

References, further reading

Prime Numbers and Harmonic Series

· 3 min read

The Statement

There is a prime-number-related result, adapted from a tutorial question in a number theory course, which is stated as follows.

For any odd prime $p$, the numerator of $1+\frac{1}{2}+\frac{1}{3}+\cdots+\frac{1}{p-1}$, written as a fraction in lowest terms, is divisible by $p$.

The Proof

A proof of this result is fairly straightforward once the key observation is made.

Since $p$ is an odd prime, the sum above has an even number of terms, namely $p-1$.

Now here's the key part: the terms of the sum can be rearranged so that the first and last terms are next to each other, as are the second and second-last terms, and so on. When we sum each of the pairs, we obtain

$$\frac{p}{p-1}+\frac{p}{2(p-2)}+\cdots+\cfrac{p}{\left(\frac{p-1}{2}\right)\left(\frac{p-1}{2}+1\right)}.$$

We can then factorise $p$ out of the sum; since every denominator is a product of integers smaller than $p$ (and hence coprime to $p$), the numerator of the sum is indeed divisible by $p$.
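This divisibility is easy to verify directly with exact rational arithmetic (a small sketch using Python's fractions module):

```python
from fractions import Fraction

for p in (3, 5, 7, 11, 13):
    H = sum(Fraction(1, k) for k in range(1, p))  # 1 + 1/2 + ... + 1/(p-1)
    print(p, H, H.numerator % p == 0)  # the numerator is divisible by p
```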

In fact, using this result together with the fact that there are infinitely many prime numbers (so that the set of primes is unbounded above, since primes are integers), it follows that the series $\sum_{n=1}^{\infty}\frac{1}{n}$ diverges to positive infinity. This provides an alternative to the ingenious classical proof that produces an infinite sum of one-halves.

How about the other way around?

This gives rise to another interesting question: can we prove that there are infinitely many primes, assuming the divergence of the harmonic series?

In fact, this is true. If we break down the denominator of each term of the harmonic series into its prime factorisation, then apply the distributive law and the formula for the geometric infinite sum, we obtain the following result due to Euler, where $\mathbb{P}$ denotes the set of prime numbers:

$$\sum_{i=1}^{\infty}\frac{1}{i}=\prod_{p\in \mathbb{P}}\left(1+\frac{1}{p}+\frac{1}{p^{2}}+\cdots\right)=\prod_{p\in \mathbb{P}}\frac{1}{1-1/p}.$$

What Euler did next was take logarithms on both sides and apply the Taylor series of the logarithm, which gives the following:

$$\ln \prod_{p\in \mathbb{P}}\frac{1}{1-1/p}=\sum_{p\in \mathbb{P}}\ln\frac{1}{1-1/p}=\sum_{p\in \mathbb{P}}\left(\frac{1}{p}+\frac{1}{2p^{2}}+\frac{1}{3p^{3}}+\cdots\right)=\sum_{p\in \mathbb{P}}\frac{1}{p}+\text{convergent terms}.$$

Since the left-hand side diverges (being the logarithm of the divergent harmonic series) while a sum of finitely many finite terms must be finite, it follows that there must be infinitely many primes. This result also implies that the sum of reciprocals of the prime numbers diverges, which was more rigorously verified by Franz Mertens.
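As a closing illustration, the divergence of $\sum_p \frac{1}{p}$ is remarkably slow; here is a sketch (using sympy's prime generator; the cutoff is an arbitrary choice) comparing the partial sum with $\ln \ln N$:

```python
import math
from sympy import primerange

N = 10_000_000
partial = sum(1 / p for p in primerange(2, N))
# Mertens' theorem: the sum is ln ln N + M + o(1), with M ≈ 0.2615.
print(partial, math.log(math.log(N)))  # ≈ 3.04 vs ≈ 2.78
```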

Attempting Challenge of Wits 2023

· One min read

In August 2023, I submitted my solution to the second challenge of Challenge of Wits 2023, organised by the Defence Science & Technology sector of the Government of Singapore, which is about data science. However, since I am not Singaporean, I was not qualified to receive the rewards of the challenge.

The problem statement of the challenge (with required data set and answer) can be found here.

Here is my submitted code for the challenge, which can also be found via this GitHub repository.

```python
import xml.etree.ElementTree as ET
import os
import matplotlib.pyplot as plt
import traceback

cwd = os.getcwd()
K = 2254                 # total number of log files
coordinates = []
invalid = [317, 1768]    # log files known to be malformed

for i in range(K):
    if i in invalid:
        continue
    try:
        filename = cwd + '/log_' + str(i) + '.xml'
        tree = ET.parse(filename)
        root = tree.getroot()
        # The first child element holds a comma-separated coordinate pair.
        location_raw = root[0].text
        coordinate = tuple(map(int, location_raw.split(',')))
        coordinates.append(coordinate)
    except Exception:
        print(traceback.format_exc())
        print(i)

# Plot all extracted coordinates; the resulting scatter reveals the key.
plt.scatter(*zip(*coordinates))
plt.savefig('key.jpeg')
```

Higher-order Functions and Set Theory

· 3 min read

A higher-order function (HOF) is a function that offers a higher level of abstraction than ordinary functions: instead of only taking primitives or variables as arguments (parameters) and outputting plain values, HOFs take in and/or output functions themselves.

The subtle differences between function composition and HOFs can be seen through Python's syntax, where different ways of closing the parentheses result in different behaviours.

A classic example is demonstrated as follows.

```python
def thrice(f):
    return lambda x: f(f(f(x)))

add1 = lambda x: x + 1

x = thrice(thrice(add1))(0)  # 9
y = thrice(thrice)(add1)(0)  # 27
```

In the example above, x will be 9 because the defining line of code is interpreted 'right-to-left' (like function composition): first composing the add1 function three times produces a function equivalent to lambda x: x + 3, then composing that output three times produces a function equivalent to lambda x: x + 9, thus 0 + 9 = 9.

As for y, it will be 27 because it is evaluated 'left-to-right': first composing the thrice function with itself, resulting in an HOF that effectively composes an input function $3^3=27$ times, which then takes in the function add1 to produce a function equivalent to lambda x: x + 27, and finally we get 0 + 27 = 27.

Higher-order functions are closely related to mathematics, in the sense that they enable the implementation of mathematical operators, as well as families of functions, using code.
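For instance, the differentiation operator from calculus can be approximated as an HOF (a sketch using a central finite difference; the step size h is an arbitrary choice of mine):

```python
def derivative(f, h=1e-6):
    """HOF: takes a function f and returns an approximation of its derivative."""
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

square = lambda x: x * x
dsquare = derivative(square)  # behaves like lambda x: 2 * x
print(dsquare(3.0))           # ≈ 6.0
```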

For example¹, in set theory, we can define a family of functions $F$, which is itself a function from $\mathbb{N}^{\mathbb{N}}$ to $\mathbb{N}^{\mathbb{N}}$. This operator $F$ takes in a function $f : \mathbb{N} \to \mathbb{N}$ and outputs another function $F(f) : \mathbb{N} \to \mathbb{N}$ that takes in whatever $f$ takes in and outputs whatever $f$ outputs, plus one.

Phrased in set-theoretic language as 'pure' as possible, $F$ is defined as

$$F=\{(f,\{(n,f(n)+1) : n \in \mathbb{N}\}) : f \in \mathbb{N}^{\mathbb{N}}\}.$$

An HOF implementation of $F$ in Python can then look like the following:

```python
# The following type hints indicate that the functions are in N^N;
# they are unnecessary for practical purposes.
from typing import Annotated, Callable
from annotated_types import Ge  # requires an additional pip install: https://pypi.org/project/annotated-types/

Nat = Annotated[int, Ge(0)]  # a natural number (a non-negative int)

def F(f: Callable[[Nat], Nat]) -> Callable[[Nat], Nat]:
    def output(n: Nat) -> Nat:
        return f(n) + 1
    return output

f = lambda x: x + 1
F(f)(0)  # 2
```

Footnotes

  1. This example is taken from the lecture notes of the Set Theory course conducted in the Fall 2022 semester by Dilip Raghavan at the National University of Singapore.