Coprime numbers, also known as relatively prime numbers, are integers that do not share any prime factors, or equivalently, integers whose greatest common divisor (GCD) is 1. Although coprimality is originally defined for a pair of integers, it extends to any number of integers because the GCD operation is associative.
When I first learnt about coprime integers in a number theory course, there was a question that was not covered during the course, nor did it occur to me until recently: what is the probability of choosing a pair of coprime integers at random1?
There turns out to be a definitive answer to this question with several well-established proofs, and it is quite an elegant one:
6/π² = 0.6079271... ≈ 61%
Surprisingly, π seems to appear out of nowhere in the answer, which makes this a topic that could make it into 3Blue1Brown's "Why pi?" video series. However, it should look familiar to anyone who knows the Basel problem, i.e. the sum of the reciprocals of the perfect squares. In fact, the two are very closely related.
The proof does not go for the usual brute-force, 'bottom-up' method of running through massive numbers of integer-pair samples, checking each pair for coprimality and keeping track of the odds to see whether the probability converges to a certain constant. Instead, it takes a 'top-down' approach by looking at the probability of a pair of integers having a greatest common divisor (GCD) of 2 or above.
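That brute-force approach is still a useful sanity check, though. Here is a minimal sketch of it in Python; the sampling bound LIMIT and sample count N are arbitrary choices, and a finite range only approximates the idealised notion of a 'random integer':

```python
from math import gcd, pi
from random import randint, seed

seed(0)            # fixed seed so the run is reproducible
N = 200_000        # number of random pairs to sample
LIMIT = 10**6      # integers are drawn uniformly from 1..LIMIT

# count pairs whose GCD is 1, i.e. coprime pairs
coprime = sum(gcd(randint(1, LIMIT), randint(1, LIMIT)) == 1 for _ in range(N))
estimate = coprime / N
print(estimate, 6 / pi**2)  # the estimate should hover around 0.6079...
```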
What does this mean? Suppose that we call the first number x1 and the second number x2, and let's start with 2 as the target factor. For x1 and x2 to have a GCD of 2, they must at the very least both be divisible by 2. Since modular arithmetic tells us that the integers split evenly into classes according to their remainders upon division by a given factor, the probability of x1 (resp. x2) being divisible by 2 (i.e. having remainder zero) is 1/2.
Note that 2 must be the highest common factor of x1 and x2, so if we divide both of them by 2, the results cannot share any other factor (except for 1, of course). But this is exactly the definition of coprimality! In other words, we have obtained a pair of coprime integers (namely x1/2 and x2/2) through this procedure. What is the probability of finding such a pair? That's exactly what we're looking for; let's call that value p.
The key of the procedure just now is that it gives us an important characterisation of a pair of integers having a certain number as their GCD: they must both be divisible by said number, and after dividing them by that number, the new resulting pair must be coprime.
Why?
If x1 and x2 had another common factor greater than 1 (say, b) after being divided by their GCD (denoted here as a), then ab, which is greater than a, would have been the GCD instead, leading to a contradiction.
What is the probability of finding an integer pair like this? The argument above tells us that it is the product of the probabilities of the criteria that must be fulfilled simultaneously, which we obtained through the procedure, i.e. (1/2)·(1/2)·p = p/4.
At this stage, you may have figured out that this procedure can be repeated for every other integer factor. For 3, we obtain (1/3)·(1/3)·p = p/9; for 4, we obtain p/16, and so on. We can even deduce the general pattern: for a factor k, the probability of an integer pair having k as their GCD is p/k². This general pattern even works for the case k = 1, because an integer pair having 1 as their GCD means precisely that they are coprime, so the corresponding probability is indeed p.
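This p/k² pattern can itself be checked empirically. The sketch below (with arbitrary sampling bounds) tallies the observed probability of each small GCD value k and multiplies it by k²; all the products should hover around the same constant p:

```python
from math import gcd
from random import randint, seed

seed(1)
N = 200_000
LIMIT = 10**6

# record the GCD of each random pair
gcds = [gcd(randint(1, LIMIT), randint(1, LIMIT)) for _ in range(N)]

for k in (1, 2, 3, 4):
    prob = gcds.count(k) / N        # observed P(GCD = k)
    print(k, prob, prob * k * k)    # k^2 * P(GCD = k) should stay near p
```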
Now we see how everything comes together to give us the value of p. Every integer pair has exactly one positive integer as its GCD, and these events are mutually exclusive. This means that the probability of obtaining any integer pair at all from the sea of all integer pairs, which is clearly 1, equals the sum of the probabilities of a pair having GCD k, ranged over all positive integers k. Rewriting this description as an algebraic equation, we obtain
∑_{k=1}^∞ p/k² = 1.
Rearranging the variables a bit yields
p = 1/(∑_{k∈Z>0} 1/k²) ⟹ p = 1/(π²/6) = 6/π².
In other words, the probability p of a randomly chosen integer pair being coprime is the reciprocal of the sum of the reciprocals of the squares! That's where the Basel problem, i.e. the reason why π shows up in the solution, plays a role!
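Numerically, taking the reciprocal of a partial sum of the reciprocals of the squares already lands very close to 6/π² (the cutoff of one million terms is an arbitrary choice):

```python
from math import pi

# partial sum of 1/k^2 up to one million terms
partial = sum(1 / k**2 for k in range(1, 10**6 + 1))
p = 1 / partial
print(p, 6 / pi**2)  # both ≈ 0.607927...
```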
Another cool part of this argument is that it can be easily extended to an arbitrary number of randomly chosen integers, say, n of them.
By investigating n integers, say x1, x2, …, xn, instead of just two like before, calculating the probability of these integers having a certain number k as their GCD now involves multiplying by 1/k a total of n times, followed by a final factor of p as before.
In a similar vein, we then obtain that
p = 1/(∑_{k∈Z>0} 1/k^n).
Does the denominator ring a bell? It is in fact the definition of the Riemann zeta function!
ζ(s) = ∑_{n=1}^∞ 1/n^s = 1/1^s + 1/2^s + 1/3^s + ⋯
Rather, for our special case, where the exponent is a positive integer (and more generally a real number), the sum is perhaps more commonly known as a p-series.
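For instance, for triples (n = 3) the formula predicts p = 1/ζ(3) ≈ 0.8319, which a quick simulation (with arbitrary sampling bounds) agrees with:

```python
from functools import reduce
from math import gcd
from random import randint, seed

seed(2)
N = 100_000
LIMIT = 10**6

hits = 0
for _ in range(N):
    xs = [randint(1, LIMIT) for _ in range(3)]
    if reduce(gcd, xs) == 1:    # the triple is coprime
        hits += 1

zeta3 = sum(1 / k**3 for k in range(1, 100_000))  # partial sum of zeta(3)
print(hits / N, 1 / zeta3)  # both ≈ 0.83
```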
It may be tempting to ask what the probability becomes as the number of integers chosen 'tends to infinity'. It may be even more tempting to deduce from the general formula we just obtained that the answer should be 1. However, this question is not well posed, because the outcome depends on which infinite collection of integers we consider. For instance, the probability is 0 if we take all the even numbers (they all share the factor 2), but 1 if we take all the positive integers (their GCD is 1); notice that both collections contain infinitely many integers.
What if we consider a finite range of numbers and a finite number of choices instead? According to this paper, this probability seems to decrease as the number of choices increases, which is frankly an interesting result.
Technically, it is impossible to choose a positive integer truly randomly such that each choice occurs with the same probability. However, we can still make formal mathematical definitions about such 'randomness' to suit our needs here; see here or here for more information. ↩
Suppose that we select, say, n arbitrary integers a1, …, an. Take their lowest common multiple (LCM) and label it M := lcm(a1, …, an); then, for each of the initial n integers, take the corresponding number ki (i = 1, …, n) such that multiplying by it gives the LCM, i.e. ki·ai = M for each i. An intriguing pattern starts to emerge: the numbers ki are coprime, i.e. gcd(k1, …, kn) = 1.
The following is an example code snippet that illustrates this phenomenon. It generates a list of n random integers between 1 and K, obtains their LCM, extracts the factor ki for each ai (i = 1, …, n) from the LCM, and calculates the greatest common divisor (GCD) of the ki's.
Repeatedly running this code generates a different group of n integers, but notice that the GCD obtained in the end is always 1.
The definitions of lcm_euclidean_general and gcd_euclidean_general in the snippet can be found in this note.
from random import randint

n, K = 4, 2000  # feel free to change the parameters here
test = [randint(1, K) for i in range(n)]
lcm = lcm_euclidean_general(*test)
smallest_k = tuple(map(lambda x: lcm // x, test))
print(gcd_euclidean_general(*smallest_k))  # 1
To those who are familiar with the topic of coprime integers, this phenomenon should not come as a surprise. Nevertheless, a brief explanation of this fact will be presented here.
Suppose, for contradiction, that the numbers ki (i = 1, …, n) are not coprime, or equivalently gcd(k1, …, kn) > 1, and call this GCD G. Then we can write ki = G·bi, where each bi is an integer and G > 1. It follows that M = G·bi·ai for each i.
However, this also means that we can divide throughout by G and obtain M/G = bi·ai for each i. Note that M/G < M because M > 0 and G > 1, and M/G is an integer because G is also a factor of M. Moreover, since each bi is an integer, M/G is a common multiple of the ai's that is smaller than the lowest common multiple M. This is a contradiction, and the only way to resolve it is to conclude that the GCD G cannot have been greater than 1 all along; the ki's must have been coprime from the start!
We have thus demonstrated the technique of proof by contradiction to confirm the validity of this fact.
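For a concrete, self-contained check using only the standard library (unlike the snippet above, which relies on helper functions defined elsewhere), take the triple (4, 6, 10):

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    """LCM of two positive integers via the GCD identity."""
    return a * b // gcd(a, b)

a = [4, 6, 10]
M = reduce(lcm, a)            # lcm(4, 6, 10) = 60
k = [M // x for x in a]       # the factors k_i: [15, 10, 6]
print(M, k, reduce(gcd, k))   # 60 [15, 10, 6] 1
```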
Invertible matrices represent linear transformations that are of full rank, i.e. those that do not 'squish' the space in a way that reduces its dimension, and injective (one-to-one).
The columns of an (n×n real) invertible matrix are linearly independent, and thus span the whole of Rn.
Since nice mathematical structures usually do not occur frequently, this makes one wonder: in the universe of square matrices of an arbitrary fixed1 size, say, n×n, what is the probability of finding an invertible matrix?
Surprisingly (at least to me), it turns out to be a very technical problem that has very different answers depending on various factors with respect to the entries of the matrix, including their probability distribution, allowable types of numbers (discrete or continuous, etc.) and even the underlying measure. In other words, there isn't a blanket answer and certain conditions need to be specified in order to reach an answer.
Let's first consider the scenario that most people would consider 'default' or 'standard': the probability of finding an invertible matrix among the space of n×n matrices with real entries2.
In this scenario, the answer turns out to be a surprising one (again, at least to me): a randomly chosen matrix is almost surely invertible, i.e. the probability is 1, yet this does not mean that all real matrices are invertible (there are indeed singular, or non-invertible, matrices). This is a phenomenon that can happen when we consider infinite sample spaces; even though there are infinitely many singular matrices, invertible matrices outnumber them so overwhelmingly that the singular ones are probabilistically negligible (literally zero probability). To quote the Wikipedia article about invertible matrices,
Singular matrices are rare in the sense that if a square matrix's entries are randomly selected from any bounded region on the number line or complex plane, the probability that the matrix is singular is 0, that is, it will "almost never" be singular.
There are two main approaches to prove this result, both of which make use of linear algebra and measure theory: one involves determinants, whereas the other involves subspaces and their dimensions.
The argument involving determinants, as found in this post and this post from Mathematics Stack Exchange, applies the fact that invertible matrices have nonzero determinant, or in other words, that singular matrices have zero determinant. The determinant of a real n×n matrix M (with n² entries) is also a polynomial function of those entries, as shown below (σ ranges over the permutations of {1, …, n} and sgn(σ) is either 1 or −1):

det(M) = ∑_σ sgn(σ)·m_{1,σ(1)}·m_{2,σ(2)}⋯m_{n,σ(n)}.
Therefore, since det(M) = 0 if M is singular, the collection of singular matrices is exactly the zero set of the determinant, viewed as a polynomial function from R^(n²) to R.
There is a mathematical result which states that the zero set of a polynomial function has measure zero (a proof of this can be found here or here). It's fine if you don't understand what 'measure zero' means: what matters here is that an event represented by a (sub)set of measure zero has zero probability.
Thus, there is zero probability that a random real square matrix is singular, so the probability of obtaining an invertible matrix is 1 (in the almost-sure sense).
Why is that so? A less rigorous reasoning of this goes as follows: note that for an n×n matrix to be singular, it must have at least one column that is a linear combination of all the other n−1 columns.
Now, for a single nonzero vector v in Rn, the collection of vectors in Rn that are linearly dependent on that particular vector consists of its scalar multiples, i.e. {kv : k∈R}, and this is a one-dimensional subspace of Rn. For two (linearly independent) vectors v1 and v2, the collection of vectors linearly dependent on them is {av1 + bv2 : a, b∈R}, which forms a 2-dimensional subspace. Following this pattern, if we have n−1 vectors in Rn, the collection of vectors linearly dependent on them is exactly their linear span, which is an at most (n−1)-dimensional subspace.
This implies that the column space of a singular matrix is an at most (n−1)-dimensional subspace of Rn, so, loosely speaking, the space of singular matrices forms an (n−1)-dimensional hypercube, compared to the n-dimensional hypercube formed by the space of all square matrices. Thus, the probability of finding a singular matrix is:
(n-dimensional volume of a cube in R^(n−1)) / (n-dimensional volume of a cube in R^n) = 0.
An alternative but very similar argument, as described here, notes that since singular matrices have zero determinant, which gives us one constraint equation, the set of singular n×n matrices forms an (n²−1)-dimensional surface inside the n²-dimensional space of all n×n matrices over the real numbers, and such a surface similarly has measure zero.
Either way, it follows that the probability of finding an invertible real matrix is, surprisingly, 1. In other words, for whatever real matrices that you come up with randomly, it is extremely likely to have an inverse!
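A quick numerical experiment makes this plausible for the 2×2 case: with entries drawn uniformly at random from [0, 1), the determinant ad − bc essentially never hits exactly zero (the sample count here is an arbitrary choice, and floating-point sampling only approximates a continuous distribution):

```python
from random import random, seed

seed(3)
N = 100_000
singular = 0
for _ in range(N):
    a, b, c, d = (random() for _ in range(4))
    if a * d - b * c == 0.0:    # exactly zero determinant
        singular += 1
print(singular / N)  # 0.0
```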
Things start to get interesting when we restrict the entries of a square matrix to be discrete values, in particular binary (i.e. 0 or 1), finite field and integer values.
For binary matrices, the probability trend depends on the underlying ring (or in layman's terms: number system) of the entries of the matrix. For binary matrices over the real numbers, this was a long-standing research problem stemming from the 1960s, with contributions by multiple mathematicians; it was settled by Tikhomirov in his 2018 paper: the probability that an n×n binary matrix Mn is invertible is given by
P(Mn is invertible) = 1 − (1/2 + o_n(1))^n
where on(1) denotes a quantity that tends to 0 as n tends to infinity.
We can see that as n gets larger, the probability gets closer to 1.
As for binary matrices over the finite field F2, it corresponds to the case when q=2 in the next section of this article, where matrices over finite fields are discussed in more detail. One thing to note is that the change in probability as n gets larger is different: it decreases and tends to a limiting number strictly between 0 and 1.
There is a pretty neat argument, which can be found in several sources including Wikipedia, that gives us the probability of finding an invertible matrix in the space of square matrices over a finite field Fq3.
In the lingo of abstract algebra, the number of invertible n×n matrices over the finite field Fq is the order of the general linear group of degree n over Fq, GLn(Fq).
For a square matrix of size n whose entries are taken from a finite field Fq of order q (i.e. containing q elements), which we denote here by A, there are q possibilities for each entry. Since there are n² available slots, we have a total of q^(n²) such matrices. If we only consider one of its columns, then as there are n entries in a column, there are q^n possibilities.
Note that A is invertible if and only if, for j = 1, …, n, the j-th column cannot be expressed as a linear combination of the preceding j−1 columns. For the first column v1, there are q^n − 1 possibilities, as only the zero vector is excluded. For the second column v2, since we need to exclude the q scalar multiples of the first column, there are q^n − q possibilities. For the third column v3, we need to exclude the linear combinations of the first and second columns, i.e. vectors of the form a·v1 + b·v2 where a and b range over Fq, of which there are q·q = q², so in the end we have q^n − q² choices. Continuing this pattern, the last column has q^n − q^(n−1) choices. Thus, the number of invertible matrices over Fq, i.e. the order of GLn(Fq), is given by

|GLn(Fq)| = (q^n − 1)(q^n − q)(q^n − q²) ⋯ (q^n − q^(n−1)),

and dividing by the total count q^(n²) gives the probability

(1 − q^(−1))(1 − q^(−2)) ⋯ (1 − q^(−n)).
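The counting argument above translates directly into a short computation; for example, for binary matrices over F2 (q = 2):

```python
def gl_order(n, q):
    """Number of invertible n x n matrices over F_q:
    (q^n - 1)(q^n - q) ... (q^n - q^(n-1))."""
    count = 1
    for j in range(n):
        count *= q**n - q**j
    return count

def invertible_probability(n, q):
    """Probability that a uniformly random n x n matrix over F_q
    is invertible, i.e. gl_order(n, q) / q^(n^2)."""
    return gl_order(n, q) / q**(n * n)

print(gl_order(2, 2))   # 6 invertible 2x2 binary matrices (out of 16)
for n in (1, 2, 5, 20):
    print(n, invertible_probability(n, 2))
# the probability decreases towards ~0.288788 as n grows
```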
We then see that as the order q increases, the probability tends to 1, whereas as the matrix size n increases, the probability decreases and tends to a limiting number strictly between 0 and 1; for the case of binary matrices over F2, the limit is approximately 0.288788.
For the case of n×n matrices with entries over the integers, with an appropriate probability distribution and measure chosen2, since there are infinitely many such random integer matrices, the result is analogous to the real matrices case as discussed above, i.e. the probability of finding an invertible matrix is 1, in the sense of being almost surely.
There is another matrix-related property that is nice to work with and highly sought after for its usefulness in improving computational efficiency: being diagonalisable. Sadly, not all matrices are diagonalisable, even over complex numbers, but what is the probability of finding one, or more accurately speaking: the "density" of diagonalisable matrices?
There is a paper about this topic that quotes or features some important results, but they are mainly about integer matrices instead of real matrices. They are only presented briefly as a list here, but one can consult the paper for more details.
The probability of an n×n matrix, where n is arbitrary but fixed, with real entries being diagonalisable over the complex numbers (i.e. the eigenvalues are allowed to be non-real) is 1.
If the entries are only allowed to be integers, the probability of such n×n matrix being diagonalisable over the complex numbers is also 1.
It turns out that for n×n matrices over the real or rational numbers, the question becomes remarkably difficult; even for 2×2 matrices the result is not straightforward: techniques from multivariable calculus and mathematical analysis are employed in said paper to prove the results. Nonetheless, here are some key results concerning 2×2 matrices provided by the paper:
The probability of finding a 2×2 matrix with only integer entries that is diagonalisable over the real numbers (i.e. the eigenvalues can only be real numbers) among all 2×2 matrices with only integer entries is 49/72. Not a particularly elegant number, but it seems to show that diagonalisable matrices might be denser than we expect.
The probability of obtaining a 2×2 matrix with only integer entries that is diagonalisable over the rational numbers (i.e. the eigenvalues can only be rational) among all 2×2 matrices with only integer entries is 0, implying that diagonalisable integer matrices with only rational eigenvalues are so sparse that it is virtually negligible.
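The 49/72 figure can be probed with a simulation. A 2×2 matrix [[a, b], [c, d]] has two distinct real eigenvalues exactly when its discriminant (a − d)² + 4bc is positive, and (apart from a negligible set of borderline cases) this is the condition for diagonalisability over the reals; the entry bound K below is an arbitrary stand-in for the paper's limit as the range grows:

```python
from random import randint, seed

seed(4)
N = 300_000
K = 1_000   # entries drawn uniformly from -K..K

diag = 0
for _ in range(N):
    a, b, c, d = (randint(-K, K) for _ in range(4))
    if (a - d)**2 + 4*b*c > 0:    # distinct real eigenvalues
        diag += 1
print(diag / N, 49 / 72)  # both ≈ 0.68
```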
Funnily enough, this almost sounds like an oxymoron, yet it's just mutually understood in the mathematics literature (and beyond) to mean 'anything but not everything'. If you're confused, see here for explanation. ↩
For those who are puzzled by the term "finite field", one can think about counting the numbers on a clock face that only has 11 numbers instead of the usual 12. The number pointed by the hands cycles back from 11 to 1 as time passes. This gives us an example of a cyclic group of order 11: the integers modulo 11, i.e. Z11. Since 11 is prime, this gives us an example of a finite field as well: the field of integers modulo p where p is prime. Indeed, this is just a relatively simple example of finite fields. For those who want to know more about the subject, which is central in the study of Galois theory, I recommend this excellent expository paper by Keith Conrad. ↩
In a mathematical proof of the existence of the singular-value decomposition (SVD)1 for real matrices, which I have recently come across, there is a (small) statement that is applied without proof. It goes like this:
Let A∈Rm×n be a real matrix. Then the map f: Rn→R, defined by f(x) = ∥Ax∥2, where ∥⋅∥2 denotes the 2-norm (Euclidean norm), is continuous.
This statement is used to show the existence of the largest singular value of a matrix A∈Rm×n, usually denoted as σ1, when defining σ1:=supx∈S∥Ax∥2, where S represents the unit hypersphere centred at the origin.
This article describes a proof of the statement above, which makes use of the induced matrix norm (also known as the operator norm).
Let (X,dX) and (Y,dY) be metric spaces. A function f:X→Y is continuous in a set A⊆X if for any point a∈A and any ε>0, there exists some δ>0, which depends on ε and/or a, such that whenever dX(x,a)<δ where x∈X, we have dY(f(x),f(a))<ε.
This definition provides rigour to the notion that a continuous function leaves no 'gaps' in its graph: the distance between two points on the graph becomes arbitrarily small as we 'zoom in' indefinitely, so that the distance between the corresponding points of the domain becomes arbitrarily small.
In our case here, X=Rn, Y=R and dX=dY=∥⋅∥2.
The induced matrix 2-norm, whose definition is stated as follows, also comes into play here.
∥A∥2 = sup_{x∈Rn∖{0}} ∥Ax∥2/∥x∥2, where A∈Rm×n.
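An immediate consequence of this definition, which the continuity argument relies on, is the following bound (for x ≠ 0; the case x = 0 is trivial since A0 = 0):

```latex
\|Ax\|_2 \;=\; \frac{\|Ax\|_2}{\|x\|_2}\,\|x\|_2
\;\le\; \Big(\sup_{y \ne 0}\frac{\|Ay\|_2}{\|y\|_2}\Big)\,\|x\|_2
\;=\; \|A\|_2\,\|x\|_2 .
```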
With these in place, let's start the proof:-
Let x∈Rn, A∈Rm×n and ε>0 be arbitrary, and (assuming for now that A≠0, so that ∥A∥2>0) set δ = ε/∥A∥2.
If A=0, i.e. A is the zero matrix, then f is essentially the function that maps everything to zero, which is clearly continuous2.