Talk Notes
[self, math]
This, this is a story about dorm lectures.
While in Oxford, I felt extraordinarily restless. I do not know nearly enough mathematics and I do not think this feeling of lacking will ever go away. I imagine adulthood is accepting this.
But how do you even, you know, convince yourself that you know math? Do you just read a lot?
The answer is yes. But you also need to communicate it. It is not that hard to read a ton and move on, but to actually sit down and write it up in a way you can explain to people? That’s the crux. That’s why doing math is “hard.”
Anyway, I gave a ~2.5-3 hour lecture on Probabilistic Combinatorics. The issue with giving talks on combinatorics is that they almost always feel like group problem-solving sessions, whereas lectures on, say, set theory are freer to build upwards. Combinatorics is the field of mathematics where the techniques are what matter, not the base-level objects.
Studying probabilistic combinatorics is probably the best way to convince yourself that Erdös is underhyped. Almost every single beautiful proof I have seen has been a result of his, or in some way could be traced back to him.
This is the proof that would make anyone fall in love with the subject. It is on sum-free subsets.
We say that a set $S$ is sum-free if there are no triples $a, b, c$ in $S$ such that $a+b=c.$ The following proof works whether $S$ consists of real numbers or naturals, and it is cleaner to do in the naturals.
A priori, most people should not expect that, given a set $S$ of natural numbers, you can consistently find a “large” subset that is sum-free. Consider the case where $S = [n]$, the first $n$ positive integers. You can take all the odds from $S$ to create a large sum-free set (two odds always sum to an even), so you have a density of $1/2$.
I’m going to actually prove really quickly that this is maximal, though I have forgotten the proof, so I’ll time myself. Since we’re looking at finite sets, this is something that needs the pigeonhole principle. Let $S'$ be our sum-free subset of $S$, and say $m = \max(S')$.
Say $m$ is odd. Then consider the pairings $(1, m-1), \ldots, (\lfloor m / 2 \rfloor, \lceil m/2\rceil)$, each summing to $m$ and together covering $1, \ldots, m-1$. There are $\lfloor m /2 \rfloor$ many pairs. So if $S' \setminus \{m\}$ has more than $\lfloor m/2 \rfloor$ elements, it must contain both halves of some pair, which add to $m$. Hence $\lvert S' \rvert \leq \lfloor m/2 \rfloor + 1 = \lceil m/2 \rceil$. The case where $m$ is even is near identical, as you do casework on whether it contains $m/2$ or not.
Since we have an upper bound on the size based on the maximal element, and the maximal element is at most $\max(S) = n$, we can have at most $\lceil n/2 \rceil$ elements when $S = [n]$, matching the odds construction.
I’m happy, since that only took me a few seconds to think about. However, the next proof took me significantly longer, even though I knew the end answer beforehand and had a “hint.” I mentioned that one way to construct a maximal sum-free subset of $[n]$ is to take all of the odds, but an alternate strategy is to take the “upper half” of the set, as the sum of any two elements will always overshoot the maximal element.
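Here is a quick brute-force check of that bound, something I’d rather run than trust my own pigeonhole at midnight. This is my own sketch (the function names are mine, nothing canonical), and it confirms the $\lceil n/2 \rceil$ maximum for small $n$:

```python
from itertools import combinations

def is_sum_free(subset):
    """No triple a, b, c in the set with a + b = c (a = b allowed)."""
    s = set(subset)
    return all(a + b not in s for a in s for b in s)

def max_sum_free_size(n):
    """Size of the largest sum-free subset of [n] = {1, ..., n}."""
    for size in range(n, 0, -1):  # try the largest sizes first
        if any(is_sum_free(c) for c in combinations(range(1, n + 1), size)):
            return size
    return 0

for n in range(1, 11):
    assert max_sum_free_size(n) == (n + 1) // 2  # ceil(n/2)
print("max sum-free subset of [n] has size ceil(n/2) for n = 1..10")
```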
So, here we go. Let $S$ be any set of $n$ integers. For our sake, say $S$ does not contain $0$, as otherwise $0 + 0 = 0$, and it’s just annoying.
Let $M = \max\{\lvert s \rvert : s \in S\}$. We want to find a prime $p \equiv 2 \pmod 3$ that is larger than $2M+1$. The first condition is available to us via Dirichlet’s theorem, but is really only a technicality, so I don’t want to expand that out here. But the reason we want $p > 2M+1$ is that we want the elements of $S \pmod p$ to stay distinct, and smaller primes might not allow it.
Why did we do this? Well, what are nice sets that are sum-free in $\mathbb{Z}/p\mathbb{Z}$? Neither of our old constructions survives the wraparound. The sum of two odd residues can wrap past $p$ and land on an odd residue again, and for the upper half, adding a number that is close to $p$ to a number that is close to $p/2$ will get you around $p/2$ in modular arithmetic, which the upper-half set contains.
Instead, we look at the middle third. This is where knowing the answer that we’re looking for comes in real darn handy, though I’ve hidden it from you for now. Note that we set $p\equiv 2 \pmod{3}$. Write $p = 3k+2$. Consider the intervals $[1, k], [k+1, 2k+1], [2k+2, 3k+1]$. The middle one is of the most interest to us; we will denote it by $I$, as it is an interval, and refer to it as the middle third.
Note that the sum of any two elements in $I$ will always land you outside of $I$, even mod $p$: such a sum lies in $[2k+2, 4k+2]$, which reduces into $[2k+2, 3k+1] \cup [0, k]$. The density is not as good as the $1/2$ we had before, and is now instead $\lvert I \rvert = k+1 \approx p/3$.
Now for the probability. Let $r$ be a uniformly random variable over elements of $(\mathbb{Z}/p\mathbb{Z})^\times$, i.e. $r$ is nonzero in the field. Let $rS = \{rs \bmod p \mid s \in S\}$. Note this is also a random variable. We then care about the random variable $X = \lvert rS \cap I \rvert$.
Why? If we can get a lower bound on $X$, then we have found a sum-free set in $rS$, as a subset of a sum-free set is sum-free. But note that being sum-free in $rS$ pulls back to being sum-free in $S$, as $r$ has a multiplicative inverse.
Specifically, if $a + b = c$, then $a + b \equiv c \pmod{p}$, and hence $ra + rb \equiv rc \pmod{p}$. Since we’ve found a set in which the last equation never holds, the first one cannot hold either.
So, we need to find a bound on $X$. And this is where we use the linearity of expectation. Note that $rs$, for any fixed $s \in S$, is uniform over $(\mathbb{Z}/p\mathbb{Z})^\times$, as $r$ is uniform over those values and $s$ is nonzero in this field. So, the probability that $rs \in I$ is $\lvert I \rvert/(p-1) = (k+1)/(3k+1) \geq 1/3$. You have to get used to these kinds of inequalities in this kind of math, and I feel like I’ve only recently begun to be okay with guessing and going back to check it.
So, we can do:
\[\mathbb{E}[X] = \sum_{s \in S}\mathbb{P}(rs \in I) \geq n/3.\]
Since the average is at least $n/3$, some particular $r$ must achieve $X \geq n/3$. Based on the former motivation for $X$, we now know that there is a subset of $S$ of size at least $n/3$ that is sum-free.
Ultimately, the theorem states that for any set $S$ of $n$ nonzero integers, you can find a subset that is sum-free and of size at least $n/3$.
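The proof is constructive enough that it compresses into a tiny randomized procedure. Here is a sketch of it, my own translation of the argument into Python with a naive trial-division primality test; the retry loop halts quickly because $\mathbb{E}[X] \geq n/3$:

```python
import random

def is_prime(p):
    return p >= 2 and all(p % d for d in range(2, int(p ** 0.5) + 1))

def sum_free_subset(S):
    """Return a sum-free subset of size >= len(S)/3 (S: distinct nonzero ints)."""
    M = max(abs(s) for s in S)
    p = 2 * M + 2
    while not (is_prime(p) and p % 3 == 2):  # such a prime exists by Dirichlet
        p += 1
    k = (p - 2) // 3                          # p = 3k + 2
    I = set(range(k + 1, 2 * k + 2))          # the middle third [k+1, 2k+1]
    while True:                               # E[X] >= n/3, so this halts fast
        r = random.randrange(1, p)            # uniform over the units mod p
        T = [s for s in S if (r * s) % p in I]
        if 3 * len(T) >= len(S):
            return T

S = [1, 2, 3, 4, 5, 10, 17, 23, 36, 41, 58, 99]
T = sum_free_subset(S)
assert all(a + b not in set(T) for a in T for b in T)  # genuinely sum-free
print(f"n = {len(S)}: found a sum-free subset of size {len(T)}: {sorted(T)}")
```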
It is really, really hard for me to convey how insane I think this is beyond typing it out many times. But I hope it is clear. In some sense this is another example that the human mind is not really used to the linearity of the expectation operator, but I digress.
Even more insane is the fact that this was improved near the beginning of 2025 to an asymptotic bound of roughly $n/3 + c\log(\log(n))$. This was work done by Bedert, and it dives way deeper into additive techniques that were first talked about by Bourgain in ’97. It is sitting in my backlog of papers to read. Obviously the extended question is to find out whether these bounds are sharp, i.e. to find sets with certain structures that force this bound. But I do not see a way forward for this, barring checking the Littlewood norm for a bunch of cases.
This talk was done at a society called PedroSoc, which was run by perhaps one of the best men I will ever get to meet in my life. I have met very few men of such strong conviction as he, and I was happy to talk to him always. He was the one that laughed and said that one should come to Oxford already knowing all the math and to spend the time for love.
Anyway, oftentimes when I give a talk, my rough notes on the subject are much more in-depth than what I actually end up talking about. I had extraordinarily messy notes back then, and so I was reading over them after the talk while walking outside. I walked outside at night a lot in Oxford, mostly because it was quiet.
But I also walked because I could call my friends from back in California, to stay connected with them from afar. I called Logan because… it felt right.
Fast-forward a bit, and he takes it upon himself to start dorm lectures at Stanford. He is joined in this effort by two twins, Peter Bennett and Ailon Goraly (they are not related to each other; they each have their own twin). These lectures vary wildly in format, and they are beautiful. I have been to three, all on different things.
I am not doing these lectures justice in explaining how wonderful they are. I truly believe that the fundamental issue of the sciences and art is that of communication. And this is perhaps the culmination of all of that, you know? A bunch of college kids just trying to explain the things that they love.
So I recently gave one of the lectures there. It was short, 15 minutes long. I’ve given short lectures on math before, but I knew that the audience was going to be a mix of mathematical backgrounds, so I decided against doing something super technical.
Instead, I talked about ridiculously cool people that I had learned about over the years.
The talk sequence was messy and abrupt, and I really should have played some music in the background. Much of it was based on information that I found online, not all of it from Wikipedia, but some.
I wrote down a lot of notes on the preliminaries to the work that Gill had done, as he was mostly in the field of continued fractions. Specifically, I wrote down the rough structure of Cohn’s argument for the continued fraction expansion of $e$, as the talk before me was on how $e$ shows up naturally in computer science optimization problems (roughly, think about optimizing the quantity $(c/x)^x$ with respect to $x$).
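To fill in that parenthetical (my own two-line derivation, not something from that talk): take logs and differentiate, so that
\[\frac{d}{dx}\left[x \ln\frac{c}{x}\right] = \ln\frac{c}{x} - 1 = 0 \iff x = \frac{c}{e}, \qquad \max_x \left(\frac{c}{x}\right)^x = e^{c/e},\]
which is exactly where $e$ sneaks in.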
But to keep on the combinatorics theme, I wanted to mention Folkman, especially his combinatorics theorem.
The setup is as follows. We say that a graph $G$ is $H$-Ramsey if any 2-coloring of the edges of $G$ contains a monochromatic copy of $H$. Ramsey’s theorem then states simply that for any $n$, there exists some $G=K_N$ that is $K_n$-Ramsey.
I actually find this pretty intuitive. The proof itself kind of gives away how it works inductively. I also think there’s probably a procedure to take some $K_n$ and run casework on it in the same way that you can run casework on $K_3$ in $K_6$, but proofs of this type are most likely machine-checkable instead of human-readable.
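For the smallest case, the machine-checking is genuinely instant. A brute-force sketch of mine that exhausts all $2^{15}$ edge 2-colorings of $K_6$ (and, for contrast, finds the escape coloring of $K_5$):

```python
from itertools import combinations, product

def is_Kk_ramsey(n, k):
    """Does every 2-coloring of K_n's edges contain a monochromatic K_k?"""
    edges = list(combinations(range(n), 2))
    cliques = list(combinations(range(n), k))
    for coloring in product((0, 1), repeat=len(edges)):
        color = dict(zip(edges, coloring))
        if not any(len({color[e] for e in combinations(c, 2)}) == 1
                   for c in cliques):
            return False  # this coloring has no monochromatic K_k
    return True

print(is_Kk_ramsey(6, 3))  # True:  every coloring of K_6 has a mono K_3
print(is_Kk_ramsey(5, 3))  # False: the pentagon coloring dodges it
```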
But what breaks my brain is the following result of Folkman. Let $n \geq 2$. Then there is always some graph $G$ that is $K_n$-Ramsey but does not contain a $K_{n+1}$.
The notation that Folkman uses originally in his amazing paper is slightly different, but it is a short paper that I read while sitting on a couch in Utah. It took me a very long time to read.
I feel this weird sense that unless I have memorized a proof verbatim or have let it marinate within me for a long time, I cannot write the proof down without it looking like a direct copy, unless I personally find issue with the way the proof is formatted. When you read this, you will probably look at Folkman’s paper and go “huh, man, he really just copy pasted this stuff,” and to you I say, well, sort of. Typing it out word for word and deleting and resorting and reorganizing is the POINT of reading mathematics. A computer scientist that I really look up to has more on her blog.
Anyway, the proof is very, very similar to the proof of Ramsey’s theorem. Ramsey’s theorem is usually proven using off-diagonal Ramsey numbers, taking a “sum” of two graphs of smaller Ramsey number, and then arguing via size.
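Concretely, that “sum” is the standard recursion (a standard fact, my summary rather than Folkman’s notation):
\[R(k_1, k_2) \leq R(k_1-1, k_2) + R(k_1, k_2-1),\]
since any vertex of $K_{R(k_1-1,k_2)+R(k_1,k_2-1)}$ has either $R(k_1-1, k_2)$ many red neighbors or $R(k_1, k_2-1)$ many blue neighbors, and you recurse inside that neighborhood.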
Instead of taking the sum of two complete graphs, he takes two graphs that have the Folkman property for the off-diagonals, combines them in a way that avoids the larger complete subgraphs, and then shows that the result has the Ramsey property via a size argument.
I’ll flesh out this sketch. First, we’re going to adopt his notation. We let $\delta(G)$ be the size of the largest complete subgraph of $G$. He calls this the dimension, for a discrete-topological reason that to me always feels a bit extra. Like, I get that these are all simplices at the end of the day, but within the context of these problems, I haven’t seen that much use of topology yet. We let $f(k_1, k_2)$ be the minimum dimension over graphs in $\Gamma(k_1, k_2)$, where $\Gamma(k_1, k_2)$ is the class of all graphs that, when their edges are 2-colored, contain a red $K_{k_1}$ or a blue $K_{k_2}$. This is nonempty by Ramsey’s theorem.
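If it helps to see $\delta$ as code: it is just the clique number, and a brute-force sketch (mine; adjacency sets, nothing clever) looks like this:

```python
from itertools import combinations

def dimension(adj):
    """delta(G): size of the largest complete subgraph.

    adj maps each vertex to the set of its neighbors."""
    vertices = list(adj)
    for size in range(len(vertices), 0, -1):  # largest candidates first
        for cand in combinations(vertices, size):
            if all(v in adj[u] for u, v in combinations(cand, 2)):
                return size
    return 0

# K_4 minus one edge has dimension 3, since {0, 1, 2} is complete
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1}}
print(dimension(adj))  # 3
```

Computing $f$ this way is hopeless, of course, since it minimizes over an infinite class of graphs.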
You want equality, so first you show that $f(k_1, k_2)$ is lower bounded by $\max(k_1, k_2)$. But this is immediate from the Ramsey property: color every edge red to force a $K_{k_1}$, every edge blue to force a $K_{k_2}$.
As this is an induction, you need a base case, which is when either $k_i$ equals $2$. Since a monochromatic $K_2$ is just an edge, any 2-coloring of $K_{k_j}$ (where $j$ is the other index) either has a red edge or is entirely blue, so $K_{k_j} \in \Gamma(k_i, k_j)$ and $f(k_i, k_j) = \max(k_i, k_j)$.
The induction is on the sum $k_1+k_2$ with $k_1, k_2 \geq 2$. We’ve just shown it when either one equals $2$, so we start out with the following assumptions.
Suppose $k_1 + k_2 \geq 6$, and that whenever $k_1', k_2' \geq 2$ and $k_1' + k_2' < k_1+k_2$, we have $f(k_1', k_2') = \max(k_1', k_2')$. We further assume that $k_1, k_2 \geq 3$. For ease of notation, let $m = \max(k_1, k_2)$.
By the IH, we have graphs $G_1 \in \Gamma(k_1-1, k_2)$, $G_2 \in \Gamma(k_1, k_2 - 1)$, and $G_3 \in \Gamma(k_1-1, k_2-1)$, each of minimal dimension. This $G_3$ will be a secret tool to help us later, as we’re going to glue together $G_1, G_2$ in a clever way; each of their dimensions is upper bounded by $m$.
Relabel $G_1, G_2$ so that their vertex sets are disjoint. Then we can define their disjoint union, which has the same upper bound $m$ on the dimension. We let $M$ be the number of $(m-1)$-element subsets of the vertices of these two graphs. Note that we don’t immediately know what $M$ is! I don’t believe Folkman did much work on asymptotics for this graph.
Let $N$ be the number of ways that the edges of a new graph $H_2 = H((m-1)^2M^2, G_3)$ can be partitioned into two classes. Folkman doesn’t specify here whether “partition” is ordered, but it is implied later. This is not a part of the proof that I am willing to explain, mostly as it is the “hidden technical lemma that is secretly really fucking hard.” Explicitly, this graph $H$ has the same dimension as $G_3$, so $m-1$, and if you partition its vertices into $(m-1)^2M^2$ classes, then for some class $C_i$ there exists a subset $S \subseteq C_i$ such that the subgraph induced by $S$ in $H$ is isomorphic to $G_3$.
We need this tool once more. Let $V_1$ be the set of vertices of $H_1 = H(N, G_1 \sqcup G_2)$, which is the same construction as before, just swapping out the constants and the base graph. This should seem weird. Let $V_2$ be the set of vertices of the original $H_2$. Finally, let $X$ be the set of all $(m-1)$-element subsets of $V_1$.
This is where the construction actually is, and is where I would benefit from drawing a good picture. Our vertex set of choice is $V = (V_1 \times V_2) \cup X$. So now we need to figure out our edge set via casework.
If we have $(v_1, v_2), (w_1, w_2) \in V$, we connect these two if $v_1 = w_1$ and $v_2, w_2$ are adjacent in $H_2$ (that weird graph from before), or alternatively, if $v_2 = w_2$ and $v_1$ is adjacent to $w_1$ in $H_1$.
If we have $(v_1, v_2) \in V$ and $S \in X$, we draw an edge iff $v_1 \in S$. Note this means that there are a good number of edges; I don’t really know the size asymptotics of these $H$-graphs, but this definitely fills out the edge count.
We don’t connect any elements in $X$ with each other. So this graph looks kind of funny so far. Let $G$ be the resulting graph.
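Since I can’t draw the picture tonight, here is the construction transcribed literally as code. This is my own sketch: the $H(t, G)$ graphs are exactly the hard lemma I said I wouldn’t explain, so `H1` and `H2` enter below as plain vertex/edge inputs that are assumed to already have the right property:

```python
from itertools import combinations

def folkman_glue(V1, E1, V2, E2, m):
    """Build G on (V1 x V2) + X from H1 = (V1, E1) and H2 = (V2, E2)."""
    E1 = {frozenset(e) for e in E1}
    E2 = {frozenset(e) for e in E2}
    X = [frozenset(S) for S in combinations(V1, m - 1)]
    pairs = [(v1, v2) for v1 in V1 for v2 in V2]
    edges = set()
    for (v1, v2), (w1, w2) in combinations(pairs, 2):
        # same first coordinate: copy H2's edges; same second: copy H1's
        if (v1 == w1 and frozenset((v2, w2)) in E2) or \
           (v2 == w2 and frozenset((v1, w1)) in E1):
            edges.add(frozenset(((v1, v2), (w1, w2))))
    for S in X:                  # subset vertices attach to their members
        for v1 in S:
            for v2 in V2:
                edges.add(frozenset((S, (v1, v2))))
    return pairs + X, edges      # no edges within X

# toy inputs: H1 a path on 3 vertices, H2 a single edge, m = 3
V, E = folkman_glue([0, 1, 2], [(0, 1), (1, 2)], ["a", "b"], [("a", "b")], 3)
print(len(V), len(E))  # 9 vertices, 19 edges
```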
Now you want to show that $G$ is $(k_1, k_2)$-Ramsey and also that $\delta(G) \leq \max(k_1, k_2)$. It is much harder to prove the first. The second is a more standard argument.
Let $A$ be some maximal complete subgraph contained in $G$, i.e. $A$ is isomorphic to $K_{\delta(G)}$. If $A$ contains an element from $X$, call it $S$; note it can contain only one, as elements of $X$ are never adjacent. So the vertices are $S, (v_1, w_1), \ldots, (v_{\delta-1}, w_{\delta-1})$, and each $v_i \in S$. If the $v_i$ are all distinct, then clearly there are at most $m-1$ of them, so we get the upper bound $\lvert A \rvert \leq m$. So, say they are not all distinct, and reorder so that $v_1 = v_2$. Then $(w_1, w_2)$ must be an edge in $H_2$, so $w_1, w_2$ are distinct. Note that for the rest of the vertices, $w_i$ can equal at most one of $w_1, w_2$, meaning that to be connected in $A$ it must be that $v_i$ equals $v_1$ or $v_2$. Since $v_1 = v_2$, all of the first coordinates are the same, and all the $w_i$ must be distinct, pairwise adjacent vertices in $H_2$. The dimension of $H_2$ was forced to be $m-1$, so $\lvert A \rvert \leq (m-1)+1 = m$.
In all honesty, reread that. I personally think that’s a very clean proof, though it only really uses the property of $H_2$ having a certain dimension. It also immediately makes it clear how to proceed.
Instead, if $A$ has no elements of $X$, then its vertices are all pairs. Using the same exact argument, you get that the dimension is at most $m-1$ if all of the first coordinates are the same (as $\delta(H_2) = m-1$), or else the dimension is at most $m$ (as $\delta(H_1) = \delta(G_1 \sqcup G_2) = m$).
So that’s how you can construct this graph that avoids containing a larger complete subgraph, using the parts from $G_1, G_2$. It isn’t as clean as the Ramsey proof, as the sum requires this new external graph with a weird property, which is needed for the final half of the proof.
Let $(C_1, C_2)$ be a partition of the edges of $G$ into two classes. Let $u$ be a vertex of $H_1$. Recall that we have an edge between $(u, w_1)$ and $(u, w_2)$ iff $(w_1, w_2)$ is an edge in $H_2$. Then define $D_i(u)$ as the set of edges $(w_1, w_2)$ of $H_2$ such that the edge between $(u, w_1), (u, w_2)$ is in $C_i$. Think of this as an intersection operation.
Note that $(D_1(u), D_2(u))$ partitions the edges of $H_2$, and by definition there are exactly $N$ such partitions. So now we’re going to partition the vertices of $H_1$ into $N$ classes, by putting $v_1, v_2 \in V_1$ in the same class iff they induce the same partition, i.e. $(D_1(v_1), D_2(v_1)) = (D_1(v_2), D_2(v_2))$. By the property of $H_1$, one of these classes contains a subset $U$ which induces a subgraph isomorphic to $G_1 \sqcup G_2$, with the additional structure from before. Let $U$ be this subset, and let $(D_1, D_2)$ be the partition of the edges of $H_2$ that its vertices are labelled with.
Let $w \in V_2$. Then $U \times \{w\}$ induces a subgraph of $G$ that is isomorphic to $G_1 \sqcup G_2$, as all the vertices in $U$ are connected in the same way that they are in $G_1 \sqcup G_2$. THIS IS THE RAMSEY-TYPE SUM ADDITION THAT IS USED, BY THE WAY. ALL CAPS FOR IMPORTANCE.
If there is some $w \in V_2$ such that this subgraph contains some $K_{k_i}$ with all of its edges in the same class $C_i$, then we are done, as this is the Ramsey property. Otherwise, for all $w \in V_2$, neither $C_1$ nor $C_2$ restricted to this copy of $G_1 \sqcup G_2$ contains its respective complete graph. Womp womp.
But now we use the property of $G_1, G_2$. For each $w \in V_2$, define $S_1(w), S_2(w)$ to be sets of vertices in $U$ such that $\lvert S_i(w) \rvert = k_i-1$ and the graph induced by the vertex set $S_i(w) \times \{w\}$ is a $K_{k_i-1}$ with all edges in $C_i$. These must exist in the failure case, because $G_1, G_2$ are both Ramsey-type, just with the minus ones. Note they both must fail at the same time as well.
Because $k_1, k_2$ might be different, we’re instead going to work with $T_i(w)$, which are the exact same sets, just enlarged: $S_i(w) \subseteq T_i(w) \subseteq U$ and $T_i(w)$ contains $m-1$ elements. This only actually adds elements to one of them.
Note that $T_i(w) \in X$, the subsets of size $m-1$. So this is actually a vertex inside of the big $G$! Furthermore, note it is adjacent to each of the vertices in $T_i(w) \times \{w\}$, by definition of the edges in $G$ (the first element must live inside of the subset).
If all of these edges are in the same class $C_i$ for some $w$, then note $\{T_i(w)\} \cup (S_i(w) \times \{w\})$ is a $K_{k_i}$ in $C_i$, as the second product by definition is a $K_{k_i-1}$ in $C_i$ and all we’ve done is add the necessary edges to bump it up.
So obviously assume we don’t have that anymore, womp womp. Then for each $w \in V_2$ we have edges of the form $\{T_1(w), (u_1(w), w)\} \in C_2$ and $\{T_2(w), (u_2(w), w)\} \in C_1$, where $u_i(w) \in T_i(w)$. Effectively, we know that one of the edges is bad, so it’s just in the other class; this is just explicitly writing out that edge.
Home stretch now, kiddos. Note $U$ has the same number of vertices as $G_1 \sqcup G_2$, and by definition $M$ is the number of $(m-1)$-element subsets of those vertices, so the same count applies to subsets of $U$. Note then that there are at most $(m-1)^2M^2$ ordered quadruples $(u_1, u_2, T_1, T_2)$ where $u_i \in T_i \subseteq U$ and both $T_1$ and $T_2$ have $m-1$ elements.
The neat way of labelling here is to partition the vertices of $H_2$ by saying $v, w$ are in the same class iff the quadruples $(u_1(v), u_2(v), T_1(v), T_2(v))$ and $(u_1(w), u_2(w), T_1(w), T_2(w))$ coincide. Using the property of $H_2$, there is some subset $V' \subseteq V_2$ that induces a subgraph isomorphic to $G_3 \in \Gamma(k_1-1, k_2-1)$, and by our labelling, each of its vertices has the same quadruple, say $(u_1, u_2, T_1, T_2)$. I personally feel like this is the trickiest part, as there is so much structure that is kind of ignored (e.g. the $u_i$ here are just some elements such that the edge exists; there might be multiple choices), but that’s kind of why it is nice.
Note further that $u_1, u_2 \in U$ by construction, so $(D_1(u_i), D_2(u_i))$ is the same as $(D_1, D_2)$. This is a partition of the edges of $G_3$, since it’s a partition of the edges of $H_2$ and induces one on the subgraph. Note that $G_3$ is $(k_1-1, k_2-1)$-Ramsey, so it contains some subset $W \subseteq V'$ such that $W$ induces a $K_{k_i-1}$ contained in $D_i$; note this means it is also contained in $C_i$, as $D_i$ was simply a type of “intersection.”
Note that $\{u_j\} \times W$ is then a $K_{k_i-1}$ inside of $G$, where $j \neq i$. By the choice of quadruple, the edges from $T_j$ to $\{u_j\} \times W$ are all in $C_i$. So, now, all that we need to do? Note that $(\{u_j\} \times W) \cup \{T_j\}$ is isomorphic to a $K_{k_i}$, and all of its edges are inside of $C_i$, contradicting the failure case. So $G$ has the Ramsey property, and the induction closes.
Because it is almost midnight and I have work in the morning, I will not add pictures to this currently. But by god I’m happy I read this again. I had maybe ~50% of this memorized, and only the big picture parts. I do not yet understand the technical lemma, and would not be able to give a talk on it yet.
I mentioned him in the talk both for this theorem’s beauty and to show the sides of mathematicians that aren’t talked about much. Folkman got brain cancer. Erdös and others would visit him in the hospital and give him math problems to boost his spirits. But he felt as though the cancer had eaten away at his ability to solve them. He later committed suicide.
So if you’re a theoretician, please play. The past two years have been full of people who seem so… traumatized by the academic landscape? I do think back to Freedman. Michael Freedman, on his way to his PhD, decided to buy himself a linear algebra textbook. He taped it to the steering wheel as he began the long drive from California to New Jersey, as he had not yet read all of it during his short time at UC Berkeley. The source of this was a topologist I met in Utah. Freedman was also a topologist, so I’m inclined to believe the source.
Jupyter Notebooks and the compute they offer are perhaps one of the best gifts to have gotten in the past couple of years.
If you want to read more, these lecture notes from Wigderson are extraordinarily well done.
So I really do feel as though I need to give more proper talks, or I need to write a lot more math to feel whole.
PS - Thank you to Vivian Loh, Yuval Wigderson, Pedro de Oliveira Lengruber Lack, Logan Graves, Peter Bennett, Ailon Goraly. Thank you to Folkman and Erdös.