Some thoughts about “automated theorem searching”

Let me begin with a spoiler warning. If you haven’t watched “The Prisoner”, you might be spoiled about one of the episodes. Not that it matters much for a show from nearly fifty years ago, but you should definitely watch it; it is a wonderful show. And even if you haven’t watched it, it’s just one episode, not the whole show. So you can keep on reading.

So, I’m fashionably late to the party (with a good excuse; see my previous post), but after the recent 200-terabyte proof about the coloring of Pythagorean triples, the same old questions are being raised about whether at some point computers will be better than us at finding new theorems, and at proving them too.
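(For the curious: the theorem states that $\{1,\ldots,7824\}$ can be split into two parts such that neither part contains a Pythagorean triple $a^2+b^2=c^2$, while $\{1,\ldots,7825\}$ cannot. The sketch below is only a toy illustration of how one might phrase the question as a SAT instance; the real proof used a far more sophisticated cube-and-conquer pipeline, and none of the names here are theirs.)

```python
# A minimal sketch, emphatically NOT the encoding from the actual proof:
# emit a DIMACS CNF instance asking whether {1, ..., n} can be 2-colored
# with no monochromatic Pythagorean triple.

def pythagorean_triples(n):
    """All (a, b, c) with a < b < c <= n and a^2 + b^2 = c^2."""
    squares = {i * i: i for i in range(1, n + 1)}
    return [(a, b, squares[a * a + b * b])
            for a in range(1, n + 1)
            for b in range(a + 1, n + 1)
            if a * a + b * b in squares]

def to_dimacs(n):
    """Variable i is true iff the number i gets the first color;
    each triple contributes two clauses forbidding monochromatic triples."""
    clauses = []
    for a, b, c in pythagorean_triples(n):
        clauses.append((a, b, c))      # not all of a, b, c in the second color
        clauses.append((-a, -b, -c))   # not all of a, b, c in the first color
    lines = ["p cnf %d %d" % (n, len(clauses))]
    lines += [" ".join(map(str, cl)) + " 0" for cl in clauses]
    return "\n".join(lines)

if __name__ == "__main__":
    # Any off-the-shelf SAT solver reports small instances like this one
    # satisfiable; at n = 7825 the instance becomes unsatisfiable, and
    # certifying that fact is what took the famous ~200 terabytes.
    print(to_dimacs(100))
```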

My answer is a cautious yes, with the caveat that we may still end up regressing to the Middle Ages with a side of nuclear winter, or some other catastrophe. But to quote Rick Sanchez, “That’s planning for failure, it’s even stupider than regular planning”, and who am I to argue with the smartest man in the universe? (Although, granted, Rick doesn’t care about humanity, and he has his portal gun, so that’s a solid backup plan…) But I digress.

Assuming that we continue on the path we are on, and we don’t collapse under our own weight, it is probable that in a few centuries computers will be able to do mathematics better than people. They will probably be able to search for, and prove, theorems that we wouldn’t even think about.

And here is where “The Prisoner” comes in. In the episode “The General”, Number Two devises a plan for super-saturated quick learning through flickering TV static or some such. The plan is to educate everyone and achieve a measure of uniformity in the general populace. Number Six retorts, however, that if you just transmit facts to people, what you get is still just a row of cabbages, to which Number Two answers, “Indeed, knowledgeable cabbages”. Knowing a bunch of historical facts does not mean that you know history. Devoid of context and intricacies, you are left with nothing but dry, meaningless facts. It is us, the people, who generate context, who weave the seams between the names, dates, and numbers. We give history meaning, because it is meaningful to us, specifically. It has no meaning otherwise.

In a show about the struggle of the individual against a society that wants them to conform and be a subdued, productive member, this is one of the episodes where that message is brought most strongly to the surface.

Now. Who writes all these lectures? Ah, that would be The General: a supercomputer capable of answering every question, from advanced mathematics to crop spraying. And before Number Two can feed the computer a question that was driving part of the plot, Number Six contests the claim of the supercomputer’s power, then types out a short question and feeds it into the machine.

The computer crashes, causing a bit of a kerfuffle. As things cool down, the following dialogue ensues:

Number Two: What was the question?
Number Six: It’s insoluble for man or machine.
Number Two: What was it?
Number Six: W, H, Y, question mark.
Number Two: “Why?”
Number Six: Why.
Number Two: …why?

Setting the epistemological awesomeness of Number Six aside, this is what separates, in my opinion, self-aware creatures from robots. We are capable of wondering “why” something is, and why it is even important. And here we circle back to automated theorem searching and proving. Until we develop a conscious machine that can appreciate the intricate beauty of mathematical questions, and actively decide whether or not a theorem is worth searching for, based on past knowledge, past interest, and the like (and any of this will have to be a learning algorithm, because there is just no rigid definition of what makes a mathematical statement beautiful), automated searching for theorems is doomed to produce a row of cabbages, even if knowledgeable cabbages, rather than the mathematical results that mathematicians would have produced.

Because mathematics is meaningful to us, as humans, and it is not meaningful in any other way. And until computers can endow their existence with the search for meaning, they cannot appreciate why something like the coloring of Pythagorean triples is interesting, or why the proof of the independence of the continuum hypothesis from the axioms of set theory is beautiful.

Until then, mathematics is a human activity, and a social one too.

(Note, by the way, that I didn’t say anything about computer-verified proofs. That is a different story altogether, and I have different, albeit equally strong, opinions there.)

Iterating Symmetric Extensions

I don’t usually like to write about new papers. I mean, it’s a paper, you can read it, you can email me and ask about it if you’d like. It’s there. And indeed, for my previous papers, I didn’t even mention them being posted on arXiv/submitted/accepted/published. This paper is a bit different; but don’t worry, this is not your typical “new paper” post.

If you don’t follow arXiv very closely, I have posted a paper titled “Iterating Symmetric Extensions”. This is going to be the first part of my dissertation. The paper is concerned with developing a general framework for iterating symmetric extensions, which, oddly enough, is something we didn’t really know how to do until now. There is a refinement of the general framework to something I call “productive iterations”, which imposes some additional requirements but allows greater freedom in the choice of filters used to interpret the names. There is also an example of a class-length iteration, which effectively takes everything done in the paper and uses it to produce a class-length sequence of models where, slowly but surely, Kinna–Wagner Principles fail more and more (a Kinna–Wagner Principle asserts, roughly, that every set can be injected into some iterated power set of an ordinal). This means that we are forcing “diagonally” away from the ordinals, so the models produced there will not be defined by their sets of ordinals, or sets of sets of ordinals, and so on.

One seemingly unrelated theorem extends a theorem of Grigorieff and shows that if you take an iteration of symmetric extensions, as defined in the paper, then the full generic extension is one homogeneous forcing away. This is interesting, as it has applications to ground model definability for models obtained via symmetric extensions and iterations thereof.

But again, all that is in the paper. We’re not here to discuss these results. We’re not here to read some funny comic with a T-Rex and a befuddled audience, either. We’re here to talk about how the work came to fruition. Well, parts of that process, anyway. Because I feel that we often don’t talk about these things. We present the world with a completed work, or some partial work, and we move on. We don’t stop to dwell on the hardships we’ve endured. We assume, probably correctly, that most people have endured similar difficulties at one time or another, so there is no need to explain or expose any of the background details. Well. Screw that. This is my blog, and I can write about it if I want to. And I do.

So, the idea of iterating symmetric extensions came to me when I was finishing my master’s. I was thinking about a way to extend symmetric extensions, because it seemed to me that we had run this tool pretty much into the ground, and I was looking for a tool that would enable us to dig deeper into the world of non-AC models. It was good timing, too. Menachem [Magidor] had told me about an interesting model constructed at some workshop in Bristol, and it seemed like a good test subject (dubbed “The Bristol Model” from that point onward). When I settled on this idea, and Menachem explained to me the vague details of the construction, it immediately seemed to me like an iteration of symmetric extensions. So I set out to develop a method that would enable me to formalize and reconstruct this model. (I did that, and while I have a set of notes with a written account, I will soon start transforming them into a proper paper, so I hope that by the end of July I will have something to show for it.)

The first idea came to me when I was in Vienna in September of 2013. I was sure it was going to work easy peasy, so I left it in order to focus on other issues of the hour. When I came back to it a few months later, Menachem and I talked about it and identified a few possible weak spots. Somehow we managed to convince ourselves that these were not real issues, and I started working out the details. Headstrong and cocksure, I was certain there were just a few small technical details left, which would be solved with a couple of days’ worth of work. But math had other plans, and I spent about a year and a half before things worked out.

Specifically, I kept running into small problems. Whenever I wrote about some statement that it was “clear” or “obvious”, there was trouble with it later. Whenever I was sure that something had to be true, it turned out to be false. And I had to rewrite my notes many times over, usually more or less from scratch. Luckily for me, Martin Goldstern was visiting Jerusalem for a few months during the spring semester of 2015, and he was kind enough to hear my ideas and point out a lot of these problems. “Oh, just make sure that such and such is true”, he would say, and the next day I’d find him and say something along the lines of “Yeah, it turned out that it’s false, so I had to do this and that to circumvent the problem, but now it simplified these proofs”. And the process repeated itself. This long process is one of the great sources for this blog post of mine, as well as this post and that post.

Closing in on the summer, Yair [Hayut] was listening to whatever variant I had at the time, and at some point he disagreed with one of the things I had to say. “Surely you can’t disagree with this theorem; it only relies on the lemma that I showed you as the first lemma, and you’ve agreed to that.” He pondered a little bit, and said, “No, I actually disagree with the lemma”. We paused, we thought about it, and we came up with one or two counterexamples to that lemma. It was exactly the issue Menachem and I had identified, and suddenly all the problems that had been plaguing me became obvious consequences of that very problem.

I worked very hard over the course of the next two months, and I managed to salvage the idea from oblivion. It was a good thing, too, because shortly after, I visited the Newton Institute, where I had the chance to present this over the course of eight hours to whoever was interested. And a few people were. But the definition was just terrible. I was happy it was working, though, so I left it aside to cool down for a bit while I worked on other projects in my thesis.

And now I have sat down to write this paper. As I was writing it, I realized how to simplify some of the horrible details, which is great. This made some of the proofs clearer, better, and more like what you’d expect of such proofs. And that’s all I ever wanted, really. It took me two years, but it feels good to be done, I hope. Now we wait for the referee report… and a year from now, when I’ve forgotten all about this, I’ll probably grunt, groan, and revise the damn thing when the report shows up. Or maybe sooner.

Well… I’m done venting. Next stop, writing up The Bristol Model paper.

Addendum:

Okay, maybe this sounds like I’m treating this as a rare process. And to some extent, it is: this is my first big piece of research, and you can only have one first of those. Yes, mathematical research is a process. A long and slow process. I’m not here to complain about this, or argue otherwise. I’m here to point out the obvious, and to complain that I have never heard people talk about these sorts of slow processes. Only about the “So he hopped on a plane, came over here, and we banged this thing together in a couple of weeks’ time”, which is really awesome and sort of exciting. But someone has to stand up and say “No, this was a slow and torturous process that drained the life out of me for the better part of two years”.

Syntactic T-Rex: Irregularized

One of my huge pet peeves is people who think that writing $1+2+3+\ldots=-\frac1{12}$ is a reasonable thing to do without context. Convention dictates that when no context is set, we interpret infinite summation as the usual convergence of a series, namely the limit of the partial sums, if it exists (and of course $1+2+3+\ldots$ does not converge to any real number). However, a lot of people who are [probably] not mathematicians per se insist that just because you can set up a context in which the above equality holds, e.g., Ramanujan summation or zeta function regularization, it is automatically perfectly fine to write it out of nowhere, without context, and that it shouldn’t be treated as wrong.
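To spell out the standard context those people usually have in mind: the Riemann zeta function is defined by $$\zeta(s)=\sum_{n=1}^{\infty}\frac1{n^s},$$ and the series converges only for $\Re(s)>1$; the function then extends uniquely to a meromorphic function on the rest of the complex plane, and that extension satisfies $\zeta(-1)=-\frac1{12}$. It is only in this sense, plugging $s=-1$ into the series formula after it has long ceased to be valid, that one “assigns” the value $-\frac1{12}$ to $1+2+3+\ldots$. The equality is a statement about analytic continuation, not about the limit of partial sums.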

But those people forget that $0=1$ is also very true in the ring with a single element; or you know, just in any structure for a language including the two constant symbols $0$ and $1$, where both constants are interpreted to be the same object. And hey, who even said that $0$ and $1$ have to denote constants? Why not ternary relations, or some other thing?

Well. The short answer is that they are not used for anything other than constants, because the readers are mostly human (sometimes computers, and sometimes cats), and they take strong hints from the choice of letters as setting up context. If I use $n$ for some index, it suggests to the reader that it is a natural number; $\varepsilon$ hints at a very small quantity when it comes to analysis. If I use $\kappa$, at least in set theory, it hints at a cardinal. Whenever I work with people, we run into the joke that, as far as large cardinals go, $\delta$ is generally a Woodin cardinal, sometimes an extendible, and rarely a supercompact. And $\kappa$ is always regular, unless it was a measurable that we singularized somehow.

The point is that $$\lim_{\varepsilon\to\infty}\int_\pi^{\frac1{\omega}}\int_\delta^\kappa\varepsilon\cdot\aleph_0(\omega_3,\Omega,\Bbb R)\operatorname d\Bbb R\operatorname d\Omega=42$$ is a valid mathematical statement, which should cause most, if not all, mathematicians to cringe, look away, and possibly burst into tears. Because it feels wrong.

But hey, don’t leave the site just yet. I know that you didn’t come here to read my tirade against people who misunderstand the whole point of an implicit context. You came here for a Mathematical T-Rex comic!

(Thanks to Matt Inman of The Oatmeal for the template, which can be found here.)

Quick update from Norwich

It’s been a while, quite a while, since I last posted anything. Even a blurb.

I’m visiting David Asperó in Norwich at the moment; on Sunday, the 12th, I will return home. The pattern here seems to be that you work most of the day, then head out for a few drinks and dinner. Mathematics is good for the first two beers, philosophy of mathematics for the next two, and mathematical education for the fifth beer. Then it’s probably a good idea to stop. Also, it is usually last call, so you kind of have to stop.

If luck is with us tomorrow, there might be some great news in the near future. If not, then there might be some other, good, or at least interesting, news in the near future.

And in other unrelated news, there are some updates coming in the next couple of weeks. I hope.