I’ll ask the same question I asked Michał when he gave the talk at the KGRC, as I’m curious about your views on this:
There’s an argument in the philosophical literature using Tennenbaum-style phenomena that goes roughly as follows:
1. No non-standard model of PA is recursive.
2. Reflecting on our practice of addition and multiplication, we see that these are recursive functions.
C. Therefore, we single out the standard model by using our understanding of a recursive procedure.
The normal response (pretty persuasively argued in Tim Button and Peter Smith’s article on this) is that the notion of recursive presupposes some notion of finiteness in understanding recursive computation, which is precisely what a sceptic about arithmetic would deny.
Now you and Michał have shown that something similar to Tennenbaum holds for equivalence relations (in fact, that the problem seems to lie specifically with *relations* rather than with any bi-interpretability with set theory or anything of that sort).
My question: Do you think your result tells for or against Tennenbaum-style arguments such as the one presented above?
Best Wishes,
Neil
]]>From the mid-1990s to about 2012, no results were published on Laver tables or the quotient algebras of elementary embeddings. Nevertheless, set theorists have considered the algebras of elementary embeddings important enough to devote Chapter 11 of the 24-chapter Handbook of Set Theory to them.
Since I am the only one researching generalizations of Laver tables, and only 3 other people have published on Laver tables since the 1990’s, I have attempted to make the paper self-contained and readable to a general mathematician.
Any comments or criticism either by e-mail or on this post about the paper would be appreciated.
Let me now summarize some of the results from the paper.
The ternary Laver tables are quite different computationally from the classical and multigenic Laver tables (I used to call the multigenic Laver tables “generalized Laver tables”) in the following ways:
I will probably post the proof-free version of the paper on generalizations of Laver tables (135 pages with proofs, 86 pages without) in a couple of days. Let me know if the calculator is easy to use or not.
As with the classical and multigenic Laver tables, the ternary Laver tables also produce vivid images. I will post these images soon.
]]>I hope you’ll be posting slides, and do you know if there will be a live feed (or video for later)? I’d love to watch along as I did last year on Joel’s 50th.
]]>I guess in the version with the infinitely many well-ordered fish in the bucket, the move is to decrease the number of fish in one bucket by any amount, and then add any number of fish to the buckets to the left? I think to terminate, we still need the restriction that the number of non-empty buckets at any stage of the game is finite.
I think the winning strategy for this game is to make sure that after your move, the first two buckets (0 and 1) have the same number of fish, same for buckets 2 and 3, 4 and 5, etc.
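This pairing strategy can at least be sanity-checked by brute force on a finite truncation of the game. Below is a Python sketch entirely of my own devising (four buckets, at most 3 fish per bucket, additions capped at 3, and I take the player facing all-empty buckets, i.e. with no legal move, to lose); a game-tree search over all 256 positions can then be compared against the pairing criterion.

```python
from functools import lru_cache
from itertools import product

CAP = 3        # at most 3 fish per bucket in this finite truncation
BUCKETS = 4    # buckets 0..3; "to the left" means smaller index

def moves(pos):
    """One move: strictly decrease some bucket i, then add any number
    of fish (capped at CAP) to each bucket to the left of i."""
    for i in range(BUCKETS):
        for lowered in range(pos[i]):                 # strict decrease
            for adds in product(range(CAP + 1), repeat=i):
                new = list(pos)
                new[i] = lowered
                for j, extra in enumerate(adds):
                    new[j] = min(CAP, new[j] + extra)
                yield tuple(new)

@lru_cache(maxsize=None)
def mover_wins(pos):
    """The player who cannot move (all buckets empty) loses."""
    return any(not mover_wins(succ) for succ in moves(pos))

def paired(pos):
    """The proposed losing positions: buckets 0,1 equal and buckets 2,3 equal."""
    return pos[0] == pos[1] and pos[2] == pos[3]
```

The recursion terminates because a move strictly decreases the position read right-to-left lexicographically, so the game graph is acyclic even with the cap.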
]]>So, it took a bit longer than expected to get this ship-shape, but I’m basically there. You can find the ctms paper here:
and the coding of extensions paper here:
https://www.academia.edu/31683677/Universism_and_Extensions_of_V
There will probably be a few tweaks to be made, but it’s getting there. The former is submitted, and the latter will be soon!
Best,
Neil
]]>For all $n$, let $L_{n}:\{0,\ldots,2^{n}-1\}\rightarrow\{0,\ldots,2^{n}-1\}$ be the mapping that reverses the digits in the $n$-digit binary expansion of a natural number. Let $L_{n}^{\sharp}:\{1,\ldots,2^{n}\}\rightarrow\{1,\ldots,2^{n}\}$ be the mapping where
$L_{n}^{\sharp}(x)=L_{n}(x-1)+1$. Let $\#_{n}$ be the operation on $\{1,\ldots,2^{n}\}$ defined by $x\#_{n}y=L_{n}^{\sharp}(L_{n}^{\sharp}(x)*_{n}L_{n}^{\sharp}(y))$ where $*_{n}$ is the classical Laver table operation on $\{1,\ldots,2^{n}\}$.
Let $C_{n}=\{(\frac{x}{2^{n}},\frac{x\#_{n}y}{2^{n}})\mid x,y\in\{1,\ldots,2^{n}\}\}$. Then $C_{n}$ is a subset of $[0,1]\times[0,1]$ and the sets $C_{n}$ converge in the Hausdorff metric to a compact subset $C\subseteq[0,1]\times[0,1]$. The link that I gave gives images of $C$ that you may zoom in to.
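For readers who want to experiment, here is a short Python sketch of these definitions (my own illustration, not the code behind the linked images): $*_{n}$ is computed by the standard recursion $x*_{n}1=x+1$ and $x*_{n}y=(x*_{n}(y-1))*_{n}(x*_{n}1)$, with $2^{n}$ acting as the identity, and $\#_{n}$ is obtained by conjugating with $L_{n}^{\sharp}$; the points of $C_{n}$ are then just the scaled pairs $(x/2^{n},(x\#_{n}y)/2^{n})$.

```python
def laver_table(n):
    """Classical Laver table *_n on {1,...,2^n}.

    Row 2^n is the identity; for x < 2^n the entries follow the recursion
    x*1 = x+1 and x*y = (x*(y-1)) * (x*1), where the row x*(y-1) > x has
    already been computed when we fill rows in decreasing order of x.
    """
    N = 2 ** n
    T = {}  # T[(x, y)] = x *_n y
    for y in range(1, N + 1):
        T[(N, y)] = y              # 2^n acts as the identity
    for x in range(N - 1, 0, -1):
        T[(x, 1)] = x + 1
        for y in range(2, N + 1):
            T[(x, y)] = T[(T[(x, y - 1)], x + 1)]
    return T

def bit_reverse(n, x):
    """L_n: reverse the n-digit binary expansion of x in {0,...,2^n - 1}."""
    return int(format(x, '0{}b'.format(n))[::-1], 2)

def sharp_table(n):
    """The reordered operation x #_n y = L_n#(L_n#(x) *_n L_n#(y))."""
    T = laver_table(n)
    L = lambda x: bit_reverse(n, x - 1) + 1   # L_n# on {1,...,2^n}
    N = 2 ** n
    return {(x, y): L(T[(L(x), L(y))])
            for x in range(1, N + 1) for y in range(1, N + 1)}
```

For $n=2$ this reproduces the familiar table $1*_{2}y=2,4,2,4$ and the reordered first row $1\#_{2}y=3,3,4,4$.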
Since $A_{48}$ is still the largest classical Laver table ever computed, we are only able to zoom into $C$ with $2^{48}\times 2^{48}$ resolution (which is about 281 trillion by 281 trillion so we can see microscopic detail).
As I kind of expected, these images of the classical Laver tables are quite tame compared to the wildness of the final matrix which one obtains from the generalized Laver tables $(A^{\leq 2^{n}})^{+}$; the generalized Laver tables give more fractal-like images while the classical Laver tables give more geometric images. I conjecture that the set $C$ has Hausdorff dimension $1$ though I do not have a proof. The simplicity of these images of the classical Laver tables gives some hope for computing the classical Laver tables past even $A_{96}$.
Some regions in the set $C$ may look to be simply smooth vertical or diagonal lines, but if there exists a rank-into-rank cardinal, then every single neighborhood in $C$ has fractal features if you zoom in far enough (I suspect that you will need to zoom in for a very very long time before you see any fractal features and I also suspect that you will need to zoom into the right location to see the fractal behavior).
]]>In this post, we shall use large cardinals and forcing to prove the existence of certain classes of finite self-distributive algebras with a compatible linear ordering. The results contained in this note shall be included in my (hopefully soon to be on Arxiv) 100+ page paper Generalizations of Laver tables. In this post, I have made no attempt to optimize the large cardinal hypotheses.
For background information, see this post or see Chapter 11 in the Handbook of Set Theory.
We shall let $\mathcal{E}_{\alpha}$ denote the set of all elementary embeddings $j:V_{\alpha}\rightarrow V_{\alpha}.$
In this answer, I have outlined a proof that the algebra $\mathcal{E}_{\lambda}/\equiv^{\gamma}$ is locally finite. We have therefore established a deep connection between the top of the large cardinal hierarchy and finite algebras.
In this note, we shall use two important ideas to construct finite self-distributive algebras. The main idea is to generalize the square root lemma for elementary embeddings so that one obtains elementary embeddings with the desired properties.
$\textbf{Theorem: (Square Root Lemma)}$ Let $j\in\mathcal{E}_{\lambda+1}$. Then there is some $k\in\mathcal{E}_{\lambda}$ where $k*k=j|_{V_{\lambda}}$.
$\mathbf{Proof}:$ By elementarity
$$V_{\lambda+1}\models\exists k\in\mathcal{E}_{\lambda}:k*k=j|_{V_{\lambda}}$$
if and only if
$$V_{\lambda+1}\models\exists k\in\mathcal{E}_{\lambda}:k*k=j(j|_{V_{\lambda}})$$
which is true, being witnessed by $k=j|_{V_{\lambda}}$ itself, since $(j|_{V_{\lambda}})*(j|_{V_{\lambda}})=j(j|_{V_{\lambda}})$. Therefore, there is some $k\in\mathcal{E}_{\lambda}$ with $k*k=j|_{V_{\lambda}}$. $\mathbf{QED}$
The other idea is to work in a model such that there is a cardinal $\lambda$ where there are plenty of rank-into-rank embeddings from $V_{\lambda}$ to $V_{\lambda}$ but where $V_{\lambda}\models\text{V=HOD}$. If $V_{\lambda}\models\text{V=HOD}$, then $V_{\lambda}$ has a definable linear ordering which induces a desirable linear ordering on rank-into-rank embeddings and hence linear orderings on finite algebras. The following result can be found in this paper.
$\mathbf{Theorem}$ Suppose that there exists a non-trivial elementary embedding $j:V_{\lambda+1}\rightarrow V_{\lambda+1}$. Then in some forcing extension $V[G]$ there is some elementary embedding $k:V[G]_{\lambda+1}\rightarrow V[G]_{\lambda+1}$ where
$V[G]_{\lambda}\models\text{V=HOD}$.
Therefore it is consistent relative to large cardinals that there exists a non-trivial elementary embedding $j:V_{\lambda+1}\rightarrow V_{\lambda+1}$ such that $V_{\lambda}\models\text{V=HOD}$.
Now suppose that $V_{\lambda}\models\text{V=HOD}$. Then there exists a linear ordering $\ll$ of $V_{\lambda}$ which is definable in $V_{\lambda}$. If $j\in\mathcal{E}_{\lambda}$ and $\gamma$ is a limit ordinal with $\gamma<\lambda$, then define $j\upharpoonright_{\gamma}:V_{\gamma}\rightarrow V_{\gamma+1}$ by $j\upharpoonright_{\gamma}(x)=j(x)\cap V_{\gamma}$ for each $x\in V_{\gamma}.$ Take note that $j\upharpoonright_{\gamma}=k\upharpoonright_{\gamma}$ if and only if $j\equiv^{\gamma}k$.

Define a linear ordering $\trianglelefteq$ on $\mathcal{E}_{\lambda}$ where $j\trianglelefteq k$ if and only if $j=k$ or there is a limit ordinal $\alpha$ where $j\upharpoonright_{\alpha}\ll k\upharpoonright_{\alpha}$ but where $j\upharpoonright_{\beta}=k\upharpoonright_{\beta}$ whenever $\beta$ is a limit ordinal with $\beta<\alpha$. Similarly, define a linear ordering $\trianglelefteq$ on $\{j\upharpoonright_{\gamma}\mid j\in\mathcal{E}_{\lambda}\}$ by letting $j\upharpoonright_{\gamma}\triangleleft k\upharpoonright_{\gamma}$ if and only if there is some limit ordinal $\beta\leq\gamma$ where $j\upharpoonright_{\beta}\ll k\upharpoonright_{\beta}$ but where $j\upharpoonright_{\alpha}=k\upharpoonright_{\alpha}$ whenever $\alpha$ is a limit ordinal with $\alpha<\beta$. By elementarity, the linear ordering $\trianglelefteq$ satisfies the following compatibility property: if $k\upharpoonright_{\gamma}\trianglelefteq l\upharpoonright_{\gamma}$, then $(j*k)\upharpoonright_{\gamma}\trianglelefteq(j*l)\upharpoonright_{\gamma}$.

We say that a linear ordering $\leq$ on a Laver-like LD-system $(X,*)$ is a compatible linear ordering if $y\leq z\Rightarrow x*y\leq x*z$. If $V_{\lambda}\models\text{V=HOD}$, then $\mathcal{E}_{\lambda}/\equiv^{\gamma}$ has a compatible linear ordering defined by $[j]_{\gamma}\leq[k]_{\gamma}$ if and only if $j\upharpoonright_{\gamma}\trianglelefteq k\upharpoonright_{\gamma}$.
Using generalized Laver tables, we know that the set $\{\text{crit}(j):j\in\langle j_{1},…,j_{n}\rangle\}$ has order-type $\omega$. Let $\text{crit}_{r}(j_{1},…,j_{n})$ be the $r$-th element of the set $$\{\text{crit}(j):j\in\langle j_{1},…,j_{n}\rangle\}$$ ($\text{crit}_{0}(j_{1},…,j_{n})$ is the least element of $\{\text{crit}(j):j\in\langle j_{1},…,j_{n}\rangle\}$). Let $T:\bigcup_{n\in\omega}\mathcal{E}_{\lambda}^{n}\rightarrow V_{\omega\cdot 2}$ be a mapping definable in $(V_{\lambda+1},\in)$ where $T(j_{1},…,j_{m})=T(k_{1},…,k_{n})$ if and only if $m=n$ and if $\gamma=\text{crit}_{r} (j_{1},…,j_{m})$ and $\delta=\text{crit}_{r}(k_{1},…,k_{n})$, then there is some isomorphism $\phi:\langle j_{1},…,j_{m}\rangle/\equiv^{\gamma}\rightarrow\langle k_{1},…,k_{n}\rangle/\equiv^{\delta}$ where $\phi([j_{i}]_{\gamma})=[k_{i}]_{\delta}$. We remark that if $T(j_{1},…,j_{m})=T(k_{1},…,k_{n})$, then the subspaces $\overline{\langle j_{1},…,j_{m}\rangle}$ and $\overline{\langle k_{1},…,k_{n}\rangle}$ of $\mathcal{E}_{\lambda}$ are homeomorphic by an isomorphism of algebras preserving $*,\circ$ ($\mathcal{E}_{\lambda}$ can be given a complete metric that induces a canonical uniformity on $\mathcal{E}_{\lambda}$).
The following technical result is a generalization of the Square-Root Lemma, and a simplified special case of the following results can be found in this answer that I gave.
Then there are $(w_{r,s})_{1\leq r\leq n,1\leq s\leq p}$ in $\mathcal{E}_{\lambda}$ where
$\mathbf{Proof:}$ For $1\leq i\leq p$, let $A_{i}$
$$=\{(w_{1}\upharpoonright_{\text{crit}_{v}(w_{1},…,w_{n})},…,w_{n}\upharpoonright_{\text{crit}_{v}(w_{1},…,w_{n})}):
T(j_{1},…,j_{m},w_{1},…,w_{n})=x_{i}\}.$$
Then $\ell_{i}(A_{i})$
$$=\{(w_{1}\upharpoonright_{\text{crit}_{v}(w_{1},…,w_{n})},…,w_{n}\upharpoonright_{\text{crit}_{v}(w_{1},…,w_{n})}):
T(\ell_{i}*j_{1},…,\ell_{i}*j_{m},w_{1},…,w_{n})=x_{i}\}.$$
Therefore,
$$(k_{1,i}\upharpoonright_{\mu},…,k_{n,i}\upharpoonright_{\mu})\in\ell_{i}(A_{i})$$ for $1\leq i\leq p$. Since
$k_{r,1}\upharpoonright_{\mu}=…=k_{r,p}\upharpoonright_{\mu}$, we have
$$(k_{1,1}\upharpoonright_{\mu},…,k_{n,1}\upharpoonright_{\mu})=…=(k_{1,p}\upharpoonright_{\mu},…,k_{n,p}\upharpoonright_{\mu}).$$
Therefore, let $$(\mathfrak{k}_{1},…,\mathfrak{k}_{n})=(k_{1,1}\upharpoonright_{\mu},…,k_{n,1}\upharpoonright_{\mu}).$$
Then
$$(\mathfrak{k}_{1},…,\mathfrak{k}_{n})\in\ell_{1}(A_{1})\cap…\cap\ell_{p}(A_{p})\cap V_{\mu+\omega}$$
$$=\ell_{1}(A_{1})\cap…\cap\ell_{1}(A_{p})\cap V_{\mu+\omega}$$
$$\subseteq\ell_{1}(A_{1}\cap…\cap A_{p}).$$
Therefore, $A_{1}\cap…\cap A_{p}\neq\emptyset.$
Let $(\mathfrak{w}_{1},…,\mathfrak{w}_{n})\in A_{1}\cap…\cap A_{p}$. Then there are $(w_{r,s})_{1\leq r\leq n,1\leq s\leq p}$ in $\mathcal{E}_{\lambda}$ where
$$(\mathfrak{w}_{1},…,\mathfrak{w}_{n})=(w_{1,i}\upharpoonright_{\text{crit}_{v}(w_{1,i},…,w_{n,i})},…,w_{n,i}\upharpoonright_{\text{crit}_{v}(w_{1,i},…,w_{n,i})})$$
and
$$T(j_{1},…,j_{m},w_{1,i},…,w_{n,i})=x_{i}$$
for $1\leq i\leq p.$ Therefore, there is some $\alpha<\lambda$ with $\text{crit}_{v}(w_{1,s},...,w_{n,s})=\alpha$ for $1\leq s\leq p$ and where $w_{r,1}\equiv^{\alpha}\ldots\equiv^{\alpha}w_{r,p}$ for $1\leq r\leq n$. $\mathbf{QED}$
$\mathbf{Remark:}$ The above theorem can be generalized further by considering the classes of rank-into-rank embeddings
described in this paper.
If $Y$ is a finite reduced Laver-like LD-system, then let $\approx$ be the relation on $Y^{<\omega}$ where $(x_{1},...,x_{m})\approx(y_{1},...,y_{n})$ if and only if $m=n$ and whenever $\langle x_{1},...,x_{m}\rangle$ and $\langle y_{1},...,y_{n}\rangle$ both have more than $v+1$ critical points, then there is an isomorphism
\[\iota:\langle x_{1},...,x_{m}\rangle/\equiv^{\text{crit}_{v}(x_{1},...,x_{m})}\rightarrow
\langle y_{1},...,y_{n}\rangle/\equiv^{\text{crit}_{v}(y_{1},...,y_{n})}\]
where $\iota([x_{i}])=[y_{i}]$ for $1\leq i\leq n$.
Then there is some finite reduced Laver-like LD-system $X$ along with
\[x,(y_{r,s})_{1\leq r\leq n,1\leq s\leq p}\in X\]
such that
I challenge the readers of this post to remove the large cardinal hypotheses from the above theorem (one may still use the freeness of subalgebras $\varprojlim_{n}A_{n}$ and the fact that $2*_{n}x=2^{n}\Rightarrow 1*_{n}x=2^{n}$ though).
So it turns out that by taking stronger large cardinal axioms, one can induce a linear ordering on the algebras of elementary embeddings without having to resort to working in forcing extensions. We say that a cardinal $\delta$ is an I1-tower cardinal if for all $A\subseteq V_{\delta}$ there is some $\kappa<\delta$ such that whenever $\gamma<\delta$ there is some cardinal $\lambda<\delta$ and a non-trivial elementary embedding $j:V_{\lambda+1}\rightarrow V_{\lambda+1}$ such that $\text{crit}(j)=\kappa$ and where $j(\kappa)>\gamma$ and where $j(A)=A$. If $A$ is a good enough linear ordering on $V_{\delta}$, then $A\cap V_{\lambda}$ induces a compatible linear ordering on the set of all elementary embeddings $j:V_{\lambda}\rightarrow V_{\lambda}$ such that $j(A\cap V_{\gamma})=A\cap V_{j(\gamma)}$ for all $\gamma<\lambda$. It is unclear where the I1-tower cardinals stand in the large cardinal hierarchy, or whether they are even consistent.
It turns out that we can directly show that if $j:V_{\lambda+1}\rightarrow V_{\lambda+1}$ is a non-trivial elementary embedding, then there is a linear ordering $B$ of $V_{\lambda}$ where $j(B)=B$. In fact, if $j:V_{\lambda}\rightarrow V_{\lambda}$ is a non-trivial elementary embedding, $\mathrm{crit}(j)=\kappa$, and $A$ is a linear ordering of $V_{\lambda}$, then if we let $B=\bigcup_{n}j^{n}(A)$, then $B$ is a linear ordering of $V_{\lambda}$ and $j(B\cap V_{\gamma})=B\cap V_{j(\gamma)}$ whenever $\gamma<\lambda$. In particular, if $j$ extends to an elementary embedding $j^{+}:V_{\lambda+1}\rightarrow V_{\lambda+1}$, then $j^{+}(B)=B$. One can therefore prove the results about finite permutative LD-systems by working with the linear ordering that comes from $B$ instead of the linear ordering that comes from the fact that $V_{\lambda}[G]\models V=HOD$ in some forcing extension. One thing to be cautious of when one announces results before publication is that perhaps the proofs are not optimal and that one can get away with a simpler construction.
Philosophy and research project proposals
In the above results, we have worked in a model $V$ where there are non-trivial maps $j:V_{\lambda}\rightarrow V_{\lambda}$ and where $V_{\lambda}\models\text{V=HOD}$ in order to obtain compatible linear orderings on finite algebras. However, if we work in different forcing extensions with rank-into-rank embeddings instead, then I predict that one may obtain from large cardinals different results about finite algebras.
I predict that in the near future, mathematicians will produce many results about finite or countable self-distributive algebras using forcing and large cardinals where the large cardinal hypotheses cannot be removed. I also predict that rank-into-rank cardinals will soon prove results about structures that at first glance have little to do with self-distributivity.
I must admit that I am not 100 percent convinced of the consistency of the large cardinals around the rank-into-rank level. My doubt is mainly due to the existence of finite reduced Laver-like LD-systems which cannot be subalgebras of $\mathcal{E}_{\lambda}/\equiv^{\gamma}$. However, if no inconsistency is found, then the results about finite or countable structures that arise from very large cardinals would convince me not only of the consistency of very large cardinals but also of the existence of these very large cardinals. Therefore, people should investigate the finite algebras which arise from very large cardinals in order to quell all doubts about the consistency or the existence of these very large cardinals.
Since it is much more likely that the Reinhardt cardinals are inconsistent than say the I1 cardinals are inconsistent, I also propose that we attempt to use the algebras of elementary embeddings to show that Reinhardt cardinals are inconsistent. I have not seen anyone investigate the self-distributive algebras of elementary embeddings at the Reinhardt level. However, I think that investigating the self-distributive algebras of elementary embeddings would be our best hope in proving that the Reinhardt cardinals are inconsistent.
]]>Where I grew up in Mtl is unusual: right downtown, at the corner of City Councillors and Sherbrooke St (1951-1960). Hard to believe there were attached single-family houses on that street!
]]>But I ended up at Smith College, and then Macalester College, and am now retired in Colorado (at 10,000 feet elev).
]]>
(defun ancestors (node) ;; => list
"Returns the list of strict ancestors of NODE."
(when (node-parents node)
(remove-duplicates
(apply #'append
(node-parents node)
(map 'list #'ancestors (node-parents node))))))
(defun graph-implies-p (X Y)
(member X (cons Y (ancestors Y))))
I say I’m not convinced because `graph-implies-p` is only , and N is only 427. Most time is spent on `graph-implies-not-p`, which also searches direct descendants/ancestors first, but seems to be necessarily . What I think might make a difference (when drawing very large graphs) is only graphing direct implications of the subgraph, instead of graphing everything and having `dot tred` remove the extra edges. Perhaps depths will help there. Correctness of the predicates is most important, so I’m reluctant to complicate the code. Since very large graphs seem to only have an aesthetic use (they aren’t really readable), taking care of this is not high in my priority list. :)
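For what it’s worth, the kind of caching I would picture looks something like this, as a Python sketch rather than Lisp (everything here is hypothetical, with a `PARENTS` table standing in for the `node-parents` relation): memoise `ancestors` once per node, so that `graph-implies-p` becomes a single set-membership test.

```python
from functools import lru_cache

# Hypothetical stand-in for the node-parents relation.
PARENTS = {
    'A': (),
    'B': ('A',),
    'C': ('A',),
    'D': ('B', 'C'),
}

@lru_cache(maxsize=None)
def ancestors(node):
    """Return the frozenset of strict ancestors of NODE, computed once."""
    result = set()
    for parent in PARENTS[node]:
        result.add(parent)
        result |= ancestors(parent)
    return frozenset(result)

def graph_implies_p(x, y):
    """Mirrors (graph-implies-p X Y): X is Y itself or a strict ancestor of Y."""
    return x == y or x in ancestors(y)
```

The memoised table is shared across all queries, so repeated calls on the same forms do no graph traversal at all.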
Could you elaborate on your suggestion to ‘track some data and create some caching mechanism for “common forms”’?
]]>And this whole thing sends me back to my freshman year, learning about BFS and DFS algorithms. In this case, I’d go with a mix of the two, breaking the depth into chunks of several implications at a time (ranked from “most directly implied” to least) and going through each of them BFS-style.
]]>But of course that this decision has to be ratified by a majority of 3/4th before we can even approach the Vogons with this offer, which will then require several dozen forms and committees to process this through until they are able to debate whether or not they want to populate this additional committee for deciding what gets into the Choiceless Dictionary.
]]>We can test such a feature for CGraph with another (smaller?) dataset of implications, for example large cardinals in ZFC? Or even reverse mathematics, but I bet that’s definitely not a smaller dataset.
I suppose the issue would be mostly who gets to and has time to decide which forms should be added and which results are incorporated.
]]>Under 2. (ii) it should read \phi \leftrightarrow \phi^M (for some reason, my object-language “iff” got removed on posting).
]]>“I actually only specified that the ground model of the Boolean valued model should be elementarily equivalent to V”
That is a good point: Dunno how I missed that (must have been tired and/or rushed)!
“Can you say more about how we can get a countable transitive model that is elementarily equivalent to V”
Obviously V can’t tell that such a ctm provides its own truth definition (by Tarski). Two options:
1. Use a truth predicate, add the Tarski axioms, and permit the use of the truth predicate in the Comprehension and Replacement axioms. Prove reflection, and reflect on the formula “x is a Gödel code of some phi and Tr(x)”. Then the structure can see a ctm elementarily equivalent to V for ZFC-formulas by the usual Skolemisation and collapse. (Thanks to Sam Roberts for mentioning the use of truth predicates for this.)
2. Extend ZFC with a constant M and the following axioms:
(i) M is countable and transitive.
(ii) \phi \leftrightarrow \phi^M [by Tarski, this has to be a scheme]
This is actually a conservative extension of ZFC.
“and why this is important (you mentioned something about generic embeddings)?”
It depends on what you want out of forcing. If you just want your relative consistency proof, then all you need is a ctm for (a big enough finite fragment of) ZFC. But maybe you want to prove some facts about V that go well beyond its ZFC-satisfaction. Maybe you want to do this by combinatorially bolting sets together in the extension. Say you had good reason to believe in the existence of an ideal required to facilitate a generic embedding (as I understand it, we can perfectly well have this in V even if we believe there are no generics for V). But moving to a ctm for ZFC doesn’t guarantee that I can do this construction (it might not have the ideal), or even if it does, that some consequences determined in relation to other facts in V hold in the ctm (our ctm might radically differ from V with respect to other independent facts!). We want a nice, uniform way of getting the most accurate picture possible of extensions *of V*, not of a model that is merely something-a-bit-like-V.
Of course, technically speaking you could just move to a ctm of (a big enough finite fragment of) ZFC + “Exists an ideal of the required kind” + “Other truths of V”. This isn’t a technical point about the metamathematics of forcing, rather I’m making a point about *philosophical* importance. If you want to mimic extensions *of V*, you need something that looks very much like V but has the generics, not something that looks a little like V.
“Do you mind adding links to PDFs if you already have preprints?”
I’m actually just redrafting the two papers now (as well as making a new website) and so will add a link in the next couple of weeks to this comment thread.
]]>In ZFC, the classical Laver tables can be given a linear ordering such that the endomorphisms are monotone functions. When we reorder the classical Laver tables according to this compatible linear ordering, we get the tables in the following link.
http://boolesrings.org/jvanname/lavertables-database-classical-fullcompatibletabletoa5/
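I believe the linear ordering in question is the reversed-binary-digit ordering used for the tables above; under that assumption, here is a quick Python check (my own sketch, not code from the linked pages) that after conjugating $*_{n}$ by the bit-reversal permutation, every row, i.e. every left translation, becomes monotone nondecreasing.

```python
def laver_table(n):
    """Classical Laver table *_n on {1,...,2^n} via the standard recursion
    x*1 = x+1, x*y = (x*(y-1)) * (x*1), with 2^n acting as the identity."""
    N = 2 ** n
    T = {}
    for y in range(1, N + 1):
        T[(N, y)] = y
    for x in range(N - 1, 0, -1):
        T[(x, 1)] = x + 1
        for y in range(2, N + 1):
            T[(x, y)] = T[(T[(x, y - 1)], x + 1)]
    return T

def monotone_rows(n):
    """Check that, after reordering {1,...,2^n} by reversed binary digits,
    each left translation of the Laver table is monotone nondecreasing."""
    N = 2 ** n
    L = lambda x: int(format(x - 1, '0{}b'.format(n))[::-1], 2) + 1
    T = laver_table(n)
    S = {(x, y): L(T[(L(x), L(y))])
         for x in range(1, N + 1) for y in range(1, N + 1)}
    return all(S[(x, y)] <= S[(x, y + 1)]
               for x in range(1, N + 1) for y in range(1, N))
```

For $n=2$, for instance, the reordered rows are $(3,3,4,4)$, $(4,4,4,4)$, $(2,2,4,4)$ and $(1,2,3,4)$, each nondecreasing.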
The following link gives the compressed versions of the multiplication table for the classical Laver tables reordered according to the compatible linear ordering.
http://boolesrings.org/jvanname/lavertables-database-classical-alltablescompatibletoa10/
The compatible linear ordering on the classical Laver tables produces fractal-like patterns that converge to a compact subset of the unit square. These images are found at the following link.
http://boolesrings.org/jvanname/lavertables-visualization-classical-imagesofcompatibletables/
And this page gives images of the fractal pattern that comes from the classical Laver tables.
http://boolesrings.org/jvanname/lavertables-visualization-classical-imagesoftables/
Hopefully I will be able to finish my long paper on the classical Laver tables over the next couple of months.
]]>The first is the assumed knowledge of the students. You can’t talk about Cantor–Bendixson analysis with people whose grasp of topology is weak to non-existent. Sure, you can define all the relevant terms, but that is time you’re not getting back from other topics. You can’t talk about motivating inaccessible cardinals and measurable cardinals with people whose grasp of model theory is weak, or with people who have never seen a proof of the incompleteness theorem. Yes, you can cite these, but it doesn’t quite cut to the same depth. My class is very heterogeneous in this respect: there are sophomore undergrads, there are seniors, there are grad students; there are some who have taken a course about the incompleteness theorem, and some who are only taking their first logic course in parallel. If you want to retain a large class, a large heterogeneous class, it means making everyone feel that they actually follow you. And this means that it might be a good idea to stick to some basics.
For what it’s worth, the basic set theory course—for the past three years, under Azriel’s helm with me on the exercises—covered quite a bit more than order-embedding the reals into P(N). We covered choice-related examples, we covered ordinal arithmetic, we covered the notion of cofinality, and we covered the basics too. So now we need to talk about clubs, stationary sets, getting to know cardinal arithmetic better, and talking about why some axioms are consistent: specifically Foundation, which is not introduced in the basic course, and Choice, which is just saying let’s build L, something I find to be important for later in set theory as well.
The second issue is “follow your heart”. My first course in logic and set theory was with Uri Abraham, who is a wonderful teacher, but it was a very basic course. We covered very little in terms of set theory, and the very very basics of logic (what is a structure, what is logical entailment; without even getting into compactness, or even the axiom of choice). The rest of the courses (logic, axiomatic set theory, descriptive set theory) were taught by Matti Rubin when I took them. Now Matti is a wonderful person to talk to, and he’s very fun as a teacher. But I got out of these courses knowing almost nothing relevant. The axiomatic set theory course spent most of its time proving the equiconsistency of Foundation and its negation, in a proof that is essentially in PRA (without really talking about coding proofs into PRA). At the time I thought it was great, but when I took a reading course to learn forcing the next semester (there weren’t enough students for the course to be taught as lectures), I realized that I didn’t understand almost anything about how you actually prove something in set theory, or how you do any research in set theory (we did other things, sure, but not many things, and only for a couple of weeks). For a course that is supposed to give you a taste of set theory, this was terrible. I’ve since covered the rest of the ground, of course, either with Menachem and his many courses, or by myself (by reading and by interacting with people like you online). But I’d much rather give my students a better taste of what set theory is like.
And yes, you are absolutely right that there are topics you can use to demonstrate this to a better degree. You can show them a myriad of combinatorial constructions, or large cardinals which do not require a complicated statement (so Woodin cardinals are off the table here). You could even show them topological consequences of forcing axioms. But at the end of the day, you need to give them a breadth of ground to work with. The next semester there will be a course about forcing and independence. I need to prepare them for that course, and I need to prepare them for the future if they choose to be set theorists (many won’t be, but some will definitely be); and most importantly, those who will drop out over the next few weeks, and those who will not continue in logic-related topics, should leave this course feeling that set theory is an important and interesting topic, so that when they are sitting in some committee a couple of decades from now, a set theorist might have a better chance of getting whatever the committee is voting on, because they won’t be biased against the topic.
Okay, maybe the line between those two points is a bit blurred, and they make more of a point and a half. The thing I’m trying to say is that (1) I have a very heterogeneous class, and (2) doing “the things close to you, the way you want to do them” might not be in the best interest of the students for a first class in set theory.
]]>In my own case, for example, at the graduate level I tend to dwell on the Cantor-Bendixson analysis, which both motivates the ordinal concept (it is after all the source of the ordinal concept), while also remaining connected with the reals, and it leads directly to the continuum hypothesis. Then, I aim as quickly as possible for large cardinals, getting to inaccessibles, hyperinaccessibles and Mahlos. Apart from having an inherently attractive combinatorics, the large cardinals also showcase the awesome size of the objects studied in set theory, which is inspiring and motivating on its own. In addition, the fact that we can’t prove they exist leads to profound philosophical and foundational issues and questions, and this motivates the consistency strength arguments and relative consistency proofs generally. I would expect that you would want to spend some significant time on ZF and the cases where AC fails, but I rarely spend much time on that at all.
At the undergraduate level, I tend to spend a big chunk of time on the back-and-forth construction for DLO, and then universality, such as building universal partial orders, and I always look quite a bit at embedding into the lattice P(N). For example, can you order-embed the real line into P(N) with subset?
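(The usual idea, I assume: fix an enumeration of the rationals and send $r$ to the set of indices of rationals below $r$; then $r<s$ gives a proper inclusion because some rational separates them. A finite Python stand-in of my own choosing, just to illustrate the construction:)

```python
from fractions import Fraction

# In the real construction one fixes a bijection between N and Q;
# a finite batch of rationals suffices for a demonstration.
RATIONALS = sorted({Fraction(p, q) for q in range(1, 25)
                    for p in range(-50, 51)})

def embed(r):
    """Send a real r to the set of indices of enumerated rationals below r."""
    return frozenset(i for i, q in enumerate(RATIONALS) if q < r)
```

If $r<s$ then every rational below $r$ is below $s$, and any enumerated rational in $[r,s)$ witnesses that the inclusion is proper.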
]]>(Mike- I cleaned up the link.)
]]>The argument above gives a counterexample assuming the existence of some statement $\phi$ such that $\phi \lor \lnot\phi$ is not true. In that case, the set $\{p,\{0\}\}$ is not decidable since $p = \{0\}$ is equivalent to $\phi$. Of course, and this is the main point of my post, this example uses extensionality.
]]>In general topology, there are two different kinds of topological spaces. There are the topological spaces that satisfy higher separation axioms, such as the 3-dimensional space that we live in; when most people think of general topology (especially analysts and algebraic topologists), they usually think of spaces which satisfy higher separation axioms. On the other hand, there are topological spaces which only satisfy lower separation axioms; these spaces at first glance appear very strange, since sequences can converge to multiple points. They feel much different from spaces which satisfy higher separation axioms. These spaces include the Zariski topology, finite non-discrete topologies, and the cofinite topology. Even the spaces that set theorists consider, such as the ordinal topology on a cardinal $\kappa$ or the Stone–Čech compactification $\beta\omega$, satisfy higher separation axioms; after all, $\beta\omega$ is the maximal ideal space of $\ell^{\infty}$. The general topology of lower separation axioms is a different field of mathematics than the general topology of higher separation axioms.
However, can we in good conscience formally draw the line between the lower separation axioms and the higher separation axioms or is the notion of a higher separation axiom simply an informal notion? If there is a line, then where do we draw the line between these two kinds of topological spaces?
As the sole owner of a silver badge in general topology on mathoverflow, I declare that the axiom of complete regularity is the place where we need to draw the line between the lower separation axioms and the higher separation axioms. I can also argue that complete regularity is the correct cutoff point by appealing to an authority greater than myself; the American Mathematical Society’s MSC-classification (the authority on classifying mathematics subjects) also delineates the lower separation axioms and the higher separation axioms at around complete regularity:
54D10-Lower separation axioms ($T_0$–$T_3$, etc.)
54D15-Higher separation axioms (completely regular, normal, perfectly or collectionwise normal, etc.)
Let me now give a few reasons why complete regularity is the pivotal separation axiom.
Hausdorffness is not enough. We need at least regularity.
Hausdorff spaces are appealing to mathematicians because Hausdorff spaces are precisely the spaces where each net (or filter) converges to at most one point. However, the condition that every net converges to at most one point should not be enough for a space to feel like it satisfies higher separation axioms. Not only do I usually want filters to converge to at most one point, but I also want the closures of the elements in a convergent filter to also converge. However, this condition is equivalent to regularity.
$\mathbf{Proposition}:$ Let $X$ be a Hausdorff space. Then $X$ is regular if and only if whenever $\mathcal{F}$ is a filter that converges to a point $x$, the filterbase $\{\overline{R}\mid R\in\mathcal{F}\}$ also converges to the point $x$.
The next proposition formulates regularity in terms of the convergence of nets. The intuition behind the condition in the following proposition is that for spaces that satisfy higher separation axioms, if $(x_{d})_{d\in D},(y_{d})_{d\in D}$ are nets such that $x_{d}$ and $y_{d}$ get closer and closer together as $d\rightarrow\infty$, and if $(y_{d})_{d\in D}$ converges to a point $x$, then $(x_{d})_{d\in D}$ should also converge to the same point $x$.
$\mathbf{Proposition}:$ A Hausdorff space $X$ is regular if and only if whenever $(x_{d})_{d\in D}$ is a net that does not converge to a point $x$, there are open neighborhoods $U_{d}$ of the points $x_{d}$ such that no net $(y_{d})_{d\in D}$ with $y_{d}\in U_{d}$ for each $d\in D$ converges to $x$.
$\mathbf{Proof:}$ $\rightarrow$ Suppose that $(x_{d})_{d\in D}$ does not converge to $x$. Then there is an open neighborhood $U$ of $x$ where $\{d\in D\mid x_{d}\not\in U\}$ is cofinal in $D$. By regularity, there is some open set $V$ with $x\in V\subseteq\overline{V}\subseteq U$. Let $U_{d}=(\overline{V})^{c}$ whenever $x_{d}\not\in U$ and let $U_{d}$ be an arbitrary open neighborhood of $x_{d}$ otherwise. Then whenever $y_{d}\in U_{d}$ for each $d\in D$, the set $\{d\in D\mid y_{d}\not\in V\}$ is cofinal in $D$. Therefore, $(y_{d})_{d\in D}$ does not converge to $x$ either.
$\leftarrow$ Suppose now that $X$ is not regular. Then there is an $x\in X$ and an open neighborhood $U$ of $x$ such that if $V$ is an open set with $x\in V$, then $\overline{V}\not\subseteq U$. Let $D$ be a directed set and let $U_{d}$ be an open neighborhood of $x$ for each $d\in D$ such that for all open neighborhoods $V$ of $x$ there is a $d\in D$ so that if $e\geq d$, then $U_{e}\subseteq V$ (for instance, let $D$ be the collection of open neighborhoods of $x$ ordered by reverse inclusion and let $U_{V}=V$). Then let $x_{d}\in\overline{U_{d}}\setminus U$ for all $d\in D$. Then $(x_{d})_{d\in D}$ does not converge to $x$. Now suppose that $V_{d}$ is a neighborhood of $x_{d}$ for each $d\in D$. Then for each $d\in D$, we have $V_{d}\cap U_{d}\neq\emptyset$. Therefore, let $y_{d}\in V_{d}\cap U_{d}$. Then $(y_{d})_{d\in D}$ does converge to $x$. $\mathbf{QED}$.
Complete regularity is closed under most reasonable constructions
If there is a main separation axiom that draws the line between higher separation axioms and lower separation axioms, then this main separation axiom should be closed under constructions such as taking subspaces and taking arbitrary products. Since every completely regular space is homeomorphic to a subspace of $[0,1]^{I}$, the crossing point between lower and higher separation axioms should be no higher than complete regularity.
Not only are the completely regular spaces closed under taking products and subspaces, but the completely regular spaces are also closed under taking ultraproducts, the $P$-space coreflection, box products and other types of products, and various other constructions. Since we want our main separation axiom to be closed under most reasonable standard constructions and no lower than regularity, regularity and complete regularity are the only two candidates for our main separation axiom. We shall now find out why complete regularity is a better candidate than regularity for such a separation axiom.
Completely regular spaces can be endowed with richer structure
The completely regular spaces are precisely the spaces which can be given extra structure that one should expect to have in a topological space.
While a topological space gives one the notion of whether a point is touching a set, a proximity gives one the notion of whether two sets are touching each other. Every proximity space has an underlying topological space. Proximity spaces are defined in terms of points and sets with no mention of the real numbers, but proximity spaces are always completely regular. Furthermore, the compatible proximities on a completely regular space are in a one-to-one correspondence with the Hausdorff compactifications of the space.
$\mathbf{Theorem:}$ A topological space is completely regular if and only if it can be endowed with a compatible proximity.
The notion of a uniform space is a generalization of the notion of a metric space so that one can talk about concepts such as completeness, Cauchy nets, and uniform continuity in a more abstract setting. A uniform space gives one the notion of uniform continuity in the same way that a topological space gives one the notion of continuity. The definition of a uniform space is also very set theoretic, but it turns out that every uniform space is induced by a set of pseudometrics and hence completely regular.
$\mathbf{Theorem:}$ A topological space is completely regular if and only if it can be endowed with a compatible uniformity.
As an application, it is easy to show that every $T_{0}$-topological group can be given a compatible uniformity. Since topological groups can always be given compatible uniformities, every topological group (and hence every topological vector space) is automatically completely regular.
Complete regularity is the proper line of demarcation between low and high separation axioms since the notions of a proximity and uniformity (which capture intuitive notions related to topological spaces without referring to the real numbers) induce precisely the completely regular spaces.
The Hausdorff separation axiom generalizes poorly to point-free topology
I realize that most of my readers probably have not yet been convinced of the deeper meaning behind point-free topology, but point-free topology gives additional reasons to prefer regularity or complete regularity over Hausdorffness.
Most concepts from general topology generalize to point-free topology seamlessly, including separation axioms (regularity, complete regularity, normality), connectedness axioms (connectedness, zero-dimensionality, components), covering properties (paracompactness, compactness, local compactness, the Stone-Cech compactification), and many other properties. The fact that pretty much all concepts from general topology extend without a problem to point-free topology indicates that point-free topology is an interesting and deep subject. However, the notion of a Hausdorff space does not generalize very well from point-set topology to point-free topology. There have been a couple of attempts to generalize the notion of a Hausdorff space to point-free topology. For example, John Isbell has defined an I-Hausdorff frame to be a frame $L$ such that the diagonal mapping $D:L\rightarrow L\oplus L$ is a closed localic mapping ($\oplus$ denotes the tensor product of frames). I-Hausdorffness is a generalization of Hausdorffness since it generalizes the condition “$\{(x,x)\mid x\in X\}$ is closed”, which is equivalent to Hausdorffness. Dowker and Strauss have also proposed several generalizations of Hausdorffness. You can read more about these point-free separation axioms at Karel Ha’s Bachelor’s thesis here. These many generalizations of the Hausdorff separation axiom are not equivalent. To make matters worse, I am not satisfied with any of these generalizations of Hausdorffness to point-free topology.
It is often the case that when an idea from general topology does not extend very well to point-free topology, then that idea relies fundamentally on points. For example, the axiom $T_{0}$ is completely irrelevant to point-free topology since the axiom $T_{0}$ is a pointed concept. Similarly, the axiom $T_{1}$ is not considered for point-free topology since the notion of a $T_{1}$-space is also fundamentally a pointed notion rather than a point-free notion. For a similar reason, Hausdorffness does not extend very well to point-free topology since the definition of Hausdorffness seems to fundamentally rely on points.
Just like in point-set topology, in point-free topology there is a major difference between the spaces which do not satisfy higher separation axioms and the spaces which do satisfy higher separation axioms. The boundary between lower separation axioms and higher separation axioms in point-set topology should therefore also extend to a boundary between lower separation axioms and higher separation axioms in point-free topology. Almost all the arguments for why complete regularity is the correct boundary between lower and higher separation axioms that I gave here also hold for point-free topology. Since Hausdorffness is not very well-defined in a point-free context, one should not regard Hausdorffness as the line of demarcation between lower separation axioms and higher separation axioms in either point-free topology or point-set topology.
Conclusion
Spaces that only satisfy lower separation axioms are good too.
While completely regular spaces feel much different from spaces which are not completely regular, spaces which satisfy only lower separation axioms are very nice in their own ways. For example, spaces that are not $T_{1}$ have a close connection with ordered sets, since every $T_{0}$-space carries a partial ordering known as the specialization ordering, and this ordering is nontrivial precisely when the space is not $T_{1}$. I do not know much about algebraic geometry, but algebraic geometers will probably agree that spaces which only satisfy the lower separation axioms are important. Frames (point-free topological spaces) which only satisfy lower separation axioms are also very nice from a lattice theoretic point of view; after all, frames are precisely the complete Heyting algebras.
The underappreciation for complete regularity
The reason why Hausdorffness is often seen as a more important separation axiom than complete regularity is that Hausdorffness is easier to define than complete regularity. The definition of Hausdorffness only refers to points and sets, while complete regularity refers to points, sets, and continuous real-valued functions. Unfortunately, since the definition of complete regularity is slightly more complicated than the other separation axioms, complete regularity is not often given the credit it deserves. For example, in the hierarchy of separation axioms, complete regularity is denoted as $T_{3.5}$. It is not even assigned an integer. However, Hausdorffness is denoted as $T_{2}$, regularity is denoted as $T_{3}$, and normality is denoted as $T_{4}$. Furthermore, when people mention separation axioms, they often fail to give complete regularity adequate attention. When discussing separation axioms in detail, one should always bring up and emphasize complete regularity.
In practice, the Hausdorff spaces that people naturally come across are always completely regular. I challenge anyone to give me a Hausdorff space which occurs in nature or has interest outside of general topology which is not also completely regular. The only Hausdorff spaces which are not completely regular that I know of are counterexamples in general topology and nothing more. Since all Hausdorff spaces found in nature are completely regular, complete regularity should be given more consideration than it is currently given.
]]>For one reason or another, I didn’t post it back then. But now I was thinking about something related, that didn’t come into a full post of its own, where syntax and semantics were also separate, and I ran into this comic. So I figured, eh, why not post this one.
]]>Classical Laver tables computation
Recall that the classical Laver table is the unique algebra $A_{n}=(\{1,…,2^{n}\},*_{n})$ such that $x*_{n}(y*_{n}z)=(x*_{n}y)*_{n}(x*_{n}z)$, $x*_{n}1=x+1$ for $x<2^{n}$, and $2^{n}*_{n}1=1$. The operation $*_{n}$ is known as the application operation. Even though the definition of the classical Laver tables is quite simple, the classical Laver tables are combinatorially very complicated structures, and there does not appear to be an efficient algorithm for computing the classical Laver tables like there is for ordinary multiplication.
If $x,r$ are positive integers, then let $(x)_{r}$ denote the unique integer in $\{1,…,r\}$ such that $x=(x)_{r}\,(\text{mod}\,r)$. The mappings $\phi:A_{n+1}\rightarrow A_{n},\iota:A_{n}\rightarrow A_{n+1}$ defined by $\phi(x)=(x)_{\text{Exp}(2,n)}$ and $\iota(x)=x+2^{n}$ are homomorphisms between self-distributive algebras. If one has an efficient algorithm for computing $A_{n+1}$, then these homomorphisms allow one to compute $A_{n}$ efficiently as well. Therefore, the problem of computing $A_{n}$ gets more difficult as $n$ gets larger.
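As a quick sanity check, one can verify the homomorphism property of $\phi$ and $\iota$ on the smallest tables by hand or in code. The following Python sketch (the post's own code is in GAP; the names A1, A2, iota, phi are mine) hard-codes the standard tables $A_{1}$ and $A_{2}$ and checks both maps:

```python
# The tables A_1 and A_2 written out as dictionaries: T[(x, y)] = x *_n y.
A1 = {(1, 1): 2, (1, 2): 2, (2, 1): 1, (2, 2): 2}
A2 = {(1, 1): 2, (1, 2): 4, (1, 3): 2, (1, 4): 4,
      (2, 1): 3, (2, 2): 4, (2, 3): 3, (2, 4): 4,
      (3, 1): 4, (3, 2): 4, (3, 3): 4, (3, 4): 4,
      (4, 1): 1, (4, 2): 2, (4, 3): 3, (4, 4): 4}

iota = lambda x: x + 2              # iota : A_1 -> A_2, iota(x) = x + 2^1
phi = lambda x: ((x - 1) % 2) + 1   # phi : A_2 -> A_1, phi(x) = (x)_2

# Both maps are homomorphisms of self-distributive algebras:
assert all(iota(A1[(x, y)]) == A2[(iota(x), iota(y))]
           for x in (1, 2) for y in (1, 2))
assert all(phi(A2[(x, y)]) == A1[(phi(x), phi(y))]
           for x in range(1, 5) for y in range(1, 5))
```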
Randall Dougherty was able to write an algorithm that computes the application operation in $A_{48}$ around 1995. This algorithm is outlined in his paper [1] and will be outlined in this post as well. So far, no one has written an algorithm that improves upon Dougherty’s, nor has anyone been able to compute even $A_{49}$. However, with enough effort, it may be possible to compute in $A_{n}$ for $n\leq 96$ with today’s computational resources, but I do not think that anyone is willing to exert the effort to compute $A_{96}$ at this moment since in order to compute in $A_{96}$ one needs to construct a rather large lookup table.
We shall begin by outlining three algorithms for computing in classical Laver tables, and after discussing these three algorithms for classical Laver table computation, we shall explain Dougherty’s algorithm.
The easiest algorithm to write for computing the classical Laver tables is the one that directly uses the definition of a classical Laver table. In other words, in this algorithm, we evaluate $x*1$ to $(x+1)_{\text{Exp}(2,n)}$, and we evaluate $x*y$ to $(x*(y-1))*(x+1)_{\text{Exp}(2,n)}$ whenever $y>1$.
This algorithm is extremely inefficient. This algorithm works for $A_{4}$ on my computer, but I have not been able to compute in $A_{5}$ using this algorithm. Even though this algorithm is terrible for computing the application in classical Laver tables, a modification of this algorithm can be used to calculate the application operation in generalized Laver tables very efficiently.
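A Python sketch of algorithm 1 (my own translation; the memoization via lru_cache is an addition that the plain recursive algorithm does not have, and even with it the approach does not scale much past small $n$):

```python
from functools import lru_cache

def classical_laver_naive(n):
    """Algorithm 1: compute x *_n y directly from the defining recursion."""
    N = 2 ** n
    succ = lambda x: x + 1 if x < N else 1   # x * 1 = x + 1, and 2^n * 1 = 1
    @lru_cache(maxsize=None)
    def star(x, y):
        if y == 1:
            return succ(x)
        # x * y = (x * (y-1)) * (x * 1)
        return star(star(x, y - 1), succ(x))
    return star

star = classical_laver_naive(2)
# Row 1 of A_2 is [2, 4, 2, 4]:
assert [star(1, y) for y in (1, 2, 3, 4)] == [2, 4, 2, 4]
```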
In this algorithm, we fill up the entire multiplication table for $A_{n}$ by computing $x*y$ by a double induction which is descending on $x$ and, for each $x$, ascending on $y$. Here is the code for constructing the multiplication table for algorithm 2 in GAP (the multiplication table for $A_{n}$ is implemented in GAP as a list of lists).
table:=[]; table[2^n]:=[1..2^n];
for i in Reversed([1..2^n-1]) do table[i]:=[i+1];
for j in [2..2^n] do table[i][j]:=table[table[i][j-1]][i+1]; od; od;
I have been able to calculate $A_{13}$ using this algorithm before running out of memory.
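A line-by-line Python translation of the GAP snippet above may make the double induction easier to follow (a dummy index 0 keeps the table 1-based like the GAP lists; the function name is mine):

```python
def laver_table(n):
    # Algorithm 2: fill the whole 2^n-by-2^n table by a double induction,
    # descending on x and, for each x, ascending on y.
    N = 2 ** n
    table = [None] * (N + 1)                # index 0 unused; rows are 1-based
    table[N] = [0] + list(range(1, N + 1))  # row 2^n: 2^n * y = y
    for x in range(N - 1, 0, -1):
        row = [0, x + 1]                    # x * 1 = x + 1
        for y in range(2, N + 1):
            row.append(table[row[y - 1]][x + 1])  # x*y = (x*(y-1)) * (x+1)
        table[x] = row
    return table

t = laver_table(4)
# Row 2 of A_4 begins 3, 12, 15, 16 (and then repeats with period 4):
assert t[2][1:5] == [3, 12, 15, 16]
```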
The difference between algorithm 1 and algorithm 2 is analogous to the difference between two algorithms for computing the Fibonacci numbers. The following GAP program fibslow for computing the Fibonacci numbers is analogous to algorithm 1.
fibslow:=function(x) if x=1 or x=2 then return 1; else return fibslow(x-1)+fibslow(x-2); fi; end;
This program takes about fibslow(x) many steps to compute fibslow(x), which is very inefficient. Algorithm 1 is inefficient for similar reasons. However, by computing the Fibonacci numbers in sequence and storing them in memory as a list, one obtains the following much more efficient program fibfast for computing the Fibonacci numbers.
fibfast:=function(x) local list,i;
if x<3 then return 1;
else list:=[1,1]; for i in [3..x] do list[i]:=list[i-1]+list[i-2]; od; return list[x]; fi; end;
One can calculate the classical Laver tables much more quickly using algorithm 2 instead of algorithm 1, for reasons similar to why the Fibonacci numbers are more easily computed using fibfast than fibslow.
One of the first things that one notices when one observes the classical Laver tables is that the rows in the classical Laver tables are periodic, and this periodicity allows the classical Laver tables to be more quickly computed. Click here for the full multiplication tables for the Laver tables up to $A_{5}$.
Algorithm 3 is similar to algorithm 2 except that one computes only one period for each row. For example, instead of computing the entire 2nd row $[3,12,15,16,3,12,15,16,3,12,15,16,3,12,15,16]$ in $A_{4}$, in algorithm 3 one simply computes $[3,12,15,16]$ once and observes that $2*_{4}x=2*_{4}(x)_{4}$ whenever $1\leq x\leq 16$.
Using this algorithm, I am able to calculate up to $A_{22}$ on a modern computer before running out of memory. If one compresses the data computed by this algorithm, then one should be able to calculate up to $A_{27}$ or $A_{28}$ before running out of memory.
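A Python sketch of algorithm 3 (names mine): each row is generated entry by entry only until it first reaches $2^{n}$, after which it repeats, since $2^{n}*(x+1)=x+1$ restarts the row.

```python
def laver_rows(n):
    # Algorithm 3 (sketch): store only one period of each row.
    N = 2 ** n
    rows = {N: list(range(1, N + 1))}   # row 2^n is the identity row
    # Read x * y out of the stored period of row x:
    entry = lambda x, y: rows[x][(y - 1) % len(rows[x])]
    for x in range(N - 1, 0, -1):
        row = [x + 1]                   # x * 1 = x + 1
        while row[-1] != N:
            row.append(entry(row[-1], x + 1))  # x*(y+1) = (x*y) * (x+1)
        rows[x] = row
    return rows

rows = laver_rows(4)
assert rows[2] == [3, 12, 15, 16]
# The period lengths are all powers of 2:
assert all(len(r) & (len(r) - 1) == 0 for r in rows.values())
```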
The lengths of the periods of the rows in the classical Laver tables are all powers of 2, and these periods are usually quite small. Let $o_{n}(x)$ denote the least natural number such that $x*_{n}2^{o_{n}(x)}=2^{n}$. The motivation behind $o_{n}(x)$ lies in the fact that $x*_{n}y=x*_{n}z$ iff $y=z\,(\text{mod}\,\text{Exp}(2,o_{n}(x)))$, so $2^{o_{n}(x)}$ is the period of the $x$-th row in $A_{n}$. The maximum value of $o_{n}(x)$ is $n$, but in general $o_{n}(x)$ is usually quite small. We have $E(o_{10}(x))=2.943$, $E(o_{20}(x))=3.042$, and $E(o_{48}(x))=3.038$ (here $E$ denotes the expected value; $E(o_{48}(x))$ has been estimated from a random sample from $A_{48}$ of size 1000000). Therefore, since $o_{n}(x)$ is usually small, one can calculate and store the $x$-th row in memory without using too much space or time even without compressing the computed data.
Dougherty’s algorithm for computing in the classical Laver tables is based on the following lemmas.
The following result by Dougherty gives examples for when the premise of the above Lemma holds.
More generally Dougherty has proven the following result.
Now assume that $t=2^{r}$, $n\leq 3t$, and that $x*_{n}y$ has already been computed whenever $x\leq 2^{t},y\leq 2^{n}$. Then we may compute $x*_{n}y$ for any $x,y\in A_{n}$ by using the following algorithm:
Using the above algorithm and the precomputed values $x*_{48}y$ for $x\leq 2^{16}$, I have been able to compute 1,200,000 random instances of $x*_{48}y$ in a minute on my computer. One could also easily compute in $A_{48}$ by hand using this algorithm with only the precomputed values $x*_{48}y$ where $x\leq 2^{16}$ for reference (this reference can fit into a book).
Dougherty’s algorithm for computing the classical Laver tables has been implemented here.
In order for Dougherty’s algorithm to work for $A_{n}$, we must first compute the $x$-th row in $A_{n}$ for $x\leq 2^{t}$. However, one can compute this data by induction on $n$. In particular, if $n<3t$ and one has an algorithm for computing $A_{n}$, then one can use Dougherty's algorithm along with a suitable version of algorithm 3 to compute the $x$-th row in $A_{n+1}$ for $x\leq 2^{t}$. I was able to compute from scratch $x*_{48}y$ for $x\leq 2^{16}$ in 757 seconds.
Generalized Laver tables computation
Let $n$ be a natural number and let $A$ be a set. Let $A^{+}$ be the set of all non-empty strings over the alphabet $A$. Then let $(A^{\leq 2^{n}})^{+}=\{\mathbf{x}\in A^{+}:|\mathbf{x}|\leq 2^{n}\}$. Then there is a unique self-distributive operation $*$ on $(A^{\leq 2^{n}})^{+}$ such that $\mathbf{x}*a=\mathbf{x}a$ whenever $|\mathbf{x}|<2^{n},a\in A$ and $\mathbf{x}*\mathbf{y}=\mathbf{y}$ whenever $|\mathbf{x}|=2^{n}$. The algebra $((A^{\leq 2^{n}})^{+},*)$ is an example of a generalized Laver table. If $|A|=1$, then $(A^{\leq 2^{n}})^{+}$ is isomorphic to $A_{n}$. If $|A|>1$, then the algebra $(A^{\leq 2^{n}})^{+}$ is quite large since $|(A^{\leq 2^{n}})^{+}|=|A|\cdot\frac{|A|^{\text{Exp}(2,n)}-1}{|A|-1}$.
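The defining equations determine the operation on all strings through the recursion $\mathbf{x}*(\mathbf{w}a)=(\mathbf{x}*\mathbf{w})*(\mathbf{x}a)$ when $|\mathbf{x}|<2^{n}$, which follows from self-distributivity since $\mathbf{w}a=\mathbf{w}*a$. Here is a memoized Python sketch of this naive recursion (function name mine; for these finite quotients the recursion terminates, mirroring the classical case):

```python
from functools import lru_cache

def string_laver(n):
    # The operation on (A^{<= 2^n})^+ :
    #   x * y = y                   when |x| = 2^n
    #   x * a = xa                  when |x| < 2^n and a is a single letter
    #   x * (w a) = (x * w) * (x a) otherwise (from self-distributivity)
    cap = 2 ** n
    @lru_cache(maxsize=None)
    def star(x, y):
        if len(x) == cap:
            return y
        if len(y) == 1:
            return x + y
        return star(star(x, y[:-1]), x + y[-1])
    return star

s = string_laver(1)
assert s('a', 'a') == 'aa' and s('aa', 'a') == 'a' and s('a', 'ab') == 'ab'
# With a one-letter alphabet the algebra is A_n: under a <-> 1, aa <-> 2, ...,
# the line below is the string version of 1 *_2 2 = 4 in A_2.
assert string_laver(2)('a', 'aa') == 'aaaa'
```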
Let $\mathbf{x}[n]$ denote the $n$-th letter in the string $\mathbf{x}$ (we start off with the first letter instead of the zero-th letter). For example, $\text{martha}[5]=\text{h}$.
When computing the application operation in $(A^{\leq 2^{n}})^{+}$, we may want to compute the entire string $\mathbf{x}*\mathbf{y}$ or we may want to compute a particular position $(\mathbf{x}*\mathbf{y})[\ell]$. These two problems are separate since it is much easier to compute an individual position $(\mathbf{x}*\mathbf{y})[\ell]$ than it is to compute the entire string $\mathbf{x}*\mathbf{y}$, but computing $\mathbf{x}*\mathbf{y}$ by calculating each $(\mathbf{x}*\mathbf{y})[\ell]$ individually is quite inefficient.
We shall present several algorithms for computing generalized Laver tables starting with the most inefficient algorithm and then moving on to the better algorithms. These algorithms will all assume that one already has an efficient algorithm for computing the application operation in the classical Laver tables.
If $x,y\in\{1,…,2^{n}\}$, then let $FM_{n}^{+}(x,y)$ denote the integer such that in $(A^{\leq 2^{n}})^{+}$ if $FM_{n}^{+}(x,y)>0$ then $(a_{1}…a_{x}*b_{1}…b_{2^{n}})[y]=b_{FM_{n}^{+}(x,y)}$ and if $FM_{n}^{+}(x,y)<0$ then $(a_{1}...a_{x}*b_{1}...b_{2^{n}})[y]=a_{-FM_{n}^{+}(x,y)}$. If one has an algorithm for computing $FM_{n}^{+}(x,y)$, then one can compute the application operation simply by referring to $FM_{n}^{+}(x,y)$. The function $FM_{n}^{+}$ can be computed using the same idea which we used in algorithm 2 to calculate the classical Laver tables. In particular, in this algorithm, we compute $FM_{n}^{+}(x,y)$ by a double induction on $x,y$ which is descending on $x$ and for each $x$ the induction is ascending on $y$. I was able to calculate up to $FM_{13}^{+}$ using this algorithm. Using the Sierpinski triangle fractal structure of the final matrix, I could probably compute up to $FM_{17}^{+}$ or $FM_{18}^{+}$ before running out of memory.
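One can compute a row of $FM_{n}^{+}$ for small $n$ by applying the string operation to tuples of distinguishable symbols and reading off where each symbol came from. A Python sketch (helper names mine; the values below are computed from the defining recursion, not taken from the post):

```python
from functools import lru_cache

def fm_row(n, x):
    # FM_n^+(x, y) for y = 1,...,2^n, computed by applying the operation to
    # tuples of tagged symbols ('a', i) and ('b', j).
    cap = 2 ** n
    @lru_cache(maxsize=None)
    def star(u, v):                 # the string operation, on tuples
        if len(u) == cap:
            return v
        if len(v) == 1:
            return u + v
        return star(star(u, v[:-1]), u + v[-1:])
    a = tuple(('a', i) for i in range(1, x + 1))
    b = tuple(('b', j) for j in range(1, cap + 1))
    return [j if tag == 'b' else -j for (tag, j) in star(a, b)]

# For n = 1: a_1 * b_1 b_2 = a_1 b_2, while a_1 a_2 * b_1 b_2 = b_1 b_2.
assert fm_row(1, 1) == [-1, 2]
assert fm_row(1, 2) == [1, 2]
```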
Algorithm B for computing in the generalized Laver tables is a modified version of algorithm 1. Counterintuitively, even though algorithm 1 is very inefficient for calculating in the classical Laver tables, algorithm B is very efficient for computing the application operation in generalized Laver tables.
If $\mathbf{x}$ is a string, then let $|\mathbf{x}|$ denote the length of the string $\mathbf{x}$ (for example, $|\text{julia}|=5$). If $\mathbf{x}$ is a non-empty string and $n$ a natural number, then let $(\mathbf{x})_{n}$ denote the string where we remove the first $|\mathbf{x}|-(|\mathbf{x}|)_{n}$ elements of $\mathbf{x}$ and keep the last $(|\mathbf{x}|)_{n}$ elements in the string $\mathbf{x}$. For example, $(\text{elizabeth})_{5}=\text{beth}$ and $(\text{solianna})_{4}=\text{anna}$.
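The truncation $(\mathbf{x})_{n}$ is a one-liner; a Python sketch (function name mine) reproducing the two examples above:

```python
def tail(x, n):
    # (x)_n : keep the last (|x|)_n letters of x, where (m)_n is the
    # representative of m mod n lying in {1, ..., n}.
    k = ((len(x) - 1) % n) + 1
    return x[-k:]

assert tail('elizabeth', 5) == 'beth'
assert tail('solianna', 4) == 'anna'
```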
One calculates $\mathbf{x}*\mathbf{y}$ in $(A^{\leq 2^{n}})^{+}$ using the following procedure:
It is not feasible to compute the entire string $\mathbf{x}*\mathbf{y}$ in $(A^{\leq 2^{n}})^{+}$ when $n$ is much larger than 20 due to the size of the outputs. Nevertheless, it is very feasible to compute the symbol $(\mathbf{x}*\mathbf{y})[\ell]$ in $(A^{\leq 2^{n}})^{+}$ whenever $n\leq 48$ by using a suitable modification of algorithm B. I have been able to compute on my computer, using this version of algorithm B, on average about 3000 random values of $(\mathbf{x}*\mathbf{y})[\ell]$ in $(A^{\leq 2^{48}})^{+}$ in a minute. To put this in perspective, it took me on average about 400 times as long to compute a random instance of $(\mathbf{x}*\mathbf{y})[\ell]$ in $(A^{\leq 2^{48}})^{+}$ as it takes to compute a random instance of $x*y$ in $A_{48}$. I have also been able to compute on average 1500 values of $(\mathbf{x}*\mathbf{y})[\ell]$ in $(A^{\leq 2^{48}})^{+}$ per minute where the $\mathbf{x},\mathbf{y},\ell$ are chosen to make the calculation $(\mathbf{x}*\mathbf{y})[\ell]$ more difficult. Therefore, when $\mathbf{x},\mathbf{y},\ell$ are chosen to make the calculation $(\mathbf{x}*\mathbf{y})[\ell]$ more difficult, it takes approximately 800 times as long to calculate $(\mathbf{x}*\mathbf{y})[\ell]$ as it takes to calculate an entry in $A_{48}$. This is not bad for calculating $\mathbf{x}*\mathbf{y}$ to an arbitrary precision in an algebra of cardinality about $10^{10^{13.928}}$ when $|A|=2$.
Algorithm C is a modification of algorithm B that uses the same ideas as Dougherty’s method for computing in the classical Laver tables in order to compute in the generalized Laver tables $(A^{\leq 2^{n}})^{+}$ more efficiently.
More generally, we have the following result.
Furthermore, suppose that
$$\langle x_{1,1},…,x_{1,Exp(2,t)}\rangle…\langle x_{u,1},…,x_{u,Exp(2,t)}\rangle*_{n-t}
\langle y_{1,1},…,y_{1,Exp(2,t)}\rangle…\langle y_{v-1,1},…,y_{v-1,Exp(2,t)}\rangle\langle y_{v,1},…,y_{v,w}\rangle$$
$$=\langle z_{1,1},…,z_{1,Exp(2,t)}\rangle…\langle z_{p-1,1},…,z_{p-1,Exp(2,t)}\rangle\langle z_{p,1},…,z_{p,w}\rangle.$$
Then $\mathbf{x}*_{n}\mathbf{y}=(z_{1,1}…z_{1,Exp(2,t)})…(z_{p-1,1}…z_{p-1,Exp(2,t)})(z_{p,1}…z_{p,w}).$
One can compute $\mathbf{x}*_{n}\mathbf{y}$ recursively with the following algorithm:
$\langle x_{1,1},…,x_{1,2^{t}}\rangle…\langle x_{u,1},…,x_{u,2^{t}}\rangle*_{n-t}
\langle y_{1,1},…,y_{1,2^{t}}\rangle…\langle y_{v-1,1},…,y_{v-1,2^{t}}\rangle\langle y_{v,1},…,y_{v,w}\rangle$
$=\langle z_{1,1},…,z_{1,2^{t}}\rangle…\langle z_{p-1,1},…,z_{p-1,2^{t}}\rangle\langle z_{p,1},…,z_{p,w}\rangle$.
Then evaluate $\mathbf{x}*_{n}\mathbf{y}$ to $z_{1,1}…z_{1,2^{t}}…z_{p-1,1}…z_{p-1,2^{t}}z_{p,1}…z_{p,w}$.
As with algorithms A and B, there is a local version of algorithm C that quickly computes the particular letter $(\mathbf{x}*\mathbf{y})[\ell]$. Both the local and the global versions of algorithm C are about 7 or so times faster than the corresponding versions of algorithm B. Algorithm C for computing generalized Laver tables has been implemented online here.
Conclusion
The simple fact that calculating $(\mathbf{x}*\mathbf{y})[\ell]$ is apparently hundreds of times more difficult than calculating in $A_{48}$ is rather encouraging, since this difficulty in calculation suggests that the generalized Laver tables have combinatorial intricacies that go far beyond the classical Laver tables. These combinatorial intricacies can be seen in the data computed from the generalized Laver tables.
Much knowledge and understanding can be gleaned from simply observing computer calculations. Dougherty’s result which allowed one to compute $A_{48}$ in the first place was proven only because computer calculations allowed Dougherty to make the correct conjectures which were necessary to obtain the results. Most of my understanding of the generalized Laver tables $(A^{\leq 2^{n}})^{+}$ has not come from sitting down and proving theorems, but from observing the data computed from the generalizations of Laver tables. There are many patterns within the generalized Laver tables that can be discovered through computer calculations.
While the problems of computing the generalized Laver tables have been solved to my satisfaction, there are many things about generalizations of Laver tables which I would like to compute but for which I have not obtained an efficient algorithm for computing. I am currently working on computing the fundamental operations in endomorphic Laver tables and I will probably make a post about endomorphic Laver table computation soon.
All of the algorithms mentioned here have been implemented by me in GAP, and they are available online at boolesrings.org/jvanname/lavertables.
I must mention that this project on the generalizations of Laver tables is the first mathematical project I have done which makes use of computer calculations.
[1] Randall Dougherty. Critical points in an algebra of elementary embeddings, II. 1995.
]]>Classical Laver tables
The $n$-th classical Laver table is the unique algebra $A_{n}=(\{1,…,2^{n}\},*)$ where
Let $\mathcal{E}_{\lambda}$ denote the set of all elementary embeddings $j:V_{\lambda}\rightarrow V_{\lambda}$.
Suppose that $j\in\mathcal{E}_{\lambda}$ is a rank-into-rank embedding and $\gamma<\lambda$. Then there is some $n$ where $\langle j\rangle/\equiv^{\gamma}$ is isomorphic to $A_{n}$.
Generalizations of Laver tables
The following list of algebras lists out all the generalizations of the notion of a classical Laver table which I have investigated.
Classical Laver tables-These algebras have one generator and one binary operation. The braid group acts on these algebras.
Generalized Laver tables-These algebras have multiple generators but one binary operation. These algebras are always locally finite.
Endomorphic Laver tables-These algebras can have possibly multiple generators, possibly multiple operations, and the operations can have arbitrary arity. Endomorphic Laver tables are usually infinite. Endomorphic Laver tables seem to be very difficult to compute in part due to the immense size of the output. There are generalizations of the notion of a braid group (positive braid monoid) that act on the endomorphic Laver tables. For instance, suppose that $G_{n}$ is the group presented by $\{\sigma_{i}\mid 0\leq i<n\}$ with relations $\sigma_{i}\sigma_{i+1}\sigma_{i+2}\sigma_{i}=\sigma_{i+2}\sigma_{i}\sigma_{i+1}\sigma_{i+2}$ and
$\sigma_{i}\sigma_{j}=\sigma_{j}\sigma_{i}$ whenever $|i-j|>2$. Let $G_{n}^{+}$ be the monoid presented by the same generators and relations. Then $G_{n}^{+}$ acts on $X^{n-2}$. Furthermore, the algebra $G_{\omega}$ can be given a ternary self-distributive operation which I conjecture gives rise to free ternary self-distributive algebras.
Partially endomorphic Laver tables-These algebras have multiple generators, multiple operations, the operations can have arbitrary arity, only some of the operations distribute with each other. These algebras are always infinite.
Twistedly endomorphic Laver tables-These algebras have multiple generators, multiple operations, operations can have arbitrary arity, these algebras satisfy the twisted self-distributivity laws such as $t(a,b,t(x,y,z))=t(t(a,b,x),t(b,a,y),t(a,b,z))$. Any semigroup can be used to generate more complicated twisted self-distributivity identities. Twistedly endomorphic Laver tables do not seem to arise in any way from algebras of elementary embeddings.
So far I am the only researcher working on the generalizations of Laver tables, although around last July another researcher (a knot theorist) expressed interest and has some ideas for research on generalized Laver tables.
Permutative LD-systems
Suppose that $*$ is a binary function symbol. Then define the Fibonacci terms $t_{n}(x,y)$ by the following rules: $t_{1}(x,y)=y$, $t_{2}(x,y)=x$, and $t_{n+2}(x,y)=t_{n+1}(x,y)*t_{n}(x,y)$.
For example, if $x=y=1$ and $*$ is addition $+$, then $t_{n}(x,y)$ is simply the $n$-th Fibonacci number.
The first few Fibonacci terms are $y$, $x$, $x*y$, $(x*y)*x$, $((x*y)*x)*(x*y),\ldots$
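Since the Fibonacci terms are given by a purely equational recursion ($t_{1}(x,y)=y$, $t_{2}(x,y)=x$, $t_{n+2}=t_{n+1}*t_{n}$), they are easy to compute for any binary operation. Here is a small Python sketch (the helper name `fib_term` is mine, not from the paper):

```python
def fib_term(n, x, y, star):
    """Compute the n-th Fibonacci term t_n(x, y) over the binary
    operation `star`, using t_1 = y, t_2 = x, t_{k+2} = t_{k+1} * t_k."""
    terms = [None, y, x]  # 1-indexed: terms[1] = y, terms[2] = x
    while len(terms) <= n:
        terms.append(star(terms[-1], terms[-2]))
    return terms[n]

# With * taken to be addition and x = y = 1, t_n is the n-th Fibonacci number:
print([fib_term(n, 1, 1, lambda a, b: a + b) for n in range(1, 9)])
# → [1, 1, 2, 3, 5, 8, 13, 21]
```

Passing a different `star` (for example the operation of a Laver table) yields the Fibonacci terms of any LD-system.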
The algebra of elementary embeddings $(\mathcal{E}_{\lambda},*,\circ)$ satisfies the braid identity $j\circ k=(j*k)\circ j$. Furthermore, by applying the braid identity multiple times, we obtain $j\circ k=(j*k)\circ j=((j*k)*j)\circ(j*k)=\cdots=t_{n+1}(j,k)\circ t_{n}(j,k).$
Now suppose that $(\kappa_{n})_{n\in\omega}$ is a cofinal increasing sequence in $\lambda$. Then define a metric $d$ on $\mathcal{E}_{\lambda}$ by letting $d(j,k)=\frac{1}{n}$ whenever $j\neq k$ where $n$ is the least natural number with $j|_{V_{\kappa_{n}}}\neq k|_{V_{\kappa_{n}}}$ and where $d(j,j)=0$.
$\mathbf{Proposition:}$ $(\mathcal{E}_{\lambda},d)$ is a complete metric space without isolated points.
$\textbf{Corollary:}$ $|\mathcal{E}_{\lambda}|\geq 2^{\aleph_{0}}$.
$\textbf{Proof:}$ This follows from the fact that every complete metric space without any isolated points has cardinality at least continuum.
$\textbf{Proposition:}$ Let $j,k\in\mathcal{E}_{\lambda}$. Then
- if $\textrm{crit}(j)>\textrm{crit}(k)$, then
  - $t_{2n+1}(j,k)\rightarrow j\circ k$
  - $t_{2n}(j,k)\rightarrow Id_{V_{\lambda}}$
- if $\textrm{crit}(j)\leq\textrm{crit}(k)$, then
  - $t_{2n}(j,k)\rightarrow j\circ k$
  - $t_{2n+1}(j,k)\rightarrow Id_{V_{\lambda}}$.
An LD-system is an algebra $(X,*)$ that satisfies the identity $x*(y*z)=(x*y)*(x*z)$.
Let $X$ be an LD-system. An element $x\in X$ is said to be a left-identity if $x*y=y$ for each $y\in X$. Let $\textrm{Li}(X)$ denote the set of all left-identities of $X.$ A subset $L\subseteq X$ is said to be a left-ideal if $y\in L$ implies that $x*y\in L$ as well.
An LD-system $X$ is said to be permutative if $\textrm{Li}(X)$ is a left-ideal of $X$ and for all $x,y\in X$ there is some $n$ with $t_{n}(x,y)\in \textrm{Li}(X)$.
The motivation behind the notion of a permutative LD-system is that the permutative LD-systems capture the notion that the Fibonacci terms converge to the identity without any reference to any topology.
A permutative LD-system $X$ is said to be reduced if $|\textrm{Li}(X)|=1$.
Example: $\mathcal{E}_{\lambda}/\equiv^{\gamma}$ is a reduced permutative LD-system.
Proposition: Let $X$ be a permutative LD-system. Let $\simeq$ be the equivalence relation on $X$ where $x\simeq y$ iff $x=y$ or if $x,y\in \textrm{Li}(X)$. Then $\simeq$ is a congruence on $X$ and $X/\simeq$ is a reduced permutative LD-system.
An LD-monoid is an algebra $(X,*,\circ,1)$ where $(X,\circ,1)$ is a monoid and where $x*(y\circ z)=(x*y)\circ(x*z)$, $(x\circ y)*z=x*(y*z)$, $x\circ y=(x*y)\circ x$, $1*x=x$, and $x*1=1$ for all $x,y,z\in X$.
Example: $(\mathcal{E}_{\lambda},*,\circ,1)$ is an LD-monoid.
Theorem: Suppose that $(X,*)$ is a reduced permutative LD-system with left-identity $1$. Then define an operation $\circ$ on $X$ by $x\circ y=t_{n+1}(x,y)$, where $n$ is the least natural number with $t_{n}(x,y)=1$. Then $(X,*,\circ,1)$ is an LD-monoid.
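This recipe for recovering $\circ$ from $*$ can be tried out on the classical Laver table $A_{2}$, whose four elements have $4$ as the unique left identity. The Python sketch below (function names are my own) computes $*$ by the standard double recursion for classical Laver tables and then reads off $\circ$ from the Fibonacci terms:

```python
from functools import lru_cache

N = 4  # A_2 has elements 1, ..., 4; its unique left identity is 4

@lru_cache(maxsize=None)
def star(a, b):
    """The classical Laver table operation on A_2, via the standard
    recursion: N * b = b, a * 1 = a + 1, a * b = (a * (b-1)) * (a * 1)."""
    if a == N:
        return b
    if b == 1:
        return a + 1
    return star(star(a, b - 1), a + 1)

def compose(x, y):
    """x∘y = t_{n+1}(x, y) where n is least with t_n(x, y) = N."""
    prev, cur = y, x  # (t_1, t_2)
    while prev != N:
        prev, cur = cur, star(cur, prev)
    return cur

# The resulting (A_2, *, ∘, 4) satisfies the LD-monoid law (x∘y)*z = x*(y*z):
assert all(star(compose(x, y), z) == star(x, star(y, z))
           for x in range(1, 5) for y in range(1, 5) for z in range(1, 5))
```

For instance, `compose(1, 1)` evaluates to `3`, since the Fibonacci terms of $(1,1)$ in $A_{2}$ run $1,1,2,3,4,3,\ldots$ and the term after the first $4$ is $3$.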
Critical points:
Define $x^{n}*y=x*(x*(\cdots*(x*y)))$ ($n$ copies of $x$). More formally, $x^{0}*y=y$ and $x^{n+1}*y=x*(x^{n}*y)=x^{n}*(x*y)$.
Suppose that $X$ is a permutative LD-system and $x,y\in X$. Then define $\textrm{crit}(x)\leq \textrm{crit}(y)$ iff there exists some $n$ where $x^{n}*y\in \textrm{Li}(X)$.
Theorem: Suppose that $X$ is a permutative LD-system. Then
- $\textrm{crit}(x)\leq \textrm{crit}(x)$
- $\textrm{crit}(x)\leq \textrm{crit}(y)$ and $\textrm{crit}(y)\leq \textrm{crit}(z)$ implies $\textrm{crit}(x)\leq \textrm{crit}(z)$.
- $\textrm{crit}(x)\leq \textrm{crit}(y)$ implies $\textrm{crit}(r*x)\leq \textrm{crit}(r*y)$.
- $\textrm{crit}(x)$ is maximal in $\{\textrm{crit}(y)\mid y\in X\}$ if and only if $x\in \textrm{Li}(X)$.
- $\{\textrm{crit}(x)\mid x\in X\}$ is linearly ordered.
- If $(X,*)$ is reduced, then $\textrm{crit}(x\circ y)=Min(\textrm{crit}(x),\textrm{crit}(y)).$
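The ordering of critical points can be computed concretely in the classical Laver table $A_{2}$, where $\textrm{Li}(A_{2})=\{4\}$. The Python sketch below (helper names are my own) decides $\textrm{crit}(x)\leq \textrm{crit}(y)$ by iterating left multiplication until it either reaches the left identity or cycles:

```python
from functools import lru_cache

N = 4  # A_2 = {1, 2, 3, 4}; Li(A_2) = {4}

@lru_cache(maxsize=None)
def star(a, b):
    # Classical Laver table recursion: N * b = b, a * 1 = a + 1,
    # a * b = (a * (b-1)) * (a * 1).
    if a == N:
        return b
    if b == 1:
        return a + 1
    return star(star(a, b - 1), a + 1)

def crit_le(x, y):
    """crit(x) <= crit(y) iff x^n * y lands in Li(A_2) = {4} for some n."""
    seen = set()
    while y not in seen:
        if y == N:
            return True
        seen.add(y)
        y = star(x, y)
    return False

# In A_2: crit(1) = crit(3) < crit(2) < crit(4), matching the critical
# points kappa_0 < kappa_1 of the corresponding elementary embeddings.
assert crit_le(1, 2) and not crit_le(2, 1)
assert crit_le(1, 3) and crit_le(3, 1)
```

Since $A_{2}$ is finite, the iteration $y\mapsto x*y$ must eventually cycle, so the search is guaranteed to terminate.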
Let $\textrm{crit}[X]=\{\textrm{crit}(x)\mid x\in X\}$. If $x\in X$, then define the mapping $x^{\sharp}:\textrm{crit}[X]\rightarrow \textrm{crit}[X]$ by $x^{\sharp}(\textrm{crit}(y))=\textrm{crit}(x*y)$.
The motivation behind the function $x^{\sharp}$ is the basic fact about elementary embeddings that if $j,k\in\mathcal{E}_{\lambda}$, then $j(\textrm{crit}(k))=\textrm{crit}(j*k)$.
Theorem: Suppose that $X$ is a permutative LD-system.
- $x^{\sharp}(\alpha)\geq\alpha$ for $x\in X,\alpha\in \textrm{crit}[X]$.
- $x^{\sharp}(\alpha)>\alpha$ if and only if $\alpha\geq \textrm{crit}(x)$ and $\alpha\neq Max(\textrm{crit}[X])$.
- Let $A=\{\alpha\in \textrm{crit}[X]\mid x^{\sharp}(\alpha)\neq Max(\textrm{crit}[X])\}$. Then $x^{\sharp}|_{A}$ is injective.
$\mathbf{Theorem}$ Let $X$ be a permutative LD-system and let $\simeq$ be a congruence on $X$. Then there is a partition $A,B$ of $\textrm{crit}[X]$ such that $A$ is a downwards closed subset, $B$ is an upwards closed subset, and
- whenever $\textrm{crit}(x)\in B$ there is some $y\in \textrm{Li}(X)$ with $x\simeq y$ and
- whenever $\textrm{crit}(x)\in A$ and $x\simeq y$ we have $\textrm{crit}(x)=\textrm{crit}(y)$.
Furthermore, if $X$ is reduced, then $\simeq$ is also a congruence with respect to the composition operation $\circ$.
For example, suppose that $X$ is a permutative LD-system and $\alpha\in\textrm{crit}[X]$. Then there exists some $r\in X$ with $\textrm{crit}(r)=\alpha$ and $r*r\in \textrm{Li}(X)$. We may therefore define an equivalence relation $\equiv^{\alpha}$ by letting $x\equiv^{\alpha}y$ if and only if $r*x=r*y$. Then $\equiv^{\alpha}$ is a congruence on $X$ that does not depend on the choice of $r$. In the above theorem, if $\simeq$ is the equivalence relation $\equiv^{\alpha}$, then $A=\{\beta\in\textrm{crit}[X]\mid\beta<\alpha\}$ and $B=\{\beta\in\textrm{crit}[X]\mid\beta\geq\alpha\}$.
Generalized Laver tables
Let $A$ be a set. Let $A^{+}$ denote the set of all strings over the alphabet $A$. Let $\preceq$ denote the prefix ordering on $A^{+}$. Let $L\subseteq A^{+}$ be a downwards closed subset such that $L\cap B^{+}$ is finite whenever $B\in[A]^{<\omega}$. Let $M=\{\mathbf{x}a\mid\mathbf{x}\in L,a\in A\}\cup A$. Let $F=M\setminus L$. Then there is a unique operation $*$ such that
The algebra $(M,*)$ is called a pre-generalized Laver table. If $(M,*)$ is an LD-system, then we shall call $(M,*)$ a generalized Laver table. Every generalized Laver table is a permutative LD-system.
Let $|\mathbf{x}|$ denote the length of a string $\mathbf{x}$.
If $A$ is a set, then $(A^{\leq 2^{n}})^{+}=\{\mathbf{x}\in A^{+}:|\mathbf{x}|\leq 2^{n}\}$ is a generalized Laver table. The operation $*$ on $(A^{\leq 2^{n}})^{+}$ can be computed very quickly even though
$$|(A^{\leq 2^{n}})^{+}|=|A|\cdot\frac{|A|^{2^{n}}-1}{|A|-1}.$$
The only hindrance to computing $*$ seems to be the length of the output.
$\mathbf{Theorem:}$ Suppose that $j\in\mathcal{E}_{\lambda},\alpha<\lambda$. Then $(j*j)(\alpha)\leq j(\alpha)$.
$\mathbf{Proof:}$ Let $\beta$ be the least ordinal with $j(\beta)>\alpha$. Then
$$V_{\lambda}\models\forall x<\beta,j(x)\leq\alpha,$$
so by applying elementarity, we have
$$V_{\lambda}\models\forall x < j(\beta),(j*j)(x)\leq j(\alpha).$$
Therefore, since $\alpha < j(\beta)$, we have $(j*j)(\alpha)\leq j(\alpha)$.
I am Joseph Van Name, and I have recently joined Booles’ Rings. I know several of the people here on Booles’ Rings through either mathoverflow.net or the New York City logic community. I enjoy reading the mathematical posts here on Booles’ Rings, and I am glad to be a part of this community.
I have requested to join Booles’ Rings in part due to my recent research endeavors towards understanding Laver tables. Through Booles’ Rings, I intend to post data, images, computer programs, and of course short mathematical expositions about these generalizations of the notion of a Laver table. I will also make posts about other areas of mathematics that I have researched in the past, including publications, notes, and slides for past talks. I therefore plan to have two portions of my site: one containing all the information on Laver tables one could ask for, and the other covering all my other research projects.
Hopefully, through Booles’ Rings, I will use generalizations of Laver tables to establish a much-needed common ground between set theory (in particular large cardinals) and structures such as self-distributive algebras, knots, braids, and possibly other areas. By relating large cardinals to more conventional areas of mathematics, I intend to help non-set-theorists see large cardinals not as irrelevant objects that lie high above the clouds but as objects of practical importance despite their astonishing size.
In the past, I have researched Stone duality which relates various fields of mathematics together including general and point-free topology, set theory (in particular the category of filters and Boolean-valued models), category theory, Boolean algebras, and more generally ordered sets, and a few other areas. However, since July of 2015, I have been researching generalizations of the notion of a Laver table. I would say that I have been more or less an applied set theorist (whatever that means) at least for most of the past year.
I graduated from the University of South Florida in May 2013. However, since the University of South Florida did not have a practicing logician, I was completely on my own and had to formulate and guide my research myself. Fortunately, through my work on Stone duality, I was able to appreciate different areas of mathematics and relate these diverse areas to each other. I currently reside in the New York City metropolitan area and am a part of the New York City mathematical logic community.
]]>I didn’t say above, but I think the note is a very useful reference!
]]>If I had to guess, it would be the thesis that HoTT can replace ZFC as an adequate “foundation for mathematics”?
]]>All this from one featured image, exactly! :) Just imagine what will come out while staring at the wallpaper version of the full diagram! :D
]]>It’s not clear to me why 104 and 182 are not equivalent (the cofinality of an ordinal is always regular).
There is a parenthesis mismatch in 58.
It’s not hard to show that 365 implies 170, but not vice versa.
108 seems a bit odd, since $\omega_2$ is itself never the countable union of countable sets (though it could be the countable union of countable unions of countable sets), so its power set certainly isn’t either.
And that’s all from the image appearing on this post… :)
]]>Please do comment with any observations on this featured diagram if you see something!
]]>Back to the origin, when I came back to Hanover, New Hampshire. I’ve now moved to Burlington, Vermont, which is roughly halfway between Hanover and Montréal, Québec, which is my true origin. I’m very happy to start a new position at the University of Vermont!]]>
The book by Hindman & Strauss has a section on filters and compactifications in a later chapter — that might be suitable for 2).
]]>Possibly relevant is my October 2004 sci.math post “Generalized Quantifiers” (URLs below). FYI, the Math Forum version has a lot of strange formatting errors. See also Brian Thomson’s 1985 book “Real Functions”, and see Thomson’s earlier 2-part survey “Derivation bases on the real line” (which contains examples and side-detours not in his book).
google sci.math URL:
https://groups.google.com/forum/#!msg/sci.math/rhZEhXynVLQ/MI0MJ0ZQIvoJ
Math Forum sci.math URL:
http://mathforum.org/kb/message.jspa?messageID=3556191
Your theorem is an example of an existential sentence about cardinals in the language with only \(\lt \) and exponentiation. Can you determine which sentences in that language are provable in ZF? More generally, expand the set of sentences about cardinals considered, obviously to include addition and multiplication, and perhaps alternating quantifiers.
After Peter Krautzberger’s tiny blogging challenge, I decided to spend 30 minutes thinking and writing about this…
I will allow myself some liberties in interpreting Friedman’s open ended question. The celebrated solution of Hilbert’s Tenth Problem by Davis, Matiyasevich, Putnam and Robinson shows that the existential theory of the finite cardinal numbers (even without exponentiation) is as complex as it could be: it is \(\Sigma_1\)-complete. I’ll interpret Friedman’s question as asking whether the existential theory of all cardinal numbers could be simpler than that of finite cardinals.
My gut instinct is: of course not! But this is not a trivial question. For example asking whether \(p^n + q^n = r^n\) has a solution where \(2 \lt n\) and \(0 \lt p,q,r\) has proven extremely challenging for finite cardinals but there are plenty of solutions with infinite cardinals!
One plan of attack is to try to define the finite cardinals among all cardinals using an existential formula. To get started, I’ll use a rich language including constants \(0\), \(1\), relations \(=\), \(\lt \) as well as addition, multiplication and exponentiation. Assuming the Axiom of Choice, finite cardinals are defined by the simple quantifier-free formula \[\mathfrak{p} \lt \mathfrak{p}+1.\] Without assuming choice, this only defines the Dedekind finite sets. Fortunately, if \(\mathfrak{p}\) is infinite then \(2^{2^{\mathfrak{p}}}\) is never Dedekind finite. So the finite sets are always described by the quantifier-free formula \[2^{2^{\mathfrak{p}}} \lt 2^{2^{\mathfrak{p}}} + 1.\] This shows that the existential theory of all cardinals in this rich language is as complex as that of finite cardinals.
Do things change if we restrict the language? Indeed, the first part of Friedman’s question only mentions exponentiation. For finite cardinals, we can recover addition and multiplication using familiar rules of exponents: \[\begin{aligned}
a \times b &= c &&\iff& (2^a)^b = 2^c; \\
a + b &= c &&\iff& (2^{2^a})^{2^b} = 2^{2^c}. \\
\end{aligned}\] We can even eliminate \(0\) and \(1\) since any finite base other than \(0\) and \(1\) will do instead of \(2\). So, for example, \[a \times b = c \iff \exists x,y,z(x \lt y \land y \lt z \land (z^a)^b = z^c).\] Therefore, the existential theory of finite cardinals with just exponentiation is just as complex as that with addition and multiplication.
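For finite cardinals, these exponent-rule encodings of multiplication and addition are easy to spot-check mechanically; a small Python sketch:

```python
# Verify, for small finite cardinals, the encodings
#   a*b = c  iff  (2^a)^b = 2^c
#   a+b = c  iff  (2^(2^a))^(2^b) = 2^(2^c)
for a in range(0, 6):
    for b in range(0, 6):
        assert (2**a)**b == 2**(a * b)
        assert (2**(2**a))**(2**b) == 2**(2**(a + b))
print("encodings verified for 0 <= a, b <= 5")
```

Of course this only confirms the familiar rules of exponents on a few finite values; the interesting content of the post is what happens for infinite cardinals.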
Of course, we can’t use this to recover addition and multiplication of infinite cardinals using the methods we used above, but if we can define the class of finite cardinals then we can use the above to recover addition and multiplication for these cardinals. Can we define finite cardinals using only \(=\), \(\lt \) and exponentiation? Well, the finite cardinals are the only ones that satisfy the monstrous inequality \[2^{2^{2^{2^{\mathfrak{p}}}}} \lt \left(2^{2^{2^{2^{\mathfrak{p}}}}}\right)^2.\] We can then eliminate \(2\) using the same trick as above to get an existential definition: \[\exists\mathfrak{q},\mathfrak{r},\mathfrak{s}\left(\mathfrak{q} \lt \mathfrak{r} \land \mathfrak{r} \lt \mathfrak{s} \land \mathfrak{s}^{\mathfrak{s}^{\mathfrak{s}^{\mathfrak{s}^{\mathfrak{p}}}}} \lt \left(\mathfrak{s}^{\mathfrak{s}^{\mathfrak{s}^{\mathfrak{s}^{\mathfrak{p}}}}}\right)^{\mathfrak{s}}\right).\] So my gut instinct was right: the existential theory of all cardinals using only \(=\), \(\lt \) and exponentiation is at least as complex as that of finite cardinals!
This took more than 30 minutes but it didn’t take much more and it was a lot of fun! Hopefully, the time constraint didn’t impact the content quality too much…
]]>I’m looking forward to meeting you in Munich!
]]>Regarding “Moving around every few years for postdocs would be exhausting”, I wholeheartedly agree! Every such decision has pros and cons. They are usually not obvious at the outset. Moreover, they change over time. The lesson I learned through my own course is that every decision is right, at least at the time you make it, and there is never any point regretting it later on… Just keep on doing what you do best all the time!
We went on to look at the paradoxical situations that arise when someone gets a positive test result for a rare disease. Should they be worried? If the test is 99% accurate, but the disease occurs in, say, 1 in a million people, then a positive result is not so worrisome: among a million people, there will be about 1 true positive result and about 10,000 false positives, since 1% of a million is 10,000. So the odds that you’ve actually got the disease, given that you tested positive, are about 1 in 10,000.
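The back-of-the-envelope count above is just Bayes’ theorem; a quick Python sketch with the numbers from the post (prevalence 1 in a million, test 99% accurate in both directions):

```python
prevalence = 1e-6          # P(disease)
sensitivity = 0.99         # P(positive | disease)
false_positive = 0.01      # P(positive | no disease)

p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive
print(f"P(disease | positive) ≈ 1 in {1 / p_disease_given_positive:,.0f}")
# → P(disease | positive) ≈ 1 in 10,102
```

The exact figure is about 1 in 10,102, which rounds to the “1 in 10,000” of the informal counting argument.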
But your situation is completely different! It would be as though we had calculated the odds of getting HHHTTT, and then when I actually flipped the coin on stage, I actually got the same pattern HHHTTT. Totally weird! And very unlikely. But you know, if it wasn’t that, it would have been some other totally unlikely thing, like getting all green lights, or all red lights, or getting the serial number 123456 on your receipt at Starbucks.
]]>We can start by frowning deeply at the NSA table at the JMM…
]]>The issue with Bauer’s description is the type family \(\tau.\) Formally, a type family indexed by a type \(\mathcal{U}\) is an element \(\tau:\mathcal{U}\to\mathcal{V},\) where \(\mathcal{V}\) is some larger universe. This type \(\tau\) becomes a type family only after applying Tarski-style coercion associated to \(\mathcal{V}\) that promotes elements of \(\mathcal{V}\) to actual types. On the face of it, this is circular: \(\tau\) is the Tarski-style coercion associated to the universe \(\mathcal{U},\) which is defined in terms of that of \(\mathcal{V},\) which…
One way to break this circularity is to have a designated super universe \(\mathcal{U}^\ast\) — hence the name Super HoTT — whose associated Tarski-style coercion and associated machinery are implemented in the formal system itself. In particular, \(\mathcal{U}^\ast\) is associated to a primitive decoding function \(T\) such that \(T(A)\ \mathsf{type}\) for any element \(A:\mathcal{U}^\ast.\) In Super HoTT, this is the only such decoding function. Other universes arise à la Bauer, i.e., as structures \((\mathcal{U},\tau,\pi,\sigma,\ldots)\) where now \(\tau:\mathcal{U}\to\mathcal{U}^\ast\) is a type family that realizes this universe’s Tarski-style coercion via composition with the super universe’s decoding function \(T.\) In fact, since there is only one super universe, the Russell-style approach doesn’t lead to many problems. In other words, there is little harm in suppressing the decoding function \(T\) in favor of a simple Russell-style rule \[\frac{A:\mathcal{U}^\ast}{A\ \mathsf{type}}.\] In fact, the \(\mathsf{type}\) judgment is often informally confounded with an actual universal type \(\mathsf{Type}.\) This type of identification can lead to serious issues such as Girard’s Paradox; the super universe approach is one of the many safe ways around these problems.
So far, this is pure type theory; we haven’t said anything HoTT-specific. To make this a flavor of HoTT, we postulate that there are enough univalent universes, or even lots of them. Universes à la Bauer are mere structures: they can be univalent or not, and they can have extra structure or lack some. Univalent universes can be identified among these structures, so we can formalize a rule saying that any two \(A,B:\mathcal{U}^\ast\) are contained in a univalent universe. One can also formalize rules saying that one can have cumulative univalent universes, cumulative univalent universes with propositional resizing, cumulative classical univalent universes, etc. The possibilities for specialized flavors of Super HoTT are vast and varied!
Note that the super universe doesn’t have to be univalent. It doesn’t need to have much fancy structure at all. It’s not even necessary for \(\mathcal{U}^\ast\) to have a natural number type. All of that can happen inside the univalent universes à la Bauer. In fact, a bare-bones super universe is probably a more flexible framework than a rich one. Different choices of structure for the super universe and how it interacts with the universes inside it will lead to different flavors of Super HoTT with variable strength. Richer super universe structure allows one to express stronger and stronger properties for the univalent universes it contains.
One advantage of this approach is that universes can arise "on their own" in the sense that any suitable structure \((\mathcal{U},\tau,\pi,\sigma,\ldots)\) is a universe, no matter how it comes about. Though this can be a mixed blessing if one wants more hands-on control over which types are universes. One potential drawback of Super HoTT is that not all types can belong to univalent universes. In particular, the super universe \(\mathcal{U}^\ast\) cannot belong to any univalent universe inside \(\mathcal{U}^\ast.\) In this way, Super HoTT is more like \(\mathsf{NBG}\) or \(\mathsf{MK}\) than \(\mathsf{ZF}.\) For ontological reasons, it might be preferable to have a more self-contained system analogous to \(\mathsf{ZF}.\) In any case, Super HoTT is one of the more flexible formal ways of handling universes in HoTT and it is a perfectly viable alternative to the formal system presented in the appendix of the HoTT book.^{2}
E. Palmgren, On universes in type theory. In Twenty-five years of constructive type theory (Venice, 1995), Oxford Logic Guides 36, Oxford University Press, New York, 1998, pages 191–204.
Univalent Foundations Program, Homotopy Type Theory: Univalent Foundations of Mathematics, Institute for Advanced Study, Princeton, 2013.
Also note that the issues I bring up are mostly irrelevant to conducting everyday mathematics in HoTT. While this post is largely for people interested in technical aspects of HoTT, I will nevertheless try to keep the post at a medium level of technicality since these aspects of HoTT do occasionally come up in everyday mathematics.
In section 1.3, the HoTT book introduces universes as follows:
[W]e introduce a hierarchy of universes \[\mathcal{U}_0 : \mathcal{U}_1 : \mathcal{U}_2 : \cdots\] where every universe \(\mathcal{U}_i\) is an element of the next universe \(\mathcal{U}_{i+1}\). Moreover, we assume that our universes are cumulative, that is that all the elements of the \(i\)^{th} universe are also elements of the \((i+1)\)^{st} universe, i.e. if \(A : \mathcal{U}_i\) then also \(A : \mathcal{U}_{i+1}.\)
In the comment discussion mentioned in the preamble, I expressed concerns with the nature of the subscripts in this description. As a result of this discussion, some minor clarification were made to the book. In particular, the following paragraph was added at the end of section 1.3:
As a non-example, in our version of type theory there is no type family "\(\lambda (i:\mathbb{N}).\,\mathcal{U}_i\)". Indeed, there is no universe large enough to be its codomain. Moreover, we do not even identify the indices \(i\) of the universes \(\mathcal{U}_i\) with the natural numbers \(\mathbb{N}\) of type theory (the latter to be introduced in §1.9).
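The restriction described in this quoted passage is enforced in proof assistants as well; here is an illustrative Lean 4 snippet (my own, not from the book), where universe levels form an external hierarchy that cannot be indexed by the internal natural numbers:

```lean
-- Each universe is an element of the next:
#check (Type 0 : Type 1)
#check (Type 1 : Type 2)

-- But there is no internal type family "λ (i : ℕ). Type i": the line below
-- is rejected, because a universe level is not an ordinary natural number.
-- #check fun (i : Nat) => Type i
```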
And further remarks were added in the notes to chapter 1. The purpose of this post is to explain why this matters and also to explain how fiddling with the structure of universes affects HoTT. I will focus on Classical HoTT (HoTT with the Axiom of Choice for sets and hence the Law of Excluded Middle for propositions) though comparable issues arise without these assumptions.
To illustrate the issue, let’s peek at chapter 10 and in particular at section 10.5, where the book describes the interpretation of the cumulative hierarchy of sets inside HoTT. In any universe \(\mathcal{U}_i\) one can find a type \(V_i\) that faithfully interprets the cumulative hierarchy and (since we are working in Classical HoTT) Theorem 10.5.11 shows that \(V_i\) is a model of ZFC. In particular, HoTT proves the consistency of ZFC. In fact, HoTT proves the consistency of even stronger theories! With some work, one can show that each \(V_i\) is a level of \(V_{i+1}\) whose ordinal rank \(\kappa_i\) is an inaccessible cardinal in \(V_i.\) Therefore, for each \(i,\) Classical HoTT proves the consistency of \(\mathsf{ZFC}+ \mathsf{IC}_i,\) where \(\mathsf{IC}_i\) abbreviates the statement "there are at least \(i\) inaccessible cardinals."
Does this mean that Classical HoTT proves \(\forall i\,\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_i)\)? No, it doesn’t. To understand this, remember that \(\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_i)\) is a purely arithmetical statement which basically states that there is no proof of \(0 = 1\) from the axioms of \(\mathsf{ZFC}+ \mathsf{IC}_i.\) Hence \(\forall i\,\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_i)\) is also an arithmetical statement. The only way for HoTT to make sense of this is for \(i\) to range over the type of natural numbers \(\mathbb{N}.\) This is explicitly excluded by the remark quoted above from the end of section 1.3.
The reason for this exclusion becomes more visible if we use an alternative way to handle universes, replacing the hierarchy with a rule which simply says that any two types are contained in a common universe. More precisely (but still suppressing many technical details) given any two types \(A,B\) we can introduce a universe type \(\mathcal{U}\) such that \(A, B : \mathcal{U}.\) The above argument showing that Classical HoTT proves the consistency of \(\mathsf{ZFC}+ \mathsf{IC}_i\) requires \(i\) applications of this rule in order to obtain a sufficiently long chain of universes. Since proofs are finite, a proof can only use the above universe rule finitely often. Therefore, this process does not lead to a proof of \(\forall i:\mathbb{N}\,\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_i)\) in Classical HoTT.
In order to get an actual proof of \(\forall i:\mathbb{N}\,\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_i)\) in Classical HoTT, we need to have a hierarchy of universes indexed by \(\mathbb{N}.\) More formally, we need a type family \(\lambda i:\mathbb{N}\,.\,\mathcal{U}_i\) which will allow us to use induction on \(\mathbb{N}\) to obtain an actual proof of \(\forall i:\mathbb{N}\,\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_i).\) Doing so, we inadvertently obtain more than we set out to get. The type family \(\lambda i:\mathbb{N}\,.\,\mathcal{U}_i\) must live in some universe \(\mathcal{U}\) and the cumulative hierarchy \(V\) associated with this \(\mathcal{U}\) satisfies \(\mathsf{ZFC}+ \mathsf{IC}_\omega,\) where \(\mathsf{IC}_\omega\) says that there are infinitely many inaccessible cardinals. Therefore Classical HoTT with the hypothesis of such a family \(\lambda i:\mathbb{N}\,.\,\mathcal{U}_i\) proves \(\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_\omega),\) which is much stronger than what we set out to do.
In fact, Classical HoTT with \(\lambda i:\mathbb{N}\,.\,\mathcal{U}_i\) proves \(\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_{\omega+1}),\) \(\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_{\omega+2}),\) and so on. Why stop there? What if we have a family \(\lambda i:W\,.\,\mathcal{U}_i\) for every wellordering \(W\)? Can we go beyond? These "large universe hypotheses" are direct analogues of large cardinal hypotheses from set theory. These may seem frivolous at first but such hypotheses necessarily come into play to answer concrete mathematical questions in Classical HoTT such as whether it is possible to extend Lebesgue measure to all subsets of \(\mathbb{R}.\)
Let me use this opportunity to point out a feature of the alternative way to handle universes using a rule as outlined above: the rule implies that every type belongs to some universe. The book handles this with this brief remark in section 1.3:
When we say that \(A\) is a type, we mean that it inhabits some universe \(\mathcal{U}_i.\)
This is a clever workaround that unfortunately has some side effects. There is no way to formally say "every type belongs to some \(\mathcal{U}_i\)." The universes \(\mathcal{U}_i\) are named constants, which means that there is no way to quantify over them. Thus the effect of the remark is not to prohibit other types but only to declare anything else to be an "illegitimate" type. The intent is to simplify technical issues but it makes types like \(\lambda i:\mathbb{N}\,.\,\mathcal{U}_i\) illegitimate and effectively blocks the possibility of large universe hypotheses as described above.
Mike Shulman suggested a band-aid for this problem in the discussion mentioned earlier, pointing out that there could be universes, even legitimate ones, other than the named universes \(\mathcal{U}_i.\) For example, we could introduce an infinite hierarchy of universes in \(\mathcal{U}_{17}\) if we wanted to. This is true but I do not find this fix fully satisfactory. Universes are very special types that are equipped with a lot of technical machinery to allow elements of universes to become actual types and much more. The book mostly suppresses this machinery, and rightfully so since there are many options which are all very technical and not particularly enlightening. Nevertheless, this machinery is implicitly present and it does not appear on its own. My reading of the syntactic hierarchy used in the book is that this machinery is instantiated for each named universe \(\mathcal{U}_i,\) specifically. To say that there might be other universes suggests that players don’t have all the rules of the game, which is disconcerting. My alternative proposal with a rule to introduce a universe containing any two types implicitly assumes that there is some form of additional judgment for universes (or similar device) and that this judgment instantiates all the machinery to make a universe. I find this kind of approach more satisfactory since it allows one to add hypotheses such as the existence of \(\lambda i:\mathbb{N}\,.\,\mathcal{U}_i\) without changing the rules of the game.
Let me close by mentioning that there are plenty of alternative ways to address these issues. For example, it is not necessarily desirable for every type to belong to a universe. While formalizing a proof of \(\forall i:\mathbb{N}\,\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_i)\) above, we accidentally proved a lot more because \(\lambda i:\mathbb{N}\,.\,\mathcal{U}_i\) had to live in some universe. If \(\lambda i:\mathbb{N}\,.\,\mathcal{U}_i\) doesn’t live in a universe, but induction on \(\mathbb{N}\) still works for such types, we can still prove \(\forall i:\mathbb{N}\,\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_i)\) but we can’t necessarily prove \(\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_\omega)\) as we did above. In fact, there is no need for a single solution: \(\mathsf{ZFC},\) \(\mathsf{NBG},\) \(\mathsf{MK}\) and other set theories have always coexisted pleasantly. However, there is a need to discuss these solutions and to appropriately identify them so that there is no pointless confusion when such aspects of HoTT do matter.
]]>The main issue with fields is to correctly handle the implicit negation in the term nonzero. Since the natural logic of HoTT is intuitionistic rather than classical, this is much more subtle than one would expect.
One issue with the basic commutative ring theory we developed last time is that there is a one-element ring. This is a feature of every equational theory since all identities are satisfied in the trivial algebra with one element. The problem is that equational logic doesn’t have negation so there is no way to state a nontriviality axiom like \(\mathsf{O}\not\approx\mathsf{I}\) in this context. These issues are not new and not special to HoTT but the problem is amplified since logic in HoTT may have more truth values than just true and false. Consequently, negation in HoTT always needs a little more care.
The usual way to say \(\mathsf{O}\not\approx\mathsf{I}\) is the strongest one, namely that \(\mathsf{O}=_R \mathsf{I}\) is the empty type \(\mathbf{0},\) or: \[{\small\fbox{$\phantom{b}$nontrivial$\phantom{q}$}} :\equiv(\mathsf{O}=_R \mathsf{I}) \to \mathbf{0}.\] This is stronger than just saying that \(\mathsf{O}=_R \mathsf{I}\) isn’t inhabited and it may exclude more than just the one element ring in the absence of the law of excluded middle [§3.4].
In general, we define inequality by \[(x \neq_R y) :\equiv(x =_R y) \to \mathbf{0}.\] So we can simply write \(\mathsf{O}\neq_R \mathsf{I}\) instead of \({\small\fbox{$\phantom{b}$nontrivial$\phantom{q}$}}.\) To see this in action, let’s look at nonunits. The usual definition gives \[\operatorname{\mathsf{nonunit}}(x) :\equiv\prod_{y:R} (x \cdot y \neq_R \mathsf{I}).\] The recursion and induction rules for \(\Sigma\)-types [§1.6] show that \(\operatorname{\mathsf{nonunit}}(x)\) is equivalent to the negation \(\lnot\operatorname{\mathsf{unit}}(x) :\equiv\operatorname{\mathsf{unit}}(x) \to \mathbf{0}.\)^{1} In classical mathematics, a local ring is one where the nonunits form an ideal, which is then necessarily the unique maximal ideal of the ring. Assuming \({\small\fbox{$\phantom{b}$nontrivial$\phantom{q}$}},\) we see that \(\operatorname{\mathsf{nonunit}}(\mathsf{O})\) is inhabited. It is also easy to see that \[\prod_{x,y:R} (\operatorname{\mathsf{nonunit}}(x)+\operatorname{\mathsf{nonunit}}(y) \to \operatorname{\mathsf{nonunit}}(x \cdot y))\] is always inhabited. Thus, all that is missing is the axiom: \[{\small\fbox{$\phantom{b}$local${}^-\!\!\phantom{q}$}} :\equiv\prod_{x,y:R} (\operatorname{\mathsf{nonunit}}(x)\times\operatorname{\mathsf{nonunit}}(y) \to \operatorname{\mathsf{nonunit}}(x+y)).\] Classically, fields are local rings where the maximal ideal has only one element \(\mathsf{O}.\) So a reasonable axiom for fields is \[{\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}} :\equiv\prod_{x:R} (\operatorname{\mathsf{nonunit}}(x) \to x =_R \mathsf{O}).\] (Note that \({\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}\) implies \({\small\fbox{$\phantom{b}$local${}^-\!\!\phantom{q}$}}.\)) Classically, these are indeed the fields we know and love but things break down in an intuitionistic setting. If these were fields, then we should be able to conclude that nonzero elements are units. 
However, from \({\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}\) we can only conclude that \(x \neq_R \mathsf{O}\to \lnot\operatorname{\mathsf{nonunit}}(x)\) and the conclusion there is the double negative \(\lnot\lnot\operatorname{\mathsf{unit}}(x)\) rather than the positive statement \(\operatorname{\mathsf{unit}}(x).\) The fact is that we cannot conclude that every nonzero element is a unit from this axiom without using the law of excluded middle! A similar issue arises with local rings as defined above: the axiom \({\small\fbox{$\phantom{b}$local${}^-\!\!\phantom{q}$}}\) does not imply the property that if \(x + y\) is a unit then at least one of \(x\) and \(y\) is a unit.
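To see the gap concretely, here is a minimal Lean 4 sketch (the names `unit`, `fieldm`, and the abstract setup are invented for illustration) showing that from the field\({}^-\) axiom a nonzero element is only *not-not* a unit:

```lean
variable {R : Type} (unit : R → Prop) (zero : R)

-- field⁻ says every nonunit is zero; from it, a nonzero element
-- is only *not a nonunit*, i.e. ¬¬unit x, not unit x itself
example (fieldm : ∀ x : R, ¬ unit x → x = zero) :
    ∀ x : R, x ≠ zero → ¬¬ unit x :=
  fun x hx hnu => hx (fieldm x hnu)
```

Erasing the double negation at the end would require the law of excluded middle, which is exactly the point.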
The solution to this issue is to reverse the relation between equality and inequality. What if, instead of inequality being the negation of equality, equality were the negation of inequality? This may sound counterintuitive at first but this is often how things work. Consider the case of two computable real numbers \(a\) and \(b,\) that is, real numbers for which we have an algorithm which on input \(k\) gives us a rational approximation within \(2^{-k}\) of the real number in question. How do I determine whether \(a\) and \(b\) are equal? There is no general way to do this: no finite number of approximations can certify that \(a\) and \(b\) agree exactly. On the other hand, if \(a\) and \(b\) are distinct, then comparing good enough approximations will eventually reveal this: if the two approximations at stage \(k\) ever differ by more than \(2^{1-k},\) then \(a\) and \(b\) must be different.
So, for computable real numbers, inequality is more primitive than equality!
This leads us to apartness relations: a binary relation \(\mathrel{\rlap{\neq\,}{\,\neq}_{}}\) that satisfies the following two axioms \[\begin{gathered}
\lnot(x \mathrel{\rlap{\neq\,}{\,\neq}_{}}x) \\
x \mathrel{\rlap{\neq\,}{\,\neq}_{}}y \to x \mathrel{\rlap{\neq\,}{\,\neq}_{}}z \lor y \mathrel{\rlap{\neq\,}{\,\neq}_{}}z
\end{gathered}\] These axioms ensure that the negation of \(x \mathrel{\rlap{\neq\,}{\,\neq}_{}}y\) is an equivalence relation; a tight apartness relation is one where \[\lnot(x \mathrel{\rlap{\neq\,}{\,\neq}_{}}y) \to x = y\] and thus the negation of \(x \mathrel{\rlap{\neq\,}{\,\neq}_{}}y\) is equality. In classical logic, every apartness relation is the negation of an equivalence relation but this is not true intuitionistically. In fact, the negation of equality is not necessarily an apartness relation!
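These facts are easy to check in plain intuitionistic logic; here is a Lean 4 sketch (all names invented here) deriving symmetry from the two axioms and showing that the negation of apartness is transitive:

```lean
section Apartness
variable {A : Type} (ap : A → A → Prop)
variable (irrefl : ∀ x, ¬ ap x x)
variable (cotrans : ∀ x y z, ap x y → ap x z ∨ ap y z)

-- symmetry is derivable: instantiate cotransitivity with z := x
theorem ap_symm {x y : A} (h : ap x y) : ap y x :=
  (cotrans x y x h).resolve_left (irrefl x)

-- the negation of apartness is transitive
-- (reflexivity and symmetry of ¬ap are immediate)
theorem not_ap_trans {x y z : A} (hxy : ¬ ap x y) (hyz : ¬ ap y z) :
    ¬ ap x z :=
  fun h => (cotrans x z y h).elim hxy
    fun hzy => hyz (ap_symm ap irrefl cotrans hzy)
end Apartness
```

Note that the disjunctive syllogism used in `ap_symm` (via `Or.resolve_left`) is intuitionistically valid, so no classical reasoning sneaks in.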
Without further ado, let us state exactly how we can define local rings using apartness. A more positive way to define local rings is to require that \[x \mathrel{\rlap{\neq\,}{\,\neq}_{R}} y :\equiv\operatorname{\mathsf{unit}}(x - y)\] is an apartness relation. The first apartness axiom \(\lnot(x \mathrel{\rlap{\neq\,}{\,\neq}_{R}} x)\) actually follows from \({\small\fbox{$\phantom{b}$nontrivial$\phantom{q}$}}\) since \(\operatorname{\mathsf{unit}}(\mathsf{O}) \to (\mathsf{O}=_R \mathsf{I}).\) The second apartness axiom boils down to the fact discussed above that if \(x+y\) is a unit then at least one of \(x\) and \(y\) is a unit. It is tempting to use the type \[\prod_{x,y:R} (\operatorname{\mathsf{unit}}(x+y) \to \operatorname{\mathsf{unit}}(x) + \operatorname{\mathsf{unit}}(y))\] to describe this but this doesn’t work because of the exclusive nature of sum types. Instead, we must use the propositional truncation \(\Vert\operatorname{\mathsf{unit}}(x) + \operatorname{\mathsf{unit}}(y)\Vert,\) which is how the logical disjunction \(\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)\) is defined [§3.7]. Thus a strongly local ring is a nontrivial ring that satisfies \[{\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}} :\equiv\prod_{x,y:R} (\operatorname{\mathsf{unit}}(x+y) \to \operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)).\]
Before going further with these ideas, let’s practice some of our intuitionistic reasoning by verifying that \[{\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}} \to {\small\fbox{$\phantom{b}$local${}^-\!\!\phantom{q}$}}.\] Actually, we won’t be practicing intuitionistic reasoning as such: we’ll just verify the logic by thinking in terms of functions between types rather than logical implications between propositions. Although an implication is not always equivalent to its contrapositive in intuitionistic logic, we do always have that \[(P \to Q) \to (\lnot Q \to \lnot P).\] In terms of types and functions, this is the special case of \[(A \to B) \to ((B \to C) \to (A \to C)),\] where \(A :\equiv{P},\) \(B :\equiv{Q},\) \(C :\equiv\mathbf{0}.\) This looks even more familiar when uncurried: \[(A \to B) \times (B \to C) \to (A \to C).\] Thus, we see that \[{\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}} \to (\lnot(\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)) \to \lnot\operatorname{\mathsf{unit}}(x+y)).\] Now recall that the disjunction \(\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)\) is defined as the propositional truncation \(\Vert\operatorname{\mathsf{unit}}(x) + \operatorname{\mathsf{unit}}(y)\Vert\) [§3.7]. 
This means that \(\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)\) is a proposition such that \[(\operatorname{\mathsf{unit}}(x) + \operatorname{\mathsf{unit}}(y)) \to P \quad\simeq\quad (\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)) \to P\] for any proposition \(P.\)^{2} Since \(\mathbf{0}\) is a proposition, we see that \[\lnot\operatorname{\mathsf{unit}}(x) \times\lnot\operatorname{\mathsf{unit}}(y) \ \simeq\ (\operatorname{\mathsf{unit}}(x) + \operatorname{\mathsf{unit}}(y)) \to \mathbf{0}\ \simeq\ \lnot(\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)),\] where the left equivalence is the special case \(A :\equiv\operatorname{\mathsf{unit}}(x),\) \(B :\equiv\operatorname{\mathsf{unit}}(y),\) \(C :\equiv\mathbf{0}\) of \[(A + B) \to C \quad\simeq\quad (A \to C) \times (B \to C).\] Finally, recombining the above, we obtain \[{\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}} \to (\lnot\operatorname{\mathsf{unit}}(x)\times\lnot\operatorname{\mathsf{unit}}(y) \to \lnot\operatorname{\mathsf{unit}}(x+y)),\] which is equivalent to \({\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}} \to {\small\fbox{$\phantom{b}$local${}^-\!\!\phantom{q}$}}\) because \(\lnot\operatorname{\mathsf{unit}}(x) \leftrightarrow\operatorname{\mathsf{nonunit}}(x).\)
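The two type-level facts used in this verification are easy to check. A minimal Lean 4 rendering (with `∨` and `∧` standing in for the truncated coproduct and the product) might look like:

```lean
-- composition: (A → B) → (B → C) → (A → C)
example {A B C : Prop} : (A → B) → (B → C) → (A → C) :=
  fun f g a => g (f a)

-- (A + B → C) ≃ (A → C) × (B → C), stated as a bi-implication
example {A B C : Prop} : (A ∨ B → C) ↔ (A → C) ∧ (B → C) :=
  ⟨fun h => ⟨fun a => h (Or.inl a), fun b => h (Or.inr b)⟩,
   fun ⟨f, g⟩ h => h.elim f g⟩
```

Both proofs are pure λ-terms, so they are valid without any classical axioms.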
Analogously to the positive axiom for local rings, the positive axiom for fields is \[{\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}} :\equiv\prod_{x:R} (\operatorname{\mathsf{unit}}(x) \lor x =_R \mathsf{O}),\] which implies both \({\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}}\) and \({\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}.\) To see that \({\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}} \to {\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}},\) we will make use of the distributive law \[(P_1 \lor P_2) \land Q \quad\leftrightarrow\quad (P_1 \land Q) \lor (P_2 \land Q),\] which follows from the general equivalence \[(A_1 + A_2) \times B \quad\simeq\quad A_1 \times B + A_2 \times B\] by taking propositional truncations. In particular, \[(\operatorname{\mathsf{unit}}(x) \lor x =_R \mathsf{O}) \times \operatorname{\mathsf{unit}}(x+y) \leftrightarrow\operatorname{\mathsf{unit}}(x)\times\operatorname{\mathsf{unit}}(x+y) \lor (x =_R \mathsf{O}) \times \operatorname{\mathsf{unit}}(x+y).\] Since \(\operatorname{\mathsf{unit}}(x) \times \operatorname{\mathsf{unit}}(x+y) \to \operatorname{\mathsf{unit}}(x)\) and \((x =_R \mathsf{O}) \times \operatorname{\mathsf{unit}}(x+y) \to \operatorname{\mathsf{unit}}(y),\) we see that \[(\operatorname{\mathsf{unit}}(x) \lor x =_R \mathsf{O}) \times \operatorname{\mathsf{unit}}(x+y) \to (\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y))\] and it follows that \({\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}} \to {\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}}.\) To see that \({\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}} \to {\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}},\) we will make use of the fact that \[(P \lor Q) \land \lnot Q \to P.\] This also follows from the distributive law above. 
Indeed, \[(P \lor Q) \land \lnot Q \ \leftrightarrow\ (P \land \lnot Q) \lor (Q \land \lnot Q) \ \leftrightarrow\ P \land \lnot Q\] because \(Q \land \lnot Q\) is \(\mathbf{0}.\) Note that \({\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}}\) is stronger than \({\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}}\times{\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}\) since it implies that \[\prod_{x,y:R} (x \mathrel{\rlap{\neq\,}{\,\neq}_{R}} y \lor x =_R y)\] and thus that \(R\) has decidable equality [§3.4]. In other words, \({\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}}\) gives the equivalence \[x \neq_R y \leftrightarrow x \mathrel{\rlap{\neq\,}{\,\neq}_{R}} y,\] which means that \(\neq_R\) is an apartness relation, whereas \({\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}}\times{\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}\) only gives that \(\mathrel{\rlap{\neq\,}{\,\neq}_{R}}\) is a tight apartness relation: \[x =_R y \leftrightarrow\lnot(x \mathrel{\rlap{\neq\,}{\,\neq}_{R}} y).\]
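Both ingredients of this argument are intuitionistically unproblematic; here is a Lean 4 sketch of the distributive law and the disjunctive syllogism it yields:

```lean
-- (P₁ ∨ P₂) ∧ Q ↔ (P₁ ∧ Q) ∨ (P₂ ∧ Q), intuitionistically valid
example {P₁ P₂ Q : Prop} : (P₁ ∨ P₂) ∧ Q ↔ (P₁ ∧ Q) ∨ (P₂ ∧ Q) :=
  ⟨fun ⟨h, q⟩ => h.elim (fun p => Or.inl ⟨p, q⟩) (fun p => Or.inr ⟨p, q⟩),
   fun h => h.elim (fun ⟨p, q⟩ => ⟨Or.inl p, q⟩) (fun ⟨p, q⟩ => ⟨Or.inr p, q⟩)⟩

-- the consequence (P ∨ Q) ∧ ¬Q → P used for field⁺ → field⁻
example {P Q : Prop} (h : P ∨ Q) (hq : ¬ Q) : P :=
  h.resolve_right hq
```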
The analysis above gives three plausible definitions for fields:
A residual field is a nontrivial ring that satisfies \({\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}.\)
A Heyting field is a nontrivial ring that satisfies \({\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}}\) and \({\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}.\)
A discrete field is a nontrivial ring that satisfies \({\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}}.\)
While each is stronger than the previous, they all have their uses and there is no choice which is always better than the others. Discrete fields behave the same as fields in classical logic since \(R\) has decidable equality. Heyting fields are preferred in constructive circles and most definitions of the field of real numbers yield Heyting fields that are not necessarily discrete [§11.2, §11.3]. While residual fields are harder to work with, they do have their uses since the quotient of a nontrivial ring by a maximal ideal is a residual field which is not necessarily Heyting.
It is common for classical equivalences to break down in intuitionistic logic and therefore many natural concepts that are classically equivalent separate in intuitionistic settings. While this can be an annoyance, it is quite natural in the context of proof-relevant mathematics. Indeed, when we prove a theorem about fields in a proof-relevant way, we cannot merely argue that every field satisfies the conclusion; we must proceed from known properties of fields and work our way to the desired conclusion. The precise properties used in this process naturally lead to a more refined understanding of the hypotheses of the theorem.
Before closing this installment of HoTT Math, let me draw your attention to an important lesson from the above:
While the logic of HoTT is intuitionistic, knowledge of intuitionistic logic is not necessary to work in HoTT.
Thinking in terms of types and functions as we did above works very well. To illustrate once more, consider the following classical equivalences: \[\begin{aligned}
P \to \lnot Q \quad&\leftrightarrow\quad Q \to \lnot P, \\
\lnot P \to Q \quad&\leftrightarrow\quad \lnot Q \to P.
\end{aligned}\] Which are valid in intuitionistic logic? When translated into types and functions, we get that these are the special cases of \[\begin{aligned}
A \to (B \to C) \quad&\simeq\quad B \to (A \to C), \\
(A \to C) \to B \quad&\simeq\quad (B \to C) \to A,
\end{aligned}\] where \(A :\equiv{P},\) \(B :\equiv{Q},\) \(C :\equiv\mathbf{0}.\) The first classical equivalence is intuitionistically valid. After uncurrying both sides of \[A \to (B \to C) \quad\simeq\quad B \to (A \to C),\] we obtain the obvious equivalence \[A \times B \to C \quad\simeq\quad B \times A \to C.\] On the other hand, the general form of the second equivalence makes no sense at all. Taking \(C\) to be just about anything other than \(\mathbf{0}\) leads to obviously false equivalences such as \(\mathbf{1}\to A \simeq\mathbf{1}\to B\) when \(C :\equiv\mathbf{1}.\) Indeed, the second classical equivalence is not intuitionistically valid since taking \(Q :\equiv\lnot P\) leads to \[\lnot P \to \lnot P \quad\leftrightarrow\quad \lnot \lnot P \to P\] and double negation elimination is not intuitionistically valid.
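The valid direction can be spelled out directly; the following Lean 4 snippet (a propositional rendering) checks the swap and its uncurried form:

```lean
-- A → (B → C) ≃ B → (A → C): just swap the arguments
example {A B C : Prop} : (A → B → C) ↔ (B → A → C) :=
  ⟨fun f b a => f a b, fun f a b => f b a⟩

-- currying: (A × B) → C ≃ A → (B → C)
example {A B C : Prop} : (A ∧ B → C) ↔ (A → B → C) :=
  ⟨fun f a b => f ⟨a, b⟩, fun f ⟨a, b⟩ => f a b⟩
```

No analogous term exists for the second classical equivalence, which is exactly what the counterexample with \(C :\equiv\mathbf{1}\) suggests.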
The equivalence \[\textstyle\prod_{x:A} (B(x) \to C) \quad\simeq\quad \left({\textstyle\sum_{x:A} B(x)}\right)\to C\] is the dependent type analogue of currying/uncurrying. Indeed, if instead of the dependent type \(B:A \to \mathcal{U}\) we have a simple type \(B,\) we obtain the familiar equivalence \[A \to (B \to C) \quad\simeq\quad (A \times B) \to C.\] In full generality, we have the unwieldy equivalence \[\prod_{(a,b):\sum_{x:A} B(x)} C(a,b) \quad\simeq\quad \prod_{a:A} \prod_{b:B(a)} C(a,b),\] where \(B:A \to \mathcal{U}\) and \(C:\sum_{x:A} B(x) \to \mathcal{U}\) [§2.15].
This is the standard trick to work with propositional truncations: so long as \(P\) is a proposition, \(\Vert A \Vert \to P\) is equivalent to \(A \to P.\) This is very important for handling disjunctions and existential quantifiers. Since \(P \lor Q\) "forgets" which of \(P\) and \(Q\) holds, it is not generally possible to do a case analysis. However, if \(R\) is a proposition then \(P \lor Q \to R\) is equivalent to \(P + Q \to R\) and the coproduct \(P + Q\) does allow for case analysis. This explains why in addition to \[\lnot P \land \lnot Q \quad\leftrightarrow\quad \lnot(P \lor Q),\] the following "halves" of De Morgan’s laws hold in intuitionistic logic \[\begin{aligned}
P \land Q \quad&\to\quad \lnot(\lnot P \lor \lnot Q), \\
P \lor Q \quad&\to\quad \lnot(\lnot P \land \lnot Q), \\
\lnot P \lor \lnot Q \quad&\to\quad \lnot(P \land Q),
\end{aligned}\] while their converses do not.
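All of these go through by direct case analysis; a Lean 4 sketch:

```lean
variable {P Q : Prop}

-- the full equivalence
example : ¬P ∧ ¬Q ↔ ¬(P ∨ Q) :=
  ⟨fun ⟨hp, hq⟩ h => h.elim hp hq,
   fun h => ⟨fun p => h (Or.inl p), fun q => h (Or.inr q)⟩⟩

-- the three "halves" whose converses require excluded middle
example : P ∧ Q → ¬(¬P ∨ ¬Q) :=
  fun hpq h => h.elim (fun hp => hp hpq.1) (fun hq => hq hpq.2)
example : P ∨ Q → ¬(¬P ∧ ¬Q) :=
  fun h hn => h.elim hn.1 hn.2
example : ¬P ∨ ¬Q → ¬(P ∧ Q) :=
  fun h hpq => h.elim (fun hp => hp hpq.1) (fun hq => hq hpq.2)
```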
A commutative (unital) ring is a set \(R\) with two constants \(\mathsf{O},\mathsf{I}:1\to R\), one unary operation \({-\square}:R \to R\), two binary operations \({\square+\square},{\square\cdot\square}:R \times R \to R\) along with the usual axioms: \({\small\fbox{$\phantom{b}$group$\phantom{q}$}}(\mathsf{O},-\square,\square+\square)\), \({\small\fbox{$\phantom{b}$monoid$\phantom{q}$}}(\mathsf{I},{\square\cdot\square})\), \[{\small\fbox{$\phantom{b}$distributivity$\phantom{q}$}}({\square+\square},{\square\cdot\square}) : \prod_{x,y,z:R} ((x+y)\cdot z =_R x \cdot z + y \cdot z) \times \prod_{x,y,z:R} (x\cdot(y+z) =_R x \cdot y+x \cdot z),\] and \[{\small\fbox{$\phantom{b}$commutativity$\phantom{q}$}}({\square\cdot\square}) : \prod_{x,y:R} (x \cdot y =_R y \cdot x).\] Commutativity of addition can be derived as usual using the fact that \[x+(x+y)+y =_R (x+y)\cdot(\mathsf{I}+\mathsf{I}) =_R x + (y + x) + y.\] If you want to test your path manipulation skills, you can try to write down the resulting term for \[{\small\fbox{$\phantom{b}$commutativity$\phantom{q}$}}({\square+\square}) : \prod_{x,y:R} (x + y =_R y+x).\] However, as we discussed last time, this is not necessary since the proof is ultimately not relevant when the carrier of an algebra is a set.
Reading off the usual definition of unit, we obtain the type \[\operatorname{\mathsf{unit}}(x) :\equiv\sum_{y:R} (x\cdot y =_R \mathsf{I}).\] Elements of type \(\operatorname{\mathsf{unit}}(x)\) are pairs \((y,p)\), where \(y:R\) and \(p: x \cdot y =_R \mathsf{I}\). Because \(R\) is a set and inverses are unique, there is always at most one such \((y,p)\), which means that \(\operatorname{\mathsf{unit}}(x)\) is a proposition. Therefore, we can comprehend the set of units: \[R^\times :\equiv\{x:R \mid \operatorname{\mathsf{unit}}(x)\} :\equiv\sum_{x:R} \operatorname{\mathsf{unit}}(x).\] (Subset comprehension is discussed in §3.5 of the book.) Technically, elements of \(R^\times\) are triples \((x,y,p)\) where \(x,y:R\) and \(p:(x \cdot y =_R \mathsf{I})\). Thus \(R^\times\) is not merely a subset of \(R\) but each element of \(R^\times\) comes equipped with a justification for being in \(R^\times\). Since for each \(x:R\) there is at most one \((x,y,p):R^\times\), we can identify the two without much confusion but the extra coordinates are actually helpful for proving things about \(R^\times\).
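In a proof assistant, the set of units is naturally a subtype whose elements carry their justification. A minimal Lean 4 sketch over an abstract multiplication (all names here are invented for illustration):

```lean
variable {R : Type} (mul : R → R → R) (one : R)

-- unit x is a proposition: some y with x·y = 1
-- (in a commutative ring the witness y is unique)
def unit (x : R) : Prop := ∃ y, mul x y = one

-- R^× as a subtype: each element comes packaged with its membership proof
def units : Type := {x : R // unit mul one x}
```

Here the witness \(y\) is hidden inside an existential, matching the fact that \(\operatorname{\mathsf{unit}}(x)\) is a proposition; keeping the witness explicit, as in the \(\Sigma\)-type above, is exactly what the arguments below exploit.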
To illustrate this, let’s outline the verification that \(R^\times\) is a group under multiplication. We will view elements of \(R^\times\) as pairs \((x,y)\) but we will forget the path witnessing that \(x \cdot y =_R \mathsf{I}\). Since we’re working with sets, we shouldn’t worry about paths anyway.^{1} To begin, we need to isolate the group operations:
First, we see that \((\mathsf{I},\mathsf{I}):R^\times\) since we have a path \(\mathsf{I}\cdot\mathsf{I}=_R \mathsf{I}\) by the identity axiom; this will be our group identity.
Next, we see that if \((x_0,y_0),(x_1,y_1):R^\times\) then \((x_0 \cdot x_1,y_1 \cdot y_0):R^\times\). Indeed, if \(x_0 \cdot y_0 =_R \mathsf{I}\) and \(x_1 \cdot y_1 =_R \mathsf{I}\) then \[x_0 \cdot x_1 \cdot y_1 \cdot y_0 =_R x_0 \cdot \mathsf{I}\cdot y_0 =_R x_0 \cdot y_0 =_R \mathsf{I}.\] From this, we directly extract our group multiplication \(\square\cdot\square:R^\times \times R^\times \to R^\times\).
Finally, we see that if \((x,y):R^\times\) then \((y,x):R^\times\). (This is immediate from commutativity but a slightly longer argument shows that this works even without this assumption.) As a result, we get the group inverse \(\square^{-1}:R^\times \to R^\times\).
To conclude the argument, it suffices to verify that \({\small\fbox{$\phantom{b}$group$\phantom{q}$}}(R^\times,\square^{-1},\square\cdot\square)\) is inhabited. Associativity and identity are immediate consequences of \({\small\fbox{$\phantom{b}$monoid$\phantom{q}$}}(R,\mathsf{I},\square\cdot\square)\). The tacit path coordinates are handy for inverses: the path \((x,y)\cdot(y,x) =_{R^\times} (\mathsf{I},\mathsf{I})\) simply consists of the two implied paths \(x \cdot y =_R \mathsf{I}\) and \(y \cdot x =_R \mathsf{I}\)!
The key difference between this proof and the classical proof that \(R^\times\) is a group is the way we used the definition of \(R^\times\). At each step of the classical proof, we need to invoke the definition of \(R^\times\) to get the relevant inverses, do the same computations, and then forget the extra information. In the proof-relevant argument, the subset \(R^\times\) incorporates its definition and the relevant inverses are already there to be used in computations. To keep the argument from involving too many (ultimately irrelevant) path computations, we still forgot one piece of the definition of \(R^\times\) in the outline above. This kind of selective unraveling can be useful since formal definitions can rapidly become unwieldy. We did end up invoking the path coordinates at the very end to verify the identity axiom. However, we didn’t really need the paths themselves; we just needed to remember that elements \((x,y)\) aren’t arbitrary pairs, they are pairs such that \(x \cdot y =_R \mathsf{I}\). This is what the seemingly irrelevant path coordinate provides: the relevant data is not the path itself but the type \(x \cdot y =_R \mathsf{I}\) of the path.
The lesson is that while proof-relevant mathematics does force you to carry extra baggage, that baggage is actually handy. Moreover, you can always manage the burden by selectively unraveling the contents of the baggage. In fact, you also carry this baggage when doing classical mathematics except that you conceal the baggage in your memory, recalling each item as you need it. The problem with this strategy is that it’s easy to forget things if you are not careful. Proof-relevant mathematics forces you to keep track of everything!
In the comments, Toby Bartels pointed out that the argument above contains a major type-theoretic faux pas: using type declarations as propositions. The cause of this is the deliberate omission of the path coordinate which leads to phrases like "we see that \((\mathsf{I},\mathsf{I}):R^\times\) since […]" and "we see that if \((x,y):R^\times\) then \((y,x):R^\times\)." The problem is that type declarations are either correct or not (and this is decidable!) but not true or false. Indeed, true and false are both values that can be used but correct and incorrect are not — there is no value in being incorrect! This may seem like a subtle difference but it is actually a very important one in type theory since there is little other separation of syntax and semantics.
Below, I produced three more variants of the above argument that avoid the trouble with Version 0 above. The first simply does not omit the path coordinates. The second uses the membership relation \(\in\) where the original used an erroneous type declaration. The third is a direct translation of the classical proof, also using the membership relation. To facilitate comparison, I tried to keep the three versions as close as possible to the original. I am curious to know which of the four versions you prefer, or if you have yet another version that you prefer!
Version 1. Let’s outline the verification that \(R^\times\) is a group under multiplication. To begin, we need to isolate the group operations:
First, \((\mathsf{I},\mathsf{I},i):R^\times\) is our group identity where \(i:\mathsf{I}\cdot\mathsf{I}=_R \mathsf{I}\) is an instance of the identity axiom.
Next, the group product is defined by \((x_0,y_0,p_0)\cdot(x_1,y_1,p_1) := (x_0 \cdot x_1,y_1 \cdot y_0,q)\), where \[q:x_0 \cdot x_1 \cdot y_1 \cdot y_0 =_R x_0 \cdot \mathsf{I}\cdot y_0 =_R x_0 \cdot y_0 =_R \mathsf{I}.\]
Finally, group inverses are defined by \((x,y,p)^{-1} := (y,x,q)\), where \(q:y \cdot x =_R x \cdot y =_R \mathsf{I}.\) (A slightly longer argument shows that this works even without the commutativity assumption.)
To conclude the argument, it suffices to verify that \({\small\fbox{$\phantom{b}$group$\phantom{q}$}}(R^\times,\square^{-1},\square\cdot\square)\) is inhabited. Associativity and identity are immediate consequences of \({\small\fbox{$\phantom{b}$monoid$\phantom{q}$}}(R,\mathsf{I},\square\cdot\square)\). The path \((x,y,p)\cdot(y,x,q) =_{R^\times} (\mathsf{I},\mathsf{I},i)\) is composed of \(p:x \cdot y =_R \mathsf{I}\), \(q:y \cdot x =_R \mathsf{I}\), and the paths must match since \(R\) is a set.
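Version 1’s product of units can be transcribed almost literally into Lean 4. Here is a hedged sketch (assuming Mathlib for `CommRing` and the `ring` tactic; the structure name `RUnit` is invented here):

```lean
import Mathlib.Tactic

-- a unit packaged with its inverse and the path coordinate
structure RUnit (R : Type) [CommRing R] where
  val : R
  inv : R
  prf : val * inv = 1

-- (x₀,y₀,p₀)·(x₁,y₁,p₁) := (x₀·x₁, y₁·y₀, q), with q built from p₀ and p₁
def RUnit.mul {R : Type} [CommRing R] (a b : RUnit R) : RUnit R where
  val := a.val * b.val
  inv := b.inv * a.inv
  prf := by
    have h : a.val * b.val * (b.inv * a.inv)
        = (a.val * a.inv) * (b.val * b.inv) := by ring
    rw [h, a.prf, b.prf, mul_one]
```

The `ring` step performs the purely associative-commutative rearrangement; the two path coordinates `a.prf` and `b.prf` then finish the computation, just as in the text.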
Version 2. Let’s outline the verification that \(R^\times\) is a group under multiplication. By virtue of univalence, there are multiple ways to look at \(R^\times\). Instead of viewing it as a subset of \(R\) we can also view it as a subset of \(R \times R\), namely \[R^\times = \{(x,y):R \times R \mid x \cdot y =_R \mathsf{I}\} = \sum_{x,y:R} (x \cdot y =_R \mathsf{I}).\] Thus \((x,y) \in_{R \times R} R^\times\) denotes the proposition \(x \cdot y =_R \mathsf{I}.\) It turns out that this view of \(R^\times\) will be easier to work with. To begin, we need to isolate the group operations:
First, we see that \((\mathsf{I},\mathsf{I}) \in_{R\times R} R^\times\) since we have a path \(\mathsf{I}\cdot\mathsf{I}=_R \mathsf{I}\) by the identity axiom; this will be our group identity.
Next, we see that if \((x_0,y_0),(x_1,y_1) \in_{R \times R} R^\times\) then \((x_0 \cdot x_1,y_1 \cdot y_0) \in_{R \times R} R^\times\) too. Indeed, if \(x_0 \cdot y_0 =_R \mathsf{I}\) and \(x_1 \cdot y_1 =_R \mathsf{I}\) then \[x_0 \cdot x_1 \cdot y_1 \cdot y_0 =_R x_0 \cdot \mathsf{I}\cdot y_0 =_R x_0 \cdot y_0 =_R \mathsf{I}.\] From this, we directly extract our group multiplication \(\square\cdot\square:R^\times \times R^\times \to R^\times\).
Finally, we see that if \((x,y) \in_{R \times R} R^\times\) then \((y,x) \in_{R \times R} R^\times\) since \(x \cdot y =_R \mathsf{I}\) implies \(y \cdot x =_R \mathsf{I}\) by commutativity. (A slightly longer argument shows that this works even for non-commutative rings.) As a result, we get the group inverse \(\square^{-1}:R^\times \to R^\times\).
To conclude the argument, it suffices to verify that \({\small\fbox{$\phantom{b}$group$\phantom{q}$}}(R^\times,\square^{-1},\square\cdot\square)\) is inhabited. Associativity and identity are immediate consequences of \({\small\fbox{$\phantom{b}$monoid$\phantom{q}$}}(R,\mathsf{I},\square\cdot\square)\). The path \((x,y)\cdot(y,x) =_{R^\times} (\mathsf{I},\mathsf{I})\) is composed of \(x \cdot y =_R \mathsf{I}\) and \(y \cdot x =_R \mathsf{I}\), which hold because \((x,y), (y,x) \in_{R \times R} R^\times\).
Version 3. Let’s outline the verification that \(R^\times\) is a group under multiplication. Per definition of \(R^\times\), we will write \(x \in_R R^\times\) to mean \(\mathsf{unit}(x)\). To begin, we need to isolate the group operations:
First, we see that \(\mathsf{I}\in_{R} R^\times\) since \(\mathsf{I}\cdot\mathsf{I}=_R \mathsf{I}\) by the identity axiom; this will be our group identity.
Next, we see that if \(x_0,x_1 \in_{R} R^\times\) then \(x_0 \cdot x_1 \in_{R} R^\times\) too. Indeed, since \(\mathsf{unit}(x_0)\) and \(\mathsf{unit}(x_1)\) there are \(y_0,y_1:R\) such that \(x_0 \cdot y_0 =_R \mathsf{I}\) and \(x_1 \cdot y_1 =_R \mathsf{I}.\) Then \[x_0 \cdot x_1 \cdot y_1 \cdot y_0 =_R x_0 \cdot \mathsf{I}\cdot y_0 =_R x_0 \cdot y_0 =_R \mathsf{I}\] which shows that \(\mathsf{unit}(x_0 \cdot x_1).\) Thus our group multiplication is (the restriction of) the usual ring multiplication.
Finally, we see that if \(x \in_{R} R^\times\) then, by definition of the proposition \(\mathsf{unit}(x)\), there is a unique \(y:R\) such that \(x \cdot y =_R \mathsf{I}\). By commutativity, \(y \cdot x =_R \mathsf{I}\) which means that \(y \in_{R} R^\times\) and this \(y\) is the group inverse of \(x\).
To conclude the argument, it suffices to verify that \({\small\fbox{$\phantom{b}$group$\phantom{q}$}}(R^\times,\square^{-1},\square\cdot\square)\) is inhabited. Associativity and identity are immediate consequences of \({\small\fbox{$\phantom{b}$monoid$\phantom{q}$}}(R,\mathsf{I},\square\cdot\square)\). The remaining identity \(x \cdot x^{-1} =_{R^\times} \mathsf{I}\) follows immediately from the definition of group inverses.
A signature is a sequence \(\sigma\) of sets indexed by natural numbers. The elements of the set \(\sigma_n\) are intended to be symbols for \(n\)-ary operations (\(0\)-ary operations are simply constants). For example, the signature for rings can be thought of as \[(\{0,1\},\{-\},\{+,\cdot\},\varnothing,\varnothing,\dots).\] It is generally assumed that the sets \(\sigma_n\) are mutually disjoint so that each symbol has a definite arity. In the end, only the cardinality of the sets \(\sigma_n\) matters since the particular symbols used for the operations are mostly a matter of taste.
The logic of universal algebra is equational logic. The symbols from the signature \(\sigma\) can be strung together to form terms. In addition to the symbols from \(\sigma\), we need an infinite set \(\{x_0,x_1,x_2,\dots\}\) of variable symbols. Formally, terms are defined inductively using the two rules:
Each variable symbol \(x_i\) is a term.
If \(f\) is a symbol in \(\sigma_n\) and \(t_1,\dots,t_n\) are terms, then \(f t_1 \dots t_n\) is a term.
Equational logic deals with identities of the form \(t \approx s\) where \(t\) and \(s\) are terms. The rules for equational logic are reflexivity, symmetry, transitivity \[\frac{}{t \approx t}, \quad \frac{t \approx s}{s \approx t}, \quad \frac{t \approx s, s \approx r}{t \approx r}\] together with substitution and replacement \[\frac{t \approx s}{t[x/r] \approx s[x/r]}, \quad \frac{s \approx r}{t[x/r] \approx t[x/s]},\] where \([x/r]\) denotes the act of replacing every occurrence of the variable symbol \(x\) with the term \(r\). We write \(\Gamma \vdash t \approx s\) to signify that the identity \(t \approx s\) follows using a combination of these rules from the identities in \(\Gamma\).
An algebra \(\mathfrak{A}\) (with signature \(\sigma\)) is a type \(A\) together with an interpretation \[I:\left({\textstyle\sum_{n:\mathbb{N}} \sigma_n \times A^n}\right) \to A.\] If \(f\) is a symbol for an \(n\)-ary operation in \(\sigma_n\), then the interpretation \(f^{\mathfrak{A}}:A^n \to A\) is \[f^{\mathfrak{A}}(a_1,\dots,a_n) := I(f,a_1,\dots,a_n).\] These interpretations can be used to evaluate any term starting from a variable assignment \(\mathcal{V}:\mathbb{N}\to A\) in the usual recursive manner: \[x_i^{\mathfrak{A},\mathcal{V}} := \mathcal{V}(i), \quad (f t_1 \dots t_k)^{\mathfrak{A},\mathcal{V}} := f^{\mathfrak{A}}(t_1^{\mathfrak{A},\mathcal{V}},\dots,t_k^{\mathfrak{A},\mathcal{V}}).\] The satisfaction relation for the algebra \(\mathfrak{A}\) and the identity \(t \approx s\) is then the type \[(\mathfrak{A} \vDash t \approx s) :\equiv \prod_{\mathcal{V}:\mathbb{N}\to A} (t^{\mathfrak{A},\mathcal{V}} =_A s^{\mathfrak{A},\mathcal{V}}).\] If the carrier \(A\) is a set then each identity type \(t^{\mathfrak{A},\mathcal{V}} =_A s^{\mathfrak{A},\mathcal{V}}\) has at most one element and hence \(\mathfrak{A} \vDash t \approx s\) is a proposition with the usual meaning. In the general case, \(\mathfrak{A} \vDash t \approx s\) can carry substantial path information.
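The inductive definition of terms and their recursive evaluation translate directly into a dependently typed language. A Lean 4 sketch (names invented here, with arities tracked via `Fin`):

```lean
-- terms over a signature σ: each symbol in σ n takes n subterms
inductive Term (σ : Nat → Type) : Type where
  | var : Nat → Term σ
  | app : {n : Nat} → σ n → (Fin n → Term σ) → Term σ

-- evaluation in an algebra: I interprets symbols, V assigns variables
def Term.eval {σ : Nat → Type} {A : Type}
    (I : {n : Nat} → σ n → (Fin n → A) → A) (V : Nat → A) :
    Term σ → A
  | .var i => V i
  | .app f ts => I f fun k => (ts k).eval I V
```

Packaging the \(n\)-ary arguments as a function \(\operatorname{Fin} n \to A\) mirrors the interpretation map \(I:\left(\sum_{n:\mathbb{N}} \sigma_n \times A^n\right) \to A\) from the text.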
The fact that equational logic is sound and complete for the semantics is called Birkhoff’s Theorem: \[\Gamma \vdash t \approx s \quad\text{if and only if}\quad \Gamma \vDash t \approx s.\] Classically, the right hand side means that every algebra \(\mathfrak{A}\) with signature \(\sigma\), whose carrier \(A\) is a set, that satisfies all equations in \(\Gamma\) also satisfies \(t \approx s\). In HoTT, this translates to the type \[(\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s) :\equiv \prod_{\mathfrak{A}/\operatorname{\mathsf{set}}} \left(\prod_{u \approx v:\Gamma} (\mathfrak{A} \vDash u \approx v) \to (\mathfrak{A} \vDash t \approx s)\right)\] where the outer product is over all set-based algebras \(\mathfrak{A}\) with signature \(\sigma\). An alternate interpretation is \[(\Gamma \vDash_{\operatorname{\mathsf{any}}} t \approx s) :\equiv \prod_{\mathfrak{A}/\operatorname{\mathsf{any}}} \left(\prod_{u \approx v:\Gamma} (\mathfrak{A} \vDash u \approx v) \to (\mathfrak{A} \vDash t \approx s)\right)\] where the outer product is over algebras with signature \(\sigma\) based on any type. Consequently there are two possible ways to interpret Birkhoff’s Theorem in HoTT!
The first is closest to the classical version:
Weak Birkhoff Theorem. \(\left\Vert\Gamma \vdash t \approx s\right\Vert\) is equivalent to \(\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s\).
The vertical lines \(\Vert\square\Vert\) indicate that one should take the propositional truncation of \(\Gamma \vdash t \approx s\) (§3.7). Indeed, when translated into HoTT, \(\Gamma \vdash t \approx s\) is the type of all proofs of \(t \approx s\) from \(\Gamma\) in equational logic. The propositional truncation agglomerates all these proofs into one and simply asserts the unqualified existence of such a proof. This is necessary since the right hand side \(\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s\) is a proposition so the backward implication, without propositional truncation, would amount to a canonical choice of proof with basically no information.
On the other hand, \(\Gamma \vDash_{\operatorname{\mathsf{any}}} t \approx s\) is much more expressive because of all the possible identity paths. With all that information on hand, it is not so unlikely that we could reconstruct a proof. This suggests another interpretation of Birkhoff’s Theorem:
Strong Birkhoff Theorem. \(\Gamma \vdash t \approx s\) is equivalent to \(\Gamma \vDash_{\operatorname{\mathsf{any}}} t \approx s\).
As I stated it just now, this is unlikely to be true. However, I suspect it becomes true with some tweaking of \(\Gamma \vdash t \approx s\) to only allow proofs that are normalized in some sense or to formally identify proofs that are trivial variations of each other. I haven’t had much time to think about this so I’ll leave it as an open conjecture for now and hope that I can come back to it later…
The soundness part (forward implication) of both forms is true. Last time, I emphasized the usefulness of the weak soundness theorem but I forgot to talk about substitution and replacement! We did talk about how paths could be reversed and composed to get symmetry and transitivity; details are in §2.1 of the book. Since I think it’s important to know what goes on behind the scenes, let’s talk about the remaining two rules. Unfortunately, and this explains the earlier omission, neither is particularly pleasant.
Substitution works exactly as you expect. An element \(p:(\mathfrak{A} \vDash t \approx s)\) is a function that maps each variable assignment \(\mathcal{V}:\mathbb{N}\to A\) to a path \(p(\mathcal{V}):t^{\mathfrak{A},\mathcal{V}} =_A s^{\mathfrak{A},\mathcal{V}}\). Given a term \(r\) and a variable \(x\), the corresponding element of \((\mathfrak{A} \vDash t[x/r] \approx s[x/r])\) is the composition of \(p\) with the function that sends each variable assignment \(\mathcal{V}:\mathbb{N}\to A\) to the assignment where all values stay the same except that the value of \(x\) is changed to \(r^{\mathfrak{A},\mathcal{V}}\). Verifying that the result really is an element of \(\mathfrak{A} \vDash t[x/r] \approx s[x/r]\) is a tedious computation which is best performed under a rug.
The trick behind replacement is revealed in §2.2: functions are functors. The identity types give each type a higher groupoid structure and functions act functorially in the sense that any \(f:A \to B\) comes with further functions \(\mathsf{ap}_f\) which translate paths \(p:x =_A y\) into paths \(\mathsf{ap}_f(p):f(x) =_B f(y)\). The details for soundness of the replacement rule are gory but they are essentially the same as in the classical proof. There is some work needed to single out the variable \(x\) and interpret the term \(t\) as a function \(t^{\mathfrak{A},\mathcal{V}}_x:A \to A\) where all other variables are fixed. Then, \(\mathsf{ap}_{t^{\mathfrak{A},\mathcal{V}}_x}\) can be used to transform paths \(s^{\mathfrak{A},\mathcal{V}} =_A r^{\mathfrak{A},\mathcal{V}}\) into paths \(t^{\mathfrak{A},\mathcal{V}}_x(s^{\mathfrak{A},\mathcal{V}}) =_A t^{\mathfrak{A},\mathcal{V}}_x(r^{\mathfrak{A},\mathcal{V}})\). Thus, we obtain \[(\mathfrak{A} \vDash s \approx r) \to (\mathfrak{A} \vDash t[x/s] \approx t[x/r]).\]
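In Lean, the role of \(\mathsf{ap}_f\) is played by `congrArg`, though since Lean’s `Eq` lives in `Prop` the higher path structure described above is collapsed; this is only meant to show the shape of the replacement rule.

```lean
-- `congrArg` translates an identity between arguments into an identity
-- between results, just like ap_f translates a path p : x = y into a
-- path f(x) = f(y).
example {A : Type} (t : A → A) {s r : A} (h : s = r) : t s = t r :=
  congrArg t h
```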
Since each rule of equational logic is sound, we therefore have the strong soundness implication \((\Gamma \vdash t \approx s) \to (\Gamma \vDash_{\operatorname{\mathsf{any}}} t \approx s)\). To get weak soundness, we compose this with the obvious implication \((\Gamma \vDash_{\operatorname{\mathsf{any}}} t \approx s) \to (\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s)\) and we conclude that \(\left\Vert\Gamma \vdash t \approx s\right\Vert \to (\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s)\) from the fact that \(\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s\) is a proposition. The completeness part of Birkhoff’s theorem uses syntactic algebras and HoTT has a really neat way to construct these that we will talk about in a later HoTT Math post.
In conclusion, equational logic works just fine in HoTT. In general, because of proof relevance, it is best to keep track of the equational proofs in order to keep track of the identity paths. The strong soundness map \[(\Gamma \vdash t \approx s) \to (\Gamma \vDash_{\operatorname{\mathsf{any}}} t \approx s)\] guarantees that equational proofs will lead to identities, but different proofs can lead to different identity paths! When the carrier is a set, less care is necessary: the weak soundness map \[\Vert\Gamma \vdash t \approx s\Vert \to (\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s)\] shows that the mere existence of an equational proof leads to the desired identities. This is exactly how algebra and equational logic are used classically.
]]>A group is a set \(G\) equipped with a constant \(e:1 \to G\), a unary operation \(\square^{-1}:G \to G\) and a binary operation \({\square\cdot\square}:G \times G \to G\) that satisfy the usual axioms. In HoTT the group axioms must be witnessed by concrete data, so the group \(G\) comes with three more components: \[\begin{aligned}
{\small\fbox{$\phantom{Q}$associativity$\phantom{Q}$}} &: \prod_{x,y,z:G} (x\cdot(y\cdot z) =_G (x\cdot y)\cdot z) \\
{\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}} &: \prod_{x:G} (x \cdot e =_G x) \times \prod_{x:G}(e \cdot x =_G x)\\
{\small\fbox{$\phantom{Q}$inverse$\phantom{Q}$}} &: \prod_{x:G} (e =_G x^{-1} \cdot x) \times \prod_{x:G} (e =_G x \cdot x^{-1}) \\
\end{aligned}\] The axioms are concrete objects in proof-relevant mathematics! The last two axioms are actually the conjunction of two axioms since conjunction in HoTT corresponds to a product. It’s conceptually convenient to package them together and use \({\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_1\) and \({\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_2\) for the two conjuncts obtained by applying the projections. In fact, it makes sense to pack all of these into one \[{\small\fbox{$\phantom{Q}$group$\phantom{Q}$}} :\equiv{\small\fbox{$\phantom{Q}$associativity$\phantom{Q}$}}\times{\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}\times{\small\fbox{$\phantom{Q}$inverse$\phantom{Q}$}}\] (but, for the sake of clarity, it is best to refrain from using things like \({\small\fbox{$\phantom{Q}$group$\phantom{Q}$}}_1\) for \({\small\fbox{$\phantom{Q}$associativity$\phantom{Q}$}}\)).
The axiom types above are parametrized by \(G:\mathsf{Set}\) and the three components \(e\), \(\square^{-1}\), \(\square\cdot\square\), so one can form the type of all groups: \[\mathsf{Grp}:\equiv\sum_{\substack{G:\mathsf{Set}\\e:1\to G\\\square^{-1}:G\to G\\\square\cdot\square:G\times G \to G}} {\small\fbox{$\phantom{Q}$group$\phantom{Q}$}}(G,e,\square^{-1},\square\cdot\square).\] This level of formalism is cumbersome and since it is perfectly clear what the type \(\mathsf{Grp}\) is from the narrative description, it is best to avoid it. The only difference from classical mathematics is that the axioms are given as concrete objects rather than abstract statements.
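For comparison, the bundled type \(\mathsf{Grp}\) can be sketched as a Lean structure whose fields include the axioms as data. The field names here are made up, this is not Mathlib’s `Group`, and Lean’s proof-irrelevant `Eq` only approximates the identity types above.

```lean
-- A hypothetical bundled type of groups: the axioms are fields, i.e.
-- concrete components, mirroring the Σ-type Grp above.
structure Grp where
  G     : Type
  e     : G
  inv   : G → G
  mul   : G → G → G
  assoc : ∀ x y z : G, mul x (mul y z) = mul (mul x y) z
  id1   : ∀ x : G, mul x e = x         -- identity₁
  id2   : ∀ x : G, mul e x = x         -- identity₂
  inv1  : ∀ x : G, e = mul (inv x) x   -- inverse₁
  inv2  : ∀ x : G, e = mul x (inv x)   -- inverse₂
```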
Familiar identities, for example the fact that \(e\) is its own inverse, can be obtained by combining the group axioms. We have a path \({\small\fbox{$\phantom{Q}$inverse$\phantom{Q}$}}_1(e):e =_G e^{-1}\cdot e\) and another path \({\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_1(e^{-1}):e^{-1}\cdot e =_G e^{-1}\). Concatenating these two paths yields the desired path \[{\small\fbox{$\phantom{Q}$inverse$\phantom{Q}$}}_1(e)\cdot{\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_1(e^{-1}):e =_G e^{-1}.\]
There are other ways to see this; for example, symmetric reasoning yields \[{\small\fbox{$\phantom{Q}$inverse$\phantom{Q}$}}_2(e)\cdot{\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_2(e^{-1}):e =_G e^{-1}.\] In general, these two paths need not be the same but since \(G\) is a set, i.e., a 0-type, there is at most one path in \(e =_G e^{-1}\) and therefore the two paths above must be the same.
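The concatenation argument can be checked in Lean, where `Eq.trans` plays the role of path concatenation; the hypothesis names `inv1` and `id1` echo the components \(\mathsf{inverse}_1\) and \(\mathsf{identity}_1\) and are made up for this sketch. Note the direction: the concatenation itself produces a path \(e = e^{-1}\), and `Eq.symm` flips it if the other orientation is wanted.

```lean
-- Concatenating inverse₁(e) : e = e⁻¹·e with identity₁(e⁻¹) : e⁻¹·e = e⁻¹
-- yields a path e = e⁻¹; Eq.symm would flip it to e⁻¹ = e.
example {G : Type} (e : G) (inv : G → G) (mul : G → G → G)
    (inv1 : ∀ x, e = mul (inv x) x)
    (id1  : ∀ x, mul x e = x) :
    e = inv e :=
  Eq.trans (inv1 e) (id1 (inv e))
```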
Let’s try something a tad more complicated — an old favorite — the uniqueness of identity elements. To say that \(u:G\) is a left identity element means exhibiting an element of the type \[\prod_{x:G} (u \cdot x =_G x);\] so \({\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_1\) shows that \(e\) is an identity element. Similarly, \({\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_2\) shows that \(e\) is a right identity element. Classically, we know that if \(u\) is a left identity and if \(v\) is a right identity then we must have \(u = v\). In HoTT, this corresponds to exhibiting an element of type \[\mathsf{lrid}:\equiv\prod_{u,v:G} \Big(\prod_{x:G} (u \cdot x =_G x) \times \prod_{y:G} (y \cdot v =_G y) \to (u =_G v)\Big).\]
To prove this, let’s fix \(u,v:G\). Given \(p:\prod_{x:G} (u \cdot x =_G x)\) and \(q:\prod_{y:G} (y \cdot v =_G y)\) we have paths \[\begin{aligned} p(v) &: u \cdot v =_G v, & q(u)&:u \cdot v =_G u,\end{aligned}\] and therefore a path \[q(u)^{-1} \cdot p(v): u =_G v.\] The classical proof would end here but to get the desired element of \(\mathsf{lrid}\), we need to wrap things up as follows. Since the final path was obtained uniformly from \(p,q\), we have an element \[\lambda p,q.q(u)^{-1} \cdot p(v):\prod_{x:G} (u \cdot x =_G x) \times \prod_{y:G} (y \cdot v =_G y) \to (u =_G v)\] and we can then unfix \(u,v:G\) to obtain the desired element \[\lambda u,v.\lambda p,q . q(u)^{-1} \cdot p(v):\mathsf{lrid}.\]
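The proof term \(\lambda u,v.\lambda p,q.q(u)^{-1} \cdot p(v)\) can be rendered in Lean as follows, again only as an analogue: `mul` stands in for the group operation, `Eq.symm` is path inversion and `Eq.trans` is concatenation.

```lean
-- q u : mul u v = u, so (q u) reversed gives u = mul u v, and
-- p v : mul u v = v; concatenation yields u = v.
example {G : Type} (mul : G → G → G) {u v : G}
    (p : ∀ x, mul u x = x) (q : ∀ y, mul y v = y) : u = v :=
  Eq.trans (Eq.symm (q u)) (p v)
```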
I slightly abused \(\lambda\)-notation in the argument above but the idea was only to give a hint of how the argument could be formalized. In fact, we did not need to worry about this at all. Since \(G\) is a set, there is at most one path in \(u =_G v\) and thus there is also at most one element in \[\prod_{x:G} (u \cdot x =_G x) \times \prod_{y:G} (y \cdot v =_G y) \to (u =_G v)\] by function extensionality (§4.9). Therefore, the unique choice principle (§3.9) applies to give the desired element of \(\mathsf{lrid}\). In fact, \(\mathsf{lrid}\) also has exactly one element by function extensionality.
In the end, the classical proof of the uniqueness of identity elements was sufficient and unambiguous. In fact, the same is true for essentially all similar facts of elementary algebra. This is good for multiple reasons but most importantly this means that doing math in HoTT does not involve getting caught up in elementary stuff like this: you can invoke \(\mathsf{lrid}\) any time without referencing a particular proof since the proof is unique. There is one important caveat:
It is very important that the carrier type of a group is a set!
I fell into this trap when I asked this MathOverflow question. It is very tempting to think that the loop space \(\Omega(A,x) :\equiv(x =_A x)\) is a group for every \(x:A\). This is only true if \(A\) is a 1-type. Otherwise the carrier of \(\Omega(A,x)\) is not necessarily a 0-type; the uniqueness of paths is lost and \(\mathsf{lrid}\) can have many different proofs, all of which are relevant and must not be forgotten!
In conclusion, elementary group theory works fine in HoTT. The general feel is a bit different but it’s more fun than cumbersome to see how the axioms work in proof-relevant mathematics. The same is true for much of elementary algebra and, ultimately, proof relevance is rarely cumbersome since proofs of such elementary facts are almost always unique. It is important to remember that the carrier of an algebraic structure must always be a set for this to work!
In the next installment of HoTT Math, we will continue with elementary field theory, which presents an additional difficulty: we must handle the fact that multiplicative inverses do not always exist…
]]>This preamble will serve to accumulate a table of contents and various conventions and notations that come up along the way. The only prerequisites (or rather corequisites) are the first two chapters of the (free) Homotopy Type Theory book. Further prerequisites and references to later topics in the book will always be indicated where they occur.
We did not set out to write a book. The present work has its origins in our collective attempts to develop a new style of "informal type theory" that can be read and understood by a human being, as a complement to a formal proof that can be checked by a machine. Univalent foundations is closely tied to the idea of a foundation of mathematics that can be implemented in a computer proof assistant. Although such a formalization is not part of this book, much of the material presented here was actually done first in the fully formalized setting inside a proof assistant, and only later "unformalized" to arrive at the presentation you find before you — a remarkable inversion of the usual state of affairs in formalized mathematics.
The danger in writing such a book is to fall into the minefields of logic wars. The authors successfully avoided most of these traps, so logicians from other perspectives can read the book without too much cringing. To avoid unnecessary confusion, I recommend mentally substituting most instances of "set theory" by the more apropos "classical mathematics." Readers from strongly opposing points of view should be prepared for a certain amount of propaganda, which is to be expected in a book written to promote one point of view. Barring these caveats, you will find an enjoyable and well-written book on a very interesting subject. Readers should not be too concerned with the word "homotopy" in the title: homotopy theory is not required background for the book, though some basic knowledge of the ideas of topology and homotopy theory helps to understand the motivation behind certain concepts.
Having addressed all the necessary caveats, let’s talk about why this book is interesting and why you should read it…
The most interesting aspect from my point of view is that HoTT fully supports proof-relevant mathematics, a way of doing mathematics where proofs are real objects that are manipulated on the same level as numbers, sets, functions and all the usual objects of classical mathematics. This is not a brand new idea, logicians have been playing with proofs in this way for a long while, but HoTT brings this idea into the realm of everyday mathematics and that is a major step forward in mathematics.
The key difference from first-order logic is that equality is not primitive. To define a type \(A\) one must also define what equality means for \(A\). Formally, if \(x,y:A\) then \(x =_A y\) (or \(\mathsf{Id}_A(x,y)\)) is another type, an identity type, which can be thought of as made of reasons to identify \(x\) and \(y\). Elements \(p:x =_A y\) are often called "paths" by analogy with topology. Indeed, these paths can be inverted and concatenated to realize symmetry and transitivity of equality, respectively; reflexivity is realized by a path \(\mathsf{refl}_x:x =_A x\). Thus each type \(A\) is actually a groupoid rather than a plain set. In fact, since each \(x =_A y\) is itself a type with its own identity types and so forth, the type \(A\) is actually a kind of \(\infty\)-groupoid.
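The groupoid operations on paths have direct counterparts in Lean: `rfl` is reflexivity, `Eq.symm` inverts a path and `Eq.trans` concatenates two paths. Since Lean’s `Eq` is a proposition, the higher \(\infty\)-groupoid structure is collapsed, so this only illustrates the operations themselves.

```lean
-- Inversion and concatenation of identity proofs: from x = y and y = z
-- we build z = x by concatenating and then inverting.
example {A : Type} {x y z : A} (p : x = y) (q : y = z) : z = x :=
  Eq.symm (Eq.trans p q)

-- Reflexivity provides the constant path at any point.
example {A : Type} (x : A) : x = x := rfl
```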
It is this rich structure associated with each type that permits HoTT to support proof-relevant mathematics. To get a basic feel of how this works, the statement "\(x =_A y\) and \(y =_A z\)" is interpreted via the product type \((x =_A y)\times(y =_A z)\), whose elements are pairs of paths that explain why \(x\) is to be identified with \(y\) and why \(y\) is to be identified with \(z\). Similarly, "\(x =_A y\) or \(y =_A z\)" is interpreted via the coproduct type \((x =_A y) + (y =_A z)\), whose elements are either paths that explain why \(x\) is to be identified with \(y\) or paths that explain why \(y\) is to be identified with \(z\). The catch, as you may have guessed from the last example, is that this form of constructive reasoning is intuitionistic and thus not as familiar to mathematicians.
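In Lean the closest stand-ins are the conjunction and disjunction of identity propositions; since these live in `Prop` they are proof-irrelevant, unlike the genuine product and coproduct types of HoTT, so this is only a rough illustration of the shapes involved.

```lean
-- "and": supply both proofs as a pair.
example {A : Type} {x y z : A} (p : x = y) (q : y = z) :
    (x = y) ∧ (y = z) := ⟨p, q⟩

-- "or": supply one of the two proofs, tagged with its side.
example {A : Type} {x y z : A} (p : x = y) :
    (x = y) ∨ (y = z) := Or.inl p
```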
Interestingly, the learning curve for constructive reasoning appears to be much less steep with HoTT than with other constructive frameworks. One of the reasons is that the topological interpretation of the key concepts is very intuitive but, more significantly, HoTT provides many tools to revert to more familiar territory. The analogue of a plain set in HoTT is a \(0\)-type: a type \(A\) where the identity types \(x =_A y\) always contain at most one path. In other words, these are types where the groupoid structure is trivial and contains no other information than how to handle equality of elements. It is consistent with HoTT that the \(0\)-types form a model of ETCS, a classical theory of sets and functions. Thus, by "truncating" thoughts to \(0\)-types, one can revert to a more familiar classical setting.
It is natural to identify things that are not significantly different. For example, the axiom of extensionality in set theory identifies sets that have the same elements since the elements of a set are all that matter in this context. Extensionality for functions identifies functions that agree on all inputs. Univalence is an indiscernibility axiom in the same spirit: it identifies types that are not significantly different.
To make sense of equality for types, we first need to put them in an ambient type, a universe, with its associated identity types. We can’t have a truly universal type since that directly leads to the usual paradoxes of self-reference. Instead, we have a bunch of universes such that each type belongs to some universe and each universe is closed under the basic type formation rules. Once we have a universe \(\mathcal{U}\) we can talk about equality of types in \(\mathcal{U}\), and because \(\mathcal{U}\) is a type we have a lot of freedom in defining what equality means for types in \(\mathcal{U}\).
This is exactly what the univalence axiom does. It can be stated elegantly: \[(A \simeq B) \simeq (A =_\mathcal{U} B).\] The equivalence relation \({\simeq}\) is similar to isomorphism but it is slightly more permissive. To say \(A \simeq B\) requires the existence of a function \(f:A \to B\) together with \(\ell,r:B \to A\) such that \(\ell \circ f\) is homotopic to \(\mathsf{id}_A\) and \(f \circ r\) is homotopic to \(\mathsf{id}_B\). Given two functions \(f\) and \(g\) of the same dependent product type \(\prod_{x:A} B(x)\), a homotopy from \(f\) to \(g\) is an element of \(\prod_{x:A} (f(x) =_{B(x)} g(x))\). So \(f\) and \(g\) are homotopic if they agree on all inputs, which does not mean that \(f = g\) in the absence of function extensionality.
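The notion of equivalence described here, a map with a separate left and right inverse each up to pointwise identity, can be sketched as a Lean structure. The name `IsEquiv` and its fields are made up for this sketch and deliberately do not match any library definition.

```lean
-- A map f with a left inverse l and a right inverse r, up to
-- pointwise equality (the "homotopies" of the post).
structure IsEquiv {A B : Type} (f : A → B) where
  l    : B → A
  r    : B → A
  linv : ∀ a, l (f a) = a
  rinv : ∀ b, f (r b) = b

-- The identity function is an equivalence: both inverses are id.
example {A : Type} : IsEquiv (id : A → A) :=
  ⟨id, id, fun _ => rfl, fun _ => rfl⟩
```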
In general type theory, \(A \simeq B\) is definitely a good way to say "\(A\) and \(B\) are not significantly different" and thus univalence arguably captures the right indiscernibility axiom for type theory. The surprising fact is that such a strong axiom does not appear to collapse more than it should. The benefits of univalence are interesting and need to be explored further.
I still need to digest the book so this is probably only the first of many posts on HoTT. The next posts will undoubtedly wander deeper into the technical levels of HoTT. There are a few interesting leads but nothing definite yet.
There is only one thing that bugs me right now, which is the way universes are handled in the book. However, since these assumptions do not appear to be crucial for the development of HoTT and there are plenty of alternatives out there, I’m not overly concerned about this at the moment.
I will eventually need to talk about higher inductive types. These are really interesting and I’m happy to see that the book devotes an entire chapter to them. This is a very interesting outgrowth of the project which deserves study even independently of HoTT.
]]>