This however wasn’t eventually enough to make stick and the reason was I wanted to go more applied and maybe one day work on the private sector (the set theory I was into was maybe more philosophical than mathematical as my teacher worked at the department of philosophy). Also the funding is not that easy to get for the foundations as it is for many other subjects. But I admit that it was a close call and would not have given it a try without a teacher that was so excited.
]]>To your last question on the private sector: what I’ve been told is that mathematicians aren’t hired in the private sector because of their choice of courses, but because learning these courses has ‘sculpted’ their thinking into becoming highly efficient problem-solving machines. Set theory still accomplishes that goal, as any other mathematical discipline.
]]>Does anyone even feel remotely embarrassed saying that their set theory research has no foreseeable practical applications? Does anyone feel uncomfortable with graduate students studying set theory while many of these students will eventually need to work in the private sector?
]]>This post is very useful for something (philosophical) I’m writing up now. Is there a way of citing it, or have versions of this appeared in print?
Best,
Neil
]]>A quick follow up on (2.): For sure the consistency strength will be way higher, but I’m wondering if such a j directly *implies* other class principles. So, for example, if I add such a j over NBG, can I get any class comprehension? There’s at least some places where they converge (e.g. in both $\Pi^1_1$-Comp and $NBG + \exists j: M \to V$ you can get a first-order truth definition.
I’m looking forward to speaking to Kameryn (we’ve had a couple of e-mail interchanges already).
]]>1. KM is equiconsistent with KM+, which has the class-choice principle, and so there is no increase in consistency strength to add class-choice. Indeed, it is conservative over KM for first-order assertions, since every KM model has a KM+ realization with the same sets.
2. The embeddings j:M to V, however, have a consistency strength on the order of “Ord is Ramsey” according to the Vickers and Welch result, so this consistency strength does go beyond KM. I think that this axiom is expressible in second-order set theory, since M and j are classes, and so it would fit into the hierarchy, but the strength is beyond KM. (Note that KM itself is weaker in consistency strength than ZFC + there is an inaccessible cardinal, which is fairly mild by large cardinal standards.
3. That is interesting. It seems to me that there is a huge diversity of models of the second-order theory, since most of the variations in first-order set theory carry over to this realm. So we can consider models of GBC with CH or GCH or not or with large cardinals and so on. A difference, however, is to look at how one can make changes in the second-order part of a model, while keeping the same first-order part. My student Kameryn Williams has some results in this perspective.
]]>0 likes
]]>1. I know that strong choice principles for classes are largely independent of KM, is there work examining the consistency strength of these principles? More generally, is there much of a space above KM?
2. I wonder where the Welch embedding $j: M \longrightarrow V$ might fit in here (say added over GBC or KM). While necessarily second-order, that doesn’t clearly fit into the template you’ve got there, and substantially goes beyond any of the theories outlined in terms of consistency strength (in virtue of the large cardinals it implies). [Years ago I had an idea to try and reverse over the existence of such an embedding, but couldn’t get anything other than the very trivial.]
3. More speculatively: Clearly this slots nicely into your Multiverse View. However, do you think there are substantial differences in argumentation for Multiversism between the first-order and second-order claims? It’s interesting because there seems to be less of a `zoo’ of worlds, and you don’t obviously get incompatibility (perhaps because of inattention due to metamathematical quibbles)? Is there a philosophical equivalent of CH and ~CH in the second-order case?
Best,
Neil
]]>0 likes
]]>Indeed, although the mathematical project here is inspired by the issue of actualism versus potentialism, it is not meant to be taken as a piece of potentialist mathematics. The situation here is not unlike a logician who might use classical logic in order to analyze the power of a particular intuitionistic logical system.
My view of the project is that we seem able to use this kind of nonstandard analysis to provide a way in part to make sense of some views, such as ultrafinitism, which otherwise have had such a difficult time to be coherently expressed. What we have here in our potentialist system is a mathematical model that appears to share many of the main features that the ultrafinitists assert about the nature of mathematical reality, and which has a kind of friendly simulation of those views, if one adopts a nonstandard perspective, but of which we can provide a full mathematical account using the actualist mathematics at our disposal.
The conclusion is that our analysis seems to provide reasons to expect the ultrafinitist/potentialist perspective to have S4 as its potentialist validities, while opening the door to S5 for sentences.
In addition, our analysis tends to highlight some issues with philosophical significance, which we haven’t otherwise seen discussed enough. For example, using the arbitrary-extension model of potentialism rather than the end-extension model corresponds to the philosophical idea for an ultrafinitist that one might gain access to some large numbers, without necessarily yet having access to all smaller numbers. For example, perhaps it makes sense for an ultrafinitist to be able to analyze the number googol-plex-bang, which is a truly enormous number yet having a comparatively small description, without yet being committed to the actual existence yet of all the smaller numbers.
In our potentialist system, we nevertheless prove that still S4 is the class of validities for this version of potentialism.
Ultimately, my view of the philosophical value of the mathematical project is that the potentialist systems we are studying seem to exhibit many of the features that the potentialists seem to want, while avoiding the problematic issues, such as the commonly mentioned awkwardness for an ultrafinitist to assert the existence of a largest number. So we feel that by analyzing these potentialist systems, we gain philosophical insight into the potentialist viewpoint.
]]>I may be off the mark, but historically isn’t the statement “Consider the collection of all the models…” precisely of the type that is rejected by those who only want to consider potential infinities ? I thought that they would not want to deal with as an actual object, but only talk about “ for as large as one may whish” (a never-ending process rather than a given object).
If so, then in what way is your work on “the distinction between actual and potential infinity”, rather than on “exploring the structure of all possible models” (possible, not potential) ?
]]>So therefore, if for any two H values the ratio of the
(sin(H*pi/N) * csc(pi/N))^2
is irrational,
then no N-gon is embeddable on an integer lattice in any dimension.
When H=1 and 2 this formula simplifies and the criterion becomes
that
4*cos(pi/N)^2
be irrational. And this indeed is irrational for every N>0 besides 1,2,3,4,6.
J. H. Conway, C. Radin, L. Sadun:
On Angles Whose Squared Trigonometric Functions Are Rational,
Discrete & Computational Geometry 22 (1999) 321-332.
And sure enough, regular triangles and hexagons are embeddable in the 3D integer lattice
on planes x+y+z=constant, despite their non-embeddability in the plane,
and in the plane N=4 is embeddable, and on the line N=1 and 2.
So this completely settles the question for all N in all dimensions.
–Warren D. Smith
=Charlie Sitler
212 755 6666
]]>1 likes
]]>But lest we also forget that part of academic dishonesty is citation inflation. And I hear this from people who are not academic and are slightly familiar with paper writing (“You know, it’s this I’ll cite you and you’ll cite me sort of thing, right?”).
So even if this is not the case, there is a reason to be slightly more careful with these things.
0 likes
]]>But let us not forget that, foremost, citations are there for the author to:
1. give credit;
2. point out a noteworthy read.
p.s.
3rd parties who would analyze Shelah’s papers are likely to come to the ridiculous conclusion that he intentionally insert self-citations.
2 likes
]]>And that’s not a good look for a small circle of mathematicians. And probably some people inside this circle don’t like it either.
(But I’ll reiterate and say that I generally agree with the whole approach to ample bibliography.)
0 likes
]]>Generally speaking, I do tend to include quite a lot of citations of a “see also” nature, simply to save the time of my readers who would otherwise have to redo the literature scan that I already did. Apparently, some anonymous referees forbids this, making life harder for everyone.
1 likes
]]>2 likes
]]>0 likes
]]>Yes, Shore’s argument is the stepping-up $\aleph_1\nrightarrow[\aleph_0 : \aleph_1]^2_{\aleph_1}$ to $\aleph_2\nrightarrow[\aleph_1]^3_{\aleph_1}$ using a Kurepa tree. I bet Hajnal knew this argument as well, but in the Erdos-Hajnal-Mate-Rado book (Theorem 51.2), this is attributed to (Shore, 1974).
Generally speaking, the challenge of this tutorial is to deliver as many ideas as possible while keeping a reasonable flow. To obtain such a flow, one sometimes needs to connect disjoint components. In any case, I have updated the slides to include a citation on page 14, and changed the hypothesis of Shore’s theorem to better reflect what was known back in 1974.
There are of course many additional ideas that did not make it into this 3-lectures tutorial. I may expend these slides in the future, but not before I am done with a couple of previous commitments.
0 likes
]]>https://meta.mathoverflow.net/a/3278/30967
Please do not get too upset about the “forced time out”, personally I would have given you the advice to step back a bit until the moderators have sorted out and corrected things too 😉
I generally think MathOverflow is a great gem, full of immensely bright and nice cool people who are very helpful to each other.
And you as a high-rep user are plainly among those nice people, thinking about cool interesting highly non-trivial topics and generously sharing your knowledge on MO.
So please don’t let a few brainless trolls spoil it for you.
Best wishes
Dilaton
This is not really an application but just that this fact was exploited here.
]]>I guess we can formulate this notion as the filter being rigid. Then one of the first obvious question would be, are there rigid filters on $\omega$ which are not ultrafilters?
]]>The older one is here:
http://projecteuclid.org/euclid.bsl/1182353825
I hope you’ll be posting slides, and do you know if there will be a live feed (or video for later)? I’d love to watch along as I did last year on Joel’s 50th.
]]>So, it took a bit longer than expected to get this ship-shape, but I’m basically there. You can find the ctms paper here:
and the coding of extensions paper here:
https://www.academia.edu/31683677/Universism_and_Extensions_of_V
There will probably be a few tweaks to be made, but its getting there. The former is submitted, and the latter will be soon!
Best,
Neil
]]>Where I grew up in Mtl is unusual: right downtown, at the corner of City Councillors and Sherbrooke St (1951-1960). Hard to believe there were attached single-family houses on that street!
]]>But I ended up at Smith College, and then Macalester College, and am now retired in Colorado (at 10,000 feet elev).
]]>
(defun ancestors (node) ;; => list
"Returns the list of strict ancestors of NODE."
(when (node-parents node)
(remove-duplicates
(apply #'append
(node-parents node)
(map 'list #'ancestors (node-parents node))))))
(defun graph-implies-p (X Y)
(member X (cons Y (ancestors Y))))
I say I’m not convinced because `graph-implies-p` is only , and N is only 427. Most time is spent on `graph-implies-not-p`, which also searches direct descendants/ancestors first, but seems to be necessarily . What I think might make a difference (when drawing very large graphs) is only graphing direct implications of the subgraph, instead of graphing everything and having `dot tred` remove the extra edges. Perhaps depths will help there. Correctness of the predicates is most important, so I’m reluctant to complicate the code. Since very large graphs seem to only have an aesthetic use (they aren’t really readable), taking care of this is not high in my priority list. :)
Could you elaborate on your suggestion to ‘track some data and create some caching mechanism for “common forms”‘?
]]>And this whole thing sends me back to my freshman year, learning about BFS and DFS algorithms. In this case, I’d go with a mix of the two, breaking the depth into chunks of several implications at a time (ranked from “most directly implied” to least) and going through each of them BFS-style.
]]>But of course that this decision has to be ratified by a majority of 3/4th before we can even approach the Vogons with this offer, which will then require several dozen forms and committees to process this through until they are able to debate whether or not they want to populate this additional committee for deciding what gets into the Choiceless Dictionary.
]]>We can test such a feature for CGraph with another (smaller?) dataset of implications, for example large cardinals in ZFC? Or even reverse mathematics, but I bet that’s definitely not a smaller dataset.
I suppose the issue would be mostly who gets to and has time to decide which forms should be added and which results are incorporated.
]]>(Mike- I cleaned up the link.)
]]>The argument above gives a counterexample assuming the existence of some statement $\phi$ such that $\phi \lor \lnot\phi$ is not true. In that case, the set $\{p,\{0\}\}$ is not decidable since $p = \{0\}$ is equivalent to $\phi$. Of course, and this is the main point of my post, this example uses extensionality.
]]>For point-free topology, every space does have such a core. In point-free topology, every frame $L$ has a smallest dense sublocale $\mathrm{Ro}(L)$ consisting of all regular open elements. One should think of $Ro(L)$ as something like the “intersection” of all dense open elements in $L$. The sublocale $\mathrm{Ro}(L)$ is a complete Boolean algebra, and every complete Boolean algebra is a completely regular frame (and much more).
If $L$ is a compact hausdorff frame, then one can endow the frame $\mathrm{Ro}(L)$ with extra structure that makes $\mathrm{Ro}(L)$ into a De-Vries algebra $(\mathrm{Ro}(L),\preceq)$. The relation $\preceq$ should be thought of as a proximity on the frame $Ro(L)$. One can recover the compact Hausdorff frame $L$ from the De-Vries algebra $(\mathrm{Ro}(L),\preceq)$ since the category of all De Vries algebras is equivalent to the category of all compact Hausdorff frames which is equivalent to the category of all compact Hausdorff spaces.
There does not seem to be any canonical way of taking a canonical dense subspace of a completely regular space without resorting to point-free topology though.
]]>If I had to guess, it would be the thesis that HOTT can replace ZFC as an adequate “foundation for mathematics”?
]]>All this from one featured image, exactly! :) Just imagine what will come out while staring at the wallpaper version of the full diagram! :D
]]>It’s not clear to me why 104 and 182 are not equivalent (cofinality of an ordinal is always regular).
There is a parenthesis mismatch in 58.
It’s not hard to show that 365 implies 170, but not vice versa.
108 seems a bit odd, since $\omega_2$ is itself never the countable union of countable sets (it could be the countable union of countable unions of countable sets), so it’s power set certainly isn’t either.
And that’s all from the image appearing on this post… :)
]]>Please do comment with any observations on this featured diagram if you see something!
]]>Back to the origin, when I came back to Hanover, New Hampshire. I’ve now moved to Burlington, Vermont, which is roughly half way between Hanover and Montréal, Québec, which is my true origin. I’m very happy to start a new position at the University of Vermont!]]>
The book by Hindman&Strauss has a section on filters and compactifications in a later chapter — that might be suitable for 2).
]]>Possibly relevant is my October 2004 sci.math post “Generalized Quantifiers” (URLs below). FYI, the Math Forum version has a lot of strange formatting errors. See also Brian Thomson’s 1985 book “Real Functions”, and see Thomson’s earlier 2-part survey Derivation bases on the real line (which contain examples and side-detours not in his book).
google sci.math URL:
https://groups.google.com/forum/#!msg/sci.math/rhZEhXynVLQ/MI0MJ0ZQIvoJ
Math Forum sci.math URL:
http://mathforum.org/kb/message.jspa?messageID=3556191
Your theorem is an example of an existential sentence about cardinals in the language with only \(\lt \) and exponentiation. Can you determine which sentences in that language are provable in ZF? More generally, expand the set of sentences about cardinals considered, obviously to include addition and multiplication, and perhaps alternating quantifiers.
After Peter Krautzberger’s tiny blogging challenge, I decided to spend 30 minutes thinking and writing about this…
I will allow myself some liberties in interpreting Friedman’s open ended question. The celebrated solution of Hilbert’s Tenth Problem by Davis, Matiyasevich, Putnam and Robinson shows that the existential theory of the finite cardinal numbers (even without exponentiation) is as complex as it could be: it is \(\Sigma_1\)-complete. I’ll interpret Friedman’s question as asking whether the existential theory of all cardinal numbers could be simpler than that of finite cardinals.
My gut instinct is: of course not! But this is not a trivial question. For example asking whether \(p^n + q^n = r^n\) has a solution where \(2 \lt n\) and \(0 \lt p,q,r\) has proven extremely challenging for finite cardinals but there are plenty of solutions with infinite cardinals!
One plan of attack is to try to define the finite cardinals among all cardinalsusing an existential formula. To get started, I’ll start with a rich language including constants \(0\), \(1\), relations \(=\), \(\lt \) as well as addition, multiplication and exponentiation. Assuming the Axiom of Choice, finite cardinals are defined by the simple quantifier-free formula \[\mathfrak{p} \lt \mathfrak{p}+1.\] Without assuming choice, this only defines the Dedekind finite sets. Fortunately, if \(\mathfrak{p}\) is infinite then \(2^{2^{\mathfrak{p}}}\) is never Dedekind finite. So the finite sets are always described by the quantifier-free formula \[2^{2^{\mathfrak{p}}} \lt 2^{2^{\mathfrak{p}}} + 1.\] This shows that the existential theory of all cardinals in this rich language is as complex as that of finite cardinals.
Do things change if we restrict the language? Indeed, the first part of Friedman’s question only mentions exponentiation. For finite cardinals, we can recover addition and multiplication using familiar rules of exponents: \[\begin{aligned}
a \times b &= c &&\iff& (2^a)^b = 2^c; \\
a + b &= c &&\iff& (2^{2^a})^{2^b} = 2^{2^c}. \\
\end{aligned}\] We can even eliminate \(0\) and \(1\) since any finite base other than \(0\) and \(1\) will do instead of \(2\). So, for example, \[a \times b = c \iff \exists x,y,z(x \lt y \land y \lt z \land (z^a)^b = z^c).\] Therefore, the existential theory of finite cardinals with just exponentiation is just as complex as that with addition and multiplication.
Of course, we can’t use this to recover addition and multiplication of infinite cardinals using the methods we used above, but if we can define the class of finite cardinals then we can use the above to recover addition and multiplication for these cardinals. Can we define finite cardinals using only \(=\), \(\lt \) and exponentiation? Well, the finite cardinals are the only ones that satisfy the monstrous inequality \[2^{2^{2^{2^{\mathfrak{p}}}}} \lt \left(2^{2^{2^{2^{\mathfrak{p}}}}}\right)^2.\] We can then eliminate \(2\) using the same trick as above to get an existential definition: \[\exists\mathfrak{q},\mathfrak{r},\mathfrak{s}\left(\mathfrak{q} \lt \mathfrak{r} \land \mathfrak{r} \lt \mathfrak{s} \land \mathfrak{s}^{\mathfrak{s}^{\mathfrak{s}^{\mathfrak{s}^{\mathfrak{p}}}}} \lt \left(\mathfrak{s}^{\mathfrak{s}^{\mathfrak{s}^{\mathfrak{s}^{\mathfrak{p}}}}}\right)^{\mathfrak{s}}\right).\] So my gut instinct was right: the existential theory of all cardinals using only \(=\), \(\lt \) and exponentiation is at least as complex as that of finite cardinals!
This took more than 30 minutes but it didn’t take much more and it was a lot of fun! Hopefully, the time constraint didn’t impact the content quality too much…
]]>I’m looking forward to meeting you in Munich!
]]>Regarding Moving around every few years for postdocs would be exhausting
, I wholeheartedly agree! Every such decision has pros and cons. They are usually not obvious at the outset. Moreover, they change over time. The lesson I learned though my own course is that every decision is right, at least at the time you make it, and there is never any point regretting it later on… Just keep on doing what you do best all the time!
We went on to look at the paradoxical situations that arise when someone gets a positive test result for a rare disease. Should they be worried? If the test is 99% accurate, but the disease occurs in say, 1 in million, then a positive result is not so worrisome: in million people, there will be about 1 true positive result, and about 10,000 false positives, since 1% of a million is 10,000. So the odds that you’ve actually got the disease, given that you tested positive, is 1 in 10,000.
But your situation is completely different! It would be as though we had calculated the odds of getting HHHTTT, and then when I actually flipped the coin on stage, i actually got the same pattern HHHTTT. Totally weird! And very unlikely. But you know, if it wasn’t that, it would have been some other totally unlikely thing, like getting all green lights, or all red lights, or getting the serial number 123456 on your receipt at Starbucks.
]]>The issue with Bauer’s description is the type family \(\tau.\) Formally, a type family indexed by a type \(\mathcal{U}\) is an element \(\tau:\mathcal{U}\to\mathcal{V},\) where \(\mathcal{V}\) is some larger universe. This type \(\tau\) becomes a type family only after applying Tarski-style coercion associated to \(\mathcal{V}\) that promotes elements of \(\mathcal{V}\) to actual types. On the face of it, this is circular: \(\tau\) is the Tarski-style coercion associated to the universe \(\mathcal{U},\) which is defined in terms of that of \(\mathcal{V},\) which…
One way to break this circularity is to have a designated super universe \(\mathcal{U}^\ast\) — hence the name Super HoTT — whose associated Tarski-style coercion and associated machinery are implemented in the formal system itself. In particular, \(\mathcal{U}^\ast\) is associated to a primitive decoding function \(T\) such that \(T(A)\ \mathsf{type}\) for any element \(A:\mathcal{U}^\ast.\) In Super HoTT, this is the only such decoding function. Other universes arise à la Bauer, i.e., as structures \((\mathcal{U},\tau,\pi,\sigma,\ldots)\) where now \(\tau:\mathcal{U}\to\mathcal{U}^\ast\) is a type family that realizes this universe’s Tarski-style coercion via composition with the super universe’s decoding function \(T.\) In fact, since there is only one super universe, the Russell-style approach doesn’t lead to many problems. In other words, there is little harm in suppressing the decoding function \(T\) in favor of a simple Russell-style rule \[\frac{A:\mathcal{U}^\ast}{A\ \mathsf{type}}.\] In fact, the \(\mathsf{type}\) judgment is often informally confounded with an actual universal type \(\mathsf{Type}.\) This type of identification can lead to serious issues such as Girard’s Paradox; the super universe approach is one of the many safe ways around these problems.
So far, this is pure type theory, we haven’t said anything HoTT specific. To make this a flavor of HoTT, we postulate that there are enough univalent universes or even lots of them. Universes à la Bauer are mere structures, they can be univalent or not, they can have extra structure or lack some structure. Univalent universes can be identified among these structures so we can formalize a rule saying that any two \(A,B:\mathcal{U}^\ast\) are contained in a univalent universe. One can also formalize rules saying that one can have cumulative univalent universes, cumulative univalent universes with propositional resizing, cumulative classical univalent universes, etc. The possibilities for specialized flavors of Super HoTT are vast and varied!
Note that the super universe doesn’t have to be univalent. It doesn’t need to have much fancy structure at all. It’s not even necessary for \(\mathcal{U}^\ast\) to have a natural number type. All of that can happen inside the univalent universes à la Bauer. In fact, a bare-bones super universe is probably more flexible framework than a rich one. Different choices of structure for the super universe and how it interacts with the universes inside it will lead to different flavors of Super HoTT with variable strength. Richer super universe structure allows one to express stronger and stronger properties for the univalent universes it contains.
One advantage of this approach is that universes can arise "on their own" in the sense that any suitable structure \((\mathcal{U},\tau,\pi,\sigma,\ldots)\) is a universe, no matter how it comes about. Though this can be a mixed blessing if one wants more hands-on control over which types are universes. One potential drawback of Super HoTT is that not all types can belong to univalent universes. In particular, the super universe \(\mathcal{U}^\ast,\) cannot belong to any univalent universe inside \(\mathcal{U}^\ast.\) In this way, Super HoTT is more like \(\mathsf{NBG}\) or \(\mathsf{MK}\) than \(\mathsf{ZF}.\) For ontological reasons, it might be preferable to have a more self-contained system analogously to \(\mathsf{ZF}.\) In any case, Super HoTT is one of the more flexible formal ways of handling universes in HoTT and it is a perfectly viable alternative to the formal system presented in the appendix of the HoTT book.^{2}
E. Palmgren, On universes in type theory. In Twenty-five years of constructive type theory (Venice, 1995), Oxford Logic Guides 36, Oxford University Press, New York, 1998, pages 191–204.
Univalent Foundations Program, Homotopy Type Theory: Univalent Foundations of Mathematics, Institute for Advanced Study, Princeton, 2013.
Also note that the issues I bring up are mostly irrelevant to conducting everyday mathematics in HoTT. While this post is largely for people interested in technical aspects of HoTT, I will nevertheless try to keep the the post at a medium level of technicality since these aspects of HoTT do come occasionally up in everyday mathematics.
In section 1.3, the HoTT book introduces universes as follows:
[W]e introduce a hierarchy of universes \[\mathcal{U}_0 : \mathcal{U}_1 : \mathcal{U}_2 : \cdots\] where every universe \(\mathcal{U}_i\) is an element of the next universe \(\mathcal{U}_{i+1}\). Moreover, we assume that our universes are cumulative, that is that all the elements of the \(i\)^{th} universe are also elements of the \((i+1)\)^{st} universe, i.e. if \(A : \mathcal{U}_i\) then also \(A : U_{i+1}.\)
In the comment discussion mentioned in the preamble, I expressed concerns with the nature of the subscripts in this description. As a result of this discussion, some minor clarification were made to the book. In particular, the following paragraph was added at the end of section 1.3:
As a non-example, in our version of type theory there is no type family "\(\lambda (i:\mathbb{N}).\,\mathcal{U}_i\)". Indeed, there is no universe large enough to be its codomain. Moreover, we do not even identify the indices \(i\) of the universes \(\mathcal{U}_i\) with the natural numbers \(\mathbb{N}\) of type theory (the latter to be introduced in §1.9).
And further remarks were added in the notes to chapter 1. The purpose of this post is to explain why this matters and also to explain how fiddling with the structure of universes affects HoTT. I will focus on Classical HoTT (HoTT with the Axiom of Choice for sets and hence the Law of Excluded Middle for propositions) though comparable issues arise without these assumptions.
To illustrate the issue, let’s peek at chapter 10 and in particular at section 10.5, where the book describes the interpretation of the cumulative hierarchy of sets inside HoTT. In any universe \(\mathcal{U}_i\) one can find a type \(V_i\) that faithfully interprets the cumulative hierarchy and (since we are working in Classical HoTT) Theorem 10.5.11 shows that \(V_i\) is a model ZFC. In particular, HoTT proves the consistency of ZFC. In fact, HoTT proves the consistency of even stronger theories! With some work, one can show that each \(V_i\) is a level of \(V_{i+1}\) whose ordinal rank \(\kappa_i\) is an inaccessible cardinal in \(V_i.\) Therefore, for each \(i,\) Classical HoTT proves the consistency of \(\mathsf{ZFC}+ \mathsf{IC}_i,\) where \(\mathsf{IC}_i\) abbreviates the statement "there are at least \(i\) inaccessible cardinals."
Does this mean that Classical HoTT proves \(\forall i\,\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_i)\)? No, it doesn’t. To understand this, remember that \(\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_i)\) is a purely arithmetical statement which basically states that there is no proof of \(0 = 1\) from the axioms of \(\mathsf{ZFC}+ \mathsf{IC}_i.\) Hence \(\forall i\,\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_i)\) is also an arithmetical statement. The only way for HoTT to make sense of this is for \(i\) to range over the type of natural numbers \(\mathbb{N}.\) This is explicitly excluded by the remark quoted above from the end of section 1.3.
The reason for this exclusion becomes more visible if we use an alternative way to handle universes, replacing the hierarchy with a rule which simply says that any two types are contained in a common universe. More precisely (but still suppressing many technical details) given any two types \(A,B\) we can introduce a universe type \(\mathcal{U}\) such that \(A, B : \mathcal{U}.\) The above argument showing that Classical HoTT proves the consistency of \(\mathsf{ZFC}+ \mathsf{IC}_i\) requires \(i\) applications of this rule in order to obtain a sufficiently long chain of universes. Since proofs are finite, a proof can only use the above universe rule finitely often. Therefore, this process does not lead to a proof of \(\forall i:\mathbb{N}\,\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_i)\) in Classical HoTT.
In order to get an actual proof of \(\forall i:\mathbb{N}\,\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_i)\) in we need to have a hierarchy of universes indexed by \(\mathbb{N}.\) More formally, we need a type family \(\lambda i:\mathbb{N}\,.\,\mathcal{U}_i\) which will allow us to use induction on \(\mathbb{N}\) to obtain an actual proof of \(\forall i:\mathbb{N}\,\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_i).\) Doing so, we inadvertently obtain more than we set out. The type family \(\lambda i:\mathbb{N}\,.\,\mathcal{U}_i\) must live in some universe \(\mathcal{U}\) and the cumulative hierarchy \(V\) associated with this \(\mathcal{U}\) satisfies \(\mathsf{ZFC}+ \mathsf{IC}_\omega,\) where \(\mathsf{IC}_\omega\) says that there are infinitely many inaccessible cardinals. Therefore Classical HoTT with the hypothesis of such a family \(\lambda i:\mathbb{N}\,.\,\mathcal{U}_i\) proves \(\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_\omega),\) which is much stronger than what we set out to do.
In fact, Classical HoTT with \(\lambda i:\mathbb{N}\,.\,\mathcal{U}_i\) proves \(\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_{\omega+1}),\) \(\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_{\omega+2}),\) and so on. Why stop there? What if we have a family \(\lambda i:W\,.\,\mathcal{U}_i\) for every wellordering \(W\)? Can we go beyond? These "large universe hypotheses" are direct analogues of large cardinal hypotheses from set theory. These may seem frivolous at first but such hypotheses necessarily come into play to answer concrete mathematical questions in Classical HoTT such as whether it is possible to extend Lebesgue measure to all subsets of \(\mathbb{R}.\)
Let me use this opportunity to point out a feature of the alternative way to handle universes using a rule as outlined above: the rule implies that every type belongs to some universe. The book handles this with this brief remark in section 1.3:
When we say that \(A\) is a type, we mean that it inhabits some universe \(\mathcal{U}_i.\)
This is a clever workaround that unfortunately has some side effects. There is no way to formally say "every type belongs to some \(\mathcal{U}_i\)." The universes \(U_i\) are named constants which means that there is no way to quantify over them. Thus effect of the remark is not to prohibit other types but only to declare anything else to be an "illegetimate" type. The intent is to simplify technical issues but it makes types like \(\lambda i:\mathbb{N}\,.\,U_i\) illegitimate and effectively blocks the possibility of large universe hypotheses as described above.
Mike Shulman suggested a band-aid for this problem in the discussion mentioned earlier, pointing out that there could be universes, even legitimate ones, other than the named universes \(\mathcal{U}_i.\) For example, we could introduce an infinite hierarchy of universes in \(\mathcal{U}_{17}\) if we wanted to. This is true but I do not find this fix fully satisfactory. Universes are very special types that are equipped with a lot of technical machinery to allow elements of universes to become actual types and much more. The book mostly suppresses this machinery, and rightfully so since there are many options which are all very technical and not particularly enlightening. Nevertheless, this machinery is implicitly present and it does not appear on its own. My reading of the syntactic hierarchy used in the book is that this machinery is instantiated for each named universe \(\mathcal{U}_i,\) specifically. To say that there might be other universe suggests that players don’t have all the rules of the game, which is disconcerting. My alternative proposal with a rule to introduce a universe containing any two types implicitly assumes that there is some form of additional judgment for universes (or similar device) and that this judgment instantiates all the machinery to make a universe. I find this kind of approach more satisfactory since it allows one to add hypotheses such as the existence of \(\lambda i:\mathbb{N}\,.\,\mathcal{U}_i\) without changing the rules of the game.
Let me close by mentioning that there are plenty of alternatives for these issues. For example, it is not necessarily desirable for every type to belong to a universe. While formalizing a proof \(\forall i:\mathbb{N}\,\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_i)\) above, we accidentally proved a lot more because \(\lambda i:\mathbb{N}\,.\,\mathcal{U}_i\) had to live in some universe. If \(\lambda i:\mathbb{N}\,.\,\mathcal{U}_i\) doesn’t live in a universe, but induction on \(\mathbb{N}\) still works for such types, we can still prove \(\forall i:\mathbb{N}\,\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_i)\) but we can’t necessarily prove \(\operatorname{Con}(\mathsf{ZFC}+ \mathsf{IC}_\omega)\) as we did above. In fact, there is no need for a single solution: \(\mathsf{ZFC},\) \(\mathsf{NBG},\) \(\mathsf{MK}\) and other set theories have always coexisted pleasantly. However, there is a need to discuss these solutions and to appropriately identify them so that there is no pointless confusion when such aspects of HoTT do matter.
]]>The main issue with fields is to correctly handle the implicit negation in the term nonzero. Since the natural logic of HoTT is intuitionistic rather than classical, this is much more subtle than one would expect.
One issue with the basic commutative ring theory we developed last time is that there is a one-element ring. This is a feature of every equational theory since all identities are satisfied in the trivial algebra with one element. The problem is that equational logic doesn’t have negation so there is no way to state a nontriviality axiom like \(\mathsf{O}\not\approx\mathsf{I}\) in this context. These issues are not new and not special to HoTT but the problem is amplified since logic in HoTT is may have more truth values than just true and false. Consequently, negation in HoTT always needs a little more care.
The usual way to say \(\mathsf{O}\not\approx\mathsf{I}\) is the strongest one, namely that \(\mathsf{O}=_R \mathsf{I}\) is the empty type \(\mathbf{0},\) or: \[{\small\fbox{$\phantom{b}$nontrivial$\phantom{q}$}} :\equiv(\mathsf{O}=_R \mathsf{I}) \to \mathbf{0}.\] This is stronger than just saying that \(\mathsf{O}=_R \mathsf{I}\) isn’t inhabited and it may exclude more than just the one element ring in the absence of the law of excluded middle [§3.4].
In general, we define inequality by \[(x \neq_R y) :\equiv(x =_R y) \to \mathbf{0}.\] So we can simply write \(\mathsf{O}\neq_R \mathsf{I}\) instead of \({\small\fbox{$\phantom{b}$nontrivial$\phantom{q}$}}.\) To see this in action, let’s look at nonunits. The usual definition gives \[\operatorname{\mathsf{nonunit}}(x) :\equiv\prod_{y:R} (x \cdot y \neq_R \mathsf{I}).\] The recursion and induction rules for \(\Sigma\)-types [§1.6] show that \(\operatorname{\mathsf{nonunit}}(x)\) is equivalent to the negation \(\lnot\operatorname{\mathsf{unit}}(x) :\equiv\operatorname{\mathsf{unit}}(x) \to \mathbf{0}.\)^{1} In classical mathematics, a local ring is one where the nonunits form an ideal, which is then necessarily the unique maximal ideal of the ring. Assuming \({\small\fbox{$\phantom{b}$nontrivial$\phantom{q}$}},\) we see that \(\operatorname{\mathsf{nonunit}}(\mathsf{O})\) is inhabited. It is also easy to see that \[\prod_{x,y:R} (\operatorname{\mathsf{nonunit}}(x)+\operatorname{\mathsf{nonunit}}(y) \to \operatorname{\mathsf{nonunit}}(x \cdot y))\] is always inhabited. Thus, all that is missing is the axiom: \[{\small\fbox{$\phantom{b}$local${}^-\!\!\phantom{q}$}} :\equiv\prod_{x,y:R} (\operatorname{\mathsf{nonunit}}(x)\times\operatorname{\mathsf{nonunit}}(y) \to \operatorname{\mathsf{nonunit}}(x+y)).\] Classically, fields are local rings where the maximal ideal has only one element \(\mathsf{O}.\) So a reasonable axiom for fields is \[{\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}} :\equiv\prod_{x:R} (\operatorname{\mathsf{nonunit}}(x) \to x =_R \mathsf{O}).\] (Note that \({\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}\) implies \({\small\fbox{$\phantom{b}$local${}^-\!\!\phantom{q}$}}.\)) Classically, these are indeed the fields we know and love but things break down in an intuitionistic setting. If these were fields, then we should be able to conclude that nonzero elements are units. However, from \({\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}\) we can only conclude that \(x \neq_R \mathsf{O}\to \lnot\operatorname{\mathsf{nonunit}}(x)\) and the conclusion there is the double negative \(\lnot\lnot\operatorname{\mathsf{unit}}(x)\) rather than the positive statement \(\operatorname{\mathsf{unit}}(x).\) The fact is that we cannot conclude that every nonzero element is a unit from this axiom without using the law of excluded middle! A similar issue arises with local rings as defined above: the axiom \({\small\fbox{$\phantom{b}$local${}^-\!\!\phantom{q}$}}\) does not imply the property that if \(x + y\) is a unit then at least one of \(x\) and \(y\) is a unit.
The solution to this isssue is to reverse the relation between equality and inequality. What if instead of inequality being the negation of equality we had that equality was the negation of inequality? This may sound counterintuitive at first but this is often how things work. Consider the case of two computable real numbers \(a\) and \(b,\) that is real numbers for which we have an algorithm which on input \(k\) gives us a rational approximation within \(2^{-k}\) of the real number in question. How do I determine whether \(a\) and \(b\) are equal?
So, for computable real numbers, inequality is more primitive than equality!
This leads us to apartness relations: a binary relation \(\mathrel{\rlap{\neq\,}{\,\neq}_{}}\) that satisfies the following two axioms \[\begin{gathered}
\lnot(x \mathrel{\rlap{\neq\,}{\,\neq}_{}}x) \\
x \mathrel{\rlap{\neq\,}{\,\neq}_{}}y \to x \mathrel{\rlap{\neq\,}{\,\neq}_{}}z \lor y \mathrel{\rlap{\neq\,}{\,\neq}_{}}z
\end{gathered}\] These axioms ensure that the negation of \(x \mathrel{\rlap{\neq\,}{\,\neq}_{}}y\) is an equivalence relation; a tight apartness relation is one where \[\lnot(x \mathrel{\rlap{\neq\,}{\,\neq}_{}}y) \to x = y\] and thus the negation of \(x \mathrel{\rlap{\neq\,}{\,\neq}_{}}y\) is equality. In classical logic, every apartness relation is the negation of an equivalence relation but this is not true intuitionistically. In fact, the negation of equality is not necessarily an apartness relation!
Without further ado, let us state exactly how we can define local rings using apartness. For example, a more positive way to define local rings is that \[x \mathrel{\rlap{\neq\,}{\,\neq}_{R}} y :\equiv\operatorname{\mathsf{unit}}(x – y)\] is an apartness relation. The first apartness axiom \(\lnot(x \mathrel{\rlap{\neq\,}{\,\neq}_{R}} x)\) actually follows from \({\small\fbox{$\phantom{b}$nontrivial$\phantom{q}$}}\) since \(\operatorname{\mathsf{unit}}(\mathsf{O}) \to (\mathsf{O}=_R \mathsf{I}).\) The second apartness axiom boils down to to the fact discussed above that if \(x+y\) is a unit then at least one of \(x\) and \(y\) is a unit. It is tempting to use the type \[\prod_{x:R} (\operatorname{\mathsf{unit}}(x+y) \to \operatorname{\mathsf{unit}}(x) + \operatorname{\mathsf{unit}}(y))\] to describe this but this doesn’t work because of the exclusive nature of sum types. Instead, we must use the propositional truncation \(\Vert\operatorname{\mathsf{unit}}(x) + \operatorname{\mathsf{unit}}(y)\Vert,\) which is how the logical disjunction \(\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)\) is defined [§3.7]. Thus a strongly local ring is a a nontrivial ring that satisfies \[{\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}} :\equiv\prod_{x,y:R} (\operatorname{\mathsf{unit}}(x+y) \to \operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)).\]
Before going further with these ideas, let’s practice some our intuitionistic reasoning by verifying that \[{\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}} \to {\small\fbox{$\phantom{b}$local${}^-\!\!\phantom{q}$}}.\] Actually, we won’t be practicing our intuitionistic reasoning, we’ll just verify our logic by thinking in terms of functions between types rather than logical implication between propositions. Although an implication is not always equivalent to its contrapositive in intuitionistic logic, we do always have that \[(P \to Q) \to (\lnot Q \to \lnot P).\] In terms of types and functions, this the special case of \[(A \to B) \to ((B \to C) \to (A \to C)),\] where \(A :\equiv{P},\) \(B :\equiv{Q},\) \(C :\equiv\mathbf{0}.\) This looks even more familiar when uncurried: \[(A \to B) \times (B \to C) \to (A \to C).\] Thus, we see that \[{\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}} \to (\lnot(\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)) \to \lnot\operatorname{\mathsf{unit}}(x+y)).\] Now recall that the disjunction \(\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)\) is defined as the propositional truncation \(\Vert\operatorname{\mathsf{unit}}(x) + \operatorname{\mathsf{unit}}(y)\Vert\) [§3.7]. This means that \(\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)\) is a proposition such that \[\operatorname{\mathsf{unit}}(x) + \operatorname{\mathsf{unit}}(y)) \to P \quad\simeq\quad (\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)) \to P\] for any proposition \(P.\)^{2} Since \(\mathbf{0}\) is a proposition, we see that \[\lnot\operatorname{\mathsf{unit}}(x) \times\lnot\operatorname{\mathsf{unit}}(y) \ \simeq\ (\operatorname{\mathsf{unit}}(x) + \operatorname{\mathsf{unit}}(y)) \to \mathbf{0}\ \simeq\ \lnot(\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)),\] where the left equivalence is the special case \(A :\equiv{P},\) \(B :\equiv{Q},\) \(C :\equiv\mathbf{0}\) of \[(A + B) \to C \quad\simeq\quad (A \to C) \times (B \to C).\] Finally, recombining the above, we obtain \[{\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}} \to (\lnot\operatorname{\mathsf{unit}}(x)\times\lnot\operatorname{\mathsf{unit}}(y) \to \lnot\operatorname{\mathsf{unit}}(x+y)),\] which is equivalent to \({\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}} \to {\small\fbox{$\phantom{b}$local${}^-\!\!\phantom{q}$}}\) because \(\lnot\operatorname{\mathsf{unit}}(x) \leftrightarrow\operatorname{\mathsf{nonunit}}(x).\)
Analogously to positive axiom for local rings, the positive axiom for fields is \[{\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}} :\equiv\prod_{x:R} (\operatorname{\mathsf{unit}}(x) \lor x =_R \mathsf{O}),\] which implies both \({\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}}\) and \({\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}.\) To see that \({\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}} \to {\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}},\) we will make use of the distributive law \[(P_1 \lor P_2) \land Q \quad\leftrightarrow\quad (P_1 \land Q) \lor (P_2 \land Q),\] which follows from the general equivalence \[(A_1 + A_2) \times B \quad\simeq\quad A_1 \times B + A_2 \times B\] by taking propositional truncations. In particular, \[(\operatorname{\mathsf{unit}}(x) \lor x =_R \mathsf{O}) \times \operatorname{\mathsf{unit}}(x+y) \leftrightarrow\operatorname{\mathsf{unit}}(x)\times\operatorname{\mathsf{unit}}(x+y) \lor (x =_R \mathsf{O}) \times \operatorname{\mathsf{unit}}(x+y).\] Since \(\operatorname{\mathsf{unit}}(x) \times \operatorname{\mathsf{unit}}(x+y) \to \operatorname{\mathsf{unit}}(x)\) and \((x =_R \mathsf{O}) \times \operatorname{\mathsf{unit}}(x+y) \to \operatorname{\mathsf{unit}}(y),\) we see that \[(\operatorname{\mathsf{unit}}(x) \lor x =_R \mathsf{O}) \times \operatorname{\mathsf{unit}}(x+y) \to (\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y))\] and it follows that \({\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}} \to {\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}}.\) To see that \({\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}} \to {\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}},\) we will make use of the fact that \[(P \lor Q) \land \lnot Q \to P.\] This also follows from the distributive law above. Indeed, \[(P \lor Q) \land \lnot Q \ \leftrightarrow\ (P \land \lnot Q) \lor (Q \land \lnot Q) \ \leftrightarrow\ P \land \lnot Q\] because \(Q \land \lnot Q\) is \(\mathbf{0}.\) Note that \({\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}}\) is stronger than \({\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}}\times{\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}\) since it implies that \[\prod_{x,y:R} (x \mathrel{\rlap{\neq\,}{\,\neq}_{R}} y \lor x =_R y)\] and thus that \(R\) has decidable equality [§3.4]. In other words, \({\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}}\) gives the equivalence \[x \neq_R y \leftrightarrow x \mathrel{\rlap{\neq\,}{\,\neq}_{R}} y,\] which means that \(\neq_R\) is an apartness relation, whereas \({\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}}\times{\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}\) only gives that \(\mathrel{\rlap{\neq\,}{\,\neq}_{R}}\) is a tight apartness relation: \[x =_R y \leftrightarrow\lnot(x \mathrel{\rlap{\neq\,}{\,\neq}_{R}} y).\]
The analysis above gives three plausible definitions for fields:
While each is stronger than the previous, they all have their uses and there is no choice which is always better than the others. Discrete fields behave the same as fields in classical logic since \(R\) has decidable equality. Heyting fields are preferred in constructive circles and most definitions of the field of real numbers yield Heyting fields that are not necessarily discrete [§11.2, §11.3]. While residual fields are harder to work with, they do have their uses since the quotient of a nontrivial ring by a maximal ideal is a residual field which is not necessarily Heyting.
It is common for classical equivalences to break down in intuitionistic logic and therefore many natural concepts that are classically equivalent separate in intuitionistic settings. While this can be an annoyance, it is quite natural in the context of proof-relevant mathematics. Indeed, when we prove that a theorem about fields in a proof-relevant way, we cannot merely argue that every field satisfies the conclusion, we must proceed from known properties of fields and work our way to the desired conclusion. The precise properties used in this process naturally leads to a more refined understanding of the hypotheses of the theorem.
Before closing this installment of HoTT Math, let me draw your attention to an important lesson from the above:
While the logic of HoTT is intuitionistic, knowledge of intuitionistic logic is not necessary to work in HoTT.
Thinking in terms of types and functions as we did above works very well. To illustrate once more, consider the following classical equivalences: \[\begin{aligned}
P \to \lnot Q \quad&\leftrightarrow\quad Q \to \lnot P, \\
\lnot P \to Q \quad&\leftrightarrow\quad \lnot Q \to P.
\end{aligned}\] Which are valid in intuitionistic logic? When translated into types and functions, we get that these are the special cases of \[\begin{aligned}
A \to (B \to C) \quad&\simeq\quad B \to (A \to C), \\
(A \to C) \to B \quad&\simeq\quad (B \to C) \to A,
\end{aligned}\] where \(A :\equiv{P},\) \(B :\equiv{Q},\) \(C :\equiv\mathbf{0}.\) The first classical equivalence is intuitionistically valid. After uncurrying both sides of \[A \to (B \to C) \quad\simeq\quad B \to (A \to C),\] we obtain the obvious equivalence \[A \times B \to C \quad\simeq\quad B \times A \to C.\] On the other hand, the general form of the second equivalence makes no sense at all. Taking \(C\) to be just about anything other than \(\mathbf{0}\) leads to obviously false equivalences such as \(\mathbf{1}\to A \simeq\mathbf{1}\to B\) when \(C :\equiv\mathbf{1}.\) Indeed, the second classical equivalence is not intuitionistically valid since taking \(Q :\equiv\lnot P\) leads to \[\lnot P \to \lnot P \quad\leftrightarrow\quad \lnot \lnot P \to P\] and double negation elimination is not intuitionistically valid.
The equivalence of \[\textstyle\prod_{x:A} (B(x) \to C) \quad\simeq\quad \left({\textstyle\sum_{x:A} B(x)}\right)\to C\] is the dependent type analogue of currying/uncurrying. Indeed, if instead of the dependent type \(B:A \to \mathcal{U}\) we have a simple type \(B,\) we obtain the familiar equivalence \[A \to (B \to C) \quad\simeq\quad (A \times B) \to C.\] In full generality, we have the unwieldly equivalence \[\prod_{(a,b):\sum_{x:A} B(x)} C(a,b) \quad\simeq\quad \prod_{a:A} \prod_{b:B(a)} C(a,b),\] where \(B:A \to \mathcal{U}\) and \(C:\sum_{x:A} B(x) \to \mathcal{U}\) [§2.15].
This is the standard trick to work with propositional truncations: so long as \(P\) is a proposition, \(\Vert A \Vert \to P\) is equivalent to \(A \to P.\) This is very important for handling disjunctions and existential quantifiers. Since \(P \lor Q\) "forgets" which of \(P\) and \(Q\) holds, it is not generally possible to do a case analysis. However, if \(R\) is a proposition then \(P \lor Q \to R\) is equivalent to \(P + Q \to R\) and the coproduct \(P + Q\) does allow for case analysis. This explains why in addition to \[\lnot P \land \lnot Q \quad\leftrightarrow\quad \lnot(P \lor Q),\] the following "halves" of De Morgan’s laws hold in intuitionistic logic \[\begin{aligned}
P \land Q \quad&\to\quad \lnot(\lnot P \lor \lnot Q), \\
P \lor Q \quad&\to\quad \lnot(\lnot P \land \lnot Q), \\
\lnot P \lor \lnot Q \quad&\to\quad \lnot(P \land Q),
\end{aligned}\] while their converses do not.
A commutative (unital) ring is a set \(R\) with two constants \(\mathsf{O},\mathsf{I}:1\to R\), one unary operation \({-\square}:R \to R\), two binary operations \({\square+\square},{\square\cdot\square}:R \times R \to R\) along with the usual axioms: \({\small\fbox{$\phantom{b}$group$\phantom{q}$}}(\mathsf{O},-\square,\square+\square)\), \({\small\fbox{$\phantom{b}$monoid$\phantom{q}$}}(\mathsf{I},{\square\cdot\square})\), \[{\small\fbox{$\phantom{b}$distributivity$\phantom{q}$}}({\square+\square},{\square\cdot\square}) : \prod_{x,y,z:R} ((x+y)\cdot z =_R x \cdot z + y \cdot z) \times \prod_{x,y,z:R} (x\cdot(y+z) =_R x \cdot y+x \cdot z),\] and \[{\small\fbox{$\phantom{b}$commutativity$\phantom{q}$}}({\square\cdot\square}) : \prod_{x,y:R} (x \cdot y =_R y \cdot x).\] Commutativity of addition can be derived as usual using the fact that \[x+(x+y)+y =_R (x+y)\cdot(\mathsf{I}+\mathsf{I}) =_R x + (y + x) + y.\] If you want to test your path manipulation skills, you can try to write down the resulting term for \[{\small\fbox{$\phantom{b}$commutativity$\phantom{q}$}}({\square+\square}) : \prod_{x,y:R} (x + y =_R y+x).\] However, as we discussed last time, this is not necessary since the proof is ultimately not relevant when the carrier of an algebra is a set.
Reading off the usual definition of unit, we obtain the type \[\operatorname{\mathsf{unit}}(x) :\equiv\sum_{y:R} (x\cdot y =_R \mathsf{I}).\] Elements of type \(\operatorname{\mathsf{unit}}(x)\) are pairs \((y,p)\), where \(y:R\) and \(p: x \cdot y =_R \mathsf{I}\). Because \(R\) is a set and inverses are unique, there is always at most one such \((y,p)\), which means that \(\operatorname{\mathsf{unit}}(x)\) is a proposition. Therefore, we can comprehend the set of units: \[R^\times :\equiv\{x:R \mid \operatorname{\mathsf{unit}}(x)\} :\equiv\sum_{x:R} \operatorname{\mathsf{unit}}(x).\] (Subset comprehension is discussed in §3.5 of the book.) Technically, elements of \(R^\times\) are triples \((x,y,p)\) where \(x,y:R\) and \(p:(x \cdot y =_R \mathsf{I})\). Thus \(R^\times\) is not merely a subset of \(R\) but each element of \(R^\times\) comes equipped with a justification for being in \(R^\times\). Since for each \(x:R\) there is at most one \((x,y,p):R^\times\), we can identify the two without much confusion but the extra coordinates are actually helpful for proving things about \(R^\times\).
To illustrate this, let’s outline the verification that \(R^\times\) is a group under multiplication. We will view elements of \(R^\times\) as pairs \((x,y)\) but we will forget the path witnessing that \(x \cdot y =_R \mathsf{I}\). Since we’re working with sets, we shouldn’t worry about paths anyway.^{1} To begin, we need to isolate the group operations:
First, we see that \((\mathsf{I},\mathsf{I}):R^\times\) since we have a path \(\mathsf{I}\cdot\mathsf{I}=_R \mathsf{I}\) by the identity axiom; this will be our group identity.
Next, we see that if \((x_0,y_0),(x_1,y_1):R^\times\) then \((x_0 \cdot x_1,y_1 \cdot y_0):R^\times\). Indeed, if \(x_0 \cdot y_0 =_R \mathsf{I}\) and \(x_1 \cdot y_1 =_R \mathsf{I}\) then \[x_0 \cdot x_1 \cdot y_1 \cdot y_0 =_R x_0 \cdot \mathsf{I}\cdot y_0 =_R x_0 \cdot y_0 =_R \mathsf{I}.\] From this, we directly extract our group multiplication \(\square\cdot\square:R^\times \times R^\times \to R^\times\).
Finally, we see that if \((x,y):R^\times\) then \((y,x):R^\times\). (This is immediate from commutativity but a slightly longer argument shows that this works even without this assumption.) As a result, we get the group inverse \(\square^{-1}:R^\times \to R^\times\).
To conclude the argument, it suffices to verify that \({\small\fbox{$\phantom{b}$group$\phantom{q}$}}(R^\times,\square^{-1},\square\cdot\square)\) is inhabited. Associativity and identity are immediate consequences of \({\small\fbox{$\phantom{b}$monoid$\phantom{q}$}}(R,\mathsf{I},\square\cdot\square)\). The tacit path coordinates are handy for inverses: the path \((x,y)\cdot(y,x) =_{R^\times} (\mathsf{I},\mathsf{I})\) simply consists of the two implied paths \(x \cdot y =_R \mathsf{I}\) and \(y \cdot x =_R \mathsf{I}\)!
The key difference between this proof and the classical proof that \(R^\times\) is a group is the way we used the definition of \(R^\times\). At each step of the classical proof, we need to invoke the definition of \(R^\times\) to get the relevant inverses, do the same computations, and then forget the extra information. In the proof-relevant argument, the subset \(R^\times\) incorporates its definition and the relevant inverses are already there to be used in computations. To keep the argument from involving too many (ultimately irrelevant) path computations, we still forgot one piece of the definition of \(R^\times\) in the outline above. This kind of selective unraveling can be useful since formal definitions can rapidly become unwieldy. We did end up invoking the path coordinates at the very end to verify the identity axiom. However, we didn’t really need the paths themselves, we just needed to remember that elements \((x,y)\) aren’t arbitrary pairs, they are pairs such that \(x \cdot y =_R \mathsf{I}\). This is what the seemingly irrelevant path coordinate does, the relevant data is not the path itself, it is the type \(x \cdot y =_R \mathsf{I}\) of the path that matters.
The lesson is that while proof-relevant mathematics does force you to carry extra baggage, that baggage is actually handy. Moreover, you can always manage the burden by selectively unraveling the contents of the baggage. In fact, you also carry this baggage when doing classical mathematics except that you conceal the baggage in your memory, recalling each item as you need it. The problem with this strategy is that it’s easy to forget things if you are not careful. Proof-relevant mathematics forces you to keep track of everything!
In the comments, Toby Bartels pointed out that the argument above contains a major type-theoretic faux pas: using type declarations as propositions. The cause of this is the deliberate omission of the path coordinate which leads to phrases like "we see that \((\mathsf{I},\mathsf{I}):R^\times\) since […]" and "we see that if \((x,y):R^\times\) then \((y,x):R^\times\)." The problem is that type declarations are either correct or not (and this is decidable!) but not true or false. Indeed, true and false are both values that can be used but correct and incorrect are not — there is no value in being incorrect! This may seem like a subtle difference but it is actually a very important one in type theory since there is little other separation of syntax and semantics.
Below, I produced three more variants of the above argument that avoid the trouble with Version 0 above. The first is simply does not omit the path coordinates. The second uses the membership relation \(\in\) where the original used an erroneous type declaration. The third is a direct translation of the classical proof, also using the membership relation. To facilitate comparison, I tried to keep the three versions as close as possible to the original. I am curious to know which of the four versions you prefer, or if you have yet another version that you prefer!
Version 1. Let’s outline the verification that \(R^\times\) is a group under multiplication. To begin, we need to isolate the group operations:
First, \((\mathsf{I},\mathsf{I},i):R^\times\) is our group identity where \(i:\mathsf{I}\cdot\mathsf{I}=_R \mathsf{I}\) is an instance of the identity axiom.
Next, the group product is defined by \((x_0,y_0,p_0)\cdot(x_1,y_1,p_1) := (x_0 \cdot x_1,y_1 \cdot y_0,q)\), where \[q:x_0 \cdot x_1 \cdot y_1 \cdot y_0 =_R x_0 \cdot \mathsf{I}\cdot y_0 =_R x_0 \cdot y_0 =_R \mathsf{I}.\]
Finally, group inverses are defined by \((x,y,p)^{-1} := (y,x,q)\), where \(q:y \cdot x =_R x \cdot y =_R \mathsf{I}.\) (A slightly longer argument shows that this works even without the commutativity assumption.)
To conclude the argument, it suffices to verify that \({\small\fbox{$\phantom{b}$group$\phantom{q}$}}(R^\times,\square^{-1},\square\cdot\square)\) is inhabited. Associativity and identity are immediate consequences of \({\small\fbox{$\phantom{b}$monoid$\phantom{q}$}}(R,\mathsf{I},\square\cdot\square)\). The path \((x,y,p)\cdot(y,x,q) =_{R^\times} (\mathsf{I},\mathsf{I},i)\) is composed of \(p:x \cdot y =_R \mathsf{I}\), \(q:y \cdot x =_R \mathsf{I}\), and the paths must match since \(R\) is a set.
Version 2. Let’s outline the verification that \(R^\times\) is a group under multiplication. By virtue of univalence, there are multiple ways to look at \(R^\times\). Instead of viewing it as a subset of \(R\) we can also view it as a subset of \(R \times R\), namely \[R^\times = \{(x,y):R \times R \mid x \cdot y =_R \mathsf{I}\} = \sum_{x,y:R} (x \cdot y =_R \mathsf{I}).\] Thus \((x,y) \in_{R \times R} R^\times\) denotes the proposition \(x \cdot y =_R \mathsf{I}.\) It turns out that this view of \(R^\times\) will be easier to work with. To begin, we need to isolate the group operations:
First, we see that \((\mathsf{I},\mathsf{I}) \in_{R\times R} R^\times\) since we have a path \(\mathsf{I}\cdot\mathsf{I}=_R \mathsf{I}\) by the identity axiom; this will be our group identity.
Next, we see that if \((x_0,y_0),(x_1,y_1) \in_{R \times R} R^\times\) then \((x_0 \cdot x_1,y_1 \cdot y_0) \in_{R \times R} R^\times\) too. Indeed, if \(x_0 \cdot y_0 =_R \mathsf{I}\) and \(x_1 \cdot y_1 =_R \mathsf{I}\) then \[x_0 \cdot x_1 \cdot y_1 \cdot y_0 =_R x_0 \cdot \mathsf{I}\cdot y_0 =_R x_0 \cdot y_0 =_R \mathsf{I}.\] From this, we directly extract our group multiplication \(\square\cdot\square:R^\times \times R^\times \to R^\times\).
Finally, we see that if \((x,y) \in_{R \times R} R^\times\) then \((y,x) \in_{R \times R} R^\times\) since \(x \cdot y =_R \mathsf{I}\) implies \(y \cdot x =_R \mathsf{I}\) by commutativity. (A slightly longer argument shows that this works even for non-commutative rings.) As a result, we get the group inverse \(\square^{-1}:R^\times \to R^\times\).
To conclude the argument, it suffices to verify that \({\small\fbox{$\phantom{b}$group$\phantom{q}$}}(R^\times,\square^{-1},\square\cdot\square)\) is inhabited. Associativity and identity are immediate consequences of \({\small\fbox{$\phantom{b}$monoid$\phantom{q}$}}(R,\mathsf{I},\square\cdot\square)\). The path \((x,y)\cdot(y,x) =_{R^\times} (\mathsf{I},\mathsf{I})\) is composed of \(x \cdot y =_R \mathsf{I}\) and \(y \cdot x =_R \mathsf{I}\), which hold because \((x,y), (y,x) \in_{R \times R} R^\times\).
Version 3. Let’s outline the verification that \(R^\times\) is a group under multiplication. Per definition of \(R^\times\), we will write \(x \in_R R^\times\) to mean \(\mathsf{unit}(x)\). To begin, we need to isolate the group operations:
First, we see that \(\mathsf{I}\in_{R} R^\times\) since \(\mathsf{I}\cdot\mathsf{I}=_R \mathsf{I}\) by the identity axiom; this will be our group identity.
Next, we see that if \(x_0,x_1 \in_{R} R^\times\) then \(x_0 \cdot x_1 \in_{R} R^\times\) too. Indeed, since \(\mathsf{unit}(x_0)\) and \(\mathsf{unit}(x_1)\) there are \(y_0,y_1:R\) such that \(x_0 \cdot y_0 =_R \mathsf{I}\) and \(x_1 \cdot y_1 =_R \mathsf{I}.\) Then \[x_0 \cdot x_1 \cdot y_1 \cdot y_0 =_R x_0 \cdot \mathsf{I}\cdot y_0 =_R x_0 \cdot y_0 =_R \mathsf{I}\] which shows that \(\mathsf{unit}(x_0 \cdot x_1).\) Thus our group multiplication is (the restriction of) the usual ring multiplication.
Finally, we see that if \(x \in_{R} R^\times\) then, by definition of the proposition \(\mathsf{unit}(u)\), there is a unique \(y:R\) such that \(x \cdot y =_R \mathsf{I}\). By commutativity, \(y \cdot x =_R \mathsf{I}\) which means that \(y \in_{R} R^\times\) and this \(y\) is the group inverse of \(x\).
To conclude the argument, it suffices to verify that \({\small\fbox{$\phantom{b}$group$\phantom{q}$}}(R^\times,\square^{-1},\square\cdot\square)\) is inhabited. Associativity and identity are immediate consequences of \({\small\fbox{$\phantom{b}$monoid$\phantom{q}$}}(R,\mathsf{I},\square\cdot\square)\). The remaining identity \(x \cdot x^{-1} =_{R^\times} \mathsf{I}\) follows immediately from the definition of group inverses.
]]>A signature is a sequence \(\sigma\) of sets indexed by natural numbers. The elements of the set \(\sigma_n\) are intended to be symbols for \(n\)-ary operations (\(0\)-ary operations are simply constants). For example, the signature for rings can be thought as \[(\{0,1\},\{-\},\{+,\cdot\},\varnothing,\varnothing,\dots).\] It is generally assumed that the sets \(\sigma_n\) are mutually disjoint so that each symbol has a definite arity. In the end, only the cardinality of the sets \(\sigma_n\) matters since the particular symbols used for the operations is mostly a matter of taste.
The logic of universal algebra is equational logic. The symbols from the signature \(\sigma\) can be strung together to form terms. In addition to the symbols from \(\sigma\), we need an infinite set \(\{x_0,x_1,x_2,\dots\}\) of variable symbols. Formally, terms are defined inductively using the two rules:
Equational logic deals with identities of the form \(t \approx s\) where \(t\) and \(s\) are terms. The rules for equational logic are reflexivity, symmetry, transitivity \[\frac{}{t \approx t}, \quad \frac{t \approx s}{s \approx t}, \quad \frac{t \approx s, s \approx r}{t \approx r}\] together with substitution and replacement \[\frac{t \approx s}{t[x/r] \approx s[x/r]}, \quad \frac{s \approx r}{t[x/r] \approx t[x/s]},\] where \([x/r]\) denotes the act of replacing every occurrence of the variable symbol \(x\) with the term \(r\). We write \(\Gamma \vdash t \approx s\) to signify that the identity \(t \approx s\) follows using a combination of these rules from the identities in \(\Gamma\).
An algebra \(\mathfrak{A}\) (with signature \(\sigma\)) is a type \(A\) together with an interpretation \[I:\left({\textstyle\sum_{n:\mathbb{N}} \sigma_n \times A^n}\right) \to A.\] If \(f\) is a symbol for an \(n\)-ary operation in \(\sigma_n\), then the interpretation \(f^{\mathfrak{A}}:A^n \to A\) is \[f^{\mathfrak{A}}(a_1,\dots,a_n) := I(f,a_1,\dots,a_n).\] These interpretations can be used to evaluate any term starting from a variable assignment \(\mathcal{V}:\mathbb{N}\to A\) in the usual recursive manner: \[x_i^{\mathfrak{A},\mathcal{V}} := \mathcal{V}(i), \quad (f t_1 \dots t_k)^{\mathfrak{A},\mathcal{V}} := f^{\mathfrak{A}}(t_1^{\mathfrak{A},\mathcal{V}},\dots,t_k^{\mathfrak{A},\mathcal{V}}).\] The satisfaction relation for the algebra \(\mathfrak{A}\) and the identity \(t \approx s\) is then the type \[(\mathfrak{A} \vDash t \approx s) :\equiv \prod_{\mathcal{V}:\mathbb{N}\to A} (t^{\mathfrak{A},\mathcal{V}} =_A s^{\mathfrak{A},\mathcal{V}}).\] If the carrier \(A\) is a set then each identity type \(t^{\mathfrak{A},\mathcal{V}} =_A s^{\mathfrak{A},\mathcal{V}}\) has at most one element and hence \(\mathfrak{A} \vDash t \approx s\) is a proposition with the usual meaning. In the general case, \(\mathfrak{A} \vDash t \approx s\) can carry substantial path information.
The fact that equational logic is sound and complete for the semantics is called Birkhoff’s Theorem: \[\Gamma \vdash t \approx s \quad\text{if and only if}\quad \Gamma \vDash t \approx s.\] Clasically, the right hand side means that every algebra \(\mathfrak{A}\) with signature \(\sigma\), whose carrier \(A\) is a set, that satisfies all equations in \(\Gamma\) also satisfies \(t \approx s\). In HoTT, this translates to the type \[(\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s) :\equiv \prod_{\mathfrak{A}/\operatorname{\mathsf{set}}} \left(\prod_{u \approx v:\Gamma} (\mathfrak{A} \vDash u \approx v) \to (\mathfrak{A} \vDash t \approx s)\right)\] where the outer product is over all set-based algebras \(\mathfrak{A}\) with signature \(\sigma\). An alternate interpretation is \[(\Gamma \vDash_{\operatorname{\mathsf{any}}} t \approx s) :\equiv \prod_{\mathfrak{A}/\operatorname{\mathsf{any}}} \left(\prod_{u \approx v:\Gamma} (\mathfrak{A} \vDash u \approx v) \to (\mathfrak{A} \vDash t \approx s)\right)\] where the outer product is over algebras with signature \(\sigma\) based on any type. Consequently there are two possible ways to interpret Birkhoff’s Theorem in HoTT!
The first is closest to the classical version:
Weak Birkhoff Theorem. \(\left\Vert\Gamma \vdash t \approx s\right\Vert\) is equivalent to \(\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s\).
The vertical lines \(\Vert\square\Vert\) indicate that one should take the propositional truncation of \(\Gamma \vdash t \approx s\) (§3.7). Indeed, when translated into HoTT, \(\Gamma \vdash t \approx s\) is the type of all proofs of \(t \approx s\) from \(\Gamma\) in equational logic. The propositional truncation agglomerates all these proofs into one and simply asserts the unqualified existence of such a proof. This is necessary since the right hand side \(\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s\) is a proposition so the backward implication, without propositional truncation, would amount to a canonical choice of proof with basically no information.
On the other hand, \(\Gamma \vDash_{\text{any}} t \approx s\) is much more expressive because of all the possible identity paths. With all that information on hand, it is not so unlikely that we could reconstruct a proof. This suggests another interpretation of Birkhoff’s Theorem:
Strong Birkhoff Theorem. \(\Gamma \vdash t \approx s\) is equivalent to \(\Gamma \vDash_{\operatorname{\mathsf{any}}} t \approx s\).
As I stated it just now, this is unlikely to be true. However, I suspect it becomes true with some tweaking of \(\Gamma \vdash t \approx s\) to only allow proofs that are normalized in some sense or to formally identify proofs that are trivial variations of each other. I haven’t had much time to think about this so I’ll leave it as an open conjecture for now and hope that I can come back to it later…
The soundness part (forward implication) of both forms are true. Last time, I emphasized the usefulness of the weak soundness theorem but I forgot to talk about substitution and replacement! We did talk about how paths could be reversed and composed to get symmetry and transitivity; details are in §2.1 of the book. Since I think it’s important to know what goes on behind the scenes, let’s talk about the remaining two rules. Unfortunately, and this explains the earlier omission, neither is particularly pleasant.
Substitution works exactly as you expect. An element \(p:(\mathfrak{A} \vDash t \approx s)\) is a function that maps each variable assignment \(\mathcal{V}:\mathbb{N}\to A\) to a path \(p(\mathcal{V}):t^{\mathfrak{A},\mathcal{V}} =_A s^{\mathfrak{A},\mathcal{V}}\). Given a term \(r\) and a variable \(x\), the corresponding element of \((\mathfrak{A} \vDash t[x/r] \approx s[x/r])\) is the composition of \(p\) with the function that sends each variable assignment \(\mathcal{V}:\mathbb{N}\to A\) to the assignment where all values stay the same except that the value of \(x\) is changed to \(r^{\mathfrak{A},\mathcal{V}}\). Verifying that the result really is an element of \(\mathfrak{A} \vDash t[x/r] \approx s[x/r]\) is a tedious computation which is best performed under a rug.
The trick behind replacement is revealed in §2.2: functions are functors. The identity types give each type a higher groupoid structure and functions act functorially in the sense that with any \(f:A \to B\) comes with more functions \(\mathsf{ap}_f\) which translate paths \(p:x =_A y\) into paths \(\mathsf{ap}_f(p):f(x) =_B f(y)\). The details for soundness of the replacement rule are gory but they are essentially the same as in the classical proof. There is some work needed to single out the variable \(x\) and interpret the term \(t\) as a function \(t^{\mathfrak{A},\mathcal{V}}_x:A \to A\) where all other variables are fixed. Then, \(\mathsf{ap}_{t^{\mathfrak{A},\mathcal{V}}_x}\) can be used to transform paths \(s^{\mathfrak{A},\mathcal{V}} =_A r^{\mathfrak{A},\mathcal{V}}\) into paths \(t^{\mathfrak{A},\mathcal{V}}_x(s^{\mathfrak{A},\mathcal{V}}) =_A t^{\mathfrak{A},\mathcal{V}}_x(r^{\mathfrak{A},\mathcal{V}})\). Thus, we obtain \[(\mathfrak{A} \vDash s \approx r) \to (\mathfrak{A} \vDash t[x/s] \approx t[x/r]).\]
Since each rule of equational logic is sound, we therefore have the strong soundness \((\Gamma \vdash t \approx s) \to (\Gamma \vDash_{\operatorname{\mathsf{any}}} t \approx s)\). To get weak soundness, we compose this with the obvious implication \((\Gamma \vDash_{\operatorname{\mathsf{any}}} t \approx s) \to (\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s)\) and we conclude that \(\left\Vert\Gamma \vdash t \approx s\right\Vert \to (\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s)\) from the fact that \(\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s\) is a proposition. The completenesss part of Birkhoff’s theorem uses syntactic algebras and HoTT has a really neat way to construct these that we will talk about in a later HoTT Math post.
In conclusion, equational logic works just fine in HoTT. In general, because of proof relevance, it is best to keep track of the equational proofs in order to keep track of the identity paths. The strong soundness map \[(\Gamma \vdash t \approx s) \to (\Gamma \vDash_{\operatorname{\mathsf{any}}} t \approx s)\] guarantees that equational proofs will lead to identities, but different proofs can lead to different identity paths! When the carrier is a set, less care is necessary, the weak soundness map \[\Vert\Gamma \vdash t \approx s\Vert \to (\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s)\] shows that the mere existence of an equational proof leads to the desired identities. This is exactly how algebra and equational logic are used classically.
]]>A group is a set \(G\) equipped with a constant \(e:1 \to G\), a unary operation \(\square^{-1}:G \to G\) and a binary operation \({\square\cdot\square}:G \times G \to G\) that satisfy the usual axioms. In HoTT the group axioms must be motivated, so the group \(G\) comes with three more components: \[\begin{aligned}
{\small\fbox{$\phantom{Q}$associativity$\phantom{Q}$}} &: \prod_{x,y,z:G} (x\cdot(y\cdot z) =_G (x\cdot y)\cdot z) \\
{\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}} &: \prod_{x:G} (x \cdot e =_G x) \times \prod_{x:G}(e \cdot x = x)\\
{\small\fbox{$\phantom{Q}$inverse$\phantom{Q}$}} &: \prod_{x:G} (e = x^{-1} \cdot x) \times \prod_{x:G} (e =_G x \cdot x^{-1}) \\
\end{aligned}\] The axioms are concrete objects in proof-relevant mathematics! The last two axioms are actually the conjunction of two axioms since conjunction in HoTT correspond to products. It’s conceptually convenient to package them together and use \({\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_1\) and \({\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_2\) for the two conjuncts obtained by applying the projections. In fact, it makes sense to pack all of these into one \[{\small\fbox{$\phantom{Q}$group$\phantom{Q}$}} :\equiv{\small\fbox{$\phantom{Q}$associativity$\phantom{Q}$}}\times{\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}\times{\small\fbox{$\phantom{Q}$inverse$\phantom{Q}$}}\] (but, for the sake of clarity, it is best to refrain from using things like \({\small\fbox{$\phantom{Q}$group$\phantom{Q}$}}_1\) for \({\small\fbox{$\phantom{Q}$associativity$\phantom{Q}$}}\)).
The axioms types above are parametrized by \(G:\mathsf{Set}\) and the three components \(e\), \(\square^{-1}\), \(\square\cdot\square\), so one can form the type of all groups: \[\mathsf{Grp}:\equiv\sum_{\substack{G:Set\\e:1\to G\\\square^{-1}:G\to G\\\square\cdot\square:G\times G \to G}} {\small\fbox{$\phantom{Q}$group$\phantom{Q}$}}(G,e,\square^{-1},\square\cdot\square).\] This level of formalism is cumbersome and since it is perfectly clear what the type \(\mathsf{Grp}\) is from the narrative description, it is best to avoid it. The only difference from classical mathematics is that the axioms are given as concrete objects rather than abstract statements.
Familiar identities, for example the fact that \(e\) is its own inverse, can be obtained by combining the group axioms. We have a path \({\small\fbox{$\phantom{Q}$inverse$\phantom{Q}$}}_1(e):e =_G e^{-1}\cdot e\) and another path \({\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_1(e^{-1}):e^{-1}\cdot e =_G e^{-1}\). Concatenating these two paths yields the desired path \[{\small\fbox{$\phantom{Q}$inverse$\phantom{Q}$}}_1(e)\cdot{\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_1(e^{-1}):e^{-1} =_G e.\]
There are other ways to see this, for example, symmetric reasoning yields \[{\small\fbox{$\phantom{Q}$inverse$\phantom{Q}$}}_2(e)\cdot{\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_2(e^{-1}):e^{-1} =_G e.\] In general, these two paths are need not be the same but since \(G\) is a set, i.e., a 0-type, there is at most one path in \(e^{-1} =_G e\) and therefore the two paths above must be the same.
Let’s try something a tad more complicated — an old favorite — the uniqueness of identity elements. To say that \(u:G\) is a left identity element means exhibiting an element of the type \[\prod_{x:G} (u \cdot x =_G x);\] so \({\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_1\) shows that \(e\) is an identity element. Similarly, \({\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_2\) shows that \(e\) is a right identity element. Classically, we know that if \(u\) is a left identity and if \(v\) is a right identity then we must have \(u = v\). In HoTT, this corresponds to exhibiting an element of type \[\mathsf{lrid}:\equiv\prod_{u,v:G} \Big(\prod_{x:G} (u \cdot x =_G x) \times \prod_{y:G} (y \cdot v =_G y) \to (u =_G v)\Big).\]
To prove this, let’s fix \(u,v:G\). Given \(p:\prod_{x:G} (u \cdot x =_G x)\) and \(q:\prod_{y:G} (y \cdot v =_G y)\) we have paths \[\begin{aligned} p(v) &: u \cdot v =_G v, & q(u)&:u \cdot v =_G u,\end{aligned}\] and therefore a path \[q(u)^{-1} \cdot p(v): u =_G v.\] The classical proof would end here but to get the desired element of \(\mathsf{lrid}\), we need to wrap things up as follows. Since the final path was obtained uniformly from \(p,q\), we have an element \[\lambda p,q.q(u)^{-1} \cdot p(v):\prod_{x:G} (u \cdot x =_G x) \times \prod_{y:G} (y \cdot v =_G y) \to (u =_G v)\] and we can then unfix \(u,v:G\) to obtain the desired element \[\lambda u,v.\lambda p,q . q(u)^{-1} \cdot p(v):\mathsf{lrid}.\]
I slightly abused \(\lambda\)-notation in the argument above but the idea was only to give a hint how the argument could be formalized. In fact, we did not need to worry about this at all. Since \(G\) is a set, there is at most one path in \(u =_G v\) and thus there is also at most one element in \[\prod_{x:G} (u \cdot x =_G x) \times \prod_{y:G} (y \cdot v =_G y) \to (u =_G v)\] by function extensionality (§4.9). Therefore, the unique choice principle (§3.9) applies to give the desired element of \(\mathsf{lrid}\). In fact, \(\mathsf{lrid}\) also has exactly one element by function extensionality.
In the end, the classical proof of uniqueness of inverses was sufficient and unambiguous. In fact, the same is true for essentially all similar facts of elementary algebra. This is good for multiple reasons but most importantly this means that doing math in HoTT does not involve getting caught up in elementary stuff like this: you can invoke \(\mathsf{lrid}\) any time without referencing a particular proof since the proof is unique. There is one important caveat:
It is very important that the carrier type of a group is a set!
I fell into this trap when I asked this MathOverflow question. It is very tempting to think that the loop space \(\Omega(A,x) :\equiv(x =_A x)\) is a group for every \(x:A\). This is only true if \(A\) is a 1-type. Otherwise the carrier of \(\Omega(A,x)\) is not necessarily a 0-type, the uniqueness of paths is lost, \(\mathsf{lrid}\) can have many different proofs, which are all relevant and must not be forgotten!
In conclusion, elementary group theory works fine in HoTT. The general feel is a bit different but it’s more fun than cumbersome to see how the axioms work in relevant mathematics. The same is true for much of elementary algebra and, ultimately, proof relevance is never cumbersome since proofs are almost always unique. It is important to remember that the carrier of an algebraic structure must always be a set for this to work!
In the next installment of HoTT Math, we will continue with elementary field theory which presents an additional difficulty since we must handle the fact that multiplicative inverses do not always exist…
]]>This preamble will serve to accumulate a table of contents and various conventions and notations that come up along the way. The only prerequisites (or rather corequisites) are the first two chapters of the (free) Homotopy Type Theory book. Further prerequisites and reverences to later topics in the book will always be indicated where they occur.
We did not set out to write a book. The present work has its origins in our collective attempts to develop a new style of "informal type theory" that can be read and understood by a human being, as a complement to a formal proof that can be checked by a machine. Univalent foundations is closely tied to the idea of a foundation of mathematics that can be implemented in a computer proof assistant. Although such a formalization is not part of this book, much of the material presented here was actually done first in the fully formalized setting inside a proof assistant, and only later "unformalized" to arrive at the presentation you find before you — a remarkable inversion of the usual state of affairs in formalized mathematics.
The danger in writing such a book is to fall into the minefields of logic wars. The authors successfully avoided much of these traps, so logicians from other perspectives can read the book without too much cringing. To avoid unnecessary confusion, I recommend mentally substituting most instances of "set theory" by the more apropos "classical mathematics." Readers from strongly opposing points of view should be prepared for a certain amount of propaganda, which is to be expected in a book written to promote one point of view. Barring these caveats, you will find an enjoyable and well-written book on a very interesting subject. Readers should not be too concerned with the word "homotopy" in the title, homotopy theory is not required background for the book though some basic knowledge of the ideas of topology and homotopy theory helps to understand the motivation behind certain concepts.
Having addressed all the necessary caveats, let’s talk about why this book is interesting and why you should read it…
The most interesting aspect from my point of view is that HoTT fully supports proof-relevant mathematics, a way of doing mathematics where proofs are real objects that are manipulated on the same level as numbers, sets, functions and all the usual objects of classical mathematics. This is not a brand new idea, logicians have been playing with proofs in this way for a long while, but HoTT brings this idea into the realm of everyday mathematics and that is a major step forward in mathematics.
The key difference with first-order logic is that equality is not primitive. To define a type \(A\) one must also define what equality means for \(A\). Formally if \(x,y:A\) then \(x =_A y\) (or \(\mathsf{Id}_A(x,y)\)) is another type, an identity type, which can be thought of as made of reasons to identify \(x\) and \(y\). Elements \(p:x =_A y\) are often called "paths" by analogy with topolgy. Indeed, these paths can be inverted and concatenated to realize symmetry and transitivity of equality, respectively; reflexivity is realized by a path \(\mathsf{refl}_x:x =_A x\). Thus each type \(A\) is actually a groupoid rather than a plain set. In fact, since each \(x =_A y\) is itself a type with its own identity types and so forth, the type \(A\) is actually a kind of \(\infty\)-groupoid.
It is this rich structure associated with each type is what permits HoTT to support proof relevant mathematics. To get a basic feel of how this works, the statement "\(x =_A y\) and \(y =_A z\)" is interpreted via the product type \((x =_A y)\times(y =_A z)\), whose elements are pairs of paths that explain why \(x\) is to be identified with \(y\) and why \(y\) is to be identified with \(z\). Similarly, "\(x =_A y\) or \(y =_A z\)" is interpreted via the coproduct type \((x =_A y) + (y =_A z)\), whose elements are either paths that explain why \(x\) is to be identified with \(y\) or paths that explain why \(y\) is to be identified with \(z\). The catch, as you may have guessed from the last example, is that this form of constructive reasoning is intuitionistic and thus not as familiar to mathematicians.
Interestingly, the learning curve for constructive reasoning appears to be much less steep with HoTT than with other constructive frameworks. One of the reasons is that the topological interpretation of the key concepts is very intuitive but more significantly HoTT provides many tools to revert to more familiar territory. The analogue of a plain set in HoTT is a \(0\)-type: a type \(A\) where the identity types \(x =_A y\) always contain at most one path. In other words, these are types where the groupoid structure is trivial and contains no other information than how to handle equality of elements. It is consistent with HoTT that the \(0\)-types form a model of ETCS, a classical theory of sets and functions. Thus, by "truncating" thoughts to \(0\)-types, one can revert to a more familiar classical setting.
It is natural to identify things that are not significantly different. For example, the axiom of extensionality in set theory identifies sets that have the same elements since the elements of a set are all that matter in this context. Extensionality for functions identifies functions that agree on all inputs. Univalence is an indiscernibility axiom in the same spirit: it identifies types that are not significantly different.
To make sense of equality for types, we first need to put them in an ambient type, a universe, with its associated identity types. We can’t have a truly universal type since that directly leads to the usual paradoxes of self-reference. Instead, we have a bunch of universes such that each type belongs to some universe and each universe is closed under the basic type formation rules. Once we have a universe \(\mathcal{U}\) we can talk about equality of types in \(\mathcal{U}\), and because \(\mathcal{U}\) is a type we have a lot of freedom in defining what equality means for types in \(\mathcal{U}\).
This is exactly what the univalence axiom does. It can be stated elegantly: \[(A \simeq B) \simeq (A =_\mathcal{U} B).\] The equivalence relation \({\simeq}\) is similar to isomorphism but it is slightly more permissive. To say \(A \simeq B\) requires the existence of a function \(f:A \to B\) together with \(\ell,r:B \to A\) such that \(\ell \circ f\) is homotopic to \(\mathsf{id}_A\) and \(f \circ r\) is homotopic to \(\mathsf{id}_B\). Given two functions \(f\) and \(g\) of the same dependent product type \(\prod_{x:A} B(x)\), a homotopy from \(f\) to \(g\) is an element of \(\prod_{x:A} (f(x) =_{B(x)} g(x))\). So \(f\) and \(g\) are homotopic if they agree on all inputs, which does not mean that \(f = g\) in the absence of function extensionality.
In general type theory, \(A \simeq B\) is definitely a good way to say "\(A\) and \(B\) are not siginificantly different" and thus univalence arguably captures the right indiscernibility axiom for type theory. The surprising fact is that such a strong axiom does not appear to collapse more than it should. The benefits of univalence are interesting and need to be explored further.
I still need to digest the book so this is probably only the first of many posts on HoTT. The next posts will undoubtedly wander deeper into the technical levels of HoTT. There are a few interesting leads but nothing definite yet.
There is only one thing that bugs me right now, which is the way universes are handled in the book. However, since these assumptions do not appear to be crucial for the development of HoTT and there are plenty of alternatives out there, I’m not overly concerned about this at the moment.
I will eventually need to talk about higher inductive types. These are really interesting and I’m happy to see that the book devotes an entire chapter to them. This is a very interesting outgrowth of this project and which deserves study even independently of HoTT.
]]>