$$\gdef\d\partial$$
Today is our first encounter with a non-trivial theorem, the Sard theorem.
Theorem (Sard) Given a smooth map $F: M \to N$, the set of critical values of $F$ has measure zero in $N$.
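We recall the definitions: a point $p \in M$ is a critical point of $F$ if the differential $dF(p): T_pM \to T_{F(p)}N$ is not surjective, and a point $q \in N$ is a critical value if $q = F(p)$ for some critical point $p$. We write $Cr(F)$ (or $Cr_F$) for the set of critical points and $\Delta_F = F(Cr(F))$ for the set of critical values, also called the discriminant set.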
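Note: we don't have a canonical measure on $N$, but we do have a class of smooth densities on $N$, each of which can be written in local coordinates as a smooth positive function times the Lebesgue measure induced by the coordinates. Any two such densities have the same sets of measure zero, so the notion of a measure zero set in $N$ is well defined.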
It suffices to prove a local version of the Sard theorem, in a coordinate chart.
Theorem (Sard on $\R^n$) Let $U \subset \R^n$ be an open set, $F: U \to \R^m$ be a smooth map. Assume $n \geq m$. Then the discriminant set $\Delta_F$ is of measure zero.
Remark: We don't say anything about the measure of $Cr(F)$. In fact, it can be quite large. For example, consider a smooth non-negative function $f: \R \to \R$ that vanishes on $[-1,1]$. Then $[-1,1] \subset Cr(f)$, but $f([-1,1])$ is just the single point $0$.
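For concreteness, one standard choice of such a function (an illustration, not taken from the lecture) is
$$ f(x) = \begin{cases} e^{-1/(x^2-1)}, & |x| > 1, \\ 0, & |x| \leq 1. \end{cases} $$
This $f$ is smooth and non-negative, and for $|x| > 1$ we have $f'(x) = f(x) \cdot \frac{2x}{(x^2-1)^2} \neq 0$. So $Cr(f) = [-1,1]$ has Lebesgue measure $2$, while the set of critical values $f(Cr(f)) = \{0\}$ has measure zero.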
(I am copying the proof from Nicolaescu's notes, page 30; the argument is originally due to Milnor and Pontryagin.)
We let $Cr^k(F) \subset Cr(F)$ (also written $Cr^k_F$) denote the subset of points in $U$ at which all partial derivatives of $F$ of order $1$ through $k$ vanish (check: this notion does not depend on the choice of coordinates). We obtain a decreasing filtration of closed sets $$ Cr(F) \supset Cr^1(F) \supset Cr^2(F) \supset Cr^3(F) \supset \cdots $$
Note: $p \in Cr(F)$ means that $dF(p)$ is not surjective, while $p \in Cr^1(F)$ means that $dF(p)=0$.
We argue by induction on $n$. The case $n=0$ is trivial. We assume the statement holds for any $n' < n$ and any $m \leq n'$. The inductive step is divided into three steps.
Step 1: Set $Cr'_F = Cr_F - Cr_F^1$. We will show that there exists a countable open cover $\gdef\cO{\mathcal O} \{\cO_j\}_{j=1}^\infty$ of $Cr'_F$ such that $F(\cO_j \cap Cr'_F)$ is negligible (i.e. of measure zero) for all $j$. Since $Cr'_F$ is contained in the second countable space $\R^n$, every open cover of it has a countable subcover, hence it suffices to prove that each $p \in Cr'_F$ has a neighborhood $\gdef\cN{\mathcal N} \cN$ such that $F(\cN \cap Cr'_F)$ is negligible.
Suppose $p \in Cr'_F$. Since $dF(p)$ is not identically zero, we may choose a coordinate chart $(\cN, (x_i))$ centered at $p$ and a chart $(V, (y_j))$ centered at $F(p)$ such that in these coordinates $F(x_1,\cdots,x_n) = (F_1(x), \cdots, F_m(x))$ with $$ F_1(x) = x_1. $$ (Why can we choose such coordinates? Choose the chart $(V, (y_j))$ first, and consider $y_j \circ F$ on $F^{-1}(V)$ for all $j$. There exists at least one $j$ such that $d(y_j\circ F)(p) \neq 0$, otherwise $dF(p)=0$. Without loss of generality $j=1$, and we define $x_1 = y_1 \circ F$. Since $dx_1(p) \neq 0$, the function $x_1$ can be completed to a local coordinate system $(x_1, \cdots, x_n)$ near $p$.)
Next, for every $t \in \R$, set $$\cN_t = \{ x \in \cN \mid x_1 = t\} $$ and define $$ G_t: \cN_t \to \R^{m-1}, \quad p \mapsto (F_2(p), \cdots, F_m(p)). $$ Observe that $$ F(\cN \cap Cr'_F) = \bigcup_{t} \{t\} \times G_t(Cr_{G_t}). $$ Each $G_t$ has source dimension $n-1$, so the induction hypothesis applies to it. Hence the $(m-1)$-dimensional Lebesgue measure of $G_t(Cr_{G_t})$ is zero for every $t$. By Fubini's theorem, $$ \mu_{m} (F(\cN \cap Cr'_F)) = \int \mu_{m-1}( G_t(Cr_{G_t})) \, dt = 0. $$
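To see why the critical points of $F$ in $\cN$ are exactly the critical points of the slices $G_t$, here is a sketch of the linear algebra, writing $x' = (x_2,\cdots,x_n)$ (a notation introduced only for this illustration). Since $F_1(x) = x_1$, the Jacobian has the block form
$$ dF(t, x') = \begin{pmatrix} 1 & 0 \\ \ast & dG_t(x') \end{pmatrix}, $$
so $dF(t,x')$ is surjective onto $\R^m$ if and only if $dG_t(x')$ is surjective onto $\R^{m-1}$. The first row also shows that $dF$ never vanishes on $\cN$, so $\cN \cap Cr'_F = \cN \cap Cr_F$. As a toy instance (chosen for illustration), take $F(x_1,x_2) = (x_1, x_2^2)$ on $\R^2$: then $G_t(x_2) = x_2^2$ and $G_t(Cr_{G_t}) = \{0\}$ for every $t$, and the set of critical values of $F$ is the line $\{y_2 = 0\}$, which has measure zero in $\R^2$.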
Step 2: Set $Cr^{(k)}_F := Cr^k_F - Cr^{k+1}_F$ for $k \geq 1$, and suppose $p \in Cr^{(k)}_F$. We may choose local coordinates $s_1, \cdots, s_n$ around $p$ and $y_1, \cdots, y_m$ around $F(p)$ such that $s_i(p)=0$ for all $i$ and $y_j(F(p))=0$ for all $j$. Since some partial derivative of order $k+1$ of $F$ does not vanish at $p$, we may furthermore assume that $$ \frac{\d^{k+1} y_1}{\d s_1^{k+1}}(p) \neq 0, $$ where $y_1$ stands for the function $y_1 \circ F$ of $s$. Then we define $x_1 = \frac{\d^{k} y_1}{\d s_1^{k}}$ and $x_2=s_2,\cdots,x_n=s_n$. Because $\frac{\d x_1}{\d s_1}(p) \neq 0$, we can choose $\cN$ to be a small enough neighborhood of $p$ on which $(x_i)$ forms a coordinate system. Then $Cr^{k}_F \cap \cN$ is contained in the hyperplane $x_1=0$. (Indeed, if $x_1 \neq 0$ at a point, then a $k$-th derivative of $F$ is non-zero there, hence the point is not in $Cr^k_F$ by definition.)
Define $$ G: \cN \cap \{x_1=0\} \to \R^m, \quad G(q) = F(q) \quad \forall q \in \cN \cap \{x_1=0\}. $$ Then $$ Cr^k_F \cap \cN \subset Cr_G^k, \quad F(Cr^k_F \cap \cN ) \subset G(Cr_G^k), $$ because the derivatives of $G$ are derivatives of $F$ in the directions tangent to the hyperplane. By the induction hypothesis (the source of $G$ has dimension $n-1$, and $Cr^k_G \subset Cr_G$), $G(Cr_G^k)$ is negligible in $\R^m$, hence $F(Cr^k_F \cap \cN)$ is negligible. Covering $Cr^{(k)}_F$ by such neighborhoods and passing to a countable subcover, we conclude that $F(Cr^{(k)}_F)$ is negligible.
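As a toy illustration of this reduction (the map below is chosen only for illustration and does not come from the lecture), take $n=2$, $m=1$, $k=1$ and $F(s_1,s_2) = s_1^2 + s_2^3$. Then
$$ dF = (2s_1, \; 3s_2^2), \qquad \frac{\d^2 F}{\d s_1^2} = 2 \neq 0, $$
so $Cr^1_F = \{(0,0)\}$ and $Cr^2_F = \emptyset$, i.e. the origin lies in $Cr^{(1)}_F$. The new coordinate is $x_1 = \frac{\d F}{\d s_1} = 2 s_1$, the hyperplane $\{x_1 = 0\}$ is the $s_2$-axis, and the restriction is $G(s_2) = s_2^3$. Its only critical point is $s_2 = 0$, with critical value $G(0) = 0 = F(0,0)$, a single point of $\R$.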
Step 3 (the key step): Suppose $k > n / m - 1$. We will show that $F(Cr^k_F)$ is negligible. More precisely, for every compact subset $S \subset U$, we will show that $F(S \cap Cr^k_F)$ is negligible.
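A quick check of the arithmetic behind this threshold (the sample values of $n$ and $m$ are chosen purely for illustration):
$$ k > \frac{n}{m} - 1 \iff m(k+1) > n \iff n - m(k+1) < 0. $$
For instance, $n=3$, $m=2$ requires $k \geq 1$, while $n=5$, $m=1$ requires $k \geq 5$. The negative exponent $n - m(k+1)$ is what makes the covering estimate at the end of this step tend to zero.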
From the Taylor expansion around points in $S \cap Cr^k_F$, we know there exist $0< r_0 < 1$ and $\lambda_0>0$, depending only on $S$, such that if $C$ is a cube with sides of length $r < r_0$ that intersects $Cr^k_F \cap S$, then $$ \operatorname{diam}(F(C)) < \lambda_0 r^{k+1}, $$ where for any set $A \subset \R^m$, the diameter is defined as $$\operatorname{diam}(A) = \sup \{|a_1 - a_2| : a_1, a_2 \in A \}.$$
(Recall the Taylor expansion formula: if $f: \R^n \to \R$ is a smooth function, then for any $k \geq 0$ there exist $r>0$ and $C_0 > 0$ such that for any $|x|<r$ we have $$ f(x_1, \cdots, x_n) = f(0) + \sum_{1 \leq |\alpha| \leq k } f^{(\alpha)}(0)\frac{x_1^{\alpha_1}\cdots x_n^{\alpha_n}}{\alpha_1! \cdots \alpha_n!} + R_k(x), $$ where the remainder satisfies $| R_k(x) | \leq C_0 |x|^{k+1}$. Applying this remainder estimate to each component $F_i$ at a point of $Cr^k_F \cap S$, where all the terms with $1 \leq |\alpha| \leq k$ vanish, one gets the diameter bound.)
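As a sketch of how the remainder estimate produces the diameter bound, suppose (an assumption justified by compactness of $S$, not spelled out in the text) that a single constant $K$ bounds the Taylor remainder of every component $F_i$ at every base point of $S \cap Cr^k_F$. Fix $p \in C \cap Cr^k_F \cap S$. Every $x \in C$ satisfies $|x - p| \leq \sqrt{n}\, r$, and the Taylor polynomial of $F_i$ at $p$ has no terms of order $1$ through $k$, so
$$ |F_i(x) - F_i(p)| \leq K |x-p|^{k+1} \leq K n^{(k+1)/2} r^{k+1}, \qquad |F(x) - F(p)| \leq \sqrt{m}\, K n^{(k+1)/2} r^{k+1}. $$
By the triangle inequality, any two points of $F(C)$ are at distance at most $2\sqrt{m}\, K n^{(k+1)/2} r^{k+1}$, so any $\lambda_0 > 2\sqrt{m}\, K n^{(k+1)/2}$ works.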
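Since $F(C)$ is contained in a ball of radius $\operatorname{diam}(F(C))$, its Lebesgue measure satisfies $$ \mu_m(F(C)) < C_1 r^{m(k+1)} = C_1 \mu_n(C)^{m(k+1)/n}, $$ where the constant $C_1$ depends only on $\lambda_0$ and $m$, and we used $\mu_n(C) = r^n$.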
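Now, we cover $Cr^k_F \cap S$ by finitely many cubes $\{C_l\}_{l=1}^N$ of edge length $r< r_0$ with disjoint interiors. For each positive integer $P$, we subdivide each $C_l$ into $P^n$ subcubes of equal size, so that $\mu_n(C_l^\sigma) = \mu_n(C_l)/P^n$ for every subcube $C_l^\sigma$. For every subcube $C_l^\sigma$ that intersects $Cr^k_F \cap S$, we have $$ \mu_m(F(C_l^\sigma)) \leq C_1 \mu_n(C_l^\sigma)^{m(k+1)/n} = \frac{C_1}{P^{m(k+1)}}\mu_n(C_l)^{m(k+1)/n}. $$ Since there are at most $P^n$ such subcubes, $$\mu_m(F(C_l \cap Cr^k_F \cap S)) \leq \sum_{\sigma} \mu_m(F(C_l^\sigma)) \leq C_1 P^{n-m(k+1)} \mu_n(C_l)^{m(k+1)/n}, $$ where the sum runs over the subcubes that intersect $Cr^k_F \cap S$. Since $m(k+1) > n$, we may send $P$ to $\infty$ and conclude that $\mu_m(F(C_l \cap Cr^k_F \cap S)) = 0$ for every $l$. Hence $F(S \cap Cr^k_F)$ is negligible, and since $U$ is a countable union of compact subsets, so is $F(Cr^k_F)$.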
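Putting the three steps together (a sketch of the bookkeeping, with $k$ any integer such that $k > n/m - 1$, so that $k \geq 1$ because $n \geq m$), the critical set decomposes as
$$ Cr_F = Cr'_F \cup Cr^{(1)}_F \cup \cdots \cup Cr^{(k-1)}_F \cup Cr^k_F, $$
so that
$$ F(Cr_F) = F(Cr'_F) \cup \bigcup_{j=1}^{k-1} F(Cr^{(j)}_F) \cup F(Cr^k_F). $$
The first piece is negligible by Step 1, the middle pieces by Step 2, and the last by Step 3. A finite union of negligible sets is negligible, which gives the local version of the Sard theorem.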