Quadrature

The word quadrature comes from Latin quadratura — literally squaring, the classical geometric problem of constructing a square of equal area to a given figure. Mathematicians later borrowed the term for computing the area under a curve, and today it stands more broadly for computing any definite integral. This chapter is concerned with high-dimensional quadrature: extending the definite integral of one variable into multiple dimensions, integrating scalar fields over 2D regions and 3D solids.

Double and triple integrals

The 1D definite integral $\int_a^b f(x) \, dx$ accumulates a 1D scalar field over an interval $[a, b]$ . In two variables, we accumulate over a 2D region — a rectangle $D = [a, b] \times [c, d]$ — and call the result the double integral of $f$ over $D$ .

For a scalar field $f : D = [a, b] \times [c, d] \to \mathbb{R}$ on a rectangle, the double integral of $f$ over $D$ is the iterated integral

\iint_D f(x, y) \, dx \, dy = \int_c^d \int_a^b f(x, y) \, dx \, dy = \int_c^d \left( \int_a^b f(x, y) \, dx \right) dy

equivalently obtained by integrating in the opposite order:

\iint_D f(x, y) \, dy \, dx = \int_a^b \int_c^d f(x, y) \, dy \, dx = \int_a^b \left( \int_c^d f(x, y) \, dy \right) dx

To evaluate an iterated integral, peel one layer at a time from the inside out. The inner integral runs first, treating the other variable as a constant — that produces a function of one variable, which the outer integral then accumulates. Each $dx$ or $dy$ marks the variable being integrated at that step; the parentheses are just for clarity.

The same construction lifts to three variables. For a scalar field on a box $D = [a, b] \times [c, d] \times [e, f] \subseteq \mathbb{R}^3$ , three nested integrals give the triple integral:

For a scalar field $f : D = [a, b] \times [c, d] \times [e, f] \to \mathbb{R}$ on a 3D box, the triple integral of $f$ over $D$ is

\begin{aligned} \iiint_D f(x, y, z) \, dx \, dy \, dz &= \int_e^f \int_c^d \int_a^b f(x, y, z) \, dx \, dy \, dz \\ &= \int_e^f \left( \int_c^d \left( \int_a^b f(x, y, z) \, dx \right) dy \right) dz \end{aligned}

The recipe extends to any number of variables: peel off one variable at a time, evaluating the innermost integral first.

Fubini’s theorem — the order doesn’t matter

We wrote the double integral in two orders ( $dx \, dy$ versus $dy \, dx$ ) and asserted both give the same answer. That equivalence is not automatic — it’s a theorem.

Fubini’s theorem. If $f$ is continuous on a closed rectangular box $D = [a_1, b_1] \times \cdots \times [a_n, b_n] \subseteq \mathbb{R}^n$ , then all iterated integrals of $f$ over $D$ have the same value, regardless of the order of integration. For example in 2D:

\int_c^d \int_a^b f(x, y) \, dx \, dy = \int_a^b \int_c^d f(x, y) \, dy \, dx

What this buys us in practice: if one order makes the inner integral easier — perhaps because the integrand factors more cleanly when one variable is treated as constant — we can switch freely. The two orderings are not just numerically equal; they’re interchangeable as a computational strategy, and we will exploit that whenever we have the freedom to choose.

Standard domains

Rectangles and boxes are the easy targets — every integration bound is a constant. Most regions of practical interest aren’t that clean. The next-simplest class is the standard domain (or simple domain): one variable still spans a fixed interval, and the other slides between bounds that depend on the first.

A region $D \subseteq \mathbb{R}^2$ is a standard domain if it can be written in one of the two forms

D = \{\,(x, y) \mid a \leq x \leq b, \; l(x) \leq y \leq u(x)\,\}

or, with the variables swapped,

D = \{\,(x, y) \mid c \leq y \leq d, \; l(y) \leq x \leq u(y)\,\}

where $l$ and $u$ are continuous functions defining the lower and upper boundaries of $D$ .

The figure below shows both forms side by side. In either case, one variable spans a fixed interval, while the other lies between a lower and upper bound that shift as the first variable moves.

Rectangles are a special case. Take constant boundary functions $l(x) \equiv c$ and $u(x) \equiv d$ ; the first form collapses to $\{(x, y) \mid a \leq x \leq b, \; c \leq y \leq d\}$ — a plain rectangle. So everything we did in the previous section sits inside this framework as the trivially-constant case.

The construction lifts to three variables, with each inner layer’s bounds allowed to depend on every variable from the layers outside it:

A region $D \subseteq \mathbb{R}^3$ is a standard domain if it can be written as

D = \{\,(x, y, z) \mid a \leq x \leq b, \; l(x) \leq y \leq u(x), \; \tilde l(x, y) \leq z \leq \tilde u(x, y)\,\}

where $l, u$ and $\tilde l, \tilde u$ are continuous.

The pattern generalizes: each layer’s bounds may depend on all the variables from layers outside it; the outermost layer always lands on constants. Permuting which variable goes outermost gives equally valid standard-domain forms.

Splitting non-standard domains. Not every region fits the standard mold — take an annulus (a ring-shaped region between two concentric circles): any vertical line through the inner hole cuts it into two segments, not one. But many such regions can be partitioned into a finite collection of disjoint standard sub-domains, which is enough to bring them back inside the framework.

Integration over standard domains

With a standard domain, the bounds of the inner integrals are no longer constants — they depend on the variables from the layers outside. Otherwise the iterated structure is unchanged: peel from the inside out, treating outer variables as constants while evaluating each inner integral.

For a scalar field $f$ on a 2D standard domain $D \subseteq \mathbb{R}^2$ , the double integral of $f$ over $D$ is

\iint_D f(x, y) \, dy \, dx = \int_a^b \int_{l(x)}^{u(x)} f(x, y) \, dy \, dx

The outermost integral lands on the constant interval $[a, b]$ ; the inner integral has bounds that move with $x$ . The same recipe lifts to three variables, where each layer’s bounds may reference every variable from the layers outside it:

For a scalar field $f$ on a 3D standard domain $D \subseteq \mathbb{R}^3$ , the triple integral of $f$ over $D$ is

\iiint_D f(x, y, z) \, dz \, dy \, dx = \int_a^b \int_{l(x)}^{u(x)} \int_{\tilde l(x, y)}^{\tilde u(x, y)} f(x, y, z) \, dz \, dy \, dx

Swapping order takes work. Fubini still applies — all integration orders give the same value — but unlike the rectangular case the swap isn’t a mere relabeling of differentials. The inner bounds reference the outer variables, so changing order means re-describing $D$ in its alternative standard form (e.g. $y$ -outermost instead of $x$ -outermost in 2D) and deriving new boundary functions $l, u$ from the geometry. Worth doing whenever the alternative order makes the inner integral simpler or more intuitive to evaluate.

Splitting, formalized. When $D = D_1 \cup \cdots \cup D_r$ is a disjoint split of $D$ into standard sub-domains $D_1 \cup \cdots \cup D_r$ , then integral splits along the partition:

\int_D f = \int_{D_1} f + \cdots + \int_{D_r} f

This is the splitting workaround previewed earlier, made concrete — partition the awkward region into standard pieces, integrate over each, sum.

Area and volume of a standard domain

To calculate the area or volume of $D$ itself, we set $f$ to be the constant $1$ ( $f \equiv 1$ ) — that is, $f(x, y) = 1$ at every point of a 2D domain, or $f(x, y, z) = 1$ at every point of a 3D one. With the integrand contributing the same constant weight everywhere, accumulating over $D$ adds up all the infinitesimal pieces that fill it, and out comes the area or volume:

\text{area}(D) = \iint_D 1 \, dx \, dy, \qquad \text{vol}(D) = \iiint_D 1 \, dx \, dy \, dz

On a 2D standard domain $B$ , the area integral collapses cleanly:

\begin{aligned} \iint_B 1 \, dy \, dx &= \int_a^b \int_{l(x)}^{u(x)} 1 \, dy \, dx \\ &= \int_a^b \bigl( u(x) - l(x) \bigr) \, dx \\ &= \int_a^b u(x) \, dx - \int_a^b l(x) \, dx \end{aligned}

The double integral reduces to the area between two curves from 1D calculus — the area under the upper curve minus the area under the lower curve.

Integration under coordinate transformations

The integrals defined so far run in Cartesian coordinates: every region is described by $(x_1, \dots, x_n)$ , and every infinitesimal slice has volume $dx_1 \cdots dx_n$ . But sometimes a region’s geometry fits other coordinates much better — a circular disk wants polar, a sphere wants spherical — and an integral that’s painful in $x$ can become almost trivial after a rewrite. The price is a correction factor that accounts for how the new coordinates locally stretch or compress space.

That factor is the absolute value of the Jacobian determinant of the transformation, and the rule for swapping coordinates inside an integral is:

Let $B, D \subseteq \mathbb{R}^n$ be open and let $\psi : B \to D$ be a coordinate transformation, so each point $\mathbf{x} = (x_1, \dots, x_n) \in D$ has a unique address $\mathbf{y} = (y_1, \dots, y_n) \in B$ with $\mathbf{x} = \psi(\mathbf{y})$ . For an integrable scalar field $f$ on $D$ ,

\int_D f(\mathbf{x}) \, dx_1 \cdots dx_n = \int_B f(\psi(\mathbf{y})) \, |\det J_\psi(\mathbf{y})| \, dy_1 \cdots dy_n

The formula reads naturally: integrate the same field over $B$ using the new coordinates $y_i$ , but at every point reweight by how much $\psi$ distorts an infinitesimal volume there. Where $|\det J_\psi|$ is small, $\psi$ is compressing — many $y$ -points crowd into a small region of $D$ , so each contributes proportionally less to the total. Where $|\det J_\psi|$ is large, $\psi$ is stretching — each $y$ -point covers a larger patch of $D$ and counts for more.

The 1D analog. With $n = 1$ , a coordinate transformation is just a smooth bijection $\psi : [a, b] \to \psi([a, b])$ , and $J_\psi$ collapses to the single number $\psi'$ . The formula becomes

\int_{\psi([a, b])} f(x) \, dx = \int_a^b f(\psi(t)) \, |\psi'(t)| \, dt

— exactly the substitution rule from 1D calculus. The absolute value $|\psi'(t)|$ measures how much $\psi$ stretches an infinitesimal interval; that’s the role $|\det J_\psi|$ plays in higher dimensions.

Area of a disk. The open disk $D = \{(x, y) \mid x^2 + y^2 < R^2\}$ has awkward Cartesian bounds — the inner integral runs from $-\sqrt{R^2 - x^2}$ to $\sqrt{R^2 - x^2}$ . In polar coordinates, $\psi(r, \varphi) = (r \cos \varphi, r \sin \varphi)$ maps the rectangle $B = (0, R) \times (0, 2\pi)$ onto $D$ , with Jacobian determinant $\det J_\psi = r$ . Change of variables gives

\iint_D 1 \, dx \, dy = \int_0^{2\pi} \int_0^R r \, dr \, d\varphi = \pi R^2

— the disk area, no awkward bounds in sight.

Surface integrals

Every integral so far has accumulated a field over a flat region — an interval, a 2D area in $\mathbb{R}^2$ , a 3D solid in $\mathbb{R}^3$ . Surface integrals are a different kind of integration: the integration domain is a curved 2D patch sitting inside 3D space. The setup pairs a regular surface $\phi : B \subseteq \mathbb{R}^2 \to \mathbb{R}^3$ with a field function (scalar field or vector field) defined on a 3D region that contains the surface. The integral evaluates the function at each surface point $\phi(u, v)$ and accumulates those values, weighted by a local distortion factor built from the tangent vectors $\phi_u, \phi_v$ . Two flavors arise depending on whether the field is scalar- or vector-valued.

Scalar surface integrals

When the integrand is a scalar field $f : D \subseteq \mathbb{R}^3 \to \mathbb{R}$ with $\phi(B) \subseteq D$ , each parameter point $(u, v) \in B$ contributes the reading $f(\phi(u, v))$ , weighted by how much surface area the tiny parameter patch around $(u, v)$ sweeps out.

For a regular surface $\phi : B \subseteq \mathbb{R}^2 \to \mathbb{R}^3$ and a scalar field $f : D \subseteq \mathbb{R}^3 \to \mathbb{R}$ with $\phi(B) \subseteq D$ , the scalar surface integral of $f$ over $\phi$ is

\iint_\phi f \, ds = \iint_B f(\phi(u, v)) \, \|\phi_u(u, v) \times \phi_v(u, v)\| \, du \, dv

We multiply by $\|\phi_u(u, v) \times \phi_v(u, v)\|$ — the local area-stretch factor — because the integral runs over $\phi$ ‘s parameter domain $B$ , naturally weighting each $f$ -value by the parameter-patch area $du \, dv$ . But the values come from the surface, and each tiny parameter patch around $(u, v)$ maps to a surface patch of a different size — so weighting by $du \, dv$ alone doesn’t reflect how large the corresponding patch on the surface is. Scaling by $\|\phi_u \times \phi_v\|$ swaps in the surface-patch area as the weight, so each $f$ -value is weighted by the actual amount of surface it represents. For the geometric derivation, see the in-depth note below.

In-depth: where

\|\phi_u(u, v) \times \phi_v(u, v)\|

comes from

Pick a tiny rectangle in parameter space at $(u_0, v_0)$ with sides $du$ and $dv$ . Under $\phi$ , its four corners land on four nearby points of the surface, bounding a tiny curved patch — and we need its area.

A small step of size $du$ along the $u$ -parameter (with $v$ fixed) shifts the surface point by approximately $\phi_u(u_0, v_0) \, du$ — that is precisely the role of the tangent vectors: they record the 3D rate of change of $\phi$ when we wiggle one parameter at a time. Similarly a $dv$ -step shifts the point by $\phi_v(u_0, v_0) \, dv$ . The two displacements span a parallelogram in 3D with edges $\phi_u \, du$ and $\phi_v \, dv$ , and to first-order Taylor approximation the curved patch is indistinguishable from that parallelogram — same shape, and to that order, the same area.

So computing the patch’s area reduces to computing the parallelogram’s. By a standard property of the cross product, the area of a parallelogram spanned by two vectors in $\mathbb{R}^3$ is the cross-product norm of its edges — applied here, the patch carries surface area $\|\phi_u \times \phi_v\| \, du \, dv$ — i.e. the magnitude of the cross product of the tangent vectors, times the parameter-area element $du \, dv$ . The cross-product vector $\phi_u \times \phi_v$ itself is the surface normal at $\phi(u_0, v_0)$ — the perpendicular vector named when we introduced the tangent vectors — and its magnitude is exactly the area element we just derived.

So $\|\phi_u \times \phi_v\|$ is the local area-stretch factor: it converts one unit of parameter-space area at $(u_0, v_0)$ into the corresponding amount of true surface area. Where $\|\phi_u \times \phi_v\|$ is large, $\phi$ is stretching — each parameter point covers a wide swath of surface, so its $f$ -reading should count for more in the total. Where it is small, $\phi$ is compressing — many parameter points crowd onto a tiny surface patch, so each one should count for less. Multiplying $f(\phi(u, v))$ by the factor before integrating accounts for that distortion, leaving a value that depends only on the surface’s geometry, not on how we chose to parametrize it.

Cousin of change of variables. The shape mirrors the change-of-variables formula: both rewrite an integral over a curved or differently-coordinatized region as one over a flat parameter region, paying a local distortion factor at each point.

The difference is dimensional. Change of variables maps $\mathbb{R}^n \to \mathbb{R}^n$ — the Jacobian is square, and $|\det J_\psi|$ measures the local volume distortion. A surface parametrization maps $\mathbb{R}^2 \to \mathbb{R}^3$ — the Jacobian $J_\phi$ is $3 \times 2$ , non-square, so it has no determinant. But its two columns are precisely $\phi_u$ and $\phi_v$ , and the natural replacement for $|\det|$ is the area of the parallelogram those columns span — exactly $\|\phi_u \times \phi_v\|$ . Same role (local distortion factor); the codimension forces a cross product in place of a determinant.

Vector surface integrals

When the integrand is a vector field $\boldsymbol{v} : D \subseteq \mathbb{R}^3 \to \mathbb{R}^3$ with $\phi(B) \subseteq D$ , each parameter point $(u, v) \in B$ contributes the inner product of $\boldsymbol{v}(\phi(u, v))$ with the vector $\phi_u \times \phi_v$ — the surface normal itself, not its norm as in the scalar case.

For a regular surface $\phi : B \subseteq \mathbb{R}^2 \to \mathbb{R}^3$ and a vector field $\boldsymbol{v} : D \subseteq \mathbb{R}^3 \to \mathbb{R}^3$ with $\phi(B) \subseteq D$ , the vector surface integral of $\boldsymbol{v}$ over $\phi$ — also called the flux of $\boldsymbol{v}$ across $\phi$ , in the direction $\phi_u \times \phi_v$ — is

\iint_\phi \boldsymbol{v} \cdot ds = \iint_B \boldsymbol{v}(\phi(u, v))^\top (\phi_u(u, v) \times \phi_v(u, v)) \, du \, dv

At each surface point, $\boldsymbol{v} \cdot (\phi_u \times \phi_v)$ computes the scalar projection of $\boldsymbol{v}$ onto the surface normal weighted by the length of the normal, which is the area-stretch factor from the scalar case. (For the general inner-product principle behind splitting a dot product into “projection × magnitude”, see the weighted-projection remark.) To picture what this measures, think of $\boldsymbol{v}$ as a fluid’s velocity field — at each point in space, an arrow giving the direction and speed of the flow. At a surface point, split $\boldsymbol{v}$ into a perpendicular and a tangential part: only the perpendicular part crosses the surface (carries fluid from one side to the other), while the tangential part just slides along the surface, never crossing. A flow heading obliquely into the surface contributes its perpendicular share to crossing, not zero — only a flow that’s exactly parallel to the surface contributes nothing. So $\boldsymbol{v} \cdot (\phi_u \times \phi_v)$ — projection onto the normal, scaled by the patch’s surface area — measures the rate at which fluid is crossing the surface patch around $(u, v)$ . That is what we call the flux through the local patch. Integrating over $B$ sums these per-patch contributions into the total flux of $\boldsymbol{v}$ through $\phi$ — the overall rate at which the fluid crosses the surface. For the geometric breakdown, see the in-depth note below.

In-depth: how

\boldsymbol{v} \cdot (\phi_u \times \phi_v)

encodes flux

Using the geometric form of the inner product:

\boldsymbol{v}(\phi(u, v)) \cdot (\phi_u \times \phi_v) = \|\boldsymbol{v}\| \, \|\phi_u \times \phi_v\| \, \cos\theta,

where $\theta$ is the angle between $\boldsymbol{v}$ and the surface normal. Group the factors:

\underbrace{\|\boldsymbol{v}\| \cos\theta}_{\text{component of } \boldsymbol{v} \text{ along normal}} \;\cdot\; \underbrace{\|\phi_u \times \phi_v\|}_{\text{area element}}

— a product of two pieces we already understand. The first is the scalar projection of $\boldsymbol{v}$ onto the unit normal $\hat{\mathbf{n}} = (\phi_u \times \phi_v)/\|\phi_u \times \phi_v\|$ : the signed length of $\boldsymbol{v}$ ‘s shadow on the normal direction. It is positive when $\boldsymbol{v}$ points through the surface in the $\phi_u \times \phi_v$ direction, negative when it points the other way, and zero when $\boldsymbol{v}$ is tangent to the surface — purely tangential motion slides along the surface rather than crossing it. The second factor is the same area-stretch weight that appeared in the scalar case. Multiplied, they give the flux through the local surface patch — how much of $\boldsymbol{v}$ crosses that patch, accounting for both alignment and patch size.

Sign depends on orientation. $\phi_u \times \phi_v$ has a definite direction set by the parameter order; swapping $u$ and $v$ flips it, and with it the sign of the integral. Vector surface integrals are signed, with the sign tied to which side of $\phi$ counts as “outward” — that is what the qualifier “in the direction $\phi_u \times \phi_v$ ” records.

Physical readings. When $\boldsymbol{v}$ is a fluid-velocity field, the integral measures the volume of fluid crossing $\phi$ per unit time — the literal flux. When $\boldsymbol{v}$ is an electric field, it gives the electric flux through $\phi$ , the quantity that appears in Gauss’s law. In both cases the vanishing of tangential contributions matches physical intuition: fluid sliding parallel to a wall doesn’t cross it.

Surface area

The same trick that gave us the area or volume of a standard domain — set the integrand to the constant $1$ — lifts to surfaces. With $f \equiv 1$ in the scalar surface integral, the integrand contributes nothing of its own and what remains is a pure sum of area-stretch factors over the parameter domain. The result is the surface area of $\phi$ .

For a regular surface $\phi : B \subseteq \mathbb{R}^2 \to \mathbb{R}^3$ , the surface area of $\phi$ is the scalar surface integral of the unit function over $\phi$ :

O(\phi) = \iint_\phi 1 \, ds = \iint_\phi ds = \iint_B \|\phi_u(u, v) \times \phi_v(u, v)\| \, du \, dv

Each parameter patch of area $du \, dv$ maps under $\phi$ to a surface patch of area $\|\phi_u \times \phi_v\| \, du \, dv$ — the per-patch area element built earlier. Summing those elements across all of $B$ recovers the total area of the surface in 3D space.

Gauss’s divergence theorem

The two integrals built so far in this chapter look like they belong to different worlds: the triple integral accumulates a scalar density throughout a 3D solid, while the vector surface integral accumulates a vector field’s flow across a 2D boundary. Gauss’s divergence theorem says they are the same number in a precise case — integrating the divergence of a vector field over a 3D region equals the flux of that field through the region’s boundary surface.

Gauss’s divergence theorem. Let $B \subseteq \mathbb{R}^3$ be a domain with piecewise continuously differentiable boundary $\partial B$ . For any continuously differentiable vector field $\boldsymbol{v} : D \subseteq \mathbb{R}^3 \to \mathbb{R}^3$ with $B \subseteq D$ :

\iiint_B \operatorname{div} \boldsymbol{v} \, dx \, dy \, dz = \iint_{\partial B} \boldsymbol{v} \cdot ds

Each continuously differentiable piece of $\partial B$ must be parametrized by some $\phi$ with $\phi_u \times \phi_v$ pointing outward from $B$ .

The divergence referenced here is exactly the differential operator built earlier: $\operatorname{div} \boldsymbol{v} = \nabla \cdot \boldsymbol{v} = \sum_i \partial_i v_i$ , the trace of the Jacobian of $\boldsymbol{v}$ .

Reading both sides — divergence as source density, flux as net escape

The two integrals are different views on the same physical accounting:

Volume side ( $\iiint_B \operatorname{div} \boldsymbol{v}$ ). The divergence at a point $\mathbf{x}$ measures how much the vector field is locally creating per unit volume there: positive divergence means $\boldsymbol{v}$ has a tiny source at $\mathbf{x}$ pumping new flow out into space (a faucet); negative means a sink absorbing flow (a drain); zero means whatever flows in immediately flows out (incompressible flow). The triple integral sums those local source strengths over every point of $B$ — the total amount of stuff being generated inside $B$ per unit time.
Boundary side ( $\iint_{\partial B} \boldsymbol{v} \cdot ds$ ). The flux through $\partial B$ , with the outward orientation, measures the net amount of stuff crossing the boundary outward per unit time — what escapes minus what re-enters.

The theorem says the two match exactly: every drop of fluid produced by a source inside $B$ has to cross the boundary on its way out (it has nowhere else to go), and every drop absorbed by a sink must have entered through the boundary first. Local generation inside ↔ net escape through the wall — a conservation principle dressed in calculus.

The “outward” condition — what orientation each boundary patch must have

Recall from vector surface integrals that $\phi_u \times \phi_v$ has a definite direction fixed by the parameter order — swap $u$ and $v$ and the normal flips. For the theorem to balance, every boundary patch must be parametrized so its normal points away from the solid material of $B$ , never into it; the wrong orientation on any patch flips a sign on the right-hand side and breaks the equation.

The condition becomes subtle when $\partial B$ has more than one connected piece. Picture $B$ as the rubber wall of a hollow ball — material trapped between an outer sphere and an inner sphere — so $\partial B$ is the union of those two spheres. On the outer sphere, “out of the rubber” points away from the center, as you’d expect. On the inner sphere, “out of the rubber” points into the cavity — toward the center, which looks like the opposite direction. Both normals correctly leave $B$ ‘s material; the lesson is that “outward” is relative to where the material lives, not to any fixed origin.

Why the theorem is useful — computational shortcut and backbone of conservation laws

Two distinct payoffs:

A computational shortcut. The two sides of the equation often differ wildly in difficulty for the same problem, and the theorem lets us pick the easier one. If a triple integral $\iiint_B \operatorname{div} \boldsymbol{v}$ over an awkward 3D region is the painful side, swap it for the flux $\iint_{\partial B} \boldsymbol{v} \cdot ds$ through the boundary. Going the other way, a flux integral over a complicated closed surface — say one stitched from many patches that would each need separate parametrizations — can be rewritten as a single triple integral of $\operatorname{div} \boldsymbol{v}$ over the enclosed solid. Whichever side is friendlier wins.
The mathematical backbone of conservation laws. Most fundamental conservation statements in physics — of mass, of electric charge, of energy — are this theorem applied to a specific vector field. Gauss’s law of electromagnetism — electric flux through any closed surface equals the enclosed charge (up to a constant) — is the divergence theorem applied to the electric field. The continuity equation in fluid dynamics — “the rate of mass change inside a region equals the net mass flow across its boundary” — is the same identity rearranged into local form. The theorem is what lets physics translate freely between “what’s happening at every point inside” and “what’s crossing the boundary”.

Applications

Two short examples of where high-dimensional integrals show up in practice — one from statistics, one from machine learning.

Statistics — computing moments

Probability theory uses integrals to extract single-number summaries from a distribution. Given a random variable $X$ — a quantity whose value is uncertain, like a measurement or a coin flip — its probability density $f : \mathbb{R} \to \mathbb{R}$ is a non-negative function (with $\int f = 1$ ) encoding how likely each value of $X$ is. From it we extract single-number summaries — moments — by integrating powers of the offset from a reference point against the density.

For a probability density $f$ on $\mathbb{R}$ and a reference point $c \in \mathbb{R}$ , the $q$ -th moment of $f$ about $c$ is the definite integral

\mu_q = \int_{-\infty}^{\infty} (x - c)^q \, f(x) \, dx

Two specific $(q, c)$ choices recover the most familiar summary statistics:

$q = 1$ , $c = 0$ : $\mu_1 = \int x \, f(x) \, dx$ — the expected value (or mean) $\mathbb{E}[X]$ , the average of $X$ weighted by how likely each value is.
$q = 2$ , $c = \mu_1$ : $\mu_2 = \int (x - \mu_1)^2 \, f(x) \, dx$ — the variance, the average squared deviation of $X$ from its mean.

Reading the variance form. You may know variance from a finite sample as the average squared deviation, $\frac{1}{n}\sum_i (x_i - \bar{x})^2$ — every sample contributes equally with weight $\frac{1}{n}$ . In the integral, the density $f(x) \, dx$ takes that weighting role: each squared deviation $(x - \mu_1)^2$ is scaled by how likely $x$ is to occur, then accumulated. Same averaging operation, lifted from a discrete sample to a continuous distribution — and the expected value formula $\int x \, f(x) \, dx$ reads as the continuous version of $\frac{1}{n}\sum_i x_i$ .

The multivariate case. When the random object is a vector $\mathbf{x} \in \mathbb{R}^n$ with joint density $f : \mathbb{R}^n \to \mathbb{R}$ , the same recipe runs over $\mathbb{R}^n$ — every moment integral becomes an $n$ -fold integral, exactly the kind of high-dimensional quadrature this chapter has been about.

Machine learning — kernel density estimation

The previous example assumed the probability density $f$ was known. The reverse problem — given a sample $x_1, \dots, x_n$ drawn from an unknown distribution, estimate its density $f$ — is kernel density estimation (KDE). The idea is to place a small smooth “bump” at each data point and average them all: where the data clusters, the bumps pile up and the estimated density is high; where data is sparse, it is low.

\hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^n \mathcal{K}\!\left(\frac{x - x_i}{h}\right)

The kernel $\mathcal{K}$ is a non-negative function (typically a Gaussian centered at zero), and the bandwidth $h$ controls each bump’s width — small $h$ gives narrow spikes that overfit individual points, large $h$ gives wide bumps that oversmooth fine structure. The catch is picking $h$ .

A standard selection criterion minimizes the mean integrated squared error:

\text{MISE}(h) = \mathbb{E}\!\left[ \int_{-\infty}^{\infty} (\hat{f}_h(x) - f(x))^2 \, dx \right]

— the average (over random samples) of how far $\hat{f}_h$ deviates from the true density $f$ , integrated across all $x$ . In practice $f$ is unknown, so MISE cannot be computed directly; practical methods such as cross-validation or Silverman’s rule of thumb approximate it instead. However, analyzing MISE theoretically reveals that finding the optimal $h$ requires evaluating an $n$ -fold integral over the data’s joint distribution — a high-dimensional quadrature problem that explains why no simple closed-form solution exists.