Scheduling

Scheduling problems share one abstract shape: a fixed set of tasks that need to get done, a fixed set of machines that can do them, and a question of when to put each task on which machine so that the whole job finishes as well as possible. “As well as possible” depends on the situation — most often it means finishing in the shortest possible total time — but the underlying structure stays the same: an ordering decision under constraints.

This is a different flavor of discrete modeling from the decision models of the previous chapter. There the discreteness lived in the choice space — finitely many actions, finitely many states — and the decision came down to a single pick from a matrix. Here the choice space is itself an arrangement: tasks laid out over time and across machines. The questions are no longer which row of a matrix to pick; they are how to lay things out in time.

Problem setup

A scheduling problem consists of:

- a process, an overall piece of work to be carried out, decomposed into $n$ tasks (also called jobs) $A_1, \dots, A_n$;
- a pool of $m$ machines $M_1, \dots, M_m$ that can execute those tasks;
- an execution time $t_i^{(j)} \ge 0$ for each (task, machine) pair: the time machine $M_j$ takes to complete task $A_i$;
- three constraints on how tasks and machines may be paired up over time:
  - no task is carried out on more than one machine at a time,
  - no machine handles more than one task at a time,
  - but several machines may run in parallel.

The execution-time table $t_i^{(j)}$ is the most general form: the time a task takes can depend both on which task it is and on which machine runs it. Later sections specialize this — for instance when all machines are interchangeable, $t_i^{(j)}$ collapses to a single number $t_i$ per task — but for now leave both indices free.

The three constraints together carve out the legal moves of the problem. The first two are exclusion clauses: while a task runs it occupies one machine, and while a machine runs it occupies one task. The third is permission rather than prohibition — distinct machines may be busy at the same wall-clock moment, doing distinct tasks. This parallelism across machines is what makes the problem interesting; without it the model would collapse to choosing a total order on $n$ tasks for one resource.
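The two exclusion clauses and the parallelism permission can be checked mechanically. The following is a minimal sketch, assuming a candidate schedule is given as a list of `(task, machine, start)` triples; the table `times` and the helper names `overlaps` and `violations` are hypothetical and only serve this illustration.

```python
from itertools import combinations

# Hypothetical execution-time table: times[i][j] is the time machine j
# needs to complete task i (two tasks, two machines, 0-based indices).
times = [[3, 5],
         [2, 2]]

def overlaps(a_start, a_end, b_start, b_end):
    """True when the half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end

def violations(assignments, times):
    """List the exclusion constraints broken by (task, machine, start) triples."""
    found = []
    for (i, j, s), (i2, j2, s2) in combinations(assignments, 2):
        clash = overlaps(s, s + times[i][j], s2, s2 + times[i2][j2])
        if clash and i == i2:
            found.append(f"task {i} runs on two machines at once")
        if clash and j == j2:
            found.append(f"machine {j} runs two tasks at once")
    return found

parallel = [(0, 0, 0), (1, 1, 0)]   # distinct tasks on distinct machines at the same time
clashing = [(0, 0, 0), (1, 0, 1)]   # two tasks overlap on machine 0

print(violations(parallel, times))  # []
print(violations(clashing, times))  # ['machine 0 runs two tasks at once']
```

The first candidate is legal precisely because of the third constraint: both machines are busy at time 0, but with different tasks.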

Model parameters

A scheduling model is described by three kinds of parameters:

- task parameters: what each $A_i$ is, what it requires, how long it takes;
- machine parameters: what each $M_j$ is capable of and how it processes tasks (e.g. the execution times $t_i^{(j)}$);
- a target function (also called objective function or optimality criterion): a rule that turns a candidate schedule into a single number to be optimized.

The first two are descriptive: they pin down the system being modeled. The third is the normative piece — it says what optimal means for the problem at hand, and the answer changes with the situation.

Goal

The goal is to find an optimal schedule: an assignment of tasks to machines and time slots that best satisfies the target function while respecting the constraints. At its heart this is an ordering problem — every candidate schedule comes down to which task runs where, and in what order.

The most common target is to minimize the overall time — the moment when the last task finishes — i.e. the schedule that gets everything done as early as possible. Other choices are possible (minimize total cost, maximize machine utilization, balance load across machines), and much of what follows in this chapter is about how the model and the algorithms change depending on which target is in play.

Machine characterization

The structure of the execution-time table $t_i^{(j)}$ — how the times relate to the tasks and to the machines — distinguishes one scheduling problem from another. Four classes cover the spectrum from the most restrictive setup to the fully general one.

A single machine pool has $m = 1$ . Every task is processed strictly serially on one resource — no parallelism is possible.

The single-machine case is the simplest version of the problem. Without parallelism, the only choice is the order in which tasks are processed.

Machines are identical when they are both qualitatively identical — every machine offers the same range of functionality and can handle every task — and quantitatively identical — every machine runs at the same speed. The execution time then depends only on the task:

$$t_i^{(j)} = t_i$$

The superscript $(j)$ drops because the machine identity is irrelevant.

Think of $m$ copies of the same industrial robot. Any task can be assigned to any robot, and each one finishes in the same time. The scheduling problem reduces to deciding how to spread the tasks across the machines.

Machines are qualitatively identical when they all support the same set of tasks but run at different speeds. Each machine $M_j$ carries a per-machine proportionality factor $q_j > 0$ , and the execution time scales accordingly:

$$t_i^{(j)} = q_j \cdot t_i$$

A larger $q_j$ marks a slower machine for every task it runs; setting all $q_j = 1$ recovers the identical-machines case.

This is the “same model, different generations” picture — a workshop with old and new versions of the same machine. The qualitative ability is uniform, but a job that takes 4 hours on the newest machine might take 6 hours on the previous generation.

Machines are arbitrary when both their functionality and their performance vary across the pool. The execution times $t_i^{(j)}$ are then arbitrary, with no algebraic relationship tying the rows or columns of the table together.

This is the fully general setup: each (task, machine) pair carries its own number, with no shared structure across the table. Most real-world scheduling problems live here — different machines with different abilities, running different jobs at different rates.
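To see how the classes differ in terms of the table itself, here is a small sketch that builds one table per class and tests a given table for the structure each class demands. The durations, the factors `q` and the function `classify` are assumptions made for this illustration, not part of the model.

```python
import math

base = [4, 2, 3]           # hypothetical per-task durations t_i
m = 2                      # number of machines

# Identical machines: t_i^(j) = t_i for every machine j.
identical = [[t for _ in range(m)] for t in base]

# Qualitatively identical machines: t_i^(j) = q_j * t_i.
q = [1.0, 1.5]             # hypothetical factors; a larger q_j means a slower machine
scaled = [[q_j * t for q_j in q] for t in base]

# Arbitrary machines: every entry is chosen independently.
arbitrary = [[4, 7], [2, 1], [3, 3]]

def classify(table):
    """Name the most restrictive class whose structure table[i][j] satisfies."""
    if all(math.isclose(x, row[0]) for row in table for x in row):
        return "identical"
    # Use the first row with strictly positive entries as reference.
    ref = next((row for row in table if all(x > 0 for x in row)), None)
    if ref is not None and all(
        math.isclose(row[j] * ref[0], ref[j] * row[0])
        for row in table for j in range(len(row))
    ):
        return "qualitatively identical"
    return "arbitrary"

print(scaled[0])            # [4.0, 6.0]
print(classify(identical))  # identical
print(classify(scaled))     # qualitatively identical
print(classify(arbitrary))  # arbitrary
```

The single-machine case needs no separate test here: a table with one column is trivially reported as identical.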

The four classes line up along a single trade-off: more general means more powerful, but also more difficult to analyze. The more general the machine model, the wider the range of real-world setups it can describe — and the less algebraic structure there is to exploit when searching for an optimum. The simpler classes often admit clean, direct algorithms; the more general ones typically force a combinatorial search over many candidate schedules.

A separate axis of generality is uncontrollable influences on the execution times — noise, machine failures, weather, the unforeseen. Modelling those requires replacing the deterministic $t_i^{(j)}$ with a random variable, and we return to that direction later in the chapter.

Process scheduling

We now narrow to the simplest setting: the execution time of each task is purely task-dependent, so every $A_i$ carries a single duration $t_i$ that doesn’t depend on the machine. We also assume machine availability is not a binding constraint — there are always enough machines to start a task the instant its prerequisites finish, no matter how many other tasks are already running. The only thing that ever forces one task to wait for another is a precedence arrow, and the natural language for those arrows is a directed graph.

This “unlimited resources” assumption will be relaxed later in the chapter. With finitely many machines, two precedence-unrelated tasks can still have to take turns when they target the same machine — that case is the job-shop problem of the next few sections.

The start time $s_i \ge 0$ of task $A_i$ is the moment at which $A_i$ is launched. The completion time is

$$c_i = s_i + t_i,$$

the moment at which $A_i$ finishes. We assume that once started, a task cannot be interrupted — it runs through to completion.

With every task carrying a start time and a completion time, “task $A_j$ has to wait for task $A_i$ ” becomes a clean numerical condition.

A precedence condition (also called a precedence constraint) from $A_i$ to $A_j$ , written $A_i \to A_j$ , requires that $A_j$ start only after $A_i$ has finished:

$$A_i \to A_j \,:\Leftrightarrow\, c_i \le s_j.$$

Equivalently, $A_i$’s completion must precede (or coincide with) $A_j$’s start.

The precedence graph

A list of precedence conditions has a natural picture: draw one node per task, and one arrow per precedence condition. The picture is a directed graph, and it captures the entire ordering structure of the process at a glance.

The precedence graph of a process is a directed graph $G := (V, E)$ where:

- the vertex set is $V := \{A_1, \dots, A_n\}$, one vertex per task, and each vertex $A_i$ is labeled with its execution time $t_i$;
- the edge set is $E := \{ (A_i, A_j) : A_i \to A_j \}$, one directed edge per precedence condition. The edges themselves carry no labels.

Two extra vertices close the graph at its endpoints:

- an initial vertex $A_S$ with index $S := 0$ and $t_S := 0$, joined by an edge $A_S \to A_i$ to every $A_i$ that has no incoming edge in $E$, that is, every task that depends on no other task;
- a final vertex $A_E$ with index $E := n + 1$ and $t_E := 0$, joined by an edge $A_i \to A_E$ from every $A_i$ that has no outgoing edge in $E$, that is, every task that nothing else depends on.

Putting the duration on the vertex and leaving the edge bare is a deliberate split. The vertex carries the quantitative data (how long the task takes), while the edge carries pure ordering information: “$A_i$ must come before $A_j$”, no number attached. In graph-theoretic terms this is a node-weighted directed graph; only the vertices carry numbers.

The two bookend vertices $A_S$ and $A_E$ have zero cost ( $t_S = t_E = 0$ ). $A_S$ is a single common starting point that every otherwise-unconstrained task must follow; $A_E$ is a single common endpoint that every otherwise-unconstrained task must reach. The payoff comes later — the moment at which the entire process is finished is just the completion time of $A_E$ , one number instead of a maximum across many independent endpoints.
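One way to hold such a graph in memory is a pair of adjacency maps plus a duration map. The sketch below is only an assumption about representation: the function name `build_precedence_graph` and the labels `"S"` and `"E"` for the bookend vertices are chosen for this illustration.

```python
def build_precedence_graph(durations, precedences):
    """Return (t, succ, pred) for the precedence graph with bookends "S" and "E".

    durations   -- dict mapping each task to its execution time t_i
    precedences -- iterable of pairs (i, j), each meaning A_i -> A_j
    """
    t = {"S": 0, **durations, "E": 0}
    succ = {v: [] for v in t}
    pred = {v: [] for v in t}

    def add_edge(u, v):
        succ[u].append(v)
        pred[v].append(u)

    for i, j in precedences:
        add_edge(i, j)
    for v in durations:
        if not pred[v]:          # no prerequisite: hang it off the initial vertex
            add_edge("S", v)
        if not succ[v]:          # nobody waits for it: feed it into the final vertex
            add_edge(v, "E")
    return t, succ, pred

# Hypothetical three-task process: tasks 1 and 2 must both precede task 3.
t, succ, pred = build_precedence_graph({1: 3, 2: 2, 3: 2}, [(1, 3), (2, 3)])
print(succ["S"])  # [1, 2]
print(pred["E"])  # [3]
```

Durations sit in `t`, keyed by vertex, while `succ` and `pred` hold bare edges, mirroring the node-weighted split described above.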

Admissible schedules and DAGs

With the precedence graph in hand, we can now write down what a schedule is — and what it means for one to actually respect the dependencies.

A schedule for a process is an assignment of a start time $s_i \ge 0$ to every task $A_i$ . The completion times then follow from $c_i = s_i + t_i$ .

A schedule on its own is just numbers — there is no built-in guarantee that it respects the precedence arrows. That is a separate property.

A schedule is admissible when every precedence condition $A_i \to A_j$ in $G$ is satisfied — that is, $c_i \le s_j$ for every edge $(A_i, A_j) \in E$ .
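Admissibility is a finite list of inequalities, so it can be checked directly. A minimal sketch, assuming the schedule is a dict of start times and the edges are given as pairs; the name `is_admissible` and the data are hypothetical.

```python
def is_admissible(start, t, edges):
    """True when start times are non-negative and c_i <= s_j holds on every edge (i, j)."""
    return (all(s >= 0 for s in start.values())
            and all(start[i] + t[i] <= start[j] for i, j in edges))

# Hypothetical data: A_1 -> A_3 and A_2 -> A_3.
t = {1: 3, 2: 2, 3: 2}
edges = [(1, 3), (2, 3)]

print(is_admissible({1: 0, 2: 0, 3: 3}, t, edges))  # True
print(is_admissible({1: 0, 2: 0, 3: 2}, t, edges))  # False, A_1 is still running at time 2
```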

Not every precedence graph admits a non-trivial admissible schedule, though. A graph $G$ that contains a cycle — a path $A_{i_1} \to A_{i_2} \to \cdots \to A_{i_k} \to A_{i_1}$ that returns to its starting vertex — forces

$$s_{i_1} \ge c_{i_k} \ge s_{i_1} + \sum_{j=1}^{k} t_{i_j},$$

so $0 \ge \sum_{j=1}^k t_{i_j}$ , and since every $t_{i_j} \ge 0$ , every one of them must be zero.

Where each inequality comes from

The first inequality $s_{i_1} \ge c_{i_k}$ is just the closing edge of the cycle: the precedence condition $A_{i_k} \to A_{i_1}$ reads $c_{i_k} \le s_{i_1}$ .

The second inequality $c_{i_k} \ge s_{i_1} + \sum_{j=1}^k t_{i_j}$ comes from walking forward through the cycle starting at $A_{i_1}$ :

- $c_{i_1} = s_{i_1} + t_{i_1}$;
- $s_{i_2} \ge c_{i_1}$ (the edge $A_{i_1} \to A_{i_2}$), so $c_{i_2} = s_{i_2} + t_{i_2} \ge s_{i_1} + t_{i_1} + t_{i_2}$;
- continuing the same step inductively along $A_{i_2} \to A_{i_3} \to \cdots \to A_{i_k}$ gives $c_{i_k} \ge s_{i_1} + t_{i_1} + t_{i_2} + \cdots + t_{i_k}$.

The only cycles a precedence graph can carry are therefore cycles of zero-duration tasks — mathematically degenerate. So we restrict attention to graphs with no cycles at all.
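As a concrete instance of the argument, take a hypothetical two-task cycle $A_1 \to A_2 \to A_1$ with $t_1 = 2$ and $t_2 = 3$ (numbers chosen purely for illustration). Chaining the two precedence conditions gives

$$s_1 \;\ge\; c_2 \;=\; s_2 + 3 \;\ge\; c_1 + 3 \;=\; s_1 + 2 + 3 \;=\; s_1 + 5,$$

which would require $0 \ge 5$. No choice of start times satisfies both arrows, so this graph has no admissible schedule at all.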

A directed acyclic graph (commonly abbreviated DAG) is a directed graph that contains no cycles. From this point on, every precedence graph $G$ is assumed to be a DAG.

Precedence is transitive. If $A_i \to A_j$ and $A_j \to A_k$ , then $c_i \le s_j \le c_j \le s_k$ , so $A_i \to A_k$ holds automatically. Long chains of dependencies imply pairwise precedence between any earlier and later task in the chain, regardless of whether those pairwise edges are drawn explicitly in $E$ .

Once $G$ is a DAG, the tasks admit a clean linear ordering that is compatible with every precedence arrow.

A topological sort (also called a topological ordering) of a DAG is a reindexing of the vertices $A_1, A_2, \dots, A_n$ such that every edge $A_i \to A_j$ has $i < j$: every task receives an index strictly larger than those of all its prerequisites. Every DAG admits at least one such ordering, and a depth-first search (a recursive traversal that follows each branch to its end before backtracking) produces one: list the vertices in the reverse of the order in which the search finishes them.

Reindexing the vertices into a topological order makes constructing schedules straightforward: walking the tasks in order $A_1, A_2, \dots, A_n$ , every prerequisite of the current task has already been visited, so its start and completion times are already known. The next section turns this observation into an explicit “earliest-possible” schedule that always exists and turns out to be optimal as well.
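A depth-first topological sort can be sketched as follows. This is one common formulation rather than a prescribed one: vertices are collected in the order the search finishes them and the list is reversed at the end. The function name and the three-state coloring are choices made for this illustration; the sketch also reports a cycle if it meets one.

```python
def topological_order(succ):
    """Return the vertices of a directed graph in topological order.

    succ maps every vertex to the list of its direct successors.
    Raises ValueError when the graph contains a cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2           # unvisited, on the current branch, finished
    state = {v: WHITE for v in succ}
    finished = []

    def visit(v):
        state[v] = GREY
        for w in succ[v]:
            if state[w] == GREY:           # w is an ancestor of v: the edge closes a cycle
                raise ValueError(f"cycle through {w!r}")
            if state[w] == WHITE:
                visit(w)
        state[v] = BLACK
        finished.append(v)                 # v finishes only after all its successors

    for v in succ:
        if state[v] == WHITE:
            visit(v)
    return finished[::-1]                  # reverse finishing order

# Hypothetical graph: tasks 1 and 2 precede task 3, with bookends "S" and "E".
succ = {"S": [1, 2], 1: [3], 2: [3], 3: ["E"], "E": []}
print(topological_order(succ))  # ['S', 2, 1, 3, 'E']
```

The result is one valid ordering among several; `['S', 1, 2, 3, 'E']` would serve equally well. For very deep graphs the recursion would need to be replaced by an explicit stack.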

Constructing schedules

With the precedence graph reduced to a DAG and the vertices renumbered in topological order, we can build a concrete admissible schedule task by task. Two natural constructions present themselves — the “as early as possible” schedule and the “as late as possible” schedule — and both turn out to achieve the same overall completion time.

Earliest schedule: lead times

The first construction follows the simplest possible policy: start every task the moment it is allowed to start — i.e. the instant all of its prerequisites have completed.

The lead time (also called the earliest start time) $s'_i$ of task $A_i$ is the moment at which all of $A_i$’s predecessors have completed, which is the earliest time $A_i$ may begin. Equivalently, it is the maximum completion time across the predecessors:

$$s'_i = \max_{j \,:\, A_j \to A_i} c'_j, \qquad s'_S = c'_S = 0.$$

The corresponding earliest completion time is then

$$c'_i = s'_i + t_i.$$

Lead times are computed by a single forward pass through the DAG in topological order, so that every $c'_j$ on the right-hand side of the recurrence is already known by the time we reach $A_i$ :

- Initialize at the source with $s'_S = c'_S = 0$ (no predecessors, zero duration).
- Walk forward through $A_1, A_2, \dots, A_n$ in topological order, applying the recurrence at each step to compute first $s'_i$ and then $c'_i = s'_i + t_i$.
- Terminate at $A_E$. The final value $c'_E$ is the earliest possible completion time of the entire process.

The $\max$ in the recurrence enforces precedence: $A_i$ can only start once every predecessor has finished, and the bottleneck predecessor is whichever finishes latest. Acyclicity of $G$ guarantees the recursion is well-defined — when we reach $A_i$ , every predecessor $A_j$ has index $j < i$ in the topological order, so $c'_j$ is already in hand.
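The forward pass translates almost line by line into code. A minimal sketch, assuming the graph is given as a predecessor map `pred`, a duration map `t`, and a topological `order` that starts at `"S"` and ends at `"E"`; all names and numbers are hypothetical.

```python
def forward_pass(order, pred, t):
    """Compute lead times s' and earliest completion times c' in topological order."""
    s1, c1 = {}, {}
    for v in order:
        # Latest-finishing predecessor; the source has none and starts at 0.
        s1[v] = max((c1[u] for u in pred[v]), default=0)
        c1[v] = s1[v] + t[v]
    return s1, c1

# Hypothetical process: a and b run in parallel, c needs both.
t = {"S": 0, "a": 2, "b": 5, "c": 1, "E": 0}
pred = {"S": [], "a": ["S"], "b": ["S"], "c": ["a", "b"], "E": ["c"]}
order = ["S", "a", "b", "c", "E"]

s1, c1 = forward_pass(order, pred, t)
print(s1["c"])  # 5, because b (finishing at 5) is the bottleneck, not a (finishing at 2)
print(c1["E"])  # 6
```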

Latest schedule: remaining times

A second construction asks the opposite question: how late can each task afford to be without slipping past the optimal overall completion time $c'_E$ ?

The remaining time (also called the latest completion time) $c''_i$ of task $A_i$ is the latest moment at which $A_i$ can finish while still allowing the whole process to complete at $c'_E$ . Equivalently, it is the minimum latest-start across the successors:

$$c''_i = \min_{j \,:\, A_i \to A_j} s''_j, \qquad c''_E = s''_E = c'_E.$$

The corresponding latest start time is then

$$s''_i = c''_i - t_i.$$

The name “remaining time” is a touch misleading. $c''_i$ is not a duration — it is a deadline, an absolute moment on the same time axis as $s_i$ and $c_i$ . The “remaining” refers to the downstream time budget: once $A_i$ has finished at $c''_i$ , the quantity $c'_E - c''_i$ is what remains on the clock for every task that comes after $A_i$ . Slip past $c''_i$ , and that budget isn’t enough.

Computing remaining times depends on $c'_E$ , the output of the forward pass — so the reverse pass must run after the lead-time pass has finished. With $c'_E$ in hand, we walk the DAG backward in reverse topological order:

- Initialize at the sink with $c''_E = s''_E = c'_E$ (no successors, zero duration).
- Walk backward through $A_n, A_{n-1}, \dots, A_1$ in reverse topological order, applying the recurrence at each step to compute first $c''_i$ and then $s''_i = c''_i - t_i$.
- Terminate at $A_S$.

The $\min$ over successors mirrors the $\max$ over predecessors. $A_i$ must finish in time for every successor to start on its own latest schedule, and the tightest of those deadlines binds.
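The reverse pass mirrors the forward one. The sketch below assumes a successor map `succ`, a duration map `t`, a topological `order`, and the value $c'_E$ passed in as `makespan`; the names and the small data set are hypothetical.

```python
def reverse_pass(order, succ, t, makespan):
    """Compute remaining times c'' and latest start times s'' in reverse topological order."""
    s2, c2 = {}, {}
    for v in reversed(order):
        # Tightest successor deadline; the sink has none and is pinned to the makespan.
        c2[v] = min((s2[w] for w in succ[v]), default=makespan)
        s2[v] = c2[v] - t[v]
    return s2, c2

# Hypothetical process: a and b run in parallel, c needs both; its forward pass gives c'_E = 6.
t = {"S": 0, "a": 2, "b": 5, "c": 1, "E": 0}
succ = {"S": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["E"], "E": []}
order = ["S", "a", "b", "c", "E"]

s2, c2 = reverse_pass(order, succ, t, makespan=6)
print(s2["a"], s2["b"])  # 3 0: a may start as late as 3, b must start at 0
print(s2["S"])           # 0
```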

The forward pass gives each task its earliest $(s'_i, c'_i)$ ; the reverse pass gives each task its latest $(s''_i, c''_i)$ . Setting $s_i = s'_i$ for every $i$ defines one admissible schedule; setting $s_i = s''_i$ defines another. Both finish the entire process at exactly $c'_E$ , so both achieve the same overall completion time. What the gap $s''_i - s'_i$ tells us about each task — and why $c'_E$ is in fact the best any schedule can do — is the subject of the next section.

Critical paths

With both the lead times $(s'_i, c'_i)$ and the remaining times $(s''_i, c''_i)$ in hand, every vertex of the DAG falls into exactly one of two cases. By construction $s''_i \ge s'_i$: the lead time $s'_i$ is the smallest start time $A_i$ can have in any admissible schedule, and the latest schedule is itself admissible, so its start time $s''_i$ cannot be smaller. The two cases are therefore:

A vertex $A_i$ is critical when its lead time and latest start time coincide:

$$s'_i = s''_i.$$

(Equivalently, $c'_i = c''_i$ .) In every optimal schedule, a critical task must be assigned the start time $s_i = s'_i = s''_i$ — it has no leeway.

The slack of a non-critical vertex $A_i$ is the gap

$$s''_i - s'_i > 0.$$

In any optimal schedule, $A_i$ may begin anywhere in the interval $s'_i \le s_i \le s''_i$ without affecting the overall completion time.

Slack is individual, not joint. If $A_i \to A_j$ are both non-critical, delaying $A_i$ to its latest start $s_i = s''_i$ while letting $A_j$ begin at its earliest $s_j = s'_j$ can violate the precedence condition $c_i \le s_j$ — $A_i$ may still be running when $A_j$ wants to start, even though both choices sit inside their own slack intervals. The slack interval $[s'_i, s''_i]$ describes the freedom of $A_i$ on its own — not joint freedom across all non-critical tasks at once.
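A small numerical case makes the warning tangible. Assume a hypothetical process with a chain $A \to B$ of two unit-length tasks next to an independent task of length 4, so that $c'_E = 4$. Working the two passes by hand gives the earliest and latest start times hard-coded below.

```python
t = {"A": 1, "B": 1}
earliest = {"A": 0, "B": 1}   # lead times s'
latest = {"A": 2, "B": 3}     # latest start times s'', both tasks have slack 2

def respects_precedence(s):
    """Check the single precedence condition A -> B, i.e. c_A <= s_B."""
    return s["A"] + t["A"] <= s["B"]

mixed = {"A": latest["A"], "B": earliest["B"]}  # each value lies inside its own slack interval

print(respects_precedence(earliest))  # True
print(respects_precedence(latest))    # True
print(respects_precedence(mixed))     # False, A finishes at 3 but B wants to start at 1
```

Each start time in `mixed` is individually allowed, yet the combination is not admissible.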

Critical vertices stitch together into paths from $A_S$ to $A_E$ .

A critical path is a path $A_S \to A_{i_1} \to A_{i_2} \to \cdots \to A_E$ in $G$ consisting entirely of critical vertices, joined by critical edges: edges $A_k \to A_l$ along the path at which $A_k$’s two completion values and $A_l$’s two start values all coincide:

$$c'_k \;=\; c''_k \;=\; s'_l \;=\; s''_l.$$

So $A_l$ must start the very instant $A_k$ finishes, with no gap. (The outer equalities $c'_k = c''_k$ and $s'_l = s''_l$ are automatic from the criticality of the endpoints; the operative new condition is the middle equality, written compactly in the literature as $c'_k = s''_l$ .)

At least one critical path always exists. $c'_E$ is by definition the earliest possible finish, so there must be at least one chain of tasks of total duration $c'_E$; every vertex on that chain has no room to delay without postponing the project, hence is critical. (If no such chain existed, every path from $A_S$ to $A_E$ would have total duration below $c'_E$, and the forward pass, which only ever adds up durations along paths, would have returned a smaller value.) Formally, the forward pass’s $\max$ at each vertex picks out the predecessor that determines that vertex’s earliest start; tracing these bottleneck predecessors back from $A_E$ recovers a critical path. If multiple chains achieve total duration $c'_E$, multiple critical paths exist, and all of them have the same total duration $c'_E$.
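The backtracking step can be sketched in a few lines. The function below assumes the forward-pass results `s1` and `c1` are already available as dicts and that durations are integers, so exact equality is safe; the name `trace_critical_path` and the data are hypothetical.

```python
def trace_critical_path(pred, s1, c1, sink="E"):
    """Follow bottleneck predecessors from the sink back to the source."""
    path = [sink]
    while pred[path[-1]]:
        v = path[-1]
        # A bottleneck predecessor finishes exactly when v can first start.
        u = next(u for u in pred[v] if c1[u] == s1[v])
        path.append(u)
    return path[::-1]

# Hypothetical process: a (2) and b (5) run in parallel, c (1) needs both.
pred = {"S": [], "a": ["S"], "b": ["S"], "c": ["a", "b"], "E": ["c"]}
s1 = {"S": 0, "a": 0, "b": 0, "c": 5, "E": 6}   # lead times from a forward pass
c1 = {"S": 0, "a": 2, "b": 5, "c": 6, "E": 6}   # earliest completion times

print(trace_critical_path(pred, s1, c1))  # ['S', 'b', 'c', 'E']
```

When several predecessors tie for the maximum, `next` simply takes the first one, so the sketch returns one critical path out of possibly many.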

The makespan and CPM

The makespan (also called the entire completion time, or the overall time of the informal problem setup) of a schedule is the value $c_E$ — the moment at which the final vertex $A_E$ finishes, i.e. when the whole process is done. For the lead-time and remaining-time schedules, $c_E$ attains its minimum value, with $c_E = c'_E = c''_E$ .

Along any path $A_S \to A_{i_1} \to \cdots \to A_E$ in $G$ , the cumulative duration $\sum_k t_{i_k}$ — the sum of the execution times of all vertices on the path — is a lower bound on the makespan of every admissible schedule. (This cumulative duration is also called the path’s length in weighted-graph shorthand — the phrase “the length $c_E$ of a critical path” refers to exactly this total duration, not to the number of edges or vertices on the path.) Every task on the path must finish before the next can start, so the path is traversed strictly sequentially. Among all such paths, the one with the largest cumulative duration gives the tightest lower bound — and that largest cumulative duration is exactly $c'_E$ (equivalently $c''_E$ , since the reverse pass set $c''_E = c'_E$ as its boundary).

Both schedules actually achieve this lower bound: the lead-time schedule gives $c_E = c'_E$ directly, and the remaining-time schedule gives $c_E = c''_E$ . So $c_E = c'_E = c''_E$ in either schedule — $c'_E$ is simultaneously a lower bound on the makespan and an upper bound realized by explicit construction, making both schedules makespan-optimal.
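On a small graph the lower-bound argument can be checked by brute force: list every path from $A_S$ to $A_E$, add up the durations, and compare the largest sum with $c'_E$. The generator `path_durations` and the data below are assumptions made for this sketch.

```python
def path_durations(succ, t, v="S", sink="E"):
    """Yield the cumulative duration of every path from v to the sink."""
    if v == sink:
        yield t[v]
        return
    for w in succ[v]:
        for rest in path_durations(succ, t, w, sink):
            yield t[v] + rest

# Hypothetical process: a (2) and b (5) run in parallel, c (1) needs both.
t = {"S": 0, "a": 2, "b": 5, "c": 1, "E": 0}
succ = {"S": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["E"], "E": []}

lengths = sorted(path_durations(succ, t))
print(lengths)       # [3, 6]
print(max(lengths))  # 6, the value a forward pass returns as c'_E for this graph
```

Enumeration is for illustration only: the number of paths can grow exponentially with the graph, whereas the forward pass reaches the same maximum in one sweep.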

The Critical Path Method, abbreviated CPM, is the procedure for finding the critical paths of a precedence DAG: compute the lead times $s'_i$ by a forward pass, the remaining times $c''_i$ by a reverse pass, classify each vertex as critical or non-critical from the gap $s''_i - s'_i$ , and walk backward from $A_E$ to $A_S$ through critical edges to extract one or more critical paths.

The forward pass alone already produces a makespan-optimal schedule — every task has a start time, and the project finishes at $c_E = c'_E$ . Identifying the critical paths is the additional output that CPM is named after, and it earns its name through two concrete real-world uses:

- To shorten the makespan, you must shorten a task that lies on a critical path. Shortening any other task only widens its slack and leaves $c_E$ unchanged, so when investing effort (people, money, parallelism) to speed up a project, the critical path tells you where that effort actually pays off. When several critical paths exist, each of them has to be shortened before $c_E$ drops.
- To know which tasks can absorb a delay, look at the non-critical vertices: each one can start as late as $s''_i$ (that is, slip by up to its slack) without pushing $c_E$ later. Critical tasks have no such buffer; any slip on them pushes the entire project’s deadline.
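The first point is easy to try out numerically. The sketch below recomputes the makespan after shortening one task at a time; the compact `makespan` helper and the data are hypothetical and stand in for a full forward pass.

```python
def makespan(order, pred, t):
    """Earliest completion time of the last vertex in a topological order."""
    c = {}
    for v in order:
        c[v] = max((c[u] for u in pred[v]), default=0) + t[v]
    return c[order[-1]]

# Hypothetical process: a (2) and b (5) run in parallel, c (1) needs both.
order = ["S", "a", "b", "c", "E"]
pred = {"S": [], "a": ["S"], "b": ["S"], "c": ["a", "b"], "E": ["c"]}
t = {"S": 0, "a": 2, "b": 5, "c": 1, "E": 0}

print(makespan(order, pred, t))               # 6, critical path S -> b -> c -> E
print(makespan(order, pred, {**t, "a": 1}))   # 6, a is non-critical, so nothing changes
print(makespan(order, pred, {**t, "b": 4}))   # 5, b is critical, so the project finishes earlier
```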

A worked example

To make the whole CPM pipeline concrete, consider seven tasks with execution times

$i$	1	2	3	4	5	6	7
$t_i$	3	2	2	3	2	4	4

and the precedence set

$$\{A_1 \to A_3,\; A_2 \to A_3,\; A_3 \to A_4,\; A_3 \to A_5,\; A_6 \to A_7\}.$$

So we get the following precedence graph:

Three tasks ( $A_1, A_2, A_6$ ) have no prerequisites in this list, so each receives an edge from the initial vertex $A_S$ ; three tasks ( $A_4, A_5, A_7$ ) appear as nobody’s prerequisite, so each gains an edge to the final vertex $A_E$ . The natural indexing $1, 2, \dots, 7$ already forms a topological order, so we can walk the algorithm vertex by vertex in index order.

Forward pass: lead times

Taking the $\max$ over predecessors at each step:

- $A_S$: $s'_S = 0$, $c'_S = 0 + 0 = 0$.
- $A_1$: $s'_1 = 0$, $c'_1 = 0 + 3 = 3$.
- $A_2$: $s'_2 = 0$, $c'_2 = 0 + 2 = 2$.
- $A_3$: $s'_3 = \max(c'_1, c'_2) = \max(3, 2) = 3$, $c'_3 = 3 + 2 = 5$.
- $A_4$: $s'_4 = 5$, $c'_4 = 5 + 3 = 8$.
- $A_5$: $s'_5 = 5$, $c'_5 = 5 + 2 = 7$.
- $A_6$: $s'_6 = 0$, $c'_6 = 0 + 4 = 4$.
- $A_7$: $s'_7 = 4$, $c'_7 = 4 + 4 = 8$.
- $A_E$: $s'_E = \max(8, 7, 8) = 8$, $c'_E = 8 + 0 = 8$.

The earliest possible overall completion time is $c'_E = 8$ .

Reverse pass: remaining times

Starting from $c''_E = s''_E = 8$ and walking the DAG in reverse index order, taking the $\min$ over successors:

- $A_E$: $c''_E = 8$, $s''_E = 8$.
- $A_7$: $c''_7 = s''_E = 8$, $s''_7 = 4$.
- $A_6$: $c''_6 = s''_7 = 4$, $s''_6 = 0$.
- $A_5$: $c''_5 = s''_E = 8$, $s''_5 = 6$.
- $A_4$: $c''_4 = s''_E = 8$, $s''_4 = 5$.
- $A_3$: $c''_3 = \min(s''_4, s''_5) = \min(5, 6) = 5$, $s''_3 = 3$.
- $A_2$: $c''_2 = s''_3 = 3$, $s''_2 = 1$.
- $A_1$: $c''_1 = s''_3 = 3$, $s''_1 = 0$.
- $A_S$: $c''_S = \min(s''_1, s''_2, s''_6) = \min(0, 1, 0) = 0$, $s''_S = 0$.

Critical vertices and critical paths

Combining the two passes, shading the critical vertices and the critical edges, and coloring each set of four matching values $c'_k = c''_k = s'_l = s''_l$ at the endpoints of a critical edge:

Seven of the nine vertices are critical, and they line up into two distinct critical paths from $A_S$ to $A_E$ :

$$A_S \to A_1 \to A_3 \to A_4 \to A_E \qquad (0 + 3 + 2 + 3 + 0 = 8)$$

$$A_S \to A_6 \to A_7 \to A_E \qquad (0 + 4 + 4 + 0 = 8)$$

Both paths have total duration exactly $c'_E = 8$, exactly as the existence argument promised: when several critical paths exist, they share the same total duration. The only tasks with leeway are $A_2$ and $A_5$, each carrying one unit of slack ($s''_i - s'_i = 1$). In any optimal schedule, $A_2$ may start anywhere in $[0, 1]$ and $A_5$ anywhere in $[5, 6]$. The earlier remark on joint slack does not bite here: the only chain linking the two tasks runs through the critical task $A_3$, which is pinned to the interval $[3, 5]$, so $A_2$ always finishes by time $3$ and $A_5$ never starts before time $5$, whichever values are chosen. Every other task has its start time pinned to a unique value.
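The whole example can be replayed in a few lines of code. The sketch below hard-codes the seven durations and the precedence set from above, adds the bookend edges by hand, and runs both passes; the dictionary layout and variable names are choices made for this illustration.

```python
# Durations of the worked example, with zero-cost bookends "S" and "E".
t = {"S": 0, 1: 3, 2: 2, 3: 2, 4: 3, 5: 2, 6: 4, 7: 4, "E": 0}
edges = [("S", 1), ("S", 2), ("S", 6),               # tasks without prerequisites
         (1, 3), (2, 3), (3, 4), (3, 5), (6, 7),     # the given precedence set
         (4, "E"), (5, "E"), (7, "E")]               # tasks nothing else depends on
order = ["S", 1, 2, 3, 4, 5, 6, 7, "E"]              # already a topological order

pred = {v: [] for v in t}
succ = {v: [] for v in t}
for u, v in edges:
    succ[u].append(v)
    pred[v].append(u)

# Forward pass: lead times s' and earliest completion times c'.
s1, c1 = {}, {}
for v in order:
    s1[v] = max((c1[u] for u in pred[v]), default=0)
    c1[v] = s1[v] + t[v]

# Reverse pass: remaining times c'' and latest start times s''.
s2, c2 = {}, {}
for v in reversed(order):
    c2[v] = min((s2[w] for w in succ[v]), default=c1["E"])
    s2[v] = c2[v] - t[v]

slack = {v: s2[v] - s1[v] for v in order}
critical = [v for v in order if slack[v] == 0]

print(c1["E"])                                     # 8
print(critical)                                    # ['S', 1, 3, 4, 6, 7, 'E']
print({v: k for v, k in slack.items() if k > 0})   # {2: 1, 5: 1}
```

The output agrees with the hand computation: seven critical vertices, and one unit of slack each for $A_2$ and $A_5$.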

Job-shop problems

Process scheduling rested on a comfortable assumption: machine availability was never the bottleneck. There were always enough machines to start any task the instant its precedences cleared. We now drop that assumption. Machines become a scarce resource — each one runs at most one job at a time, and a job can be blocked not only by an unfinished prerequisite but also by another job sitting on the machine it needs.

The canonical formalization of this regime is the job-shop model, in which each job is itself an ordered pipeline of machine-specific steps.

A job-shop model describes a process of $n$ jobs $A_1, \ldots, A_n$ — the term “job” replaces “task” of the earlier scheduling problem setup, both naming the same kind of object — to be run on $m$ machines $M_1, \ldots, M_m$ , with the following structure:

- Each job $A_i$ decomposes into a sequence of $n_i$ subjobs $A_{i,1}, A_{i,2}, \ldots, A_{i,n_i}$, which must run strictly in this order.
- Subjob $A_{i,j}$ has execution time $t_{i,j} \ge 0$ and is bound to a specific machine $m_{i,j} \in \{1, \ldots, m\}$.
- Each machine processes at most one subjob at a time, so subjobs assigned to the same machine must be put into some order.
- No recirculation: the machine indices $m_{i,1}, m_{i,2}, \ldots, m_{i,n_i}$ along one job’s sequence are pairwise distinct, so every job uses every machine at most once.

The first two bullets pin down the internal anatomy of a job. The order of its subjobs and which machine each one runs on are given as input — the scheduler doesn’t reorder the steps within a job or move a subjob to a different machine. The model also imposes no precedence between different jobs: every job’s first subjob is allowed to start at $t = 0$ , regardless of any other job’s state.

The third bullet is where the new scheduling decision lives. Because each machine is serial, every group of subjobs targeting the same machine has to be put into a single sequence — and choosing those sequences, one per machine, is the combinatorial layer that did not appear in process scheduling. Two subjobs from different jobs, otherwise independent, can now still have to wait their turn for a shared machine.

The fourth bullet — no recirculation — limits how convoluted one job’s path can be: it never revisits a machine it has already used. Concretely, since one job’s machines are pairwise distinct, $n_i \le m$ for every job. The word recirculation names the forbidden behavior (a job circling back to an earlier machine), and the assumption is its absence. Some real-world variants drop this restriction; we keep it as a baseline simplification.
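A job-shop instance is compact to write down. The sketch below uses one possible encoding, an assumption of this illustration: each job is a list of `(machine, duration)` pairs in processing order, with 0-based machine indices instead of the 1-based indices of the text.

```python
# Hypothetical instance: jobs[i] lists the subjobs of job i in their fixed
# order, each as a (machine, duration) pair with 0-based machine indices.
m = 3
jobs = [
    [(0, 3), (1, 2), (2, 2)],
    [(1, 4), (0, 1)],
]

def has_recirculation(job):
    """True when the job visits some machine more than once."""
    machines = [k for k, _ in job]
    return len(set(machines)) < len(machines)

print([has_recirculation(job) for job in jobs])      # [False, False]
print(all(len(job) <= m for job in jobs))            # True, since n_i <= m
print(has_recirculation([(0, 3), (1, 2), (0, 1)]))   # True, machine 0 is revisited
```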

Two close relatives of the job-shop model relax or tighten the per-job step order, and are worth knowing by name even though we don’t pursue them further:

- Open shop: same data as the job-shop model, but the subjobs of one job may be processed in any order. The scheduler picks both the per-job order and the per-machine sequences.
- Flow shop: same data, but every job passes through the machines in the same fixed order. Every job follows one universal pipeline.

The three setups line up by how much per-job flexibility the model carries: open shop is the most permissive, flow shop the most rigid, job shop in between. The rest of this chapter develops the job-shop case.

Admissibility

Admissibility in process scheduling boiled down to a single rule: every precedence edge in the DAG must hold as a $c_i \le s_j$ inequality. In the job-shop setting the same principle applies but the constraints come in two flavors — per-job ordering and per-machine exclusion.

A schedule in a job-shop model assigns a start time $s_{i,j} \ge 0$ to every subjob $A_{i,j}$; the completion times follow as $c_{i,j} = s_{i,j} + t_{i,j}$. The schedule is admissible when all of the following hold:

- Within-job ordering: for every job $i$, the first subjob begins at or after time zero and every later subjob waits for its predecessor, that is, $s_{i,1} \ge 0$ and $c_{i,j-1} \le s_{i,j}$ for $j = 2, \ldots, n_i$.
- Machine exclusion: no two subjobs sharing a machine overlap in time. For every machine $k$, every pair of distinct subjobs in its subjob set $M(k) := \{A_{i,j} : m_{i,j} = k\}$ must be processed one after the other. Formally, for any $A_{i,j}, A_{i',j'} \in M(k)$ with $(i,j) \ne (i',j')$, either $c_{i,j} \le s_{i',j'}$ or $c_{i',j'} \le s_{i,j}$.

The within-job condition looks just like the process-scheduling precedence constraints — a list of $c_a \le s_b$ inequalities, each one a directed edge waiting to be drawn. The machine-exclusion condition has a different shape: it asserts a disjunction — one of two precedence inequalities must hold for each pair, but the model doesn’t say which. That disjunction is the new combinatorial choice introduced by job-shop scheduling.
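Both conditions can be tested on a given schedule without deciding any orientation in advance: the disjunction is simply evaluated for each pair. The following sketch assumes jobs are lists of `(machine, duration)` pairs and start times are given in the same nested shape; the function name and the data are hypothetical.

```python
from itertools import combinations

def is_admissible_job_shop(jobs, start):
    """jobs[i][j] = (machine, duration); start[i][j] = start time of subjob j of job i."""
    ops = []                                  # (machine, start, completion) per subjob
    for job, starts in zip(jobs, start):
        prev_end = 0                          # the first subjob may not start before time 0
        for (k, t), s in zip(job, starts):
            if s < prev_end:                  # within-job ordering violated
                return False
            prev_end = s + t
            ops.append((k, s, prev_end))
    for (k, s, c), (k2, s2, c2) in combinations(ops, 2):
        # Machine exclusion: one of the two orderings must hold on a shared machine.
        if k == k2 and not (c <= s2 or c2 <= s):
            return False
    return True

# Hypothetical instance with two jobs on two machines.
jobs = [[(0, 3), (1, 2)],
        [(1, 4), (0, 1)]]

print(is_admissible_job_shop(jobs, [[0, 4], [0, 4]]))  # True
print(is_admissible_job_shop(jobs, [[0, 3], [0, 4]]))  # False, machine 1 is busy until time 4
```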

Conjunctive and disjunctive edges

To keep the precedence graph machinery from process scheduling, we encode both kinds of constraints as edges — splitting them into two families that behave differently.

A conjunctive edge in the precedence graph of a job-shop model is a directed edge whose direction is fixed by the problem data. There are three families, together encoding the within-job admissibility condition:

- Internal-sequence edges $A_{i,j-1} \to A_{i,j}$ for every $i = 1, \ldots, n$ and $j = 2, \ldots, n_i$: the strict order of subjobs within a job;
- Source edges $A_S \to A_{i,1}$ for every $i = 1, \ldots, n$: every job’s first subjob hangs off the common starting vertex;
- Sink edges $A_{i,n_i} \to A_E$ for every $i = 1, \ldots, n$: every job’s last subjob feeds the common ending vertex.

The vertex set is $V = \{A_S, A_E\} \cup \{A_{i,j} : 1 \le i \le n,\, 1 \le j \le n_i\}$ , with each $A_{i,j}$ labeled by its execution time $t_{i,j}$ and $t_S = t_E = 0$ .
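Generating the conjunctive edges from the job data is mechanical. In the sketch below, subjob $A_{i,j}$ is represented by the 0-based pair `(i, j)` and the bookends by `"S"` and `"E"`; this encoding, the function name, and the assumption that every job has at least one subjob are choices of the illustration.

```python
def conjunctive_edges(jobs):
    """Return the source, internal-sequence and sink edges of a job-shop instance."""
    edges = []
    for i, job in enumerate(jobs):
        edges.append(("S", (i, 0)))                                    # source edge
        edges += [((i, j - 1), (i, j)) for j in range(1, len(job))]    # internal sequence
        edges.append(((i, len(job) - 1), "E"))                         # sink edge
    return edges

# Hypothetical instance: each job is a list of (machine, duration) pairs.
jobs = [[(0, 3), (1, 2)],
        [(1, 4), (0, 1)]]

for edge in conjunctive_edges(jobs):
    print(edge)
print(len(conjunctive_edges(jobs)))  # 6, one edge per subjob plus one extra per job
```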

If the conjunctive edges were all the precedence graph had, we would be back in process scheduling: a fixed DAG with no orientation choices, ready for CPM. What’s new is that machine exclusion still has to be encoded, and it cannot be expressed as a single directed edge — the constraint is symmetric between the two subjobs sharing a machine, and only the scheduler decides which runs first.

A disjunctive edge in the precedence graph of a job-shop model is an unordered pair of subjobs $\{A_{i,j}, A_{i',j'}\} \subseteq M(k)$ that share the same machine $k$ , drawn as the pair of opposite-direction arrows

$$A_{i,j} \to A_{i',j'} \qquad \text{and} \qquad A_{i',j'} \to A_{i,j}.$$

The two arrows represent the two possible orderings of the pair on machine $k$ — one of them must hold in any admissible schedule, but which one is the scheduler’s decision. Picking a single direction for every disjunctive edge is called choosing an orientation.

For a machine $k$ holding $p := |M(k)|$ subjobs, the orientation problem on that machine can be sized two ways — by individual arrows on the diagram, or by whole disjunctive edges. The two are different granularities of the same picture: each disjunctive edge bundles two arrows, so counting arrows is finer-grained and counting disjunctive edges is coarser-grained, but both describe the same configuration.

Counting arrows. Every ordered pair of distinct subjobs contributes one directed arrow, so the diagram carries $p(p-1)$ arrows in total. They group into $\binom{p}{2}$ opposing couples — for every arrow $A_{i,j} \to A_{i',j'}$ there is a counterpart $A_{i',j'} \to A_{i,j}$ between the same two subjobs. The scheduler keeps exactly one arrow from each opposing couple and deletes the other.
Counting disjunctive edges. Each opposing couple of arrows is one disjunctive edge, per the definition above — the pair-of-arrows is the edge. So the machine carries $\binom{p}{2} = p(p-1)/2$ disjunctive edges, and the scheduler orients each one (chooses which of its two arrows to keep).

Either way of counting hands the scheduler the same workload: $\binom{p}{2}$ binary decisions per machine. Writing $p_k := |M(k)|$ for the subjob count of machine $k$ , the total disjunctive-edge count across all $m$ machines is

K \;:=\; \sum_{k=1}^{m} \binom{p_k}{2},

and the global orientation space is $\{0, 1\}^K$ — one binary choice per disjunctive edge, holding $2^K$ candidate orientations to search through.
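As a quick sanity check on how fast the search space grows, the count $K$ and the size $2^K$ can be computed directly. The sketch below is illustrative only: the function name `disjunctive_edge_count` and the three-machine shop are assumptions, not part of the model above.

```python
from math import comb

def disjunctive_edge_count(subjobs_per_machine):
    """K = sum over machines k of C(p_k, 2), with p_k the subjob count of machine k."""
    return sum(comb(p, 2) for p in subjobs_per_machine)

# Hypothetical shop: three machines holding 3, 3 and 2 subjobs.
p = [3, 3, 2]
K = disjunctive_edge_count(p)
print(K, 2 ** K)  # 7 disjunctive edges, 128 candidate orientations
```

Even this toy shop already has 128 candidate orientations; adding one more subjob to a machine that holds $p$ of them adds $p$ further binary decisions.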

Once an orientation has been picked, every edge in the graph carries a single direction, and the result is a directed graph that we can analyze with the existing CPM toolkit — provided it has no cycles.

The augmented precedence graph of a job-shop model under a given orientation is the directed graph

G' := (V,\, E_C \cup E_D),

where $V$ is the vertex set of the precedence graph, $E_C$ is the set of conjunctive edges, and $E_D$ contains one directed edge per disjunctive edge, chosen according to the orientation. An orientation is admissible when $G'$ is a DAG.

With an admissible orientation in hand, every job-shop admissibility condition reduces to a single statement: the schedule satisfies $c_{i,j} \le s_{i',j'}$ for every edge $(A_{i,j}, A_{i',j'}) \in E_C \cup E_D$ . That is exactly the process-scheduling notion of admissibility, applied to $G'$ . So the Critical Path Method runs on $G'$ unchanged — forward pass, reverse pass, critical paths, makespan — and produces the optimal schedule for that particular orientation.
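A minimal sketch of that reduction, assuming a graph given as a duration table plus a list of directed edges: one pass in topological order both detects cycles (inadmissible orientations) and computes the forward-pass makespan. The function name `cpm_makespan` and the two-job instance are hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict, deque

def cpm_makespan(durations, edges):
    """Forward pass of CPM on a directed graph.

    durations: dict vertex -> execution time
    edges: list of (u, v), meaning u must complete before v starts
    Returns the makespan, or None if the graph contains a cycle
    (that is, the orientation is not admissible).
    """
    succ = defaultdict(list)
    indeg = {v: 0 for v in durations}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    start = {v: 0 for v in durations}
    queue = deque(v for v in durations if indeg[v] == 0)
    processed = 0
    makespan = 0
    while queue:
        u = queue.popleft()
        processed += 1
        finish = start[u] + durations[u]
        makespan = max(makespan, finish)
        for v in succ[u]:
            start[v] = max(start[v], finish)  # earliest start: all predecessors done
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # Vertices on a cycle never reach in-degree 0 and are never processed.
    return makespan if processed == len(durations) else None

# Hypothetical 2-job, 2-machine instance (names and numbers are made up).
durations = {"S": 0, "A11": 3, "A12": 2, "A21": 2, "A22": 4, "E": 0}
E_C = [("S", "A11"), ("A11", "A12"), ("A12", "E"),
       ("S", "A21"), ("A21", "A22"), ("A22", "E")]
# Machine 1 holds {A11, A22}; machine 2 holds {A12, A21}.
E_D_good = [("A11", "A22"), ("A21", "A12")]
E_D_bad = [("A22", "A11"), ("A12", "A21")]  # closes A11 -> A12 -> A21 -> A22 -> A11

print(cpm_makespan(durations, E_C + E_D_good))  # 7
print(cpm_makespan(durations, E_C + E_D_bad))   # None
```

The second orientation shows how a disjunctive choice can route back into the conjunctive chains and close a cycle, which is why admissibility has to be checked before the makespan means anything.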

The hard part has been pushed up one level: choosing the orientation in the first place. That choice is the topic of the next section.

Optimization

The optimization goal of the job-shop model is to find an admissible schedule with minimal makespan $c_E$ . With the augmented precedence graph in place, the search splits into two nested subproblems:

Inner question — given a fixed orientation of the disjunctive edges, find the optimal admissible schedule. The Critical Path Method solves this in time linear in the size of $G'$ .
Outer question — among all admissible orientations, find the one whose resulting CPM makespan $c_E$ is smallest.

The inner question is already solved. Everything in this section concerns the outer one.

Disjunctive edge assignments

What we have been calling an “orientation” has a formal name: a disjunctive edge assignment.

A disjunctive edge assignment for a job-shop model is a set $E_D$ of directed arrows that contains exactly one arrow from each disjunctive edge — the chosen orientation of that pair. Together with the conjunctive edges $E_C$ , the assignment forms the augmented precedence graph $G' = (V,\, E_C \cup E_D)$ .

The assignment is admissible when $G'$ is acyclic (a DAG). Admissibility is not automatic: many assignments produce cycles by routing disjunctive arrows back into the conjunctive flow.

At least one admissible assignment always exists. Take any topological ordering $\pi$ of the conjunctive-only subgraph $(V, E_C)$, itself a DAG by construction, and orient every disjunctive edge to agree with $\pi$. Concretely: imagine the subjobs lined up in the order given by $\pi$; for every disjunctive edge, let the subjob earlier in the lineup be the source and the later one be the target. Every edge of the resulting augmented graph then points forward in $\pi$, and a graph whose arrows all run in one direction along a fixed order cannot loop back on itself, so $G'$ is acyclic. Existence is structural, not accidental: the outer search always has something to find.
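The existence argument is constructive, so it can be sketched in a few lines. The version below assumes Python 3.9+ for the standard-library `graphlib` module; the function name `orient_by_topological_order` and the small instance are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def orient_by_topological_order(vertices, conjunctive, disjunctive):
    """Orient every disjunctive pair so that it agrees with one
    topological ordering of the conjunctive-only graph (V, E_C)."""
    # TopologicalSorter expects a mapping: node -> set of predecessors.
    preds = {v: set() for v in vertices}
    for u, v in conjunctive:
        preds[v].add(u)
    order = list(TopologicalSorter(preds).static_order())
    pos = {v: i for i, v in enumerate(order)}
    # The earlier vertex in the ordering becomes the source of the arrow.
    return [(a, b) if pos[a] < pos[b] else (b, a) for a, b in disjunctive]

# Hypothetical 2-job, 2-machine instance (names are made up).
vertices = ["S", "A11", "A12", "A21", "A22", "E"]
E_C = [("S", "A11"), ("A11", "A12"), ("A12", "E"),
       ("S", "A21"), ("A21", "A22"), ("A22", "E")]
disjunctive = [("A11", "A22"), ("A12", "A21")]  # unordered pairs per machine

E_D = orient_by_topological_order(vertices, E_C, disjunctive)
print(E_D)
```

Which admissible assignment comes out depends on which topological ordering the sorter happens to return. The construction guarantees acyclicity only; it says nothing about the quality of the resulting makespan.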

Why the outer search is hard

We already noted that the orientation space has $2^K$ candidate assignments, with $K = \sum_{k=1}^m \binom{p_k}{2}$ . Naive enumeration means filtering each candidate for admissibility, then running CPM on every survivor — at most $2^K$ CPM runs in total. CPM itself is cheap (linear in the size of $G'$ ), but $2^K$ grows exponentially in $K$ , and $K$ itself scales quadratically with the per-machine subjob counts. The bottleneck is entirely in the outer search.

Two structural facts compound the cost beyond raw exponentiality:

The problem does not decompose. Re-orienting one disjunctive edge on machine $k$ can lengthen or shorten critical paths anywhere in the graph — different machines are coupled through the within-job conjunctive chains. The optimal assignment on one machine cannot be chosen independently of the others. This refusal to decompose into independent subproblems is the signature of discrete optimization and recurs across the field — the traveling salesman problem (find the shortest tour visiting every city once), bin-packing, the knapsack problem, Boolean satisfiability. Local optimality is no guarantee of global optimality.
Many candidates are inadmissible. A large fraction of the $2^K$ assignments produce cycles in $G'$ and have to be discarded before CPM ever runs. Filtering for admissibility helps a little, but the surviving set is still exponential.
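To make the naive enumeration concrete, here is a brute-force sketch over all $2^K$ bit patterns for a tiny, hypothetical instance with $K = 2$. The helper names (`makespan`, `best_orientation`) and the instance data are assumptions for illustration; the approach is only viable for very small $K$.

```python
from graphlib import TopologicalSorter, CycleError  # Python 3.9+
from itertools import product

def makespan(durations, edges):
    """Longest-path makespan of the graph, or None if it has a cycle."""
    preds = {v: set() for v in durations}
    for u, v in edges:
        preds[v].add(u)
    try:
        order = list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None
    finish = {}
    for v in order:
        finish[v] = max((finish[u] for u in preds[v]), default=0) + durations[v]
    return max(finish.values())

def best_orientation(durations, conjunctive, disjunctive):
    """Try every orientation; return (best makespan, its arrows, #admissible)."""
    best, best_edges, admissible = None, None, 0
    for bits in product((0, 1), repeat=len(disjunctive)):
        oriented = [(a, b) if bit == 0 else (b, a)
                    for bit, (a, b) in zip(bits, disjunctive)]
        m = makespan(durations, conjunctive + oriented)
        if m is None:
            continue  # cyclic, hence inadmissible
        admissible += 1
        if best is None or m < best:
            best, best_edges = m, oriented
    return best, best_edges, admissible

# Hypothetical 2-job, 2-machine instance (names and numbers are made up).
durations = {"S": 0, "A11": 3, "A12": 2, "A21": 2, "A22": 4, "E": 0}
E_C = [("S", "A11"), ("A11", "A12"), ("A12", "E"),
       ("S", "A21"), ("A21", "A22"), ("A22", "E")]
disjunctive = [("A11", "A22"), ("A12", "A21")]

print(best_orientation(durations, E_C, disjunctive))
# (7, [('A11', 'A22'), ('A21', 'A12')], 3)
```

On this instance, three of the four orientations are admissible, with makespans 11, 7 and 11. The best one requires choosing the right direction on both machines together, which is a small-scale picture of the coupling described above.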

Heuristics in practice

Exact enumeration is intractable for any realistic $K$ , so production scheduling uses heuristics — algorithms that produce a good admissible assignment in reasonable time, with no guarantee of global optimality. The next section walks through one representative heuristic family and exhibits a counter-intuitive failure mode that is characteristic of the entire field: shortening one task’s execution time can make the overall schedule longer. That paradox is the running illustration of why naive intuition fails in discrete optimization.

Precedence-list scheduling

The previous section motivated heuristics for hard scheduling problems. This section walks through one canonical greedy heuristic, then runs it on a small instance that exhibits the counter-intuitive failure mode greedy methods are exposed to across discrete optimization.

The heuristic

Each task carries a priority; the tasks ordered from highest to lowest priority form a priority list (also called a precedence list). Given such a list, the algorithm is essentially “always do the highest-priority thing you legally can”:

Priority-list scheduling (also called list scheduling) is a greedy heuristic for assigning tasks to machines under precedence constraints. Given a priority list:

Maintain the set of ready tasks — tasks whose precedence requirements have all completed.
Whenever a machine becomes idle and at least one ready task exists, assign that machine the highest-priority ready task. The task then runs to completion without interruption.
If no ready task exists when a machine is idle, the machine waits.

The algorithm produces a complete precedence-respecting schedule in a single forward sweep — linear time, no backtracking, no lookahead.

A worked example

The example is not in the job-shop model. It is the simpler identical-machines case — every machine interchangeable, every task with a single execution time — plus task-level precedence as in process scheduling. Six tasks $A_1, \ldots, A_6$ , two machines $M_1, M_2$ , and the precedence DAG given below:

The priority list is the natural one — lower index = higher priority:

1 > 2 > 3 > 4 > 5 > 6.

We compare two scenarios that differ in a single execution time:

Task	$A_1$	$A_2$	$A_3$	$A_4$	$A_5$	$A_6$
Scenario A: $t_i$	5	5	5	5	11	6
Scenario B: $t_i$	5	5	4	5	11	6

Common sense says scenario B (less total work) should finish at least as soon as scenario A. The heuristic disagrees: scenario A reaches $c_E = 21$ , scenario B drags out to $c_E = 26$ .

Scenario A. Both machines stay productive from $t = 5$ onward. $M_1$ runs $A_1, A_2, A_5$ ; $M_2$ runs $A_3, A_4, A_6$ . Every transition happens at exactly the moment the next task becomes ready, and both machines finish together at $t = 21$ .

Scenario B. Task $A_3$ finishes one unit early, at $t = 9$ . The only ready task then is $A_6$ — $A_4, A_5$ still need $A_2$ to finish at $t = 10$ . The heuristic, with no lookahead, commits $M_2$ to $A_6$ for six units. By the time $A_5$ finally gets picked up at $t = 15$ , its 11-unit run pushes the makespan all the way out to $t = 26$ .
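Both scenarios can be replayed with a small event-driven simulation of the heuristic. The precedence edges used below ($A_1 \to A_2$, $A_1 \to A_3$, $A_2 \to A_4$, $A_2 \to A_5$, $A_3 \to A_6$) are an assumption inferred from the narration, not a reproduction of the original diagram; the function name `list_schedule` is likewise hypothetical. The sketch assumes strictly positive execution times.

```python
def list_schedule(durations, preds, priority, machines=2):
    """Greedy list scheduling on identical machines.

    durations: dict task -> execution time (assumed > 0)
    preds: dict task -> list of tasks that must complete first
    priority: list of tasks, highest priority first
    Returns (makespan, dict task -> start time).
    """
    start, finish = {}, {}
    free_at = [0] * machines      # time at which each machine becomes idle
    remaining = list(priority)    # unscheduled tasks, in priority order
    t = 0
    while remaining:
        # Ready = every predecessor has completed by time t.
        ready = [a for a in remaining
                 if all(p in finish and finish[p] <= t for p in preds.get(a, ()))]
        idle = [k for k in range(machines) if free_at[k] <= t]
        # Hand the highest-priority ready tasks to the idle machines.
        for k, a in zip(idle, ready):
            start[a] = t
            finish[a] = t + durations[a]
            free_at[k] = finish[a]
            remaining.remove(a)
        if remaining:
            # Jump to the next completion event.
            t = min(f for f in finish.values() if f > t)
    return max(finish.values()), start

# Assumed precedence DAG, inferred from the scenario descriptions.
preds = {2: [1], 3: [1], 4: [2], 5: [2], 6: [3]}
priority = [1, 2, 3, 4, 5, 6]
scenario_a = {1: 5, 2: 5, 3: 5, 4: 5, 5: 11, 6: 6}
scenario_b = {**scenario_a, 3: 4}   # only A_3 is shortened

print(list_schedule(scenario_a, preds, priority)[0])  # 21
print(list_schedule(scenario_b, preds, priority)[0])  # 26
```

Ties between idle machines are broken by machine index here, so the machine labels may differ from the narration, but the makespans match. In scenario B the simulation starts $A_6$ at $t = 9$ and $A_5$ only at $t = 15$.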

Why shortening hurt

Making one input strictly better made the global result strictly worse. The mechanism is just timing: in scenario A, $A_3$ kept $M_2$ occupied until $t = 10$, and that delay was useful because it let the high-priority $A_4, A_5$ become ready before $M_2$ had to commit to its next task. In scenario B, $M_2$ came free one unit earlier, at $t = 9$, and committed to the wrong task before the wave arrived.

This non-monotonicity, where making an input “better” makes the result “worse”, is the signature failure mode of greedy heuristics under coupling. It recurs wherever locally rational moves chain into global outcomes: the job-shop outer search, the traveling salesman problem, knapsack, bin-packing. The example is small but the lesson generalizes.

Randomization

Every quantity in the scheduling models so far has been a fixed number known up front — execution times $t_i$ , costs, completion times, the resulting makespan. That works when the data is genuinely deterministic, but planning large projects rarely is. Unforeseeable influences — machine breakdowns, weather, supply hiccups, a key engineer falling ill — push real execution times around their planned values in ways no schedule can predict in advance.

The empirical consequence, observed across decades of project case studies, is underestimation: the realized duration and cost typically exceed the deterministic planning estimate. Famous public examples include the move of the German federal government from Bonn to Berlin and the police information-system project INPOL-neu, both of which ran well over their planned budgets and timelines. Political incentives and incomplete information contribute their own share to the gap, but those are human mechanisms — there is also a mathematical mechanism that bites even when everyone is honest and well-informed, and that mechanism is what this section is about.

The remedy is randomization — replace each deterministic quantity $x$ with a random variable $X$ whose distribution captures the spread of plausible realizations. Execution times $t_i$ become $T_i$ , costs $c_j$ become $C_j$ , and the apparatus of expected values and distributions becomes available. The scheduling pipeline of the previous sections still runs; it just operates on expectations instead of fixed numbers. And one single inequality already explains a large chunk of the underestimation phenomenon.

Let $A_1, \dots, A_n$ be subprojects with individual costs (or execution times) $C_j$ — random variables in the stochastic case, with deterministic constants $c_j$ as a special case. Then

\max_j \mathbb{E}(C_j) \;\le\; \mathbb{E}\!\left(\max_j C_j\right).

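In words: the maximum of the expectations is no larger than the expectation of the maximum. The reason fits in one line: every $C_k$ satisfies $C_k \le \max_j C_j$ in each realization, so $\mathbb{E}(C_k) \le \mathbb{E}(\max_j C_j)$ for each $k$, and taking the maximum over $k$ on the left preserves the inequality. Max and expectation cannot be exchanged in general.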

Reading the two sides side by side:

The left, $\max_j \mathbb{E}(C_j)$ , is what a deterministic planner produces — replace each random $C_j$ by its expected value $\mathbb{E}(C_j)$ , then compute the bottleneck across subprojects.
The right, $\mathbb{E}(\max_j C_j)$ , is the expected value of the actual project bottleneck — the average, over many possible runs, of the worst subproject in each run.

The theorem says the second is always at least the first, so the deterministic estimate systematically underestimates the stochastic reality. With fixed $c_j$ , the two sides coincide trivially: $\max_j c_j = \max_j c_j$ . The moment the $C_j$ carry any spread, the right side starts pulling away — and across many parallel subprojects, that spread compounds rather than averaging out. A small worked example makes the gap concrete.

A worked example

Take the simplest possible randomization: $n$ independent subprojects $A_1, \dots, A_n$ — picture $n$ parallel pieces of work that all feed into one common deadline — each with processing time $X_j$ drawn independently from the uniform distribution on $[0, 1]$ .

Read the interval $[0, 1]$ as a normalized time scale: $t = 0$ means no time at all, $t = 1$ means the longest a subproject could possibly take (its hard upper bound), and any intermediate $t$ is a fraction of that worst case. The actual units (seconds, days, weeks) don’t matter for the punchline — the result is about how the worst-case fraction behaves as the number of parallel subprojects grows.

“Uniform on $[0, 1]$ ” then pins down the distribution: the density $f$ is flat at $1$ on the interval (zero outside), the CDF is the linear $p(X_j \le t) = t$ , and the expected value sits at the midpoint, $\mathbb{E}(X_j) = 0.5$ .

Why the density is flat at $1$, and what the CDF really is

The density $f(t)$ is the continuous analog of a histogram — picture binning a large sample of $X_j$ values and letting the bins shrink: the bar heights settle into the curve $f(t)$ , and the area of any slice under $f$ equals the probability of $X_j$ landing in that slice. For the uniform case on $[0, 1]$ , that shape is a flat horizontal segment at $y = 1$ stretching from $t = 0$ to $t = 1$ , and zero everywhere outside the interval.

The " $1$ " is not arbitrary: the total area under any density has to equal $1$ — every realization lands somewhere — and a flat segment of width $1$ must be at height $1$ to enclose unit area. Equal height across the interval is the picture-form of “every value is equally likely”.

The cumulative distribution function (CDF) accumulates that area from the left: $p(X_j \le t)$ is the area beneath $f$ between $0$ and $t$ . For the flat- $1$ density, the running area grows linearly, so $p(X_j \le t) = t$ , climbing from $0$ at $t = 0$ to $1$ at $t = 1$ .

The deterministic baseline. Treat each $X_j$ as a fixed number equal to its expected value $0.5$ . Then the deterministic planner reports

\max_j \mathbb{E}(X_j) \;=\; 0.5,

a single number that doesn’t even see $n$ as a relevant variable.

The stochastic reality. Define $Y := \max_j X_j$ , the actual worst finishing time across the $n$ subprojects. Because the $X_j$ are independent, the CDF of $Y$ factorizes:

p(Y \le t) \;=\; p\!\left(\max_j X_j \le t\right) \;=\; \prod_{j=1}^n p(X_j \le t) \;=\; t^n.

In short: $\max_j X_j$ is just the largest of the $X_j$ (the symbol $j$ is a dummy index, as in a sum), and “the largest one is $\le t$” is the same event as “every $X_j \le t$”, a logical and. Independence turns the and into a product, each factor is just $t$, and the product is $t^n$.

The chain of equalities, walked link by link

Each link carries a different idea — worth pulling apart one step at a time.

$p(Y \le t) = p(\max_j X_j \le t)$ . Pure substitution. We defined $Y := \max_j X_j$ a sentence ago, so the two expressions name the same event and therefore have the same probability.
$p(\max_j X_j \le t) = p(\text{every } X_j \le t)$. The key logical move. The maximum of a list of numbers is at most $t$ exactly when every one of them is at most $t$: if even a single $X_j$ exceeded $t$, the max (being at least that $X_j$) would too; conversely, if all of them are at most $t$, then the max, itself one of the $X_j$, is at most $t$ as well. “The max is $\le t$” and “all of them are $\le t$” describe the same event, and same event implies same probability.
$p(\text{every } X_j \le t) = \prod_{j=1}^n p(X_j \le t)$ . Independence. The defining property of independent random variables is exactly that the probability of a joint event factors into the product of the individual probabilities. Intuitively, knowing whether subproject $1$ finishes by $t$ tells you nothing about whether subproject $2$ does, so the events don’t interact and the probabilities just multiply — much like the chance of two fair coins both landing heads is $\tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4}$ , not $\tfrac{1}{2}$ .
$\prod_{j=1}^n p(X_j \le t) = t^n$ . Substitute the uniform CDF from above — each $p(X_j \le t) = t$ — and the product of $n$ copies of $t$ is $t^n$ .

A quick reality check on the formula: with $n = 2$ and $t = 0.5$ , $p(Y \le 0.5) = 0.5^2 = 0.25$ — even though each subproject has a $50\%$ chance of finishing by time $0.5$ on its own, the joint event “both finished by then” has only a $25\%$ chance. With $n = 10$ , that drops to $0.5^{10} \approx 0.001$ . The only way for all $n$ subprojects to stay below a threshold $t < 1$ is an increasingly unlikely conjunction of “every single one of them got lucky” — and that’s the structural reason the worst subproject’s expected finish time creeps toward $1$ as $n$ grows.

p(Y \le t) = t^n

What does this formula actually say about $Y$ as $n$ grows? For any fixed threshold $t < 1$ , the probability $p(Y \le t) = t^n$ shrinks to zero — vanishingly small chance $Y$ lands at or below $t$ . Yet $p(Y \le 1) = 1^n = 1$ always, since $Y$ has to land somewhere in $[0, 1]$ . The only way to reconcile those two facts is for $Y$ to crowd up against $t = 1$ , the worst case: with many subprojects, the worst finishing time is almost certainly just below $1$ . On the CDF curve this reads as the flat-near-zero region growing wider with $n$ and the rise to $1$ getting crammed into an ever-thinner sliver next to the right edge. Pushing that observation through to the expectation:

\mathbb{E}(Y) \;=\; \int_0^1 t \cdot n \cdot t^{n-1} \, dt \;=\; \frac{n}{n+1} \;\xrightarrow{n \to \infty}\; 1.

The expected-value integral, step by step

For a non-negative continuous random variable $Y$ supported on $[0, 1]$ , the expected value is

\mathbb{E}(Y) \;=\; \int_0^1 t \, f_Y(t) \, dt,

where $f_Y$ is the density — the derivative of the CDF $F_Y(t) = p(Y \le t)$ . From $F_Y(t) = t^n$ ,

f_Y(t) \;=\; \frac{d}{dt}\, t^n \;=\; n \, t^{n-1} \qquad \text{for } t \in [0, 1].

Substituting and integrating:

\mathbb{E}(Y) \;=\; \int_0^1 t \cdot n \, t^{n-1} \, dt \;=\; n \int_0^1 t^n \, dt \;=\; n \cdot \left[\frac{t^{n+1}}{n+1}\right]_0^1 \;=\; \frac{n}{n+1}.

For the limit, write $\frac{n}{n+1} = 1 - \frac{1}{n+1}$ — the subtracted term shrinks to $0$ as $n \to \infty$ .

The two answers side by side.

\underbrace{\max_j \mathbb{E}(X_j) \;=\; 0.5}_{\text{deterministic estimate}} \quad \text{vs.} \quad \underbrace{\mathbb{E}(\max_j X_j) \;=\; \frac{n}{n+1} \;\to\; 1}_{\text{stochastic reality, } n \to \infty}.

The qualitative gap is striking. The deterministic estimate sits at $0.5$ regardless of $n$ ; the stochastic one climbs toward the worst case $t = 1$ as $n$ grows, because a project that could take up to time $1$ effectively does once enough parallel subprojects are stacked. The expected completion drifts toward the imaginable worst case, and there is no escape from this drift — exactly the structural underestimation Fulkerson’s theorem warned about in the abstract, now made unmistakable by a concrete distribution. The deterministic model is blind to that drift; randomization is the lens that lets the planner see it coming.
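The closed form $n/(n+1)$ can be checked numerically with a short Monte Carlo sketch. The function name `simulate_expected_max`, the run count, and the seed are arbitrary choices made here for illustration; the estimates are random and only approximate the exact value.

```python
import random

def simulate_expected_max(n, runs=20_000, seed=0):
    """Estimate E(max of n independent Uniform[0, 1] draws) by simulation."""
    rng = random.Random(seed)  # fixed seed so repeated calls are reproducible
    total = 0.0
    for _ in range(runs):
        total += max(rng.random() for _ in range(n))
    return total / runs

for n in (1, 2, 5, 10, 50):
    estimate = simulate_expected_max(n)
    exact = n / (n + 1)
    print(f"n={n:3d}  simulated={estimate:.3f}  n/(n+1)={exact:.3f}  deterministic=0.500")
```

The simulated column should track $n/(n+1)$ closely and move toward $1$ as $n$ grows, while the deterministic estimate stays fixed at $0.5$.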

Hypergraphs

The chapter has, so far, leaned on directed graphs — vertices for tasks, edges for precedence or machine-sharing. Every edge connected exactly two vertices, and every machine handled at most one task at a time. As a closing outlook, relax both restrictions, and the natural setting becomes a hypergraph.

The new idea is small but powerful. In a regular graph an edge is always a pair of vertices. In a hypergraph the edge is generalized to a hyperedge — a subset of vertices, of any cardinality. A hyperedge can group two vertices (recovering an ordinary edge), three vertices, twenty, or any number. For instance, with $V = \{1, 2, 3, 4, 5\}$ , a hypergraph might carry hyperedges $\{1, 2\}$ , $\{2, 3, 5\}$ , $\{4\}$ , and $\{1, 3, 4, 5\}$ — four hyperedges of cardinalities $2, 3, 1, 4$ . Picture-wise: in a graph you draw lines between pairs of dots; in a hypergraph you draw closed blobs enclosing any number of dots at once.

Formally, a hypergraph is a pair

HG = (V, HE), \qquad HE \subseteq \mathcal{P}(V),

where $V$ is a vertex set and $\mathcal{P}(V)$ is the power set of $V$ (the set of all subsets of $V$ , also written $2^V$ ). So $HE \subseteq \mathcal{P}(V)$ just says “each hyperedge is some subset of vertices, and we pick a collection of them”. When every hyperedge has exactly two vertices, the hypergraph reduces to an ordinary graph — graphs are the cardinality- $2$ special case.

For scheduling, the re-interpretation is the punchline. Vertices = tasks, as before. Hyperedges = machines: each machine corresponds to the hyperedge containing all the tasks that need to run on it — a single set summarizing “these are the tasks competing for this machine”. A task that needs several machines appears as a vertex in several hyperedges. The new degree of freedom over the job-shop model is that a machine can run several tasks simultaneously, up to some capacity, instead of one at a time.

The natural scheduling problem on this object is vertex coloring, with one mapping at its core:

Color = time slot. Two vertices sharing a color means two tasks running in the same time slot — simultaneously.
Hyperedge = machine. Two vertices in the same hyperedge means two tasks competing for the same machine.

So same color and same hyperedge means two tasks trying to run on the same machine at the same time — and the cap on how many of those a hyperedge can hold is exactly the machine’s capacity. Vertices in different hyperedges (tasks on different machines) are free to share a color; they don’t compete. The optimization goal is fewest colors possible — fewer colors mean fewer time slots mean a shorter overall schedule — which puts pressure to pack as many same-colored vertices into each hyperedge as capacity permits. (Forcing every vertex to take a distinct color would just mean every task gets its own time slot in serial, which is the worst schedule, not the best.)
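The capacity rule can be stated as a small validity check, together with a naive greedy coloring that respects it. The sketch below reuses the example hyperedges $\{1, 2\}$, $\{2, 3, 5\}$, $\{4\}$, $\{1, 3, 4, 5\}$ from earlier; the machine names, the capacity of $2$ per machine, and both function names are assumptions made for illustration. The greedy pass is a heuristic and carries no optimality guarantee.

```python
from collections import Counter

def is_valid_coloring(hyperedges, capacity, color):
    """True if no machine holds more same-slot tasks than its capacity.

    hyperedges: dict machine -> set of tasks that need that machine
    capacity:   dict machine -> max tasks it can run simultaneously
    color:      dict task -> time slot (tasks missing from it are ignored)
    """
    for machine, tasks in hyperedges.items():
        load = Counter(color[v] for v in tasks if v in color)
        if any(count > capacity[machine] for count in load.values()):
            return False
    return True

def greedy_coloring(vertices, hyperedges, capacity):
    """Give each task the smallest slot that keeps every machine within capacity."""
    color = {}
    for v in vertices:
        slot = 0
        while True:
            color[v] = slot
            if is_valid_coloring(hyperedges, capacity, color):
                break
            slot += 1  # an unused slot is always valid when capacities are >= 1
    return color

vertices = [1, 2, 3, 4, 5]
hyperedges = {"M1": {1, 2}, "M2": {2, 3, 5}, "M3": {4}, "M4": {1, 3, 4, 5}}
capacity = {m: 2 for m in hyperedges}   # assumed: every machine runs 2 tasks at once

coloring = greedy_coloring(vertices, hyperedges, capacity)
print(coloring)                          # {1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
print(len(set(coloring.values())))       # 2 time slots
```

With these assumed capacities the greedy pass needs two time slots; putting all five tasks in one slot would overload the three-task and four-task hyperedges.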

The hypergraph machinery is needed precisely because capacity is now allowed to exceed $1$: the job-shop model’s “one task per machine at a time” rule is being relaxed. With capacity $1$ and every hyperedge of size $2$, the per-hyperedge constraint “at most one vertex of any given color” forces the two endpoints of every edge to take different colors. That is exactly the classical graph-coloring problem (paint the vertices of a graph so no two adjacent ones share a color, using the fewest colors possible), itself NP-hard in general. Allowing larger hyperedges and capacities above $1$ piles more structure on top, and the closing message is the same one the chapter has been making throughout: a richer modeling target lands in a harder optimization regime, and heuristics carry the practical load.

That’s where the scheduling chapter stops. The arc traced — DAG-based process scheduling, job-shop with disjunctive choices, randomization for unforeseen variation, and hypergraphs for capacity-sharing machines — sketches one lesson at four scales: each modeling class trades expressiveness against tractability, and picking the right class is half the discrete-modeling problem.