Scheduling

Scheduling problems share one abstract shape: a fixed set of tasks that need to get done, a fixed set of machines that can do them, and a question of when to put each task on which machine so that the whole job finishes as well as possible. “As well as possible” depends on the situation — most often it means finishing in the shortest possible total time — but the underlying structure stays the same: an ordering decision under constraints.

This is a different flavor of discrete modeling from the decision models of the previous chapter. There the discreteness lived in the choice space — finitely many actions, finitely many states — and the decision came down to a single pick from a matrix. Here the choice space is itself an arrangement: tasks laid out over time and across machines. The questions are no longer which row of a matrix to pick; they are how to lay things out in time.

Problem setup

A scheduling problem consists of:

  • a process — an overall piece of work to be carried out — decomposed into nn tasks (also called jobs) A1,,AnA_1, \dots, A_n;
  • a pool of mm machines M1,,MmM_1, \dots, M_m that can execute those tasks;
  • an execution time ti(j)0t_i^{(j)} \ge 0 for each (task, machine) pair — the time machine MjM_j takes to complete task AiA_i;
  • three constraints on how tasks and machines may be paired up over time:
    • no task is carried out on more than one machine at a time,
    • no machine handles more than one task at a time,
    • but several machines may run in parallel.

The execution-time table ti(j)t_i^{(j)} is the most general form: the time a task takes can depend both on which task it is and on which machine runs it. Later sections specialize this — for instance when all machines are interchangeable, ti(j)t_i^{(j)} collapses to a single number tit_i per task — but for now leave both indices free.

The three constraints together carve out the legal moves of the problem. The first two are exclusion clauses: while a task runs it occupies one machine, and while a machine runs it occupies one task. The third is permission rather than prohibition — distinct machines may be busy at the same wall-clock moment, doing distinct tasks. This parallelism across machines is what makes the problem interesting; without it the model would collapse to choosing a total order on nn tasks for one resource.

Model parameters

A scheduling model is described by three kinds of parameters:

  • task parameters — what each AiA_i is, what it requires, how long it takes;
  • machine parameters — what each MjM_j is capable of and how it processes tasks (e.g. the execution times ti(j)t_i^{(j)});
  • a target function (also called objective function or optimality criterion) — a rule that turns a candidate schedule into a single number to be optimized.

The first two are descriptive: they pin down the system being modeled. The third is the normative piece — it says what optimal means for the problem at hand, and the answer changes with the situation.

Goal

The goal is to find an optimal schedule: an assignment of tasks to machines and time slots that best satisfies the target function while respecting the constraints. At its heart this is an ordering problem — every candidate schedule comes down to which task runs where, and in what order.

The most common target is to minimize the overall time — the moment when the last task finishes — i.e. the schedule that gets everything done as early as possible. Other choices are possible (minimize total cost, maximize machine utilization, balance load across machines), and much of what follows in this chapter is about how the model and the algorithms change depending on which target is in play.

Machine characterization

The structure of the execution-time table ti(j)t_i^{(j)} — how the times relate to the tasks and to the machines — distinguishes one scheduling problem from another. Four classes cover the spectrum from the most restrictive setup to the fully general one.

A single machine pool has m=1m = 1. Every task is processed strictly serially on one resource — no parallelism is possible.

The single-machine case is the simplest version of the problem. Without parallelism, the only choice is the order in which tasks are processed.

Machines are identical when they are both qualitatively identical — every machine offers the same range of functionality and can handle every task — and quantitatively identical — every machine runs at the same speed. The execution time then depends only on the task:

ti(j)=tit_i^{(j)} = t_i

The superscript (j)(j) drops because the machine identity is irrelevant.

Think of mm copies of the same industrial robot. Any task can be assigned to any robot, and each one finishes in the same time. The scheduling problem reduces to deciding how to spread the tasks across the machines.

Machines are qualitatively identical when they all support the same set of tasks but run at different speeds. Each machine MjM_j carries a per-machine proportionality factor qj>0q_j > 0, and the execution time scales accordingly:

ti(j)=qjtit_i^{(j)} = q_j \cdot t_i

A larger qjq_j marks a slower machine for every task it runs; setting all qj=1q_j = 1 recovers the identical-machines case.

This is the “same model, different generations” picture — a workshop with old and new versions of the same machine. The qualitative ability is uniform, but a job that takes 4 hours on the newest machine might take 6 hours on the previous generation.

Machines are arbitrary when both their functionality and their performance vary across the pool. The execution times ti(j)t_i^{(j)} are then arbitrary, with no algebraic relationship tying the rows or columns of the table together.

This is the fully general setup: each (task, machine) pair carries its own number, with no shared structure across the table. Most real-world scheduling problems live here — different machines with different abilities, running different jobs at different rates.

The four classes line up along a single trade-off: more general means more powerful, but also more difficult to analyze. The more general the machine model, the wider the range of real-world setups it can describe — and the less algebraic structure there is to exploit when searching for an optimum. The simpler classes often admit clean, direct algorithms; the more general ones typically force a combinatorial search over many candidate schedules.

A separate axis of generality is uncontrollable influences on the execution times — noise, machine failures, weather, the unforeseen. Modelling those requires replacing the deterministic ti(j)t_i^{(j)} with a random variable, and we return to that direction later in the chapter.

Process scheduling

We now narrow to the simplest setting: the execution time of each task is purely task-dependent, so every AiA_i carries a single duration tit_i that doesn’t depend on the machine. We also assume machine availability is not a binding constraint — there are always enough machines to start a task the instant its prerequisites finish, no matter how many other tasks are already running. The only thing that ever forces one task to wait for another is a precedence arrow, and the natural language for those arrows is a directed graph.

This “unlimited resources” assumption will be relaxed later in the chapter. With finitely many machines, two precedence-unrelated tasks can still have to take turns when they target the same machine — that case is the job-shop problem of the next few sections.

The start time si0s_i \ge 0 of task AiA_i is the moment at which AiA_i is launched. The completion time is

ci=si+ti,c_i = s_i + t_i,

the moment at which AiA_i finishes. We assume that once started, a task cannot be interrupted — it runs through to completion.

With every task carrying a start time and a completion time, “task AjA_j has to wait for task AiA_i” becomes a clean numerical condition.

A precedence condition (also called a precedence constraint) from AiA_i to AjA_j, written AiAjA_i \to A_j, requires that AjA_j start only after AiA_i has finished:

AiAj:cisj.A_i \to A_j \,:\Leftrightarrow\, c_i \le s_j.

Equivalently, AiA_i‘s completion must precede (or coincide with) AjA_j‘s start.

The precedence graph

A list of precedence conditions has a natural picture: draw one node per task, and one arrow per precedence condition. The picture is a directed graph, and it captures the entire ordering structure of the process at a glance.

The precedence graph of a process is a directed graph G:=(V,E)G := (V, E) where:

  • the vertex set is V:={A1,,An}V := \{A_1, \dots, A_n\} — one vertex per task — and each vertex AiA_i is labeled with its execution time tit_i;
  • the edge set is E:={(Ai,Aj):AiAj}E := \{ (A_i, A_j) : A_i \to A_j \} — one directed edge per precedence condition. The edges themselves carry no labels.

Two extra vertices close the graph at its endpoints:

  • an initial vertex ASA_S with index S:=0S := 0 and tS:=0t_S := 0, joined by an edge ASAiA_S \to A_i to every AiA_i that has no incoming edge in EE — every task that depends on no other task;
  • a final vertex AEA_E with index E:=n+1E := n + 1 and tE:=0t_E := 0, joined by an edge AiAEA_i \to A_E from every AiA_i that has no outgoing edge in EE — every task that nothing else depends on.

Putting the duration on the vertex and leaving the edge bare is a deliberate split. The vertex carries the quantitative data — how long the task takes — while the edge carries pure ordering information: ”AiA_i must come before AjA_j”, no number attached. In graph-theoretic terms this is a node-weighted directed graph; only the vertices carry numbers.

The two bookend vertices ASA_S and AEA_E have zero cost (tS=tE=0t_S = t_E = 0). ASA_S is a single common starting point that every otherwise-unconstrained task must follow; AEA_E is a single common endpoint that every otherwise-unconstrained task must reach. The payoff comes later — the moment at which the entire process is finished is just the completion time of AEA_E, one number instead of a maximum across many independent endpoints.

Admissible schedules and DAGs

With the precedence graph in hand, we can now write down what a schedule is — and what it means for one to actually respect the dependencies.

A schedule for a process is an assignment of a start time si0s_i \ge 0 to every task AiA_i. The completion times then follow from ci=si+tic_i = s_i + t_i.

A schedule on its own is just numbers — there is no built-in guarantee that it respects the precedence arrows. That is a separate property.

A schedule is admissible when every precedence condition AiAjA_i \to A_j in GG is satisfied — that is, cisjc_i \le s_j for every edge (Ai,Aj)E(A_i, A_j) \in E.

Not every precedence graph admits a non-trivial admissible schedule, though. A graph GG that contains a cycle — a path Ai1Ai2AikAi1A_{i_1} \to A_{i_2} \to \cdots \to A_{i_k} \to A_{i_1} that returns to its starting vertex — forces

si1ciksi1+j=1ktij,s_{i_1} \ge c_{i_k} \ge s_{i_1} + \sum_{j=1}^{k} t_{i_j},

so 0j=1ktij0 \ge \sum_{j=1}^k t_{i_j}, and since every tij0t_{i_j} \ge 0, every one of them must be zero.

Where each inequality comes from

The first inequality si1ciks_{i_1} \ge c_{i_k} is just the closing edge of the cycle: the precedence condition AikAi1A_{i_k} \to A_{i_1} reads ciksi1c_{i_k} \le s_{i_1}.

The second inequality ciksi1+j=1ktijc_{i_k} \ge s_{i_1} + \sum_{j=1}^k t_{i_j} comes from walking forward through the cycle starting at Ai1A_{i_1}:

  • ci1=si1+ti1c_{i_1} = s_{i_1} + t_{i_1};
  • si2ci1s_{i_2} \ge c_{i_1} (the edge Ai1Ai2A_{i_1} \to A_{i_2}), so ci2=si2+ti2si1+ti1+ti2c_{i_2} = s_{i_2} + t_{i_2} \ge s_{i_1} + t_{i_1} + t_{i_2};
  • continuing the same step inductively along Ai2Ai3AikA_{i_2} \to A_{i_3} \to \cdots \to A_{i_k} gives ciksi1+ti1+ti2++tikc_{i_k} \ge s_{i_1} + t_{i_1} + t_{i_2} + \cdots + t_{i_k}.

The only cycles a precedence graph can carry are therefore cycles of zero-duration tasks — mathematically degenerate. So we restrict attention to graphs with no cycles at all.

A directed acyclic graph (commonly abbreviated DAG) is a directed graph that contains no cycles. From this point on, every precedence graph GG is assumed to be a DAG.

Precedence is transitive. If AiAjA_i \to A_j and AjAkA_j \to A_k, then cisjcjskc_i \le s_j \le c_j \le s_k, so AiAkA_i \to A_k holds automatically. Long chains of dependencies imply pairwise precedence between any earlier and later task in the chain, regardless of whether those pairwise edges are drawn explicitly in EE.

Once GG is a DAG, the tasks admit a clean linear ordering that is compatible with every precedence arrow.

A topological sort (also called a topological ordering) of a DAG is a reindexing of the vertices A1,A2,,AnA_1, A_2, \dots, A_n such that every edge AiAjA_i \to A_j has i<ji < j — every task receives an index strictly larger than those of all its prerequisites. A depth-first search of the graph — a recursive traversal that follows each branch to its end before backtracking — produces one such ordering, and every DAG admits at least one.

Reindexing the vertices into a topological order makes constructing schedules straightforward: walking the tasks in order A1,A2,,AnA_1, A_2, \dots, A_n, every prerequisite of the current task has already been visited, so its start and completion times are already known. The next section turns this observation into an explicit “earliest-possible” schedule that always exists and turns out to be optimal as well.

Constructing schedules

With the precedence graph reduced to a DAG and the vertices renumbered in topological order, we can build a concrete admissible schedule task by task. Two natural constructions present themselves — the “as early as possible” schedule and the “as late as possible” schedule — and both turn out to achieve the same overall completion time.

Earliest schedule: lead times

The first construction follows the simplest possible policy: start every task the moment it is allowed to start — i.e. the instant all of its prerequisites have completed.

The lead time (also called the earliest start time) sis'_i of task AiA_i is the moment at which all of AiA_i‘s predecessors have completed — the earliest time AiA_i may begin. Equivalently, it is the maximum completion time across the predecessors:

si=maxj:AjAicj,sS=cS=0.s'_i = \max_{j \,:\, A_j \to A_i} c'_j, \qquad s'_S = c'_S = 0.

The corresponding earliest completion time is then

ci=si+ti.c'_i = s'_i + t_i.

Lead times are computed by a single forward pass through the DAG in topological order, so that every cjc'_j on the right-hand side of the recurrence is already known by the time we reach AiA_i:

  • Initialize at the source with sS=cS=0s'_S = c'_S = 0 (no predecessors, zero duration).
  • Walk forward through A1,A2,,AnA_1, A_2, \dots, A_n in topological order, applying the recurrence at each step to compute first sis'_i and then ci=si+tic'_i = s'_i + t_i.
  • Terminate at AEA_E. The final value cEc'_E is the earliest possible completion time of the entire process.

The max\max in the recurrence enforces precedence: AiA_i can only start once every predecessor has finished, and the bottleneck predecessor is whichever finishes latest. Acyclicity of GG guarantees the recursion is well-defined — when we reach AiA_i, every predecessor AjA_j has index j<ij < i in the topological order, so cjc'_j is already in hand.

Latest schedule: remaining times

A second construction asks the opposite question: how late can each task afford to be without slipping past the optimal overall completion time cEc'_E?

The remaining time (also called the latest completion time) cic''_i of task AiA_i is the latest moment at which AiA_i can finish while still allowing the whole process to complete at cEc'_E. Equivalently, it is the minimum latest-start across the successors:

ci=minj:AiAjsj,cE=sE=cE.c''_i = \min_{j \,:\, A_i \to A_j} s''_j, \qquad c''_E = s''_E = c'_E.

The corresponding latest start time is then

si=citi.s''_i = c''_i - t_i.

The name “remaining time” is a touch misleading. cic''_i is not a duration — it is a deadline, an absolute moment on the same time axis as sis_i and cic_i. The “remaining” refers to the downstream time budget: once AiA_i has finished at cic''_i, the quantity cEcic'_E - c''_i is what remains on the clock for every task that comes after AiA_i. Slip past cic''_i, and that budget isn’t enough.

Computing remaining times depends on cEc'_E, the output of the forward pass — so the reverse pass must run after the lead-time pass has finished. With cEc'_E in hand, we walk the DAG backward in reverse topological order:

  • Initialize at the sink with cE=sE=cEc''_E = s''_E = c'_E (no successors, zero duration).
  • Walk backward through An,An1,,A1A_n, A_{n-1}, \dots, A_1 in reverse topological order, applying the recurrence at each step to compute first cic''_i and then si=citis''_i = c''_i - t_i.
  • Terminate at ASA_S.

The min\min over successors mirrors the max\max over predecessors. AiA_i must finish in time for every successor to start on its own latest schedule, and the tightest of those deadlines binds.

The forward pass gives each task its earliest (si,ci)(s'_i, c'_i); the reverse pass gives each task its latest (si,ci)(s''_i, c''_i). Setting si=sis_i = s'_i for every ii defines one admissible schedule; setting si=sis_i = s''_i defines another. Both finish the entire process at exactly cEc'_E, so both achieve the same overall completion time. What the gap sisis''_i - s'_i tells us about each task — and why cEc'_E is in fact the best any schedule can do — is the subject of the next section.

Critical paths

With both the lead times (si,ci)(s'_i, c'_i) and the remaining times (si,ci)(s''_i, c''_i) in hand, every vertex of the DAG falls into exactly one of two cases. By construction sisis''_i \ge s'_i — the latest acceptable start cannot precede the earliest possible start, since the earliest possible start is the minimum — so the two cases are:

A vertex AiA_i is critical when its lead time and latest start time coincide:

si=si.s'_i = s''_i.

(Equivalently, ci=cic'_i = c''_i.) In every optimal schedule, a critical task must be assigned the start time si=si=sis_i = s'_i = s''_i — it has no leeway.

The slack of a non-critical vertex AiA_i is the gap

sisi>0.s''_i - s'_i > 0.

In any optimal schedule, AiA_i may begin anywhere in the interval sisisis'_i \le s_i \le s''_i without affecting the overall completion time.

Slack is individual, not joint. If AiAjA_i \to A_j are both non-critical, delaying AiA_i to its latest start si=sis_i = s''_i while letting AjA_j begin at its earliest sj=sjs_j = s'_j can violate the precedence condition cisjc_i \le s_jAiA_i may still be running when AjA_j wants to start, even though both choices sit inside their own slack intervals. The slack interval [si,si][s'_i, s''_i] describes the freedom of AiA_i on its own — not joint freedom across all non-critical tasks at once.

Critical vertices stitch together into paths from ASA_S to AEA_E.

A critical path is a path ASAi1Ai2AEA_S \to A_{i_1} \to A_{i_2} \to \cdots \to A_E in GG consisting entirely of critical vertices, joined by critical edges — edges AkAlA_k \to A_l along the path at which AkA_k‘s two completion values and AlA_l‘s two start values all coincide:

ck  =  ck  =  sl  =  sl.c'_k \;=\; c''_k \;=\; s'_l \;=\; s''_l.

So AlA_l must start the very instant AkA_k finishes, with no gap. (The outer equalities ck=ckc'_k = c''_k and sl=sls'_l = s''_l are automatic from the criticality of the endpoints; the operative new condition is the middle equality, written compactly in the literature as ck=slc'_k = s''_l.)

At least one critical path always exists. cEc'_E is by definition the earliest possible finish, so there must be at least one chain of tasks of total duration cEc'_E — every vertex on that chain has no room to delay without postponing the project, hence is critical. (If every task had slack, the whole schedule could be shifted earlier and finish before cEc'_E, contradicting cEc'_E being the earliest possible.) Formally, the forward pass’s max\max at each vertex picks out the predecessor that determines that vertex’s earliest start; tracing these bottleneck predecessors back from AEA_E recovers a critical path. If multiple chains achieve total duration cEc'_E, multiple critical paths exist — and all of them have the same total duration cEc'_E.

The makespan and CPM

The makespan (also called the entire completion time, or the overall time of the informal problem setup) of a schedule is the value cEc_E — the moment at which the final vertex AEA_E finishes, i.e. when the whole process is done. For the lead-time and remaining-time schedules, cEc_E attains its minimum value, with cE=cE=cEc_E = c'_E = c''_E.

Along any path ASAi1AEA_S \to A_{i_1} \to \cdots \to A_E in GG, the cumulative duration ktik\sum_k t_{i_k} — the sum of the execution times of all vertices on the path — is a lower bound on the makespan of every admissible schedule. (This cumulative duration is also called the path’s length in weighted-graph shorthand — the phrase “the length cEc_E of a critical path” refers to exactly this total duration, not to the number of edges or vertices on the path.) Every task on the path must finish before the next can start, so the path is traversed strictly sequentially. Among all such paths, the one with the largest cumulative duration gives the tightest lower bound — and that largest cumulative duration is exactly cEc'_E (equivalently cEc''_E, since the reverse pass set cE=cEc''_E = c'_E as its boundary).

Both schedules actually achieve this lower bound: the lead-time schedule gives cE=cEc_E = c'_E directly, and the remaining-time schedule gives cE=cEc_E = c''_E. So cE=cE=cEc_E = c'_E = c''_E in either schedule — cEc'_E is simultaneously a lower bound on the makespan and an upper bound realized by explicit construction, making both schedules makespan-optimal.

The Critical Path Method, abbreviated CPM, is the procedure for finding the critical paths of a precedence DAG: compute the lead times sis'_i by a forward pass, the remaining times cic''_i by a reverse pass, classify each vertex as critical or non-critical from the gap sisis''_i - s'_i, and walk backward from AEA_E to ASA_S through critical edges to extract one or more critical paths.

The forward pass alone already produces a makespan-optimal schedule — every task has a start time, and the project finishes at cE=cEc_E = c'_E. Identifying the critical paths is the additional output that CPM is named after, and it earns its name through two concrete real-world uses:

  • To shorten the makespan, you must shorten a task that lies on a critical path. Shortening any other task only widens its slack and leaves cEc_E unchanged — so when investing effort (people, money, parallelism) to speed up a project, the critical path tells you where that effort actually pays off.
  • To know which tasks can absorb a delay, look at the non-critical vertices: each one can start as late as sis''_i — i.e. slip by up to its slack — without pushing cEc_E later. Critical tasks have no such buffer; any slip on them pushes the entire project’s deadline.

A worked example

To make the whole CPM pipeline concrete, consider seven tasks with execution times

ii1234567
tit_i3223244

and the precedence set

{A1A3,  A2A3,  A3A4,  A3A5,  A6A7}.\{A_1 \to A_3,\; A_2 \to A_3,\; A_3 \to A_4,\; A_3 \to A_5,\; A_6 \to A_7\}.

So we get the following precedence graph:

Three tasks (A1,A2,A6A_1, A_2, A_6) have no prerequisites in this list, so each receives an edge from the initial vertex ASA_S; three tasks (A4,A5,A7A_4, A_5, A_7) appear as nobody’s prerequisite, so each gains an edge to the final vertex AEA_E. The natural indexing 1,2,,71, 2, \dots, 7 already forms a topological order, so we can walk the algorithm vertex by vertex in index order.

Forward pass: lead times

Taking the max\max over predecessors at each step:

  • ASA_S: sS=0s'_S = 0, cS=0+0=0c'_S = 0 + 0 = 0.
  • A1A_1: s1=0s'_1 = 0, c1=0+3=3c'_1 = 0 + 3 = 3.
  • A2A_2: s2=0s'_2 = 0, c2=0+2=2c'_2 = 0 + 2 = 2.
  • A3A_3: s3=max(c1,c2)=max(3,2)=3s'_3 = \max(c'_1, c'_2) = \max(3, 2) = 3, c3=3+2=5c'_3 = 3 + 2 = 5.
  • A4A_4: s4=5s'_4 = 5, c4=5+3=8c'_4 = 5 + 3 = 8.
  • A5A_5: s5=5s'_5 = 5, c5=5+2=7c'_5 = 5 + 2 = 7.
  • A6A_6: s6=0s'_6 = 0, c6=0+4=4c'_6 = 0 + 4 = 4.
  • A7A_7: s7=4s'_7 = 4, c7=4+4=8c'_7 = 4 + 4 = 8.
  • AEA_E: sE=max(8,7,8)=8s'_E = \max(8, 7, 8) = 8, cE=8+0=8c'_E = 8 + 0 = 8.

The earliest possible overall completion time is cE=8c'_E = 8.

Reverse pass: remaining times

Starting from cE=sE=8c''_E = s''_E = 8 and walking the DAG in reverse index order, taking the min\min over successors:

  • AEA_E: cE=8c''_E = 8, sE=8s''_E = 8.
  • A7A_7: c7=sE=8c''_7 = s''_E = 8, s7=4s''_7 = 4.
  • A6A_6: c6=s7=4c''_6 = s''_7 = 4, s6=0s''_6 = 0.
  • A5A_5: c5=sE=8c''_5 = s''_E = 8, s5=6s''_5 = 6.
  • A4A_4: c4=sE=8c''_4 = s''_E = 8, s4=5s''_4 = 5.
  • A3A_3: c3=min(s4,s5)=min(5,6)=5c''_3 = \min(s''_4, s''_5) = \min(5, 6) = 5, s3=3s''_3 = 3.
  • A2A_2: c2=s3=3c''_2 = s''_3 = 3, s2=1s''_2 = 1.
  • A1A_1: c1=s3=3c''_1 = s''_3 = 3, s1=0s''_1 = 0.
  • ASA_S: cS=min(s1,s2,s6)=min(0,1,0)=0c''_S = \min(s''_1, s''_2, s''_6) = \min(0, 1, 0) = 0, sS=0s''_S = 0.

Critical vertices and critical paths

Combining the two passes, shading the critical vertices and the critical edges, and coloring each set of four matching values ck=ck=sl=slc'_k = c''_k = s'_l = s''_l at the endpoints of a critical edge:

Seven of the nine vertices are critical, and they line up into two distinct critical paths from ASA_S to AEA_E:

ASA1A3A4AE(0+3+2+3+0=8)A_S \to A_1 \to A_3 \to A_4 \to A_E \qquad (0 + 3 + 2 + 3 + 0 = 8) ASA6A7AE(0+4+4+0=8)A_S \to A_6 \to A_7 \to A_E \qquad (0 + 4 + 4 + 0 = 8)

Both paths have total duration exactly cE=8c'_E = 8 — exactly as the existence argument promised: when several critical paths exist, they share the same total duration. The only tasks with leeway are A2A_2 and A5A_5, each carrying one unit of slack (sisi=1s''_i - s'_i = 1). In any optimal schedule, A2A_2 may start anywhere in [0,1][0, 1] and A5A_5 anywhere in [5,6][5, 6] — but, by the earlier remark on joint slack, these two choices are not entirely independent of each other or of the surrounding critical structure. Every other task has its start time pinned to a unique value.

Job-shop problems

Process scheduling rested on a comfortable assumption: machine availability was never the bottleneck. There were always enough machines to start any task the instant its precedences cleared. We now drop that assumption. Machines become a scarce resource — each one runs at most one job at a time, and a job can be blocked not only by an unfinished prerequisite but also by another job sitting on the machine it needs.

The canonical formalization of this regime is the job-shop model, in which each job is itself an ordered pipeline of machine-specific steps.

A job-shop model describes a process of nn jobs A1,,AnA_1, \ldots, A_n — the term “job” replaces “task” of the earlier scheduling problem setup, both naming the same kind of object — to be run on mm machines M1,,MmM_1, \ldots, M_m, with the following structure:

  • Each job AiA_i decomposes into a sequence of nin_i subjobs Ai,1,Ai,2,,Ai,ni,A_{i,1}, A_{i,2}, \ldots, A_{i,n_i}, which must run strictly in this order.
  • Subjob Ai,jA_{i,j} has execution time ti,j0t_{i,j} \ge 0 and is bound to a specific machine mi,j{1,,m}m_{i,j} \in \{1, \ldots, m\}.
  • Each machine processes at most one subjob at a time, so subjobs assigned to the same machine must be put into some order.
  • No recirculation: the machine indices mi,1,mi,2,,mi,nim_{i,1}, m_{i,2}, \ldots, m_{i,n_i} along one job’s sequence are pairwise distinct — every job uses every machine at most once.

The first two bullets pin down the internal anatomy of a job. The order of its subjobs and which machine each one runs on are given as input — the scheduler doesn’t reorder the steps within a job or move a subjob to a different machine. The model also imposes no precedence between different jobs: every job’s first subjob is allowed to start at t=0t = 0, regardless of any other job’s state.

The third bullet is where the new scheduling decision lives. Because each machine is serial, every group of subjobs targeting the same machine has to be put into a single sequence — and choosing those sequences, one per machine, is the combinatorial layer that did not appear in process scheduling. Two subjobs from different jobs, otherwise independent, can now still have to wait their turn for a shared machine.

The fourth bullet — no recirculation — limits how convoluted one job’s path can be: it never revisits a machine it has already used. Concretely, since one job’s machines are pairwise distinct, nimn_i \le m for every job. The word recirculation names the forbidden behavior (a job circling back to an earlier machine), and the assumption is its absence. Some real-world variants drop this restriction; we keep it as a baseline simplification.

Two close relatives of the job-shop model relax or tighten the per-job step order, and are worth knowing by name even though we don’t pursue them further:

  • Open shop — same data as the job-shop model, but the subjobs of one job may be processed in any order. The scheduler picks both the per-job order and the per-machine sequences.
  • Flow shop — same data, but every job passes through the machines in the same fixed order. Every job follows one universal pipeline.

The three setups line up by how much per-job flexibility the model carries: open shop is the most permissive, flow shop the most rigid, job shop in between. The rest of this chapter develops the job-shop case.

Admissibility

Admissibility in process scheduling boiled down to a single rule: every precedence edge in the DAG must hold as a cisjc_i \le s_j inequality. In the job-shop setting the same principle applies but the constraints come in two flavors — per-job ordering and per-machine exclusion.

A schedule in a job-shop model, assigning a start time si,j0s_{i,j} \ge 0 to every subjob Ai,jA_{i,j}, is admissible when all of the following hold:

  • Within-job ordering — for every job ii, the first subjob begins at or after time zero and every later subjob waits for its predecessor: si,10,ci,j1si,jfor j=2,,ni.s_{i,1} \ge 0, \qquad c_{i,j-1} \le s_{i,j} \quad \text{for } j = 2, \ldots, n_i.
  • Machine exclusion — no two subjobs sharing a machine overlap in time: for every machine kk, every pair of distinct subjobs in its subjob set M(k):={Ai,j:mi,j=k}M(k) := \{A_{i,j} : m_{i,j} = k\} must be processed one after the other. Formally, for any Ai,j,Ai,jM(k)A_{i,j}, A_{i',j'} \in M(k) with (i,j)(i,j)(i,j) \ne (i',j'), ci,jsi,jorci,jsi,j.c_{i,j} \le s_{i',j'} \quad \text{or} \quad c_{i',j'} \le s_{i,j}.

The within-job condition looks just like the process-scheduling precedence constraints — a list of casbc_a \le s_b inequalities, each one a directed edge waiting to be drawn. The machine-exclusion condition has a different shape: it asserts a disjunction — one of two precedence inequalities must hold for each pair, but the model doesn’t say which. That disjunction is the new combinatorial choice introduced by job-shop scheduling.

Conjunctive and disjunctive edges

To keep the precedence graph machinery from process scheduling, we encode both kinds of constraints as edges — splitting them into two families that behave differently.

A conjunctive edge in the precedence graph of a job-shop model is a directed edge whose direction is fixed by the problem data. There are three families, together encoding the within-job admissibility condition:

  • Internal-sequence edges Ai,j1Ai,jA_{i,j-1} \to A_{i,j} for every i=1,,ni = 1, \ldots, n and j=2,,nij = 2, \ldots, n_i — the strict order of subjobs within a job;
  • Source edges ASAi,1A_S \to A_{i,1} for every i=1,,ni = 1, \ldots, n — every job’s first subjob hangs off the common starting vertex;
  • Sink edges Ai,niAEA_{i,n_i} \to A_E for every i=1,,ni = 1, \ldots, n — every job’s last subjob feeds the common ending vertex.

The vertex set is V={AS,AE}{Ai,j:1in,1jni}V = \{A_S, A_E\} \cup \{A_{i,j} : 1 \le i \le n,\, 1 \le j \le n_i\}, with each Ai,jA_{i,j} labeled by its execution time ti,jt_{i,j} and tS=tE=0t_S = t_E = 0.

If the conjunctive edges were all the precedence graph had, we would be back in process scheduling: a fixed DAG with no orientation choices, ready for CPM. What’s new is that machine exclusion still has to be encoded, and it cannot be expressed as a single directed edge — the constraint is symmetric between the two subjobs sharing a machine, and only the scheduler decides which runs first.

A disjunctive edge in the precedence graph of a job-shop model is an unordered pair of subjobs {Ai,j,Ai,j}M(k)\{A_{i,j}, A_{i',j'}\} \subseteq M(k) that share the same machine kk, drawn as the pair of opposite-direction arrows

Ai,jAi,jandAi,jAi,j.A_{i,j} \to A_{i',j'} \qquad \text{and} \qquad A_{i',j'} \to A_{i,j}.

The two arrows represent the two possible orderings of the pair on machine kkone of them must hold in any admissible schedule, but which one is the scheduler’s decision. Picking a single direction for every disjunctive edge is called choosing an orientation.

For a machine kk holding p:=M(k)p := |M(k)| subjobs, the orientation problem on that machine can be sized two ways — by individual arrows on the diagram, or by whole disjunctive edges. The two are different granularities of the same picture: each disjunctive edge bundles two arrows, so counting arrows is finer-grained and counting disjunctive edges is coarser-grained, but both describe the same configuration.

  • Counting arrows. Every ordered pair of distinct subjobs contributes one directed arrow, so the diagram carries p(p1)p(p-1) arrows in total. They group into (p2)\binom{p}{2} opposing couples — for every arrow Ai,jAi,jA_{i,j} \to A_{i',j'} there is a counterpart Ai,jAi,jA_{i',j'} \to A_{i,j} between the same two subjobs. The scheduler keeps exactly one arrow from each opposing couple and deletes the other.
  • Counting disjunctive edges. Each opposing couple of arrows is one disjunctive edge, per the definition above — the pair-of-arrows is the edge. So the machine carries (p2)=p(p1)/2\binom{p}{2} = p(p-1)/2 disjunctive edges, and the scheduler orients each one (chooses which of its two arrows to keep).

Either way of counting hands the scheduler the same workload: (p2)\binom{p}{2} binary decisions per machine. Writing pk:=M(k)p_k := |M(k)| for the subjob count of machine kk, the total disjunctive-edge count across all mm machines is

K  :=  k=1m(pk2),K \;:=\; \sum_{k=1}^{m} \binom{p_k}{2},

and the global orientation space is {0,1}K\{0, 1\}^K — one binary choice per disjunctive edge, holding 2K2^K candidate orientations to search through.

Once an orientation has been picked, every edge in the graph carries a single direction, and the result is a directed graph that we can analyze with the existing CPM toolkit — provided it has no cycles.

The augmented precedence graph of a job-shop model under a given orientation is the directed graph

G:=(V,ECED),G' := (V,\, E_C \cup E_D),

where VV is the vertex set of the precedence graph, ECE_C is the set of conjunctive edges, and EDE_D contains one directed edge per disjunctive edge, chosen according to the orientation. An orientation is admissible when GG' is a DAG.

With an admissible orientation in hand, every job-shop admissibility condition reduces to a single statement: the schedule satisfies ci,jsi,jc_{i,j} \le s_{i',j'} for every edge (Ai,j,Ai,j)ECED(A_{i,j}, A_{i',j'}) \in E_C \cup E_D. That is exactly the process-scheduling notion of admissibility, applied to GG'. So the Critical Path Method runs on GG' unchanged — forward pass, reverse pass, critical paths, makespan — and produces the optimal schedule for that particular orientation.

The hard part has been pushed up one level: choosing the orientation in the first place. That choice is the topic of the next section.

Optimization

The optimization goal of the job-shop model is to find an admissible schedule with minimal makespan cEc_E. With the augmented precedence graph in place, the search splits into two nested subproblems:

  1. Inner question — given a fixed orientation of the disjunctive edges, find the optimal admissible schedule. The Critical Path Method solves this in time linear in the size of GG'.
  2. Outer question — among all admissible orientations, find the one whose resulting CPM makespan cEc_E is smallest.

The inner question is already solved. Everything in this section concerns the outer one.

Disjunctive edge assignments

What we have been calling an “orientation” has a formal name: a disjunctive edge assignment.

A disjunctive edge assignment for a job-shop model is a set EDE_D of directed arrows that contains exactly one arrow from each disjunctive edge — the chosen orientation of that pair. Together with the conjunctive edges ECE_C, the assignment forms the augmented precedence graph G=(V,ECED)G' = (V,\, E_C \cup E_D).

The assignment is admissible when GG' is acyclic (a DAG). Admissibility is not automatic: many assignments produce cycles by routing disjunctive arrows back into the conjunctive flow.

At least one admissible assignment always exists. Take any topological ordering π\pi of the conjunctive-only subgraph (V,EC)(V, E_C) — itself a DAG by construction — and orient every disjunctive edge to agree with π\pi. Concretely: imagine the subjobs lined up in π\pi‘s order; for every disjunctive edge, let the subjob earlier in the lineup be the source and the later one be the target. Every edge of the resulting augmented graph then points forward in π\pi, and a graph whose arrows all run in one direction along a fixed order cannot loop back on itself — so GG' is acyclic. Existence is structural, not accidental: the outer search always has something to find.

Why the outer search is hard

We already noted that the orientation space has 2K2^K candidate assignments, with K=k=1m(pk2)K = \sum_{k=1}^m \binom{p_k}{2}. Naive enumeration means filtering each candidate for admissibility, then running CPM on every survivor — at most 2K2^K CPM runs in total. CPM itself is cheap (linear in the size of GG'), but 2K2^K grows exponentially in KK, and KK itself scales quadratically with the per-machine subjob counts. The bottleneck is entirely in the outer search.

Two structural facts compound the cost beyond raw exponentiality:

  • The problem does not decompose. Re-orienting one disjunctive edge on machine kk can lengthen or shorten critical paths anywhere in the graph — different machines are coupled through the within-job conjunctive chains. The optimal assignment on one machine cannot be chosen independently of the others. This refusal to decompose into independent subproblems is the signature of discrete optimization and recurs across the field — the traveling salesman problem (find the shortest tour visiting every city once), bin-packing, the knapsack problem, Boolean satisfiability. Local optimality is no guarantee of global optimality.
  • Many candidates are inadmissible. A large fraction of the 2K2^K assignments produce cycles in GG' and have to be discarded before CPM ever runs. Filtering for admissibility helps a little, but the surviving set is still exponential.

Heuristics in practice

Exact enumeration is intractable for any realistic KK, so production scheduling uses heuristics — algorithms that produce a good admissible assignment in reasonable time, with no guarantee of global optimality. The next section walks through one representative heuristic family and exhibits a counter-intuitive failure mode that is characteristic of the entire field: shortening one task’s execution time can make the overall schedule longer. That paradox is the running illustration of why naive intuition fails in discrete optimization.

Precedence-list scheduling

The previous section motivated heuristics for hard scheduling problems. This section walks through one canonical greedy heuristic, then runs it on a small instance that exhibits the counter-intuitive failure mode greedy methods are exposed to across discrete optimization.

The heuristic

Each task carries a priority; the tasks ordered from highest to lowest priority form a priority list (also called a precedence list). Given such a list, the algorithm is essentially “always do the highest-priority thing you legally can”:

Priority-list scheduling (also called list scheduling) is a greedy heuristic for assigning tasks to machines under precedence constraints. Given a priority list:

  • Maintain the set of ready tasks — tasks whose precedence requirements have all completed.
  • Whenever a machine becomes idle and at least one ready task exists, assign that machine the highest-priority ready task. The task then runs to completion without interruption.
  • If no ready task exists when a machine is idle, the machine waits.

The algorithm produces a complete precedence-respecting schedule in a single forward sweep — linear time, no backtracking, no lookahead.

A worked example

The example is not in the job-shop model. It is the simpler identical-machines case — every machine interchangeable, every task with a single execution time — plus task-level precedence as in process scheduling. Six tasks A1,,A6A_1, \ldots, A_6, two machines M1,M2M_1, M_2, and the precedence DAG given below:

The priority list is the natural one — lower index = higher priority:

1>2>3>4>5>6.1 > 2 > 3 > 4 > 5 > 6.

We compare two scenarios that differ in a single execution time:

TaskA1A_1A2A_2A3A_3A4A_4A5A_5A6A_6
Scenario A: tit_i5555116
Scenario B: tit_i5545116

Common sense says scenario B (less total work) should finish at least as soon as scenario A. The heuristic disagrees: scenario A reaches cE=21c_E = 21, scenario B drags out to cE=26c_E = 26.

Scenario A. Both machines stay productive from t=5t = 5 onward. M1M_1 runs A1,A2,A5A_1, A_2, A_5; M2M_2 runs A3,A4,A6A_3, A_4, A_6. Every transition happens at exactly the moment the next task becomes ready, and both machines finish together at t=21t = 21.

Scenario B. Task A3A_3 finishes one unit early, at t=9t = 9. The only ready task then is A6A_6A4,A5A_4, A_5 still need A2A_2 to finish at t=10t = 10. The heuristic, with no lookahead, commits M2M_2 to A6A_6 for six units. By the time A5A_5 finally gets picked up at t=15t = 15, its 11-unit run pushes the makespan all the way out to t=26t = 26.

Why shortening hurt

Making one input strictly better made the global result strictly worse. The mechanism is just timing: in scenario A, M2M_2‘s pause until t=10t = 10 was useful — it let the high-priority A4,A5A_4, A_5 become ready before M2M_2 had to commit. In scenario B the pause closed by one unit, and M2M_2 committed to the wrong task before the wave arrived.

This non-monotonicity — making an input “better” can make the result “worse” — is the signature failure mode of greedy heuristics under coupling. It recurs wherever local-rational moves chain into global outcomes: the job-shop outer search, the traveling salesman problem, knapsack, bin-packing. The example is small but the lesson generalizes.

Randomization

Every quantity in the scheduling models so far has been a fixed number known up front — execution times tit_i, costs, completion times, the resulting makespan. That works when the data is genuinely deterministic, but planning large projects rarely is. Unforeseeable influences — machine breakdowns, weather, supply hiccups, a key engineer falling ill — push real execution times around their planned values in ways no schedule can predict in advance.

The empirical consequence, observed across decades of project case studies, is underestimation: the realized duration and cost typically exceed the deterministic planning estimate. Famous public examples include the move of the German federal government from Bonn to Berlin and the police information-system project INPOL-neu, both of which ran well over their planned budgets and timelines. Political incentives and incomplete information contribute their own share to the gap, but those are human mechanisms — there is also a mathematical mechanism that bites even when everyone is honest and well-informed, and that mechanism is what this section is about.

The remedy is randomization — replace each deterministic quantity xx with a random variable XX whose distribution captures the spread of plausible realizations. Execution times tit_i become TiT_i, costs cjc_j become CjC_j, and the apparatus of expected values and distributions becomes available. The scheduling pipeline of the previous sections still runs; it just operates on expectations instead of fixed numbers. And one single inequality already explains a large chunk of the underestimation phenomenon.

Let A1,,AnA_1, \dots, A_n be subprojects with individual costs (or execution times) CjC_j — random variables in the stochastic case, with deterministic constants cjc_j as a special case. Then

maxjE(Cj)    E ⁣(maxjCj).\max_j \mathbb{E}(C_j) \;\le\; \mathbb{E}\!\left(\max_j C_j\right).

In words: the maximum of the expectations is no larger than the expectation of the maximum. Max and expectation cannot be exchanged in general.

Reading the two sides side by side:

  • The left, maxjE(Cj)\max_j \mathbb{E}(C_j), is what a deterministic planner produces — replace each random CjC_j by its expected value E(Cj)\mathbb{E}(C_j), then compute the bottleneck across subprojects.
  • The right, E(maxjCj)\mathbb{E}(\max_j C_j), is the expected value of the actual project bottleneck — the average, over many possible runs, of the worst subproject in each run.

The theorem says the second is always at least the first, so the deterministic estimate systematically underestimates the stochastic reality. With fixed cjc_j, the two sides coincide trivially: maxjcj=maxjcj\max_j c_j = \max_j c_j. The moment the CjC_j carry any spread, the right side starts pulling away — and across many parallel subprojects, that spread compounds rather than averaging out. A small worked example makes the gap concrete.

A worked example

Take the simplest possible randomization: nn independent subprojects A1,,AnA_1, \dots, A_n — picture nn parallel pieces of work that all feed into one common deadline — each with processing time XjX_j drawn independently from the uniform distribution on [0,1][0, 1].

Read the interval [0,1][0, 1] as a normalized time scale: t=0t = 0 means no time at all, t=1t = 1 means the longest a subproject could possibly take (its hard upper bound), and any intermediate tt is a fraction of that worst case. The actual units (seconds, days, weeks) don’t matter for the punchline — the result is about how the worst-case fraction behaves as the number of parallel subprojects grows.

“Uniform on [0,1][0, 1]” then pins down the distribution: the density ff is flat at 11 on the interval (zero outside), the CDF is the linear p(Xjt)=tp(X_j \le t) = t, and the expected value sits at the midpoint, E(Xj)=0.5\mathbb{E}(X_j) = 0.5.

Why the density is flat at 11, and what the CDF really is

The density f(t)f(t) is the continuous analog of a histogram — picture binning a large sample of XjX_j values and letting the bins shrink: the bar heights settle into the curve f(t)f(t), and the area of any slice under ff equals the probability of XjX_j landing in that slice. For the uniform case on [0,1][0, 1], that shape is a flat horizontal segment at y=1y = 1 stretching from t=0t = 0 to t=1t = 1, and zero everywhere outside the interval.

The "11" is not arbitrary: the total area under any density has to equal 11 — every realization lands somewhere — and a flat segment of width 11 must be at height 11 to enclose unit area. Equal height across the interval is the picture-form of “every value is equally likely”.

The cumulative distribution function (CDF) accumulates that area from the left: p(Xjt)p(X_j \le t) is the area beneath ff between 00 and tt. For the flat-11 density, the running area grows linearly, so p(Xjt)=tp(X_j \le t) = t, climbing from 00 at t=0t = 0 to 11 at t=1t = 1.

The deterministic baseline. Treat each XjX_j as a fixed number equal to its expected value 0.50.5. Then the deterministic planner reports

maxjE(Xj)  =  0.5,\max_j \mathbb{E}(X_j) \;=\; 0.5,

a single number that doesn’t even see nn as a relevant variable.

The stochastic reality. Define Y:=maxjXjY := \max_j X_j, the actual worst finishing time across the nn subprojects. Because the XjX_j are independent, the CDF of YY factorizes:

p(Yt)  =  p ⁣(maxjXjt)  =  j=1np(Xjt)  =  tn.p(Y \le t) \;=\; p\!\left(\max_j X_j \le t\right) \;=\; \prod_{j=1}^n p(X_j \le t) \;=\; t^n.

In short: maxjXj\max_j X_j is just the largest of the XjX_j‘s (the symbol jj is a dummy index, as in a sum) — and “the largest one is t\le t” is the same event as “every XjtX_j \le t”, a logical and. Independence turns the and into a product, each factor is just tt, and the product is tnt^n.

The chain of equalities, walked link by link

Each link carries a different idea — worth pulling apart one step at a time.

  1. p(Yt)=p(maxjXjt)p(Y \le t) = p(\max_j X_j \le t). Pure substitution. We defined Y:=maxjXjY := \max_j X_j a sentence ago, so the two expressions name the same event and therefore have the same probability.

  2. p(maxjXjt)=p(every Xjt)p(\max_j X_j \le t) = p(\text{every } X_j \le t). The key logical move. The maximum of a list of numbers is at most tt exactly when every one of them is at most tt — if even a single XjX_j exceeded tt, the max (being at least that XjX_j) would too; conversely, if all of them stay below tt, then the max — itself one of the XjX_j — stays below tt as well. “The max is t\le t” and “all of them are t\le t” describe the same event, and same event implies same probability.

  3. p(every Xjt)=j=1np(Xjt)p(\text{every } X_j \le t) = \prod_{j=1}^n p(X_j \le t). Independence. The defining property of independent random variables is exactly that the probability of a joint event factors into the product of the individual probabilities. Intuitively, knowing whether subproject 11 finishes by tt tells you nothing about whether subproject 22 does, so the events don’t interact and the probabilities just multiply — much like the chance of two fair coins both landing heads is 1212=14\tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4}, not 12\tfrac{1}{2}.

  4. j=1np(Xjt)=tn\prod_{j=1}^n p(X_j \le t) = t^n. Substitute the uniform CDF from above — each p(Xjt)=tp(X_j \le t) = t — and the product of nn copies of tt is tnt^n.

A quick reality check on the formula: with n=2n = 2 and t=0.5t = 0.5, p(Y0.5)=0.52=0.25p(Y \le 0.5) = 0.5^2 = 0.25 — even though each subproject has a 50%50\% chance of finishing by time 0.50.5 on its own, the joint event “both finished by then” has only a 25%25\% chance. With n=10n = 10, that drops to 0.5100.0010.5^{10} \approx 0.001. The only way for all nn subprojects to stay below a threshold t<1t < 1 is an increasingly unlikely conjunction of “every single one of them got lucky” — and that’s the structural reason the worst subproject’s expected finish time creeps toward 11 as nn grows.

p(Yt)=tnp(Y \le t) = t^n

What does this formula actually say about YY as nn grows? For any fixed threshold t<1t < 1, the probability p(Yt)=tnp(Y \le t) = t^n shrinks to zero — vanishingly small chance YY lands at or below tt. Yet p(Y1)=1n=1p(Y \le 1) = 1^n = 1 always, since YY has to land somewhere in [0,1][0, 1]. The only way to reconcile those two facts is for YY to crowd up against t=1t = 1, the worst case: with many subprojects, the worst finishing time is almost certainly just below 11. On the CDF curve this reads as the flat-near-zero region growing wider with nn and the rise to 11 getting crammed into an ever-thinner sliver next to the right edge. Pushing that observation through to the expectation:

E(Y)  =  01tntn1dt  =  nn+1  n  1.\mathbb{E}(Y) \;=\; \int_0^1 t \cdot n \cdot t^{n-1} \, dt \;=\; \frac{n}{n+1} \;\xrightarrow{n \to \infty}\; 1.
The expected-value integral, step by step

For a non-negative continuous random variable YY supported on [0,1][0, 1], the expected value is

E(Y)  =  01tfY(t)dt,\mathbb{E}(Y) \;=\; \int_0^1 t \, f_Y(t) \, dt,

where fYf_Y is the density — the derivative of the CDF FY(t)=p(Yt)F_Y(t) = p(Y \le t). From FY(t)=tnF_Y(t) = t^n,

fY(t)  =  ddttn  =  ntn1for t[0,1].f_Y(t) \;=\; \frac{d}{dt}\, t^n \;=\; n \, t^{n-1} \qquad \text{for } t \in [0, 1].

Substituting and integrating:

E(Y)  =  01tntn1dt  =  n01tndt  =  n[tn+1n+1]01  =  nn+1.\mathbb{E}(Y) \;=\; \int_0^1 t \cdot n \, t^{n-1} \, dt \;=\; n \int_0^1 t^n \, dt \;=\; n \cdot \left[\frac{t^{n+1}}{n+1}\right]_0^1 \;=\; \frac{n}{n+1}.

For the limit, write nn+1=11n+1\frac{n}{n+1} = 1 - \frac{1}{n+1} — the subtracted term shrinks to 00 as nn \to \infty.

The two answers side by side.

maxjE(Xj)  =  0.5deterministic estimatevs.E(maxjXj)  =  nn+1    1stochastic reality, n.\underbrace{\max_j \mathbb{E}(X_j) \;=\; 0.5}_{\text{deterministic estimate}} \quad \text{vs.} \quad \underbrace{\mathbb{E}(\max_j X_j) \;=\; \frac{n}{n+1} \;\to\; 1}_{\text{stochastic reality, } n \to \infty}.

The qualitative gap is striking. The deterministic estimate sits at 0.50.5 regardless of nn; the stochastic one climbs toward the worst case t=1t = 1 as nn grows, because a project that could take up to time 11 effectively does once enough parallel subprojects are stacked. The expected completion drifts toward the imaginable worst case, and there is no escape from this drift — exactly the structural underestimation Fulkerson’s theorem warned about in the abstract, now made unmistakable by a concrete distribution. The deterministic model is blind to that drift; randomization is the lens that lets the planner see it coming.

Hypergraphs

The chapter has, so far, leaned on directed graphs — vertices for tasks, edges for precedence or machine-sharing. Every edge connected exactly two vertices, and every machine handled at most one task at a time. As a closing outlook, relax both restrictions, and the natural setting becomes a hypergraph.

The new idea is small but powerful. In a regular graph an edge is always a pair of vertices. In a hypergraph the edge is generalized to a hyperedge — a subset of vertices, of any cardinality. A hyperedge can group two vertices (recovering an ordinary edge), three vertices, twenty, or any number. For instance, with V={1,2,3,4,5}V = \{1, 2, 3, 4, 5\}, a hypergraph might carry hyperedges {1,2}\{1, 2\}, {2,3,5}\{2, 3, 5\}, {4}\{4\}, and {1,3,4,5}\{1, 3, 4, 5\} — four hyperedges of cardinalities 2,3,1,42, 3, 1, 4. Picture-wise: in a graph you draw lines between pairs of dots; in a hypergraph you draw closed blobs enclosing any number of dots at once.

Formally, a hypergraph is a pair

HG=(V,HE),HEP(V),HG = (V, HE), \qquad HE \subseteq \mathcal{P}(V),

where VV is a vertex set and P(V)\mathcal{P}(V) is the power set of VV (the set of all subsets of VV, also written 2V2^V). So HEP(V)HE \subseteq \mathcal{P}(V) just says “each hyperedge is some subset of vertices, and we pick a collection of them”. When every hyperedge has exactly two vertices, the hypergraph reduces to an ordinary graph — graphs are the cardinality-22 special case.

For scheduling, the re-interpretation is the punchline. Vertices = tasks, as before. Hyperedges = machines: each machine corresponds to the hyperedge containing all the tasks that need to run on it — a single set summarizing “these are the tasks competing for this machine”. A task that needs several machines appears as a vertex in several hyperedges. The new degree of freedom over the job-shop model is that a machine can run several tasks simultaneously, up to some capacity, instead of one at a time.

The natural scheduling problem on this object is vertex coloring, with one mapping at its core:

  • Color = time slot. Two vertices sharing a color means two tasks running in the same time slot — simultaneously.
  • Hyperedge = machine. Two vertices in the same hyperedge means two tasks competing for the same machine.

So same color and same hyperedge means two tasks trying to run on the same machine at the same time — and the cap on how many of those a hyperedge can hold is exactly the machine’s capacity. Vertices in different hyperedges (tasks on different machines) are free to share a color; they don’t compete. The optimization goal is fewest colors possible — fewer colors mean fewer time slots mean a shorter overall schedule — which puts pressure to pack as many same-colored vertices into each hyperedge as capacity permits. (Forcing every vertex to take a distinct color would just mean every task gets its own time slot in serial, which is the worst schedule, not the best.)

The hypergraph machinery is needed precisely because capacity is now allowed to exceed 11 — the job-shop model’s “one task per machine at a time” rule is being relaxed. With capacity 11 and every hyperedge of size 22, the per-hyperedge constraint “at most one same-colored vertex” forces every edge’s two endpoints to take different colors — exactly the classical graph-coloring problem (paint the vertices of a graph so no two adjacent ones share a color, using the fewest colors possible), itself NP-hard in general. Allowing larger hyperedges and capacities above 11 piles more structure on top, and the closing message is the same one the chapter has been making throughout: a richer modeling target lands in a harder optimization regime, and heuristics carry the practical load.

That’s where the scheduling chapter stops. The arc traced — DAG-based process scheduling, job-shop with disjunctive choices, randomization for unforeseen variation, and hypergraphs for capacity-sharing machines — sketches one lesson at four scales: each modeling class trades expressiveness against tractability, and picking the right class is half the discrete-modeling problem.