Directed Acyclic Graph
Definition
A directed graph G is a directed acyclic graph (DAG) if there is no directed cycle in G.
- A vertex u is a source if it has no in-coming edges.
- A vertex u is a sink if it has no out-going edges.
Properties
- Every DAG G has at least one source and at least one sink
- G is a DAG if and only if $G^{rev}$ is a DAG
- G is a DAG if and only if each node is in its own strongly connected component
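As a small illustration of the source and sink definitions, the sketch below counts in-degrees and out-degrees. The function name `sources_and_sinks`, the edge-list input format, and the example graph are assumptions made for this illustration only.

```python
def sources_and_sinks(n, edges):
    """Return (sources, sinks) of a directed graph on vertices 0..n-1."""
    indeg = [0] * n
    outdeg = [0] * n
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    sources = [u for u in range(n) if indeg[u] == 0]   # no in-coming edges
    sinks = [u for u in range(n) if outdeg[u] == 0]    # no out-going edges
    return sources, sinks


# Hypothetical example DAG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
reversed_edges = [(v, u) for u, v in edges]
print(sources_and_sinks(4, edges))           # ([0], [3])
print(sources_and_sinks(4, reversed_edges))  # ([3], [0])
```

Reversing every edge swaps the roles of sources and sinks, which is consistent with G being a DAG exactly when $G^{rev}$ is.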
Topological Ordering/Sorting
Definition
A topological ordering/topological sorting of G = (V, E) is an ordering $<$ on V such that if $(u \to v) \in E$ then $u < v$.
Informal definition: One can order the vertices of the graph along a line (say the x-axis) such that all edges go from left to right.
- A DAG G may have many different topological sorts
- A directed graph G can be topologically ordered if G is a DAG
- A directed graph G is a DAG if G can be topologically ordered
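One constructive way to see the equivalence between DAGs and topological orderings is to repeatedly remove a source, which is the idea usually called Kahn's algorithm. The sketch below is a minimal version under that assumption; the name `topological_order` and the vertex numbering 0..n-1 are illustrative choices, not part of the lecture.

```python
from collections import deque


def topological_order(n, edges):
    """Return a topological order of vertices 0..n-1, or None if G has a cycle."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1

    queue = deque(u for u in range(n) if indeg[u] == 0)  # current sources
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:          # "remove" u by discounting its out-edges
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    # If some vertex was never removed, every remaining vertex has an
    # in-coming edge, so the remaining graph contains a cycle.
    return order if len(order) == n else None
```

On the graph 0 → 1, 0 → 2, 1 → 3, 2 → 3 this returns `[0, 1, 2, 3]`; on the 3-cycle 0 → 1 → 2 → 0 it returns `None`.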
Depth First Search
DFS with pre-post numbering in directed graphs
The Pre-visit number indicates when the node enters the DFS recursion stack, and the Post-visit number indicates when the node exits the DFS recursion stack. Pre and Post numbers can be used to determine whether a particular node is in the sub-tree of another node.
DFS(G)
    Mark all nodes u as unvisited
    T is set to ∅
    time = 0
    while there is an unvisited node u do
        DFS(u)
    Output T

DFS(u)
    Mark u as visited
    pre(u) = ++time
    for each edge (u, v) in Out(u) do
        if v is not visited
            add edge (u, v) to T
            DFS(v)
    post(u) = ++time
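The pseudocode above can be sketched in Python roughly as follows. The adjacency-list representation, the function name `dfs_with_times`, and the choice to try start vertices in increasing index order are assumptions of this sketch. It uses recursion, so very deep graphs may need an iterative variant or a higher recursion limit.

```python
def dfs_with_times(n, adj):
    """DFS over vertices 0..n-1; returns (pre, post, tree_edges)."""
    visited = [False] * n
    pre = [0] * n
    post = [0] * n
    tree_edges = []            # the edge set T from the pseudocode
    time = 0

    def visit(u):
        nonlocal time
        visited[u] = True
        time += 1
        pre[u] = time          # u enters the recursion stack
        for v in adj[u]:
            if not visited[v]:
                tree_edges.append((u, v))
                visit(v)
        time += 1
        post[u] = time         # u leaves the recursion stack

    for u in range(n):
        if not visited[u]:
            visit(u)
    return pre, post, tree_edges


# Hypothetical example: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3
pre, post, tree = dfs_with_times(4, [[1, 2], [3], [3], []])
print(pre)   # [1, 2, 6, 3]
print(post)  # [8, 5, 7, 4]
print(tree)  # [(0, 1), (1, 3), (0, 2)]
```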
Properties
- Node u is active in time interval [pre(u), post(u)]
- DFS(G) takes O(m + n) time
- Edges added form a branching: a forest of out-trees
- For any two nodes u and v, the two intervals [pre(u), post(u)] and [pre(v), post(v)] are disjoint or one is contained in the other
- To decide whether u lies in the sub-tree of v, compare their pre and post numbers: u is a proper descendant of v if and only if $pre(v) < pre(u)$ and $post(u) < post(v)$.
- Output of DFS(G) depends on the order in which vertices are considered
- If u is the first vertex considered by DFS(G) then DFS(u) outputs a directed out-tree T rooted at u and a vertex v is in T if and only if $v ∈ rch(u)$
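The interval test from the properties above is a constant-time check once the pre and post numbers are known. In the sketch below, the helper name `in_subtree` is hypothetical, and the `pre`/`post` lists are assumed to come from a DFS of the graph 0 → 1, 0 → 2, 1 → 3, 2 → 3 that starts at vertex 0 and tries neighbours in the listed order. The comparison is non-strict so that a vertex counts as being in its own sub-tree.

```python
def in_subtree(pre, post, u, v):
    """True if u lies in the DFS sub-tree rooted at v (u == v counts)."""
    return pre[v] <= pre[u] and post[u] <= post[v]


# Assumed pre/post numbers for the DFS described above.
pre = [1, 2, 6, 3]
post = [8, 5, 7, 4]

print(in_subtree(pre, post, 3, 1))  # True: interval [3, 4] is inside [2, 5]
print(in_subtree(pre, post, 3, 2))  # False: [3, 4] and [6, 7] are disjoint
print(in_subtree(pre, post, 1, 0))  # True: the root's interval contains all
```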
Example
Edges of G can be classified with respect to the DFS tree T as:
- Tree edges that belong to T
- A forward edge is a non-tree edge (x, y) such that y is a descendant of x, i.e., $pre(x) < pre(y) < post(y) < post(x)$.
- A backward edge is a non-tree edge (x, y) such that y is an ancestor of x, i.e., $pre(y) < pre(x) < post(x) < post(y)$.
- A cross edge is a non-tree edge (x, y) such that neither x nor y is an ancestor of the other, i.e., the intervals $[pre(x), post(x)]$ and $[pre(y), post(y)]$ are disjoint.
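The classification can be computed directly from the pre and post numbers. The sketch below assumes a graph without parallel edges, given as adjacency lists; the function name `classify_edges` and the example graph are illustrative. Self-loops are reported as backward edges here, which is a choice of this sketch.

```python
def classify_edges(n, adj):
    """Label every edge (x, y) as 'tree', 'forward', 'backward', or 'cross'."""
    pre = [0] * n              # pre[u] == 0 means "not yet visited"
    post = [0] * n
    tree = set()
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        pre[u] = time
        for v in adj[u]:
            if pre[v] == 0:
                tree.add((u, v))
                visit(v)
        time += 1
        post[u] = time

    for u in range(n):
        if pre[u] == 0:
            visit(u)

    kinds = {}
    for x in range(n):
        for y in adj[x]:
            if (x, y) in tree:
                kinds[(x, y)] = "tree"
            elif pre[x] < pre[y] and post[y] < post[x]:
                kinds[(x, y)] = "forward"    # y is a descendant of x
            elif pre[y] <= pre[x] and post[x] <= post[y]:
                kinds[(x, y)] = "backward"   # y is an ancestor of x
            else:
                kinds[(x, y)] = "cross"      # disjoint intervals
    return kinds


# Hypothetical example: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 1
print(classify_edges(4, [[1, 2], [2], [0], [1]]))
```

For this example the DFS from vertex 0 makes (0, 1) and (1, 2) tree edges, (0, 2) a forward edge, (2, 0) a backward edge, and (3, 1) a cross edge.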
Cycle detection in directed graphs using topological sorting
Given a directed graph G, compute a topological sort if G is a DAG. If that fails because G is not a DAG, output a cycle C.
The algorithm will be as follows:
- Compute DFS(G)
- If there is a back edge e = (v, u) then G is not a DAG. Output cycle C formed by path from u to v in T plus edge (v, u).
- Otherwise output nodes in decreasing post-visit order.
The above algorithm runs in $O(n + m)$ time.
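A possible Python rendering of this algorithm is sketched below. It detects a back edge with the common white/gray/black colouring rather than by comparing stored pre/post numbers, and it uses the variable names `u`, `v` for the back edge in the opposite roles from the text; both are choices of this sketch, as are the function name and return format.

```python
def topo_sort_or_cycle(n, adj):
    """Return ('order', vertices) for a DAG, or ('cycle', vertices) otherwise."""
    WHITE, GRAY, BLACK = 0, 1, 2     # unvisited / on the stack / finished
    color = [WHITE] * n
    parent = [None] * n
    finished = []                    # vertices in increasing post order

    def visit(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY:     # back edge (u, v): v is an ancestor of u
                cycle = [u]
                while cycle[-1] != v:
                    cycle.append(parent[cycle[-1]])
                cycle.reverse()      # tree path v ... u; edge (u, v) closes it
                return cycle
            if color[v] == WHITE:
                parent[v] = u
                found = visit(v)
                if found:
                    return found
        color[u] = BLACK
        finished.append(u)
        return None

    for s in range(n):
        if color[s] == WHITE:
            found = visit(s)
            if found:
                return "cycle", found
    return "order", finished[::-1]   # decreasing post-visit order


print(topo_sort_or_cycle(4, [[1, 2], [3], [3], []]))  # ('order', [0, 2, 1, 3])
print(topo_sort_or_cycle(3, [[1], [2], [0]]))         # ('cycle', [0, 1, 2])
```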
Graph of strongly connected components
Let $S_1, S_2, \ldots, S_k$ be the strongly connected components (SCCs) of G. The graph of SCCs, $G^{SCC}$, is created by collapsing every strongly connected component into a single vertex.
- Vertices are $S_1, S_2, \ldots, S_k$
- There is an edge $(S_i, S_j)$ with $i \neq j$ if there is some $u \in S_i$ and $v \in S_j$ such that (u, v) is an edge in G
For a directed graph G, its meta-graph $G^{SCC}$ is a DAG.
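Given a labelling of vertices by component, building the meta-graph is a single pass over the edges. In the sketch below, the function name `condensation` and the `comp` list (where `comp[u]` is the index of the SCC containing u) are assumptions; the labelling is supplied by hand for the example rather than computed.

```python
def condensation(edges, comp):
    """Return the edge set of G^SCC, given comp[u] = index of u's SCC."""
    meta_edges = set()
    for u, v in edges:
        if comp[u] != comp[v]:       # edges inside one SCC are collapsed away
            meta_edges.add((comp[u], comp[v]))
    return meta_edges


# Hypothetical example: SCCs {0, 1, 2} and {3, 4}, joined by the edge 2 -> 3.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3)]
comp = [0, 0, 0, 1, 1]
print(condensation(edges, comp))  # {(0, 1)}
```

Using a set merges parallel meta-edges, so each pair of components contributes at most one edge in each direction.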
The straightforward algorithm (discussed in Lecture 15) to find all SCCs of a given directed graph has a running time of $O(n(n + m))$.
The linear-time algorithm for SCCs is as follows:
- Let u be a vertex in a sink SCC of $G^{SCC}$
- Do DFS(u) to compute SCC(u)
- Remove SCC(u) and repeat
Let v be the vertex with the maximum post number in $DFS(G^{rev})$. Then v lies in an SCC S that is a sink of $G^{SCC}$, which gives a way to find a vertex in a sink SCC of $G^{SCC}$ for the linear-time algorithm. Let $G_1 = G^{rev}$ (written G1 in the pseudocode).
do DFS(G1) and output vertices in decreasing post order
Mark all nodes as unvisited
for each u in the computed order do
    if u is not visited then
        DFS(u) in G
        Let S1 be the nodes reached by u
        Output S1 as a strongly connected component
        Remove S1 from G
The above algorithm runs in time $O(m + n)$ and correctly outputs all the SCCs of G.
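The two passes can be sketched in Python as below. Instead of physically removing each component from G, the sketch keeps the vertices marked as visited so later searches skip them; that substitution, the function name, and the edge-list input are assumptions of this illustration. It is recursive, so deep graphs may need an iterative DFS.

```python
def strongly_connected_components(n, edges):
    """Return the SCCs of a graph on vertices 0..n-1, sink components first."""
    adj = [[] for _ in range(n)]
    radj = [[] for _ in range(n)]    # adjacency lists of G^rev
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    # Pass 1: DFS on G^rev, recording vertices in increasing post order.
    visited = [False] * n
    finish = []

    def visit_rev(u):
        visited[u] = True
        for v in radj[u]:
            if not visited[v]:
                visit_rev(v)
        finish.append(u)

    for u in range(n):
        if not visited[u]:
            visit_rev(u)

    # Pass 2: DFS on G in decreasing post order. Leaving vertices marked
    # as visited stands in for "Remove S1 from G".
    visited = [False] * n

    def collect(u, component):
        visited[u] = True
        component.append(u)
        for v in adj[u]:
            if not visited[v]:
                collect(v, component)

    components = []
    for u in reversed(finish):
        if not visited[u]:
            component = []
            collect(u, component)
            components.append(component)
    return components


edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3)]
print(strongly_connected_components(5, edges))  # [[3, 4], [0, 1, 2]]
```

In the example, the sink component {3, 4} is output first, matching the "find a sink SCC, remove it, repeat" structure of the algorithm.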
Relevant LeetCode Practice (by Tristan Yang)
- LeetCode 210 — Course Schedule II (Medium)
- Relevance: Requires producing an actual topological order of a directed acyclic graph (or detecting a cycle if one exists).
- ECE 374 Process: Build the directed graph from course prerequisites. Then either: use Kahn’s algorithm (maintain an in-degree=0 queue and remove nodes one-by-one) or use DFS with 3-colors (marking nodes white/gray/black) and output nodes in decreasing post-order (which yields a topo sort). Runs in $O(n+m)$.
- Resource: CP-Algorithms notes on Topological Sorting (both Kahn’s and DFS methods).
- Takeaway: A graph is a DAG if and only if it has a topological order. In practice, using DFS post-order or Kahn’s BFS both produce a valid topo sort for DAGs.
- LeetCode 802 — Find Eventual Safe States (Medium)
- Relevance: Identifies nodes from which every path eventually ends at a terminal node, i.e. nodes that neither lie on a cycle nor can reach one. The approach is akin to reasoning on the SCC condensation graph.
- ECE 374 Process: Build the reverse graph (reverse all edges). Start with all nodes that have no outgoing edges in the original graph (these are terminal “safe” nodes). Perform a Kahn-like process: put all such sinks in a queue and remove them, decrementing the out-degree of their predecessors (in the original graph). Any predecessor that loses all outgoing edges becomes safe and is added to the queue. Nodes never marked safe lie on a cycle or can reach one.
- Resource: Descriptions of the reverse-graph + Kahn’s algorithm trick for detecting cycle-free nodes.
- Takeaway: Many queries about “eventually safe” nodes (those that cannot reach a cycle) reduce to iteratively peeling off sink nodes, which conceptually works on the DAG of strongly connected components.
- LeetCode 269 — Alien Dictionary (Hard)
- Relevance: A real-world application of topological sorting where nodes are characters and edges are precedence constraints derived from dictionary order. It deals with multiple valid orders and detecting inconsistencies (cycles).
- ECE 374 Process: Read the list of words. Compare each adjacent pair of words to find the first differing character, and create a directed edge from that character of the first word to that of the second word. (If no character differs and the first word is longer, so that a word appears before its own proper prefix, the ordering is invalid.) Then perform a topological sort on the graph of letters. If you detect a cycle, or if the result doesn’t include all letters that appeared, the dictionary order is invalid.
- Resource: Detailed LeetCode editorials (for correctly building the graph and handling edge cases).
- Takeaway: Modeling is critical: turn the problem’s implicit constraints into a directed graph. Then apply topo-sort to derive an order or find a contradiction (cycle).
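As a worked illustration of the modeling step, here is a minimal sketch of the Alien Dictionary process using Kahn's algorithm. The function name `alien_order`, the convention of returning an empty string for an invalid dictionary, and the alphabetical tie-breaking among available letters are assumptions of this sketch; any valid order would be acceptable.

```python
from collections import deque


def alien_order(words):
    """Return one valid letter order as a string, or "" if none exists."""
    adj = {c: set() for w in words for c in w}   # one node per letter seen
    indeg = {c: 0 for c in adj}

    for a, b in zip(words, words[1:]):           # adjacent word pairs
        for x, y in zip(a, b):
            if x != y:                           # first differing character
                if y not in adj[x]:
                    adj[x].add(y)
                    indeg[y] += 1
                break
        else:
            # No differing position: a longer word must not precede
            # its own prefix.
            if len(a) > len(b):
                return ""

    queue = deque(sorted(c for c in indeg if indeg[c] == 0))
    order = []
    while queue:
        c = queue.popleft()
        order.append(c)
        for d in sorted(adj[c]):
            indeg[d] -= 1
            if indeg[d] == 0:
                queue.append(d)

    # Letters left out of the order are stuck behind a cycle.
    return "".join(order) if len(order) == len(adj) else ""


print(alien_order(["wrt", "wrf", "er", "ett", "rftt"]))  # wertf
```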
Supplemental Problems
- LeetCode 1557 — Minimum Number of Vertices to Reach All Nodes
  In a DAG, the answer is simply all nodes with in-degree 0 (the sources), since every other node is reachable from some source. A straightforward in-degree analysis.
- LeetCode 2192 — All Ancestors of a Node in a DAG
  Compute for each node the set of nodes that can reach it (its ancestors). Can be solved via a topological order and dynamic programming (union of ancestor sets from each node’s predecessors).
- LeetCode 444 — Sequence Reconstruction
  Checks if a given sequence is the unique topological order of some directed acyclic graph. Use Kahn’s algorithm: the topological order is unique if and only if at each step there is exactly one available node (in-degree 0) to choose.
- LeetCode 2050 — Parallel Courses III
  Longest path in a DAG (with each node having a duration). Topologically sort the graph, then compute for each course the time to finish it as its own duration plus the maximum finish time of its prerequisites.
- LeetCode 2360 — Longest Cycle in a Graph
  Directed graph where each node has out-degree ≤ 1. Find the length of the longest directed cycle. Can be solved with DFS and cycle detection (timestamps or colors) or by finding strongly connected components. The out-degree constraint simplifies cycle structure.
- LeetCode 841 — Keys and Rooms
  A simple directed reachability problem: starting from room 0, use the keys to visit rooms (edges from a room to the rooms its keys unlock). Just do a DFS/BFS to see if all rooms can be reached, reinforcing basic graph traversal.
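To make one of these concrete, the longest-path idea behind Parallel Courses III can be sketched as a dynamic program over a topological order. The sketch assumes 0-indexed courses (the LeetCode statement uses 1-indexed ones) and an acyclic input; the function and parameter names are illustrative.

```python
from collections import deque


def minimum_time(n, prereqs, duration):
    """prereqs holds pairs (a, b): course a must finish before b starts."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for a, b in prereqs:
        adj[a].append(b)
        indeg[b] += 1

    finish = list(duration)      # finish[u] = earliest completion time of u
    queue = deque(u for u in range(n) if indeg[u] == 0)
    while queue:
        u = queue.popleft()      # finish[u] is final: all prerequisites done
        for v in adj[u]:
            finish[v] = max(finish[v], finish[u] + duration[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return max(finish, default=0)


print(minimum_time(3, [(0, 2), (1, 2)], [3, 2, 5]))  # 8
```

Course 2 can only start after both prerequisites finish at times 3 and 2, so it completes at 3 + 5 = 8.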
Additional Resources
- Textbooks
- Erickson, Jeff. Algorithms
- Skiena, Steven. The Algorithm Design Manual
- Chapter 7.8 - Depth-First Search
- Chapter 7.9 - Applications of Depth-First Search
- Chapter 7.10 - Depth-First Search on Directed Graphs
- Sedgewick, Robert and Wayne, Kevin. Algorithms (Fourth Edition)
- Chapter 4.2 - Directed Graphs
- Cormen, Thomas, et al. Introduction to Algorithms (Fourth Edition)
- Chapter 20.3 - Depth-First Search
- Chapter 20.4 - Topological Sort
- Sariel’s Lecture 16
- Jeff’s - Notes on Depth-First Search