Link Search Menu Expand Document

Puppet - Graph Algorithms [Ruby]

Project home page
#graph #algorithm

Help Code Catalog grow: suggest your favorite code or weight in on open article proposals.

Table of contents
  1. Context
  2. Problem
  3. Overview
  4. Implementation details
    1. Graph definition
    2. Adding and removing vertices and edges
    3. Dependency closure
    4. Tarjan’s algorithm and finding cycles
    5. Traversing the graph
  5. Testing
  6. Related
  7. References
  8. Copyright notice


Puppet, an automated administrative engine for your Linux, Unix, and Windows systems, performs administrative tasks (such as adding users, installing packages, and updating server configurations) based on a centralized specification.

Puppet deals with resources: files, users, services, packages and so on. Resources are defined in manifest files.

By default, Puppet applies unrelated resources in the order in which they’re written in the manifest. If a resource must be applied before or after some other resource, declare a relationship between them to show that their order isn’t coincidental. You can also make changes in one resource cause a refresh of some other resource. See the Relationships and ordering page for more information.


In what order should Puppet apply resources to respect the relationships? How can it detect and report circular dependencies between resources? How does it know everything a resource depends on, directly or indirectly, or everything that depends on it?


Puppet represents relationships between resources as a graph with vertices being resources and edges being relationships. Then the problems above can be solved with standard graph algorithms.

In this article we review Puppet’s implementation of generic graph algorithms and don’t touch the application logic on top of them.

Implementation details

Graph definition

Generic graph algorithms are implemented in SimpleGraph, which is extended by more app-specific RelationshipGraph and Catalog.

The graph stores:

  • @in_to - a map from vertices to incoming edges.
  • @out_from - a map from vertices to outgoing edges.
  • @upstream_from - a map from vertices to vertices it’s reachable from.
  • @downstream_from - a map from vertices to reachable vertices.
  # All public methods of this class must maintain (assume ^ ensure) the following invariants, where "=~=" means
  # equiv. up to order:
  #   @in_to.keys =~= @out_to.keys =~= all vertices
  #   @in_to.values.collect { |x| x.values }.flatten =~= @out_from.values.collect { |x| x.values }.flatten =~= all edges
  #   @in_to[v1][v2] =~= @out_from[v2][v1] =~= all edges from v1 to v2
  #   @in_to   [v].keys =~= vertices with edges leading to   v
  #   @out_from[v].keys =~= vertices with edges leading from v
  #   no operation may shed reference loops (for gc)
  #   recursive operation must scale with the depth of the spanning trees, or better (e.g. no recursion over the set
  #       of all vertices, etc.)
  # This class is intended to be used with DAGs.  However, if the
  # graph has a cycle, it will not cause non-termination of any of the
  # algorithms.
  def initialize
    @in_to = {}
    @out_from = {}
    @upstream_from = {}
    @downstream_from = {}

Adding and removing vertices and edges

Adding a vertex:

  # Add a new vertex to the graph.
  def add_vertex(vertex)
    @in_to[vertex]    ||= {}
    @out_from[vertex] ||= {}

Removing a vertex. Besides removing the vertex from the maps, it removes all adjacent edges and also clears reachable vertices - more about this below.

  # Remove a vertex from the graph.
  def remove_vertex!(v)
    return unless vertex?(v)
    (@in_to[v].values+@out_from[v].values).flatten.each { |e| remove_edge!(e) }

Adding an edge. It adds both adjacent vertices, resets reachable vertices and updates the edge maps. The last 4 lines may look like dead code, but they are not - note the assignments to @in_to and @out_to in the right-hand sides.

  # Add a new edge.  The graph user has to create the edge instance,
  # since they have to specify what kind of edge it is.
  def add_edge(e,*a)
    return add_relationship(e,*a) unless a.empty?
    e = Puppet::Relationship.from_data_hash(e) if e.is_a?(Hash)
    # Avoid multiple lookups here. This code is performance critical
    arr = (@in_to[][e.source] ||= [])
    arr << e unless arr.include?(e)
    arr = (@out_from[e.source][] ||= [])
    arr << e unless arr.include?(e)

Removing an edge:

  # Remove an edge from our graph.
  def remove_edge!(e)
    if edge?(e.source,
      @in_to   [].delete e.source if (@in_to   [][e.source] -= [e]).empty?
      @out_from[e.source].delete if (@out_from[e.source][] -= [e]).empty?

Dependency closure

Finding all dependencies of a resource requires finding all vertices the resource is reachable from.

The comment appears to be wrong - it’s the definition of a dependent, not a dependency - but the code is correct.

  # Which resources depend upon the given resource.
  def dependencies(resource)
    vertex?(resource) ? upstream_from_vertex(resource).keys : []

Simple recursive algorithm to do it. The results are caches in @upstream_from, which is reset whenever an edge is added or removed anywhere in the graph.

  def upstream_from_vertex(v)
    return @upstream_from[v] if @upstream_from[v]
    result = @upstream_from[v] = {}
    @in_to[v].keys.each do |node|
      result[node] = 1

Dependents are found the same way.

Tarjan’s algorithm and finding cycles

Implementation of Tarjan’s strongly connected components algorithm. The implementation is very close to the description on Wikipedia. The pseudo-code presented on Wikipedia stores values, like index or lowlink, right on the node. Here it is not possible, so, as noted in the comments, the state is stored in a separate map.

  # This is a simple implementation of Tarjan's algorithm to find strongly
  # connected components in the graph; this is a fairly ugly implementation,
  # because I can't just decorate the vertices themselves.
  # This method has an unhealthy relationship with the find_cycles_in_graph
  # method below, which contains the knowledge of how the state object is
  # maintained.
  def tarjan(root, s)
    # initialize the recursion stack we use to work around the nasty lack of a
    # decent Ruby stack.
    recur = [{ :node => root }]

    while not recur.empty? do
      frame = recur.last
      vertex = frame[:node]

      case frame[:step]
      when nil then
        s[:index][vertex]   = s[:number]
        s[:lowlink][vertex] = s[:number]
        s[:number]          = s[:number] + 1

        s[:seen][vertex] = true

        frame[:children] = adjacent(vertex)
        frame[:step]     = :children

      when :children then
        if frame[:children].length > 0 then
          child = frame[:children].shift
          if ! s[:index][child] then
            # Never seen, need to recurse.
            frame[:step] = :after_recursion
            frame[:child] = child
            recur.push({ :node => child })
          elsif s[:seen][child] then
            s[:lowlink][vertex] = [s[:lowlink][vertex], s[:index][child]].min
          if s[:lowlink][vertex] == s[:index][vertex] then
            this_scc = []
            loop do
              top = s[:stack].pop
              s[:seen][top] = false
              this_scc << top
              break if top == vertex
            s[:scc] << this_scc
          recur.pop               # done with this node, finally.

      when :after_recursion then
        s[:lowlink][vertex] = [s[:lowlink][vertex], s[:lowlink][frame[:child]]].min
        frame[:step] = :children

        fail "#{frame[:step]} is an unknown step"

The strongly connected components are used to find cycles in the graph - any strongly connected component with more than one vertex contains a cycle. Cycles are also possible if a vertex declares a dependency on itself. If there are no circular dependencies, then the graph is a DAG (Directed Acyclic Graph).

  # Find all cycles in the graph by detecting all the strongly connected
  # components, then eliminating everything with a size of one as
  # uninteresting - which it is, because it can't be a cycle. :)
  # This has an unhealthy relationship with the 'tarjan' method above, which
  # it uses to implement the detection of strongly connected components.
  def find_cycles_in_graph
    state = {
      :number => 0, :index => {}, :lowlink => {}, :scc => [],
      :stack => [], :seen => {}

    # we usually have a disconnected graph, must walk all possible roots
    vertices.each do |vertex|
      if ! state[:index][vertex] then
        tarjan vertex, state

    # To provide consistent results to the user, given that a hash is never
    # assured to return the same order, and given our graph processing is
    # based on hash tables, we need to sort the cycles internally, as well as
    # the set of cycles.
    # Given we are in a failure state here, any extra cost is more or less
    # irrelevant compared to the cost of a fix - which is on a human
    # time-scale.
    state[:scc].select do |component|
      multi_vertex_component?(component) || single_vertex_referring_to_self?(component) do |component|

Traversing the graph

Traversing a graph from source. Passing direction allows walking the graph both “up” and “down”. The comment appears to be wrong - it’s a depth-first search (DFS), not breadth-first (BFS). There’s an easy way to distinguish non-recursive DFS from BFS: DFS uses stacks (last-in-first-out) and BFS uses queues (first-in-first-out).

  # Just walk the tree and pass each edge.
  def walk(source, direction)
    # Use an iterative, breadth-first traversal of the graph. One could do
    # this recursively, but Ruby's slow function calls and even slower
    # recursion make the shorter, recursive algorithm cost-prohibitive.
    stack = [source]
    seen =
    until stack.empty?
      node = stack.shift
      next if seen.member? node
      connected = adjacent(node, :direction => direction)
      connected.each do |target|
        yield node, target
      seen << node


The test suite is made up of a large number of very short and very clear tests. E.g. one of the tests for finding cycles:

    it "should report multi-vertex loops" do
      add_edges :a => :b, :b => :c, :c => :a
      expect(Puppet).to receive(:err).with(/Found 1 dependency cycle:\n\(Notify\[a\] => Notify\[b\] => Notify\[c\] => Notify\[a\]\)/)
      cycle = @graph.report_cycles_in_graph.first
      expect_cycle_to_include(cycle, :a, :b, :c)

See our article about graph algorithms in Terraform. Terraform is another open-source infrastructure management tool, albeit its use cases are quite different than Puppet’s. It implements many of the same ideas for similar purposes, but in Go.


Puppet is licensed under the Apache-2.0 License.

Copyright (c) 2011 Puppet Inc.