2020-08-13
Graph coloring
In general, k-coloring a graph is NP-complete, so there are no effective algorithms to do it. Torbens book suggests a recursive heuristic method that could work alright for our problem. However, our problem is different from the usual register allocation problem, in that we aren't limited in the number of "registers" we can use. Instead, our allocations have sizes, and so the overall goal is both to minimize the number of allocations, but also the total size. We wish to minimize the number of allocations because each allocation takes time, and we want to minimize the total size because we wish to increase the chance that the final program fits within shared memory in a GPU workgroup. By order of prioritization, I think it's more important to minimize the total size, simply because shared memory is so much faster than global memory, and so every additional program that can fit in shared memory is a win.
However, before we think too much about how to minimize the total size of
allocations, let's try to do something stupid that works. The greedy coloring
algorithm looks easy to implement, doesn't handle spilling (which we don't
need), and should be good enough that we can implement the rest of the pass and
optimise programs such as psum
.
Greedy Coloring
So, let's try to implement greedy coloring.
Here's the algorithm, taken from Wikipedia:
1: def first_available(color_list): 2: """Return smallest non-negative integer not in the given list of colors.""" 3: color_set = set(color_list) 4: count = 0 5: while True: 6: if count not in color_set: 7: return count 8: count += 1 9: 10: def greedy_color(G, order): 11: """Find the greedy coloring of G in the given order. 12: The representation of G is assumed to be like https://www.python.org/doc/essays/graphs/ 13: in allowing neighbors of a node/vertex to be iterated over by "for w in G[node]". 14: The return value is a dictionary mapping vertices to their colors.""" 15: color = dict() 16: for node in order: 17: used_neighbour_colors = [color[nbr] for nbr in G[node] 18: if nbr in color] 19: color[node] = first_available(used_neighbour_colors) 20: return color
Alright, well that was actually fairly easy to port. Now that we have that, we can move on to the next bit we need for ReuseAllocations: A pass to collect all the different sizes that are allocated.
For reference, here's what the algorithm currently looks like, in broad terms:
- For each kernel \(k\) in the program:
- Find the interference graph \(g\).
- Obtain a graph coloring \(c\) of \(g\).
- Make a map of all the memory blocks and their sizes in the kernel, \(sizeMap\).
- For each color \(c_i\) in \(c\):
- Insert at the beginning of \(k\) an allocation of a block \(b_i\) with the size \(m\), where \(m\) is the max of all the sizes of memory blocks with the color \(c_i\).
- Replace all allocations of memory blocks with the color \(c_i\) with \(b_i\).
- Run the simplifier to remove all extraneous allocs(?)
Ah, I'm adding some more tests to GreedyColoringTests before I move on.
Oh, interesting! My Graph
structure cannot represent graphs with no
intersections. That's not good, since it means that the graph coloring will be
empty, and the memory blocks won't be assigned a color. The trivial thing to do
is of course to fix that in post, by arbitrarily coloring all those VNames that
are not in the graph coloring. Yeah, that should do it. I just need to keep it
in mind in step 1.1 above.
Now for the bit about finding memory blocks and sizes. First of all, I should
probably be able to reuse the analyseKernels
function from
Interference
. It's currently hard-coded to work for the interference graph,
but really, we might want something more general?
Perhaps the analyseKernels
function should really reside in
ReuseAllocations
? After all, the kernels are distinct, and there should be no
overlap between local allocations inside each kernel, right? That also
corresponds more closely to the algorithm I described above. Currently,
Interference
and the others, work on the whole program, but really we're
interested in versions that work on a given kernel.
I got started, I'll continue tomorrow.