# 2020-07-30

## Status

I've been slacking off a bit on the technical diary. Yesterday I didn't write anything at all, and the day before that only a few paragraphs. Let's try to improve that a bit.

## Last Use

So, what's the status? I've worked on improving my last-use analysis, to better
handle aliasing and altogether be more robust. This turned out, not
surprisingly, to be necessary for linear scan/reusing allocations. My problems
over the last couple of days have all stemmed from the fact that the return
values of a `Body` of code, say from an `if`-expression, cannot be unambiguously
associated with a statement. Consider the following piece of code:

```
let xs: [n]i32 = ...
let ys =
  if b then
    xs
  else
    iota n
let y = .. ys .. -- last use of ys, xs
```

The last syntactic use of `xs` is in the return from the then-branch of the
`if` expression, but there are no statements inside that branch. Instead, we
can associate the last use of `xs` with the `let ys` statement, but then it's
not clear whether it's last used in the then- or else-branch, or both. In the
algorithm that I'd previously made, I assumed that I would always be able to
determine which branch something was last used in, which led me to try to
create an optimisation that would reuse the memory block of `xs` for the
`iota n` in the above code. However, I think I have to abandon that idea for
now, and instead focus on getting the simple cases to work.

Besides, the real last use of `xs` is not inside the `if` expression, but
actually at the same place as the last use of `ys`. In order to correctly
determine the last use of `xs`, I need aliasing information. Fortunately,
there already exists an `Alias` lore that attaches aliasing information in
various places. In particular:

- Pattern elements in statements have a `VarAliases` attached, which indicates
  which existing variables the pattern element aliases.
- Statements have a `ConsumedInExp` attached to their auxiliary information,
  which indicates which values were consumed in the expression. As far as I can
  tell, this is only ever used for in-place updates.
- Bodies have a `BodyAliasing` attached. I'm not sure what this is for; it
  seems to contain the same thing as `VarAliases`. Obviously, however,
  different branches in an `if` expression can have different `BodyAliasing`
  information.
- Ops implement `CanBeAliased` and `AliasedOp`, where `AliasedOp` contains
  information about what is aliased inside, and what is consumed inside.

Upon looking into `BodyAliasing`, I have confirmed that it is actually the
case that different branches in `if` expressions have different `BodyAliasing`
information. It may be possible to use this knowledge to do the optimisation I
was talking about above, but let's get the other bits working first.

From a last-use perspective, whenever something is last-used (the first
occurrence of a variable when traversing the program statements in reverse
order), we want it and *all of the variables it aliases* to be marked as
last-used. However, given the alias lore described above, we don't have the
necessary aliasing information at that point. This led me to create a sort of
inverted last-use map, which is only used during the analysis, and which is
then inverted at the end to produce the real last-use map. The inverse
last-use map records, for each variable, the statement in which it was last
used. In the example above, when we come upon the `let y` statement, we record
that `ys` is last used in that statement. Later, in the `let ys` statement, we
see that `ys` aliases `xs`, so we look up the last use of `ys`, which is `y`,
and record that the last use of `xs` is also `y`.
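The procedure can be sketched in a few lines of Python. This is only a
language-neutral illustration (the actual analysis lives in the Haskell
compiler), and the statement encoding and function names here are made up:

```python
# Sketch of the inverse last-use analysis: traverse statements in
# reverse, record the first (i.e. last) occurrence of each variable,
# and propagate that last use to aliases at the binding site.
# All names are invented for illustration purposes.

def last_use(stmts):
    """stmts: list of (bound_name, free_vars, aliases) in program order.

    Returns the real last-use map: statement name -> set of variables
    whose last use is in that statement."""
    last_used_in = {}  # inverse map: variable -> statement of its last use

    for name, free_vars, aliases in reversed(stmts):
        # First occurrence in reverse order is the last use.
        for v in free_vars:
            last_used_in.setdefault(v, name)
        # Aliasing is only known at the binding site: propagate the
        # binding's last use to everything the bound variable aliases.
        if name in last_used_in:
            for v in aliases:
                last_used_in.setdefault(v, last_used_in[name])

    # Invert into the real last-use map.
    result = {}
    for v, stmt in last_used_in.items():
        result.setdefault(stmt, set()).add(v)
    return result

# The example from above: ys aliases xs (via the if), y uses ys.
stmts = [
    ("xs", set(), set()),
    ("ys", set(), {"xs"}),  # ys = if b then xs else iota n
    ("y", {"ys"}, set()),
]
print(last_use(stmts))  # xs and ys are both last used in y
```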

Now, given that, let's revisit `ReuseAllocations` and see how we can improve
it.

## Interference

First of all, let's try to get this interference graph right. I've started by
factoring the interference analysis out into its own module, called
`Futhark.Analysis.Interference`.

I think a good first step is to set up some tests of some sort. I currently
have a bunch of different files in my `tests` directory, but no way of knowing
what the correct output for each file is and why. Now, I know `psum.fut` is
going to be one of my tests. Here's what the generated code with
`futhark-0.16.2 dev --kernels -a -e --cse` looks like:

```
-- xss_247 : [impl₀_245][impl₁_246]i32@@xss_mem_760->
--     {base: [impl₀_245, impl₁_246]; contiguous: True; LMADs: [{offset: 0i32;
--      strides: [impl₁_246, 1i32];
--      rotates: [0i32, 0i32];
--      shape: [impl₀_245, impl₁_246];
--      permutation: [0, 1];
--      monotonicity: [Inc, Inc]}]}
entry {*[?0][?1]i32@?2->
       {base: [?0, ?1]; contiguous: True; LMADs: [{offset: 0i32;
        strides: [?1, 1i32];
        rotates: [0i32, 0i32];
        shape: [?0, ?1];
        permutation: [0, 1];
        monotonicity: [Inc, Inc]}]}}
main (mem xss_mem_760, i32 impl₀_245, i32 impl₁_246,
      [impl₀_245][impl₁_246]i32 xss_247) = {
  #[incremental_flattening(only_intra)]
  let {i64 binop_x_775} = sext i32 impl₀_245 to i64
  #[incremental_flattening(only_intra)]
  let {i64 binop_y_776} = sext i32 impl₁_246 to i64
  #[incremental_flattening(only_intra)]
  let {i64 binop_x_777} = mul_nw64(binop_x_775, binop_y_776)
  #[incremental_flattening(only_intra)]
  let {i64 bytes_774} = mul_nw64(4i64, binop_x_777)
  #[incremental_flattening(only_intra)]
  let {mem mem_778} =
    alloc(bytes_774)
  let {i64 binop_x_764} = binop_y_776
  let {i64 bytes_763} = mul_nw64(4i64, binop_y_776)
  let {i64 binop_x_768} = binop_y_776
  let {i64 bytes_767} = bytes_763
  let {i64 binop_x_772} = binop_y_776
  let {i64 bytes_771} = bytes_763
  #[incremental_flattening(only_intra)]
  -- res_409 : [impl₀_245][impl₁_246]i32@@mem_778->
  --     {base: [impl₀_245, impl₁_246]; contiguous: True; LMADs: [{offset: 0i32;
  --      strides: [impl₁_246, 1i32];
  --      rotates: [0i32, 0i32];
  --      shape: [impl₀_245, impl₁_246];
  --      permutation: [0, 1];
  --      monotonicity: [Inc, Inc]}]}
  let {[impl₀_245][impl₁_246]i32 res_409} =
    segmap_group
      (#groups=impl₀_245; groupsize=impl₁_246)
      (gtid_292 < impl₀_245) (~phys_tid_305) : {[impl₁_246]i32} {
        let {mem@local mem_765} =
          alloc(bytes_763, @local)
        -- resarr0_416 : [impl₁_246]i32@@mem_765->
        --     {base: [impl₁_246]; contiguous: True; LMADs: [{offset: 0i32; strides: [1i32];
        --      rotates: [0i32];
        --      shape: [impl₁_246];
        --      permutation: [0];
        --      monotonicity: [Inc]}]}
        let {[impl₁_246]i32 resarr0_416} =
          segscan_thread
            (#groups=impl₀_245; groupsize=impl₁_246)
            ({{0i32},
              [],
              fn {i32} (i32 x_417, i32 x_418) =>
                let {i32 res_419} = add32(x_417, x_418)
                in {res_419}})
            (gtid_295 < impl₁_246) (~phys_tid_296) : {i32} {
              let {i32 x_420} = xss_247[gtid_292, gtid_295]
              return {returns x_420}
            }
        let {mem@local mem_769} =
          alloc(bytes_763, @local)
        -- resarr0_426 : [impl₁_246]i32@@mem_769->
        --     {base: [impl₁_246]; contiguous: True; LMADs: [{offset: 0i32; strides: [1i32];
        --      rotates: [0i32];
        --      shape: [impl₁_246];
        --      permutation: [0];
        --      monotonicity: [Inc]}]}
        let {[impl₁_246]i32 resarr0_426} =
          segscan_thread
            (#groups=impl₀_245; groupsize=impl₁_246)
            ({{0i32},
              [],
              fn {i32} (i32 x_427, i32 x_428) =>
                let {i32 res_429} = add32(x_427, x_428)
                in {res_429}})
            (gtid_297 < impl₁_246) (~phys_tid_298) : {i32} {
              let {i32 x_430} = resarr0_416[gtid_297]
              return {returns x_430}
            }
        let {mem@local mem_773} =
          alloc(bytes_763, @local)
        -- resarr0_435 : [impl₁_246]i32@@mem_773->
        --     {base: [impl₁_246]; contiguous: True; LMADs: [{offset: 0i32; strides: [1i32];
        --      rotates: [0i32];
        --      shape: [impl₁_246];
        --      permutation: [0];
        --      monotonicity: [Inc]}]}
        let {[impl₁_246]i32 resarr0_435} =
          segscan_thread
            (#groups=impl₀_245; groupsize=impl₁_246)
            ({{0i32},
              [],
              fn {i32} (i32 x_436, i32 x_437) =>
                let {i32 res_438} = add32(x_436, x_437)
                in {res_438}})
            (gtid_299 < impl₁_246) (~phys_tid_300) : {i32} {
              let {i32 x_439} = resarr0_426[gtid_299]
              return {returns x_439}
            }
        return {returns resarr0_435}
      }
  in {impl₀_245, impl₁_246, mem_778, res_409}
}
```

Now, what I'm mostly interested in is that `mem_765` should interfere with
`mem_769`, but not with `mem_773`. On the other hand, `mem_769` and `mem_773`
should interfere with each other. So, we want an interference graph with an
edge between `mem_765` and `mem_769` and an edge between `mem_769` and
`mem_773`, but no edge between `mem_765` and `mem_773`.

To keep things simple, I'll only concern myself with kernels for now.
Therefore, I've written an `analyseKernels` function, whose sole purpose is to
walk through the program, find the kernels, and call the actual analysis
functions on those.

Ah, but there are other memory blocks in that code! `xss_mem_760` is the block
that the argument to `main`, `xss_247`, resides in. There is also `mem_778`,
which is where the result of the outer map is put.

In principle, if `xss_247` were consumed by the function, we could reuse its
memory for other global memory blocks. However, since we're focusing on just
kernels, perhaps we should disregard global memory blocks entirely when
building the interference graph? A heuristic could be to only add blocks to
the interference graph that are actually in `InUse`.

So, we still want the same interference graph.
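That expectation can be checked against a minimal sketch of interference as
simultaneous liveness: a block becomes live when it is allocated, dies at its
last use, and any two blocks live at the same time interfere. This is a Python
illustration only; the statement encoding below is invented, and it merely
mirrors the allocation and last-use pattern of the three scans in the
`psum.fut` code above:

```python
# Sketch: interference = "both memory blocks in use at the same time".
# The encoding of statements as (allocated, last_used) set pairs is
# invented for illustration; the real analysis works on the Futhark IR.
from itertools import combinations

def interference(stmts):
    """stmts: list of (allocated, last_used) pairs of memory-block sets,
    in program order. Returns the set of interference edges."""
    in_use = set()
    edges = set()
    for allocated, last_used in stmts:
        in_use |= allocated
        # Everything simultaneously in use interferes pairwise.
        for a, b in combinations(sorted(in_use), 2):
            edges.add((a, b))
        in_use -= last_used
    return edges

# The psum.fut kernel: mem_765 is dead by the time mem_773 is
# allocated, so those two do not interfere.
stmts = [
    ({"mem_765"}, set()),        # first scan writes mem_765
    ({"mem_769"}, {"mem_765"}),  # second scan reads mem_765, writes mem_769
    ({"mem_773"}, {"mem_769"}),  # third scan reads mem_769, writes mem_773
]
print(interference(stmts))
```

On this toy encoding, the result is exactly the two expected edges, which is
the property the real analysis should reproduce.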

For some reason, `<&> namesIntersection inuse_inside` removes all names from
`last_use_mems`?

That's all sorted out, and now I actually get the correct interference graph,
for `psum.fut` at least.