2020-07-31

Testing futhark

Currently, the only way to test much of the Futhark compiler is by writing a Futhark program and verify that the compiled code behaves as expected. I think this is generally called integration tests. Sometimes, however, you want slightly more fine-grained testing. For instance, now that I'm working on my interference graph, I'd like to write a test that says: Take psum.fut, compute the interference graph for it, and compare it to some predefined value. However, there is currently no easy way to do that.

One approach would be to write some functionality that allow us to write a Futhark program and compile it to a certain point within a unit test. That way, we could write:

However, parseAndPrepare is going to do a non-trivial amount of work.

Instead of trying to parse and process everything, we could also express the desired test IR directly in Haskell. That would look something like this:

1: let input = Prog Seq.Empty [FunDef Nothing mempty (nameFromString "psum") ....
2: let graph = computerInterferenceGraph input
3: assert(graph == [(mem_765, mem_769), (mem_769, mem_773)])

However, that seems very cumbersome and error-prone. Consider the IR generated for psum.fut:

  1: -- xss_247 : [impl₀_245][impl₁_246]i32@@xss_mem_759->
  2: -- {base: [impl₀_245, impl₁_246]; contiguous: True; LMADs: [{offset: 0i32;
  3: --                                                           strides: [impl₁_246, 1i32];
  4: --                                                           rotates: [0i32, 0i32];
  5: --                                                           shape: [impl₀_245, impl₁_246];
  6: --                                                           permutation: [0, 1];
  7: --                                                           monotonicity: [Inc, Inc]}]}
  8: entry {*[?0][?1]i32@?2->
  9:        {base: [?0, ?1]; contiguous: True; LMADs: [{offset: 0i32;
 10:                                                    strides: [?1, 1i32];
 11:                                                    rotates: [0i32, 0i32];
 12:                                                    shape: [?0, ?1];
 13:                                                    permutation: [0, 1];
 14:                                                    monotonicity: [Inc, Inc]}]}}
 15: main (mem xss_mem_759, i32 impl₀_245, i32 impl₁_246,
 16:       [impl₀_245][impl₁_246]i32 xss_247) = {
 17:   #[incremental_flattening(only_intra)]
 18:   let {i64 binop_x_774} = sext i32 impl₀_245 to i64
 19:   #[incremental_flattening(only_intra)]
 20:   let {i64 binop_y_775} = sext i32 impl₁_246 to i64
 21:   #[incremental_flattening(only_intra)]
 22:   let {i64 binop_x_776} = mul_nw64(binop_x_774, binop_y_775)
 23:   #[incremental_flattening(only_intra)]
 24:   let {i64 bytes_773} = mul_nw64(4i64, binop_x_776)
 25:   #[incremental_flattening(only_intra)]
 26:   let {mem mem_777} =
 27:     alloc(bytes_773)
 28:   let {i64 binop_x_763} = binop_y_775
 29:   let {i64 bytes_762} = mul_nw64(4i64, binop_y_775)
 30:   let {i64 binop_x_767} = binop_y_775
 31:   let {i64 bytes_766} = bytes_762
 32:   let {i64 binop_x_771} = binop_y_775
 33:   let {i64 bytes_770} = bytes_762
 34:   #[incremental_flattening(only_intra)]
 35:   -- res_408 : [impl₀_245][impl₁_246]i32@@mem_777->
 36:   -- {base: [impl₀_245, impl₁_246]; contiguous: True; LMADs: [{offset: 0i32;
 37:   --                                                           strides: [impl₁_246, 1i32];
 38:   --                                                           rotates: [0i32, 0i32];
 39:   --                                                           shape: [impl₀_245, impl₁_246];
 40:   --                                                           permutation: [0, 1];
 41:   --                                                           monotonicity: [Inc, Inc]}]}
 42:   let {[impl₀_245][impl₁_246]i32 res_408} =
 43:     segmap_group
 44:     (#groups=impl₀_245; groupsize=impl₁_246)
 45:     (gtid_292 < impl₀_245) (~phys_tid_305) : {[impl₁_246]i32} {
 46:       let {mem@local mem_764} =
 47:         alloc(bytes_762, @local)
 48:       -- resarr0_415 : [impl₁_246]i32@@mem_764->
 49:       -- {base: [impl₁_246]; contiguous: True; LMADs: [{offset: 0i32; strides: [1i32];
 50:       --                                                rotates: [0i32];
 51:       --                                                shape: [impl₁_246];
 52:       --                                                permutation: [0];
 53:       --                                                monotonicity: [Inc]}]}
 54:       let {[impl₁_246]i32 resarr0_415} =
 55:         segscan_thread
 56:         (#groups=impl₀_245; groupsize=impl₁_246)
 57:         ({{0i32},
 58:           [],
 59:           fn {i32} (i32 x_416, i32 x_417) =>
 60:             let {i32 res_418} = add32(x_416, x_417)
 61:             in {res_418}})
 62:         (gtid_295 < impl₁_246) (~phys_tid_296) : {i32} {
 63:           let {i32 x_419} = xss_247[gtid_292, gtid_295]
 64:           return {returns x_419}
 65:         }
 66:       let {mem@local mem_768} =
 67:         alloc(bytes_762, @local)
 68:       -- resarr0_425 : [impl₁_246]i32@@mem_768->
 69:       -- {base: [impl₁_246]; contiguous: True; LMADs: [{offset: 0i32; strides: [1i32];
 70:       --                                                rotates: [0i32];
 71:       --                                                shape: [impl₁_246];
 72:       --                                                permutation: [0];
 73:       --                                                monotonicity: [Inc]}]}
 74:       let {[impl₁_246]i32 resarr0_425} =
 75:         segscan_thread
 76:         (#groups=impl₀_245; groupsize=impl₁_246)
 77:         ({{0i32},
 78:           [],
 79:           fn {i32} (i32 x_426, i32 x_427) =>
 80:             let {i32 res_428} = add32(x_426, x_427)
 81:             in {res_428}})
 82:         (gtid_297 < impl₁_246) (~phys_tid_298) : {i32} {
 83:           let {i32 x_429} = resarr0_415[gtid_297]
 84:           return {returns x_429}
 85:         }
 86:       let {mem@local mem_772} =
 87:         alloc(bytes_762, @local)
 88:       -- resarr0_434 : [impl₁_246]i32@@mem_772->
 89:       -- {base: [impl₁_246]; contiguous: True; LMADs: [{offset: 0i32; strides: [1i32];
 90:       --                                                rotates: [0i32];
 91:       --                                                shape: [impl₁_246];
 92:       --                                                permutation: [0];
 93:       --                                                monotonicity: [Inc]}]}
 94:       let {[impl₁_246]i32 resarr0_434} =
 95:         segscan_thread
 96:         (#groups=impl₀_245; groupsize=impl₁_246)
 97:         ({{0i32},
 98:           [],
 99:           fn {i32} (i32 x_435, i32 x_436) =>
100:             let {i32 res_437} = add32(x_435, x_436)
101:             in {res_437}})
102:         (gtid_299 < impl₁_246) (~phys_tid_300) : {i32} {
103:           let {i32 x_438} = resarr0_425[gtid_299]
104:           return {returns x_438}
105:         }
106:       return {returns resarr0_434}
107:     }
108:   in {impl₀_245, impl₁_246, mem_777, res_408}
109: }

That's a lot of IR to write by hand…

Troels suggested having an alternative way to input IR (Futhark Core Language), for instance by defining an isomorphism with sexps. Then, we would input the IR as a sexp and could directly run our test functions on it. I found the sexp-grammar package, which should supposedly be able to help us do that, but unfortunately its dependencies clash with those of Futhark, and I don't know how to fix that. If it worked, however, we would be able to input an IR like this:

1: let input = parseSexp "(prog () (fun () () \"psum\" ....)"
2: let graph = computerInterferenceGraph input
3: assert(graph == [(mem_765, mem_769), (mem_769, mem_773)])

Still cumbersome and error-prone like the direct AST from above, but perhaps marginally better? Of course, everything breaks down if the AST changes, how are we supposed to handle that?

For the record, this is the error I'm getting after adding sexp-grammar >= 2.1.0 to the build-depends in futhark.cabal and running cabal build:

Resolving dependencies...
cabal: Could not resolve dependencies:
[__0] trying: futhark-0.17.0 (user goal)
[__1] trying: template-haskell-2.16.0.0/installed-2.16.0.0 (dependency of
futhark)
[__2] trying: sexp-grammar-2.1.0 (dependency of futhark)
[__3] next goal: prettyprinter (dependency of sexp-grammar)
[__3] rejecting: prettyprinter-1.6.1 (conflict: sexp-grammar =>
prettyprinter>=1 && <1.3)
[__3] skipping: prettyprinter-1.6.0, prettyprinter-1.5.1, prettyprinter-1.5.0,
prettyprinter-1.4.0, prettyprinter-1.3.0 (has the same characteristics that
caused the previous version to fail: excluded by constraint '>=1 && <1.3' from
'sexp-grammar')
[__3] rejecting: prettyprinter-1.2.1.1, prettyprinter-1.2.1,
prettyprinter-1.2.0.1, prettyprinter-1.2, prettyprinter-1.1.1,
prettyprinter-1.1.0.1, prettyprinter-1.1, prettyprinter-1.0.1 (conflict:
template-haskell => base==4.14.0.0/installed-4.14.0.0, prettyprinter =>
base>=4.7 && <4.13)
[__3] rejecting: prettyprinter-1 (conflict:
template-haskell==2.16.0.0/installed-2.16.0.0, prettyprinter =>
template-haskell>=2.9 && <2.12)
[__3] rejecting: prettyprinter-0.1 (conflict: sexp-grammar => prettyprinter>=1
&& <1.3)
[__3] fail (backjumping, conflict set: prettyprinter, sexp-grammar,
template-haskell)
After searching the rest of the dependency tree exhaustively, these were the
goals I've had most trouble fulfilling: base, sexp-grammar, futhark,
prettyprinter, template-haskell
Try running with --minimize-conflict-set to improve the error message.

So, while I can't get sexp-grammar to work, let's try and give it a go with the first approach, parseAndPrepare.

`parseAndPrepare`

Well, this turned out to actually be somewhat reasonable… I'm reading and writing to a temporary file, but that should be okay, right?

 1: pipeline :: Pipeline SOACS KernelsMem
 2: pipeline =
 3:   kernelsPipeline
 4:     >>> onePass Kernels.explicitAllocations
 5:     >>> passes
 6:       [ simplifyKernelsMem,
 7:         performCSE False
 8:       ]
 9: 
10: psumTest :: TestTree
11: psumTest =
12:   testCase "psum.fut" $ do
13:     fp <- writeSystemTempFile "psum.fut" "let psum = scan (+) (0: i32) let main (xss: [][]i32) = #[incremental_flattening(only_intra)] map (psum >-> psum >-> psum) xss"
14:     prog <- runFutharkM (runPipelineOnProgram newFutharkConfig pipeline fp) NotVerbose
15:     case prog of
16:       Right prog' ->
17:         case Set.toList $ Interference.analyse prog' of
18:           [(mem1, mem2), (mem2', mem3)] -> do
19:             assertEqual "Some mems" mem2 mem2'
20:             assertBool "Only two elements" (mem1 /= mem3)
21:             assertBool "Only two elements" (mem1 /= mem2)
22:             assertBool "Only two elements" (mem2 /= mem3)
23:           _ ->
24:             assertFailure "Interference graph invalid"
25:       Left _ ->
26:         assertFailure "Could not compile"

It's not robust at all, any changes to the optimisation passes might make the test fail for non-obvious reasons. But it should be good enough for now.

So, now it's just a matter of writing more of those, and expanding on the intersection analysis.