2020-09-08
Status
So, not a lot has happened since last week. Well, that's not really true, because I've been busy with teaching AP (Advanced Programming) and taking PMPH (Programming Massively Parallel Hardware). And there was a funeral yesterday…
Anyway, enough excuses. Today and tomorrow I should have ample time to get into OptionPricing.fut and figure out what I need to do to make it faster or require less memory.
Perhaps I've been going at it too hard by only looking at the output kernels? Perhaps today I should start by looking at the Futhark code and see if I can make sense of that before I move on to the kernels. Perhaps I can even identify some patterns directly in the Futhark code that are not being optimised.
OptionPricing.fut
Oh my, this is… a lot.
So, the main function takes a lot of arguments. One is the contract_number. It seems like there are three different types of contracts implemented, using payoff1, payoff2 and payoff3 respectively as payoff functions. I don't know what a payoff function is, but perhaps that'll make sense later on. Supposedly it has to do with how each type of contract settles. Most of the other arguments seem to have to do with the model or the Brownian bridge.
The main function itself first computes a Sobol matrix, which it then immediately uses to compute a Gaussian matrix, which in turn is turned into a Brownian bridge matrix. Now, these are obvious candidates for memory reuse (gauss_mat is only used in the computation of bb_mat, so they can share the same memory space), but they do not take place inside a kernel, so they would not be optimised at all by my current implementation. Instead, I need to look for that kind of pattern inside some nested parallelism. For instance, payoffs maps over bb_mat, and inside the lambda, bd_row is computed as the result of a map4 and then immediately used. Perhaps that is an area that's worth investigating? But those two maps should be fused, right? Let's see what it looks like in the kernel. For reference, this is the Futhark code:
let payoffs = map (\bb_row: [num_models]f32 ->
    let bd_row = map4 (blackScholes bb_row) md_cs md_vols md_drifts md_sts
    in map3 (genericPayoff contract_number) md_discts md_detvals bd_row)
  bb_mat
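To make the fusion question concrete, here is a minimal sketch of the same shape: one map builds an intermediate row that the next map consumes exactly once. The functions scale and settle are hypothetical toy stand-ins, not the real blackScholes and genericPayoff, and the hand-fused version only shows what I would expect the compiler's fusion to amount to, not what it actually emits. The top-level let matches the syntax used in this log; newer compilers may prefer def.

```futhark
-- Hypothetical toy stand-ins for blackScholes and genericPayoff.
let scale (c: f32) (x: f32): f32 = c * x
let settle (d: f32) (v: f32): f32 = d * f32.max 0 v

-- Unfused shape, mirroring the payoffs code: bd_row is produced by one
-- map and consumed exactly once by the next.
let unfused [n] (cs: [n]f32) (ds: [n]f32) (xs: [n]f32): [n]f32 =
  let bd_row = map2 scale cs xs
  in map2 settle ds bd_row

-- Hand-fused equivalent: a single map and no intermediate array.
let fused [n] (cs: [n]f32) (ds: [n]f32) (xs: [n]f32): [n]f32 =
  map3 (\c d x -> settle d (scale c x)) cs ds xs

-- Returns true when both versions agree on every element.
let main [n] (cs: [n]f32) (ds: [n]f32) (xs: [n]f32): bool =
  and (map2 (==) (unfused cs ds xs) (fused cs ds xs))
```

If fusion does happen, bd_row should never exist as an array in memory, so there would be nothing left for memory reuse to save at that point. Whether that holds for the real code is exactly what the kernel output has to show.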
futhark dev --kernels -a -e --cse -e --double-buffer -e --reuse-allocations -e OptionPricing.fut > OptionPricing.kernel
I've used these highlight settings:
-- Hi-lock: (("\\_<color_23568\\_>" (0 'hi-blue-b prepend)))
-- Hi-lock: (("\\_<color_23567\\_>" (0 'hi-black-b prepend)))
-- Hi-lock: (("\\_<color_23566\\_>" (0 'hi-aquamarine prepend)))
-- Hi-lock: (("\\_<color_23565\\_>" (0 'hi-salmon prepend)))
-- Hi-lock: (("\\_<color_23564\\_>" (0 'hi-blue prepend)))
-- Hi-lock: (("\\_<color_23563\\_>" (0 'hi-green prepend)))
-- Hi-lock: (("\\_<color_23562\\_>" (0 'hi-pink prepend)))
-- Hi-lock: (("\\_<color_23561\\_>" (0 'hi-yellow prepend)))
After highlighting all the associated values, I get this:
-- Hi-lock: (("lw_dest_23251" (0 'hi-green prepend)))
-- Hi-lock: (("res_23057" (0 'hi-green prepend)))
-- Hi-lock: (("mapout_23249" (0 'hi-green prepend)))
-- Hi-lock: (("result_23248" (0 'hi-green prepend)))
-- Hi-lock: (("stream_mapout_23211" (0 'hi-green prepend)))
-- Hi-lock: (("res_22859" (0 'hi-green prepend)))
-- Hi-lock: (("stream_mapout_scratch_23212" (0 'hi-green prepend)))
-- Hi-lock: (("res_linear_nonext_copy_23527" (0 'hi-green prepend)))
-- Hi-lock: (("res_linear_nonext_copy_23526" (0 'hi-green prepend)))
-- Hi-lock: (("res_22577" (0 'hi-green prepend)))
-- Hi-lock: (("lw_dest_23243" (0 'hi-blue prepend)))
-- Hi-lock: (("mapout_23241" (0 'hi-blue prepend)))
-- Hi-lock: (("res_22833" (0 'hi-blue prepend)))
-- Hi-lock: (("result_23240" (0 'hi-blue prepend)))
-- Hi-lock: (("bbrow_22755" (0 'hi-blue prepend)))
-- Hi-lock: (("res_22752" (0 'hi-blue prepend)))
-- Hi-lock: (("lw_dest_23239" (0 'hi-salmon prepend)))
-- Hi-lock: (("mapout_23237" (0 'hi-salmon prepend)))
-- Hi-lock: (("res_22792" (0 'hi-salmon prepend)))
-- Hi-lock: (("result_23236" (0 'hi-salmon prepend)))
-- Hi-lock: (("lw_dest_23235" (0 'hi-salmon prepend)))
-- Hi-lock: (("mapout_23233" (0 'hi-salmon prepend)))
-- Hi-lock: (("x_22631" (0 'hi-salmon prepend)))
-- Hi-lock: (("result_23232" (0 'hi-salmon prepend)))
-- Hi-lock: (("lowered_array_23265" (0 'hi-aquamarine prepend)))
-- Hi-lock: (("lowered_array_updated_23278" (0 'hi-aquamarine prepend)))
-- Hi-lock: (("lw_dest_23231" (0 'hi-aquamarine prepend)))
-- Hi-lock: (("lowered_array_updated_23271" (0 'hi-aquamarine prepend)))
-- Hi-lock: (("lowered_array_23272" (0 'hi-aquamarine prepend)))
-- Hi-lock: (("modified_source_23266" (0 'hi-aquamarine prepend)))
-- Hi-lock: (("modified_source_23273" (0 'hi-aquamarine prepend)))
-- Hi-lock: (("mapout_23229" (0 'hi-aquamarine prepend)))
-- Hi-lock: (("res_22622" (0 'hi-aquamarine prepend)))
-- Hi-lock: (("result_23228" (0 'hi-aquamarine prepend)))
-- Hi-lock: (("res_22598" (0 'hi-green-b prepend)))
-- Hi-lock: (("lw_dest_23227" (0 'hi-green-b prepend)))
-- Hi-lock: (("mapout_23225" (0 'hi-green-b prepend)))
-- Hi-lock: (("result_23224" (0 'hi-green-b prepend)))
-- Hi-lock: (("lw_dest_23221" (0 'hi-red-b prepend)))
-- Hi-lock: (("result_23218" (0 'hi-red-b prepend)))
-- Hi-lock: (("res_22549" (0 'hi-red-b prepend)))
-- Hi-lock: (("double_buffer_array_23546" (0 'hi-pink prepend)))
-- Hi-lock: (("inpacc_22565" (0 'hi-pink prepend)))
-- Hi-lock: (("inpacc_22562" (0 'hi-pink prepend)))
-- Hi-lock: (("res_double_buffer_copy_23552" (0 'hi-pink prepend)))
-- Hi-lock: (("double_buffer_array_23547" (0 'hi-yellow prepend)))
-- Hi-lock: (("res_double_buffer_copy_23553" (0 'hi-yellow prepend)))
-- Hi-lock: (("acc0_22563" (0 'hi-yellow prepend)))
-- Hi-lock: (("inpacc_22566" (0 'hi-yellow prepend)))
-- Hi-lock: (("\\_<color_23568\\_>" (0 'hi-red-b prepend)))
-- Hi-lock: (("\\_<color_23567\\_>" (0 'hi-green-b prepend)))
-- Hi-lock: (("\\_<color_23566\\_>" (0 'hi-aquamarine prepend)))
-- Hi-lock: (("\\_<color_23565\\_>" (0 'hi-salmon prepend)))
-- Hi-lock: (("\\_<color_23564\\_>" (0 'hi-blue prepend)))
-- Hi-lock: (("\\_<color_23563\\_>" (0 'hi-green prepend)))
-- Hi-lock: (("\\_<color_23562\\_>" (0 'hi-pink prepend)))
-- Hi-lock: (("\\_<color_23561\\_>" (0 'hi-yellow prepend)))
Perhaps I can get Futhark to output this information automatically?
bbrow_22755 and mapout_23229
Cosmin suggested that I take a look at the OpenCL implementation in order to figure out where memory is being reused.