But isn't the extra effort spent on "optimizing" useful for getting
better-optimized code from blitz than you would get otherwise? If the
compiler could see the larger picture of the whole program, wouldn't
it be able to produce faster code through better inlining?