No, it's primarily simple on-axis. Off-axis is done and shown separately for beamwidth and DI. We each settle upon those approaches which confer the highest personal level of comfort in the quality of the result.
To me, spatial averaging is dubious, difficult to interpret, and certainly much more difficult to accomplish and reproduce. If someone is reporting spatially averaged curves, I expect them to say so, because by default I assume the curves are not averaged. I don't want to adjust frequency response for power response, either; the result is artificial. If something's anomalous, find it, figure it out, fix it.
With respect to curve smoothing, low resolution better reveals the big trends requiring adjustment. That's what's most audible, but I rarely show those here. At the other extreme, the highest resolutions characterize the quality of the test environment and methodology more than the information of interest. It's a compromise.
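For anyone who wants to see that smoothing compromise in numbers, here is a minimal sketch of fractional-octave smoothing. Everything in it is an assumption made for illustration: the function name `smooth_fractional_octave`, the rectangular band, the direct averaging of dB values, and the synthetic curve (a gentle downward tilt plus a fine ripple). It is not the routine any particular measurement package uses.

```python
import math

def smooth_fractional_octave(freqs_hz, mags_db, fraction=3):
    """Average the dB values inside a 1/fraction-octave band around each point.

    Simplifications (assumed for this sketch): rectangular band, and dB values
    are averaged directly rather than converted to power first.
    """
    half_band = 2 ** (1.0 / (2 * fraction))  # ratio from band center to band edge
    smoothed = []
    for fc in freqs_hz:
        lo, hi = fc / half_band, fc * half_band
        band = [m for f, m in zip(freqs_hz, mags_db) if lo <= f <= hi]
        smoothed.append(sum(band) / len(band))  # band always contains fc itself
    return smoothed

# Synthetic example data (made up, not a measurement): 48 points per octave
# over ten octaves starting at 20 Hz.
POINTS_PER_OCTAVE = 48
octaves = [i / POINTS_PER_OCTAVE for i in range(10 * POINTS_PER_OCTAVE + 1)]
freqs_hz = [20.0 * 2 ** o for o in octaves]
trend_db = [-0.5 * o for o in octaves]  # the "big trend": -0.5 dB per octave
raw_db = [t + 2.0 * math.sin(2 * math.pi * 12 * o)  # fine ripple, 12 cycles/octave
          for t, o in zip(trend_db, octaves)]

def max_residual(curve_db):
    """Largest departure of a curve from the underlying trend, in dB."""
    return max(abs(c - t) for c, t in zip(curve_db, trend_db))

third_octave = smooth_fractional_octave(freqs_hz, raw_db, fraction=3)
very_fine = smooth_fractional_octave(freqs_hz, raw_db, fraction=48)

print(f"raw ripple:        {max_residual(raw_db):.2f} dB")
print(f"1/3-octave ripple: {max_residual(third_octave):.2f} dB")
print(f"1/48-octave ripple: {max_residual(very_fine):.2f} dB")
```

With this data the 1/3-octave curve is left with little more than the tilt, while the 1/48-octave band is narrower than the point spacing and hands back the raw curve unchanged. The useful setting sits somewhere between those two extremes.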
Same with windowing. I think we pay attention to the scales, methods, and resolutions of what others post here, because we're familiar with the tradeoffs from our own experience.
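The windowing tradeoff can be put in rough numbers as well. The sketch below assumes one common case that the post does not specify: a gated measurement with driver and microphone at the same height above a single reflecting floor, where the window is closed just before the first reflection arrives. The function names, the geometry, and the 343 m/s speed of sound are all assumptions for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly room temperature

def reflection_free_window_s(mic_distance_m, height_m):
    """Time between the direct arrival and the first floor bounce.

    Assumes source and microphone are both height_m above one reflecting
    surface and mic_distance_m apart.
    """
    direct_path = mic_distance_m
    reflected_path = math.hypot(mic_distance_m, 2.0 * height_m)
    return (reflected_path - direct_path) / SPEED_OF_SOUND_M_S

def frequency_resolution_hz(window_s):
    """Approximate frequency resolution of a window of the given length."""
    return 1.0 / window_s

low_stand = reflection_free_window_s(mic_distance_m=1.0, height_m=1.5)
high_stand = reflection_free_window_s(mic_distance_m=1.0, height_m=3.0)

print(f"1.5 m high: {low_stand * 1000:.1f} ms window, "
      f"about {frequency_resolution_hz(low_stand):.0f} Hz resolution")
print(f"3.0 m high: {high_stand * 1000:.1f} ms window, "
      f"about {frequency_resolution_hz(high_stand):.0f} Hz resolution")
```

At 1 m with both ends 1.5 m up, the clean window is a little over 6 ms, so detail finer than roughly 160 Hz is not really resolved. A longer window buys resolution only if the environment keeps reflections out of it, which is why the window length says as much about the test setup as about the driver.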
As for "getting it right," I find the multiple driver/horn studies essential. It's important to work with a "typical" when doing filter fine tuning, then verify with multiples for confidence in the outcome. That also gives me the data to match pairs for systems, and a baseline for comparison to unknowns.
[And finds my defectives.... ]
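One way to picture the pair matching and the hunt for defectives is sketched below. It assumes each unit has been measured on the same frequency grid, scores every pair by RMS difference in dB, and compares each unit against the median curve of the batch. The unit labels, the helper names, and the five-point curves are invented purely for illustration and are not measurements; the metric is one simple choice among many.

```python
import itertools
import math
import statistics

def rms_difference_db(curve_a, curve_b):
    """RMS of the point-by-point dB difference between two curves."""
    squared = [(a - b) ** 2 for a, b in zip(curve_a, curve_b)]
    return math.sqrt(sum(squared) / len(squared))

def closest_pairs(units):
    """Return (rms_db, name_a, name_b) tuples, best match first."""
    scored = [(rms_difference_db(units[a], units[b]), a, b)
              for a, b in itertools.combinations(sorted(units), 2)]
    return sorted(scored)

def deviation_from_median(units):
    """RMS distance of each unit from the batch's median curve."""
    median_curve = [statistics.median(column) for column in zip(*units.values())]
    return {name: rms_difference_db(curve, median_curve)
            for name, curve in units.items()}

# Hypothetical units, dB SPL at five shared frequencies (made-up numbers).
units = {
    "A": [90.0, 91.0, 92.0, 91.0, 89.0],
    "B": [90.2, 91.1, 91.9, 91.2, 89.1],
    "C": [89.0, 90.5, 92.5, 90.0, 88.0],
    "D": [90.1, 85.0, 92.0, 91.1, 89.0],  # deep notch at the second point
}

best_rms, first, second = closest_pairs(units)[0]
deviations = deviation_from_median(units)
suspect = max(deviations, key=deviations.get)

print(f"best matched pair: {first} and {second} ({best_rms:.2f} dB RMS)")
print(f"largest departure from the batch median: {suspect}")
```

Here A and B come out as the best pair and D stands out from the batch because of its notch. The median is used as the baseline so that a single bad unit does not drag the reference curve toward itself.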