This benchmark is pretty useless.
There's no benchmark code shown, so there's no way to check whether they're doing something wrong or cherry-picking.
Also, they only tested on arm64? Why weren't they testing on x86-64?
And what the hell is this test?
"Common data transformation pattern: map, then aggregate with reduce."
People who care about performance are using loops, not map(). Why are you even testing the slow path?
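For anyone unclear on what "map then reduce" versus "just use a loop" means here, a rough sketch. This assumes a JavaScript/TypeScript-style array API (the benchmark never says what it's running), and the doubling step and sample array are made up for illustration. The map version allocates an intermediate array and walks the data twice; the loop does it in one pass with no allocation. Whether that actually makes the loop faster depends on the runtime and JIT, which is exactly why the missing benchmark code matters.

```typescript
// Hypothetical sample input; any numeric array works.
const values: number[] = [1, 2, 3, 4, 5];

// "Map then aggregate with reduce": map builds a new array,
// then reduce walks that array a second time to sum it.
function mapReduce(xs: number[]): number {
  return xs.map((x) => x * 2).reduce((acc, x) => acc + x, 0);
}

// Plain loop: a single pass, no intermediate array, no per-element callbacks.
function loopSum(xs: number[]): number {
  let acc = 0;
  for (let i = 0; i < xs.length; i++) {
    acc += xs[i] * 2;
  }
  return acc;
}

console.log(mapReduce(values), loopSum(values)); // 30 30
```

Both compute the same result; the only question a benchmark should answer is what the extra array and second pass cost on a given runtime.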