mark_start()/mark_end() sometimes break autovectorization #30

Open
Seelengrab opened this issue Dec 2, 2022 · 1 comment

@Seelengrab
Collaborator

Seelengrab commented Dec 2, 2022

Adding mark_start() to the tight inner loop here:

@inline function scorep1(opp, me)
    isdraw = opp == me
    iswin  = (opp+0x1 == me) | (me+0x2 == opp)
    me + (0x3*isdraw) + (0x6*iswin)
end

@inline function scorep2(opp, target)
    mychoice = mod1(opp + mod1(target+0x1, 0x3), 0x3)
    mychoice + 0x3*(target-0x1)
end

solve(file::String) = solve(read(file))
function solve(data, f::F=scorep1) where F
    l = length(data)
    acc = UInt16(0)
    @inbounds @simd for idx in 1:4:l
        opp = data[idx + 0] - UInt8('A') + 0x1
        me  = data[idx + 2] - UInt8('X') + 0x1
        acc += f(opp, me)
    end
    acc
end

Breaks vectorization pretty badly. The generated code goes from happily using lots of xmm registers to only using eax & friends. I just wanted to know how much performance was still left on the table, which is kind of hard to do when the tool itself breaks the vectorization. I don't yet know how to fix it, so this issue is just here for tracking in general, but it ought to be possible to have our cake & eat it too here.
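For anyone wanting to check this without reading the assembly by eye, here is a rough sketch. It assumes that the presence of LLVM vector types such as `<8 x i16>` in the optimized IR is a good enough signal for "the loop vectorized". `uses_vector_types` and `simd_sum` are hypothetical names for illustration, not part of this package.

```julia
using InteractiveUtils  # provides code_llvm

# Hypothetical helper: true if the optimized LLVM IR of `f` for the given
# argument types mentions a fixed-width vector type like <8 x i16> or <4 x float>.
# This is a heuristic, not a proof that the hot loop itself was vectorized.
function uses_vector_types(f, types)
    ir = sprint(io -> code_llvm(io, f, types; debuginfo=:none))
    return occursin(r"<\d+ x (i\d+|float|double)>", ir)
end

# Small stand-in loop so the sketch runs on its own.
function simd_sum(xs::Vector{Float32})
    acc = 0f0
    @inbounds @simd for i in eachindex(xs)
        acc += xs[i]
    end
    return acc
end

println(uses_vector_types(simd_sum, (Vector{Float32},)))
```

Running the same check on `solve` with and without the marker in the loop body, e.g. `uses_vector_types(solve, (Vector{UInt8}, typeof(scorep1)))`, should show whether the marker is what flips the result.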

@Robert-j7

I tried a few random things: adding a nop @asmcall, calling @llvm.donothing(), and replacing @simd with LLVMLoopInfo. None of them worked. Then I found this in the LLVM documentation:

However, this interferes with optimizations like loop vectorization and may have an impact on the code generated. 
This is because the __asm statements are seen as real code having important side effects, which limits how the code around them can be transformed. 
If users want to make use of inline assembly to emit markers, then the recommendation is to always verify that the output assembly is equivalent to the assembly generated in the absence of markers.
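A minimal sketch of what that verification could look like from Julia, assuming that comparing the sequence of instruction mnemonics is an acceptable stand-in for "equivalent assembly". `mnemonics`, `same_instructions` and `plain_sum` are hypothetical names, and the line filtering is a guess at the usual `code_native` output layout rather than a real assembly parser.

```julia
using InteractiveUtils  # provides code_native

# Hypothetical helper: collect the instruction mnemonics of the native code
# generated for `f` with the given argument types.
function mnemonics(f, types)
    asm = sprint(io -> code_native(io, f, types; debuginfo=:none))
    ops = String[]
    for line in eachline(IOBuffer(asm))
        s = strip(line)
        # Skip blank lines, comment lines, and assembler directives / local labels.
        (isempty(s) || startswith(s, ";") || startswith(s, "#") || startswith(s, ".")) && continue
        tok = first(split(s))
        endswith(tok, ":") && continue  # skip symbol labels such as `julia_f_123:`
        push!(ops, String(tok))
    end
    return ops
end

# Two functions count as "the same" here if their mnemonic sequences match.
same_instructions(f, g, types) = mnemonics(f, types) == mnemonics(g, types)

# Small stand-in function so the sketch runs on its own.
function plain_sum(xs::Vector{Float32})
    acc = 0f0
    @inbounds @simd for i in eachindex(xs)
        acc += xs[i]
    end
    return acc
end

println(same_instructions(plain_sum, plain_sum, (Vector{Float32},)))
```

The marker instructions themselves would show up as a difference, so a real check would probably want to compare only the loop body or ignore the marker bytes.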

How should we go about implementing #31?
