
  • S_x96x_S

    addict

    reply to Petykemano's post #3555

    > You mean ARM's SVE - beyond the fact that its bit width is
    > dynamic/adjustable - in what way is it better?
    > On the ARM side, SVE gets idolized.

    On the ARM side you don't have to recompile the software:
    if you write it for 2048 bits, the hardware behind it can be 128-bit or even 2048-bit.

    In other words, for the next five years the hardware scales the software automatically... there won't be the x86 situation where everything has to be recompiled for AVX-512, then AVX-1024, then AVX-2048...
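    The vector-length-agnostic idea can be sketched in portable C. This is not real SVE code: `vector_width_words()` is a made-up stand-in for SVE's `svcntw` instruction, which reports the hardware's vector width at run time, and the inner lane loop plays the role of a predicated SVE operation. The point is that the width is queried, not compiled in, so the same binary would work on narrow and wide implementations alike.

    ```c
    /* Portable sketch of SVE's vector-length-agnostic (VLA) loop model.
       Hypothetical stand-in for svcntw(): pretend the hardware reports
       4 32-bit lanes (i.e. a 128-bit implementation). */
    #include <stddef.h>
    #include <stdio.h>

    static size_t vector_width_words(void) { return 4; }

    static void add_arrays(const int *a, const int *b, int *out, size_t n) {
        size_t vw = vector_width_words();   /* queried at run time */
        for (size_t i = 0; i < n; i += vw) {
            /* The lane bound masks off elements past n, like SVE's
               whilelt predicate does for the loop tail. */
            size_t lanes = (n - i < vw) ? (n - i) : vw;
            for (size_t lane = 0; lane < lanes; lane++)
                out[i + lane] = a[i + lane] + b[i + lane];
        }
    }

    int main(void) {
        int a[6] = {1, 2, 3, 4, 5, 6};
        int b[6] = {10, 20, 30, 40, 50, 60};
        int out[6];
        add_arrays(a, b, out, 6);
        for (int i = 0; i < 6; i++)
            printf("%d ", out[i]);
        printf("\n");
        return 0;
    }
    ```

    With real SVE intrinsics (`svwhilelt_b32` plus `svld1`/`svst1`) the hardware supplies the width and the predicate; here they are simulated, but the loop structure is the same regardless of how wide the vectors turn out to be.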

    And on the Apple side this is a big advantage...
    because Apple will support SVE2.

    There will be a weak core... which runs the 2048-bit instructions on 128-bit hardware...
    and there will be a strong core... with 1024-bit hardware...
    and on the ARM side moving between the two is simple: software compatibility is preserved, so it's easy to put together a hybrid CPU.

    ------------------------------------------
    whereas right now on the Intel side, in hybrid CPUs, the strong and the weak core are two separate implementations both in hardware and in software... the weak core can't do AVX-512, so it has to be disabled on the strong core too,
    since homogeneity of the instruction set matters...
    ... pain and hair-pulling ....
    As AT wrote:
    https://www.anandtech.com/show/15877/intel-hybrid-cpu-lakefield-all-you-need-to-know/5

    "
    The hair-pulling out moment occurs when a processor has two different types of CPU core involved, and there is the potential for each of them to support different instructions or commands. Typically the scheduler makes no guarantee that software will run on any given core, so for example if you had some code written for AVX-512, it would happily run on an AVX-512 enabled core, but cause a critical fault on a core that doesn’t have AVX-512. The core won’t even know it’s an AVX-512 instruction until it comes time to decode it, and just throw an error when that happens. Not only this, but the scheduler has the right to move a thread when it needs to – if it moves a thread in the middle of an instruction stream, that can cause errors too. The processor could also move a thread to prevent thermal hotspots occurring, which will then cause a fault.

    There could be a situation where the programmer can flag that their code has specific instructions. In a program with unique instructions, there’s very often a check that tries to detect support, in order to say to itself something like ‘AVX512 will work here!’. However, all modern software assumes a homogeneous processor – that all cores will support all of the same instructions.

    It becomes a very chicken and egg problem, to a certain degree.

    The only way out of this is that both processors in a hybrid CPU have to support the same instructions completely. This means that we end up with the worst of both worlds – only instructions supported by both can be enabled. This is the lowest common denominator of the two, and means that in Lakefield we lose support for AVX-512 on Sunny Cove, but also things like GFNI, ENCLV, and CLDEMOTE in Tremont (Tremont is actually rather progressive in its instruction support)."
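    The "check that tries to detect support" described in the quote can be sketched with `__builtin_cpu_supports`, a GCC/Clang builtin for x86 targets (the dispatch labels below are illustrative, not any real library's API). On a heterogeneous CPU this pattern is exactly what breaks: the probe runs once, but the scheduler may later migrate the thread to a core that lacks the feature.

    ```c
    /* One-time feature probe + dispatch, as typical AVX-512 software
       does it. Assumes GCC/Clang; falls back cleanly elsewhere. */
    #include <stdio.h>

    int main(void) {
    #if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
        __builtin_cpu_init();               /* populate CPU feature data */
        if (__builtin_cpu_supports("avx512f"))
            printf("dispatch: AVX-512 path\n");
        else
            printf("dispatch: fallback path\n");
    #else
        /* non-x86 build: nothing to probe for */
        printf("dispatch: fallback path\n");
    #endif
        return 0;
    }
    ```

    Software that takes the AVX-512 branch here assumes every core it may ever be scheduled on supports it, which is why Lakefield had to disable AVX-512 on Sunny Cove entirely.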

  • S_x96x_S

    addict

    reply to Petykemano's post #3555

    > Can the reason for the antipathy be summed up in a few sentences?

    In my interpretation, fragmentation is its biggest problem
    .... the many AVX-512 variants
    https://en.wikichip.org/wiki/x86/avx-512#Implementation
    ... which are hard to support ...
    and compared with ARM's SVE2 .. AVX-512 is a kludge...
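    A hint of why supporting the variants is hard: software has to probe the AVX-512 subsets one by one, and each CPU generation ships a different mix. A minimal sketch with GCC/Clang's `__builtin_cpu_supports` (subset names as GCC spells them; prints "n/a" on non-x86 builds):

    ```c
    /* Probe a handful of AVX-512 subsets individually - there is no
       single "has AVX-512" bit that covers them all. */
    #include <stdio.h>

    #if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    #define PROBE(name) \
        printf("%-12s %s\n", name, __builtin_cpu_supports(name) ? "yes" : "no")
    #else
    #define PROBE(name) printf("%-12s %s\n", name, "n/a")
    #endif

    int main(void) {
    #if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
        __builtin_cpu_init();   /* populate CPU feature data */
    #endif
        PROBE("avx512f");       /* foundation */
        PROBE("avx512cd");      /* conflict detection */
        PROBE("avx512bw");      /* byte/word ops */
        PROBE("avx512dq");      /* doubleword/quadword ops */
        PROBE("avx512vl");      /* 128/256-bit vector lengths */
        PROBE("avx512vbmi");    /* vector byte manipulation */
        return 0;
    }
    ```

    And this is only a fraction of the subsets the wikichip page lists; SVE2, by contrast, is one feature to detect.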

    ARM's SVE2, though a late arrival... is more thoroughly thought out than Intel's improvisation - and it scales better.. from mobile phones --- to ARM-based HPC .. one instruction set ... that can be used anywhere ...

    He elaborated on it further in a later e-mail ...

    -------------------------------
    "Now, that said, do I hate MMX/SSE/AVX/AVX2 with the same burning passion as AVX512? No. Because there's a big difference between them.

    MMX/SSE was a first-attempt (plus fixes). The i387 was a particularly nasty thing to be compatible with anyway, it's entirely understandable why it was done the way it was done. In hindsight, maybe it could have been done better, but a "in hindsight" argument is always complete BS. So that's not a valid argument. MMX/SSE was fine.

    AVX/AVX2 were reasonable cleanups and honestly, I don't think 256 bits is a huge pain even as a baseline. And Intel has been good about keeping AVX always there. Afaik, new CPU's really have gotten AVX reliably. So it hasn't been a fragmentation issue, and while I think it has the same state dirtying issue ("helper function using MMX instructions and saves/restores the instructions it modifies will be clearing upper bits in AVX registers and trashing state"), I think it was a fairly reasonable extension.

    So again, AVX/AVX2 was fine. Was it "lovely"? No. But I think it's a reasonable baseline.

    So what's different with AVX512?

    One fundamental difference is that fragmentation issue. It came up before AVX512 was even out, with the failed multi-core Knights atoms having completely different versions. But it's really been obvious lately, with even today, in CPU's being sold, it being a "marketing feature".

    But the other - and to me really annoying - fundamental issue is "by now, you should have damn well have learnt from your mistakes".

    Here, look at the real competition for Intel and x86 long-term: ARM. They had an equally disgusting and horrendously bad FPU situation originally. Yes, their FPU situation was differently bad from the i387, but the whole soft-FP vs VFP vs random other implementations was arguably worse than Intel ever had, even if at the time, you would find the usual ARM fanbois that made excuses for just how horrendous the situation was.

    But then ARM got their act together, and NEON happened. I'd say that was roughly the equivalent to SSE, because I'll call the original mess of nasty shit comparable to the nofp/i387/IBM-mis-wiring-the-exception-pin/MMX era. The timing may not line up, but with NEON, ARM at least had gotten rid of their messy lack of standards, and I think it's fair to compare it to Intel and SSE conceptually.

    So ARM did SVE, and I'll call that their AVX/AVX2. But now you see signs of differences. Part of it is just the name. "S" for "Scalable". ARM is starting to do something interesting and fundamentally different from what AVX was for Intel.

    And then ARM designed SVE2, and again, let's see how it actually plays out in real life, but I think it has the potential to be their "AVX512 done right". And they designed it to have a reasonable downgrade/upgrade path, to be extensible, to do that masking and memory accesses etc that is so important for compilers to auto-parallelize.

    Honestly, if I were into HPC and vectorization, I'd be all in on the ARM bandwagon.

    As it happens, I'm not into HPC and vectorization, and it's possible that exactly because I'm not into it, I'm missing why SVE2 has some horrible problems. And I realize that AVX512 does some things that a very very very small minority of people care deeply about (I don't know why, but some people really love the shuffle instructions and will put up with absolutely anything if they get them).

    So just as a bystander, I'm looking at AVX512, and I'm looking at SVE2, and I'm going "AVX512 really is nasty, isn't it"?

    And by now it's the third big generation, and the "it wasn't clear what the right answer was" is no longer an excuse for doing things wrong. People knew that scaling up and down the CPU stack was an issue. This wasn't something where Intel couldn't have seen it coming - when Intel was designing AVX512, Intel was still trying to also enter the smartphone and IoT area.

    Have I sufficiently explained why I absolutely despise AVX512?

    And yes, maybe in five years, AVX512 is there everywhere and my fragmentation argument goes away.

    But maybe in five years, SVE2 is everywhere too, and is happily working in cellphones and in supercomputers, and I think I won't be the only person in the room that says "AVX512 is a butt-ugly disgrace".

    We'll see, even if it might take years. I'm happy to be proven wrong.

    And I'm here for the heated technical discussion anyway. Tell me why I'm a pinhead and a nincompoop, and why SVE2 is so bad, and why AVX512 is clearly better.

    Because this forum is about architecture design and implementation, isn't it? So I think it's very fair to put down that gauntlet: AVX512 vs SVE2. "Gong plays" - FIGHT!

    Linus"
    https://www.realworldtech.com/forum/?threadid=193189&curpostid=193248
