

  • Oliverda

    demigod

    reply to Oliverda's message #6594

    Since SMT, or more precisely HT, has come up, here is a test in which they try to compare the HT efficiency of Netburst and Nehalem:

    Pentium XE vs Core i3 Hyper-Threading review
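
    To make the HT comparison concrete, here is a minimal SMT-scaling probe in the spirit of that review (my own sketch, not code from the linked article): it times one CPU-bound thread against two threads pinned to what I assume are HT siblings. The CPU numbers 0 and 1 are an assumption; check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list on the machine first. Build with gcc -O2 -pthread on Linux.

        /* Minimal SMT-scaling probe (Linux, GCC).  Assumes logical CPUs 0 and 1
         * are HT siblings; verify against the topology files before trusting it. */
        #define _GNU_SOURCE
        #include <pthread.h>
        #include <sched.h>
        #include <stdint.h>
        #include <stdio.h>
        #include <time.h>

        #define ITERS 200000000ULL

        /* Integer kernel with a long dependency chain: one thread cannot saturate
         * the core, so an SMT sibling has spare issue slots to use. */
        static void *work(void *arg)
        {
            int cpu = *(int *)arg;
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(cpu, &set);
            pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

            volatile uint64_t x = 1;
            for (uint64_t i = 0; i < ITERS; i++)
                x = x * 6364136223846793005ULL + 1442695040888963407ULL;
            return NULL;
        }

        static double run(int nthreads, int *cpus)
        {
            struct timespec t0, t1;
            pthread_t tid[2];
            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (int i = 0; i < nthreads; i++)
                pthread_create(&tid[i], NULL, work, &cpus[i]);
            for (int i = 0; i < nthreads; i++)
                pthread_join(tid[i], NULL);
            clock_gettime(CLOCK_MONOTONIC, &t1);
            return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        }

        int main(void)
        {
            int siblings[2] = {0, 1};      /* assumed HT sibling pair */
            double one = run(1, siblings); /* one thread on one logical CPU */
            double two = run(2, siblings); /* two threads on the same core  */
            /* Twice the work is done in the two-thread run, so the ratio below
             * is the effective HT throughput gain on this kernel. */
            printf("HT throughput gain: %.2fx\n", 2.0 * one / two);
            return 0;
        }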

    And here is a bit of speculation:

    Llano core: potentially superior to Nehalem? Clearly... yes, I believe so. It gets 32B fetches with (if the announced fetch improvements materialize) branch/loop handling similar to Nehalem's, much better execution thanks to the clustered design and 6 ports for the execution units, and an enlarged instruction window (ROB) on par with Nehalem (from 72 to 84 entries, roughly a 17% increase), along with many tweaks all over; how difficult would it be to push that to +20% performance over Deneb? Memory fill is also improved (a lot?), with forwarding schemes and disambiguation close to or on par with Nehalem. Yet it is clearly bottlenecked at decode with only 3 decode slots (if only it could borrow the out-of-order decode slots of BD!). On the other hand, its "balanced" and clustered nature should make it much more clock-friendly than Nehalem. I wouldn't be surprised if the gain were close to 20% core to core at the same clock versus K10, also considering "turbo" and power management. It should mark the end of the narrower (3-wide) but faster single-thread cores; still, I hope we will see an improved Llano2...
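
    For what it's worth, a crude way to see the kind of front-end/issue-width ceiling being argued here is to time a burst of independent single-cycle adds: on a 3-wide K10/Llano-class core the sustained rate should top out near 3 per cycle, and higher on wider designs. This is only a sketch under several assumptions (x86-64, GCC/Clang inline asm; rdtsc counts reference cycles, so pin the thread and disable turbo/powersave to get a usable number):

        /* Rough probe of sustained independent ALU ops per cycle (x86-64, GCC/Clang). */
        #include <stdint.h>
        #include <stdio.h>
        #include <x86intrin.h>

        #define ITERS 100000000ULL

        int main(void)
        {
            uint64_t a = 0, b = 0, c = 0, d = 0, e = 0, f = 0;
            uint64_t start = __rdtsc();
            for (uint64_t i = 0; i < ITERS; i++) {
                /* Six independent single-cycle adds; the inline asm keeps the
                 * compiler from folding or vectorizing them away. */
                __asm__ volatile(
                    "add $1, %0\n\t"
                    "add $1, %1\n\t"
                    "add $1, %2\n\t"
                    "add $1, %3\n\t"
                    "add $1, %4\n\t"
                    "add $1, %5\n\t"
                    : "+r"(a), "+r"(b), "+r"(c), "+r"(d), "+r"(e), "+r"(f));
            }
            uint64_t cycles = __rdtsc() - start;
            /* The loop's own increment/compare/branch adds a couple of instructions
             * per iteration, so this slightly understates the true rate. */
            printf("~%.2f independent adds per cycle\n", 6.0 * ITERS / (double)cycles);
            return 0;
        }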

    Bulldozer: truly advanced. Yet IMHO it is bottlenecked at decode, with only 4 decode slots (albeit in an advanced out-of-order design) for a potential 2 threads. "If" there are only 2 ALUs per core/cluster, it should also be bottlenecked at execution, considering that the 2 core clusters are independent and never work together.

    With 2 L1 I-cache blocks interleaving accesses consecutively (that seems to be the rumor now), this processor should be a fetch monster, and if it adds "value prediction" with solid re-execution mechanisms on top of a much better memory fill and all the other techniques that speed up the back end, it should really be a "throughput" monster... yet that decode bottleneck and the narrow per-thread execution help nothing. As it stands, BD screams for macro-op and µop fusion (it has the same problems as Core 2 and Nehalem).

    Clock for clock it should be in the same league as the Llano core in single-thread execution (perhaps a "strategic" decision... so as not to trash Llano sales?), perhaps a tiny bit better, yet clearly inferior to SB in single-thread execution with its 3 ALUs and 6 ports for 2 threads ("if" there are only 2 ALUs per core/cluster in BD). But OTOH it "could" be a speed monster, compensating that way for its narrower fixed-point (INT) side. With just 1 more decode slot at the front end and 1 more ALU in each core/cluster, this processor would completely bury SB, if not in single-thread then in multithreading with both cores active; that is, firing 2 threads in the same module would then not hurt fixed-point (INT) execution in either core/cluster, as it seems it might now...
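
    The "would firing 2 threads in the same module hurt INT execution?" question lends itself to a rough placement experiment: run the same fixed-point kernel on two threads pinned to one module and then to two different modules, and compare the wall times. A minimal sketch, assuming Linux with GCC/libgomp and that logical CPUs 0/1 share a Bulldozer module while 0/2 sit in different ones (verify against the real topology of the chip):

        /* Build: gcc -O2 -fopenmp module_probe.c
         * Run twice with different thread placement (CPU numbers are assumptions):
         *   GOMP_CPU_AFFINITY="0 1" ./a.out   # same module: shared fetch/decode
         *   GOMP_CPU_AFFINITY="0 2" ./a.out   # separate modules
         */
        #include <omp.h>
        #include <stdint.h>
        #include <stdio.h>

        #define ITERS 100000000ULL

        /* Four independent integer multiply-add chains per thread, so each
         * integer cluster has parallel fixed-point work to chew on. */
        static uint64_t kernel(uint64_t seed)
        {
            uint64_t a = seed, b = seed + 1, c = seed + 2, d = seed + 3;
            for (uint64_t i = 0; i < ITERS; i++) {
                a = a * 6364136223846793005ULL + 1;
                b = b * 6364136223846793005ULL + 3;
                c = c * 6364136223846793005ULL + 5;
                d = d * 6364136223846793005ULL + 7;
            }
            return a ^ b ^ c ^ d;
        }

        int main(void)
        {
            uint64_t sink = 0;
            double t0 = omp_get_wtime();
        #pragma omp parallel num_threads(2) reduction(^ : sink)
            sink ^= kernel((uint64_t)omp_get_thread_num() + 1);
            double t1 = omp_get_wtime();
            /* If the same-module run is noticeably slower, the shared front end
             * (4 decode slots feeding 2 threads) is the likely suspect, since
             * each thread has its own integer cluster. */
            printf("2 threads: %.3f s (sink=%llx)\n", t1 - t0, (unsigned long long)sink);
            return 0;
        }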

    The same, this time about Sandy Bridge:

    An evolution of Nehalem, a more elaborate tick rather than a tock. Yet it addresses one of Nehalem's more fundamental execution bottlenecks with 6 ports for the execution units. The rest should be further tweaked with Hyper-Threading in mind. "If" the front end has separate L1 I-cache blocks for the 2 simultaneous threads, it should also ease the 16B fetch bottleneck on single-thread code; it remains to be seen whether it can reach 20% over Nehalem, which I doubt... nevertheless it should provide a good boost over Nehalem. Zeroing idioms should open the door to FMAC execution through compiler-oriented "code transformations", and there lies a serious advantage, if for nothing else then in benchmarketing with Intel-compiled code.
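
    On the zeroing-idiom point: an xor-with-itself (which is what _mm_setzero_ps() compiles to) is recognized by recent renamers as having no input dependency, which is what makes compiler transformations of the "a*b + 0" kind cheap in the first place. A minimal sketch of the idea in SSE intrinsics; this is my illustration of the concept, not anything confirmed about Sandy Bridge internals:

        /* Zeroing-idiom sketch: the zero operand below comes from xorps xmm,xmm,
         * which the renamer treats as dependency-breaking, so padding a multiply
         * into a multiply-add form costs very little. */
        #include <immintrin.h>
        #include <stdio.h>

        static __m128 mul_with_zero_add(__m128 a, __m128 b)
        {
            __m128 zero = _mm_setzero_ps();             /* xorps xmm, xmm */
            return _mm_add_ps(_mm_mul_ps(a, b), zero);  /* a * b + 0      */
        }

        int main(void)
        {
            float out[4];
            __m128 r = mul_with_zero_add(_mm_set1_ps(3.0f), _mm_set1_ps(2.0f));
            _mm_storeu_ps(out, r);
            printf("%f %f %f %f\n", out[0], out[1], out[2], out[3]);
            return 0;
        }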

    "Minden negyedik-ötödik magyar funkcionális analfabéta – derült ki a nemzetközi felmérésekből."
