  • P.H.

    senior member

    Under "Correctly Synchronized C++ AMP Programs":

    All the threads that are launched by a parallel_for_each are potentially concurrent. Unless barriers are used, an implementation is free to schedule these threads in any order. In addition, the memory model for normal memory accesses is weak; that is, operations can be arbitrarily reordered as long as each thread executes in its original program order. Therefore, any two memory operations from any two threads in a parallel_for_each are by default concurrent, unless the application has explicitly enforced an order between these two operations by using atomic operations, fences, or barriers.

    Conversely, an implementation may also schedule only one logical thread at a time, in a non-cooperative manner; that is, without letting any other threads make any progress except for hitting a tile barrier or terminating. When a thread encounters a tile barrier, an implementation must wrest control from that thread and provide progress to some other thread in the tile until they all have reached the barrier. Similarly, when a thread finishes execution, the system is obligated to execute steps from some other thread. Therefore, an implementation is obligated to switch context between threads only when a thread has hit a barrier (barriers pertain just to the tiled parallel_for_each), or is finished. An implementation does not have to admit any concurrency at a finer level than that which is dictated by barriers and thread termination. All implementations, however, are obligated to ensure that progress is continually made, until all threads that are launched by a parallel_for_each are completed.

    An immediate corollary is that C++ AMP does not provide a mechanism that a thread could use, without using tile barriers, to poll for a change that has to be effected by another thread. In particular, C++ AMP does not support locks that are implemented by using atomic operations and fences, because a thread could end up polling forever, while waiting for a lock to become available. The usage of tile barriers enables the creation of a limited form of locking that is scoped to a thread tile.
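
    A minimal C++ AMP sketch of the first point above (my own example, not taken from the quoted text): without the atomic operation, the increments coming from different threads would be plain concurrent memory accesses with no defined order, and the result would be undefined.

    #include <amp.h>
    using namespace concurrency;

    // Counts the positive elements of 'input' into counter[0]. A plain
    // counter[0]++ in the kernel would race; atomic_fetch_add is what
    // enforces an order between the threads' updates.
    void count_positives(array_view<const float, 1> input,
                         array_view<int, 1> counter)
    {
        parallel_for_each(input.extent, [=](index<1> idx) restrict(amp)
        {
            if (input[idx] > 0.0f)
                atomic_fetch_add(&counter[0], 1);
        });
    }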

    I don't much like taking a negative attitude toward a feature that is actually being realized, but this one is "nice": it cannot tell several consecutive parallel_for_each calls apart; that has to be handled separately in the program (by applying a tile barrier or a thread-finished counter)... It is already a bad sign when the programming environment does not synchronize automatically after each "loop" has run.

    Informative: More often than not, such non-deterministic locking within a tile is not really necessary, because a static schedule of the threads that is based on integer thread IDs is possible, and results in more efficient and more maintainable code. But we bring this example here for completeness and to illustrate a valid form of polling.
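
    A hedged sketch of that static, thread-ID-based schedule (my own example; it assumes the extent is an exact multiple of the tile size of 256): after the tile barrier, the thread with local ID 0 is statically chosen to do the serial part, so no locking or polling is needed.

    #include <amp.h>
    using namespace concurrency;

    // Per-tile sums: each thread loads one element into tile_static memory,
    // the barrier orders the loads before the reads, then thread 0 of the
    // tile accumulates them.
    void tile_sums(array_view<const float, 1> input,
                   array_view<float, 1> sums)   // one output slot per tile
    {
        parallel_for_each(input.extent.tile<256>(),
                          [=](tiled_index<256> t_idx) restrict(amp)
        {
            tile_static float local_data[256];
            local_data[t_idx.local[0]] = input[t_idx.global[0]];
            t_idx.barrier.wait();               // every thread of the tile reaches this

            if (t_idx.local[0] == 0)            // static schedule by thread ID
            {
                float s = 0.0f;
                for (int i = 0; i < 256; ++i) s += local_data[i];
                sums[t_idx.tile[0]] = s;
            }
        });
    }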

    Informative: This requirement, however, is typically not sufficient to allow for efficient implementations. For example, it allows for the call stack of threads to differ, when they hit a barrier. To be able to generate good quality code for vector targets, much stronger constraints should be placed on the usage of barriers, as explained later.

    Later:
    C++ AMP requires that, when a barrier is encountered by one thread:
    1. That the same barrier will be encountered by all other threads in the tile.
    2. That the sequence of active control flow statements and/or expressions be identical for all threads when they reach the barrier.
    3. That each of the corresponding control expressions be tile-uniform (which is defined below).
    4. That any active control flow statement or expression has not been departed (necessarily in a non-uniform fashion) by a break, continue, or return statement. That is, any breaking statement that instructs the program to leave an active scope must in itself behave as if it was a barrier; that is, it must adhere to the four preceding rules.

    In effect they have described the essence of SSE (Streaming SIMD Extensions): no branching, no conditional jumps (i.e. no skipping over code and continuing), just the same code executed for every element.
    This is a rather conservative and CPU-centric approach.
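
    A small sketch of the barrier rules quoted above (my own example; tile size 64, extent assumed divisible by it): a barrier guarded by a tile-uniform condition is fine, one guarded by a per-thread condition is not, because some threads of the tile would never reach it.

    #include <amp.h>
    using namespace concurrency;

    void barrier_uniformity(array_view<float, 1> data, int mode)  // 'mode' is the same for every thread
    {
        parallel_for_each(data.extent.tile<64>(),
                          [=](tiled_index<64> t_idx) restrict(amp)
        {
            tile_static float buf[64];
            buf[t_idx.local[0]] = data[t_idx.global[0]];

            if (mode == 0)                // tile-uniform control expression: allowed
                t_idx.barrier.wait();

            // Not allowed (left as a comment only): the condition below
            // diverges within the tile, so the barrier would violate the
            // rules above.
            // if (t_idx.local[0] < 32) t_idx.barrier.wait();

            data[t_idx.global[0]] = buf[t_idx.local[0]] * 2.0f;
        });
    }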

  • P.H.

    senior member

    in reply to LordX's post #27

    Predicated (conditional) instruction execution also contributes to this in a big way. But on GPU architectures that do not support it, for example, the situation becomes more complicated. Two examples:

    NV PTX ISA: "Instructions are formed from an instruction opcode followed by a comma-separated list of zero or more operands, and terminated with a semicolon. Operands may be register variables, constant expressions, address expressions, or label names. Instructions have an optional guard predicate which controls conditional execution. The guard predicate follows the optional label and precedes the opcode, and is written as @p, where p is a predicate register. The guard predicate may be optionally negated, written as @!p."

    ARM also supports conditional execution for a good part of its basic instructions: an instruction marked (via a predicate bit) is executed only if the result of some earlier instruction was, for example, zero or negative, and so on.
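
    A tiny C++ sketch of what such predication replaces (my own example): the branchy form needs a conditional jump, while the select form can be compiled to a compare plus a predicated or conditional-move instruction (PTX @p, ARM conditional execution, x86 CMOV), so every lane executes the same instruction stream.

    // Branchy form: on a scalar CPU this typically becomes a conditional jump.
    float clamp_low(float x, float lo)
    {
        if (x < lo)
            return lo;
        return x;
    }

    // Select form: no control flow, just a predicate feeding a select or a
    // conditional move.
    float clamp_low_select(float x, float lo)
    {
        return (x < lo) ? lo : x;
    }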

    AMD GCN:
    Scalar ALU operations:
    Compare (integer, bit, min/max), move and conditional move (select operations based on SCC)
    Does GCN support predicated instruction execution? Or only the approach that also exists on x86/x64?

    Each vendor is working toward its own foundations, while C++ AMP is a vendor-independent software approach; it is clearly in its interest to spread everywhere and onto everything. That would make it a rival not only of OpenCL, but of CUDA and HSA as well.
