
Tuesday, January 1, 2019

Accelerated Computing


2. Atomic Operations (warp vote)

CUDA Pro Tip: Optimized Filtering with Warp-Aggregated Atomics (20141001)
https://devblogs.nvidia.com/cuda-pro-tip-optimized-filtering-warp-aggregated-atomics/

GPU Pro Tip: Fast Histograms Using Shared Atomics on Maxwell (20150317)

Accelerating Dissipative Particle Dynamics Simulation on Tesla GPUs (20150416)
https://devblogs.nvidia.com/accelerating-dissipative-particle-dynamics-simulation-tesla-gpus/

Voting and Shuffling to Optimize Atomic Operations (20150806)




Instructions

Read: Warp Vote Functions


CUDA Warp Shuffle Explained in Detail

CUDA's Shuffle Technique and a Custom Double-Precision Version

__shfl_down and __shfl_down_sync give different results
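
The warp-aggregated atomics technique from the posts above can be sketched as follows. This is a minimal illustration, assuming the CUDA 9+ `*_sync` intrinsics; the helper name `atomicAggInc` and the filtering kernel follow the pattern described in the Pro Tip post, not a verbatim copy:

```cuda
// Warp-aggregated atomic increment: one atomicAdd per warp
// instead of one per thread. Sketch only; assumes CUDA 9+.
__device__ int atomicAggInc(int *ctr) {
    unsigned mask = __activemask();           // lanes currently active
    int leader = __ffs(mask) - 1;             // lowest active lane leads
    int lane = threadIdx.x & 31;
    int warp_base;
    if (lane == leader)
        warp_base = atomicAdd(ctr, __popc(mask));     // one atomic per warp
    warp_base = __shfl_sync(mask, warp_base, leader); // broadcast base offset
    // each lane's offset = number of active lanes below it
    return warp_base + __popc(mask & ((1u << lane) - 1));
}

// Example use: stream compaction of positive values
__global__ void filter_positive(const int *src, int *dst, int *nres, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && src[i] > 0)
        dst[atomicAggInc(nres)] = src[i];
}
```

The vote (`__activemask`/`__popc`) picks a leader and counts participants; the shuffle broadcasts the reserved base index back to every lane, which is the combination the "Voting and Shuffling" post optimizes.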

3. Unified Memory

Beyond GPU Memory Limits with Unified Memory on Pascal (20161214)
https://devblogs.nvidia.com/beyond-gpu-memory-limits-unified-memory-pascal/
Unified Memory for CUDA Beginners (20170619)

Maximizing Unified Memory Performance in CUDA (20171119)
https://devblogs.nvidia.com/maximizing-unified-memory-performance-cuda/
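
The core idea in the Unified Memory posts above: `cudaMallocManaged` returns one pointer valid on both host and device, with pages migrating on demand (via GPU page faults on Pascal and later). A minimal sketch; the kernel name `add` and the sizes are illustrative:

```cuda
#include <cuda_runtime.h>

__global__ void add(int n, float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = x[i] + y[i];
}

int main() {
    int n = 1 << 20;
    float *x, *y;
    // Managed allocations: no explicit cudaMemcpy needed.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    add<<<(n + 255) / 256, 256>>>(n, x, y);
    cudaDeviceSynchronize();   // required before the host touches y again

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```
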

4. C++11

The Power of C++11 in CUDA 7 (20150318)
https://devblogs.nvidia.com/power-cpp11-cuda-7/

C++11 in CUDA: Variadic Templates (20150326)

CUDA 7 Release Candidate Feature Overview: C++11, New Libraries, and More (20150113)
https://devblogs.nvidia.com/cuda-7-release-candidate-feature-overview/
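
As the posts above describe, CUDA 7 enabled C++11 features such as `auto` and variadic templates in device code. A short sketch; the `sum` helper here is a made-up example of the recursive variadic pattern, not code from the posts:

```cuda
// C++11 in device code (CUDA 7+, compile with -std=c++11).
template <typename T>
__device__ T sum(T v) { return v; }           // base case

template <typename T, typename... Args>
__device__ T sum(T first, Args... rest) {     // variadic recursion
    return first + sum(rest...);
}

__global__ void kernel(int *out) {
    auto a = 1, b = 2, c = 3;                 // C++11 auto in a kernel
    *out = sum(a, b, c);                      // pack expands at compile time
}
```
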

5. COOPERATIVE GROUPS

Cooperative Groups: Flexible CUDA Thread Programming (20171004)
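
The Cooperative Groups model from the post above makes the thread group an explicit parameter instead of an implicit warp. A single-block sketch assuming CUDA 9+; the names `tile_reduce` and `block_sum` are illustrative:

```cuda
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// Reduction over an explicit 32-thread tile.
__device__ int tile_reduce(cg::thread_block_tile<32> tile, int val) {
    for (int offset = tile.size() / 2; offset > 0; offset /= 2)
        val += tile.shfl_down(val, offset);  // group-scoped shuffle
    return val;
}

__global__ void block_sum(const int *in, int *out) {
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<32> tile = cg::tiled_partition<32>(block);
    int v = tile_reduce(tile, in[block.thread_rank()]);  // single-block sketch
    if (tile.thread_rank() == 0)
        atomicAdd(out, v);  // one atomic per 32-thread tile
}
```

Passing the tile explicitly (rather than relying on implicit warp synchrony) is the safety point the post emphasizes.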

7. Other

CUDA Spotlight: Michela Taufer on GPU-Accelerated Scientific Computing (20140821)
https://devblogs.nvidia.com/cuda-spotlight-michela-taufer-gpu-accelerated-scientific-computing/



Register Cache: Caching for Warp-Centric CUDA Programs (20171012)



To be added in the future:
Streams, zero-copy memory, texture objects, PTX (parallel thread execution) assembly, warp-level vote/shuffle
