1. Introduction
An Even Easier Introduction to CUDA (20170125)
https://devblogs.nvidia.com/even-easier-introduction-cuda/
CUDA: From Getting Started to Mastery (CUDA從入門到精通)
https://blog.csdn.net/Augusdi/article/details/12833235
GPUS Lady
https://cloud.tencent.com/developer/user/1539448
CUDA Pro Tip: Optimized Filtering with Warp-Aggregated Atomics (20141001)
https://devblogs.nvidia.com/cuda-pro-tip-optimized-filtering-warp-aggregated-atomics/
GPU Pro Tip: Fast Histograms Using Shared Atomics on Maxwell (20150317)
Accelerating Dissipative Particle Dynamics Simulation on Tesla GPUs (20150416)
https://devblogs.nvidia.com/accelerating-dissipative-particle-dynamics-simulation-tesla-gpus/
Voting and Shuffling to Optimize Atomic Operations (20150806)
__shfl_down and __shfl_down_sync give different results
Using CUDA Warp-Level Primitives (20180115)
CUDA C Programming Guide: Warp Vote Functions
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-vote-functions
DPD slides
http://on-demand.gputechconf.com/gtc/2014/presentations/S4518-dissipative-particle-dymanics-sims-kepler.pdf
warp shuffles, and reduction / scan operations
Reading: Warp Vote Functions
Reading: Warp Shuffle Functions
CUDA Warp Shuffle Explained in Detail
CUDA's Shuffle Technique and a Custom Double-Precision Version
3. Unified Memory
Beyond GPU Memory Limits with Unified Memory on Pascal (20161214)
https://devblogs.nvidia.com/beyond-gpu-memory-limits-unified-memory-pascal/
Unified Memory for CUDA Beginners (20170619)
Maximizing Unified Memory Performance in CUDA (20171119)
https://devblogs.nvidia.com/maximizing-unified-memory-performance-cuda/
4. C++11
The Power of C++11 in CUDA 7 (20150318)
https://devblogs.nvidia.com/power-cpp11-cuda-7/
C++11 in CUDA: Variadic Templates (20150326)
CUDA 7 Release Candidate Feature Overview: C++11, New Libraries, and More (20150113)
https://devblogs.nvidia.com/cuda-7-release-candidate-feature-overview/
5. COOPERATIVE GROUPS
Cooperative Groups: Flexible CUDA Thread Programming (20171004)
https://devblogs.nvidia.com/cooperative-groups/
CUDA 9 AND BEYOND
https://drive.google.com/file/d/1YipovGErr3mfCBG3dqlaAgfUcFt__BCd/view?usp=sharing
Cooperative Groups
https://drive.google.com/file/d/13eN5flsds307eIIAtwBNAq8BScscoyOe/view?usp=sharing
6. Stream
How to Overlap Data Transfers in CUDA C/C++
https://devblogs.nvidia.com/how-overlap-data-transfers-cuda-cc/
7. Other
CUDA Spotlight: Michela Taufer on GPU-Accelerated Scientific Computing (20140821)
https://devblogs.nvidia.com/cuda-spotlight-michela-taufer-gpu-accelerated-scientific-computing/
Register Cache: Caching for Warp-Centric CUDA Programs (20171012)
To be added in the future:
Streams, zero-copy memory, texture objects, PTX (parallel thread execution) assembly, warp-level vote/shuffle