Overview
triton-ops is a collection of triton-x kernels designed for LLM training. It can increase multi-NPU/GPU/MLU training throughput and reduce memory usage.