- Questions: Clarifying the use of FP8 for Training #99
- Question(weight quantization): Why not write out fp8 after performing weight update?
- AdamW FP8 optimizer CUDA code
Aug 03, 2025 · 1 min read