Published in Better ML · Overcoming inter-node bandwidth bottleneck through Hybrid-FSDP · Optimize distributed training when inter-node bandwidth is the bottleneck · Jan 20
Published in Better ML · Teach an old dog new tricks: LLM continual pre-training (CPT) · Continue pre-training on new tasks => data- and compute-efficient models · Jan 3
Published in Better ML · Evaluating Pre-training Datasets · Better dataset evaluation => Robust datasets => Powerful models · Jan 1
Published in Better ML · Tackling OOM: Strategies for Reliable ML Training on Kubernetes · Tackle OOMs => reliable training => win! · Dec 28, 2024
Published in Better ML · ML training & Remote Direct Memory Access (RDMA) · Zero-copy data transfer => faster communication => larger models · Dec 28, 2024
Published in Better ML · Quantization Aware Training (QAT) vs. Post-Training Quantization (PTQ) · Smaller models => Faster inference => Better outcomes · Dec 25, 2024
Published in Better ML · Storage architecture for distributed training · In distributed training, multiple GPUs in a single node or across multiple nodes work together to train a single model. This requires massive… · Dec 23, 2024
Published in Better ML · Metrics for evaluating LLMs · Elo rating, BLEU, perplexity, and cross-entropy · Dec 23, 2024
Published in Better ML · The art of setting learning rate · The learning rate is a training hyperparameter that has a small positive value between 0.0 and 1.0 (e.g. 1e-5). During training, the… · Nov 29, 2024
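As a quick illustration of that last teaser, here is a minimal sketch of where the learning rate actually gets set in a training setup. PyTorch and the AdamW optimizer are assumptions chosen for illustration (the post itself may use a different framework); the example value 1e-5 comes from the teaser:

```python
import torch
import torch.nn as nn

# Placeholder model for illustration; any nn.Module would do.
model = nn.Linear(128, 10)

# The learning rate is the `lr` hyperparameter: a small positive
# value between 0.0 and 1.0, e.g. 1e-5 as in the teaser above.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
```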