Storage architecture for distributed training
In distributed training, multiple GPUs on a single node or across multiple nodes work together to train a single model. This requires massive…
7h ago
The art of setting learning rate
The learning rate is a training hyperparameter that has a small positive value between 0.0 and 1.0 (e.g. 1e-5). During training, the…
Nov 29
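To make the excerpt above concrete, here is a minimal PyTorch-style sketch of where the learning rate is set; the tiny model, the random batch, and the choice of AdamW with lr=1e-5 are illustrative placeholders, not details from the article.

import torch

# Hypothetical tiny model and batch, used only to show where the learning rate goes.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # lr: the learning-rate hyperparameter

x, y = torch.randn(4, 10), torch.randn(4, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()      # compute gradients
optimizer.step()     # each parameter update is scaled by the learning rate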
Published in Better ML
Training Small Language Models on a Budget
List of training optimizations!
Apr 20
Published in Better ML
Will it scale to N GPUs?
Multi-node training & cluster network bandwidth
Apr 1
Published in Better ML
Perf model cards
Model cards are metadata for trained ML models that provide benchmarked evaluation and performance characteristics. They are an effective…
Feb 12
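As a rough sketch of the idea in the excerpt above, a model card can be thought of as structured metadata attached to a trained model; every field and number below is a hypothetical placeholder, not a value from the article.

# Hypothetical model-card metadata; all fields and values are illustrative only.
model_card = {
    "model_name": "example-classifier-v1",
    "training_data": "illustrative dataset snapshot",
    "evaluation": {"accuracy": 0.91, "f1": 0.88},                   # benchmarked evaluation metrics
    "performance": {"p50_latency_ms": 12, "throughput_qps": 350},   # serving performance characteristics
    "intended_use": "illustration only",
}
print(model_card["evaluation"])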
Published in Better ML
LLM serving challenges
Discussing the unique serving challenges of LLMs
Jan 27