LLM

Model

LLM

| Affiliation | Model | Github | Size | Train | Infer | More |
|---|---|---|---|---|---|---|
| META | LLaMA | github | 7B/13B/33B/65B | | | hf |
| Databricks | Dolly | Dolly | 12B | v | | |
| LAION.ai | | | | | | |
| Stability.ai | | | | | | |
| Eleuther.AI | | | | | | |
| BigScience | BLOOM | | 176B | | | |

Variation

| Affiliation | Model | Size | Base |
|---|---|---|---|
| | Baize | 7B | LLaMA |

Tech

Megatron-DeepSpeed

  • https://github.com/NVIDIA/Megatron-LM

  • https://github.com/microsoft/DeepSpeed

  • https://github.com/microsoft/Megatron-DeepSpeed

  • https://github.com/bigscience-workshop/Megatron-DeepSpeed

  • HF LLaMA modeling_llama

  • Stanford Alpaca stanford_alpaca

  • Transformer Reinforcement Learning X trlx

  • 3D parallelism (data, tensor, and pipeline parallelism combined; see the sketch after this list)
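The three degrees of 3D parallelism multiply to the total GPU count. A minimal arithmetic sketch; the numbers below are illustrative, not from any specific setup:

```python
# 3D parallelism = tensor-parallel x pipeline-parallel x data-parallel.
# Illustrative degrees for a hypothetical 64-GPU job.
world_size = 64                 # total GPUs
tensor_parallel = 4             # Megatron-style intra-layer (tensor) parallelism
pipeline_parallel = 4           # inter-layer (pipeline) parallelism
data_parallel = world_size // (tensor_parallel * pipeline_parallel)   # -> 4
assert tensor_parallel * pipeline_parallel * data_parallel == world_size
```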

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

  • ZeRO sharding
  • pipeline parallelism
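A minimal pipeline-parallelism sketch using DeepSpeed's pipeline engine; the toy MLP, the 2-stage split, and the `ds_config.json` file name are assumptions for illustration, and the script would be started with the `deepspeed` launcher so the distributed backend is set up:

```python
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

# Express the model as a flat list of layers so DeepSpeed can cut it into stages.
layers = [
    nn.Linear(1024, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1024),
]

model = PipelineModule(
    layers=layers,
    num_stages=2,                   # pipeline-parallel degree
    loss_fn=nn.MSELoss(),           # loss is computed on the last stage
    partition_method="parameters",  # balance stages by parameter count
)

# ds_config.json (assumed name) holds batch size, optimizer and ZeRO settings.
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=[p for p in model.parameters() if p.requires_grad],
    config="ds_config.json",
)

# engine.train_batch() pulls micro-batches from a data iterator and runs the
# pipeline schedule (interleaved forward/backward) internally:
# engine.train_batch(data_iter=iter(train_loader))
```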

DeepSpeed-Chat

DeepSpeed supports (a minimal config sketch follows the list):

  • Optimizer state partitioning (ZeRO stage 1)
  • Gradient partitioning (ZeRO stage 2)
  • Parameter partitioning (ZeRO stage 3)
  • Custom mixed precision training handling
  • A range of fast CUDA-extension-based optimizers
  • ZeRO-Offload to CPU and Disk/NVMe
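A minimal ZeRO configuration sketch covering the items above; the keys follow DeepSpeed's JSON config schema, while the batch sizes, learning rate, and offload targets are illustrative assumptions:

```python
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},                   # mixed-precision training
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "zero_optimization": {
        "stage": 3,                              # 1: optimizer states, 2: + gradients, 3: + parameters
        "offload_optimizer": {"device": "cpu"},  # ZeRO-Offload of optimizer states
        "offload_param": {"device": "cpu"},      # stage-3 parameter offload (or "nvme")
    },
}

# engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config
# )
```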

Hugging Face Transformers also provides a DeepSpeed integration.

Megatron-LM

Megatron-LM is a framework for training large, powerful transformer models, developed by the Applied Deep Learning Research team at NVIDIA.

  • Tensor Parallelism
  • main_grad (fp32 gradient accumulation buffer attached to each parameter during mixed-precision training)
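A toy sketch of the tensor-parallel idea (splitting one linear layer's weight across ranks). This is not Megatron-LM's actual `ColumnParallelLinear`, which adds the all-gather/all-reduce communication, sharded initialization, and fused kernels:

```python
import torch
import torch.nn as nn

class ToyColumnParallelLinear(nn.Module):
    """Each rank owns a column slice of the weight; outputs are partial."""
    def __init__(self, in_features: int, out_features: int, world_size: int):
        super().__init__()
        assert out_features % world_size == 0
        # Local shard: full input dimension, 1/world_size of the output dimension.
        self.local = nn.Linear(in_features, out_features // world_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each rank produces its slice of the output; an all-gather across
        # tensor-parallel ranks would reassemble the full activation.
        return self.local(x)

# Example: a 4-way tensor-parallel shard of a 1024 -> 4096 projection.
shard = ToyColumnParallelLinear(1024, 4096, world_size=4)
partial_out = shard(torch.randn(2, 1024))   # shape (2, 1024) on each rank
```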

Workflow

https://github.com/microsoft/DeepSpeedExamples/tree/master/applications/DeepSpeed-Chat

https://github.com/EleutherAI/gpt-neox

https://github.com/CarperAI/trlx/blob/main/examples/summarize_rlhf/reward_model/reward_model.py
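The reward model linked above is trained with a pairwise ranking objective; a generic sketch of that loss (function name and shapes are assumptions, not the exact trlx code):

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_rewards: torch.Tensor,
                         rejected_rewards: torch.Tensor) -> torch.Tensor:
    """chosen/rejected rewards: shape (batch,), one scalar score per response."""
    # The chosen response should outscore the rejected one:
    # loss = -log sigmoid(r_chosen - r_rejected), averaged over the batch.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```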

Question

  • How is the loss calculated in SFT? What is the difference from pretraining?
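One common answer, sketched below: both pretraining and SFT minimize next-token cross-entropy; the usual SFT difference is that prompt tokens are masked out (label `-100`) so only the response tokens contribute to the loss. The helper is a hypothetical illustration, not any specific codebase:

```python
import torch
import torch.nn.functional as F

def causal_lm_loss(logits, input_ids, prompt_len=None):
    """logits: (batch, seq, vocab); input_ids: (batch, seq)."""
    # Shift so that position t predicts token t+1 (same as pretraining).
    shift_logits = logits[:, :-1, :]
    labels = input_ids[:, 1:].clone()
    if prompt_len is not None:
        # SFT: ignore the loss on prompt positions; pretraining keeps them all.
        labels[:, : prompt_len - 1] = -100
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        labels.reshape(-1),
        ignore_index=-100,
    )
```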