LLM
Model
LLM
| Affiliation | Model | Github | Size | Train | Infer | More |
|---|---|---|---|---|---|---|
| META | LLaMa | github | 7B/13B/33B/65B | | | hf |
| Databricks | Dolly | Dolly | 12B | v | | |
| LAION.ai | | | | | | |
| Stability.ai | | | | | | |
| Eleuther.AI | | | | | | |
| BigScience | BLOOM | | 176B | | | |
Variation
| Affiliation | Model | Size | Base |
|---|---|---|---|
| | Baize | 7B | LLaMA |
- LLaMA arXiv
- OPT https://github.com/facebookresearch/metaseq/tree/main/projects/OPT
- BLOOM arXiv
- BLOOM huggingface
Tech
Megatron-DeepSpeed
- https://github.com/NVIDIA/Megatron-LM
- https://github.com/microsoft/DeepSpeed
- https://github.com/microsoft/Megatron-DeepSpeed
- https://github.com/bigscience-workshop/Megatron-DeepSpeed
- HF LLaMA modeling_llama
- Stanford Alpaca stanford_alpaca
- Transformer Reinforcement Learning X trlx
- 3D parallelism
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
- ZeRO sharding
- pipeline parallelism
DeepSpeed supports (see the config sketch after this list):
- Optimizer state partitioning (ZeRO stage 1)
- Gradient partitioning (ZeRO stage 2)
- Parameter partitioning (ZeRO stage 3)
- Custom mixed precision training handling
- A range of fast CUDA-extension-based optimizers
- ZeRO-Offload to CPU and Disk/NVMe
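A minimal sketch of how these features are switched on through the DeepSpeed config dict passed to `deepspeed.initialize` (the model, batch sizes, and learning rate below are illustrative placeholders, not tuned values):

```python
import torch
import deepspeed

# Illustrative ZeRO config: stage 1 shards optimizer states, stage 2 adds
# gradients, stage 3 adds parameters; the offload_* entries move those
# shards to CPU (ZeRO-Offload). All numbers here are placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
}

model = torch.nn.Linear(4096, 4096)  # stand-in for a real transformer

# Launch with the deepspeed launcher (e.g. `deepspeed train.py`) so the
# distributed environment exists before initialize() is called.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```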
Megatron-LM
Megatron-LM is a large, powerful transformer model framework developed by the Applied Deep Learning Research team at NVIDIA.
- Tensor Parallelism (see the toy column-parallel sketch below)
- main_grad (the fp32 gradient buffer Megatron attaches to each parameter for mixed-precision gradient accumulation)
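Megatron's tensor parallelism shards each weight matrix across GPUs: every rank holds a slice, computes a partial result, and a collective (all-gather / all-reduce) reassembles the activation. A toy single-process sketch of the column-parallel idea (the real implementation is Megatron's `ColumnParallelLinear`; the function and shapes below are made up for illustration):

```python
import torch
import torch.nn.functional as F

def column_parallel_linear(x, weight, bias, world_size):
    """Toy column parallelism on one device: split the weight along the
    output dimension into `world_size` shards, let each "rank" compute its
    slice of the output, then concatenate (in a real setup this is an
    all-gather across GPUs, or skipped if a row-parallel layer follows)."""
    weight_shards = weight.chunk(world_size, dim=0)
    bias_shards = bias.chunk(world_size, dim=0)
    partials = [F.linear(x, w, b) for w, b in zip(weight_shards, bias_shards)]
    return torch.cat(partials, dim=-1)

x = torch.randn(2, 16)        # (batch, in_features)
weight = torch.randn(32, 16)  # (out_features, in_features)
bias = torch.zeros(32)

out_tp = column_parallel_linear(x, weight, bias, world_size=4)
out_ref = F.linear(x, weight, bias)
assert torch.allclose(out_tp, out_ref, atol=1e-5)  # same result as the unsharded layer
```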
Workflow
- https://github.com/microsoft/DeepSpeedExamples/tree/master/applications/DeepSpeed-Chat
- https://github.com/EleutherAI/gpt-neox
- https://github.com/CarperAI/trlx/blob/main/examples/summarize_rlhf/reward_model/reward_model.py
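The trlx `reward_model.py` linked above trains a scalar reward head on human preference pairs; a rough, self-contained sketch of that idea (the tiny model and names below are illustrative stand-ins, not the repo's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRewardModel(nn.Module):
    """Stand-in reward model: encoder + scalar value head."""
    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.value_head = nn.Linear(hidden, 1)  # one scalar reward per sequence

    def forward(self, input_ids):
        h, _ = self.encoder(self.embed(input_ids))
        return self.value_head(h[:, -1]).squeeze(-1)  # reward from final hidden state

def pairwise_reward_loss(model, chosen_ids, rejected_ids):
    """Ranking loss: the chosen response should score higher than the rejected one."""
    return -F.logsigmoid(model(chosen_ids) - model(rejected_ids)).mean()

model = TinyRewardModel()
chosen = torch.randint(0, 1000, (4, 12))    # tokenized preferred responses
rejected = torch.randint(0, 1000, (4, 12))  # tokenized dispreferred responses
pairwise_reward_loss(model, chosen, rejected).backward()
```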
Question
- How is the loss calculated in SFT? What is the difference from pretraining?
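One common answer (e.g. the Stanford Alpaca recipe): SFT uses the same next-token cross-entropy as pretraining, but typically only the response tokens contribute to the loss; the prompt/instruction tokens are masked by setting their labels to -100, whereas pretraining scores every token. A minimal sketch of that masking (assuming the usual PyTorch/Hugging Face -100 ignore-index convention):

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # ignored by F.cross_entropy; the usual HF/Alpaca convention

def sft_labels(input_ids, prompt_len):
    """Pretraining: labels == input_ids, every token is scored.
    SFT: copy input_ids but mask the prompt so only the response is scored."""
    labels = input_ids.clone()
    labels[:, :prompt_len] = IGNORE_INDEX
    return labels

def next_token_loss(logits, labels):
    """Standard causal-LM shift: position t predicts token t+1."""
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=IGNORE_INDEX,
    )

# Toy batch: 2 sequences of 10 tokens, the first 4 tokens are the prompt.
vocab, seq_len, prompt_len = 100, 10, 4
input_ids = torch.randint(0, vocab, (2, seq_len))
logits = torch.randn(2, seq_len, vocab)  # stand-in for model(input_ids).logits

pretrain_loss = next_token_loss(logits, input_ids)                      # all tokens
sft_loss = next_token_loss(logits, sft_labels(input_ids, prompt_len))   # response only
```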