Falcon

Architecture

  • Positional embeddings: rotary (Su et al., 2021);
  • Attention: multiquery (Shazeer et al., 2019) and FlashAttention (Dao et al., 2022);
  • Decoder-block: parallel attention/MLP with a single layer norm (a minimal sketch of how these pieces fit together follows below).
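The sketch below is an illustrative PyTorch rendering of these three choices, not Falcon's actual implementation: dimensions, the GELU MLP, and bias-free projections are assumptions chosen for brevity. It shows rotary embeddings applied to queries and keys, a single shared key/value head (multiquery), and the parallel attention/MLP block where one layer norm feeds both paths. `F.scaled_dot_product_attention` stands in for FlashAttention, since PyTorch dispatches it to fused FlashAttention kernels where available.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rotary_embed(x, base=10000):
    # x: (batch, seq, heads, head_dim). Rotate channel pairs by
    # position-dependent angles (Su et al., 2021), "rotate-half" variant.
    b, t, h, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, device=x.device, dtype=torch.float32) / half)
    angles = torch.arange(t, device=x.device, dtype=torch.float32)[:, None] * freqs[None, :]
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class ParallelDecoderBlock(nn.Module):
    """Hypothetical Falcon-style block: one layer norm feeds both the
    multiquery attention and the MLP; their outputs are summed into the
    residual stream instead of being applied sequentially."""

    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.ln = nn.LayerNorm(d_model)  # the single layer norm
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        # multiquery: one shared K/V head instead of one per query head
        self.kv_proj = nn.Linear(d_model, 2 * self.head_dim, bias=False)
        self.o_proj = nn.Linear(d_model, d_model, bias=False)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model, bias=False),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model, bias=False),
        )

    def forward(self, x):
        b, t, d = x.shape
        h = self.ln(x)  # shared normalized input to both paths
        q = self.q_proj(h).view(b, t, self.n_heads, self.head_dim)
        k, v = self.kv_proj(h).split(self.head_dim, dim=-1)  # (b, t, head_dim) each
        q = rotary_embed(q)
        k = rotary_embed(k.unsqueeze(2)).squeeze(2)
        # expand the single K/V head across all query heads
        k = k[:, :, None, :].expand(b, t, self.n_heads, self.head_dim)
        v = v[:, :, None, :].expand(b, t, self.n_heads, self.head_dim)
        # dispatches to fused/FlashAttention kernels where available
        attn = F.scaled_dot_product_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2),
            is_causal=True,
        ).transpose(1, 2).reshape(b, t, d)
        # parallel residual: attention and MLP outputs are added
        return x + self.o_proj(attn) + self.mlp(h)

block = ParallelDecoderBlock()
y = block(torch.randn(2, 16, 512))  # (batch=2, seq=16, d_model=512)
```

Beyond the throughput win of sharing one K/V head, multiquery attention shrinks the inference-time K/V cache by a factor of `n_heads`, and the parallel formulation lets the attention and MLP matmuls overlap during training.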

Spec

  • AWS SageMaker, on 384 A100 40GB GPUs in P4d instances.
  • 40B parameters
  • 1,000B tokens of RefinedWeb
  • Falcon-40B-Instruct = Falcon-40B + 150M tokens (Baize + 5% RefinedWeb).
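A quick way to try the released checkpoints, using the published Hugging Face Hub repos `tiiuae/falcon-40b` and `tiiuae/falcon-40b-instruct`. This is a minimal sketch; the dtype, device mapping, and prompt are illustrative choices, and sharding a 40B-parameter model across your GPUs is left to `accelerate`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-40b-instruct"  # or "tiiuae/falcon-40b" for the base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 40B params will not fit in fp32 on most hardware
    device_map="auto",           # shard across available GPUs via accelerate
    trust_remote_code=True,      # needed on older transformers; recent versions support Falcon natively
)

inputs = tokenizer("Write a haiku about falcons.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```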