Let's get to work.

Scheduling

OSDI 2021

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning

Aurick Qiao et al. Petuum Inc., CMU, UC Berkeley. 14 citations.

Award: Jay Lepreau Best Paper
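
For context (my paraphrase of the paper; the notation is approximate): Pollux schedules by maximizing goodput, which it defines as system throughput scaled by statistical efficiency:

```latex
\mathrm{GOODPUT}_t(a, m, s) = \mathrm{THROUGHPUT}(a, m, s) \times \mathrm{EFFICIENCY}_t(M)
```

Here a is the job's GPU allocation, m the per-GPU batch size, s the number of gradient accumulation steps, and M the resulting total batch size. The scheduler co-adapts a, m, and s for each job to maximize cluster-wide goodput.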

Oort: Efficient Federated Learning via Guided Participant Selection

Fan Lai et al. University of Michigan. 17 citations.

PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections

Haojie Wang et al. Tsinghua, CMU, Facebook. 4 citations.

Privacy Budget Scheduling

Tao Luo et al. Columbia University, Microsoft Research. 2 citations.

OSDI 2020

Providing SLOs for Resource-Harvesting VMs in Cloud Platforms

Pradeep Ambati et al. Microsoft Azure, Microsoft Research.

The CacheLib Caching Engine: Design and Experiences at Scale

Benjamin Berg et al. CMU, Facebook, Microsoft.

Twine: A Unified Cluster Management System for Shared Infrastructure

Chunqiang Tang et al. Facebook.

FIRM: An Intelligent Fine-Grained Resource Management Framework for SLO-Oriented Microservices

Haoran Qiu et al. UIUC.

Building Scalable and Flexible Cluster Managers Using Declarative Programming

Lalith Suresh et al. VMware, IST, UIUC ...

Protean: VM Allocation Service at Scale

Ori Hadary et al. Microsoft.