Aller au boulot

Scheduling

OSDI 2021

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning

Aurick Qiao et al. Petuum, Inc., CMU, UCB. Cited by 14

Award: Jay Lepreau Best Paper

Oort: Efficient Federated Learning via Guided Participant Selection

Fan Lai et al. University of Michigan. Cited by 17

PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections

Haojie Wang et al. Tsinghua, CMU, FB. Cited by 4

Privacy Budget Scheduling

Tao Luo et al. Columbia University, Microsoft Research. Cited by 2

OSDI 2020

Providing SLOs for Resource-Harvesting VMs in Cloud Platforms

Pradeep Ambati et al. Microsoft Azure, Microsoft Research.

The CacheLib Caching Engine: Design and Experiences at Scale

Benjamin Berg et al. CMU, FB, Microsoft

Twine: A Unified Cluster Management System for Shared Infrastructure

Chunqiang Tang et al. FB

FIRM: An Intelligent Fine-Grained Resource Management Framework for SLO-Oriented Microservices

Haoran Qiu et al. UIUC

Building Scalable and Flexible Cluster Managers Using Declarative Programming

Lalith Suresh et al. VMware, IST, UIUC ...

Protean: VM Allocation Service at Scale

Ori Hadary et al. Microsoft