
Efficiently serve dozens of fine-tuned models with vLLM on Amazon SageMaker AI and Amazon Bedrock

In this post, we explain how we implemented multi-LoRA inference for Mixture of Experts (MoE) models in vLLM, describe the kernel-level optimizations we performed, and show you how you can benefit from this work. We use GPT-OSS 20B as our primary example throughout this post.
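To make the idea of multi-LoRA inference concrete, here is a minimal pure-Python sketch of what serving several adapters on one base model means at the math level: every request in a batch shares the base weight `W`, and each request adds its own low-rank update `(x A) B`. This is not vLLM's implementation and ignores MoE routing and kernel-level batching entirely; the function names `matmul` and `multi_lora_forward`, the `adapters` dictionary, and the `scale` parameter are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product (illustration only, not optimized)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def multi_lora_forward(
    x: Matrix,                                  # one row per request
    base_w: Matrix,                             # shared base weight W
    adapters: Dict[str, Tuple[Matrix, Matrix]], # adapter id -> (A, B), hypothetical layout
    adapter_ids: List[Optional[str]],           # adapter chosen by each request, or None
    scale: float = 1.0,                         # assumed LoRA scaling factor
) -> Matrix:
    # The base projection is computed once for the whole batch.
    base_out = matmul(x, base_w)
    out: Matrix = []
    for row, base_row, adapter_id in zip(x, base_out, adapter_ids):
        if adapter_id is None:
            # Request uses the base model without any adapter.
            out.append(list(base_row))
            continue
        a, b = adapters[adapter_id]
        # Low-rank update for this request only: (x A) B.
        delta = matmul(matmul([row], a), b)[0]
        out.append([v + scale * d for v, d in zip(base_row, delta)])
    return out


if __name__ == "__main__":
    base_w = [[1, 0], [0, 1]]                     # identity base weight
    adapters = {"tenant-a": ([[1], [0]], [[0, 2]])}  # rank-1 adapter
    x = [[1, 1], [3, 4]]
    print(multi_lora_forward(x, base_w, adapters, ["tenant-a", None]))
```

The per-request Python loop is the naive form of the computation. A production system would group requests by adapter and fuse the low-rank products into batched GPU kernels, which is the kind of work the post describes.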
