BentoML Blog
https://www.bentoml.com/blog
Dive into the transformative world of AI application development with us! From expert insights to innovative use cases, we bring you the latest on deploying AI efficiently at scale.
Last updated: Tue, 01 Jul 2025 17:02:45 +0800

What is InferenceOps?
https://www.bentoml.com/blog/what-is-inference-ops
Learn what InferenceOps is, why it matters, and how leading AI teams scale, optimize, and manage LLM inference for production-grade performance and reliability.
Published: Tue, 01 Jul 2025 08:41:29 +0800

The Shift to Distributed LLM Inference: 3 Key Technologies Breaking Single-Node Bottlenecks
https://www.bentoml.com/blog/the-shift-to-distributed-llm-inference
Explore 3 key strategies for optimizing distributed LLM inference at scale: prefill/decode disaggregation, KV cache utilization-aware load balancing, and prefix cache-aware routing.
Published: Wed, 11 Jun 2025 03:01:56 +0800

Deploying Phi-4-reasoning with BentoML: A Step-by-Step Guide
https://www.bentoml.com/blog/deploying-phi-4-reasoning-with-bentoml
A step-by-step guide to deploying and scaling Phi-4-reasoning in the cloud with BentoML.
Published: Thu, 08 May 2025 06:57:55 +0800

25x Faster Cold Starts for LLMs on Kubernetes
https://www.bentoml.com/blog/25x-faster-cold-starts-for-llms-on-kubernetes
Discover how we optimized LLM container cold starts on Kubernetes with object storage, FUSE mounts, and stream-based model loading.
Published: Wed, 07 May 2025 02:51:21 +0800

How to Beat the GPU CAP Theorem in AI Inference
https://www.bentoml.com/blog/how-to-beat-the-gpu-cap-theorem-in-ai-inference
Learn how to solve the GPU CAP Theorem for AI inference by leveraging BentoML’s unified compute fabric for better control, on-demand availability, and cost efficiency across on-prem and cloud environments.
Published: Tue, 29 Apr 2025 07:14:12 +0800

Accelerating AI Innovation at Yext with BentoML
https://www.bentoml.com/blog/accelerating-ai-innovation-at-yext-with-bentoml
Learn how Yext achieved 2x faster time-to-market and reduced compute costs by 80% with BentoML’s unified inference platform.
Published: Mon, 21 Apr 2025 06:08:46 +0800

6 Infrastructure Pitfalls Slowing Down Your AI Progress
https://www.bentoml.com/blog/6-infrastructure-pitfalls-slowing-down-your-ai-progress
Discover 6 common AI infrastructure pitfalls that slow down AI innovation. Learn how to avoid them and accelerate your AI journey from development to production with BentoML’s scalable inference platform.
Published: Tue, 18 Mar 2025 08:03:06 +0800

The Complete Guide to DeepSeek Models: From V3 to R1 and Beyond
https://www.bentoml.com/blog/the-complete-guide-to-deepseek-models-from-v3-to-r1-and-beyond
Understand the differences between DeepSeek-V3, R1, and distilled models. Learn how to choose the right model and deploy it securely with BentoML.
Published: Thu, 06 Mar 2025 08:41:57 +0800

Building ML Pipelines with MLflow and BentoML
https://www.bentoml.com/blog/building-ml-pipelines-with-mlflow-and-bentoml
Learn to bridge ML experimentation and production using MLflow for tracking and BentoML for deployment in this end-to-end ML pipeline tutorial.
Published: Thu, 27 Feb 2025 07:03:43 +0800

2024 AI Inference Infrastructure Survey Highlights
https://www.bentoml.com/blog/2024-ai-infra-survey-highlights
Discover key insights from the Survey on AI Inference Infrastructure, highlighting model adoption, deployment patterns, and infrastructure challenges across 250+ organizations implementing AI solutions.
Published: Mon, 24 Feb 2025 03:03:47 +0800

Secure and Private DeepSeek Deployment with BentoML
https://www.bentoml.com/blog/secure-and-private-deepseek-deployment-with-bentoml
Discover why organizations are choosing private DeepSeek deployment and how BentoML makes it simple, secure, and scalable.
Published: Fri, 14 Feb 2025 08:19:31 +0800

Announcing BentoML 1.4
https://www.bentoml.com/blog/announcing-bentoml-1-4
BentoML 1.4 introduces new features and improvements to accelerate the iteration cycle and enhance the overall developer experience.
Published: Mon, 10 Feb 2025 03:10:47 +0800