<?xml version='1.0' encoding='UTF-8'?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
  <channel>
    <title>BentoML Blog</title>
    <link>https://www.bentoml.com/blog</link>
    <description>Dive into the transformative world of AI application development with us! From expert insights to innovative use cases, we bring you the latest in efficiently deploying AI at scale.</description>
    <docs>http://www.rssboard.org/rss-specification</docs>
    <generator>python-feedgen</generator>
    <language>en</language>
    <lastBuildDate>Tue, 10 Feb 2026 09:01:00 +0000</lastBuildDate>
    <pubDate>Tue, 10 Feb 2026 17:01:00 +0800</pubDate>
    <item>
      <title>BentoML Is Joining Modular</title>
      <link>https://www.bentoml.com/blog/bentoml-is-joining-modular</link>
      <description>BentoML is joining Modular to build the next generation of AI inference infrastructure. Learn what this means for production inference, open source, and customers.</description>
      <enclosure url="https://admin.bentoml.com/uploads/Modular_Bento_650fc60de9.png" length="0" type="image/png"/>
      <pubDate>Tue, 10 Feb 2026 14:36:28 +0800</pubDate>
    </item>
    <item>
      <title>6 Production-Tested Optimization Strategies for High-Performance LLM Inference</title>
      <link>https://www.bentoml.com/blog/6-production-tested-optimization-strategies-for-high-performance-llm-inference</link>
      <description>This guide helps you match specific LLM inference bottlenecks to the highest-impact optimization strategies, and understand when to implement each one as your workloads evolve.</description>
      <enclosure url="https://admin.bentoml.com/uploads/inference_optimizations_cover_image_blog_005e00c030.png" length="0" type="image/png"/>
      <pubDate>Wed, 14 Jan 2026 15:30:46 +0800</pubDate>
    </item>
    <item>
      <title>Beyond Tokens-per-Second: How to Balance Speed, Cost, and Quality in LLM Inference</title>
      <link>https://www.bentoml.com/blog/beyond-tokens-per-second-how-to-balance-speed-cost-and-quality-in-llm-inference</link>
      <description>This guide shows enterprise teams how to identify hidden trade-offs in LLM deployment and evaluate performance through the lens of your actual workloads, not simplified metrics.</description>
      <enclosure url="https://admin.bentoml.com/uploads/balance_speed_cost_and_quality_in_llm_inference_0037472c4a.png" length="0" type="image/png"/>
      <pubDate>Mon, 12 Jan 2026 09:55:41 +0800</pubDate>
    </item>
    <item>
      <title>The Best Open-Source Small Language Models (SLMs) in 2026</title>
      <link>https://www.bentoml.com/blog/the-best-open-source-small-language-models</link>
      <description>Small language models (SLMs) are compact LLMs designed to run efficiently in resource-constrained environments. They are now good enough for many production workloads.</description>
      <enclosure url="https://admin.bentoml.com/uploads/slms_blog_cover_image_9bbe44a1c9.png" length="0" type="image/png"/>
      <pubDate>Tue, 16 Dec 2025 21:25:59 +0800</pubDate>
    </item>
    <item>
      <title>7 Days to Prototype: How Jabali AI Accelerated Time-to-Value with Bento Inference Platform</title>
      <link>https://www.bentoml.com/blog/7-days-to-prototype-how-jabali-ai-accelerated-time-to-value-with-bento-inference-platform</link>
      <description>Learn how Jabali AI partnered with Bento to stand up a complex generative visual asset pipeline and deploy unsupported models, all without hiring an infrastructure team.</description>
      <enclosure url="https://admin.bentoml.com/uploads/Bento_x_Jabali_Header_71fb6d8e27.png" length="0" type="image/png"/>
      <pubDate>Thu, 11 Dec 2025 03:47:08 +0800</pubDate>
    </item>
    <item>
      <title>Why Bento Is Built For Full-Scale AI Production Workloads</title>
      <link>https://www.bentoml.com/blog/why-bento-is-built-for-full-scale-ai-production-workloads</link>
      <description>The Bento Inference Platform is designed from the ground up to close that gap, delivering the speed, reliability, and control needed to run AI confidently in production.</description>
      <enclosure url="https://admin.bentoml.com/uploads/Bento_x_Production_Workloads_Objection_Handler_Header_8ac3bf1c7a.png" length="0" type="image/png"/>
      <pubDate>Thu, 11 Dec 2025 03:04:22 +0800</pubDate>
    </item>
    <item>
      <title>Running Local LLMs with Ollama: 3 Levels from Laptop to Cluster-Scale Distributed Inference</title>
      <link>https://www.bentoml.com/blog/running-local-llms-with-ollama-3-levels-from-local-to-distributed-inference</link>
      <description>Learn the three levels of running LLMs: from local models with Ollama to high-performance runtimes and full distributed inference across regions and clouds.</description>
      <enclosure url="https://admin.bentoml.com/uploads/3_levels_of_running_llms_0ecc08ce4b.png" length="0" type="image/png"/>
      <pubDate>Mon, 01 Dec 2025 16:50:57 +0800</pubDate>
    </item>
    <item>
      <title>Scaling Inference For AI Startups: Choosing The Right Approach For Your Stage</title>
      <link>https://www.bentoml.com/blog/scaling-inference-for-ai-startups-choosing-the-right-approach-for-your-stage</link>
      <description>This article breaks down the five essential tools that make up a modern inference stack, explains where each fits in the startup journey, and highlights the top providers in each category.</description>
      <enclosure url="https://admin.bentoml.com/uploads/Bento_x_Scaling_Inference_For_AI_Startups_Header_d04e302410.png" length="0" type="image/png"/>
      <pubDate>Thu, 27 Nov 2025 01:39:17 +0800</pubDate>
    </item>
    <item>
      <title>What is GPU Memory and Why it Matters for LLM Inference</title>
      <link>https://www.bentoml.com/blog/what-is-gpu-memory-and-why-it-matters-for-llm-inference</link>
      <description>A complete guide to GPU memory for LLMs: VRAM, KV cache, context windows, quantization, parallelism, and inference optimizations for faster, more efficient inference.</description>
      <enclosure url="https://admin.bentoml.com/uploads/gpu_memory_vram_d141d4b427.png" length="0" type="image/png"/>
      <pubDate>Fri, 21 Nov 2025 15:07:11 +0800</pubDate>
    </item>
    <item>
      <title>Deploying gpt-oss with vLLM and BentoML</title>
      <link>https://www.bentoml.com/blog/deploying-a-large-language-model-with-bentoml-and-vllm</link>
      <description>Self-host gpt-oss with vLLM and BentoML. Learn to build a fast, private reasoning API and deploy it on BentoCloud with autoscaling.</description>
      <enclosure url="https://admin.bentoml.com/uploads/run_gpt_oss_vllm_48bf40593e.png" length="0" type="image/png"/>
      <pubDate>Wed, 05 Nov 2025 16:37:00 +0800</pubDate>
    </item>
    <item>
      <title>Where to Buy or Rent GPUs for LLM Inference: The 2026 GPU Procurement Guide</title>
      <link>https://www.bentoml.com/blog/where-to-buy-or-rent-gpus-for-llm-inference</link>
      <description>Find the best GPUs for LLM inference. Compare hyperscaler, GPU cloud, and on-prem options, understand pricing and availability, and learn how Bento simplifies cross-region and multi-cloud GPU management.</description>
      <enclosure url="https://admin.bentoml.com/uploads/gpu_procurement_guide_5d3926ee4b.jpg" length="0" type="image/jpeg"/>
      <pubDate>Fri, 31 Oct 2025 16:53:57 +0800</pubDate>
    </item>
    <item>
      <title>InferenceOps: The Strategic Foundation For Scaling Enterprise AI</title>
      <link>https://www.bentoml.com/blog/the-strategic-foundation-for-scaling-enterprise-ai</link>
      <description>To deliver production-grade performance, inference has to move from a secondary concern to a first-class operational discipline.</description>
      <enclosure url="https://admin.bentoml.com/uploads/Bento_x_Scaling_Enterprise_AI_Header_625d8f2bb3.png" length="0" type="image/png"/>
      <pubDate>Fri, 31 Oct 2025 04:07:19 +0800</pubDate>
    </item>
    <item>
      <title>Deploy AI Anywhere with One Unified Inference Platform</title>
      <link>https://www.bentoml.com/blog/deploy-ai-anywhere-with-one-unified-inference-platform</link>
      <description>The question every AI team now faces is how to deploy models wherever they’re needed, without rebuilding infrastructure for each environment or compromising on security, compliance, or performance.</description>
      <enclosure url="https://admin.bentoml.com/uploads/Bento_x_Deploy_AI_Anywhere_Header_13891cfffc.png" length="0" type="image/png"/>
      <pubDate>Fri, 31 Oct 2025 03:51:56 +0800</pubDate>
    </item>
    <item>
      <title>DeepSeek-OCR Explained: How Contexts Optical Compression Redefines AI Efficiency</title>
      <link>https://www.bentoml.com/blog/deepseek-ocr-contexts-optical-compression-explained</link>
      <description>Learn how DeepSeek-OCR redefines AI efficiency with Contexts Optical Compression, turning text into vision for faster, cheaper, long-context LLMs.</description>
      <enclosure url="https://admin.bentoml.com/uploads/deepseek_ocr_blog_image_da25d0c17f.png" length="0" type="image/png"/>
      <pubDate>Fri, 24 Oct 2025 17:25:10 +0800</pubDate>
    </item>
    <item>
      <title>ChatGPT Usage Limits: What They Are and How to Get Rid of Them</title>
      <link>https://www.bentoml.com/blog/chatgpt-usage-limits-explained-and-how-to-remove-them</link>
      <description>Learn ChatGPT usage limits for Free, Plus, Business, and Pro plans (2025 update). Understand why they exist and how to remove them with self-hosted LLMs.</description>
      <enclosure url="https://admin.bentoml.com/uploads/chatgpt_usage_limits_ca1dce063a.png" length="0" type="image/png"/>
      <pubDate>Thu, 23 Oct 2025 15:35:25 +0800</pubDate>
    </item>
    <item>
      <title>Bento vs. SageMaker: Which Inference Platform Is Right for Enterprise AI?</title>
      <link>https://www.bentoml.com/blog/which-inference-platform-is-right-for-enterprise-ai</link>
      <description>For Heads of AI, choosing the right inference platform isn’t just a technical decision, but a strategic one.</description>
      <enclosure url="https://admin.bentoml.com/uploads/Bento_vs_Sage_Maker_Header_3a42a73763.png" length="0" type="image/png"/>
      <pubDate>Tue, 21 Oct 2025 23:12:48 +0800</pubDate>
    </item>
    <item>
      <title>Top-Rated LLMs for Chat in 2025</title>
      <link>https://www.bentoml.com/blog/navigating-the-world-of-open-source-large-language-models</link>
      <description>Explore the best open-source LLMs and find answers to common FAQs about performance, inference optimization, and self-hosted deployment.</description>
      <enclosure url="https://admin.bentoml.com/uploads/best_open_source_llms_75c89214e6.png" length="0" type="image/png"/>
      <pubDate>Fri, 10 Oct 2025 09:59:10 +0800</pubDate>
    </item>
    <item>
      <title>NVIDIA Data Center GPUs Explained: From A100 to B200 and Beyond</title>
      <link>https://www.bentoml.com/blog/nvidia-data-center-gpus-explained-a100-h200-b200-and-beyond</link>
      <description>Understand NVIDIA data center GPUs for AI inference. Compare T4, L4, A100, H100, H200, and B200 on use cases, memory, and pricing to choose the right GPU.</description>
      <enclosure url="https://admin.bentoml.com/uploads/nvidia_data_center_gpus_00670051bb.jpg" length="0" type="image/jpeg"/>
      <pubDate>Thu, 28 Aug 2025 10:53:29 +0800</pubDate>
    </item>
    <item>
      <title>The Complete Guide to DeepSeek Models: V3, R1, V3.1, V3.2 and Beyond</title>
      <link>https://www.bentoml.com/blog/the-complete-guide-to-deepseek-models-from-v3-to-r1-and-beyond</link>
      <description>Understand the differences between DeepSeek-V3, R1, V3.1, V3.2, and distilled models. Learn how to choose the right model and deploy them securely with BentoML.</description>
      <enclosure url="https://admin.bentoml.com/uploads/deepseek_models_4a64fc090c.png" length="0" type="image/png"/>
      <pubDate>Fri, 22 Aug 2025 10:32:44 +0800</pubDate>
    </item>
  </channel>
</rss>
