vLLM tutorial

Red Hat Community

Steve Watt, PyTorch ambassador - Getting Started with Inference Using vLLM.

20:18

Getting Started with Inference Using vLLM

653 views

3 months ago

Red Hat Community

vLLM Semantic Router: Intelligent Auto Reasoning for Efficient LLM Inference on Mixture-of-Models

Huamin Chen, vLLM Semantic Router project creator - vLLM Semantic Router: Intelligent Auto Reasoning Router for Efficient LLM ...

32:57

142 views

3 months ago

Julien Simon

Open-source LLMs are great for conversational applications, but they can be difficult to scale in production and deliver latency ...

36:12

Deep Dive: Optimizing LLM inference

44,554 views

1 year ago

Vuk Rosić

Nano-vLLM - DeepSeek Engineer's Side Project - Code Explained

repo - https://github.com/GeeeekExplorer/nano-vllm/tree/main * Nano-vLLM is a simple, fast LLM server in ~1200 lines of Python ...

19:18

1,559 views

7 months ago

CNCF [Cloud Native Computing Foundation]

Efficient LLM Deployment: A Unified Approach with Ray, VLLM, and Kubernetes - Lily (Xiaoxuan) Liu

Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon Europe in London from April 1 - 4, 2025.

27:08

3,905 views

1 year ago

Red Hat Community

Greg Pereira, llm-d maintainer - Combining Kubernetes and vLLM to Deliver Scalable, Distributed Inference with llm-d.

28:26

Combining Kubernetes and vLLM to Deliver Scalable, Distributed Inference with llm-d

538 views

3 months ago

The Machine Learning Engineer

LLMOps: vLLM Integration with LangChain #machinelearning #datascience

In this video, we introduce vLLM and its integration with the LangChain library to create RAG ...

52:38

14,560 views

10 months ago

The Linux Foundation

Scalable and Efficient LLM Serving With the VLLM Production Stack - Junchen Jiang & Yue Zhu

Don't miss out! Join us at the next Open Source Summit in Hyderabad, India (August 5); Amsterdam, Netherlands (August 25-29); ...

39:36

331 views

7 months ago

The Linux Foundation

Streamlining AI Pipelines With Elyra: From Development To Inference With KServe & VLLM - Ritesh Shah

Don't miss out! Join us at the next Open Source Summit in Seoul, South Korea (November 4-5). Join us at the premier ...

26:00

66 views

4 months ago

Red Hat Community

Description: Burkhard Ringlein, Chih-Chieh Yang, Sara Kokkila Schumacher, IBM and Rishi Astra, University of Texas - Triton for ...

37:46

Triton for vLLM

425 views

8 months ago

The Machine Learning Engineer

LLMOPS : vLLM Inference LLM Server Engine #machinelearning #datascience

In this video, I introduce vLLM, an LLM inference and serving library. Notebooks: ...

45:45

230 views

1 year ago

CNCF [Cloud Native Computing Foundation]

LLMs on Kubernetes: Squeeze 5x GPU Efficiency With Cache, Route, Repea... Yuhan Liu & Suraj Deshmukh

Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon events in Amsterdam, The Netherlands ...

32:31

498 views

2 months ago

CNCF [Cloud Native Computing Foundation]

Yes You Can Run LLMs on Kubernetes - Abdel Sghiouar & Mofi Rahman, Google Cloud

Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon events in Hong Kong, China (June 10-11); ...

27:25

1,029 views

9 months ago

CNCF [Cloud Native Computing Foundation]

Tutorial: Cloud Native Sustainable LLM Inference in Action

Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon North America in Salt Lake City from ...

1:24:20

489 views

1 year ago

DevConf

Speaker(s): Rehan Samaratunga. My auto-tuning project aims to find the best settings for running large language models using ...

9:51

Auto-tuning vLLM - DevConf.US 2025

86 views

4 months ago

CNCF [Cloud Native Computing Foundation]

Lightning Talk: Best Practices for LLM Serving with DRA - Chen Wang & Abhishek Malvankar, IBM

Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon North America in Salt Lake City from ...

9:37

477 views

1 year ago

The Machine Learning Engineer

LLMOps: vLLM Integration with LangChain (Spanish) #machinelearning #datascience

In this video, we introduce vLLM and its integration with the LangChain library to create ...

56:03

5,584 views

11 months ago

CNCF [Cloud Native Computing Foundation]

Sailing Multi-host Inference for LLM on Kubernetes - Kay Yan, DaoCloud

Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon India in Hyderabad (August 6-7), and ...

28:03

312 views

7 months ago

Dutch Algotrading

Run Your Locally Hosted Deepseek, Qwen or Codellama AI Assistant in VSCode Under 5 Minutes!

Run your locally hosted AI coding assistant in VSCode with the Continue extension, Ollama, Deepseek, Qwen, or CodeLlama in less ...

5:26

73,438 views

1 year ago

DevConf

Speaker(s): Ashish Kamra, David Gray, Samuel Monson. Modern LLM applications demand reliable, reproducible performance ...

32:45

Learn How to Run an LLM Inference Performance Benchmark on NVIDIA GPUs - DevConf.US 2025

179 views

4 months ago

Python Code Camp

shorts #short #shortvideo #python #pythonprogramming #pythonshorts #pythontips #pythontricks #chatgpt #langchain ...

0:25

Chat with PDF langchain project

53,733 views

1 year ago

The Linux Foundation

Open Source LLMs in the Cloud: Scalable Solutions - Miley Fu, WasmEdge & Hung-Ying Tai, Second State/WasmEdge The ...

40:28

Open Source LLMs in the Cloud: Scalable Solutions - Miley Fu, WasmEdge & Hung-Ying Tai, Second State

140 views

1 year ago

Paolo Cadoni

Beyond GPUs: How Groq and Cerebras Lead the Next Wave in AI Infrastructure

Everyone talks about NVIDIA when it comes to AI, but what if GPUs aren't the future? In this video, I break down why AI inference is ...

12:34

7,777 views

9 months ago

RavenJS

Open source model from OpenAI: gpt-oss #openai #llm #chatgpt

OpenAI just released gpt-oss-120b and gpt-oss-20b, two state-of-the-art open-weight language models that deliver strong ...

0:24

10,228 views

6 months ago

CNCF [Cloud Native Computing Foundation]

Tutorial: A Cross-Industry Benchmarking Tutorial for Distributed LLM Inference... Multiple Speakers

Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon events in Amsterdam, The Netherlands ...

1:18:11

97 views

2 months ago