ViewTube

62 results

Red Hat Community
Getting Started with Inference Using vLLM (20:18)
Steve Watt, PyTorch ambassador - Getting Started with Inference Using vLLM. (See the code sketch below.)
632 views · 3 months ago
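
To set expectations for what "getting started" looks like in code, here is a minimal sketch of vLLM's offline inference API; it is not taken from the talk, and the model name is only a placeholder for whatever model you actually serve.

```python
# Minimal vLLM offline-inference sketch (placeholder model, assumes a local vLLM install).
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder; any supported model works
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

for out in llm.generate(["What does vLLM do?"], params):
    print(out.outputs[0].text)
```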

Red Hat Community
vLLM Semantic Router: Intelligent Auto Reasoning for Efficient LLM Inference on Mixture-of-Models (32:57)
Huamin Chen, vLLM Semantic Router project creator - vLLM Semantic Router: Intelligent Auto Reasoning Router for Efficient LLM ...
132 views · 3 months ago

Julien Simon
Deep Dive: Optimizing LLM inference (36:12)
Open-source LLMs are great for conversational applications, but they can be difficult to scale in production and deliver latency ...
44,069 views · 1 year ago

Vuk Rosić
Nano-vLLM - DeepSeek Engineer's Side Project - Code Explained (19:18)
repo - https://github.com/GeeeekExplorer/nano-vllm/tree/main * Nano-vLLM is a simple, fast LLM server in ~1200 lines of Python ...
1,529 views · 7 months ago

CNCF [Cloud Native Computing Foundation]
Efficient LLM Deployment: A Unified Approach with Ray, VLLM, and Kubernetes - Lily (Xiaoxuan) Liu (27:08)
Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon Europe in London from April 1 - 4, 2025.
3,859 views · 1 year ago

Red Hat Community
Combining Kubernetes and vLLM to Deliver Scalable, Distributed Inference with llm-d (28:26)
Greg Pereira, llm-d maintainer - Combining Kubernetes and vLLM to Deliver Scalable, Distributed Inference with llm-d.
486 views · 3 months ago

The Machine Learning Engineer
LLMOps: vLLM Integration with Langchain #machinelearning #datascience (52:38)
In this video we give an introduction to vLLM technology and its integration with the Langchain library to create RAG ... (See the code sketch below.)
14,545 views · 10 months ago
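
The LangChain integration mentioned above can be sketched roughly as follows, assuming the langchain-community and vllm packages are installed; the model name is a placeholder and the full RAG pipeline from the video is not reproduced here.

```python
# Rough sketch of wrapping vLLM as a LangChain LLM (placeholder model).
from langchain_community.llms import VLLM

llm = VLLM(
    model="facebook/opt-125m",  # placeholder, not the model used in the video
    max_new_tokens=128,
    temperature=0.7,
)

# The returned object plugs into LangChain chains; here we just call it directly.
print(llm.invoke("Explain retrieval-augmented generation in one sentence."))
```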

The Linux Foundation
Scalable and Efficient LLM Serving With the VLLM Production Stack - Junchen Jiang & Yue Zhu (39:36)
Don't miss out! Join us at the next Open Source Summit in Hyderabad, India (August 5); Amsterdam, Netherlands (August 25-29); ...
317 views · 6 months ago

The Linux Foundation
Streamlining AI Pipelines With Elyra: From Development To Inference With KServe & VLLM - Ritesh Shah (26:00)
Don't miss out! Join us at the next Open Source Summit in Seoul, South Korea (November 4-5). Join us at the premier ...
60 views · 4 months ago

CNCF [Cloud Native Computing Foundation]
Yes You Can Run LLMs on Kubernetes - Abdel Sghiouar & Mofi Rahman, Google Cloud (27:25)
Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon events in Hong Kong, China (June 10-11); ...
825 views · 9 months ago

The Machine Learning Engineer
LLMOps: vLLM Inference LLM Server Engine #machinelearning #datascience (45:45)
In this video I introduce vLLM, an LLM inference and serving library. Notebooks: ... (See the code sketch below.)
228 views · 1 year ago
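
Besides the offline API, vLLM can also run as a standalone server. Below is a hedged sketch of querying such a server through its OpenAI-compatible endpoint, assuming one was started separately (for example with `vllm serve <model>` on the default port) and that the model name matches whatever the server loaded.

```python
# Query a running vLLM server via its OpenAI-compatible API.
# Base URL, API key, and model name are assumptions about a locally started server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM ignores the key

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # must match the model the server was launched with
    messages=[{"role": "user", "content": "Summarize what vLLM does."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```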

CNCF [Cloud Native Computing Foundation]
LLMs on Kubernetes: Squeeze 5x GPU Efficiency With Cache, Route, Repea... Yuhan Liu & Suraj Deshmukh (32:31)
Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon events in Amsterdam, The Netherlands ...
474 views · 2 months ago

Red Hat Community
Triton for vLLM (37:46)
Burkhard Ringlein, Chih-Chieh Yang, Sara Kokkila Schumacher, IBM and Rishi Astra, University of Texas - Triton for ...
425 views · 8 months ago

DevConf
Auto-tuning vLLM - DevConf.US 2025 (9:51)
Speaker(s): Rehan Samaratunga. My auto-tuning project aims to find the best settings for running large language models using ...
80 views · 3 months ago

Dutch Algotrading
Run Your Locally Hosted Deepseek, Qwen or Codellama AI Assistant in VSCode Under 5 Minutes! (5:26)
Run your locally hosted AI coding assistant in VSCode with the Continue extension, Ollama, Deepseek, Qwen or CodeLlama in less ... (See the code sketch below.)
70,826 views · 11 months ago
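
Editor extensions such as Continue talk to a locally hosted model over HTTP. As a rough illustration of that request path, here is a sketch that sends a prompt to Ollama's OpenAI-compatible endpoint on its default port; the model name is a placeholder for whatever model was pulled locally.

```python
# Send a coding prompt to a locally hosted model, as an editor extension would.
# Endpoint and model name are assumptions about a default local Ollama setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # key is unused locally

resp = client.chat.completions.create(
    model="qwen2.5-coder",  # placeholder; use whichever model `ollama pull` fetched
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)
print(resp.choices[0].message.content)
```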

CNCF [Cloud Native Computing Foundation]
Tutorial: Cloud Native Sustainable LLM Inference in Action (1:24:20)
Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon North America in Salt Lake City from ...
487 views · 1 year ago

CNCF [Cloud Native Computing Foundation]
Lightning Talk: Best Practices for LLM Serving with DRA - Chen Wang & Abhishek Malvankar, IBM (9:37)
Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon North America in Salt Lake City from ...
475 views · 1 year ago

CNCF [Cloud Native Computing Foundation]
Sailing Multi-host Inference for LLM on Kubernetes - Kay Yan, DaoCloud (28:03)
Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon India in Hyderabad (August 6-7), and ...
306 views · 6 months ago

Python Code Camp
Chat with PDF langchain project (0:25)
shorts #short #shortvideo #python #pythonprogramming #pythonshorts #pythontips #pythontricks #chatgpt #langchain ...
52,612 views · 1 year ago

The Machine Learning Engineer
LLMOps : vLLM Integracion con Langchain (Español) #machinelearning #datascience (56:03)
In this video we are going to introduce vLLM technology and its integration with the Langchain library to create ...
5,579 views · 10 months ago