236 results
In this video, we build a fully self-hosted coding agent powered by the 7B parameter Qwen 2.5 Coder model, running on a GPU ...
1,360 views
2w ago
The high-throughput and memory-efficient inference and serving engine for LLMs. Easy, fast, and cost-efficient LLM serving for ...
37 views
3w ago
I show you how to keep your vLLM model loaded in FastAPI cache for much faster inference — without reloading it on every ...
178 views
3d ago
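The caching approach this result describes comes down to loading the model once and reusing the handle across requests. A minimal sketch of that pattern, assuming the expensive `vllm.LLM(...)` load is stubbed out with a placeholder object (the real call needs a GPU and is omitted here):

```python
from functools import lru_cache

load_count = 0  # counts how many times the expensive load actually runs

@lru_cache(maxsize=1)
def get_model():
    """Load the model once; every later call returns the cached handle.

    In a real FastAPI app this body would construct vllm.LLM(model=...),
    typically inside a startup/lifespan hook so requests never reload it.
    """
    global load_count
    load_count += 1
    return object()  # placeholder for the loaded vLLM engine

first = get_model()
second = get_model()
assert first is second   # same cached object on every call
assert load_count == 1   # the load ran exactly once
```

The same effect can be had with a FastAPI lifespan handler that stores the engine in application state; `lru_cache(maxsize=1)` is simply the smallest self-contained way to show the load-once idea.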
Timeline: 00:00 Intro 00:38 Provisioning a TPU VM 04:03 Confirming TPU 04:40 Installing with Docker 06:13 Testing the Endpoint ...
1,286 views
Get my FREE local AI projects: https://zenvanriel.com/open-source ⚡ Become a high-earning AI engineer: ...
121,114 views
AI Agents Studio: https://www.youtube.com/channel/UCAawqobkJZ28OLcYcMgqYaw?sub_confirmation=1 "This video covers the ...
3,827 views
5d ago
In this video, we explore how to deploy vLLM on Kubernetes to run large language models efficiently in production AI platforms.
45 views
13d ago
vLLM Compile Deep Dive | Ayush Satyam | PyTorch / vLLM Contributor | AER LABS. In this presentation, Ayush Satyam provides a ...
626 views
12d ago
Is your LLM inference slow or hitting OOM (Out of Memory) errors? In this video, we dive deep into vLLM, the high-throughput, ...
86 views
8d ago
Run vLLM Locally | Install and Serve LLM on Your Computer. In this video, I demonstrate how to install and run vLLM on your local ...
85 views
The Best Ways to Deploy LLMs. Which Method Actually Works? (Ollama vs LM Studio vs llama.cpp vs vLLM) What is the absolute ...
384 views
If you're building with local LLMs and you're tired of juggling Ollama, LangChain, a vector database, and a hacked-together UI just ...
89,391 views
vLLM is UC Berkeley's high-throughput inference engine that changed LLM serving. PagedAttention drops memory waste from ...
8 views
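PagedAttention, named in the result above, cuts KV-cache waste by handing out memory in small fixed-size blocks on demand instead of reserving a contiguous max-sequence-length slab per sequence. A toy sketch of that block-allocation idea (the block size, pool size, and class names here are illustrative, not vLLM's actual internals):

```python
BLOCK_SIZE = 16  # tokens per KV-cache block (illustrative value)

class BlockAllocator:
    """Toy pool of fixed-size KV-cache blocks, handed out on demand."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))  # indices of unused blocks

    def allocate(self, num_tokens: int) -> list[int]:
        # A sequence gets just enough blocks for the tokens it holds,
        # rather than a contiguous reservation sized for the worst case.
        needed = -(-num_tokens // BLOCK_SIZE)  # ceiling division
        return [self.free.pop() for _ in range(needed)]

pool = BlockAllocator(num_blocks=64)
blocks = pool.allocate(40)   # 40 tokens fit in 3 blocks of 16
assert len(blocks) == 3
assert len(pool.free) == 61  # the other 61 blocks stay available
```

Because freed blocks can be returned to the pool and reused by any sequence, fragmentation stays bounded by at most one partially filled block per sequence, which is the core of the memory-waste claim.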
Tune in to the Vienna vLLM meetup live on YouTube. Agenda: 00:00 - Welcome to the Vienna vLLM Meetup 07:00 - Intro to vLLM ...
1,226 views
Streamed 2w ago
Deploying an LLM model into Kubernetes/AKS can be complex, especially if you prefer not to manage the following tasks yourself: ...
267 views
6d ago
Install vLLM on RTX 5060 Ti (16GB) & RTX 5070 / 5080 / 5090 ...
25 views
1d ago
Get your CCNA at NetworkChuck Academy: https://academy.networkchuck.com Remember when I hacked together a way to run ...
249,495 views
1mo ago
This video demonstrates how to use premium LLMs (Gemini Pro, Anthropic, DeepSeek, etc.) for free on Kaggle notebooks and ...
682 views
Learn how Open Source LLM Models for Local Coding can give your programming new avenues of exploration, privacy and ...
158 views
Streamed 3w ago