236 results
In this video, we build a fully self-hosted coding agent powered by the 7B parameter Qwen 2.5 Coder model, running on a GPU ...
1,360 views
2w ago
The high-throughput and memory-efficient inference and serving engine for LLMs. Easy, fast, and cost-efficient LLM serving for ...
37 views
3w ago
I show you how to keep your vLLM model loaded in FastAPI cache for much faster inference — without reloading it on every ...
178 views
3d ago
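The caching approach this result describes comes down to loading the model once and reusing the handle across requests. A minimal sketch of that pattern, assuming the expensive `vllm.LLM(...)` load is stubbed out with a placeholder object (the real call needs a GPU and is omitted here):

```python
from functools import lru_cache

load_count = 0  # counts how many times the expensive load actually runs

@lru_cache(maxsize=1)
def get_model():
    """Load the model once; every later call returns the cached handle.

    In a real FastAPI app this body would construct vllm.LLM(model=...),
    typically inside a startup/lifespan hook so requests never reload it.
    """
    global load_count
    load_count += 1
    return object()  # placeholder for the loaded vLLM engine

first = get_model()
second = get_model()
assert first is second   # same cached object on every call
assert load_count == 1   # the load ran exactly once
```

The same effect can be had with a FastAPI lifespan handler that stores the engine in application state; `lru_cache(maxsize=1)` is simply the smallest self-contained way to show the load-once idea.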
Timeline: 00:00 Intro 00:38 Provisioning a TPU VM 04:03 Confirming TPU 04:40 Installing with Docker 06:13 Testing the Endpoint ...
1,286 views
Get my FREE local AI projects: https://zenvanriel.com/open-source ⚡ Become a high-earning AI engineer: ...
121,114 views
AI Agents Studio: https://www.youtube.com/channel/UCAawqobkJZ28OLcYcMgqYaw?sub_confirmation=1 "This video covers the ...
3,827 views
5d ago
In this video, we explore how to deploy vLLM on Kubernetes to run large language models efficiently in production AI platforms.
45 views
13d ago
vLLM Compile Deep Dive | Ayush Satyam | PyTorch / vLLM Contributor | AER LABS. In this presentation, Ayush Satyam provides a ...
626 views
12d ago
Is your LLM inference slow or hitting OOM (Out of Memory) errors? In this video, we dive deep into vLLM, the high-throughput, ...
86 views
8d ago
Run vLLM Locally | Install and Serve LLM on Your Computer. In this video, I demonstrate how to install and run vLLM on your local ...
85 views
The Best Ways to Deploy LLMs. Which Method Actually Works? (Ollama vs LM Studio vs llama.cpp vs vLLM) What is the absolute ...
384 views
If you're building with local LLMs and you're tired of juggling Ollama, LangChain, a vector database, and a hacked-together UI just ...
89,391 views
vLLM is UC Berkeley's high-throughput inference engine that changed LLM serving. PagedAttention drops memory waste from ...
8 views
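PagedAttention, named in the result above, cuts KV-cache waste by handing out memory in small fixed-size blocks on demand instead of reserving a contiguous max-sequence-length slab per sequence. A toy sketch of that block-allocation idea (the block size, pool size, and class names here are illustrative, not vLLM's actual internals):

```python
BLOCK_SIZE = 16  # tokens per KV-cache block (illustrative value)

class BlockAllocator:
    """Toy pool of fixed-size KV-cache blocks, handed out on demand."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))  # indices of unused blocks

    def allocate(self, num_tokens: int) -> list[int]:
        # A sequence gets just enough blocks for the tokens it holds,
        # rather than a contiguous reservation sized for the worst case.
        needed = -(-num_tokens // BLOCK_SIZE)  # ceiling division
        return [self.free.pop() for _ in range(needed)]

pool = BlockAllocator(num_blocks=64)
blocks = pool.allocate(40)   # 40 tokens fit in 3 blocks of 16
assert len(blocks) == 3
assert len(pool.free) == 61  # the other 61 blocks stay available
```

Because freed blocks can be returned to the pool and reused by any sequence, fragmentation stays bounded by at most one partially filled block per sequence, which is the core of the memory-waste claim.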
Tune in to the Vienna vLLM meetup live on YouTube. Agenda: 00:00 - Welcome to the Vienna vLLM Meetup 07:00 - Intro to vLLM ...
1,226 views
Streamed 2w ago
Deploying an LLM model into Kubernetes/AKS can be complex, especially if you prefer not to manage the following tasks yourself: ...
267 views
6d ago
Install vLLM on RTX 5060 Ti (16GB) & RTX 5070 / 5080 / 5090 ...
25 views
1d ago
Get your CCNA at NetworkChuck Academy: https://academy.networkchuck.com Remember when I hacked together a way to run ...
249,495 views
1mo ago
This video demonstrates how to use premium LLMs (Gemini Pro, Anthropic, DeepSeek, etc.) for free on Kaggle notebooks and ...
682 views
Learn how Open Source LLM Models for Local Coding can give your programming new avenues of exploration, privacy and ...
158 views
Streamed 3w ago