vllm tutorial

Nano-vLLM - DeepSeek Engineer's Side Project - Code Explained

repo - https://github.com/GeeeekExplorer/nano-vllm/tree/main * Nano-vLLM is a simple, fast LLM server in ~1200 lines of Python ...

19:18

1,607 views

7 months ago

DevConf

Speaker(s): Rehan Samaratunga My auto-tuning project aims to find the best settings for running large language models using ...

9:51

Auto-tuning vllm - DevConf.US 2025

100 views

4 months ago

Dutch Algotrading

Run Your Locally Hosted Deepseek, Qwen or Codellama AI Assistant in VSCode Under 5 Minutes!

Run your Locally hosted AI Coding Assistant in VSCode with Continue extension, Ollama, Deepseek, Qwen or CodeLlama in less ...

5:26

76,271 views

1 year ago

The Nitty-Gritty

Cloud vs. Homelab: Which is *Actually* Better for LLMs?

I battled my homelab machine cerebro against cloud machines with identical or better GPUs to see if my local setup is worth it or ...

15:30

3,461 views

11 months ago

Paolo Cadoni

Beyond GPUs: How Groq and Cerebras Lead the Next Wave in AI Infrastructure

Everyone talks about NVIDIA when it comes to AI, but what if GPUs aren't the future? In this video, I break down why AI inference is ...

12:34

8,009 views

10 months ago

Tommy Eberle

How to Avoid Dependency Hell Forever (in Python)

If you've ever worked on a Python project, you know how painful it can be to get all of the dependencies set up properly.

13:31

1,442 views

1 year ago

DevConf

Speaker(s): KEERTHI UDAYAKUMAR RAG apps save up to 60% of the cost compared to standard LLMs. But in this talk, I will tell ...

14:12

Smarter RAG, Smaller Bill: Optimize for Performance and Price - DevConf.US 2025

20 views

4 months ago

CNCF [Cloud Native Computing Foundation]

Keynote: Building a Large Model Inference Platform for Heterogeneous Chinese Chips Base... Kante Yin

Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon India in Hyderabad (August 6-7), and ...

11:29

121 views

8 months ago

The ASF

OpenLLM: Effortless High-Performance Cloud Deployment for Open Source LLMs

Lightning-Talk Track Speaker: Fog Dong Title: BentoML Senior Engineer, CNCF Ambassador, LFAPAC Evangelist, KubeVela ...

4:27

35 views

1 year ago

Julien Simon

Accelerate Transformer inference on CPU with Optimum and Intel OpenVINO

In this video, I show you how to accelerate Transformer inference with Optimum, an open source library by Hugging Face, and ...

12:54

3,058 views

3 years ago

AI Tools Quest

The Full Stack AI Skill Set Build, Scale & Monetize Intelligent Systems Like a Pro!

Unlock the complete Full Stack AI Skill Set you need to build, scale, and monetize intelligent systems — even if you're just starting ...

4:19

8 views

3 months ago

Vu Hung Nguyen (Hưng)

This episode details a practical exercise focused on fine-tuning a language model to improve its reasoning capabilities using ...

6:01

12-6 AI: Training to Show Its Work

2 views

4 months ago

Vuk Rosić

DeepSeek INFINITE Context Window - Encode Text As Images - DeepSeek OCR

Paper - https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSeek_OCR_paper.pdf Become AI Researcher & Train ...

12:24

5,033 views

3 months ago

Vuk Rosić

NEW DeepSeek Sparse Attention Explained - DeepSeek V3.2-Exp

Blog - https://opensuperintelligencelab.com/blog/deepseek-sparse-attention/ DeepSeek V3 From Scratch (understand attention ...

15:00

2,261 views

4 months ago

Fardjad

LLMatic - Use self-hosted LLMs with an OpenAI compatible API

LLMatic can be used as a drop-in replacement for OpenAI's API. In this video, I briefly introduce the project and demo some of its ...

5:37

998 views

2 years ago
