ViewTube


17 results

Vuk Rosić
Nano-vLLM - DeepSeek Engineer's Side Project - Code Explained
repo - https://github.com/GeeeekExplorer/nano-vllm/tree/main * Nano-vLLM is a simple, fast LLM server in ~1200 lines of Python ...
19:18 · 1,541 views · 7 months ago
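The snippet above only says the engine is small and fast; one core trick serving engines in this family implement is continuous batching. Below is a minimal pure-Python sketch of just the scheduling idea (not code from the nano-vLLM repo; `serve`, `max_batch`, and the request tuples are illustrative):

```python
from collections import deque

def serve(requests, max_batch=2):
    """Toy continuous-batching scheduler.

    requests: list of (name, tokens_to_generate) pairs.
    Returns, for each decode step, the sorted names that ran together.
    Finished sequences leave the batch after any step, and queued
    requests immediately take the freed slot instead of waiting for
    the whole batch to drain.
    """
    waiting = deque(requests)
    running = {}   # name -> tokens still to generate
    trace = []     # batch composition at each step
    while waiting or running:
        # Admit new requests whenever a slot is free (continuous batching).
        while waiting and len(running) < max_batch:
            name, n = waiting.popleft()
            running[name] = n
        trace.append(sorted(running))
        # One decode step: every running sequence emits one token.
        for name in list(running):
            running[name] -= 1
            if running[name] == 0:
                del running[name]   # leaves the batch mid-flight
    return trace
```

With static batching, three requests of 1, 3, and 2 tokens at `max_batch=2` would take five steps (three for the first pair, then two for the last request); here the third request slots in as soon as the first finishes, so the whole workload completes in three steps.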

DevConf
Auto-tuning vllm - DevConf.US 2025
Speaker(s): Rehan Samaratunga. My auto-tuning project aims to find the best settings for running large language models using ...
9:51 · 82 views · 3 months ago

Dutch Algotrading
Run Your Locally Hosted Deepseek, Qwen or Codellama AI Assistant in VSCode Under 5 Minutes!
Run your locally hosted AI coding assistant in VSCode with the Continue extension, Ollama, Deepseek, Qwen or CodeLlama in less ...
5:26 · 71,614 views · 11 months ago
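As a concrete starting point, a local model served by Ollama can be registered in the Continue extension's `config.json` roughly like this (the field names and the model tag are assumptions from memory, not taken from the video; check the current Continue documentation for the exact schema):

```json
{
  "models": [
    {
      "title": "DeepSeek Coder (local)",
      "provider": "ollama",
      "model": "deepseek-coder:6.7b"
    }
  ]
}
```

The `model` value must match a tag already pulled with `ollama pull`, so the extension can reach it on Ollama's local API.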

CNCF [Cloud Native Computing Foundation]
Lightning Talk: Best Practices for LLM Serving with DRA - Chen Wang & Abhishek Malvankar, IBM
Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon North America in Salt Lake City from ...
9:37 · 475 views · 1 year ago

Paolo Cadoni
Beyond GPUs: How Groq and Cerebras Lead the Next Wave in AI Infrastructure
Everyone talks about NVIDIA when it comes to AI, but what if GPUs aren't the future? In this video, I break down why AI inference is ...
12:34 · 7,652 views · 9 months ago

Tommy Eberle
How to Avoid Dependency Hell Forever (in Python)
If you've ever worked on a Python project, you know how painful it can be to get all of the dependencies set up properly.
13:31 · 1,325 views · 11 months ago
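The standard fix the video's title alludes to is isolating each project in its own virtual environment. A stdlib-only sketch (the temp directory and project name are illustrative):

```python
import pathlib
import sys
import tempfile
import venv

# One virtual environment per project keeps each project's dependency
# set isolated, which is the usual way out of "dependency hell".
project = pathlib.Path(tempfile.mkdtemp()) / "myproject"
env_dir = project / ".venv"
venv.EnvBuilder(with_pip=False).create(env_dir)  # with_pip=False: fast, offline-friendly

# The environment carries its own interpreter and pyvenv.cfg, separate
# from the system Python.
bin_dir = "Scripts" if sys.platform == "win32" else "bin"
python_name = "python.exe" if sys.platform == "win32" else "python"
interpreter = env_dir / bin_dir / python_name
```

Pair the environment with pinned dependencies (a `requirements.txt` from `pip freeze`, or a lock file from a tool such as Poetry or uv) so it can be rebuilt identically on another machine.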

The Nitty-Gritty
Cloud vs. Homelab: Which is *Actually* Better for LLMs?
I battled my homelab machine Cerebro against cloud machines with identical or better GPUs to see if my local setup is worth it or ...
15:30 · 3,435 views · 10 months ago

DevConf
Smarter RAG, Smaller Bill: Optimize for Performance and Price - DevConf.US 2025
Speaker(s): Keerthi Udayakumar. RAG apps save up to 60% of the cost compared to standard LLMs. But in this talk, I will tell ...
14:12 · 19 views · 3 months ago

The ASF
OpenLLM: Effortless High-Performance Cloud Deployment for Open Source LLMs
Lightning-Talk Track. Speaker: Fog Dong (BentoML Senior Engineer, CNCF Ambassador, LFAPAC Evangelist, KubeVela ...
4:27 · 35 views · 1 year ago

AI Tools Quest
The Full Stack AI Skill Set Build, Scale & Monetize Intelligent Systems Like a Pro!
Unlock the complete Full Stack AI Skill Set you need to build, scale, and monetize intelligent systems, even if you're just starting ...
4:19 · 8 views · 3 months ago

Vuk Rosić
NEW DeepSeek Sparse Attention Explained - DeepSeek V3.2-Exp
Blog - https://opensuperintelligencelab.com/blog/deepseek-sparse-attention/ DeepSeek V3 From Scratch (understand attention ...
15:00 · 2,117 views · 3 months ago
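Since the snippet is truncated, here is a dependency-free toy of the general top-k idea behind sparse attention (single query, single head; this illustrates the family of techniques, not DeepSeek's exact DSA mechanism or its indexer):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def topk_sparse_attention(q, keys, values, k):
    """Attend only to the k keys with the highest dot-product scores.

    Dense attention softmaxes over every key (O(L) work per query);
    the sparse variant keeps only the k best-scoring keys and
    renormalises the weights over that subset.
    """
    scores = [sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    dim = len(values[0])
    out = [0.0] * dim
    for w, i in zip(weights, top):
        for d in range(dim):
            out[d] += w * values[i][d]
    return out
```

With `k` equal to the sequence length this reduces to ordinary dense attention; the savings come from choosing `k` much smaller than the context length.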

CNCF [Cloud Native Computing Foundation]
Keynote: Building a Large Model Inference Platform for Heterogeneous Chinese Chips Base... Kante Yin
Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon India in Hyderabad (August 6-7), and ...
11:29 · 121 views · 7 months ago

Julien Simon
Accelerate Transformer inference on CPU with Optimum and Intel OpenVINO
In this video, I show you how to accelerate Transformer inference with Optimum, an open source library by Hugging Face, and ...
12:54 · 3,043 views · 3 years ago

Vuk Rosić
DeepSeek INFINITE Context Window - Encode Text As Images - DeepSeek OCR
Paper - https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSeek_OCR_paper.pdf Become AI Researcher & Train ...
12:24 · 4,988 views · 3 months ago

Vu Hung Nguyen (Hưng)
12-6 AI: Training to Show Its Work
This episode details a practical exercise focused on fine-tuning a language model to improve its reasoning capabilities using ...
6:01 · 2 views · 3 months ago

Fardjad
LLMatic - Use self-hosted LLMs with an OpenAI compatible API
LLMatic can be used as a drop-in replacement for OpenAI's API. In this video, I briefly introduce the project and demo some of its ...
5:37 · 995 views · 2 years ago
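The "drop-in replacement" claim boils down to the server exposing the same HTTP routes and JSON shapes as OpenAI's API. A stdlib-only sketch of building such a request (the port, model name, and token are illustrative assumptions, and nothing is actually sent here):

```python
import json
from urllib import request

# An OpenAI-compatible server serves the same routes as api.openai.com,
# so an existing client only needs the base URL swapped. The port below
# is a placeholder; local servers often ignore the bearer token.
BASE_URL = "http://localhost:3000/v1"

payload = {
    "model": "local-model",
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer not-needed-for-local",
    },
)
# request.urlopen(req) would return the familiar OpenAI-style JSON
# ({"choices": [{"message": ...}]}) once a server is listening.
```

With the official OpenAI client libraries, the same switch is usually just pointing the configured base URL at the local server instead of api.openai.com.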

Julien Simon
Accelerating Transformers with Optimum Neuron, AWS Trainium and AWS Inferentia2
In this video, I show you how to accelerate Transformer training and inference with the Hugging Face Optimum Neuron library, ...
18:56 · 2,215 views · 2 years ago