ViewTube


17 results

Vuk Rosić
Nano-vLLM - DeepSeek Engineer's Side Project - Code Explained
repo - https://github.com/GeeeekExplorer/nano-vllm/tree/main * Nano-vLLM is a simple, fast LLM server in ~1200 lines of Python ...
19:18 · 1,541 views · 7 months ago
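The snippet above only says the engine is small and fast; one core trick serving engines in this family implement is continuous batching. Below is a minimal pure-Python sketch of just the scheduling idea (not code from the nano-vLLM repo; `serve`, `max_batch`, and the request tuples are illustrative):

```python
from collections import deque

def serve(requests, max_batch=2):
    """Toy continuous-batching scheduler.

    requests: list of (name, tokens_to_generate) pairs.
    Returns, for each decode step, the sorted names that ran together.
    Finished sequences leave the batch after any step, and queued
    requests immediately take the freed slot instead of waiting for
    the whole batch to drain.
    """
    waiting = deque(requests)
    running = {}   # name -> tokens still to generate
    trace = []     # batch composition at each step
    while waiting or running:
        # Admit new requests whenever a slot is free (continuous batching).
        while waiting and len(running) < max_batch:
            name, n = waiting.popleft()
            running[name] = n
        trace.append(sorted(running))
        # One decode step: every running sequence emits one token.
        for name in list(running):
            running[name] -= 1
            if running[name] == 0:
                del running[name]   # leaves the batch mid-flight
    return trace
```

With static batching, three requests of 1, 3, and 2 tokens at `max_batch=2` would take five steps (three for the first pair, then two for the last request); here the third request slots in as soon as the first finishes, so the whole workload completes in three steps.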

DevConf
Auto-tuning vllm - DevConf.US 2025
Speaker(s): Rehan Samaratunga. My auto-tuning project aims to find the best settings for running large language models using ...
9:51 · 82 views · 3 months ago

Dutch Algotrading
Run Your Locally Hosted Deepseek, Qwen or Codellama AI Assistant in VSCode Under 5 Minutes!
Run your locally hosted AI coding assistant in VSCode with the Continue extension, Ollama, Deepseek, Qwen or CodeLlama in less ...
5:26 · 71,614 views · 11 months ago
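As a concrete starting point, a local model served by Ollama can be registered in the Continue extension's `config.json` roughly like this (the field names and the model tag are assumptions from memory, not taken from the video; check the current Continue documentation for the exact schema):

```json
{
  "models": [
    {
      "title": "DeepSeek Coder (local)",
      "provider": "ollama",
      "model": "deepseek-coder:6.7b"
    }
  ]
}
```

The `model` value must match a tag already pulled with `ollama pull`, so the extension can reach it on Ollama's local API.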

CNCF [Cloud Native Computing Foundation]
Lightning Talk: Best Practices for LLM Serving with DRA - Chen Wang & Abhishek Malvankar, IBM
Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon North America in Salt Lake City from ...
9:37 · 475 views · 1 year ago

Paolo Cadoni
Beyond GPUs: How Groq and Cerebras Lead the Next Wave in AI Infrastructure
Everyone talks about NVIDIA when it comes to AI, but what if GPUs aren't the future? In this video, I break down why AI inference is ...
12:34 · 7,652 views · 9 months ago

Tommy Eberle
How to Avoid Dependency Hell Forever (in Python)
If you've ever worked on a Python project, you know how painful it can be to get all of the dependencies set up properly.
13:31 · 1,325 views · 11 months ago
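The standard fix the video's title alludes to is isolating each project in its own virtual environment. A stdlib-only sketch (the temp directory and project name are illustrative):

```python
import pathlib
import sys
import tempfile
import venv

# One virtual environment per project keeps each project's dependency
# set isolated, which is the usual way out of "dependency hell".
project = pathlib.Path(tempfile.mkdtemp()) / "myproject"
env_dir = project / ".venv"
venv.EnvBuilder(with_pip=False).create(env_dir)  # with_pip=False: fast, offline-friendly

# The environment carries its own interpreter and pyvenv.cfg, separate
# from the system Python.
bin_dir = "Scripts" if sys.platform == "win32" else "bin"
python_name = "python.exe" if sys.platform == "win32" else "python"
interpreter = env_dir / bin_dir / python_name
```

Pair the environment with pinned dependencies (a `requirements.txt` from `pip freeze`, or a lock file from a tool such as Poetry or uv) so it can be rebuilt identically on another machine.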

The Nitty-Gritty
Cloud vs. Homelab: Which is *Actually* Better for LLMs?
I battled my homelab machine Cerebro against cloud machines with identical or better GPUs to see if my local setup is worth it or ...
15:30 · 3,435 views · 10 months ago

DevConf
Smarter RAG, Smaller Bill: Optimize for Performance and Price - DevConf.US 2025
Speaker(s): Keerthi Udayakumar. RAG apps save up to 60% of the cost compared to standard LLMs. But in this talk, I will tell ...
14:12 · 19 views · 3 months ago

The ASF
OpenLLM: Effortless High-Performance Cloud Deployment for Open Source LLMs
Lightning-Talk Track. Speaker: Fog Dong (BentoML Senior Engineer, CNCF Ambassador, LFAPAC Evangelist, KubeVela ...
4:27 · 35 views · 1 year ago

AI Tools Quest
The Full Stack AI Skill Set Build, Scale & Monetize Intelligent Systems Like a Pro!
Unlock the complete Full Stack AI Skill Set you need to build, scale, and monetize intelligent systems, even if you're just starting ...
4:19 · 8 views · 3 months ago

Vuk Rosić
NEW DeepSeek Sparse Attention Explained - DeepSeek V3.2-Exp
Blog - https://opensuperintelligencelab.com/blog/deepseek-sparse-attention/ DeepSeek V3 From Scratch (understand attention ...
15:00 · 2,117 views · 3 months ago
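Since the snippet is truncated, here is a dependency-free toy of the general top-k idea behind sparse attention (single query, single head; this illustrates the family of techniques, not DeepSeek's exact DSA mechanism or its indexer):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def topk_sparse_attention(q, keys, values, k):
    """Attend only to the k keys with the highest dot-product scores.

    Dense attention softmaxes over every key (O(L) work per query);
    the sparse variant keeps only the k best-scoring keys and
    renormalises the weights over that subset.
    """
    scores = [sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    dim = len(values[0])
    out = [0.0] * dim
    for w, i in zip(weights, top):
        for d in range(dim):
            out[d] += w * values[i][d]
    return out
```

With `k` equal to the sequence length this reduces to ordinary dense attention; the savings come from choosing `k` much smaller than the context length.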

CNCF [Cloud Native Computing Foundation]
Keynote: Building a Large Model Inference Platform for Heterogeneous Chinese Chips Base... Kante Yin
Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon India in Hyderabad (August 6-7), and ...
11:29 · 121 views · 7 months ago

Julien Simon
Accelerate Transformer inference on CPU with Optimum and Intel OpenVINO
In this video, I show you how to accelerate Transformer inference with Optimum, an open source library by Hugging Face, and ...
12:54 · 3,043 views · 3 years ago

Vuk Rosić
DeepSeek INFINITE Context Window - Encode Text As Images - DeepSeek OCR
Paper - https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSeek_OCR_paper.pdf Become AI Researcher & Train ...
12:24 · 4,988 views · 3 months ago

Vu Hung Nguyen (Hưng)
12-6 AI: Training to Show Its Work
This episode details a practical exercise focused on fine-tuning a language model to improve its reasoning capabilities using ...
6:01 · 2 views · 3 months ago

Fardjad
LLMatic - Use self-hosted LLMs with an OpenAI compatible API
LLMatic can be used as a drop-in replacement for OpenAI's API. In this video, I briefly introduce the project and demo some of its ...
5:37 · 995 views · 2 years ago
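The "drop-in replacement" claim boils down to the server exposing the same HTTP routes and JSON shapes as OpenAI's API. A stdlib-only sketch of building such a request (the port, model name, and token are illustrative assumptions, and nothing is actually sent here):

```python
import json
from urllib import request

# An OpenAI-compatible server serves the same routes as api.openai.com,
# so an existing client only needs the base URL swapped. The port below
# is a placeholder; local servers often ignore the bearer token.
BASE_URL = "http://localhost:3000/v1"

payload = {
    "model": "local-model",
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer not-needed-for-local",
    },
)
# request.urlopen(req) would return the familiar OpenAI-style JSON
# ({"choices": [{"message": ...}]}) once a server is listening.
```

With the official OpenAI client libraries, the same switch is usually just pointing the configured base URL at the local server instead of api.openai.com.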

Julien Simon
Accelerating Transformers with Optimum Neuron, AWS Trainium and AWS Inferentia2
In this video, I show you how to accelerate Transformer training and inference with the Hugging Face Optimum Neuron library, ...
18:56 · 2,215 views · 2 years ago