ViewTube

3,575 results

Matt Williams
Optimize Your AI - Quantization Explained
Run massive AI models on your laptop! Learn the secrets of LLM quantization and how q2, q4, and q8 settings in Ollama can save ...
12:10 · 392,564 views · 1 year ago

Julia Turc
How LLMs survive in low precision | Quantization Fundamentals
In this video, we discuss the fundamentals of model quantization, the technique that allows us to run inference on massive LLMs ...
20:34 · 43,916 views · 9 months ago

Airtrain AI
What is LLM quantization?
In this video we define the basics of quantization and look at its benefits and how it affects large language models.
5:13 · 28,075 views · 2 years ago

Adam Lucek
Quantizing LLMs - How & Why (8-Bit, 4-Bit, GGUF & More)
Quantizing models for maximum efficiency gains! Resources: Model Quantized: ...
26:26 · 22,285 views · 1 year ago

Matt Williams
5. Comparing Quantizations of the Same Model - Ollama Course
Welcome back to the Ollama course! In this lesson, we dive into the fascinating world of AI model quantization. Using variations of ...
10:29 · 29,063 views · 1 year ago

BlueSpork
DeepSeek R1: Distilled & Quantized Models Explained
This video explores DeepSeek R1, how distilled versions and quantization make it more accessible, and the trade-offs between ...
3:47 · 22,906 views · 1 year ago

Julia Turc
The myth of 1-bit LLMs | Quantization-Aware Training
Are 1-bit LLMs the future of efficient AI? Or just a catchy Microsoft metaphor? In this video, we break down BitNet, the so-called ...
24:37 · 87,119 views · 8 months ago

Codeically
I Made The Smallest (And Dumbest) LLM
I Made ChatGPT-2 Run on a Potato (63MB AI Model!) - Extreme Quantization Experiment. What happens when you compress a ...
5:52 · 469,247 views · 5 months ago

Gary Explains
Does LLM Size Matter? How Many Billions of Parameters do you REALLY Need?
Large Language Models (LLMs) are measured by the number of parameters they contain – the number of weights and biases ...
25:03 · 43,743 views · 1 year ago

Julia Turc
Training models with only 4 bits | Fully-Quantized Training
Can you really train a large language model in just 4 bits? In this video, we explore the cutting edge of model compression: fully ...
24:08 · 49,586 views · 8 months ago

Umar Jamil
Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training
In this video I will introduce and explain quantization: we will first start with a little introduction on numerical representation of ...
50:55 · 50,839 views · 2 years ago

Julia Turc
Reverse-engineering GGUF | Post-Training Quantization
The first comprehensive explainer for the GGUF quantization ecosystem. GGUF quantization is currently the most popular tool for ...
25:07 · 48,585 views · 7 months ago

AI Bites
QLoRA paper explained (Efficient Finetuning of Quantized LLMs)
QLoRA is the first approach that allows the TRAINING of Large Language Models (LLMs) on a single GPU. It does this by using ...
11:44 · 22,926 views · 2 years ago

New Machina
What is LLM Quantization?
Large Language Models (LLMs) are built using ...
9:57 · 2,983 views · 11 months ago

GosuCoder
Run AI Models on Your PC: Best Quantization Levels (Q2, Q3, Q4) Explained!
Run AI Models Locally: Quantization Explained (Q2, Q3, Q4, Q5). Want to run large language models (LLMs) like Phi-4 on your PC ...
12:37 · 4,610 views · 1 year ago

Discover AI
LLM Quantization (Ollama, LM Studio): Any Performance Drop? TEST
A new benchmark and guide to which quantized models to use locally on your PC or laptop, either in Ollama or in LM Studio, ...
19:01 · 3,916 views · 6 months ago

Efficient NLP
Quantization vs Pruning vs Distillation: Optimizing NNs for Inference
Try Voice Writer - speak your thoughts and let AI handle the grammar: https://voicewriter.io Four techniques to optimize the speed ...
19:46 · 60,293 views · 2 years ago

bycloud
1-Bit LLM: The Most Efficient LLM Possible?
Download Tanka today https://www.tanka.ai and enjoy 3 months of free Premium! You can also get $20 / team for each referrals ...
14:35 · 364,229 views · 8 months ago

Julien Simon
Deep Dive: Quantizing Large Language Models, part 1
Quantization is an excellent technique to compress Large Language Models (LLM) and accelerate their inference. In this video ...
40:28 · 22,845 views · 1 year ago

Zachary Huang
Give me 30 min, I will make Quantization click forever
Text: https://github.com/The-Pocket/PocketFlow-Tutorial-Video-Generator/blob/main/docs/llm/quantization.md 0:00:00 ...
32:42 · 2,244 views · 2 months ago