98 results
Azure Databricks Learning: Performance Optimization: Spark/Databricks Interview Question Series - II ...
13,581 views
2 years ago
Apache Spark™️ has become the de-facto open-source standard for big data processing due to its ease of use and performance ...
10,227 views
Streamed 5 years ago
The SQL tab in the Spark UI provides a lot of information for analysing your Spark queries, ranging from the query plan, to all ...
18,076 views
5 years ago
In the data warehouse area, it is common to use one or more columns of a complex type, such as map, and to put many subfields into them.
1,603 views
Over the years, there has been extensive and continuous effort on improving Spark SQL's query optimizer and planner, in order to ...
9,484 views
You've seen the technical deep dives on Spark's Catalyst query optimizer. You understand how to fix joins, how to find common ...
1,417 views
Uneven distribution of input (or intermediate) data can often cause skew in joins. In Spark, this leads to very slow join stages ...
2,394 views
User Defined Functions is an important feature of Spark SQL which helps extend the language by adding custom constructs.
8,787 views
Nowadays, Spark is widely adopted in large enterprises for handling large volumes of data. At PayPal, more and more complex ...
532 views
This talk will break down merge in Delta Lake (what is actually happening under the hood) and then explain how you can ...
16,065 views
The Delta Architecture pattern has made the lives of data engineers much simpler, but what about improving query performance ...
8,881 views
Bucketing is commonly used in Hive and Spark SQL to improve performance by eliminating Shuffle in Join or group-by-aggregate ...
7,089 views
Parquet is a very popular column-based format. Spark can automatically filter useless data using Parquet file statistical data by ...
3,325 views
One of the most significant benefits provided by Databricks Delta is the ability to use z-ordering and dynamic file pruning to ...
1,016 views
Learn PySpark, an interface for Apache Spark in Python. PySpark is often used for large-scale data processing and machine ...
1,638,466 views
4 years ago
Defining customized scalable aggregation logic is one of Apache Spark's most powerful features. User Defined Aggregate ...
2,724 views
John submits a query and expects it to run smoothly. Based on his prior experience, he anticipates the query to finish in 20 mins.
751 views
Join us for a four part learning series: Introduction to Data Analysis for Aspiring Data Scientists. This is the fourth of four online ...
20,520 views
This talk will introduce TeraCache, a new scalable cache for Spark that avoids both garbage collection (GC) and serialization ...
365 views
It's very easy to be distracted by the latest and greatest approaches with technology, but sometimes there's a reason old ...
6,848 views