Introduction:
In Today’s analytics, raw data alone isn’t enough, what matters is how flexibly and deeply you can explore it. When analyzing data, it’s often important to calculate values like running totals, rankings, or comparisons between rows, without losing the details of each row. That’s where SQL window functions come in.
They let you perform powerful calculations across related rows while keeping every row in your results. This makes it easier to spot trends, compare values, and build more insightful reports. Whether you're just starting with SQL or looking to improve your analysis skills, understanding window functions is a valuable step forward.
SQL Window Functions Core Concepts:
Each window function uses the OVER clause to define its scope. You can refine this window with:
PARTITION BY to divide data into groups for independent calculations,
ORDER BY to set row order within each group for operations like rankings or running totals.
Window functions fall into several key categories:
Aggregate functions (SUM(), AVG(), COUNT()) for running totals or moving averages.
Ranking functions (ROW_NUMBER(), RANK(), DENSE_RANK()) to assign positions or ranks within partitions.
Value functions (LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()) to access values from other rows in the window.
Example:
This example calculates a cumulative total of duration_seconds ordered by start_time, without grouping or removing rows.
This example calculates a cumulative total of duration_seconds ordered by start_time, without grouping or removing rows.
Primary SQL Ranking Functions:
Ranking functions in SQL are specialized window functions that assign ranks to rows based on their ordering within a dataset. The primary ranking functions are ROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE().
ROW_NUMBER() -
ROW_NUMBER()
is a window function in SQL that assigns a unique sequential integer to rows within a result set, based on the order you specify.
Output:
The query would produce:
RANK(): Assigns ranks to rows, but if rows have identical values per the ORDER BY clause, they receive the same rank. This may create gaps in the rankings after ties (i.e., if two rows are ranked 1, the next row is ranked 3).
Output:
The query would produce:
DENSE_RANK(): Functions similarly to RANK(), but does not leave gaps in the ranking sequence after ties. If two rows are tied for rank 1, the next row is assigned rank 2.
Output:
The query would produce:
NTILE(n): Divides the data into n equal or nearly equal groups and assigns each row a group number.
Output :
(rows split into 4 nearly equal buckets, highest values in quartile 1)
📦 Key Navigation Functions
Navigation functions are a subset of SQL window functions that allow you to access values from rows adjacent to the current row within a defined window. These adjacent rows can be from before (LAG) or after (LEAD) the current row. Other navigation functions include FIRST_VALUE() and LAST_VALUE() which retrieve boundary values within each window.
The navigation summary for the LEAD() , LAG(), FIRST_VALUE() and LAST_VALUE() functions from the SQL time-series window functions is as follows:
LEAD and LAG
Purpose: The LEAD and LAG functions enable access to data from rows after (LEAD) or before (LAG) the current row within a result set ordered by a specific column, making them ideal for time-series and sequential data analysis.
Functionality: LEAD looks "into the future" (next rows), while LAG looks "into the past" (previous rows). They are frequently used to calculate differences between consecutive data points, such as monthly stock prices or moving averages.
FIRST_VALUE() and LAST_VALUE()
Purpose:
FIRST_VALUE()
andLAST_VALUE()
return the first and last value within the ordered window of each partition. They're ideal when you need to reference the starting or ending state of a sequence, like the first purchase amount, the initial score, or the last known status.Functionality:
FIRST_VALUE()
captures the earliest value in the ordered set.LAST_VALUE()
captures the latest value, but note: its behavior depends on the window frame definition. Without correct frame specification, it may return the current row’s value instead of the true “last” row.
Example:
Imagine a Scenario : Analyzing Employee Promotion Patterns in an organisation
Business Use Case:
An HR analytics team wants to understand employee promotion trends over time. Specifically, they want to know:
When each employee got promoted.
What was their previous job title (
LAG()
).What is their next job title (
LEAD()
).What was their first-ever title in the company (
FIRST_VALUE()
).What is their current/latest title (
LAST_VALUE()
).
SQL query:
✅ Output of query on that data
What is an Aggregate Function?
An aggregate function in SQL is a special type of function that takes multiple rows of data and condenses them into a single value based on a calculation you specify.
They’re used when you want summarized information, such as totals, averages, or counts, rather than listing individual rows.
Common SQL aggregate functions include:
• COUNT(): Returns the number of rows in a set, or the number of non-NULL values in a specified column.
• SUM(): Calculates the total sum of values in a numeric column.
• MIN(): Retrieves the minimum value from a specified column.
• MAX(): Retrieves the maximum value from a specified column.
📊 Business Use Cases for Aggregation
🛍 Retail & E-Commerce:
Sales Reporting: Calculate
SUM(sales_amount)
to find total revenue per day, month, or store.Top-Selling Products: Use
COUNT(product_id)
grouped by product to identify best performers.
🏥 Healthcare
Patient Volume: Use
COUNT(patient_id)
to track inflow by department.Treatment Success Rate: Aggregate outcomes with
SUM(successful_cases)
/SUM(total_cases)
.
🎓 Education
Performance Tracking:
AVG(marks)
per subject or class.Top Scores: Identify
MAX(score)
for award consideration.
Advanced patterns of SQL Window Functions:
1. Row-to-Row Comparisons
Functions Used:
LAG()
andLEAD()
Purpose: Compare a row with its preceding or following row within the same partition.
Typical Applications:
Finding trends or changes over time
Calculating deltas between consecutive events
Detecting anomalies in sequential data
(Conceptual Example):
2. Rolling and Moving Calculations
Functions Used:
AVG()
,SUM()
, etc., combined with window framesPurpose: Compute metrics like rolling sums, running averages, or cumulative totals.
Applications:
Smoothing time-series
Spotting moving trends
Monitoring performance metrics
4. Conditional Insights
Approach: Use
CASE
inside a window function to apply conditional aggregations.Applications:
Aggregating only certain categories of data while still operating in a partition
Segmenting by logic without breaking partitions
Performance Principles:
Indexes & Partitions
Index columns that are used in
PARTITION BY
andORDER BY
.Consider table partitioning to reduce the data each query scans.
Window Frame Efficiency
Use specific frames (e.g.,
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
) instead of broad ones likeUNBOUNDED PRECEDING
when possible. Narrow windows run faster.
Reduce Complexity
Keep only necessary columns in the
OVER()
clause.Simplify expressions inside window functions; push heavy calculations outside the window if possible.
Benchmark Queries
Run queries on real or representative datasets.
Use EXPLAIN / execution plans or profiling tools to spot slow steps.
Quick best practices
Use only when needed. If a
GROUP BY
works, use that.Set
PARTITION BY
andORDER BY
carefully, add a tiebreaker (e.g.,ORDER BY ts, id
).Filter early, select fewer columns before the window step.
Use tight frames (
ROWS BETWEEN N PRECEDING AND CURRENT ROW
) instead of unbounded history.Reuse results, don’t compute the same window twice (or use
QUALIFY
where available).Handle NULLs in
LAG/LEAD
with a default value.Watch performance: check the plan/profile; index/partition on the columns you partition/order by.
Conclusion
Window functions add trends, ranks, and rolling metrics without losing row detail. Keep them predictable and fast with precise ordering, small frames, and smaller inputs, and always verify with the execution plan.