sqlcmd.net validated sql reference
advanced aggregation PostgreSQL

Fit a Linear Regression With REGR_SLOPE and REGR_INTERCEPT

Compute the least-squares slope, intercept, and R² of a linear regression directly in SQL using `REGR_SLOPE(y, x)`, `REGR_INTERCEPT(y, x)`, and `REGR_R2(y, x)` in PostgreSQL.

Docker-validated Not currently validation-green

Find the slope and intercept of the study-hours vs test-score relationship

The three data points (1,3), (2,5), (3,7) lie perfectly on the line score = 2 × hours + 1, so the slope is 2.0, the intercept is 1.0, and R² = 1.0 (perfect fit — no residual variance). In practice, real data is noisy and R² will be less than 1. The slope 2.0 means each additional hour of study predicts an increase of 2 points. To predict a score for a new hours value, multiply slope by hours and add intercept: 4 hours → 2 × 4 + 1 = 9. To compute slope manually in engines without REGR_SLOPE: (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²) where n = COUNT(*), which for this data gives (3×34 − 6×15) / (3×14 − 36) = 12/6 = 2.0.

Rows loaded before the example query runs.
Setup
CREATE TABLE study_data (hours_studied INT, test_score INT);

INSERT INTO
  study_data
VALUES
  (1, 3),
  (2, 5),
  (3, 7);
Shared across supported engines.
SQL
SELECT
  REGR_SLOPE(test_score, hours_studied) AS slope,
  REGR_INTERCEPT(test_score, hours_studied) AS intercept,
  REGR_R2(test_score, hours_studied) AS r_squared
FROM
  study_data;
Returned rows for the shared example.
slopeinterceptr_squared
211

Only PostgreSQL supports REGR_SLOPE, REGR_INTERCEPT, and REGR_R2 as built-in aggregates.

Where this command helps.

  • finding the linear trend line through time-series data to quantify the rate of change per period
  • computing the price elasticity of demand by regressing log(quantity) on log(price)

What the command is doing.

PostgreSQL exposes the full set of SQL:2003 linear regression aggregates. REGR_SLOPE(y, x) returns the slope b and REGR_INTERCEPT(y, x) returns the intercept a of the ordinary least-squares line ŷ = a + b·x that best fits the (x, y) pairs. REGR_R2(y, x) returns the coefficient of determination R², the square of the Pearson correlation, which measures what proportion of the variance in y is explained by x — a value of 1.0 indicates a perfect linear fit. Additional functions cover the count of non-null pairs (REGR_COUNT), averages of both variables (REGR_AVGX, REGR_AVGY), and variance of x (REGR_SXX, REGR_SYY, REGR_SXY). MySQL, MariaDB, SQL Server, and SQLite lack these built-ins; the slope can be computed manually as (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²) and the intercept as (AVG(y) − slope·AVG(x)). These aggregates also work as window functions for computing local regressions per partition.