sqlcmd.net validated sql reference
advanced aggregation PostgreSQL

Compute Pearson Correlation and Covariance

Measure the strength of a linear relationship between two columns with `CORR(y, x)` and covariance with `COVAR_SAMP` / `COVAR_POP` in PostgreSQL.

Docker-validated Not currently validation-green

Measure the correlation between ad spend and conversions across five campaigns

The five (ad_spend, conversions) pairs are (1,2), (2,3), (3,5), (4,4), (5,6). Mean ad spend is 3; mean conversions is 4. The sum of cross-deviations Σ(xi−3)(yi−4) is (−2)(−2)+(−1)(−1)+(0)(1)+(1)(0)+(2)(2) = 4+1+0+0+4 = 9. Both variance sums Σ(xi−3)² and Σ(yi−4)² equal 10. CORR = 9 / √(10×10) = 9/10 = 0.9 — a strong positive linear relationship. COVAR_SAMP = 9 / (5−1) = 2.25. COVAR_POP = 9 / 5 = 1.8. A correlation of 0.9 suggests ad spend and conversions move closely together, though correlation alone does not establish causation. To compute CORR in engines without a built-in function, use (AVG(ad_spend conversions) − AVG(ad_spend) AVG(conversions)) / (STDDEV_POP(ad_spend) * STDDEV_POP(conversions)).

Rows loaded before the example query runs.
Setup
CREATE TABLE campaign_data (ad_spend INT, conversions INT);

INSERT INTO
  campaign_data
VALUES
  (1, 2),
  (2, 3),
  (3, 5),
  (4, 4),
  (5, 6);
Shared across supported engines.
SQL
SELECT
  CORR(conversions, ad_spend) AS correlation,
  COVAR_SAMP(conversions, ad_spend) AS covar_samp,
  COVAR_POP(conversions, ad_spend) AS covar_pop
FROM
  campaign_data;
Returned rows for the shared example.
correlationcovar_sampcovar_pop
0.92.251.8

Only PostgreSQL supports CORR, COVAR_SAMP, and COVAR_POP as built-in aggregates.

Where this command helps.

  • checking whether advertising spend is linearly correlated with conversions before fitting a model
  • quantifying how strongly two product metrics move together to detect multicollinearity before regression

What the command is doing.

CORR(y, x) returns the Pearson correlation coefficient — a value in [−1, 1] that measures how closely two columns move together. A value of 1 means a perfect positive linear relationship, −1 a perfect inverse relationship, and 0 no linear relationship at all. COVAR_SAMP(y, x) returns the sample covariance Σ(xi − x̄)(yi − ȳ) / (n − 1); COVAR_POP(y, x) divides by n instead. The correlation coefficient equals COVAR_POP / (STDDEV_POP(x) × STDDEV_POP(y)). PostgreSQL exposes all three as standard ordered-set aggregates. MySQL, MariaDB, SQL Server, and SQLite have no built-in CORR function; the equivalent can be computed with aggregate arithmetic: (AVG(x y) − AVG(x) AVG(y)) / (STDDEV_POP(x) * STDDEV_POP(y)). Like other statistical aggregates, all three functions can also be used as window functions with OVER() to compute per-partition correlations.