Aggregate functions

This page describes the available functions to assist with performing aggregate calculations. Functions are organized by category below.

Function categories

Basic aggregates

Function	Description
avg	Arithmetic mean
count	Count rows or non-NULL values
geomean	Geometric mean
max	Maximum value
min	Minimum value
sum	Sum of values

Positional aggregates

Function	Description
arg_max	Value at the row where another column is maximum
arg_min	Value at the row where another column is minimum
first	First value (by designated timestamp or insertion order)
first_not_null	First non-NULL value
last	Last value (by designated timestamp or insertion order)
last_not_null	Last non-NULL value

Statistical aggregates

Function	Description
corr	Pearson correlation coefficient
covar_pop	Population covariance
covar_samp	Sample covariance
mode	Most frequent value
stddev / stddev_samp	Sample standard deviation
stddev_pop	Population standard deviation
var_pop	Population variance
variance / var_samp	Sample variance

Approximate aggregates

Function	Description
approx_count_distinct	Estimated distinct count using HyperLogLog
approx_median	Approximate median (50th percentile)
approx_percentile	Approximate percentile using HdrHistogram

String aggregates

Function	Description
string_agg	Concatenate values with delimiter
string_distinct_agg	Concatenate distinct values with delimiter

Boolean aggregates

Function	Description
bool_and	True if all values are true
bool_or	True if any value is true

Bitwise aggregates

Function	Description
bit_and	Bitwise AND of all non-NULL values
bit_or	Bitwise OR of all non-NULL values
bit_xor	Bitwise XOR of all non-NULL values

Specialized aggregates

Function	Description
count_distinct	Exact count of distinct values
haversine_dist_deg	Total traveled distance from lat/lon points
ksum	Kahan compensated sum (for floating-point precision)
nsum	Neumaier sum (for floating-point precision)
twap	Time-weighted average price
weighted_avg	Weighted arithmetic mean
weighted_stddev	Weighted standard deviation (reliability weights)
weighted_stddev_freq	Weighted standard deviation (frequency weights)
weighted_stddev_rel	Weighted standard deviation (reliability weights)

Array aggregates

Several aggregates on this page (first, first_not_null, last, last_not_null) accept array columns directly. For element-wise aggregation across arrays — summing, averaging, or finding min/max position-by-position — see the array functions page.

Function	Description
array_elem_avg	Element-wise average across arrays
array_elem_max	Element-wise maximum across arrays
array_elem_min	Element-wise minimum across arrays
array_elem_sum	Element-wise sum across arrays

Visualization aggregates

Function	Description
sparkline	Vertical block chart of values within a group

See the visualization functions page for full reference, including the scalar bar function.

QuestDB supports implicit GROUP BY. When aggregate functions are used with non-aggregated columns, QuestDB automatically groups by those columns. Examples in this documentation often omit GROUP BY for brevity.

approx_count_distinct

approx_count_distinct(column_name, precision) - estimates the number of distinct non-NULL values in IPv4, int, or long columns using the HyperLogLog data structure, which provides an approximation rather than an exact count.

The precision of HyperLogLog can be controlled via the optional precision parameter, typically between 4 and 16. A higher precision leads to more accurate results with increased memory usage. The default is 1.

This function is useful within high cardinality datasets where an exact count is not required. Thus consider it the higher cardinality alternative to count_distinct.

Parameters

column_name: The name of the column for which to estimate the count of distinct values.
precision (optional): A number specifying the precision of the HyperLogLog algorithm, which influences the trade-off between accuracy and memory usage. A higher precision gives a more accurate estimate, but consumes more memory. Defaults to 1 (lower accuracy, high efficiency).

Return value

Return value type is long.

Examples

Please note that exact example values will vary as they are approximations derived from the HyperLogLog algorithm.

Estimate count of distinct symbols with precision 5Demo this query

SELECT approx_count_distinct(symbol, 5) FROM fx_trades;

Estimate count of distinct user_id (int) values by date

SELECT date, approx_count_distinct(user_id) FROM sessions GROUP BY date;

date	approx_count_distinct
2023-01-01	2358
2023-01-02	2491
...	...

Estimate count of distinct product_id values by region

SELECT region, approx_count_distinct(product_id) FROM sales GROUP BY region;

region	approx_count_distinct
North	1589
South	1432
East	1675
West	1543

Estimate count of distinct order_ids with precision 8

SELECT approx_count_distinct(order_id, 8) FROM orders;

approx_count_distinct
3456789

Estimate count of distinct transaction_ids by store_id

SELECT store_id, approx_count_distinct(transaction_id) FROM transactions GROUP BY store_id;

store_id	approx_count_distinct
1	56789
2	67890
...	...

approx_median

approx_median(value, precision) calculates the approximate median (50th percentile) of a set of non-negative numeric values using the HdrHistogram algorithm. This is equivalent to calling approx_percentile(value, 0.5, precision).

The function will throw an error if any negative values are encountered in the input. All input values must be non-negative.

Parameters

value is any non-negative numeric value.
precision (optional) is an int value between 0 and 5, inclusive. This is the number of significant decimal digits to which the histogram will maintain value resolution and separation. Higher precision leads to more accurate results with increased memory usage. Defaults to 1 (lower accuracy, high efficiency).

Return value

Return value type is double.

Examples

Approximate median price by symbolDemo this query
SELECT symbol, approx_median(price) FROM trades
WHERE timestamp IN '$today'
GROUP BY symbol;

symbol	approx_median
ETH-BTC	0.0283
BTC-USDT	77824.0
SOL-BTC	0.0010
...	...

Approximate median with higher precisionDemo this query
SELECT symbol, approx_median(price, 3) FROM trades
WHERE timestamp IN '$today'
GROUP BY symbol;

symbol	approx_median
ETH-BTC	0.02861
BTC-USDT	79680.0
SOL-BTC	0.00110
...	...

approx_percentile

approx_percentile(value, percentile, precision) calculates the approximate value for the given non-negative column and percentile using the HdrHistogram algorithm.

Parameters

value is any numeric non-negative value.
percentile is a double value between 0.0 and 1.0, inclusive.
precision is an optional int value between 0 and 5, inclusive. This is the number of significant decimal digits to which the histogram will maintain value resolution and separation. For example, when the input column contains integer values between 0 and 3,600,000,000 and the precision is set to 3, value quantization within the range will be no larger than 1/1,000th (or 0.1%) of any value. In this example, the function tracks and analyzes the counts of observed response times ranging between 1 microsecond and 1 hour in magnitude, while maintaining a value resolution of 1 microsecond up to 1 millisecond, a resolution of 1 millisecond (or better) up to one second, and a resolution of 1 second (or better) up to 1,000 seconds. At its maximum tracked value (1 hour), it would still maintain a resolution of 3.6 seconds (or better).

Return value

Return value type is double.

Examples

Approximate 99th percentile of priceDemo this query

SELECT approx_percentile(price, 0.99, 3) FROM fx_trades
WHERE timestamp IN '$today';

approx_percentile
211.24

arg_max

arg_max(value, key) returns the value of the first argument at the row where the second argument reaches its maximum value. This function is useful for finding values at extreme points in time-series and grouped data.

Parameters

value is the column or expression whose value to return.
key is the column or expression used to determine which row to select (the row with the maximum key value).

Return value

Return value type matches the type of the value argument.

Null handling

Rows where key is NULL are ignored during aggregation.
If the value at the maximum key row is NULL, the result is NULL.
If all keys in a group are NULL, the result is NULL.

Supported type combinations

The function supports the following type combinations for value and key:

Value Type	Key Types
double	double, long, timestamp
long	double, timestamp
timestamp	double, long, uuid
uuid	timestamp

Examples

Find when the highest price occurred todayDemo this query

SELECT arg_max(timestamp, price) AS peak_time FROM trades
WHERE timestamp IN '$today';

peak_time
2026-05-08T11:30:13.194999Z

Find when each symbol hit its high todayDemo this query
SELECT symbol, arg_max(timestamp, price) AS peak_time
FROM trades
WHERE timestamp IN '$today'
LIMIT 5;

symbol	peak_time
ETH-BTC	2026-05-08T02:59:54.918999Z
BTC-USDT	2026-05-08T11:30:13.194999Z
SOL-BTC	2026-05-08T03:49:39.657999Z
ADA-USDT	2026-05-08T10:43:44.374000Z
AVAX-USDT	2026-05-08T09:29:03.172000Z

Find the price at each symbol's peak volume todayDemo this query
SELECT symbol, arg_max(price, amount) AS price_at_peak_volume
FROM trades
WHERE timestamp IN '$today'
LIMIT 5;

symbol	price_at_peak_volume
ETH-BTC	0.02865
BTC-USDT	79461.7
SOL-BTC	0.0011062
ADA-USDT	0.2604
AVAX-USDT	9.571

arg_min

arg_min(value, key) returns the value of the first argument at the row where the second argument reaches its minimum value. This function is useful for finding values at extreme points in time-series and grouped data.

Parameters

value is the column or expression whose value to return.
key is the column or expression used to determine which row to select (the row with the minimum key value).

Return value

Return value type matches the type of the value argument.

Null handling

Rows where key is NULL are ignored during aggregation.
If the value at the minimum key row is NULL, the result is NULL.
If all keys in a group are NULL, the result is NULL.

Supported type combinations

The function supports the following type combinations for value and key:

Value Type	Key Types
double	double, long, timestamp
long	double, timestamp
timestamp	double, long, uuid
uuid	timestamp

Examples

Find when the lowest price occurred todayDemo this query

SELECT arg_min(timestamp, price) AS bottom_time FROM trades
WHERE timestamp IN '$today';

bottom_time
2026-05-08T11:10:07.520999Z

Find when each symbol hit its low todayDemo this query
SELECT symbol, arg_min(timestamp, price) AS trough_time
FROM trades
WHERE timestamp IN '$today'
LIMIT 5;

symbol	trough_time
ETH-BTC	2026-05-08T10:55:22.411000Z
BTC-USDT	2026-05-08T03:05:18.441999Z
SOL-BTC	2026-05-08T11:10:07.520999Z
ADA-USDT	2026-05-08T03:05:18.430000Z
AVAX-USDT	2026-05-08T03:05:18.536000Z

Find the price at each symbol's lowest volume todayDemo this query
SELECT symbol, arg_min(price, amount) AS price_at_min_volume
FROM trades
WHERE timestamp IN '$today'
LIMIT 5;

symbol	price_at_min_volume
ETH-BTC	0.02863
BTC-USDT	80023.2
SOL-BTC	0.0011083
ADA-USDT	0.263
AVAX-USDT	9.474

avg

avg(value) calculates simple average of values ignoring missing data (e.g NULL values).

Parameters

value is any numeric value.

Return value

Return value type is double.

Examples

Average trade priceDemo this query

SELECT avg(price) FROM fx_trades
WHERE timestamp IN '$today';

Average trade price by symbolDemo this query
SELECT symbol, avg(price) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

bit_and

bit_and(value) returns the bitwise AND of all non-NULL values in an integer column.

Parameters

value is a byte, short, int, or long column.

Return value

Return value type matches the type of the argument.

Examples

Bitwise AND of all status flags

SELECT bit_and(flags) FROM events;

bit_and
4

Bitwise AND of status flags by category

SELECT category, bit_and(status_flags) FROM items;

category	bit_and
A	1
B	0
C	5

bit_or

bit_or(value) returns the bitwise OR of all non-NULL values in an integer column.

Parameters

value is a byte, short, int, or long column.

Return value

Return value type matches the type of the argument.

Examples

Bitwise OR of all permissions

SELECT bit_or(permissions) FROM users;

bit_or
15

Bitwise OR of permissions by role

SELECT role, bit_or(permissions) FROM users;

role	bit_or
admin	255
editor	31
viewer	1

bit_xor

bit_xor(value) returns the bitwise XOR of all non-NULL values in an integer column.

Parameters

value is a byte, short, int, or long column.

Return value

Return value type matches the type of the argument.

Examples

Bitwise XOR of all checksums

SELECT bit_xor(checksum) FROM data;

bit_xor
42

Bitwise XOR by partition

SELECT partition_id, bit_xor(value) FROM records;

partition_id	bit_xor
1	170
2	85

bool_and

bool_and(value) returns true if all non-NULL values in the group are true, otherwise returns false. This function is useful for checking if a condition holds across all rows in a group.

Parameters

value is a boolean column or expression.

Return value

Return value type is boolean.

Examples

Check if all orders are fulfilled

SELECT bool_and(is_fulfilled) FROM orders;

bool_and
false

Check if all items passed QA by batch

SELECT batch_id, bool_and(passed_qa) FROM items;

batch_id	bool_and
1	true
2	false
3	true

Check if all prices are above threshold

SELECT symbol, bool_and(price > 100) FROM trades;

symbol	bool_and
BTC-USD	true
ETH-USD	false

bool_or

bool_or(value) returns true if any non-NULL value in the group is true, otherwise returns false. This function is useful for checking if a condition holds for at least one row in a group.

Parameters

value is a boolean column or expression.

Return value

Return value type is boolean.

Examples

Check if any order has errors

SELECT bool_or(has_error) FROM orders;

bool_or
true

Check if any item failed QA by batch

SELECT batch_id, bool_or(failed_qa) FROM items;

batch_id	bool_or
1	false
2	true
3	false

Check if any trade exceeded volume threshold

SELECT symbol, bool_or(volume > 1000000) FROM trades;

symbol	bool_or
BTC-USD	true
ETH-USD	true

corr

corr(arg0, arg1) is a function that measures how closely two sets of numbers move in the same direction. It does this by comparing how much each number in each set differs from the average of its set. This calculation is based on Welford's Algorithm.

If the numbers in both sets tend to be above or below their average values at the same time, the function will return a value close to 1.
If one set of numbers tends to be above its average value when the other set is below its average, the function will return a value close to -1.
If there's no clear pattern, the function will return a value close to 0.

Parameters

arg0 is any numeric value representing the first variable
arg1 is any numeric value representing the second variable

Return value

Return value type is double.

Examples

Correlation between price and quantityDemo this query

SELECT corr(price, quantity) FROM fx_trades
WHERE timestamp IN '$today';

corr
0.0070

Correlation between price and quantity by symbolDemo this query
SELECT symbol, corr(price, quantity) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

symbol	corr
USDCAD	0.0019
NZDUSD	0.0065
EURUSD	-0.0001
EURCHF	0.0170
USDZAR	0.0085

count

count() or count(*) - counts the number of rows irrespective of underlying data.
count(column_name) - counts the number of non-NULL values in a given column.
count(distinct column_name) - counts the number of distinct non-NULL values in a given column. This is identical to count_distinct(column_name).

Parameters

count() does not require arguments.
count(column_name) - supports the following data types:
- double
- float
- integer
- character
- short
- byte
- timestamp
- date
- long
- long256
- geohash
- varchar
- string
- symbol

Return value

Return value type is long.

Examples

Count of rows in the fx_trades table:

Count all rowsDemo this query

SELECT count() FROM fx_trades
WHERE timestamp IN '$today';

Count of rows aggregated by symbol:

Count by symbolDemo this query
SELECT symbol, count() FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

Count non-NULL values in a specific column:

Count non-NULL pricesDemo this query
SELECT symbol, count(price) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

Count distinct values using standard SQL syntax (identical to count_distinct):

Count distinct ECNs per symbolDemo this query
SELECT symbol, count(distinct ecn) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

note

NULL values are aggregated with count(), but not with count(column_name) or count(distinct column_name).

count_distinct

count_distinct(column_name) - counts distinct non-NULL values in varchar, symbol, long256, UUID, IPv4, long, int or string columns.

tip

count_distinct is available for backwards compatibility. We recommend using the standard SQL syntax count(distinct column_name) instead.

Return value

Return value type is long.

Examples

Count of distinct sides (buy/sell):

Count distinct sidesDemo this query

SELECT count_distinct(side) FROM fx_trades
WHERE timestamp IN '$today';

count_distinct
2

Count of distinct ECNs per symbol:

Count distinct ECNs by symbolDemo this query
SELECT symbol, count_distinct(ecn) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

covar_pop

covar_pop(arg0, arg1) is a function that measures how much two sets of numbers change together. It does this by looking at how much each number in each set differs from the average of its set. It multiplies these differences together, adds them all up, and then divides by the total number of pairs. This gives a measure of the overall trend.

If the numbers in both sets tend to be above or below their average values at the same time, the function will return a positive number.
If one set of numbers tends to be above its average value when the other set is below its average, the function will return a negative number.
The closer the result is to zero, the less relationship there is between the two sets of numbers.

Parameters

arg0 is any numeric value representing the first variable
arg1 is any numeric value representing the second variable.

Return value

Return value type is double.

Examples

Population covariance between price and quantityDemo this query

SELECT covar_pop(price, quantity) FROM fx_trades
WHERE timestamp IN '$today';

Population covariance by symbolDemo this query
SELECT symbol, covar_pop(price, quantity) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

covar_samp

covar_samp(arg0, arg1) is a function that finds the relationship between two sets of numbers. It does this by looking at how much the numbers vary from the average in each set.

If the numbers in both sets tend to be above or below their average values at the same time, the function will return a positive number.
If one set of numbers tends to be above its average value when the other set is below its average, the function will return a negative number.
The closer the result is to zero, the less relationship there is between the two sets of numbers.

Parameters

arg0 is any numeric value representing the first variable.
arg1 is any numeric value representing the second variable.

Return value

Return value type is double.

Examples

Sample covariance between price and quantityDemo this query

SELECT covar_samp(price, quantity) FROM fx_trades
WHERE timestamp IN '$today';

Sample covariance by symbolDemo this query
SELECT symbol, covar_samp(price, quantity) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

first

first(column_name) - returns the first value of a column.

Supported column datatype: double, float, integer, IPv4, character, short, byte, timestamp, date, long, geohash, symbol, varchar, uuid and array.

If a table has a designated timestamp, then the first row is always the row with the lowest timestamp (oldest). For a table without a designated timestamp column, first returns the first row regardless of any timestamp column.

Return value

Return value type is the same as the type of the argument.

Examples

Given a table trades, which has a designated timestamp column:

symbol	price	ts
AAPL	142	2021-06-02T14:33:19.970258Z
GOOGL	2750	2021-06-02T14:33:21.703934Z
MSFT	285	2021-06-02T14:33:23.707013Z

The following query returns oldest value for the symbol column:

SELECT first(symbol) FROM trades;

first
AAPL

Without selecting a designated timestamp column, the table may be unordered and the query may return different result. Given an unordered table trades_unordered:

symbol	price	ts
AAPL	142	2021-06-02T14:33:19.970258Z
MSFT	285	2021-06-02T14:33:23.707013Z
GOOGL	2750	2021-06-02T14:33:21.703934Z

The following query returns the first record for the symbol column:

SELECT first(symbol) FROM trades_unordered;

first
AAPL

first_not_null

first_not_null(column_name) - returns the first non-NULL value of a column.

Supported column datatype: double, float, integer, IPv4, char, short, byte, timestamp, date, long, geohash, symbol, varchar, uuid and array.

If a table has a designated timestamp, then the first non-NULL row is always the row with the lowest timestamp (oldest). For a table without a designated timestamp column, first_not_null returns the first non-NULL row, regardless of any timestamp column.

Return value

Return value type is the same as the type of the argument.

Examples

Given a table trades, which has a designated timestamp column:

symbol	price	ts
NULL	142	2021-06-02T14:33:19.970258Z
GOOGL	2750	2021-06-02T14:33:21.703934Z
MSFT	285	2021-06-02T14:33:23.707013Z

The following query returns oldest non-NULL value for the symbol column:

SELECT first_not_null(symbol) FROM trades;

first_not_null
GOOGL

Without selecting a designated timestamp column, the table may be unordered and the query may return different result. Given an unordered table trades_unordered:

symbol	price	ts
NULL	142	2021-06-02T14:33:19.970258Z
MSFT	285	2021-06-02T14:33:23.707013Z
GOOGL	2750	2021-06-02T14:33:21.703934Z

The following query returns the first non-NULL record for the symbol column:

SELECT first_not_null(symbol) FROM trades_unordered;

first_not_null
MSFT

geomean

geomean(value) calculates the geometric mean of a set of positive values. The geometric mean is computed using the formula exp(avg(ln(x))), which prevents overflow issues with large products by using logarithms.

The geometric mean is useful for calculating average growth rates, ratios, and other multiplicative quantities.

Parameters

value is a double column or expression. Other numeric types are implicitly converted to double.

Return value

Return value type is double.

Null and edge case handling

Input	Result	Reason
Negative values	`NULL`	Geometric mean undefined
Zero values	`NULL`	`ln(0)` is undefined
NULL values	Skipped	Standard aggregate behavior
Empty group	`NULL`	Standard aggregate behavior

Examples

Geometric mean of growth rates

SELECT geomean(growth_rate) FROM quarterly_data;

geomean
1.12

Geometric mean of returns by asset

SELECT asset, geomean(return_factor) FROM portfolio;

asset	geomean
stocks	1.08
bonds	1.03
crypto	1.25

Comparing arithmetic and geometric means
SELECT avg(return_factor) AS arithmetic_mean,
       geomean(return_factor) AS geometric_mean
FROM investments;

arithmetic_mean	geometric_mean
1.15	1.12

haversine_dist_deg

haversine_dist_deg(lat, lon, ts) - calculates the traveled distance for a series of latitude and longitude points.

Parameters

lat is the latitude expressed as degrees in decimal format (double)
lon is the longitude expressed as degrees in decimal format (double)
ts is the timestamp for the data point

Return value

Return value type is double.

Examples

Calculate the aggregate traveled distance for each car_id

SELECT car_id, haversine_dist_deg(lat, lon, k)
FROM rides;

ksum

ksum(value) - adds values ignoring missing data (e.g NULL values). Values are added using the

Kahan compensated sum algorithm. This is only beneficial for floating-point values such as float or double.

Parameters

value is any numeric value.

Return value

Return value type is the same as the type of the argument.

Examples

Kahan compensated sum of random doublesDemo this query

SELECT ksum(a)
FROM (SELECT rnd_double() a FROM long_sequence(100));

last

last(column_name) - returns the last value of a column.

Supported column datatype: double, float, integer, IPv4, character, short, byte, timestamp, date, long, geohash, symbol, varchar, uuid and array.

If a table has a designated timestamp, the last row is always the one with the highest (latest) timestamp.

For a table without a designated timestamp column, last returns the last inserted row, regardless of any timestamp column.

Return value

Return value type is the same as the type of the argument.

Examples

Given a table trades, which has a designated timestamp column:

symbol	price	ts
AAPL	142	2021-06-02T14:33:19.970258Z
GOOGL	2750	2021-06-02T14:33:21.703934Z
MSFT	285	2021-06-02T14:33:23.707013Z

The following query returns the latest value for the symbol column:

SELECT last(symbol) FROM trades;

last
MSFT

Without selecting a designated timestamp column, the table may be unordered and the query may return different result. Given an unordered table trades_unordered:

symbol	price	ts
AAPL	142	2021-06-02T14:33:19.970258Z
MSFT	285	2021-06-02T14:33:23.707013Z
GOOGL	2750	2021-06-02T14:33:21.703934Z

The following query returns the last record for the symbol column:

SELECT last(symbol) FROM trades_unordered;

last
GOOGL

last_not_null

last_not_null(column_name) - returns the last non-NULL value of a column.

Supported column datatype: double, float, integer, IPv4, char, short, byte, timestamp, date, long, geohash, symbol, varchar, uuid and array.

If a table has a designated timestamp, then the last non-NULL row is always the row with the highest timestamp (most recent). For a table without a designated timestamp column, last_not_null returns the last non-NULL row, regardless of any timestamp column.

Return value

Return value type is the same as the type of the argument.

Examples

Given a table trades, which has a designated timestamp column:

symbol	price	ts
NULL	142	2021-06-02T14:33:19.970258Z
GOOGL	2750	2021-06-02T14:33:21.703934Z
MSFT	285	2021-06-02T14:33:23.707013Z

The following query returns most recent non-NULL value for the symbol column:

SELECT last_not_null(symbol) FROM trades;

last_not_null
MSFT

Without selecting a designated timestamp column, the table may be unordered and the query may return different result. Given an unordered table trades_unordered:

symbol	price	ts
NULL	142	2021-06-02T14:33:19.970258Z
MSFT	285	2021-06-02T14:33:23.707013Z
GOOGL	2750	2021-06-02T14:33:21.703934Z

The following query returns the last non-NULL record for the symbol column:

SELECT last_not_null(symbol) FROM trades_unordered;

last_not_null
GOOGL

max

max(value) - returns the highest value ignoring missing data (e.g NULL values).

Parameters

value is any numeric or string value

Return value

Return value type is the same as the type of the argument.

Examples

Highest trade priceDemo this query

SELECT max(price) FROM fx_trades
WHERE timestamp IN '$today';

Highest trade price by symbolDemo this query
SELECT symbol, max(price) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

min

min(value) - returns the lowest value ignoring missing data (e.g NULL values).

Parameters

value is any numeric or string value

Return value

Return value type is the same as the type of the argument.

Examples

Lowest trade priceDemo this query

SELECT min(price) FROM fx_trades
WHERE timestamp IN '$today';

Lowest trade price by symbolDemo this query
SELECT symbol, min(price) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

mode

mode(value) - calculates the mode (most frequent) value out of a particular dataset.

For mode(B), if there are an equal number of true and false values, true will be returned as a tie-breaker.

For other modes, if there are equal mode values, the returned value will be whichever the code identifies first.

To make the result deterministic, you must enforce an underlying sort order.

Parameters

value - one of (LONG, DOUBLE, BOOLEAN, STRING, VARCHAR, SYMBOL)

Return value

Return value type is the same as the type of the input value.

Examples

With this dataset:

symbol	value
A	alpha
A	alpha
A	alpha
A	omega
B	beta
B	beta
B	gamma

SELECT symbol, mode(value) as mode FROM dataset;

symbol	mode
A	alpha
B	beta

On demo:

Most frequent side per symbolDemo this query
SELECT symbol, mode(side)
FROM trades
WHERE timestamp IN '$today'
ORDER BY symbol ASC;

symbol	mode
ADA-USDT	sell
AVAX-USDT	buy
BTC-USDT	buy
DOT-USDT	sell
ETH-BTC	buy
ETH-USDT	buy
...	...

nsum

nsum(value) - adds values ignoring missing data (e.g NULL values). Values are added using the Neumaier sum algorithm. This is only beneficial for floating-point values such as float or double.

Parameters

value is any numeric value.

Return value

Return value type is double.

Examples

Neumaier sum of random doublesDemo this query

SELECT nsum(a)
FROM (SELECT rnd_double() a FROM long_sequence(100));

stddev / stddev_samp

stddev_samp(value) - Calculates the sample standard deviation of a set of values, ignoring missing data (e.g., NULL values). The sample standard deviation is a measure of the amount of variation or dispersion in a sample of a population. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

stddev is an alias for stddev_samp.

Parameters

value is any numeric value.

Return value

Return value type is double.

Examples

Sample standard deviationDemo this query

SELECT stddev_samp(x)
FROM (SELECT x FROM long_sequence(100));

stddev_samp
29.011491975882016

stddev_pop

stddev_pop(value) - Calculates the population standard deviation of a set of values. The population standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

Parameters

value is any numeric value.

Return value

Return value type is double.

Examples

Population standard deviationDemo this query

SELECT stddev_pop(x)
FROM (SELECT x FROM long_sequence(100));

stddev_pop
28.86607004772212

string_agg

string_agg(value, delimiter) - Concatenates the given string values into a single string with the delimiter used as a value separator.

Parameters

value is a varchar value.
delimiter is a char value.

Return value

Return value type is varchar.

Examples

Concatenate values with delimiterDemo this query

SELECT string_agg(x::varchar, ',')
FROM (SELECT x FROM long_sequence(5));

string_agg
1,2,3,4,5

string_distinct_agg

string_distinct_agg(value, delimiter) - concatenates distinct non-NULL string values into a single string, using the specified delimiter to separate the values.

string_distinct_agg ignores NULL values and only concatenates non-NULL distinct values.
Order is guaranteed.
Does not support ORDER BY.

Parameters

value: A varchar or string column containing the values to be aggregated.
delimiter: A char value used to separate the distinct values in the concatenated string.

Return value

Return value type is string.

Examples

Suppose we want to find all the distinct symbols observed in the trades table in our public demo:

All distinct symbols as a comma-separated listDemo this query
SELECT string_distinct_agg(symbol, ',') AS distinct_symbols
FROM trades
WHERE timestamp IN '$today';

distinct_symbols
ETH-USDT,SOL-USDT,ADA-USDT,BTC-USDT,UNI-USDT,AVAX-USDT,LTC-USDT,XLM-USDT,DOT-USDT,...

This query returns a single string containing all the distinct symbol values separated by commas. Even though the symbol column may have many rows with repeated values, string_distinct_agg aggregates only the unique non-NULL values.

You can also group the aggregation by another column.

To find out which symbols are observed for each side:

Distinct symbols by sideDemo this query
SELECT side, string_distinct_agg(symbol, ',') AS distinct_symbols
FROM trades
WHERE timestamp IN '$today';

side	distinct_symbols
buy	ETH-USDT,SOL-USDT,ADA-USDT,BTC-USDT,UNI-USDT,AVAX-USDT,LTC-USDT,XLM-USDT,DOT-...
sell	ADA-USDT,ETH-USDT,BTC-USDT,LTC-USDT,SOL-USDT,UNI-USDT,SOL-BTC,AVAX-USDT,XLM-...

Note we don't need to add GROUP BY side as it is implicit. But you can add it, if you prefer that syntax.

sum

sum(value) - adds values ignoring missing data (e.g NULL values).

Parameters

value is any numeric value.

Return value

Return value type is the same as the type of the argument.

Examples

Sum all quantitiesDemo this query

SELECT sum(quantity) FROM fx_trades
WHERE timestamp IN '$today';

Sum quantities by symbolDemo this query
SELECT symbol, sum(quantity) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

Overflow

sum does not perform overflow check. To avoid overflow, you can cast the argument to wider type.

Cast as long to avoid overflow

SELECT sum(cast(a AS LONG)) FROM my_table;

twap

twap(price, timestamp) - Calculates the time-weighted average price (TWAP) using step-function (forward-fill) integration: each observed price is assumed to persist until the next observation, and the TWAP is the area under this step function divided by the total time span. Unlike VWAP, which weights by volume, TWAP gives equal weight to every time interval — useful for execution benchmarking, algorithmic trading, and fair value reference in illiquid markets. See the TWAP cookbook recipe for practical examples.

\text{TWAP} = \frac{\sum_{i=1}^{n-1} p_i \cdot (t_{i+1} - t_i)}{t_n - t_1}

Where:

$p_i$ is the price at observation $i$
$t_i$ is the timestamp at observation $i$

If all observations share the same timestamp, the function falls back to a simple arithmetic mean.

If the price is NULL, that observation is skipped.

If the timestamp is NULL, that observation is skipped.

If there are no valid observations, the result is NULL.

Supports SAMPLE BY with FILL modes.

Parameters

price is any numeric value.
timestamp is a timestamp value. This is typically the table's designated timestamp but can be any timestamp column.

Return value

Return value type is double.

Examples

TWAP of trade prices for a single symbolDemo this query
SELECT twap(price, timestamp)
FROM trades
WHERE symbol = 'BTC-USDT'
  AND timestamp IN '$yesterday';

twap
80636.70

TWAP per symbolDemo this query
SELECT symbol, twap(price, timestamp)
FROM trades
WHERE timestamp IN '$yesterday';

symbol	twap
ETH-BTC	0.0287
BTC-USDT	80636.70
SOL-BTC	0.0010
ADA-USDT	0.2650
AVAX-USDT	9.5341
...	...

Hourly TWAPDemo this query
SELECT timestamp, symbol, twap(price, timestamp)
FROM trades
WHERE symbol IN ('BTC-USDT', 'ETH-USDT')
  AND timestamp IN '$yesterday'
SAMPLE BY 1h;

timestamp	symbol	twap
2026-05-07T00:00:00.000000Z	BTC-USDT	81242.47
2026-05-07T00:00:00.000000Z	ETH-USDT	2344.47
2026-05-07T01:00:00.000000Z	BTC-USDT	81153.52
2026-05-07T01:00:00.000000Z	ETH-USDT	2328.97
...	...	...

var_pop

var_pop(value) - Calculates the population variance of a set of values. The population variance is a measure of the amount of variation or dispersion of a set of values. A low variance indicates that the values tend to be very close to the mean, while a high variance indicates that the values are spread out over a wider range.

Parameters

value is any numeric value.

Return value

Return value type is double.

Examples

Population varianceDemo this query

SELECT var_pop(x)
FROM (SELECT x FROM long_sequence(100));

var_pop
833.25

variance / var_samp

var_samp(value) - Calculates the sample variance of a set of values. The sample variance is a measure of the amount of variation or dispersion of a set of values in a sample from a population. A low variance indicates that the values tend to be very close to the mean, while a high variance indicates that the values are spread out over a wider range.

variance() is an alias for var_samp.

Parameters

value is any numeric value.

Return value

Return value type is double.

Examples

Sample varianceDemo this query

SELECT var_samp(x)
FROM (SELECT x FROM long_sequence(100));

var_samp
841.6666666666666

weighted_avg

weighted_avg(value, weight) - Calculates the weighted mean (average) of a set of observations (database rows). It calculates the equivalent of:

\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}

Where:

$x_i$ are the values
$w_i$ are the weights

If the value is NULL, that observation is ignored.

If the weight is NULL or zero, that observation is ignored.

If there are no observations, the result is NULL.

If the weights sum to zero, the result is NULL.

Weights should be non-negative to make sense, but this isn't enforced.

Parameters

value is any numeric value.
weight is any numeric value.

Return value

Return value type is double.

Examples

Weighted average price by trade quantityDemo this query

SELECT weighted_avg(price, quantity) FROM fx_trades
WHERE timestamp IN '$today';

weighted_stddev

weighted_stddev(value, weight) - Calculates the unbiased weighted standard deviation of a set of observations using reliability weights.

This is an alias for weighted_stddev_rel.

weighted_stddev_freq

weighted_stddev_freq(value, weight) - Calculates the unbiased weighted standard deviation of a set of observations using frequency weights.

A frequency weight represents the number of occurrences of each observation in the dataset. This variant uses the frequency-weighted estimator for the population variance. It calculates the equivalent of:

\sqrt{ \frac{ \sum w_i x_i^2 - \frac{(\sum w_i x_i)^2}{\sum w_i} }{ \sum w_i - 1 } }

Where:

$x_i$ are the values
$w_i$ are the frequency weights

If the value is NULL, that observation is ignored.

If the weight is NULL or zero, that observation is ignored.

If there are fewer than two observations, the result is NULL.

Weights should be positive integers to make sense, but this isn't enforced.

Weights must not be normalized. If they sum to one, the result is NULL.

If the sum of weights is negative, the result is NULL.

Parameters

value is any numeric value.
weight is any numeric value representing the frequency weight (typically an integer).

Return value

Return value type is double.

Examples

Weighted standard deviation (frequency) of price by quantityDemo this query

SELECT weighted_stddev_freq(price, quantity) FROM fx_trades
WHERE timestamp IN '$today';

Weighted standard deviation (frequency) by symbolDemo this query
SELECT symbol, weighted_stddev_freq(price, quantity) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

weighted_stddev_rel

weighted_stddev_rel(value, weight) - Calculates the unbiased weighted standard deviation of a set of observations using reliability weights. You can also use the shorthand name weighted_stddev.

A reliability weight represents the "importance" or "trustworthiness" of each observation. This variant uses the reliability-weighted estimator for the population variance. It calculates the equivalent of:

\sqrt{ \frac{ \sum w_i x_i^2 - \frac{(\sum w_i x_i)^2}{\sum w_i} }{ \sum w_i - \frac{\sum w_i^2}{\sum w_i} } }

Where:

$x_i$ are the values
$w_i$ are the reliability weights

If the value is NULL, that observation is ignored.

If the weight is NULL or zero, that observation is ignored.

If there are fewer than two observations, the result is NULL.

Weights should be positive to make sense, but this isn't enforced.

If the sum of weights is not positive, the result is NULL.

Parameters

value is any numeric value.
weight is any numeric value representing the reliability weight.

Return value

Return value type is double.

Examples

Weighted standard deviation of price by quantityDemo this query

SELECT weighted_stddev(price, quantity) FROM fx_trades
WHERE timestamp IN '$today';

Weighted standard deviation by symbolDemo this query
SELECT symbol, weighted_stddev(price, quantity) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

Function categories​

Basic aggregates​

Positional aggregates​

Statistical aggregates​

Approximate aggregates​

String aggregates​

Boolean aggregates​

Bitwise aggregates​

Specialized aggregates​

Array aggregates​

Visualization aggregates​

approx_count_distinct​

Parameters​

Return value​

Examples​

approx_median​

Parameters​

Return value​

Examples​

See also​

approx_percentile​

Parameters​

Return value​

Examples​

See also​

arg_max​

Parameters​

Return value​

Null handling​

Supported type combinations​

Examples​

See also​

arg_min​

Parameters​

Return value​

Null handling​

Supported type combinations​

Examples​

See also​

avg​

Parameters​

Return value​

Examples​

See also​

bit_and​

Parameters​

Return value​

Examples​

See also​

bit_or​

Parameters​

Return value​

Examples​

See also​

bit_xor​

Parameters​

Return value​

Examples​

See also​

bool_and​

Parameters​

Return value​

Examples​

See also​

bool_or​

Parameters​

Return value​

Examples​

See also​

corr​

Parameters​

Return value​

Examples​

count​

Parameters​

Return value​

Examples​

See also​

count_distinct​

Return value​

Function categories

Basic aggregates

Positional aggregates

Statistical aggregates

Approximate aggregates

String aggregates

Boolean aggregates

Bitwise aggregates

Specialized aggregates

Array aggregates

Visualization aggregates

approx_count_distinct

Parameters

Return value

Examples

approx_median

Parameters

Return value

Examples

See also

approx_percentile

Parameters

Return value

Examples

See also

arg_max

Parameters

Return value

Null handling

Supported type combinations

Examples

See also

arg_min

Parameters

Return value

Null handling

Supported type combinations

Examples

See also

avg

Parameters

Return value

Examples

See also

bit_and

Parameters

Return value

Examples

See also

bit_or

Parameters

Return value

Examples

See also

bit_xor

Parameters

Return value

Examples

See also

bool_and

Parameters

Return value

Examples

See also

bool_or

Parameters

Return value

Examples

See also

corr

Parameters

Return value

Examples

count

Parameters

Return value

Examples

See also

count_distinct

Return value