Aggregate functions

This page describes the available functions to assist with performing aggregate calculations. Functions are organized by category below.

Function categories

Basic aggregates

FunctionDescription
avgArithmetic mean
countCount rows or non-NULL values
geomeanGeometric mean
maxMaximum value
minMinimum value
sumSum of values

Positional aggregates

FunctionDescription
arg_maxValue at the row where another column is maximum
arg_minValue at the row where another column is minimum
firstFirst value (by designated timestamp or insertion order)
first_not_nullFirst non-NULL value
lastLast value (by designated timestamp or insertion order)
last_not_nullLast non-NULL value

Statistical aggregates

FunctionDescription
corrPearson correlation coefficient
covar_popPopulation covariance
covar_sampSample covariance
modeMost frequent value
stddev / stddev_sampSample standard deviation
stddev_popPopulation standard deviation
var_popPopulation variance
variance / var_sampSample variance

Approximate aggregates

FunctionDescription
approx_count_distinctEstimated distinct count using HyperLogLog
approx_medianApproximate median (50th percentile)
approx_percentileApproximate percentile using HdrHistogram

String aggregates

FunctionDescription
string_aggConcatenate values with delimiter
string_distinct_aggConcatenate distinct values with delimiter

Boolean aggregates

FunctionDescription
bool_andTrue if all values are true
bool_orTrue if any value is true

Bitwise aggregates

FunctionDescription
bit_andBitwise AND of all non-NULL values
bit_orBitwise OR of all non-NULL values
bit_xorBitwise XOR of all non-NULL values

Specialized aggregates

FunctionDescription
count_distinctExact count of distinct values
haversine_dist_degTotal traveled distance from lat/lon points
ksumKahan compensated sum (for floating-point precision)
nsumNeumaier sum (for floating-point precision)
twapTime-weighted average price
weighted_avgWeighted arithmetic mean
weighted_stddevWeighted standard deviation (reliability weights)
weighted_stddev_freqWeighted standard deviation (frequency weights)
weighted_stddev_relWeighted standard deviation (reliability weights)

Array aggregates

Several aggregates on this page (first, first_not_null, last, last_not_null) accept array columns directly. For element-wise aggregation across arrays — summing, averaging, or finding min/max position-by-position — see the array functions page.

FunctionDescription
array_elem_avgElement-wise average across arrays
array_elem_maxElement-wise maximum across arrays
array_elem_minElement-wise minimum across arrays
array_elem_sumElement-wise sum across arrays

Visualization aggregates

FunctionDescription
sparklineVertical block chart of values within a group

See the visualization functions page for full reference, including the scalar bar function.


QuestDB supports implicit GROUP BY. When aggregate functions are used with non-aggregated columns, QuestDB automatically groups by those columns. Examples in this documentation often omit GROUP BY for brevity.


approx_count_distinct

approx_count_distinct(column_name, precision) - estimates the number of distinct non-NULL values in IPv4, int, or long columns using the HyperLogLog data structure, which provides an approximation rather than an exact count.

The precision of HyperLogLog can be controlled via the optional precision parameter, typically between 4 and 16. A higher precision leads to more accurate results with increased memory usage. The default is 1.

This function is useful within high cardinality datasets where an exact count is not required. Thus consider it the higher cardinality alternative to count_distinct.

Parameters

  • column_name: The name of the column for which to estimate the count of distinct values.
  • precision (optional): A number specifying the precision of the HyperLogLog algorithm, which influences the trade-off between accuracy and memory usage. A higher precision gives a more accurate estimate, but consumes more memory. Defaults to 1 (lower accuracy, high efficiency).

Return value

Return value type is long.

Examples

Please note that exact example values will vary as they are approximations derived from the HyperLogLog algorithm.

Estimate count of distinct symbols with precision 5Demo this query
SELECT approx_count_distinct(symbol, 5) FROM fx_trades;
Estimate count of distinct user_id (int) values by date
SELECT date, approx_count_distinct(user_id) FROM sessions GROUP BY date;
dateapprox_count_distinct
2023-01-012358
2023-01-022491
......
Estimate count of distinct product_id values by region
SELECT region, approx_count_distinct(product_id) FROM sales GROUP BY region;
regionapprox_count_distinct
North1589
South1432
East1675
West1543
Estimate count of distinct order_ids with precision 8
SELECT approx_count_distinct(order_id, 8) FROM orders;
approx_count_distinct
3456789
Estimate count of distinct transaction_ids by store_id
SELECT store_id, approx_count_distinct(transaction_id) FROM transactions GROUP BY store_id;
store_idapprox_count_distinct
156789
267890
......

approx_median

approx_median(value, precision) calculates the approximate median (50th percentile) of a set of non-negative numeric values using the HdrHistogram algorithm. This is equivalent to calling approx_percentile(value, 0.5, precision).

The function will throw an error if any negative values are encountered in the input. All input values must be non-negative.

Parameters

  • value is any non-negative numeric value.
  • precision (optional) is an int value between 0 and 5, inclusive. This is the number of significant decimal digits to which the histogram will maintain value resolution and separation. Higher precision leads to more accurate results with increased memory usage. Defaults to 1 (lower accuracy, high efficiency).

Return value

Return value type is double.

Examples

Approximate median price by symbolDemo this query
SELECT symbol, approx_median(price) FROM trades
WHERE timestamp IN '$today'
GROUP BY symbol;
symbolapprox_median
ETH-BTC0.0283
BTC-USDT77824.0
SOL-BTC0.0010
......
Approximate median with higher precisionDemo this query
SELECT symbol, approx_median(price, 3) FROM trades
WHERE timestamp IN '$today'
GROUP BY symbol;
symbolapprox_median
ETH-BTC0.02861
BTC-USDT79680.0
SOL-BTC0.00110
......

See also

approx_percentile

approx_percentile(value, percentile, precision) calculates the approximate value for the given non-negative column and percentile using the HdrHistogram algorithm.

Parameters

  • value is any numeric non-negative value.
  • percentile is a double value between 0.0 and 1.0, inclusive.
  • precision is an optional int value between 0 and 5, inclusive. This is the number of significant decimal digits to which the histogram will maintain value resolution and separation. For example, when the input column contains integer values between 0 and 3,600,000,000 and the precision is set to 3, value quantization within the range will be no larger than 1/1,000th (or 0.1%) of any value. In this example, the function tracks and analyzes the counts of observed response times ranging between 1 microsecond and 1 hour in magnitude, while maintaining a value resolution of 1 microsecond up to 1 millisecond, a resolution of 1 millisecond (or better) up to one second, and a resolution of 1 second (or better) up to 1,000 seconds. At its maximum tracked value (1 hour), it would still maintain a resolution of 3.6 seconds (or better).

Return value

Return value type is double.

Examples

Approximate 99th percentile of priceDemo this query
SELECT approx_percentile(price, 0.99, 3) FROM fx_trades
WHERE timestamp IN '$today';
approx_percentile
211.24

See also

arg_max

arg_max(value, key) returns the value of the first argument at the row where the second argument reaches its maximum value. This function is useful for finding values at extreme points in time-series and grouped data.

Parameters

  • value is the column or expression whose value to return.
  • key is the column or expression used to determine which row to select (the row with the maximum key value).

Return value

Return value type matches the type of the value argument.

Null handling

  • Rows where key is NULL are ignored during aggregation.
  • If the value at the maximum key row is NULL, the result is NULL.
  • If all keys in a group are NULL, the result is NULL.

Supported type combinations

The function supports the following type combinations for value and key:

Value TypeKey Types
doubledouble, long, timestamp
longdouble, timestamp
timestampdouble, long, uuid
uuidtimestamp

Examples

Find when the highest price occurred todayDemo this query
SELECT arg_max(timestamp, price) AS peak_time FROM trades
WHERE timestamp IN '$today';
peak_time
2026-05-08T11:30:13.194999Z
Find when each symbol hit its high todayDemo this query
SELECT symbol, arg_max(timestamp, price) AS peak_time
FROM trades
WHERE timestamp IN '$today'
LIMIT 5;
symbolpeak_time
ETH-BTC2026-05-08T02:59:54.918999Z
BTC-USDT2026-05-08T11:30:13.194999Z
SOL-BTC2026-05-08T03:49:39.657999Z
ADA-USDT2026-05-08T10:43:44.374000Z
AVAX-USDT2026-05-08T09:29:03.172000Z
Find the price at each symbol's peak volume todayDemo this query
SELECT symbol, arg_max(price, amount) AS price_at_peak_volume
FROM trades
WHERE timestamp IN '$today'
LIMIT 5;
symbolprice_at_peak_volume
ETH-BTC0.02865
BTC-USDT79461.7
SOL-BTC0.0011062
ADA-USDT0.2604
AVAX-USDT9.571

See also

  • arg_min - Value at the row where another column is minimum
  • max - Returns the maximum value itself
  • last - Returns the last value by timestamp order

arg_min

arg_min(value, key) returns the value of the first argument at the row where the second argument reaches its minimum value. This function is useful for finding values at extreme points in time-series and grouped data.

Parameters

  • value is the column or expression whose value to return.
  • key is the column or expression used to determine which row to select (the row with the minimum key value).

Return value

Return value type matches the type of the value argument.

Null handling

  • Rows where key is NULL are ignored during aggregation.
  • If the value at the minimum key row is NULL, the result is NULL.
  • If all keys in a group are NULL, the result is NULL.

Supported type combinations

The function supports the following type combinations for value and key:

Value TypeKey Types
doubledouble, long, timestamp
longdouble, timestamp
timestampdouble, long, uuid
uuidtimestamp

Examples

Find when the lowest price occurred todayDemo this query
SELECT arg_min(timestamp, price) AS bottom_time FROM trades
WHERE timestamp IN '$today';
bottom_time
2026-05-08T11:10:07.520999Z
Find when each symbol hit its low todayDemo this query
SELECT symbol, arg_min(timestamp, price) AS trough_time
FROM trades
WHERE timestamp IN '$today'
LIMIT 5;
symboltrough_time
ETH-BTC2026-05-08T10:55:22.411000Z
BTC-USDT2026-05-08T03:05:18.441999Z
SOL-BTC2026-05-08T11:10:07.520999Z
ADA-USDT2026-05-08T03:05:18.430000Z
AVAX-USDT2026-05-08T03:05:18.536000Z
Find the price at each symbol's lowest volume todayDemo this query
SELECT symbol, arg_min(price, amount) AS price_at_min_volume
FROM trades
WHERE timestamp IN '$today'
LIMIT 5;
symbolprice_at_min_volume
ETH-BTC0.02863
BTC-USDT80023.2
SOL-BTC0.0011083
ADA-USDT0.263
AVAX-USDT9.474

See also

  • arg_max - Value at the row where another column is maximum
  • min - Returns the minimum value itself
  • first - Returns the first value by timestamp order

avg

avg(value) calculates simple average of values ignoring missing data (e.g NULL values).

Parameters

  • value is any numeric value.

Return value

Return value type is double.

Examples

Average trade priceDemo this query
SELECT avg(price) FROM fx_trades
WHERE timestamp IN '$today';
Average trade price by symbolDemo this query
SELECT symbol, avg(price) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

See also

bit_and

bit_and(value) returns the bitwise AND of all non-NULL values in an integer column.

Parameters

  • value is a byte, short, int, or long column.

Return value

Return value type matches the type of the argument.

Examples

Bitwise AND of all status flags
SELECT bit_and(flags) FROM events;
bit_and
4
Bitwise AND of status flags by category
SELECT category, bit_and(status_flags) FROM items;
categorybit_and
A1
B0
C5

See also

  • bit_or - Bitwise OR of all non-NULL values
  • bit_xor - Bitwise XOR of all non-NULL values

bit_or

bit_or(value) returns the bitwise OR of all non-NULL values in an integer column.

Parameters

  • value is a byte, short, int, or long column.

Return value

Return value type matches the type of the argument.

Examples

Bitwise OR of all permissions
SELECT bit_or(permissions) FROM users;
bit_or
15
Bitwise OR of permissions by role
SELECT role, bit_or(permissions) FROM users;
rolebit_or
admin255
editor31
viewer1

See also

  • bit_and - Bitwise AND of all non-NULL values
  • bit_xor - Bitwise XOR of all non-NULL values

bit_xor

bit_xor(value) returns the bitwise XOR of all non-NULL values in an integer column.

Parameters

  • value is a byte, short, int, or long column.

Return value

Return value type matches the type of the argument.

Examples

Bitwise XOR of all checksums
SELECT bit_xor(checksum) FROM data;
bit_xor
42
Bitwise XOR by partition
SELECT partition_id, bit_xor(value) FROM records;
partition_idbit_xor
1170
285

See also

  • bit_and - Bitwise AND of all non-NULL values
  • bit_or - Bitwise OR of all non-NULL values

bool_and

bool_and(value) returns true if all non-NULL values in the group are true, otherwise returns false. This function is useful for checking if a condition holds across all rows in a group.

Parameters

  • value is a boolean column or expression.

Return value

Return value type is boolean.

Examples

Check if all orders are fulfilled
SELECT bool_and(is_fulfilled) FROM orders;
bool_and
false
Check if all items passed QA by batch
SELECT batch_id, bool_and(passed_qa) FROM items;
batch_idbool_and
1true
2false
3true
Check if all prices are above threshold
SELECT symbol, bool_and(price > 100) FROM trades;
symbolbool_and
BTC-USDtrue
ETH-USDfalse

See also

  • bool_or - True if any value is true

bool_or

bool_or(value) returns true if any non-NULL value in the group is true, otherwise returns false. This function is useful for checking if a condition holds for at least one row in a group.

Parameters

  • value is a boolean column or expression.

Return value

Return value type is boolean.

Examples

Check if any order has errors
SELECT bool_or(has_error) FROM orders;
bool_or
true
Check if any item failed QA by batch
SELECT batch_id, bool_or(failed_qa) FROM items;
batch_idbool_or
1false
2true
3false
Check if any trade exceeded volume threshold
SELECT symbol, bool_or(volume > 1000000) FROM trades;
symbolbool_or
BTC-USDtrue
ETH-USDtrue

See also

  • bool_and - True if all values are true

corr

corr(arg0, arg1) is a function that measures how closely two sets of numbers move in the same direction. It does this by comparing how much each number in each set differs from the average of its set. This calculation is based on Welford's Algorithm.

  • If the numbers in both sets tend to be above or below their average values at the same time, the function will return a value close to 1.

  • If one set of numbers tends to be above its average value when the other set is below its average, the function will return a value close to -1.

  • If there's no clear pattern, the function will return a value close to 0.

Parameters

  • arg0 is any numeric value representing the first variable
  • arg1 is any numeric value representing the second variable

Return value

Return value type is double.

Examples

Correlation between price and quantityDemo this query
SELECT corr(price, quantity) FROM fx_trades
WHERE timestamp IN '$today';
corr
0.0070
Correlation between price and quantity by symbolDemo this query
SELECT symbol, corr(price, quantity) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;
symbolcorr
USDCAD0.0019
NZDUSD0.0065
EURUSD-0.0001
EURCHF0.0170
USDZAR0.0085

count

  • count() or count(*) - counts the number of rows irrespective of underlying data.
  • count(column_name) - counts the number of non-NULL values in a given column.
  • count(distinct column_name) - counts the number of distinct non-NULL values in a given column. This is identical to count_distinct(column_name).

Parameters

  • count() does not require arguments.
  • count(column_name) - supports the following data types:
    • double
    • float
    • integer
    • character
    • short
    • byte
    • timestamp
    • date
    • long
    • long256
    • geohash
    • varchar
    • string
    • symbol

Return value

Return value type is long.

Examples

Count of rows in the fx_trades table:

Count all rowsDemo this query
SELECT count() FROM fx_trades
WHERE timestamp IN '$today';

Count of rows aggregated by symbol:

Count by symbolDemo this query
SELECT symbol, count() FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

Count non-NULL values in a specific column:

Count non-NULL pricesDemo this query
SELECT symbol, count(price) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

Count distinct values using standard SQL syntax (identical to count_distinct):

Count distinct ECNs per symbolDemo this query
SELECT symbol, count(distinct ecn) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;
note

NULL values are aggregated with count(), but not with count(column_name) or count(distinct column_name).

See also

count_distinct

count_distinct(column_name) - counts distinct non-NULL values in varchar, symbol, long256, UUID, IPv4, long, int or string columns.

tip

count_distinct is available for backwards compatibility. We recommend using the standard SQL syntax count(distinct column_name) instead.

Return value

Return value type is long.

Examples

Count of distinct sides (buy/sell):

Count distinct sidesDemo this query
SELECT count_distinct(side) FROM fx_trades
WHERE timestamp IN '$today';
count_distinct
2

Count of distinct ECNs per symbol:

Count distinct ECNs by symbolDemo this query
SELECT symbol, count_distinct(ecn) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

See also

covar_pop

covar_pop(arg0, arg1) is a function that measures how much two sets of numbers change together. It does this by looking at how much each number in each set differs from the average of its set. It multiplies these differences together, adds them all up, and then divides by the total number of pairs. This gives a measure of the overall trend.

  • If the numbers in both sets tend to be above or below their average values at the same time, the function will return a positive number.

  • If one set of numbers tends to be above its average value when the other set is below its average, the function will return a negative number.

  • The closer the result is to zero, the less relationship there is between the two sets of numbers.

Parameters

  • arg0 is any numeric value representing the first variable
  • arg1 is any numeric value representing the second variable.

Return value

Return value type is double.

Examples

Population covariance between price and quantityDemo this query
SELECT covar_pop(price, quantity) FROM fx_trades
WHERE timestamp IN '$today';
Population covariance by symbolDemo this query
SELECT symbol, covar_pop(price, quantity) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

covar_samp

covar_samp(arg0, arg1) is a function that finds the relationship between two sets of numbers. It does this by looking at how much the numbers vary from the average in each set.

  • If the numbers in both sets tend to be above or below their average values at the same time, the function will return a positive number.

  • If one set of numbers tends to be above its average value when the other set is below its average, the function will return a negative number.

  • The closer the result is to zero, the less relationship there is between the two sets of numbers.

Parameters

  • arg0 is any numeric value representing the first variable.
  • arg1 is any numeric value representing the second variable.

Return value

Return value type is double.

Examples

Sample covariance between price and quantityDemo this query
SELECT covar_samp(price, quantity) FROM fx_trades
WHERE timestamp IN '$today';
Sample covariance by symbolDemo this query
SELECT symbol, covar_samp(price, quantity) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

first

  • first(column_name) - returns the first value of a column.

Supported column datatype: double, float, integer, IPv4, character, short, byte, timestamp, date, long, geohash, symbol, varchar, uuid and array.

If a table has a designated timestamp, then the first row is always the row with the lowest timestamp (oldest). For a table without a designated timestamp column, first returns the first row regardless of any timestamp column.

Return value

Return value type is the same as the type of the argument.

Examples

Given a table trades, which has a designated timestamp column:

symbolpricets
AAPL1422021-06-02T14:33:19.970258Z
GOOGL27502021-06-02T14:33:21.703934Z
MSFT2852021-06-02T14:33:23.707013Z

The following query returns oldest value for the symbol column:

SELECT first(symbol) FROM trades;
first
AAPL

Without selecting a designated timestamp column, the table may be unordered and the query may return different result. Given an unordered table trades_unordered:

symbolpricets
AAPL1422021-06-02T14:33:19.970258Z
MSFT2852021-06-02T14:33:23.707013Z
GOOGL27502021-06-02T14:33:21.703934Z

The following query returns the first record for the symbol column:

SELECT first(symbol) FROM trades_unordered;
first
AAPL

See also

  • first_not_null - First non-NULL value
  • last - Last value by timestamp order
  • arg_min - Value at the row where another column is minimum

first_not_null

  • first_not_null(column_name) - returns the first non-NULL value of a column.

Supported column datatype: double, float, integer, IPv4, char, short, byte, timestamp, date, long, geohash, symbol, varchar, uuid and array.

If a table has a designated timestamp, then the first non-NULL row is always the row with the lowest timestamp (oldest). For a table without a designated timestamp column, first_not_null returns the first non-NULL row, regardless of any timestamp column.

Return value

Return value type is the same as the type of the argument.

Examples

Given a table trades, which has a designated timestamp column:

symbolpricets
NULL1422021-06-02T14:33:19.970258Z
GOOGL27502021-06-02T14:33:21.703934Z
MSFT2852021-06-02T14:33:23.707013Z

The following query returns oldest non-NULL value for the symbol column:

SELECT first_not_null(symbol) FROM trades;
first_not_null
GOOGL

Without selecting a designated timestamp column, the table may be unordered and the query may return different result. Given an unordered table trades_unordered:

symbolpricets
NULL1422021-06-02T14:33:19.970258Z
MSFT2852021-06-02T14:33:23.707013Z
GOOGL27502021-06-02T14:33:21.703934Z

The following query returns the first non-NULL record for the symbol column:

SELECT first_not_null(symbol) FROM trades_unordered;
first_not_null
MSFT

See also

geomean

geomean(value) calculates the geometric mean of a set of positive values. The geometric mean is computed using the formula exp(avg(ln(x))), which prevents overflow issues with large products by using logarithms.

The geometric mean is useful for calculating average growth rates, ratios, and other multiplicative quantities.

Parameters

  • value is a double column or expression. Other numeric types are implicitly converted to double.

Return value

Return value type is double.

Null and edge case handling

InputResultReason
Negative valuesNULLGeometric mean undefined
Zero valuesNULLln(0) is undefined
NULL valuesSkippedStandard aggregate behavior
Empty groupNULLStandard aggregate behavior

Examples

Geometric mean of growth rates
SELECT geomean(growth_rate) FROM quarterly_data;
geomean
1.12
Geometric mean of returns by asset
SELECT asset, geomean(return_factor) FROM portfolio;
assetgeomean
stocks1.08
bonds1.03
crypto1.25
Comparing arithmetic and geometric means
SELECT avg(return_factor) AS arithmetic_mean,
geomean(return_factor) AS geometric_mean
FROM investments;
arithmetic_meangeometric_mean
1.151.12

See also

  • avg - Arithmetic mean

haversine_dist_deg

haversine_dist_deg(lat, lon, ts) - calculates the traveled distance for a series of latitude and longitude points.

Parameters

  • lat is the latitude expressed as degrees in decimal format (double)
  • lon is the longitude expressed as degrees in decimal format (double)
  • ts is the timestamp for the data point

Return value

Return value type is double.

Examples

Calculate the aggregate traveled distance for each car_id
SELECT car_id, haversine_dist_deg(lat, lon, k)
FROM rides;

ksum

ksum(value) - adds values ignoring missing data (e.g NULL values). Values are added using the

Kahan compensated sum algorithm. This is only beneficial for floating-point values such as float or double.

Parameters

  • value is any numeric value.

Return value

Return value type is the same as the type of the argument.

Examples

Kahan compensated sum of random doublesDemo this query
SELECT ksum(a)
FROM (SELECT rnd_double() a FROM long_sequence(100));

last

  • last(column_name) - returns the last value of a column.

Supported column datatype: double, float, integer, IPv4, character, short, byte, timestamp, date, long, geohash, symbol, varchar, uuid and array.

If a table has a designated timestamp, the last row is always the one with the highest (latest) timestamp.

For a table without a designated timestamp column, last returns the last inserted row, regardless of any timestamp column.

Return value

Return value type is the same as the type of the argument.

Examples

Given a table trades, which has a designated timestamp column:

symbolpricets
AAPL1422021-06-02T14:33:19.970258Z
GOOGL27502021-06-02T14:33:21.703934Z
MSFT2852021-06-02T14:33:23.707013Z

The following query returns the latest value for the symbol column:

SELECT last(symbol) FROM trades;
last
MSFT

Without selecting a designated timestamp column, the table may be unordered and the query may return different result. Given an unordered table trades_unordered:

symbolpricets
AAPL1422021-06-02T14:33:19.970258Z
MSFT2852021-06-02T14:33:23.707013Z
GOOGL27502021-06-02T14:33:21.703934Z

The following query returns the last record for the symbol column:

SELECT last(symbol) FROM trades_unordered;
last
GOOGL

See also

  • last_not_null - Last non-NULL value
  • first - First value by timestamp order
  • arg_max - Value at the row where another column is maximum

last_not_null

  • last_not_null(column_name) - returns the last non-NULL value of a column.

Supported column datatype: double, float, integer, IPv4, char, short, byte, timestamp, date, long, geohash, symbol, varchar, uuid and array.

If a table has a designated timestamp, then the last non-NULL row is always the row with the highest timestamp (most recent). For a table without a designated timestamp column, last_not_null returns the last non-NULL row, regardless of any timestamp column.

Return value

Return value type is the same as the type of the argument.

Examples

Given a table trades, which has a designated timestamp column:

symbolpricets
NULL1422021-06-02T14:33:19.970258Z
GOOGL27502021-06-02T14:33:21.703934Z
MSFT2852021-06-02T14:33:23.707013Z

The following query returns most recent non-NULL value for the symbol column:

SELECT last_not_null(symbol) FROM trades;
last_not_null
MSFT

Without selecting a designated timestamp column, the table may be unordered and the query may return different result. Given an unordered table trades_unordered:

symbolpricets
NULL1422021-06-02T14:33:19.970258Z
MSFT2852021-06-02T14:33:23.707013Z
GOOGL27502021-06-02T14:33:21.703934Z

The following query returns the last non-NULL record for the symbol column:

SELECT last_not_null(symbol) FROM trades_unordered;
last_not_null
GOOGL

See also

max

max(value) - returns the highest value ignoring missing data (e.g NULL values).

Parameters

  • value is any numeric or string value

Return value

Return value type is the same as the type of the argument.

Examples

Highest trade priceDemo this query
SELECT max(price) FROM fx_trades
WHERE timestamp IN '$today';
Highest trade price by symbolDemo this query
SELECT symbol, max(price) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

See also

  • min - Returns the minimum value
  • arg_max - Returns another column's value at the row where this column is maximum

min

min(value) - returns the lowest value ignoring missing data (e.g NULL values).

Parameters

  • value is any numeric or string value

Return value

Return value type is the same as the type of the argument.

Examples

Lowest trade priceDemo this query
SELECT min(price) FROM fx_trades
WHERE timestamp IN '$today';
Lowest trade price by symbolDemo this query
SELECT symbol, min(price) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

See also

  • max - Returns the maximum value
  • arg_min - Returns another column's value at the row where this column is minimum

mode

mode(value) - calculates the mode (most frequent) value out of a particular dataset.

For mode(B), if there are an equal number of true and false values, true will be returned as a tie-breaker.

For other modes, if there are equal mode values, the returned value will be whichever the code identifies first.

To make the result deterministic, you must enforce an underlying sort order.

Parameters

  • value - one of (LONG, DOUBLE, BOOLEAN, STRING, VARCHAR, SYMBOL)

Return value

Return value type is the same as the type of the input value.

Examples

With this dataset:

symbolvalue
Aalpha
Aalpha
Aalpha
Aomega
Bbeta
Bbeta
Bgamma
SELECT symbol, mode(value) as mode FROM dataset;
symbolmode
Aalpha
Bbeta

On demo:

Most frequent side per symbolDemo this query
SELECT symbol, mode(side)
FROM trades
WHERE timestamp IN '$today'
ORDER BY symbol ASC;
symbolmode
ADA-USDTsell
AVAX-USDTbuy
BTC-USDTbuy
DOT-USDTsell
ETH-BTCbuy
ETH-USDTbuy
......

nsum

nsum(value) - adds values ignoring missing data (e.g NULL values). Values are added using the Neumaier sum algorithm. This is only beneficial for floating-point values such as float or double.

Parameters

  • value is any numeric value.

Return value

Return value type is double.

Examples

Neumaier sum of random doublesDemo this query
SELECT nsum(a)
FROM (SELECT rnd_double() a FROM long_sequence(100));

stddev / stddev_samp

stddev_samp(value) - Calculates the sample standard deviation of a set of values, ignoring missing data (e.g., NULL values). The sample standard deviation is a measure of the amount of variation or dispersion in a sample of a population. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

stddev is an alias for stddev_samp.

Parameters

  • value is any numeric value.

Return value

Return value type is double.

Examples

Sample standard deviationDemo this query
SELECT stddev_samp(x)
FROM (SELECT x FROM long_sequence(100));
stddev_samp
29.011491975882016

stddev_pop

stddev_pop(value) - Calculates the population standard deviation of a set of values. The population standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

Parameters

  • value is any numeric value.

Return value

Return value type is double.

Examples

Population standard deviationDemo this query
SELECT stddev_pop(x)
FROM (SELECT x FROM long_sequence(100));
stddev_pop
28.86607004772212

string_agg

string_agg(value, delimiter) - Concatenates the given string values into a single string with the delimiter used as a value separator.

Parameters

  • value is a varchar value.
  • delimiter is a char value.

Return value

Return value type is varchar.

Examples

Concatenate values with delimiterDemo this query
SELECT string_agg(x::varchar, ',')
FROM (SELECT x FROM long_sequence(5));
string_agg
1,2,3,4,5

string_distinct_agg

string_distinct_agg(value, delimiter) - concatenates distinct non-NULL string values into a single string, using the specified delimiter to separate the values.

  • string_distinct_agg ignores NULL values and only concatenates non-NULL distinct values.

  • Order is guaranteed.

  • Does not support ORDER BY.

Parameters

  • value: A varchar or string column containing the values to be aggregated.
  • delimiter: A char value used to separate the distinct values in the concatenated string.

Return value

Return value type is string.

Examples

Suppose we want to find all the distinct symbols observed in the trades table in our public demo:

All distinct symbols as a comma-separated listDemo this query
SELECT string_distinct_agg(symbol, ',') AS distinct_symbols
FROM trades
WHERE timestamp IN '$today';
distinct_symbols
ETH-USDT,SOL-USDT,ADA-USDT,BTC-USDT,UNI-USDT,AVAX-USDT,LTC-USDT,XLM-USDT,DOT-USDT,...

This query returns a single string containing all the distinct symbol values separated by commas. Even though the symbol column may have many rows with repeated values, string_distinct_agg aggregates only the unique non-NULL values.

You can also group the aggregation by another column.

To find out which symbols are observed for each side:

Distinct symbols by sideDemo this query
SELECT side, string_distinct_agg(symbol, ',') AS distinct_symbols
FROM trades
WHERE timestamp IN '$today';
sidedistinct_symbols
buyETH-USDT,SOL-USDT,ADA-USDT,BTC-USDT,UNI-USDT,AVAX-USDT,LTC-USDT,XLM-USDT,DOT-...
sellADA-USDT,ETH-USDT,BTC-USDT,LTC-USDT,SOL-USDT,UNI-USDT,SOL-BTC,AVAX-USDT,XLM-...

Note we don't need to add GROUP BY side as it is implicit. But you can add it, if you prefer that syntax.

sum

sum(value) - adds values ignoring missing data (e.g NULL values).

Parameters

  • value is any numeric value.

Return value

Return value type is the same as the type of the argument.

Examples

Sum all quantitiesDemo this query
SELECT sum(quantity) FROM fx_trades
WHERE timestamp IN '$today';
Sum quantities by symbolDemo this query
SELECT symbol, sum(quantity) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

Overflow

sum does not perform overflow check. To avoid overflow, you can cast the argument to wider type.

Cast as long to avoid overflow
SELECT sum(cast(a AS LONG)) FROM my_table;

See also

  • ksum - Kahan compensated sum for floating-point precision
  • nsum - Neumaier sum for floating-point precision
  • avg - Arithmetic mean

twap

twap(price, timestamp) - Calculates the time-weighted average price (TWAP) using step-function (forward-fill) integration: each observed price is assumed to persist until the next observation, and the TWAP is the area under this step function divided by the total time span. Unlike VWAP, which weights by volume, TWAP gives equal weight to every time interval — useful for execution benchmarking, algorithmic trading, and fair value reference in illiquid markets. See the TWAP cookbook recipe for practical examples.

TWAP=i=1n1pi(ti+1ti)tnt1\text{TWAP} = \frac{\sum_{i=1}^{n-1} p_i \cdot (t_{i+1} - t_i)}{t_n - t_1}

Where:

  • pip_i is the price at observation ii
  • tit_i is the timestamp at observation ii

If all observations share the same timestamp, the function falls back to a simple arithmetic mean.

If the price is NULL, that observation is skipped.

If the timestamp is NULL, that observation is skipped.

If there are no valid observations, the result is NULL.

Supports SAMPLE BY with FILL modes.

Parameters

  • price is any numeric value.
  • timestamp is a timestamp value. This is typically the table's designated timestamp but can be any timestamp column.

Return value

Return value type is double.

Examples

TWAP of trade prices for a single symbolDemo this query
SELECT twap(price, timestamp)
FROM trades
WHERE symbol = 'BTC-USDT'
AND timestamp IN '$yesterday';
twap
80636.70
TWAP per symbolDemo this query
SELECT symbol, twap(price, timestamp)
FROM trades
WHERE timestamp IN '$yesterday';
symboltwap
ETH-BTC0.0287
BTC-USDT80636.70
SOL-BTC0.0010
ADA-USDT0.2650
AVAX-USDT9.5341
......
Hourly TWAPDemo this query
SELECT timestamp, symbol, twap(price, timestamp)
FROM trades
WHERE symbol IN ('BTC-USDT', 'ETH-USDT')
AND timestamp IN '$yesterday'
SAMPLE BY 1h;
timestampsymboltwap
2026-05-07T00:00:00.000000ZBTC-USDT81242.47
2026-05-07T00:00:00.000000ZETH-USDT2344.47
2026-05-07T01:00:00.000000ZBTC-USDT81153.52
2026-05-07T01:00:00.000000ZETH-USDT2328.97
.........

See also

var_pop

var_pop(value) - Calculates the population variance of a set of values. The population variance is a measure of the amount of variation or dispersion of a set of values. A low variance indicates that the values tend to be very close to the mean, while a high variance indicates that the values are spread out over a wider range.

Parameters

  • value is any numeric value.

Return value

Return value type is double.

Examples

Population varianceDemo this query
SELECT var_pop(x)
FROM (SELECT x FROM long_sequence(100));
var_pop
833.25

variance / var_samp

var_samp(value) - Calculates the sample variance of a set of values. The sample variance is a measure of the amount of variation or dispersion of a set of values in a sample from a population. A low variance indicates that the values tend to be very close to the mean, while a high variance indicates that the values are spread out over a wider range.

variance() is an alias for var_samp.

Parameters

  • value is any numeric value.

Return value

Return value type is double.

Examples

Sample varianceDemo this query
SELECT var_samp(x)
FROM (SELECT x FROM long_sequence(100));
var_samp
841.6666666666666

weighted_avg

weighted_avg(value, weight) - Calculates the weighted mean (average) of a set of observations (database rows). It calculates the equivalent of:

xˉw=wixiwi\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}

Where:

  • xix_i are the values
  • wiw_i are the weights

If the value is NULL, that observation is ignored.

If the weight is NULL or zero, that observation is ignored.

If there are no observations, the result is NULL.

If the weights sum to zero, the result is NULL.

Weights should be non-negative to make sense, but this isn't enforced.

Parameters

  • value is any numeric value.
  • weight is any numeric value.

Return value

Return value type is double.

Examples

Weighted average price by trade quantityDemo this query
SELECT weighted_avg(price, quantity) FROM fx_trades
WHERE timestamp IN '$today';

weighted_stddev

weighted_stddev(value, weight) - Calculates the unbiased weighted standard deviation of a set of observations using reliability weights.

This is an alias for weighted_stddev_rel.

weighted_stddev_freq

weighted_stddev_freq(value, weight) - Calculates the unbiased weighted standard deviation of a set of observations using frequency weights.

A frequency weight represents the number of occurrences of each observation in the dataset. This variant uses the frequency-weighted estimator for the population variance. It calculates the equivalent of:

wixi2(wixi)2wiwi1\sqrt{ \frac{ \sum w_i x_i^2 - \frac{(\sum w_i x_i)^2}{\sum w_i} }{ \sum w_i - 1 } }

Where:

  • xix_i are the values
  • wiw_i are the frequency weights

If the value is NULL, that observation is ignored.

If the weight is NULL or zero, that observation is ignored.

If there are fewer than two observations, the result is NULL.

Weights should be positive integers to make sense, but this isn't enforced.

Weights must not be normalized. If they sum to one, the result is NULL.

If the sum of weights is negative, the result is NULL.

Parameters

  • value is any numeric value.
  • weight is any numeric value representing the frequency weight (typically an integer).

Return value

Return value type is double.

Examples

Weighted standard deviation (frequency) of price by quantityDemo this query
SELECT weighted_stddev_freq(price, quantity) FROM fx_trades
WHERE timestamp IN '$today';
Weighted standard deviation (frequency) by symbolDemo this query
SELECT symbol, weighted_stddev_freq(price, quantity) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

weighted_stddev_rel

weighted_stddev_rel(value, weight) - Calculates the unbiased weighted standard deviation of a set of observations using reliability weights. You can also use the shorthand name weighted_stddev.

A reliability weight represents the "importance" or "trustworthiness" of each observation. This variant uses the reliability-weighted estimator for the population variance. It calculates the equivalent of:

wixi2(wixi)2wiwiwi2wi\sqrt{ \frac{ \sum w_i x_i^2 - \frac{(\sum w_i x_i)^2}{\sum w_i} }{ \sum w_i - \frac{\sum w_i^2}{\sum w_i} } }

Where:

  • xix_i are the values
  • wiw_i are the reliability weights

If the value is NULL, that observation is ignored.

If the weight is NULL or zero, that observation is ignored.

If there are fewer than two observations, the result is NULL.

Weights should be positive to make sense, but this isn't enforced.

If the sum of weights is not positive, the result is NULL.

Parameters

  • value is any numeric value.
  • weight is any numeric value representing the reliability weight.

Return value

Return value type is double.

Examples

Weighted standard deviation of price by quantityDemo this query
SELECT weighted_stddev(price, quantity) FROM fx_trades
WHERE timestamp IN '$today';
Weighted standard deviation by symbolDemo this query
SELECT symbol, weighted_stddev(price, quantity) FROM fx_trades
WHERE timestamp IN '$today'
LIMIT 5;

See also

  • GROUP BY - Group rows for aggregation
  • SAMPLE BY - Time-series aggregation
  • PIVOT - Transform aggregation results from rows to columns