PQL DISTINCT()
The Distinct() query returns the distinct values in a field for a set of records.
Distinct() can always be used as a row call. This, in conjunction with the index argument, gives users the ability to query across indexes. Use caution to ensure you get the expected behavior. The examples below outline some considerations.
Call Definitioin
Distinct(ROW_CALL, field=FIELD, index=INDEX)
Mandatory Arguments
field: a string, FIELD, which is the name of some field in INDEX or a string which is the name of some field in the primary index (i.e. the index in square brackets before the query) – the latter is the default when theindexargument isn’t supplied. The distinct values returned will be the distinct values from this field.
Optional Arguments
ROW_CALL: the row call use to filter records. Only the values of records returned by this row call will be used in the computation ofDistinct(). Note this row call is run against the primary index (i.e. the index in square brackets) when there is not anindexargument supplied. It is run againstINDEXwhen theindexargument is supplied.index: INDEX is the name of the index that FIELD exists in. When this argument is supplied, it is the index ROW_CALL is executed against. When it isn’t supplied, FIELD should exist in the primary index andROW_CALLwill be executed on the primary index (i.e. the index in square brackets)
Returns
- if FIELD is an unkeyed Set, Mutex, or Time field:
- a list of field values (ID or unsigned integers) are return under the “columns” key
- if FIELD is a keyed Set, Mutex, of Time field:
- a list of field values (strings) are returned under the “keys” key
- a list of ID / unsigned integers are returned uner the “columns” key
- the value of the “columns” key is essentially erroneous data; but note, these are the values that will be used if you’re attempting you use
Distinct()as the row call. In this case, you might get some unexpected behavior.
- if FIELD is an Int, Decimal, or Timestamp field:
- similar to the result of a row call,
Distinct()including a list of “columns”. Dissimilarly, these results are separated into two groups – one for negative values and one for positive values. This structure is necessary to accommodate signed values that are represented with bitsets.
- similar to the result of a row call,
Examples
Data:
Index: users (non-keyed index)
_id | age (Int) | zipcode (Set)
-----+-----------+---------
0 | 18 | 44444
1 | 55 | 55555
2 | 30 | 11111
3 | 35 | 22222
4 | 22 | 33333
Index: transactions (non-keyed index)
_id | amount (Decimal) | category (Keyed Set) | userid (Non-keyed Set)
-----+------------------+----------------------+------------------------
123 | 99.99 | Tools | 0
124 | 34.99 | Hardware | 1
125 | 129.99 | Tools | 1
126 | 29.99 | Plants | 0
127 | 12.99 | Food | 0
128 | 399.99 | Drugs | 4
129 | 29.99 | Plants | 2
130 | 34.99 | Hardware | 3
Note: In this case, the userid field in transactions was ingested as an ID field (non-keyed mutex in this case).
Example 1
What are the distinct ages in the users index?
Query
[users]Distinct(field=age)
Tabular Response
----
18
22
30
35
55
HTTP Response
{
"results": [
{
"neg": {
"columns": []
},
"pos": {
"columns": [
18,
22,
30,
35,
55
]
}
}
]
}
Explanation
Distinct() returns the unique values in the age field of the users index. Note these distinct values are separated into negative and positive becuase age is an Int field.
Example 2
What are the distinct prices of transactions?
Query
[transactions]Distinct(field=amount)
Tabular Response
-------
1299
2999
3499
9999
12999
39999
HTTP Response
{
"results": [
{
"neg": {
"columns": []
},
"pos": {
"columns": [
1299,
2999,
3499,
9999,
12999,
39999
]
}
}
]
}
Explanation
Distinct() returns the unique values in the amount field of the transactions index. Note these distinct values are separated into negative and positive because age is an Decimal field.
Example 3
What are the distinct field values in the category field in the transactions index?
Query
[transactions]Distinct(field=category)
Tabular Response
_id
----------
Tools
Hardware
Plants
Food
Drugs
HTTP Response
{
"results": [
{
"columns": [
1,
2,
3,
4,
5
],
"keys": [
"Tools",
"Hardware",
"Plants",
"Food",
"Drugs"
]
}
]
}
Explanation
Distinct() returns the unique values in the category field of the transactions index. Note the “columns” key returned by Distinct() in the HTTP response. In some sense, this is erroneous data. If this query is used as a row call the “columns” key values will be interpreted as record IDs. However, in this case 1, 2, 3, 4, 5 are not referencing record IDs.
Example 4
How many distinct field value are there in the category field in the transactions index?
Query
[transaction]Count(
Distinct(field=category)
)
Tabular Response
count
-------
5
{
"results": [
5
]
}
Explanation
Count(Distinct()) can be used to count the number of unique field values in a field. In this case, there are 5.
Example 5
Which users have bought tools and is older than 25?
Query
[users]Intersect(
Row(age > 25),
Distinct(Row(category=Tools), field=userid, index=transactions)
)
Tabular Response
_id
-----
1
HTTP Response
{
"results": [
{
"columns": [
1
]
}
]
}
Explanation
Row(age > 25) return the users that are older than 25 - [1, 2, 3]. Distinct() is returning unique userids from the transactions index where the transaction also had category=Tools – [0, 1]. The two sets are intersected which gives [1]. This is an example of a query that spans multiple indexes. It works because the userid field in the transactions index have a soft reference to the records IDs in the users index. “Soft reference” here means there is not an actual contraint ensuring all the userid values in the transactions are actually record IDs in the users index. That would be enforced outside of FeatureBase in this example.