An overview of FeatureBase bitmaps
FeatureBase converts data to base-2 (binary) and stores it in one of two types of bitmap:
- Equality-encoded bitmaps for non-integer values
- Bit-sliced bitmaps, which slice integer values into one bitmap for each power of two
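The two encodings can be sketched in a few lines of Python. This is only an illustration of the idea, not FeatureBase's implementation: the function names `equality_encode` and `bit_slice` are hypothetical, Python integers stand in for bitmaps (bit `i` represents record `i`), and only non-negative integers are handled.

```python
from collections import defaultdict


def equality_encode(values):
    """Build one bitmap per distinct value.

    Bit i of a value's bitmap is 1 when record i holds that value.
    """
    bitmaps = defaultdict(int)
    for record_id, value in enumerate(values):
        bitmaps[value] |= 1 << record_id
    return dict(bitmaps)


def bit_slice(values, bit_depth):
    """Build one bitmap per power of two (non-negative integers only).

    Bit i of slice k is 1 when bit k of record i's integer is 1.
    """
    slices = [0] * bit_depth
    for record_id, value in enumerate(values):
        for k in range(bit_depth):
            if (value >> k) & 1:
                slices[k] |= 1 << record_id
    return slices


# Hypothetical sample data: four records with a color and an age.
colors = ["red", "blue", "red", "green"]
ages = [5, 3, 6, 1]

color_bitmaps = equality_encode(colors)
age_slices = bit_slice(ages, bit_depth=3)

print(color_bitmaps)  # {'red': 5, 'blue': 2, 'green': 8}
print(age_slices)     # [11, 6, 5]
```

Here `red` maps to `0b0101` because records 0 and 2 are red, and slice 0 of the ages is `0b1011` because records 0, 1 and 3 hold odd values.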
Before you begin
Why use bitmaps for data storage?
Bitmaps make updates and queries faster because the data is encoded as 1 or 0.
Faster updates
Bitmap updates in FeatureBase are faster for two reasons, one for each bitmap type:
Bitmap type | Update description |
---|---|
Equality-encoded | FeatureBase can directly update a value encoded as a standard bitmap without needing to traverse other values in the structure |
Bit-sliced | Updating a bit-sliced bitmap means flipping or adding individual bits rather than rewriting the entire value |
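As a rough illustration of the two update paths in the table, the sketch below models bitmaps as Python integers, where bit `i` represents record `i`. The helper names are hypothetical and the code is an assumption about the general technique, not a description of FeatureBase internals.

```python
def update_equality_encoded(bitmaps, record_id, old_value, new_value):
    """Move one record between value bitmaps by touching a single bit in each."""
    mask = 1 << record_id
    bitmaps[old_value] = bitmaps.get(old_value, 0) & ~mask  # clear the old bit
    bitmaps[new_value] = bitmaps.get(new_value, 0) | mask   # set the new bit


def update_bit_sliced(slices, record_id, new_value):
    """Overwrite one record's integer by setting or clearing its bit in each slice."""
    if new_value < 0 or new_value >> len(slices):
        raise ValueError("value does not fit in the available slices")
    mask = 1 << record_id
    for k in range(len(slices)):
        if (new_value >> k) & 1:
            slices[k] |= mask
        else:
            slices[k] &= ~mask


# Records 0 and 2 are red, record 1 is blue.
color_bitmaps = {"red": 0b0101, "blue": 0b0010}
update_equality_encoded(color_bitmaps, record_id=2, old_value="red", new_value="blue")
print(color_bitmaps)  # {'red': 1, 'blue': 6}

# Slices for the ages [5, 3, 6, 1]; change record 2 from 6 to 7.
age_slices = [0b1011, 0b0110, 0b0101]
update_bit_sliced(age_slices, record_id=2, new_value=7)
print(age_slices)  # [15, 6, 5]
```

Changing the age from 6 to 7 flips a single bit in slice 0 and leaves the other slices unchanged.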
Faster queries
There are two limitations to every data query:
- Latency, where the structure and encoding cause delays returning results
- Concurrency, where queries slow down because multiple users are accessing the same data at the same time
FeatureBase addresses these limitations as follows:
Query | Limitation | Solution |
---|---|---|
Multiple concurrent queries | Concurrency | Lower-latency queries access data for shorter times, which reduces the number of open connections and concurrency issues |
Boolean queries | Latency | Equality-encoded bitmaps mean Boolean operations such as AND and OR in a WHERE clause are substantially faster because data relationships are represented as 1 (they exist) or 0 (they don’t exist) |
SELECT specific values | Latency | Queries on equality-encoded and bit-slice data can directly and sequentially access specified values without needing to traverse all other values in a database table |
Range queries | Latency | Integer values are bit-sliced into individual bitmaps for each power of two. This means range queries can combine the specific bitmaps instead of working with integers in a traditional row/column format |
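The Boolean and range rows above can be illustrated with a small sketch. Bitmaps are modelled as Python integers (bit `i` represents record `i`), and the range comparison walks the slices from the most significant bit down, which is a common textbook approach for bit-sliced data. The function names are hypothetical, and this is not necessarily the algorithm FeatureBase uses.

```python
def less_than(slices, threshold, all_records):
    """Return a bitmap of records whose bit-sliced value is < threshold."""
    if threshold <= 0:
        return 0
    if threshold >> len(slices):
        return all_records  # threshold exceeds every representable value
    lt = 0             # records already known to be smaller
    eq = all_records   # records still equal to threshold on the higher bits
    for k in reversed(range(len(slices))):
        if (threshold >> k) & 1:
            lt |= eq & ~slices[k]  # a 0 where threshold has a 1 means smaller
            eq &= slices[k]
        else:
            eq &= ~slices[k]       # a 1 where threshold has a 0 means larger
    return lt


def record_ids(bitmap):
    """List the record numbers whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical data: colors ["red", "blue", "red", "green"], ages [5, 3, 6, 1].
color_bitmaps = {"red": 0b0101, "blue": 0b0010, "green": 0b1000}
age_slices = [0b1011, 0b0110, 0b0101]
all_records = 0b1111

# Boolean query: color is red OR green.
red_or_green = color_bitmaps["red"] | color_bitmaps["green"]
print(record_ids(red_or_green))  # [0, 2, 3]

# Range query: 3 <= age < 6, built from two less-than bitmaps.
in_range = less_than(age_slices, 6, all_records) & ~less_than(age_slices, 3, all_records)
print(record_ids(in_range))  # [0, 1]

# Combined: (red OR green) AND age < 5.
combined = red_or_green & less_than(age_slices, 5, all_records)
print(record_ids(combined))  # [3]
```

Each query is a handful of bitwise operations over whole bitmaps, so no individual row has to be decoded back into a string or an integer.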
What are the drawbacks of bitmaps?
Bitmaps have two main issues:
- low-cardinality data duplication
- data storage overheads
Low-cardinality data duplication
FeatureBase overcomes low-cardinality issues with four unique data types suitable for integer or string values.
Data storage overheads
Encoding data as base-2 equality-encoded or bit-sliced bitmaps makes queries faster but incurs storage overheads because the number of bitmaps scales with:
- the number of values, and
- the cardinality of those values
For example, the average storage overheads for a 10,000 value dataset will be as follows:
Database | Dataset saved as | Average storage overhead (KB) |
---|---|---|
RDBMS | Row and column based structure | 20480 - 30720 |
FeatureBase | Equality-encoded bitmaps and bit-sliced bitmaps | 1280000 |
FeatureBase overcomes this issue by compressing all bitmap data using Roaring Bitmap Format, based on Roaring Bitmaps.
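To see why uncompressed bitmaps grow with both the number of values and their cardinality, consider the simplified model below. It assumes one bit per record per bitmap, one value per record, and no compression. The function names are hypothetical, and the model is not intended to reproduce the figures in the table above, which may rest on different assumptions.

```python
import math


def equality_encoded_bytes(record_count, cardinality):
    """Uncompressed size: one bitmap per distinct value, one bit per record."""
    return math.ceil(record_count * cardinality / 8)


def bit_sliced_bytes(record_count, max_value):
    """Uncompressed size: one bitmap per power of two needed to hold max_value."""
    bit_depth = max(1, max_value.bit_length())
    return math.ceil(record_count * bit_depth / 8)


records = 10_000

# Equality-encoded size grows linearly with cardinality.
print(equality_encoded_bytes(records, cardinality=10))      # 12500
print(equality_encoded_bytes(records, cardinality=10_000))  # 12500000

# Bit-sliced size grows with the number of bits in the largest value.
print(bit_sliced_bytes(records, max_value=10_000))          # 17500
```

Under the one-value-per-record assumption, only 1 bit in every `cardinality` bits of an equality-encoded field is set, so high-cardinality fields are mostly zeros. That sparsity is what compressed formats such as Roaring Bitmaps are designed to exploit.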
What bitmaps are created for my data?
Data is converted to bitmaps based on the destination data type:
User data | FeatureBase data type | Bitmap type |
---|---|---|
Boolean | Bool | Equality-encoded bitmaps |
Floating point | Decimal | Bit-sliced bitmaps |
Unsigned integer | ID | Equality-encoded bitmaps |
Signed integer | Integer | Bit-sliced bitmaps |
Alphanumeric | String | Equality-encoded bitmaps |
Date and time | Timestamp | Bit-sliced bitmaps |
Low cardinality | Set | Equality-encoded bitmaps |
Low cardinality keyed to date/time values | SetQ | Equality-encoded bitmaps |
How does FeatureBase store bitmaps?
At a high level, FeatureBase bitmaps are stored as shards, each made up of:
- a Roaring Bitmap Format (RBF) data file
- a Write Ahead Log (WAL) file
FeatureBase stores shards on disk in the following directories:
FeatureBase Product | Directory | Additional information |
---|---|---|
Cloud | etc | |
Community | pilosa | FeatureBase Community data directory |
Are column names converted to bitmaps?
Column names are saved to disk in the Roaring Bitmap Format data file.
Further information
- Learn about equality-encoded bitmaps
- Learn about bit-sliced bitmaps
- Learn about importing data to FeatureBase