
An overview of FeatureBase bitmaps

FeatureBase converts data to base-2 (binary) in two types of bitmap:

  • Equality-encoded bitmaps for non-integer values
  • Bit-sliced bitmaps, which slice integer values into one bitmap for each power of two
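
As a rough illustration of the two encodings, the sketch below uses plain Python integers as bitmaps. The function names and sample data are hypothetical, and this is not FeatureBase's implementation: it only handles non-negative integers and leaves out details such as null tracking and sign handling.

```python
from collections import defaultdict

def equality_encode(values):
    """One bitmap per distinct value; bit i is set when record i holds that value."""
    bitmaps = defaultdict(int)
    for record, value in enumerate(values):
        bitmaps[value] |= 1 << record
    return dict(bitmaps)

def bit_slice_encode(values, bit_depth):
    """One bitmap per power of two; bit i of slice k is set when record i
    has a 1 in binary position k of its non-negative integer value."""
    slices = [0] * bit_depth
    for record, value in enumerate(values):
        for k in range(bit_depth):
            if (value >> k) & 1:
                slices[k] |= 1 << record
    return slices

# Hypothetical sample data: four records.
colors = ["red", "blue", "red", "green"]
color_bitmaps = equality_encode(colors)   # "red" -> 0b0101 (records 0 and 2)

ages = [5, 3, 6, 1]
age_slices = bit_slice_encode(ages, bit_depth=3)
# age_slices[0] is the bitmap for 2**0, age_slices[1] for 2**1, age_slices[2] for 2**2
```

Three slices are enough here because every sample value is below 2**3; wider integers need one more slice per extra bit.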

Before you begin

Why use bitmaps for data storage?

Bitmaps make updates and queries faster because the data is encoded as 1 or 0.

Faster updates

Bitmap updates in FeatureBase are faster for two reasons.

Bitmap type | Update description
--- | ---
Equality-encoded | FeatureBase can directly update a value encoded as a standard bitmap without needing to traverse other values in the structure
Bit-slice | Updates to bit-slice bitmaps mean flipping or adding bits rather than altering the entire value
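
The two kinds of update can be sketched as follows, assuming bitmaps are held as Python integers. The helper names and data are hypothetical and only illustrate the idea of touching individual bits rather than rewriting whole values.

```python
def set_equality_value(bitmaps, record, old_value, new_value):
    """Move one record from old_value's bitmap to new_value's bitmap."""
    bitmaps[old_value] &= ~(1 << record)                            # clear the old bit
    bitmaps[new_value] = bitmaps.get(new_value, 0) | (1 << record)  # set the new bit

def set_sliced_value(slices, record, new_value):
    """Rewrite one record's integer by setting or clearing its bit in each slice."""
    for k in range(len(slices)):
        if (new_value >> k) & 1:
            slices[k] |= 1 << record
        else:
            slices[k] &= ~(1 << record)

# Hypothetical data: four records with a color and an age (5, 3, 6, 1).
color_bitmaps = {"red": 0b0101, "blue": 0b0010, "green": 0b1000}
set_equality_value(color_bitmaps, record=2, old_value="red", new_value="blue")

age_slices = [0b1011, 0b0110, 0b0101]
set_sliced_value(age_slices, record=0, new_value=6)   # record 0 changes from 5 to 6
```

In both cases only the bits belonging to the affected record change; no other record is read or rewritten.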

Faster queries

There are two limitations to every data query:

  • Latency, where the structure and encoding cause delays returning results
  • Concurrency, where queries slow down because multiple users are accessing the same data at the same time

FeatureBase addresses these limitations as follows:

Query | Limitation | Solution
--- | --- | ---
Multiple | Concurrency | Lower latency queries mean data is accessed for shorter times, which reduces the number of connections and concurrency issues
Boolean queries | Latency | Equality-encoded bitmaps mean Boolean queries such as WHERE and OR are substantially faster because data relationships are represented as 1 (they exist) or 0 (they don’t exist)
SELECT specific values | Latency | Queries on equality-encoded and bit-slice data can directly and sequentially access specified values without needing to traverse all other values in a database table
Range queries | Latency | Integer values are bit-sliced into individual bitmaps for each power of two. This means range queries can combine the specific bitmaps instead of working with integers in a traditional row/column format
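
To make the Boolean and range rows more concrete, here is a minimal sketch that answers both kinds of query with bitwise operations on Python integers. The range comparison follows the general bit-sliced index technique, not necessarily FeatureBase's own code; the names and data are hypothetical, and values are assumed to be non-negative.

```python
def records_greater_than(slices, threshold, all_records):
    """Bitmap of records whose bit-sliced value is strictly greater than threshold.

    slices[k] is the bitmap for 2**k; threshold is a non-negative integer.
    """
    if threshold >> len(slices):
        return 0  # threshold exceeds every value the slices can represent
    greater = 0          # records already known to exceed the threshold
    tied = all_records   # records that match the threshold on all higher bits
    for k in reversed(range(len(slices))):
        if (threshold >> k) & 1:
            tied &= slices[k]             # only records with this bit set stay tied
        else:
            greater |= tied & slices[k]   # a 1 where the threshold has a 0 wins
            tied &= ~slices[k]
    return greater

def records(bitmap):
    """List the record positions whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical data: four records with a color and an age.
color = {"red": 0b0101, "blue": 0b0010, "green": 0b1000}
age_slices = [0b1011, 0b0110, 0b0101]   # ages 5, 3, 6, 1
all_records = 0b1111

red_or_blue = color["red"] | color["blue"]                        # Boolean OR
older_than_3 = records_greater_than(age_slices, 3, all_records)   # range query
red_and_older = color["red"] & older_than_3                       # combined filter
```

The range query visits one bitmap per power of two, so its cost depends on the bit depth of the column rather than on comparing each stored integer in turn.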

What are the drawbacks of bitmaps?

Bitmaps have two main issues:

  • low-cardinality data duplication
  • data storage overheads

Low-cardinality data duplication

FeatureBase overcomes low-cardinality issues with four unique data types suitable for integer or string values.

Data storage overheads

Encoding data as base-2 equality-encoded or bit-slice bitmaps makes queries faster but incurs storage overheads because the number of bitmaps scales:

  • with the number of values, and
  • with the cardinality of those values

For example, the average storage overhead for a 10,000-value dataset is as follows:

Database | Dataset saved as | Average storage overhead (KB)
--- | --- | ---
RDBMS | Row and column based structure | 20480 - 30720
FeatureBase | Equality-encoded bitmaps; bit-slice bitmaps | 1280000
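
One way to arrive at a figure of this size is sketched below. It assumes one uncompressed bitmap per value and 2**20 bits per bitmap; neither assumption is stated in this document, so treat the calculation as an illustration of how the overhead scales rather than as the official derivation.

```python
# Assumption (not from this document): each uncompressed bitmap spans 2**20 bits.
ASSUMED_BITS_PER_BITMAP = 2 ** 20

def uncompressed_overhead_kb(bitmap_count, bits_per_bitmap=ASSUMED_BITS_PER_BITMAP):
    """Storage in KB for bitmap_count uncompressed bitmaps of the given width."""
    bytes_per_bitmap = bits_per_bitmap // 8         # 131072 bytes = 128 KB
    return bitmap_count * bytes_per_bitmap // 1024

overhead_kb = uncompressed_overhead_kb(10_000)      # 10,000 bitmaps at 128 KB each
```

Under these assumptions the total grows linearly with the number of bitmaps, which is why high-cardinality data is expensive to store uncompressed.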

FeatureBase overcomes this issue by compressing all bitmap data using Roaring Bitmap Format, based on Roaring Bitmaps.
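
The sketch below estimates why Roaring-style compression helps with sparse bitmaps. It follows the general Roaring Bitmaps scheme (bits grouped into chunks of 2**16, with each chunk stored as either a sorted array of 16-bit values or a fixed 8 KB bitmap) and ignores run containers and headers. It does not model the on-disk layout of Roaring Bitmap Format, and the function name is hypothetical.

```python
from collections import Counter

ARRAY_MAX = 4096                # above this count a chunk is stored as a bitmap container
BITMAP_CONTAINER_BYTES = 8192   # 2**16 bits

def roaring_size_estimate(set_bits):
    """Approximate payload size in bytes for a set of bit positions."""
    per_chunk = Counter(bit >> 16 for bit in set_bits)   # group by the high bits
    total = 0
    for count in per_chunk.values():
        if count <= ARRAY_MAX:
            total += 2 * count              # array container: 2 bytes per set bit
        else:
            total += BITMAP_CONTAINER_BYTES
    return total

sparse = range(0, 2 ** 20, 1024)   # 1,024 set bits spread across 2**20 positions
dense = range(0, 2 ** 16)          # one completely full chunk

sparse_bytes = roaring_size_estimate(sparse)
dense_bytes = roaring_size_estimate(dense)
```

In this example the sparse bitmap needs about 2 KB instead of the 128 KB an uncompressed 2**20-bit bitmap would occupy, while a full chunk never costs more than 8 KB.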

What bitmaps are created for my data?

Data is converted to bitmaps based on the destination data type:

User data | FeatureBase data type | Bitmap type
--- | --- | ---
Boolean | Bool | Equality-encoded bitmaps
Floating point | Decimal | Bit-sliced
Unsigned integer | ID | Equality-encoded bitmaps
Signed integer | Integer | Bit-sliced
Alphanumeric | String | Equality-encoded bitmaps
Date and time | Timestamp | Bit-sliced
Low cardinality | Set | Equality-encoded bitmaps
Low cardinality keyed to date/time values | SetQ | Equality-encoded bitmaps

How does FeatureBase store bitmaps?

At a high level, FeatureBase bitmaps are stored as shards, each made up of:

  • a Roaring Bitmap Format (RBF) data file
  • a Write Ahead Log (WAL) file

FeatureBase stores shards on disk in the following directories:

FeatureBase product | Directory | Additional information
--- | --- | ---
Cloud | etc |
Community | pilosa | FeatureBase Community data directory

Are column names converted to bitmaps?

Column names are saved to disk in the Roaring Bitmap Format data file.

Further information