Skip to main content Link Menu Expand (external link) Document Search Copy Copied

How do I add data to FeatureBase tables using Kafka?

These instructions apply to Kafka schemas managed by Confluent Schema Management.

FeatureBase recommends using Confluent Schema Management because it makes it easier to setup Kafka dependencies in a local environment:

  • Schema registry
  • Apache Kafka
  • Apache Zookeeper

The Kafka Confluent ingest process:

  • streams and reads encoded records from a Kafka topic
  • uses Confluent Schema Management to determine the message schema, destination FeatureBase field data types
  • ingests the data into FeatureBase tables.
Table of contents

Before you begin

Avro schema for Kafka messages

{
    "namespace": "<namespace>",
    "type": "record",
    "name": "<name-of-schema>",
    "fields": [
        <Kafka-Avro-data-types>,
    ]
}

Additional information

Storing String, Integer and Decimal fields in Featurebase Tuple Store

For Featurebase string, int and decimal type fields the default Bitmap store backend can be overridden and the new Tuple store backend can be specified as the field’s target storage. In avro field definitions a new storage-specifier attribute can be used to specify the preferred backend for these three types. When no storage specification is provided these types will continute to store in the default Bitmap backend. Refer to the following table for avaialble storage-specifier options and valid use cases.

Storage Specification

Field’s FeatureBase Data Type Avro Field Properties Backend Selection
STRING storage-specifier=”tuplestore”, length=256 This string field will be stored in Tuple store with 256 defined as maximum field length.
STRING mutex=true, quantum=(YMD), storage-specifier=”bitmapstore” This string mutex field will be stored in Bitmap store with time quantum support.
STRING mutex=true, quantum=(YMD), storage-specifier=”tuplestore” This field will be stored in Bitmap store. In this case the storage-specifier=”tuplestore” is invalid and ignored. When either mutex=true or quantum=(YMD) is specified the default bitmap backend will be automatically selected to support the mutex and time quantum features.
INT storage-specifier=”tuplestore” This integer field will be stored in Tuple store.
DECIMAL storage-specifier=”tuplestore”, scale=2 This decimal field will be stored in Tuple store with precision set to 2 decimal points.

Mapping Avro and FeatureBase data types

The ./molecula-consumer-kafka and ./molecula-consumer-kafka-delete CLI commands:

  • Read the Avro schema from the Confluent Schema Registry
  • Infers the FeatureBase data type for each field as specified in the following tables

avro.Array

Avro data type Properties FeatureBase Data type Backend
avro.Array : avro.String   STRINGSET bitmap store
avro.Array : avro.Bytes   STRINGSET bitmap store
avro.Array : avro.Fixed   STRINGSET bitmap store
avro.Array : avro.Enum   STRINGSET bitmap store
avro.Array : avro.String quantum=(YMD) STRINGSETQ bitmap store
avro.Array : avro.Bytes quantum=(YMD) STRINGSETQ bitmap store
avro.Array : avro.Fixed quantum=(YMD) STRINGSETQ bitmap store
avro.Array : avro.Enum quantum=(YMD) STRINGSETQ bitmap store
avro.Array : avro.Long   IDSET bitmap store
avro.Array : avro.Long quantum=(YMD) IDSETQ bitmap store

avro.Bytes

Avro data type Properties FeatureBase Data type Backend
avro.Bytes logicalType=decimal, scale DECIMAL bitmap store
avro.Bytes fieldType=decimal, scale DECIMAL bitmap store
avro.Bytes fieldType=decimal, scale, storage-specifier=”tuplestore” DECIMAL tuple store
avro.Bytes fieldType=recordTime STRINGSETQ and IDSETQ bitmap store
avro.Bytes   STRINGSET bitmap store
avro.Bytes mutex=true STRING bitmap store
avro.Bytes storage-specifier=”tuplestore” STRING tuple store
avro.Bytes quantum=(YMD) STRINGSETQ bitmap store
avro.Bytes quantum=(YMD) STRINGSETQ bitmap store

avro.boolean

Avro data type Properties FeatureBase Data type Backend
avro.Boolean   BOOL bitmap store

avro.Double, avro.Float

Avro data type Properties FeatureBase Data type Backend
avro.Double, avro.Float scale DECIMAL bitmap store
avro.Double, avro.Float scale, storage-specifier=”tuplestore” DECIMAL tuple store

avro.Enum

Avro data type Properties FeatureBase Data type Backend
avro.Enum   STRING bitmap store

avro.Int, avro.Long

Avro data type Properties FeatureBase Data type Backend
avro.Int, avro.Long fieldType=id IDSET bitmap store
avro.Int, avro.Long fieldType=id, mutex=false ID bitmap store
avro.Int, avro.Long fieldType=id, quantum=(YMD) IDSETQ bitmap store
avro.Int, avro.Long fieldType=int, min, max INT bitmap store
avro.Int, avro.Long fieldType=int, min, max, storage-specifier=”tuplestore” INT tuple store

avro.String

Avro data type Properties FeatureBase Data type Backend
avro.String   STRINGSET bitmap store
avro.String mutex=true STRING bitmap store
avro.String quantum=(YMD) STRINGSETQ bitmap store
avro.String storage-specifier=”tuplestore”, length STRING tuple store

avro.Union

Avro data type Properties FeatureBase Data type
avro.Union   Supports one or two members (if two, one must be avro.NULL)

Not supported in FeatureBase

Avro data type Properties FeatureBase Data type
avro.Map   NOT SUPPORTED
avro.Null   NOT SUPPORTED
avro.Record   NOT SUPPORTED
avro.Recursive   NOT SUPPORTED

Kafka Avro data type syntax

Map Avro field data types and property key-value pairs to determine the FeatureBase record data type

BOOL

Kafka Avro fields Description
{"name": "bool_bool", "type": "boolean"} FeatureBase Bool from Avro Boolean

DECIMAL

Kafka Avro fields Description
{"name": "decimal_float", "type": "float", "fieldType": "decimal", "scale": 2} FeatureBase Decimal from Avro Float

ID

Kafka Avro fields Description
{"name": "id_long", "type": "long", "mutex": true, "fieldType": "id"} FeatureBase ID from Avro Long
{"name": "id_int", "type": "int", "mutex": true, "fieldType": "id"} FeatureBase ID from Avro int

IDSET

Kafka Avro fields Description
{"name": "idset_int", "type": "int", "fieldType": "id"} FeatureBase IDSET from Avro Int
{"name": "idset_intarray", "type": {"type": "array", "items": "int"}} FeatureBase IDSET from Avro Int Array

IDSETQ

STRINGSETQ Avro strings require a matching RecordTime field in the Avro Schema. Examples are provided below.

Kafka Avro string Description Required in Avro schema
{"name": "idsetq_int", "type": "int", "fieldType": "id", "quantum": "YMD"} FeatureBase IDSETQ from Avro Int {"name": "recordtime_bytes", "type": "bytes", "fieldType": "recordTime", "layout": "2006-01-02 15:04:05", "unit": "s"}
{"name": "idsetq_intarray", "type": "array", "items": {"type": "int", "quantum": "YMD"}} FeatureBase IDSETQ from Avro Int Array {"name": "recordtime_bytes", "type": "bytes", "fieldType": "recordTime", "layout": "2006-01-02 15:04:05", "unit": "s"}

INT

Kafka Avro fields Description
{"name": "int_int", "type": "int", "fieldType": "int"} FeatureBase Int from Avro Int

Strings

Kafka Avro fields Description Additional
{"name": "string_string", "type": "string", "mutex": true } FeatureBase String from Avro String  
{"name": "string_bytes", "type": "bytes" , "mutex": true } FeatureBase String from Avro Bytes  
{"name": "string_enum", "type": "enum"} FeatureBase String from Avro Enum  
{"name": "string_string", "type": ["string", "null"], "mutex": true } Optional String Ignore missing fields
{"name": "stringset_stringarray", "type": [{"type": "array", "items": "string"}, "null"]} Optional Array of Strings Ignore missing fields

STRINGSET

Kafka Avro string Description
{"name": "stringset_string", "type": "string"} FeatureBase StringSet from Avro String
{"name": "stringset_bytes", "type": "bytes"} FeatureBase StringSet from Avro Bytes
{"name": "stringset_stringarray", "type": {"type": "array", "items": "string"}} FeatureBase StringSet from Avro String Array

STRINGSETQ

STRINGSETQ Avro strings require a matching RecordTime field in the Avro Schema. Examples are provided below.

Kafka Avro string Description Required in Avro schema
{“name”: “stringsetq_string”, “type”: “string”, “quantum”: “YMD”} FeatureBase StringSetQ with Day Granularity from Avro String {"name": "recordtime_bytes", "type": "bytes", "fieldType": "recordTime", "layout": "2006-01-02 15:04:05", "unit": "s"}
{“name”: “stringsetq_stringarray”, “type”: “array”, “items”: {“type”: “string”, “quantum”: “YMD”}} FeatureBase StringSetQ with Day Granularity from Avro String Array {"name": "recordtime_bytes", "type": "bytes", "fieldType": "recordTime", "layout": "2006-01-02 15:04:05", "unit": "s"}

TIMESTAMP

Kafka Avro string Description Additional
{"name": "timestamp_bytes_ts", "type": "bytes", "fieldType": "timestamp", "layout": "2006-01-02 15:04:05", "epoch": "1970-01-01 00:00:00"} FeatureBase Timestamp from Avro Bytes Expects byte representation of string timestamp
{"name": "timestamp_bytes_int", "type": ["bytes", "null"], "fieldType": "timestamp", "unit": "s", "layout": "2006-01-02 15:04:05", "epoch": "1970-01-01 00:00:00"} FeatureBase Timestamp from Avro Int  

Examples

Next step

Further information