How do I add data to FeatureBase tables using Kafka? These instructions apply to Kafka schemas managed by Confluent Schema Management.
FeatureBase recommends using Confluent Schema Management because it makes it easier to set up the Kafka dependencies in a local environment:
* Schema Registry
* Apache Kafka
* Apache Zookeeper

The Kafka Confluent ingest process:

* streams and reads encoded records from a Kafka topic
* uses Confluent Schema Management to determine the message schema and the destination FeatureBase field data types
* ingests the data into FeatureBase tables

## Avro schema for Kafka messages

```json
{
  "namespace": "<namespace>",
  "type": "record",
  "name": "<name-of-schema>",
  "fields": [
    <Kafka-Avro-data-types>
  ]
}
```
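For example, a filled-in schema with one string field and one integer field might look like the following sketch. The namespace, schema name, and field names are hypothetical; the field properties follow the data type tables later on this page:

```json
{
  "namespace": "com.example.events",
  "type": "record",
  "name": "user_event",
  "fields": [
    {"name": "event_type", "type": "string", "mutex": true},
    {"name": "event_count", "type": "int", "fieldType": "int"}
  ]
}
```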
## Storing String, Integer and Decimal fields in FeatureBase Tuple store

For FeatureBase `string`, `int`, and `decimal` type fields, the default Bitmap store backend can be overridden and the Tuple store backend specified as the field's target storage. In Avro field definitions, a `storage-specifier` attribute can be used to specify the preferred backend for these three types. When no storage specification is provided, these types continue to be stored in the default Bitmap backend. Refer to the following table for available `storage-specifier` options and valid use cases.
### Storage Specification

| Field's FeatureBase Data Type | Avro Field Properties | Backend Selection |
|---|---|---|
| STRING | storage-specifier="tuplestore", length=256 | This string field will be stored in the Tuple store with 256 defined as the maximum field length. |
| STRING | mutex=true, quantum=(YMD), storage-specifier="bitmapstore" | This string mutex field will be stored in the Bitmap store with time quantum support. |
| STRING | mutex=true, quantum=(YMD), storage-specifier="tuplestore" | This field will be stored in the Bitmap store. In this case storage-specifier="tuplestore" is invalid and ignored. When either mutex=true or quantum=(YMD) is specified, the default Bitmap backend is automatically selected to support the mutex and time quantum features. |
| INT | storage-specifier="tuplestore" | This integer field will be stored in the Tuple store. |
| DECIMAL | storage-specifier="tuplestore", scale=2 | This decimal field will be stored in the Tuple store with precision set to 2 decimal points. |
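Assuming the `storage-specifier` attribute is written alongside the other field properties in the Avro field definition (as `mutex`, `quantum`, and `scale` are in the examples later on this page), entries in the schema's `fields` array for the three supported types might look like this sketch; the field names and values are hypothetical:

```json
[
  {"name": "account_status", "type": "string", "storage-specifier": "tuplestore", "length": 256},
  {"name": "account_balance", "type": "int", "fieldType": "int", "storage-specifier": "tuplestore"},
  {"name": "unit_price", "type": "float", "fieldType": "decimal", "scale": 2, "storage-specifier": "tuplestore"}
]
```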
## Mapping Avro and FeatureBase data types

The `./molecula-consumer-kafka` and `./molecula-consumer-kafka-delete` CLI commands:
* Read the Avro schema from the Confluent Schema Registry
* Infer the FeatureBase data type for each field as specified in the following tables

### avro.Array

| Avro data type | Properties | FeatureBase Data type | Backend |
|---|---|---|---|
| avro.Array : avro.String | | STRINGSET | bitmap store |
| avro.Array : avro.Bytes | | STRINGSET | bitmap store |
| avro.Array : avro.Fixed | | STRINGSET | bitmap store |
| avro.Array : avro.Enum | | STRINGSET | bitmap store |
| avro.Array : avro.String | quantum=(YMD) | STRINGSETQ | bitmap store |
| avro.Array : avro.Bytes | quantum=(YMD) | STRINGSETQ | bitmap store |
| avro.Array : avro.Fixed | quantum=(YMD) | STRINGSETQ | bitmap store |
| avro.Array : avro.Enum | quantum=(YMD) | STRINGSETQ | bitmap store |
| avro.Array : avro.Long | | IDSET | bitmap store |
| avro.Array : avro.Long | quantum=(YMD) | IDSETQ | bitmap store |
### avro.Bytes

| Avro data type | Properties | FeatureBase Data type | Backend |
|---|---|---|---|
| avro.Bytes | logicalType=decimal, scale | DECIMAL | bitmap store |
| avro.Bytes | fieldType=decimal, scale | DECIMAL | bitmap store |
| avro.Bytes | fieldType=decimal, scale, storage-specifier="tuplestore" | DECIMAL | tuple store |
| avro.Bytes | fieldType=recordTime | STRINGSETQ and IDSETQ | bitmap store |
| avro.Bytes | | STRINGSET | bitmap store |
| avro.Bytes | mutex=true | STRING | bitmap store |
| avro.Bytes | storage-specifier="tuplestore" | STRING | tuple store |
| avro.Bytes | quantum=(YMD) | STRINGSETQ | bitmap store |
### avro.Boolean

| Avro data type | Properties | FeatureBase Data type | Backend |
|---|---|---|---|
| avro.Boolean | | BOOL | bitmap store |
### avro.Double, avro.Float

| Avro data type | Properties | FeatureBase Data type | Backend |
|---|---|---|---|
| avro.Double, avro.Float | scale | DECIMAL | bitmap store |
| avro.Double, avro.Float | scale, storage-specifier="tuplestore" | DECIMAL | tuple store |
### avro.Enum

| Avro data type | Properties | FeatureBase Data type | Backend |
|---|---|---|---|
| avro.Enum | | STRING | bitmap store |
### avro.Int, avro.Long

| Avro data type | Properties | FeatureBase Data type | Backend |
|---|---|---|---|
| avro.Int, avro.Long | fieldType=id | IDSET | bitmap store |
| avro.Int, avro.Long | fieldType=id, mutex=false | ID | bitmap store |
| avro.Int, avro.Long | fieldType=id, quantum=(YMD) | IDSETQ | bitmap store |
| avro.Int, avro.Long | fieldType=int, min, max | INT | bitmap store |
| avro.Int, avro.Long | fieldType=int, min, max, storage-specifier="tuplestore" | INT | tuple store |
### avro.String

| Avro data type | Properties | FeatureBase Data type | Backend |
|---|---|---|---|
| avro.String | | STRINGSET | bitmap store |
| avro.String | mutex=true | STRING | bitmap store |
| avro.String | quantum=(YMD) | STRINGSETQ | bitmap store |
| avro.String | storage-specifier="tuplestore", length | STRING | tuple store |
### avro.Union

| Avro data type | Properties | FeatureBase Data type |
|---|---|---|
| avro.Union | Supports one or two members (if two, one must be avro.NULL) | |
### Not supported in FeatureBase

| Avro data type | Properties | FeatureBase Data type |
|---|---|---|
| avro.Map | | NOT SUPPORTED |
| avro.Null | | NOT SUPPORTED |
| avro.Record | | NOT SUPPORTED |
| avro.Recursive | | NOT SUPPORTED |
## Kafka Avro data type syntax

Map Avro field data types and property key-value pairs to determine the FeatureBase record data type.
### BOOL

| Kafka Avro fields | Description |
|---|---|
| `{"name": "bool_bool", "type": "boolean"}` | FeatureBase Bool from Avro Boolean |
### DECIMAL

| Kafka Avro fields | Description |
|---|---|
| `{"name": "decimal_float", "type": "float", "fieldType": "decimal", "scale": 2}` | FeatureBase Decimal from Avro Float |
### ID

| Kafka Avro fields | Description |
|---|---|
| `{"name": "id_long", "type": "long", "mutex": true, "fieldType": "id"}` | FeatureBase ID from Avro Long |
| `{"name": "id_int", "type": "int", "mutex": true, "fieldType": "id"}` | FeatureBase ID from Avro Int |
### IDSET

| Kafka Avro fields | Description |
|---|---|
| `{"name": "idset_int", "type": "int", "fieldType": "id"}` | FeatureBase IDSET from Avro Int |
| `{"name": "idset_intarray", "type": {"type": "array", "items": "int"}}` | FeatureBase IDSET from Avro Int Array |
### IDSETQ

IDSETQ Avro fields require a matching `recordTime` field in the Avro schema. Examples are provided below.

| Kafka Avro string | Description | Required in Avro schema |
|---|---|---|
| `{"name": "idsetq_int", "type": "int", "fieldType": "id", "quantum": "YMD"}` | FeatureBase IDSETQ from Avro Int | `{"name": "recordtime_bytes", "type": "bytes", "fieldType": "recordTime", "layout": "2006-01-02 15:04:05", "unit": "s"}` |
| `{"name": "idsetq_intarray", "type": "array", "items": {"type": "int", "quantum": "YMD"}}` | FeatureBase IDSETQ from Avro Int Array | `{"name": "recordtime_bytes", "type": "bytes", "fieldType": "recordTime", "layout": "2006-01-02 15:04:05", "unit": "s"}` |
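Putting the two together, an IDSETQ field and its required `recordTime` field would both appear in the schema's `fields` array, as in this sketch built from the table above:

```json
"fields": [
  {"name": "idsetq_int", "type": "int", "fieldType": "id", "quantum": "YMD"},
  {"name": "recordtime_bytes", "type": "bytes", "fieldType": "recordTime", "layout": "2006-01-02 15:04:05", "unit": "s"}
]
```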
### INT

| Kafka Avro fields | Description |
|---|---|
| `{"name": "int_int", "type": "int", "fieldType": "int"}` | FeatureBase Int from Avro Int |
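The avro.Int, avro.Long mapping table earlier on this page also lists `min` and `max` properties for INT fields. Assuming they are set as field attributes like the other properties, a bounded integer field might be declared as follows; the field name and bounds are illustrative:

```json
{"name": "age", "type": "int", "fieldType": "int", "min": 0, "max": 150}
```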
### Strings

| Kafka Avro fields | Description | Additional |
|---|---|---|
| `{"name": "string_string", "type": "string", "mutex": true }` | FeatureBase String from Avro String | |
| `{"name": "string_bytes", "type": "bytes" , "mutex": true }` | FeatureBase String from Avro Bytes | |
| `{"name": "string_enum", "type": "enum"}` | FeatureBase String from Avro Enum | |
| `{"name": "string_string", "type": ["string", "null"], "mutex": true }` | Optional String | Ignore missing fields |
| `{"name": "stringset_stringarray", "type": [{"type": "array", "items": "string"}, "null"]}` | Optional Array of Strings | Ignore missing fields |
### STRINGSET

| Kafka Avro string | Description |
|---|---|
| `{"name": "stringset_string", "type": "string"}` | FeatureBase StringSet from Avro String |
| `{"name": "stringset_bytes", "type": "bytes"}` | FeatureBase StringSet from Avro Bytes |
| `{"name": "stringset_stringarray", "type": {"type": "array", "items": "string"}}` | FeatureBase StringSet from Avro String Array |
### STRINGSETQ

STRINGSETQ Avro strings require a matching `recordTime` field in the Avro schema. Examples are provided below.

| Kafka Avro string | Description | Required in Avro schema |
|---|---|---|
| `{"name": "stringsetq_string", "type": "string", "quantum": "YMD"}` | FeatureBase StringSetQ with Day Granularity from Avro String | `{"name": "recordtime_bytes", "type": "bytes", "fieldType": "recordTime", "layout": "2006-01-02 15:04:05", "unit": "s"}` |
| `{"name": "stringsetq_stringarray", "type": "array", "items": {"type": "string", "quantum": "YMD"}}` | FeatureBase StringSetQ with Day Granularity from Avro String Array | `{"name": "recordtime_bytes", "type": "bytes", "fieldType": "recordTime", "layout": "2006-01-02 15:04:05", "unit": "s"}` |
### TIMESTAMP

| Kafka Avro string | Description | Additional |
|---|---|---|
| `{"name": "timestamp_bytes_ts", "type": "bytes", "fieldType": "timestamp", "layout": "2006-01-02 15:04:05", "epoch": "1970-01-01 00:00:00"}` | FeatureBase Timestamp from Avro Bytes | Expects byte representation of string timestamp |
| `{"name": "timestamp_bytes_int", "type": ["bytes", "null"], "fieldType": "timestamp", "unit": "s", "layout": "2006-01-02 15:04:05", "epoch": "1970-01-01 00:00:00"}` | FeatureBase Timestamp from Avro Int | |
## Examples

## Next step