# Table Engines
Each model must have an engine instance, used when creating the table in ClickHouse.
The following engines are supported by the ORM:
- TinyLog
- Log
- Memory
- MergeTree / ReplicatedMergeTree
- CollapsingMergeTree / ReplicatedCollapsingMergeTree
- SummingMergeTree / ReplicatedSummingMergeTree
- ReplacingMergeTree / ReplicatedReplacingMergeTree
- Buffer
- Merge
- Distributed
## Simple Engines

`TinyLog`, `Log` and `Memory` engines do not require any parameters:

```python
engine = engines.TinyLog()
engine = engines.Log()
engine = engines.Memory()
```
## Engines in the MergeTree Family

To define a `MergeTree` engine, supply the date column name and the names (or expressions) of the key columns:

```python
engine = engines.MergeTree('EventDate', ('CounterID', 'EventDate'))
```
You may also provide a sampling expression:

```python
engine = engines.MergeTree('EventDate', ('CounterID', 'EventDate'), sampling_expr='intHash32(UserID)')
```
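To show how these positional arguments line up with ClickHouse's own syntax, the sketch below renders the legacy (pre-custom-partitioning) `MergeTree(...)` engine clause. The helper `merge_tree_clause` is hypothetical and not part of the ORM, and the index granularity of 8192 is an assumed default; the ORM may format the clause differently.

```python
def merge_tree_clause(date_col, key_cols, sampling_expr=None, index_granularity=8192):
    """Render a legacy-syntax MergeTree engine clause (hypothetical helper, not the ORM's code)."""
    parts = [date_col]
    if sampling_expr:
        # In the legacy syntax the sampling expression follows the date column
        parts.append(sampling_expr)
    # The key columns (or expressions) form a tuple
    parts.append('(' + ', '.join(key_cols) + ')')
    parts.append(str(index_granularity))
    return 'MergeTree(' + ', '.join(parts) + ')'


print(merge_tree_clause('EventDate', ('CounterID', 'EventDate')))
# MergeTree(EventDate, (CounterID, EventDate), 8192)
print(merge_tree_clause('EventDate', ('CounterID', 'EventDate'), sampling_expr='intHash32(UserID)'))
# MergeTree(EventDate, intHash32(UserID), (CounterID, EventDate), 8192)
```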
A `CollapsingMergeTree` engine is defined in a similar manner, but also requires a sign column:

```python
engine = engines.CollapsingMergeTree('EventDate', ('CounterID', 'EventDate'), 'Sign')
```
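Each row carries a `Sign` of `1` (a state) or `-1` (cancelling a previously written state with the same key). Background merges collapse such pairs at an unpredictable time, so queries usually aggregate with the sign as a weight, e.g. `sum(PageViews * Sign) ... HAVING sum(Sign) > 0`. The plain-Python sketch below mimics that query; the `PageViews` column, the helper name and the sample rows are hypothetical.

```python
from collections import defaultdict


def collapsed_totals(rows, key_cols, value_col, sign_col='Sign'):
    """Sign-weighted aggregation that mirrors the net effect of CollapsingMergeTree."""
    totals = defaultdict(int)
    net_sign = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        totals[key] += row[value_col] * row[sign_col]
        net_sign[key] += row[sign_col]
    # Keys whose states were all cancelled have a net sign of 0 and are dropped
    return {key: total for key, total in totals.items() if net_sign[key] > 0}


rows = [
    {'CounterID': 1, 'EventDate': '2018-01-01', 'PageViews': 5, 'Sign': 1},
    {'CounterID': 1, 'EventDate': '2018-01-01', 'PageViews': 5, 'Sign': -1},  # cancels the old state
    {'CounterID': 1, 'EventDate': '2018-01-01', 'PageViews': 6, 'Sign': 1},   # new state
    {'CounterID': 2, 'EventDate': '2018-01-01', 'PageViews': 3, 'Sign': 1},
    {'CounterID': 2, 'EventDate': '2018-01-01', 'PageViews': 3, 'Sign': -1},  # deleted entirely
]
print(collapsed_totals(rows, ('CounterID', 'EventDate'), 'PageViews'))
# {(1, '2018-01-01'): 6}
```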
For a `SummingMergeTree` you can optionally specify the summing columns:

```python
engine = engines.SummingMergeTree('EventDate', ('OrderID', 'EventDate', 'BannerID'),
                                  summing_cols=('Shows', 'Clicks', 'Cost'))
```
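During background merges, rows that share the same primary key are combined into one row in which the summing columns hold the totals. The sketch below reproduces that in plain Python; the helper name and sample data are hypothetical. It keeps the first row's value for the remaining columns, where ClickHouse may pick an arbitrary one, and it omits ClickHouse's removal of rows whose summed columns are all zero.

```python
def summing_merge(rows, key_cols, summing_cols):
    """Combine rows sharing a primary key, summing the given columns (sketch)."""
    merged = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in merged:
            merged[key] = dict(row)  # copy, so the input rows are not modified
        else:
            for col in summing_cols:
                merged[key][col] += row[col]
    return list(merged.values())


rows = [
    {'OrderID': 1, 'EventDate': '2018-01-01', 'BannerID': 7, 'Shows': 10, 'Clicks': 1, 'Cost': 0.5},
    {'OrderID': 1, 'EventDate': '2018-01-01', 'BannerID': 7, 'Shows': 5, 'Clicks': 2, 'Cost': 0.25},
    {'OrderID': 2, 'EventDate': '2018-01-01', 'BannerID': 7, 'Shows': 3, 'Clicks': 0, 'Cost': 0.0},
]
merged = summing_merge(rows, ('OrderID', 'EventDate', 'BannerID'), ('Shows', 'Clicks', 'Cost'))
```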
For a `ReplacingMergeTree` you can optionally specify the version column:

```python
engine = engines.ReplacingMergeTree('EventDate', ('OrderID', 'EventDate', 'BannerID'), ver_col='Version')
```
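`ReplacingMergeTree` removes duplicate rows with the same sorting key during background merges, keeping the row with the highest version when a version column is given and the last inserted row otherwise. Because merges run at an unspecified time, queries may still see duplicates before then. A plain-Python sketch of the rule follows; the helper and data are hypothetical, and resolving version ties in favour of the later row is an assumption of the sketch.

```python
def replacing_merge(rows, key_cols, ver_col=None):
    """Keep one row per key: max version if ver_col is set, else the last one (sketch)."""
    kept = {}
    for row in rows:  # rows are assumed to be in insertion order
        key = tuple(row[c] for c in key_cols)
        current = kept.get(key)
        # '>=' lets a later row win a tie on the version
        if current is None or ver_col is None or row[ver_col] >= current[ver_col]:
            kept[key] = row
    return list(kept.values())


rows = [
    {'OrderID': 1, 'EventDate': '2018-01-01', 'BannerID': 7, 'Cost': 1.0, 'Version': 1},
    {'OrderID': 1, 'EventDate': '2018-01-01', 'BannerID': 7, 'Cost': 3.0, 'Version': 3},
    {'OrderID': 1, 'EventDate': '2018-01-01', 'BannerID': 7, 'Cost': 2.0, 'Version': 2},
]
key = ('OrderID', 'EventDate', 'BannerID')
print(replacing_merge(rows, key, ver_col='Version')[0]['Cost'])  # 3.0
print(replacing_merge(rows, key)[0]['Cost'])                     # 2.0
```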
### Custom partitioning

ClickHouse supports custom partitioning expressions since version 1.1.54310. You can use custom partitioning with any engine in the MergeTree family. To set custom partitioning:

- skip the `date_col` (first) constructor parameter, or set it to `None`
- pass the key columns as the `order_by` (second) constructor parameter
- add the `partition_key` parameter: a tuple of expressions by which partitions are built

The standard monthly partitioning by a date column can be reproduced with the `toYYYYMM(date)` function. Example:

```python
engine = engines.ReplacingMergeTree(order_by=('OrderID', 'EventDate', 'BannerID'), ver_col='Version',
                                    partition_key=('toYYYYMM(EventDate)', 'BannerID'))
```
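Each row lands in the partition identified by the values of the `partition_key` expressions. For the example above, `toYYYYMM(EventDate)` yields `year * 100 + month`, so partitions are keyed by month and banner. A plain-Python sketch of that grouping (the helper names and rows are hypothetical) also shows why a high-cardinality column such as `BannerID` in the partition key can produce many small partitions:

```python
from datetime import date


def to_yyyymm(d):
    """Python equivalent of ClickHouse's toYYYYMM() for a date."""
    return d.year * 100 + d.month


def partition_of(row):
    # Mirrors partition_key=('toYYYYMM(EventDate)', 'BannerID')
    return (to_yyyymm(row['EventDate']), row['BannerID'])


rows = [
    {'EventDate': date(2018, 3, 15), 'BannerID': 7},
    {'EventDate': date(2018, 3, 31), 'BannerID': 7},
    {'EventDate': date(2018, 4, 1), 'BannerID': 7},
    {'EventDate': date(2018, 3, 15), 'BannerID': 8},
]
print(sorted({partition_of(r) for r in rows}))
# [(201803, 7), (201803, 8), (201804, 7)]
```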
### Data Replication

Any of the above engines can be converted to a replicated engine (e.g. `ReplicatedMergeTree`) by adding two parameters, `replica_table_path` and `replica_name`:

```python
engine = engines.MergeTree('EventDate', ('CounterID', 'EventDate'),
                           replica_table_path='/clickhouse/tables/{layer}-{shard}/hits',
                           replica_name='{replica}')
```
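`{layer}`, `{shard}` and `{replica}` are ClickHouse macros, expanded on each server from the `macros` section of its configuration. Replicas of the same shard therefore share one ZooKeeper path while each registers under its own replica name. The sketch below imitates that substitution; the macro values are hypothetical, and `expand_macros` is neither an ORM nor a ClickHouse function.

```python
import re


def expand_macros(template, macros):
    """Replace {name} placeholders with per-server macro values (sketch)."""
    def substitute(match):
        name = match.group(1)
        if name not in macros:
            raise KeyError(f'macro {name!r} is not defined')
        return macros[name]
    return re.sub(r'\{(\w+)\}', substitute, template)


# Hypothetical macros of one server
macros = {'layer': '05', 'shard': '02', 'replica': 'replica-1'}
print(expand_macros('/clickhouse/tables/{layer}-{shard}/hits', macros))  # /clickhouse/tables/05-02/hits
print(expand_macros('{replica}', macros))                                # replica-1
```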
## Buffer Engine

A `Buffer` engine is only used in conjunction with a `BufferModel`. The model should be a subclass of both `models.BufferModel` and the main model. The main model is also passed to the engine:

```python
from infi.clickhouse_orm import engines, models

class PersonBuffer(models.BufferModel, Person):  # Person is the main model
    engine = engines.Buffer(Person)
```
Additional buffer parameters can optionally be specified:

```python
engine = engines.Buffer(Person, num_layers=16, min_time=10,
                        max_time=100, min_rows=10000, max_rows=1000000,
                        min_bytes=10000000, max_bytes=100000000)
```
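Per the ClickHouse documentation, a buffer is flushed to the destination table when all of the `min_*` conditions are met or at least one `max_*` condition is met; times are in seconds since the first write into the buffer, and each of the `num_layers` buffers is checked independently. A sketch of that rule with the parameter values above follows. The `BufferThresholds` class and `should_flush` function are hypothetical, and the exact boundary comparisons (`>=`) are an assumption.

```python
from dataclasses import dataclass


@dataclass
class BufferThresholds:
    # Values taken from the Buffer example above
    min_time: int = 10
    max_time: int = 100
    min_rows: int = 10000
    max_rows: int = 1000000
    min_bytes: int = 10000000
    max_bytes: int = 100000000


def should_flush(elapsed_s, rows, nbytes, t):
    """Flush one buffer layer if all min_* or any max_* conditions hold (sketch)."""
    all_min = elapsed_s >= t.min_time and rows >= t.min_rows and nbytes >= t.min_bytes
    any_max = elapsed_s >= t.max_time or rows >= t.max_rows or nbytes >= t.max_bytes
    return all_min or any_max


t = BufferThresholds()
print(should_flush(5, 20000, 20000000, t))   # False: min_time not reached, no max hit
print(should_flush(15, 20000, 20000000, t))  # True: every min condition holds
print(should_flush(1, 1000000, 100, t))      # True: max_rows reached
```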
You can then create the buffer table and insert objects into it; ClickHouse keeps the rows in memory and flushes them to the main model's table:

```python
db.create_table(PersonBuffer)  # db is a Database instance
suzy = PersonBuffer(first_name='Suzy', last_name='Jones')
dan = PersonBuffer(first_name='Dan', last_name='Schwartz')
db.insert([dan, suzy])
```
## Merge Engine

See the ClickHouse docs for details.

A `Merge` engine is only used in conjunction with a `MergeModel`. Such a table does not store data itself, but allows reading from any number of other tables simultaneously, so you cannot insert into it. The engine parameter is an re2 (similar to PCRE) regular expression that selects the tables to read from:

```python
from infi.clickhouse_orm import engines, models

class MergeTable(models.MergeModel):
    engine = engines.Merge('^table_prefix')
```
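The regular expression is matched against table names in the database. The sketch below lists which tables a `Merge` table would read, using Python's `re` as a stand-in for re2 (the two agree on simple patterns like this one, though re2 lacks features such as backreferences). It assumes the pattern may match anywhere in a name, which is why the example anchors it with `^`, and, following the ClickHouse documentation, it skips the `Merge` table itself to avoid loops. The function and table names are hypothetical.

```python
import re


def tables_read_by_merge(merge_table, pattern, tables):
    """Return the tables a Merge table would read from (sketch)."""
    regex = re.compile(pattern)
    # search() is unanchored, so '^' is needed for a prefix match;
    # the Merge table itself is never selected
    return [t for t in tables if t != merge_table and regex.search(t)]


tables = ['table_prefix_2018', 'table_prefix_2019', 'my_table_prefix', 'table_prefix_merge']
print(tables_read_by_merge('table_prefix_merge', '^table_prefix', tables))
# ['table_prefix_2018', 'table_prefix_2019']
```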