Table Engines
=============
See: [ClickHouse Documentation](https://clickhouse.tech/docs/en/engines/table-engines/)

Each model must have an engine instance, used when creating the table in ClickHouse.
The following engines are supported by the ORM:
- TinyLog
- Log
- Memory
- MergeTree / ReplicatedMergeTree
- CollapsingMergeTree / ReplicatedCollapsingMergeTree
- SummingMergeTree / ReplicatedSummingMergeTree
- ReplacingMergeTree / ReplicatedReplacingMergeTree
- Buffer
- Merge
- Distributed

Simple Engines
--------------
`TinyLog`, `Log` and `Memory` engines do not require any parameters:

    engine = TinyLog()

    engine = Log()

    engine = Memory()

Engines in the MergeTree Family
-------------------------------
To define a `MergeTree` engine, supply the name of the date column and the names (or expressions) of the key columns:

    engine = MergeTree('EventDate', ('CounterID', 'EventDate'))

You may also provide a sampling expression:

    engine = MergeTree('EventDate', ('CounterID', 'EventDate'), sampling_expr=F.intHash32(UserID))

A `CollapsingMergeTree` engine is defined in a similar manner, but also requires a sign column:

    engine = CollapsingMergeTree('EventDate', ('CounterID', 'EventDate'), 'Sign')

For a `SummingMergeTree` you can optionally specify the summing columns:

    engine = SummingMergeTree('EventDate', ('OrderID', 'EventDate', 'BannerID'),
                              summing_cols=('Shows', 'Clicks', 'Cost'))

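To make the role of the summing columns concrete, here is a minimal pure-Python sketch of the idea: rows that share the same key are collapsed into one row, and the listed columns are added up. It is only a conceptual model. The helper `summing_merge` and the sample rows are hypothetical and not part of the ORM, and ClickHouse itself performs the summation during background merges, so unmerged rows can coexist at any given moment.

```python
def summing_merge(rows, key_cols, summing_cols):
    """Collapse rows that share a key, summing the listed numeric columns."""
    merged = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in merged:
            # First row seen for this key: copy it so the input is not mutated.
            merged[key] = dict(row)
        else:
            for col in summing_cols:
                merged[key][col] += row[col]
    return list(merged.values())


# Hypothetical sample data using column names from the example above.
rows = [
    {'OrderID': 1, 'BannerID': 7, 'Shows': 10, 'Clicks': 1},
    {'OrderID': 1, 'BannerID': 7, 'Shows': 5, 'Clicks': 2},
    {'OrderID': 2, 'BannerID': 7, 'Shows': 3, 'Clicks': 0},
]
result = summing_merge(rows, key_cols=('OrderID', 'BannerID'),
                       summing_cols=('Shows', 'Clicks'))
```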
For a `ReplacingMergeTree` you can optionally specify the version column:

    engine = ReplacingMergeTree('EventDate', ('OrderID', 'EventDate', 'BannerID'), ver_col='Version')

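The effect of the version column can be illustrated with a small pure-Python sketch. It assumes the usual ClickHouse behaviour: among rows with the same key, the row with the highest version survives, and without a version column the most recently inserted row wins. The helper `replacing_merge` and the sample rows are hypothetical, and real deduplication only happens during background merges.

```python
def replacing_merge(rows, key_cols, ver_col=None):
    """Keep one row per key: the highest version, or the last row if no version column."""
    latest = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        current = latest.get(key)
        # Without a version column the later row always replaces the earlier one.
        # With a version column, ties are resolved in favour of the later row.
        if current is None or ver_col is None or row[ver_col] >= current[ver_col]:
            latest[key] = row
    return list(latest.values())


# Hypothetical sample data, listed in insertion order.
rows = [
    {'OrderID': 1, 'Version': 1, 'Cost': 10},
    {'OrderID': 1, 'Version': 3, 'Cost': 30},
    {'OrderID': 1, 'Version': 2, 'Cost': 20},
]
by_version = replacing_merge(rows, key_cols=('OrderID',), ver_col='Version')
by_order = replacing_merge(rows, key_cols=('OrderID',))
```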
### Custom partitioning
ClickHouse supports [custom partitioning](https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/custom-partitioning-key/) expressions since version 1.1.54310.

You can use custom partitioning with any `MergeTree` family engine.
To set custom partitioning:

* Instead of specifying the `date_col` (first) constructor parameter, pass a tuple of field names or expressions in the `order_by` (second) constructor parameter.
* Add the `partition_key` parameter. It should be a tuple of expressions by which partitions are built.

Standard monthly partitioning by a date column can be specified using the `toYYYYMM(date)` function.

Example:

    engine = ReplacingMergeTree(order_by=('OrderID', 'EventDate', 'BannerID'), ver_col='Version',
                                partition_key=(F.toYYYYMM(EventDate), 'BannerID'))

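As a rough illustration of what such a partition key evaluates to, the sketch below mimics `toYYYYMM` in plain Python and shows which rows would share a partition. The helpers `to_yyyymm` and `partition_of` and the sample values are hypothetical; the real expressions are evaluated by ClickHouse, not by the ORM.

```python
from datetime import date


def to_yyyymm(d):
    """Mimic ClickHouse's toYYYYMM: 2020-05-28 becomes the integer 202005."""
    return d.year * 100 + d.month


def partition_of(event_date, banner_id):
    """Partition value for the key (toYYYYMM(EventDate), BannerID)."""
    return (to_yyyymm(event_date), banner_id)


# Rows from the same month with the same banner land in the same partition.
p1 = partition_of(date(2020, 5, 1), 7)
p2 = partition_of(date(2020, 5, 28), 7)
p3 = partition_of(date(2020, 6, 1), 7)
```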
### Primary key
ClickHouse supports [custom primary key](https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/mergetree/#primary-keys-and-indexes-in-queries) expressions since version 1.1.54310.

You can use a custom primary key with any `MergeTree` family engine.
To set a custom primary key, add the `primary_key` parameter. It should be a tuple of expressions by which the primary index is built.
By default, the primary key is equal to the `order_by` expression.

Example:

    engine = ReplacingMergeTree(order_by=('OrderID', 'EventDate', 'BannerID'), ver_col='Version',
                                partition_key=(F.toYYYYMM(EventDate), 'BannerID'), primary_key=('OrderID',))

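ClickHouse expects the primary key to be a prefix of the sorting key, which is why the example above uses `('OrderID',)` with an `order_by` that starts with `'OrderID'`. The following sketch shows one way to check this before defining a model. The helper `is_key_prefix` is hypothetical and not provided by the ORM.

```python
def is_key_prefix(primary_key, order_by):
    """Return True if primary_key equals the leading expressions of order_by."""
    return tuple(order_by[:len(primary_key)]) == tuple(primary_key)


order_by = ('OrderID', 'EventDate', 'BannerID')
ok = is_key_prefix(('OrderID',), order_by)    # leading column: acceptable
bad = is_key_prefix(('BannerID',), order_by)  # not a prefix: would be rejected
```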
### Data Replication
Any of the above engines can be converted to a replicated engine (e.g. `ReplicatedMergeTree`) by adding two parameters, `replica_table_path` and `replica_name`:

    engine = MergeTree('EventDate', ('CounterID', 'EventDate'),
                       replica_table_path='/clickhouse/tables/{layer}-{shard}/hits',
                       replica_name='{replica}')

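The curly-brace placeholders in the path and replica name are ClickHouse macros. The server substitutes them from the `macros` section of its own configuration, so the strings are presumably sent as written. The sketch below only imitates that substitution in Python to show what the resulting values could look like; the macro values are hypothetical.

```python
# Hypothetical macro values, as they might appear in one server's configuration.
macros = {'layer': '01', 'shard': '02', 'replica': 'replica-1'}

# Imitate the server-side substitution with str.format.
table_path = '/clickhouse/tables/{layer}-{shard}/hits'.format(**macros)
replica_name = '{replica}'.format(**macros)
```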
Buffer Engine
-------------
A `Buffer` engine is only used in conjunction with a `BufferModel`.
The model should be a subclass of both `BufferModel` and the main model.
The main model is also passed to the engine:

    class PersonBuffer(BufferModel, Person):

        engine = Buffer(Person)

Additional buffer parameters can optionally be specified:

    engine = Buffer(Person, num_layers=16, min_time=10,
                    max_time=100, min_rows=10000, max_rows=1000000,
                    min_bytes=10000000, max_bytes=100000000)

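How these thresholds interact is easier to see in code. According to the ClickHouse documentation, a buffer is flushed to the main table when all of the `min_*` conditions are met, or when at least one of the `max_*` conditions is met. The sketch below models that rule; the function `should_flush` is hypothetical, its defaults simply repeat the values from the example above, and the exact boundary comparisons are an assumption.

```python
def should_flush(age_seconds, rows, size_bytes,
                 min_time=10, max_time=100,
                 min_rows=10000, max_rows=1000000,
                 min_bytes=10000000, max_bytes=100000000):
    """Flush when ALL min thresholds are reached or ANY max threshold is reached."""
    all_min = (age_seconds >= min_time
               and rows >= min_rows
               and size_bytes >= min_bytes)
    any_max = (age_seconds >= max_time
               or rows >= max_rows
               or size_bytes >= max_bytes)
    return all_min or any_max


too_early = should_flush(age_seconds=5, rows=100, size_bytes=1000)
timed_out = should_flush(age_seconds=150, rows=100, size_bytes=1000)
all_mins_met = should_flush(age_seconds=20, rows=20000, size_bytes=20000000)
one_min_missing = should_flush(age_seconds=20, rows=20000, size_bytes=100)
```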
You can then insert objects into the buffer model, and ClickHouse will handle them properly:

    db.create_table(PersonBuffer)
    suzy = PersonBuffer(first_name='Suzy', last_name='Jones')
    dan = PersonBuffer(first_name='Dan', last_name='Schwartz')
    db.insert([dan, suzy])

Merge Engine
-------------
[ClickHouse docs](https://clickhouse.tech/docs/en/operations/table_engines/merge/)

A `Merge` engine is only used in conjunction with a `MergeModel`.
A `Merge` table does not store data itself; it allows reading from any number of other tables simultaneously, so you cannot insert into it.
The engine parameter is an re2 regular expression (similar to PCRE) that selects, by name, the tables from which data is read.

    class MergeTable(MergeModel):
        engine = Merge('^table_prefix')

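To preview which tables a pattern such as `'^table_prefix'` would pick up, you can try it against a list of table names. The sketch below uses Python's `re` module as a stand-in for re2, which behaves the same way for a simple anchored pattern like this one but may differ for more advanced syntax. The table names are hypothetical.

```python
import re

pattern = re.compile(r'^table_prefix')

# Hypothetical table names in the database.
tables = ['table_prefix_2019', 'table_prefix_2020', 'other_table', 'my_table_prefix']

# A table is selected when the pattern matches somewhere in its name;
# the ^ anchor restricts the match to names starting with the prefix.
selected = [name for name in tables if pattern.search(name)]
```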
---
[<< Field Types](field_types.md) | [Table of Contents](toc.md) | [Schema Migrations >>](schema_migrations.md)