infi.clickhouse_orm/docs/table_engines.md
M1ha de9f64cd3a Added Merge engine
1) Divided readonly and system flags of Field model. Readonly flag only restricts insert operations, while system flag restricts also create and drop table operations

2) Added Merge engine and tests for it
3) Added docs for Merge engine
4) Added opportunity to make Field readonly. This is useful for "virtual" columns (https://clickhouse.yandex/docs/en/single/index.html#virtual-columns)
2017-09-07 17:44:27 +05:00

3.5 KiB

Table Engines

See: ClickHouse Documentation

Each model must have an engine instance, used when creating the table in ClickHouse.

The following engines are supported by the ORM:

  • TinyLog
  • Log
  • Memory
  • MergeTree / ReplicatedMergeTree
  • CollapsingMergeTree / ReplicatedCollapsingMergeTree
  • SummingMergeTree / ReplicatedSummingMergeTree
  • ReplacingMergeTree / ReplicatedReplacingMergeTree
  • Buffer
  • Merge

Simple Engines

TinyLog, Log and Memory engines do not require any parameters:

engine = engines.TinyLog()

engine = engines.Log()

engine = engines.Memory()

Engines in the MergeTree Family

To define a MergeTree engine, supply the date column name and the names (or expressions) for the key columns:

engine = engines.MergeTree('EventDate', ('CounterID', 'EventDate'))

You may also provide a sampling expression:

engine = engines.MergeTree('EventDate', ('CounterID', 'EventDate'), sampling_expr='intHash32(UserID)')

A CollapsingMergeTree engine is defined in a similar manner, but requires also a sign column:

engine = engines.CollapsingMergeTree('EventDate', ('CounterID', 'EventDate'), 'Sign')

For a SummingMergeTree you can optionally specify the summing columns:

engine = engines.SummingMergeTree('EventDate', ('OrderID', 'EventDate', 'BannerID'),
                                  summing_cols=('Shows', 'Clicks', 'Cost'))

For a ReplacingMergeTree you can optionally specify the version column:

engine = engines.ReplacingMergeTree('EventDate', ('OrderID', 'EventDate', 'BannerID'), ver_col='Version')

Data Replication

Any of the above engines can be converted to a replicated engine (e.g. ReplicatedMergeTree) by adding two parameters, replica_table_path and replica_name:

engine = engines.MergeTree('EventDate', ('CounterID', 'EventDate'),
                           replica_table_path='/clickhouse/tables/{layer}-{shard}/hits',
                           replica_name='{replica}')

Buffer Engine

A Buffer engine is only used in conjunction with a BufferModel. The model should be a subclass of both models.BufferModel and the main model. The main model is also passed to the engine:

class PersonBuffer(models.BufferModel, Person):

    engine = engines.Buffer(Person)

Additional buffer parameters can optionally be specified:

    engine = engines.Buffer(Person, num_layers=16, min_time=10, 
                            max_time=100, min_rows=10000, max_rows=1000000, 
                            min_bytes=10000000, max_bytes=100000000)

Then you can insert objects into Buffer model and they will be handled by ClickHouse properly:

db.create_table(PersonBuffer)
suzy = PersonBuffer(first_name='Suzy', last_name='Jones')
dan = PersonBuffer(first_name='Dan', last_name='Schwartz')
db.insert([dan, suzy])

Merge Engine

ClickHouse docs
A Merge engine is only used in conjunction with a MergeModel.
This table does not store data itself, but allows reading from any number of other tables simultaneously. So you can't insert in it. Engine parameter specifies re2 (similar to PCRE) regular expression, from which data is selected.

class MergeTable(models.MergeModel):
    engine = engines.Merge('^table_prefix')

<< Field Types | Table of Contents | Schema Migrations >>