Design motivation
Separate from Django database settings, QuerySet and migration system
ClickHouse SQL and DML are close to the standard, but do not follow it exactly (docs).
As a result, ClickHouse cannot be easily integrated into the Django query subsystem, which expects databases to support:
- Transactions.
- INNER/OUTER JOINS by condition.
- Full featured updates and deletes.
- Per-database replication (ClickHouse has per-table replication).
- Other features not supported in ClickHouse.
To provide more functionality, infi.clickhouse-orm is used as the base library for databases, querysets and migrations. Most of it is compatible and can be used without any changes.
Sync over intermediate storage
Several goals of this library led to the use of intermediate storage:
- Failure-resistant import, whatever the cause of the failure: a ClickHouse failure, a network failure, or the import process being killed by the system (OOM, for instance).
- ClickHouse does not like single-row inserts: docs. So it's worth batching data somewhere before inserting it. ClickHouse provides BufferEngine for this, but it can lose data if ClickHouse fails, and no one will know about it.
- Better scalability. Different intermediate storages may be implemented in the future, based on databases, queue systems or even BufferEngine.
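The goals above can be illustrated with a minimal sketch. It is not this library's API: `InMemoryStorage`, `sync_once` and the `insert_batch` callable are hypothetical names, and the in-memory deque only stands in for a persistent storage. The point it shows is that rows are batched before insertion and leave the storage only after the insert has succeeded.

```python
from collections import deque
from itertools import islice
from typing import Callable, Deque, Dict, List

Row = Dict[str, object]


class InMemoryStorage:
    """Hypothetical stand-in for an intermediate storage.

    A real storage would be persistent, so that pending rows survive
    a crash of the import process.
    """

    def __init__(self) -> None:
        self._pending: Deque[Row] = deque()

    def register(self, row: Row) -> None:
        """Remember a row that still has to reach ClickHouse."""
        self._pending.append(row)

    def peek_batch(self, size: int) -> List[Row]:
        """Return up to `size` of the oldest rows without removing them."""
        return list(islice(self._pending, size))

    def confirm(self, count: int) -> None:
        """Drop the `count` oldest rows after a successful insert."""
        for _ in range(count):
            self._pending.popleft()

    def __len__(self) -> int:
        return len(self._pending)


def sync_once(storage: InMemoryStorage,
              insert_batch: Callable[[List[Row]], None],
              batch_size: int = 1000) -> int:
    """Insert one batch and return the number of rows inserted."""
    batch = storage.peek_batch(batch_size)
    if not batch:
        return 0
    # If this raises, nothing is confirmed and the same rows are retried later.
    insert_batch(batch)
    storage.confirm(len(batch))
    return len(batch)
```

With this ordering a failure between the insert and the confirmation can cause a batch to be inserted twice, so the sketch gives at-least-once delivery rather than exactly-once.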
Replication and routing
In simple cases there is just a single database, or a cluster with the same tables on each replica. But because ClickHouse has per-table replication, a more complicated structure can be built:
- Model A is stored on servers 1 and 2
- Model B is stored on servers 2, 3 and 5
- Model C is stored on servers 1, 3 and 4
Moreover, migration operations in ClickHouse can also be auto-replicated (ALTER TABLE, for instance) or not (CREATE TABLE).
To make the replication scheme scalable:
- Each model has its own read / write / migrate routing configuration.
- You can use a router, like Django does, to set basic routing rules for all models or model groups.
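The routing idea can be sketched in plain Python as follows. This is an illustration under assumptions, not the library's actual interface: the attribute names (`read_db_aliases`, `write_db_aliases`, `migrate_db_aliases`), the `DefaultRouter` class and its methods are hypothetical. The sketch places Model A and Model B on the servers from the example above and treats replicated and non-replicated migration operations differently.

```python
import random
from typing import List, Sequence


class ModelA:
    # Hypothetical per-model routing configuration: servers 1 and 2.
    read_db_aliases: Sequence[str] = ("server1", "server2")
    write_db_aliases: Sequence[str] = ("server1", "server2")
    migrate_db_aliases: Sequence[str] = ("server1", "server2")


class ModelB:
    # Hypothetical per-model routing configuration: servers 2, 3 and 5.
    read_db_aliases: Sequence[str] = ("server2", "server3", "server5")
    write_db_aliases: Sequence[str] = ("server2", "server3", "server5")
    migrate_db_aliases: Sequence[str] = ("server2", "server3", "server5")


class DefaultRouter:
    """Basic rules shared by all models, driven by each model's own settings."""

    def db_for_read(self, model: type, **hints) -> str:
        # Any replica holding the model's table can serve a read.
        return random.choice(list(model.read_db_aliases))

    def db_for_write(self, model: type, **hints) -> str:
        # Per-table replication propagates the write to the other replicas.
        return random.choice(list(model.write_db_aliases))

    def migrate_targets(self, model: type, replicated: bool) -> List[str]:
        aliases = list(model.migrate_db_aliases)
        # An auto-replicated operation (ALTER TABLE) needs one server only;
        # a non-replicated one (CREATE TABLE) must run on every server.
        return aliases[:1] if replicated else aliases
```

A router that treats a group of models differently could subclass `DefaultRouter` and override only the methods it needs, which is the same pattern Django database routers follow.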