Sync performance

Every real-life system can have its own performance problems. They depend on:

Your ClickHouse server configuration
Number of ClickHouse instances in your cluster
Your data formats
Import speed
Network
etc

I recommend using monitoring to find out where the bottleneck is and act accordingly.

This chapter gives a list of known problems which can slow down your import.

ClickHouse tuning

Read this doc and tune ClickHouse for both reads and writes.

ClickHouse cluster

As ClickHouse is a multi-master database, you can import into and read from any node of a cluster. To read from and import into multiple nodes, you can use CHProxy or add multiple databases to the routing configuration.

CollapsingMergeTree engine and previous versions

To reduce the amount of data kept in intermediate storage, this library doesn't store old versions of data on update or delete. In addition, fetching previous data versions from relational storage is an expensive operation. Engines like CollapsingMergeTree get old versions from ClickHouse instead, in one of two ways:

Using version_col, if it is set in the engine's parameters. This is a special field that stores incremental row versions and is filled in by the library. It can be of any unsigned integer type (choose one depending on how many row versions you may have).
Using the FINAL query modifier. This way is much slower, but doesn't require an additional column.
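To see why the previous version is needed at all, here is a minimal sketch of the collapsing idea in plain Python. It is not the library's code: the function `collapsing_rows`, the column names `sign` and `version`, and the dict-based rows are hypothetical and used for illustration only. The assumption is that an update cancels the old state with a row of sign -1 and then writes the new state with sign 1 and an incremented version.

```python
from typing import Dict, List, Optional


def collapsing_rows(previous: Optional[Dict], new: Dict,
                    sign_col: str = "sign",
                    version_col: str = "version") -> List[Dict]:
    """Build the rows to insert when a record changes (illustrative only)."""
    rows = []
    next_version = 0
    if previous is not None:
        # Cancel the old state: same values as before, but with sign = -1.
        rows.append({**previous, sign_col: -1})
        next_version = previous[version_col] + 1
    # Write the new state with sign = 1 and the next version number.
    rows.append({**new, sign_col: 1, version_col: next_version})
    return rows


old = {"id": 1, "name": "a", "sign": 1, "version": 3}
for row in collapsing_rows(old, {"id": 1, "name": "b"}):
    print(row)
```

The cancelling row can only be built if the old values are known, which is why they have to be read back from ClickHouse, either by the highest version_col value or with FINAL.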

Know your data

In the common case, a library user uses Python types to form ClickHouse data. The library is responsible for converting this data into the format ClickHouse expects to receive. This leads to a large number of conversion operations when you import data in big batches. To reduce this time, you can:

Set MyClickHouseModel.sync_formatted_tuples to True
Override the MyClickHouseModel.get_insert_batch(cls, import_objects: Iterable[DjangoModel]) method:
It should call cls.get_tuple_class() and yield (it is a generator) tuples of string values that are already prepared for insertion into ClickHouse.
Important note: ClickHouseModel.get_insert_batch(...) can perform additional work depending on the model's engine. Be careful.
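The override described above might look roughly like the sketch below. To keep it self-contained, it does not import the library: `MyClickHouseModel` here is a stand-in class, `Visit` is a hypothetical Django-like object, and the namedtuple returned by `get_tuple_class()` as well as the `escape` helper are assumptions. The exact string format ClickHouse expects depends on your column types, so treat the formatting as a placeholder.

```python
from collections import namedtuple
from datetime import date
from typing import Iterable, Iterator


class Visit:
    """Hypothetical stand-in for a Django model instance."""
    def __init__(self, pk: int, url: str, day: date):
        self.pk, self.url, self.day = pk, url, day


def escape(value: str) -> str:
    # Assumed TSV-like escaping; adjust to the format your setup really uses.
    return value.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


class MyClickHouseModel:
    """Stand-in for a ClickHouseModel subclass (only the relevant parts)."""
    sync_formatted_tuples = True  # values are yielded already formatted

    _tuple_class = namedtuple("VisitTuple", ["id", "url", "day"])

    @classmethod
    def get_tuple_class(cls):
        return cls._tuple_class

    @classmethod
    def get_insert_batch(cls, import_objects: Iterable[Visit]) -> Iterator[tuple]:
        tuple_class = cls.get_tuple_class()
        for obj in import_objects:
            # Format each value once, directly as a string, skipping the
            # generic per-field Python type conversion.
            yield tuple_class(id=str(obj.pk), url=escape(obj.url),
                              day=obj.day.isoformat())


batch = list(MyClickHouseModel.get_insert_batch([Visit(1, "/home", date(2020, 1, 2))]))
print(batch)
```

Because this sketch replaces the method entirely, any engine-specific behaviour of the original get_insert_batch(...) would be lost; in a real model, check what the engine's implementation does before overriding it.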
