Synchronization
Design motivation
Read here.
Algorithm
- Celery beat schedules the django_clickhouse.tasks.clickhouse_auto_sync task every second or so.
- Celery workers execute clickhouse_auto_sync. It searches for ClickHouseModel subclasses that need sync (those whose Model.need_sync() method returns True).
- A django_clickhouse.tasks.sync_clickhouse_model task is scheduled for each ClickHouseModel that needs sync.
- sync_clickhouse_model saves the sync start time in storage and calls the ClickHouseModel.sync_batch_from_storage() method.
- ClickHouseModel.sync_batch_from_storage():
  - Gets the storage the model works with, using the ClickHouseModel.get_storage() method.
  - Calls Storage.pre_sync(import_key) for the model's storage. This may be used to prevent parallel execution with locks, or for other preparatory operations.
  - Gets a list of operations to sync from storage.
  - Fetches objects from the relational database by calling the ClickHouseModel.get_sync_objects(operations) method.
  - Forms a batch of tuples to insert into ClickHouse using the ClickHouseModel.get_insert_batch(import_objects) method.
  - Inserts the batch of tuples into ClickHouse using the ClickHouseModel.insert_batch(batch) method.
  - Calls the Storage.post_sync(import_key) method to clean up storage after syncing the batch. This method also removes the synced operations from storage.
  - If an exception occurs during execution, the Storage.post_sync_failed(import_key) method is called. Note that the process can be killed without an exception (for instance, by the OOM killer), in which case this method is not called.
 
- Gets storage model works with using 
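The sync_batch_from_storage() steps listed above can be sketched in plain Python to show why a failed insert loses nothing. This is a minimal, self-contained model of the control flow under stated assumptions, not the library's code: InMemoryStorage, its get_operations() method, the (type, primary key) operation format and FakeUserModel with its in-memory "databases" are all hypothetical stand-ins for a real storage (such as RedisStorage), the relational database and ClickHouse.

```python
from typing import Dict, List, Tuple

# Hypothetical operation format: (operation type, primary key).
Operation = Tuple[str, int]


class InMemoryStorage:
    """Hypothetical stand-in for an intermediate storage such as RedisStorage."""

    def __init__(self) -> None:
        self.operations: Dict[str, List[Operation]] = {}
        self.in_progress: Dict[str, int] = {}  # import_key -> operations handed out

    def pre_sync(self, import_key: str) -> None:
        # A real storage might acquire a lock here to prevent parallel syncs.
        if import_key in self.in_progress:
            raise RuntimeError(f"sync of {import_key!r} is already running")
        self.in_progress[import_key] = 0

    def get_operations(self, import_key: str, count: int) -> List[Operation]:
        # Hypothetical method name: returns the oldest `count` pending operations.
        ops = self.operations.get(import_key, [])[:count]
        self.in_progress[import_key] = len(ops)
        return ops

    def post_sync(self, import_key: str) -> None:
        # Synced operations are removed only after a successful insert.
        synced = self.in_progress.pop(import_key)
        pending = self.operations.get(import_key, [])
        del pending[:synced]

    def post_sync_failed(self, import_key: str) -> None:
        # Release the "lock" but keep the operations so the next round retries them.
        self.in_progress.pop(import_key, None)


class FakeUserModel:
    """Hypothetical ClickHouseModel-like class backed by in-memory "databases"."""

    import_key = "user"
    sync_batch_size = 2
    storage = InMemoryStorage()
    django_rows = {1: "alice", 2: "bob", 3: "carol"}  # fake relational table
    clickhouse_rows: List[Tuple[int, str]] = []        # fake ClickHouse table
    fail_insert = False                               # simulates a ClickHouse outage

    @classmethod
    def get_storage(cls) -> InMemoryStorage:
        return cls.storage

    @classmethod
    def get_sync_objects(cls, operations: List[Operation]) -> List[dict]:
        return [{"id": pk, "first_name": cls.django_rows[pk]} for _, pk in operations]

    @classmethod
    def get_insert_batch(cls, import_objects: List[dict]) -> List[Tuple[int, str]]:
        return [(obj["id"], obj["first_name"]) for obj in import_objects]

    @classmethod
    def insert_batch(cls, batch: List[Tuple[int, str]]) -> None:
        if cls.fail_insert:
            raise ConnectionError("ClickHouse is unavailable")
        cls.clickhouse_rows.extend(batch)

    @classmethod
    def sync_batch_from_storage(cls) -> None:
        storage = cls.get_storage()
        storage.pre_sync(cls.import_key)
        try:
            operations = storage.get_operations(cls.import_key, cls.sync_batch_size)
            import_objects = cls.get_sync_objects(operations)
            batch = cls.get_insert_batch(import_objects)
            cls.insert_batch(batch)
            storage.post_sync(cls.import_key)
        except Exception:
            storage.post_sync_failed(cls.import_key)
            raise


storage = FakeUserModel.get_storage()
storage.operations["user"] = [("insert", 1), ("insert", 2), ("update", 3)]

FakeUserModel.fail_insert = True
try:
    FakeUserModel.sync_batch_from_storage()
except ConnectionError:
    pass  # all three operations are still in storage

FakeUserModel.fail_insert = False
FakeUserModel.sync_batch_from_storage()  # operations for pk 1 and 2
FakeUserModel.sync_batch_from_storage()  # operation for pk 3
print(FakeUserModel.clickhouse_rows)  # [(1, 'alice'), (2, 'bob'), (3, 'carol')]
```

The key design point is that operations leave storage only in post_sync(), after insert_batch() has returned; any earlier failure just releases the lock and leaves the batch for the next round.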
Configuration
Sync configuration can be set globally using Django settings.py parameters or overridden for each ClickHouseModel class.
ClickHouseModel configuration takes precedence over settings configuration.
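One way to read this precedence rule: a model attribute that is left unset (None here) falls back to the Django setting, which in turn falls back to the library default. The sketch below is an illustration under assumptions, not the library's implementation: get_sync_option() and need_sync() are hypothetical helpers, need_sync() is assumed to compare the last sync start time with the resolved sync_delay, and types.SimpleNamespace stands in for django.conf.settings.

```python
from types import SimpleNamespace
from typing import Any, Optional

# Documented library defaults for the global settings.
DEFAULTS = {
    "CLICKHOUSE_SYNC_BATCH_SIZE": 10000,
    "CLICKHOUSE_SYNC_DELAY": 5,
}

# Stand-in for django.conf.settings: this project overrides only the delay.
settings = SimpleNamespace(CLICKHOUSE_SYNC_DELAY=10)


def get_sync_option(model_cls: type, attr: str, setting: str) -> Any:
    """Hypothetical helper: model attribute, then settings.py, then library default."""
    value = getattr(model_cls, attr, None)
    if value is not None:
        return value
    return getattr(settings, setting, DEFAULTS[setting])


def need_sync(model_cls: type, last_sync_start: Optional[float], now: float) -> bool:
    """Hypothetical check: sync is enabled and sync_delay seconds passed since the last start."""
    if not getattr(model_cls, "sync_enabled", False):
        return False
    if last_sync_start is None:  # never synced before
        return True
    delay = get_sync_option(model_cls, "sync_delay", "CLICKHOUSE_SYNC_DELAY")
    return now - last_sync_start >= delay


class UserModel:
    sync_enabled = True
    sync_batch_size = 1000  # overrides CLICKHOUSE_SYNC_BATCH_SIZE
    sync_delay = None       # falls back to settings.CLICKHOUSE_SYNC_DELAY (10)


print(get_sync_option(UserModel, "sync_batch_size", "CLICKHOUSE_SYNC_BATCH_SIZE"))  # 1000
print(need_sync(UserModel, last_sync_start=97.0, now=100.0))  # False: 3 s < 10 s
print(need_sync(UserModel, last_sync_start=88.0, now=100.0))  # True: 12 s >= 10 s
```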
Settings configuration
- CLICKHOUSE_CELERY_QUEUE
  Defaults to: 'celery'
  Name of the queue Celery uses to schedule the library's sync tasks.
- CLICKHOUSE_SYNC_STORAGE
  Defaults to: 'django_clickhouse.storages.RedisStorage'
  Intermediate storage class to use. Can be a string or a class.
- CLICKHOUSE_SYNC_BATCH_SIZE
  Defaults to: 10000
  Maximum number of operations fetched by the sync process from intermediate storage per sync round.
- CLICKHOUSE_SYNC_DELAY
  Defaults to: 5
  Delay in seconds between the starts of two consecutive sync rounds.
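A settings.py excerpt that sets these options explicitly to their documented defaults might look like the following. The CELERY_BEAT_SCHEDULE entry is an assumption about your Celery setup: it presumes the Celery app reads Django settings via app.config_from_object('django.conf:settings', namespace='CELERY'), and it schedules the clickhouse_auto_sync task from the algorithm above every second. The entry name 'clickhouse_auto_sync' and the expires option are illustrative choices, not library requirements.

```python
# settings.py (excerpt)
from datetime import timedelta

# Library sync settings, shown with their documented default values.
CLICKHOUSE_CELERY_QUEUE = 'celery'
CLICKHOUSE_SYNC_STORAGE = 'django_clickhouse.storages.RedisStorage'
CLICKHOUSE_SYNC_BATCH_SIZE = 10000
CLICKHOUSE_SYNC_DELAY = 5

# Assumes Celery is configured from Django settings with namespace='CELERY'.
CELERY_BEAT_SCHEDULE = {
    'clickhouse_auto_sync': {  # arbitrary entry name
        'task': 'django_clickhouse.tasks.clickhouse_auto_sync',
        'schedule': timedelta(seconds=1),
        # Drop a run no worker picked up within a second; the next one follows shortly.
        'options': {'expires': 1},
    },
}
```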
ClickHouseModel configuration
Each ClickHouseModel subclass can define sync arguments and methods:
- django_model: Type[django.db.models.Model]
  Required. Django model this ClickHouseModel class is synchronized with.
- django_model_serializer: Type[Django2ClickHouseModelSerializer]
  Defaults to: django_clickhouse.serializers.Django2ClickHouseModelSerializer
  Serializer class used to convert Django model instances to ClickHouseModel instances.
- sync_enabled: bool
  Defaults to: False
  Whether sync is enabled for this model.
- sync_batch_size: int
  Defaults to: CLICKHOUSE_SYNC_BATCH_SIZE
  Maximum number of operations fetched by the sync process from storage per sync round.
- sync_delay: float
  Defaults to: CLICKHOUSE_SYNC_DELAY
  Delay in seconds between the starts of two consecutive sync rounds.
- sync_storage: Union[str, Type[Storage]]
  Defaults to: CLICKHOUSE_SYNC_STORAGE
  Intermediate storage class to use. Can be a string or a class.
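Since sync_storage (like CLICKHOUSE_SYNC_STORAGE) accepts either a dotted path or a class, the value has to be normalized to a class before use. A minimal sketch, assuming a hypothetical resolve_storage_class() helper built only on importlib (Django provides a comparable django.utils.module_loading.import_string); collections.OrderedDict merely stands in for a storage class so the example runs without the library installed.

```python
import importlib
from collections import OrderedDict
from typing import Union


def resolve_storage_class(value: Union[str, type]) -> type:
    """Hypothetical helper: return `value` if it is a class, else import the dotted path."""
    if isinstance(value, type):
        return value
    module_path, _, class_name = value.rpartition(".")
    if not module_path:
        raise ValueError(f"{value!r} is not a dotted path like 'package.module.Class'")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# collections.OrderedDict only stands in for a storage class here.
storage_cls = resolve_storage_class("collections.OrderedDict")
print(storage_cls is OrderedDict)  # True
print(resolve_storage_class(OrderedDict) is OrderedDict)  # True
```

Importing lazily from a string lets settings.py name a storage class without importing it at settings load time.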
Example:
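```python
from django_clickhouse.clickhouse_models import ClickHouseModel
from django_clickhouse.engines import ReplacingMergeTree
from infi.clickhouse_orm import fields

from my_app.models import User


class ClickHouseUser(ClickHouseModel):
    # Django model this class is synchronized with
    django_model = User

    # Per-model sync options; these take precedence over CLICKHOUSE_SYNC_* settings
    sync_enabled = True
    sync_delay = 5
    sync_batch_size = 1000

    id = fields.UInt32Field()
    first_name = fields.StringField()
    birthday = fields.DateField()
    visits = fields.UInt32Field(default=0)

    # 'birthday' is the date column; the sorting key is ('id',) so that
    # ReplacingMergeTree collapses re-imported rows of the same user,
    # not different users who share a birthday
    engine = ReplacingMergeTree('birthday', ('id',))
```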
Fail resistance
Fail resistance is based on several points:
- Storage must not lose data under any circumstances. Keeping the storage itself reliable is not this library's responsibility.
- Data is removed from storage only if the import succeeds; otherwise the import attempt is repeated.
- It's recommended to use the ReplacingMergeTree or CollapsingMergeTree engines instead of plain MergeTree, so that duplicates are removed if a batch is imported twice.
- Each ClickHouseModel is synced in a separate process, so a failure in one model should not affect the others.
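The duplicate-removal argument can be illustrated with a toy model of ReplacingMergeTree: within a partition, rows sharing a sorting key collapse to one during merges, and without a version column the last inserted row is kept. The sketch below is a hypothetical in-memory simulation, not ClickHouse itself; merge_replacing() is an illustrative name. It shows that re-importing a batch (for example, after the process was killed before post_sync() ran) leaves the merged result unchanged. In real ClickHouse, merges run in the background at an unspecified time, so duplicates can stay visible until then; queries can add FINAL to collapse them at read time.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str, int]  # (id, first_name, visits); id is the sorting key


def merge_replacing(rows: Iterable[Row]) -> List[Row]:
    """Toy ReplacingMergeTree merge: keep the last inserted row per sorting key."""
    latest: Dict[int, Row] = {}
    for row in rows:
        latest[row[0]] = row
    return sorted(latest.values())


table: List[Row] = []
batch = [(1, 'alice', 3), (2, 'bob', 7)]

table.extend(batch)  # first import succeeds, but post_sync() never runs
table.extend(batch)  # the same batch is imported again on the next round

print(len(table))              # 4 raw rows before a merge (plain MergeTree keeps all 4)
print(merge_replacing(table))  # [(1, 'alice', 3), (2, 'bob', 7)]
```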