django-clickhouse/docs/models.md
2020-02-07 13:05:19 +05:00

Models

A model is a Python class that represents a database table in your code. It also defines an interface (methods) for performing operations on this table and describes its configuration inside the framework.

This library operates on two kinds of models:

  • DjangoModel, describing tables in the source relational database (PostgreSQL, MySQL, etc.)
  • ClickHouseModel, describing tables in the ClickHouse database

To distinguish them, I will refer to them as ClickHouseModel and DjangoModel throughout this documentation.

DjangoModel

Django provides a model system for interacting with relational databases. To perform synchronization, the library needs to "catch" all DML operations on the source Django model and save information about them in storage. To achieve this, the library introduces the abstract django_clickhouse.models.ClickHouseSyncModel class. Each model inherited from ClickHouseSyncModel automatically saves the information needed for sync to the storage.
Read the synchronization section for more info.

ClickHouseSyncModel saves information about:

  • Model.objects.create(), Model.objects.bulk_create()
  • Model.save(), Model.delete()
  • QuerySet.update(), QuerySet.delete()
  • All queries of the django-pg-returning library
  • All queries of the django-pg-bulk-update library

You can also combine your custom Django manager and queryset with the mixins from the django_clickhouse.models package.

Important note: operations are registered in a transaction.on_commit() hook. This avoids syncing operations that were never committed to the relational database. The downside is that a transaction can be committed without being registered for sync if something goes wrong during registration.
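
The trade-off in this note can be illustrated with a toy model of commit hooks. This is a minimal sketch in plain Python, not Django's implementation: `ToyTransaction`, `register_operation` and the `storage` list are hypothetical names that only mimic how callbacks queued with transaction.on_commit() run after a successful commit.

```python
class ToyTransaction:
    """Hypothetical stand-in for a database transaction with commit hooks."""

    def __init__(self):
        self._hooks = []
        self.committed = False

    def on_commit(self, func):
        # Queue the callback; it runs only if commit() is reached.
        self._hooks.append(func)

    def commit(self):
        # The data is committed first, the hooks run afterwards.
        self.committed = True
        for hook in self._hooks:
            hook()

    def rollback(self):
        # Nothing was committed, so nothing must be registered.
        self._hooks.clear()


storage = []  # hypothetical sync storage


def register_operation(op, pk):
    storage.append((op, pk))


# Case 1: rolled-back work never reaches the storage.
tx = ToyTransaction()
tx.on_commit(lambda: register_operation('insert', 1))
tx.rollback()

# Case 2: committed work is registered after the commit.
tx = ToyTransaction()
tx.on_commit(lambda: register_operation('insert', 2))
tx.commit()


# Case 3: the downside. The commit succeeds, but registration fails,
# so the committed row is never queued for sync.
def failing_hook():
    raise RuntimeError('storage is unavailable')


tx = ToyTransaction()
tx.on_commit(failing_hook)
try:
    tx.commit()
except RuntimeError:
    pass
lost_operation = tx.committed and ('insert', 3) not in storage
```

In the third case the row exists in the relational database but is missing from the sync storage, which is exactly the situation the note warns about.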

Example:

from django_clickhouse.models import ClickHouseSyncModel
from django.db import models
from datetime import date

class User(ClickHouseSyncModel):
    first_name = models.CharField(max_length=50)
    age = models.IntegerField()
    birthday = models.DateField()

# All operations will be registered to sync with ClickHouse models:
User.objects.create(first_name='Alice', age=16, birthday=date(2003, 6, 1))
User(first_name='Bob', age=17, birthday=date(2002, 1, 1)).save()
User.objects.update(first_name='Candy')

# Custom manager

ClickHouseModel

This kind of model is based on the infi.clickhouse_orm Model class and represents a table in the ClickHouse database.

You should define a ClickHouseModel subclass for each table you want to access and sync in ClickHouse. Each model should inherit from django_clickhouse.clickhouse_models.ClickHouseModel. By default, models are searched for in the clickhouse_models module of each Django app. You can change the module name with the CLICKHOUSE_MODELS_MODULE setting.

You can read more about creating models and fields in the infi.clickhouse_orm documentation: all of its capabilities are supported. On top of that, the django-clickhouse library adds:

Example:

from django_clickhouse.clickhouse_models import ClickHouseModel
from django_clickhouse.engines import MergeTree
from infi.clickhouse_orm import fields
from my_app.models import User


class HeightData(ClickHouseModel):
    django_model = User

    first_name = fields.StringField()
    birthday = fields.DateField()
    height = fields.Float32Field()

    # The sorting key may only reference columns defined on this model
    engine = MergeTree('birthday', ('first_name', 'birthday'))


class AgeData(ClickHouseModel):
    django_model = User

    first_name = fields.StringField()
    birthday = fields.DateField()
    age = fields.UInt32Field()

    # The sorting key may only reference columns defined on this model
    engine = MergeTree('birthday', ('first_name', 'birthday'))

ClickHouseMultiModel

In some cases you may need to sync a single DjangoModel to multiple ClickHouse models. ClickHouseMultiModel makes it possible to reduce the number of relational database operations in that case. You can read more in the sync section.

Example:

from django_clickhouse.clickhouse_models import ClickHouseMultiModel
from my_app.models import User

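# AgeData and HeightData are the ClickHouseModel subclasses from the previous example,
# assumed here to be defined in the same module.
class MyMultiModel(ClickHouseMultiModel):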
    django_model = User
    sub_models = [AgeData, HeightData]
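
One way to picture the saving is a fan-out from a single fetch. The sketch below is plain Python and does not use the library: it assumes that a multi model reads the source rows once and builds rows for every sub-model from them. `fetch_users`, `sync_multi` and the row classes are hypothetical names for illustration.

```python
from collections import namedtuple

# Hypothetical row shapes for two sub-models
AgeRow = namedtuple('AgeRow', ('first_name', 'age'))
HeightRow = namedtuple('HeightRow', ('first_name', 'height'))

query_count = 0


def fetch_users(pks):
    """Hypothetical stand-in for one query against the relational database."""
    global query_count
    query_count += 1
    return [
        {'pk': pk, 'first_name': 'user%d' % pk, 'age': 20 + pk, 'height': 170.0 + pk}
        for pk in pks
    ]


def sync_multi(pks, row_classes):
    # Fetch the source rows once, then build rows for every sub-model.
    users = fetch_users(pks)
    return {
        cls.__name__: [cls(**{name: user[name] for name in cls._fields}) for user in users]
        for cls in row_classes
    }


result = sync_multi([1, 2], [AgeRow, HeightRow])
```

Syncing the two sub-models separately would call `fetch_users` twice in this toy setup. Here it is called once.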

ClickHouseModel namedtuple form

infi.clickhouse_orm stores data rows in special Model objects. This works well for hundreds of records, but when you sync 100k records in a batch, initializing 100k model instances is slow.
To optimize this process, the ClickHouseModel class has a get_tuple_class() method. It generates a namedtuple class with the same data fields as the model. Initializing such tuples takes much less time than initializing Model objects.
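
The idea can be sketched with the standard library alone. The snippet below is an illustration, not the library's get_tuple_class() implementation: `HeavyRow` and `make_tuple_class` are hypothetical names, and the field list mirrors the AgeData example.

```python
from collections import namedtuple
from datetime import date


class HeavyRow:
    """Hypothetical stand-in for an ORM model object that does per-field work in __init__."""

    field_names = ('first_name', 'birthday', 'age')

    def __init__(self, **kwargs):
        for name in self.field_names:
            setattr(self, name, kwargs.get(name))


def make_tuple_class(model_cls):
    # Build a namedtuple class with the same data fields as the model.
    return namedtuple(model_cls.__name__ + 'Tuple', model_cls.field_names)


RowTuple = make_tuple_class(HeavyRow)
row = RowTuple(first_name='Alice', birthday=date(2003, 6, 1), age=16)
```

The resulting rows are still accessed by field name (`row.age`), so code that only reads fields does not need to know whether it received a model object or a tuple.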

Engines

An engine defines how ClickHouse stores, indexes, replicates and sorts data (docs).
The engine system is based on the infi.clickhouse_orm engine system.
This library extends the original engine classes, because each engine can have its own synchronization mechanics. Engines are defined in the django_clickhouse.engines module.

Currently supported engines (with all infi functionality, more info):

  • MergeTree
  • ReplacingMergeTree
  • SummingMergeTree
  • CollapsingMergeTree

Serializers

A serializer is a class that translates Django model instances into the namedtuples that are inserted into ClickHouse. django_clickhouse.serializers.Django2ClickHouseModelSerializer is used by default in all models. All serializers must inherit from this class.

A serializer must implement the following interface:

from django_clickhouse.serializers import Django2ClickHouseModelSerializer
from django.db.models import Model as DjangoModel
from typing import Iterable, NamedTuple, Optional, Type

class CustomSerializer(Django2ClickHouseModelSerializer):
    def __init__(self, model_cls: Type['ClickHouseModel'], fields: Optional[Iterable[str]] = None,
                 exclude_fields: Optional[Iterable[str]] = None, writable: bool = False,
                 defaults: Optional[dict] = None) -> None:
        super().__init__(model_cls, fields=fields, exclude_fields=exclude_fields, writable=writable, defaults=defaults)

    def serialize(self, obj: DjangoModel) -> NamedTuple:
        # Convert the Django model instance into the namedtuple that is inserted into ClickHouse
        pass
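
To make this contract more concrete, here is a minimal stand-alone sketch of what a serialize() step might do. It does not use the library: `serialize_to_tuple`, the way it treats `exclude_fields` and `defaults`, and the SimpleNamespace object standing in for a Django model instance are all assumptions made for illustration.

```python
from collections import namedtuple
from types import SimpleNamespace


def serialize_to_tuple(obj, tuple_cls, exclude_fields=(), defaults=None):
    """Copy attributes of obj into tuple_cls, field by field.

    Excluded or missing fields fall back to `defaults`, then to None.
    """
    defaults = defaults or {}
    values = {}
    for name in tuple_cls._fields:
        if name in exclude_fields or not hasattr(obj, name):
            values[name] = defaults.get(name)
        else:
            values[name] = getattr(obj, name)
    return tuple_cls(**values)


# Hypothetical target row and source object
AgeRow = namedtuple('AgeRow', ('first_name', 'age', 'source'))
user = SimpleNamespace(first_name='Alice', age=16, password='secret')

row = serialize_to_tuple(user, AgeRow, defaults={'source': 'django'})
```

Only attributes that match a target field are copied, so `password` never leaves the source object, and `source` is filled from the defaults because the object has no such attribute.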