mirror of
https://github.com/carrotquest/django-clickhouse.git
synced 2024-11-30 21:13:44 +03:00
154 lines
7.0 KiB
Markdown
154 lines
7.0 KiB
Markdown
# Models
|
|
Model is a pythonic class representing database table in your code.
|
|
It also defines an interface (methods) to perform operations on this table
|
|
and describes its configuration inside framework.
|
|
|
|
This library operates 2 kinds of models:
|
|
* DjangoModel, describing tables in source relational database (PostgreSQL, MySQL, etc.)
|
|
* ClickHouseModel, describing models in [ClickHouse](https://clickhouse.yandex/docs/en) database
|
|
|
|
In order to distinguish them, I will refer them as ClickHouseModel and DjangoModel in further documentation.
|
|
|
|
## DjangoModel
|
|
Django provides a [model system](https://docs.djangoproject.com/en/3.0/topics/db/models/)
|
|
to interact with relational databases.
|
|
In order to perform [synchronization](synchronization.md) we need to "catch" all [DML operations](https://en.wikipedia.org/wiki/Data_manipulation_language)
|
|
on source django model and save information about them in [storage](storages.md).
|
|
To achieve this, library introduces abstract `django_clickhouse.models.ClickHouseSyncModel` class.
|
|
Each model, inherited from `ClickHouseSyncModel` will automatically save information, needed to sync to storage.
|
|
Read [synchronization](synchronization.md) section for more info.
|
|
|
|
`ClickHouseSyncModel` saves information about:
|
|
* `Model.objects.create()`, `Model.objects.bulk_create()`
|
|
* `Model.save()`, `Model.delete()`
|
|
* `QuerySet.update()`, `QuerySet.delete()`
|
|
* All queries of [django-pg-returning](https://pypi.org/project/django-pg-returning/) library
|
|
* All queries of [django-pg-bulk-update](https://pypi.org/project/django-pg-bulk-update/) library
|
|
|
|
You can also combine your custom django manager and queryset using mixins from `django_clickhouse.models` package:
|
|
|
|
**Important note**: Operations are saved in [transaction.on_commit()](https://docs.djangoproject.com/en/2.2/topics/db/transactions/#django.db.transaction.on_commit).
|
|
The goal is avoiding syncing operations, not committed to relational database.
|
|
But this may also provide bad effect: situation, when transaction is committed,
|
|
but it hasn't been registered, if something went wrong during registration.
|
|
|
|
Example:
|
|
```python
|
|
from django_clickhouse.models import ClickHouseSyncModel
|
|
from django.db import models
|
|
from datetime import date
|
|
|
|
class User(ClickHouseSyncModel):
|
|
first_name = models.CharField(max_length=50)
|
|
age = models.IntegerField()
|
|
birthday = models.DateField()
|
|
|
|
# All operations will be registered to sync with ClickHouse models:
|
|
User.objects.create(first_name='Alice', age=16, birthday=date(2003, 6, 1))
|
|
User(first_name='Bob', age=17, birthday=date(2002, 1, 1)).save()
|
|
User.objects.update(first_name='Candy')
|
|
|
|
# Custom manager
|
|
|
|
```
|
|
|
|
## ClickHouseModel
|
|
This kind of model is based on [infi.clickhouse_orm Model](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/models_and_databases.md#defining-models)
|
|
and represents table in [ClickHouse database](https://clickhouse.yandex/docs/en).
|
|
|
|
You should define `ClickHouseModel` subclass for each table you want to access and sync in ClickHouse.
|
|
Each model should be inherited from `django_clickhouse.clickhouse_models.ClickHouseModel`.
|
|
By default, models are searched in `clickhouse_models` module of each django app.
|
|
You can change modules name, using setting [CLICKHOUSE_MODELS_MODULE](configuration.md#clickhouse_models_module)
|
|
|
|
You can read more about creating models and fields [here](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/models_and_databases.md#defining-models):
|
|
all capabilities are supported. At the same time, django-clickhouse libraries adds:
|
|
* [routing attributes and methods](routing.md)
|
|
* [sync attributes and methods](synchronization.md)
|
|
|
|
Example:
|
|
```python
|
|
from django_clickhouse.clickhouse_models import ClickHouseModel
|
|
from django_clickhouse.engines import MergeTree
|
|
from infi.clickhouse_orm import fields
|
|
from my_app.models import User
|
|
|
|
|
|
class HeightData(ClickHouseModel):
|
|
django_model = User
|
|
|
|
first_name = fields.StringField()
|
|
birthday = fields.DateField()
|
|
height = fields.Float32Field()
|
|
|
|
engine = MergeTree('birthday', ('first_name', 'last_name', 'birthday'))
|
|
|
|
|
|
class AgeData(ClickHouseModel):
|
|
django_model = User
|
|
|
|
first_name = fields.StringField()
|
|
birthday = fields.DateField()
|
|
age = fields.UInt32Field()
|
|
|
|
engine = MergeTree('birthday', ('first_name', 'last_name', 'birthday'))
|
|
```
|
|
|
|
### ClickHouseMultiModel
|
|
In some cases you may need to sync single DjangoModel to multiple ClickHouse models.
|
|
This model gives ability to reduce number of relational database operations.
|
|
You can read more in [sync](synchronization.md) section.
|
|
|
|
Example:
|
|
```python
|
|
from django_clickhouse.clickhouse_models import ClickHouseMultiModel
|
|
from my_app.models import User
|
|
|
|
class MyMultiModel(ClickHouseMultiModel):
|
|
django_model = User
|
|
sub_models = [AgeData, HeightData]
|
|
```
|
|
|
|
## ClickHouseModel namedtuple form
|
|
[infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm) stores data rows in special Model objects.
|
|
It works well on hundreds of records.
|
|
But when you sync 100k records in a batch, initializing 100k model instances will be slow.
|
|
Too optimize this process `ClickHouseModel` class have `get_tuple_class()` method.
|
|
It generates a [namedtuple](https://docs.python.org/3/library/collections.html#collections.namedtuple) class,
|
|
with same data fields a model has.
|
|
Initializing such tuples takes much less time, then initializing Model objects.
|
|
|
|
## Engines
|
|
Engine is a way of storing, indexing, replicating and sorting data ClickHouse ([docs](https://clickhouse.yandex/docs/en/operations/table_engines/)).
|
|
Engine system is based on [infi.clickhouse_orm engine system](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/table_engines.md#table-engines).
|
|
This library extends original engine classes as each engine can have it's own synchronization mechanics.
|
|
Engines are defined in `django_clickhouse.engines` module.
|
|
|
|
Currently supported engines (with all infi functionality, [more info](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/table_engines.md#data-replication)):
|
|
* `MergeTree`
|
|
* `ReplacingMergeTree`
|
|
* `SummingMergeTree`
|
|
* `CollapsingMergeTree`
|
|
|
|
|
|
## Serializers
|
|
Serializer is a class which translates django model instances to [namedtuples, inserted into ClickHouse](#clickhousemodel-namedtuple-form).
|
|
`django_clickhouse.serializers.Django2ClickHouseModelSerializer` is used by default in all models.
|
|
All serializers must inherit this class.
|
|
|
|
Serializer must implement next interface:
|
|
```python
|
|
from django_clickhouse.serializers import Django2ClickHouseModelSerializer
|
|
from django.db.models import Model as DjangoModel
|
|
from typing import *
|
|
|
|
class CustomSerializer(Django2ClickHouseModelSerializer):
|
|
def __init__(self, model_cls: Type['ClickHouseModel'], fields: Optional[Iterable[str]] = None,
|
|
exclude_fields: Optional[Iterable[str]] = None, writable: bool = False,
|
|
defaults: Optional[dict] = None) -> None:
|
|
super().__init__(model_cls, fields=fields, exclude_fields=exclude_fields, writable=writable, defaults=defaults)
|
|
|
|
def serialize(self, obj: DjangoModel) -> NamedTuple:
|
|
pass
|
|
```
|