mirror of https://github.com/carrotquest/django-clickhouse.git, synced 2025-11-04 01:47:46 +03:00

# Models
A model is a Python class that represents a database table in your code.
It also defines an interface (methods) to perform operations on this table
and describes its configuration inside the framework.

This library operates with 2 kinds of models:  
* DjangoModel, describing tables in the source relational database (PostgreSQL, MySQL, etc.)  
* ClickHouseModel, describing tables in the [ClickHouse](https://clickhouse.yandex/docs/en) database

To distinguish them, this documentation refers to them as ClickHouseModel and DjangoModel.

## DjangoModel
Django provides a [model system](https://docs.djangoproject.com/en/3.0/topics/db/models/)
to interact with relational databases.
In order to perform [synchronization](synchronization.md), the library needs to "catch" all [DML operations](https://en.wikipedia.org/wiki/Data_manipulation_language)
on the source Django model and save information about them in [storage](storages.md).
To achieve this, the library introduces the abstract `django_clickhouse.models.ClickHouseSyncModel` class.
Each model inherited from `ClickHouseSyncModel` automatically saves the information needed for sync to storage.  
Read the [synchronization](synchronization.md) section for more info.

`ClickHouseSyncModel` saves information about:
* `Model.objects.create()`, `Model.objects.bulk_create()`
* `Model.save()`, `Model.delete()`
* `QuerySet.update()`, `QuerySet.delete()`
* All queries of the [django-pg-returning](https://pypi.org/project/django-pg-returning/) library
* All queries of the [django-pg-bulk-update](https://pypi.org/project/django-pg-bulk-update/) library

You can also build your custom Django manager and queryset using mixins from the `django_clickhouse.models` package.

**Important note**: Operations are saved in [transaction.on_commit()](https://docs.djangoproject.com/en/2.2/topics/db/transactions/#django.db.transaction.on_commit).
The goal is to avoid syncing operations that were not committed to the relational database.
The downside is that a transaction can be committed without its operations being registered,
if something goes wrong during registration.

Example:
```python
from django_clickhouse.models import ClickHouseSyncModel
from django.db import models
from datetime import date

class User(ClickHouseSyncModel):
    first_name = models.CharField(max_length=50)
    age = models.IntegerField()
    birthday = models.DateField()

# All operations will be registered to sync with ClickHouse models:
User.objects.create(first_name='Alice', age=16, birthday=date(2003, 6, 1))
User(first_name='Bob', age=17, birthday=date(2002, 1, 1)).save()
User.objects.update(first_name='Candy')

# Custom manager

```

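
The trade-off from the important note can be illustrated with a toy model of commit hooks. This is a minimal sketch in plain Python that does not use Django: `ToyTransaction`, `register_operation` and `storage` are hypothetical names introduced only for illustration, while the library itself relies on Django's `transaction.on_commit()`.

```python
class ToyTransaction:
    """Toy stand-in for a database transaction with on-commit hooks."""

    def __init__(self):
        self._hooks = []
        self.committed = False

    def on_commit(self, func):
        # Hooks are only remembered here; nothing runs until commit()
        self._hooks.append(func)

    def rollback(self):
        # A rolled back transaction drops its hooks without running them
        self._hooks.clear()

    def commit(self):
        # Data becomes durable first, hooks run afterwards
        self.committed = True
        hooks, self._hooks = self._hooks, []
        for hook in hooks:
            hook()


storage = []  # hypothetical stand-in for the sync storage


def register_operation(op, pk):
    storage.append((op, pk))


# 1. Rolled back changes are never registered for sync
tx = ToyTransaction()
tx.on_commit(lambda: register_operation('insert', 1))
tx.rollback()
rolled_back_registered = list(storage)

# 2. Committed changes are registered right after the commit
tx = ToyTransaction()
tx.on_commit(lambda: register_operation('insert', 2))
tx.commit()
committed_registered = list(storage)


# 3. The downside: registration of ('insert', 3) fails after the commit,
#    so the data is committed but the operation is never stored
def failing_registration():
    raise ConnectionError('storage is unavailable')


tx = ToyTransaction()
tx.on_commit(failing_registration)
try:
    tx.commit()
except ConnectionError:
    pass
lost_operation = tx.committed and ('insert', 3) not in storage
```

In the third case the relational database already holds the change, but nothing tells the sync process about it.
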
## ClickHouseModel
This kind of model is based on the [infi.clickhouse_orm Model](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/models_and_databases.md#defining-models)
and represents a table in the [ClickHouse database](https://clickhouse.yandex/docs/en).

You should define a `ClickHouseModel` subclass for each table you want to access and sync in ClickHouse.
Each model should inherit from `django_clickhouse.clickhouse_models.ClickHouseModel`.
By default, models are searched for in the `clickhouse_models` module of each Django app.
You can change the module name using the [CLICKHOUSE_MODELS_MODULE](configuration.md#clickhouse_models_module) setting.

You can read more about creating models and fields [here](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/models_and_databases.md#defining-models):
all capabilities are supported. In addition, the django-clickhouse library adds:
* [routing attributes and methods](routing.md)
* [sync attributes and methods](synchronization.md)

Example:
```python
from django_clickhouse.clickhouse_models import ClickHouseModel
from django_clickhouse.engines import MergeTree
from infi.clickhouse_orm import fields
from my_app.models import User


class HeightData(ClickHouseModel):
    django_model = User

    first_name = fields.StringField()
    birthday = fields.DateField()
    # Assumes the source User model also provides a `height` value
    height = fields.Float32Field()

    # The sorting key may only reference fields defined on this model
    engine = MergeTree('birthday', ('first_name', 'birthday'))


class AgeData(ClickHouseModel):
    django_model = User

    first_name = fields.StringField()
    birthday = fields.DateField()
    age = fields.UInt32Field()

    # The sorting key may only reference fields defined on this model
    engine = MergeTree('birthday', ('first_name', 'birthday'))
```

### ClickHouseMultiModel
In some cases you may need to sync a single DjangoModel to multiple ClickHouse models.
`ClickHouseMultiModel` makes it possible to reduce the number of relational database operations in this case.
You can read more in the [sync](synchronization.md) section.

Example:
```python
from django_clickhouse.clickhouse_models import ClickHouseMultiModel
from my_app.models import User

# AgeData and HeightData are the ClickHouseModel subclasses from the example above
class MyMultiModel(ClickHouseMultiModel):
    django_model = User
    sub_models = [AgeData, HeightData]
```

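
One way to picture the saving is to count fetches from the relational database. The sketch below is plain Python and not the library's code: it assumes the saving comes from fetching source rows once and reusing them for every sub-model. `fetch_rows`, `to_tuples`, `stats` and the row types are hypothetical names.

```python
from collections import namedtuple

stats = {'fetches': 0}  # counts simulated relational database queries

SOURCE_ROWS = [
    {'id': 1, 'first_name': 'Alice', 'age': 16, 'height': 1.6},
    {'id': 2, 'first_name': 'Bob', 'age': 17, 'height': 1.8},
]


def fetch_rows(pks):
    """Hypothetical stand-in for a SELECT against the relational database."""
    stats['fetches'] += 1
    return [row for row in SOURCE_ROWS if row['id'] in pks]


AgeRow = namedtuple('AgeRow', ('first_name', 'age'))
HeightRow = namedtuple('HeightRow', ('first_name', 'height'))


def to_tuples(tuple_class, rows):
    # Keep only the fields the target table needs
    return [tuple_class(**{f: row[f] for f in tuple_class._fields}) for row in rows]


# Separate models: every target table triggers its own fetch
separate = [to_tuples(cls, fetch_rows({1, 2})) for cls in (AgeRow, HeightRow)]
separate_fetches = stats['fetches']

# Multi model: rows are fetched once and reused for every sub-model
stats['fetches'] = 0
rows = fetch_rows({1, 2})
combined = [to_tuples(cls, rows) for cls in (AgeRow, HeightRow)]
combined_fetches = stats['fetches']
```

Both approaches produce the same rows for insertion; only the number of fetches differs.
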
## ClickHouseModel namedtuple form
[infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm) stores data rows in special Model objects.
It works well on hundreds of records.
But when you sync 100k records in a batch, initializing 100k model instances will be slow.  
To optimize this process, the `ClickHouseModel` class has a `get_tuple_class()` method.
It generates a [namedtuple](https://docs.python.org/3/library/collections.html#collections.namedtuple) class
with the same data fields as the model.
Initializing such tuples takes much less time than initializing Model objects.

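
The idea can be tried with the standard library alone. The sketch below does not call `get_tuple_class()`; it builds a namedtuple by hand and compares it with `ValidatingRow`, a hypothetical toy class that does some per-field work on initialization, roughly as an ORM model instance would. The field names are assumptions as well.

```python
from collections import namedtuple
from datetime import date
from timeit import timeit

FIELD_NAMES = ('first_name', 'birthday', 'age')  # hypothetical model fields


class ValidatingRow:
    """Toy stand-in for a model instance that checks every field on init."""

    def __init__(self, **kwargs):
        for name in FIELD_NAMES:
            value = kwargs[name]
            if value is None:
                raise ValueError('%s must not be None' % name)
            setattr(self, name, value)


# Lightweight row type with the same field names and no per-field work
RowTuple = namedtuple('RowTuple', FIELD_NAMES)

data = dict(first_name='Alice', birthday=date(2003, 6, 1), age=16)
row = RowTuple(**data)

# Timings depend on the machine, so they are printed rather than asserted
object_time = timeit(lambda: ValidatingRow(**data), number=100000)
tuple_time = timeit(lambda: RowTuple(**data), number=100000)
print('objects: %.3fs, namedtuples: %.3fs' % (object_time, tuple_time))
```

The tuple still gives access by field name (`row.first_name`), so code that only reads fields works the same way with either representation.
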
## Engines
An engine defines how data is stored, indexed, replicated and sorted in ClickHouse ([docs](https://clickhouse.yandex/docs/en/operations/table_engines/)).  
The engine system is based on the [infi.clickhouse_orm engine system](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/table_engines.md#table-engines).  
This library extends the original engine classes, as each engine can have its own synchronization mechanics.
Engines are defined in the `django_clickhouse.engines` module.

Currently supported engines (with all infi functionality, [more info](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/table_engines.md#data-replication)):
* `MergeTree`
* `ReplacingMergeTree`
* `SummingMergeTree`
* `CollapsingMergeTree`

## Serializers
A serializer is a class that translates Django model instances into [namedtuples, inserted into ClickHouse](#clickhousemodel-namedtuple-form).
`django_clickhouse.serializers.Django2ClickHouseModelSerializer` is used by default in all models.
All serializers must inherit from this class.

A serializer must implement the following interface:
```python
from django_clickhouse.serializers import Django2ClickHouseModelSerializer
from django.db.models import Model as DjangoModel
from typing import Iterable, NamedTuple, Optional, Type

class CustomSerializer(Django2ClickHouseModelSerializer):
    def __init__(self, model_cls: Type['ClickHouseModel'], fields: Optional[Iterable[str]] = None,
                 exclude_fields: Optional[Iterable[str]] = None, writable: bool = False,
                 defaults: Optional[dict] = None) -> None:
        super().__init__(model_cls, fields=fields, exclude_fields=exclude_fields, writable=writable, defaults=defaults)

    def serialize(self, obj: DjangoModel) -> NamedTuple:
        # Convert `obj` into the namedtuple form of the ClickHouse model here
        pass
```
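
To make the serializer's job concrete, the sketch below mimics the conversion using only the standard library. It is not the library's implementation: `ToySerializer`, `ToyUser` and the way `defaults` is applied here are assumptions made for illustration.

```python
from collections import namedtuple
from dataclasses import dataclass
from datetime import date


@dataclass
class ToyUser:
    """Hypothetical stand-in for a Django model instance."""
    id: int
    first_name: str
    age: int
    birthday: date


class ToySerializer:
    """Copies selected attributes of a source object into a namedtuple."""

    def __init__(self, field_names, defaults=None):
        self.tuple_class = namedtuple('Row', field_names)
        self.defaults = defaults or {}

    def serialize(self, obj):
        values = {}
        for name in self.tuple_class._fields:
            if name in self.defaults and not hasattr(obj, name):
                # Field is missing on the source object: use the default
                values[name] = self.defaults[name]
            else:
                values[name] = getattr(obj, name)
        return self.tuple_class(**values)


serializer = ToySerializer(('first_name', 'birthday', 'height'), defaults={'height': 0.0})
user = ToyUser(id=1, first_name='Alice', age=16, birthday=date(2003, 6, 1))
row = serializer.serialize(user)
```

Here `row` carries only the fields the target table needs, with `height` filled from the defaults because `ToyUser` does not define it.
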