mirror of
				https://github.com/carrotquest/django-clickhouse.git
				synced 2025-10-26 05:31:31 +03:00 
			
		
		
		
	
		
			
				
	
	
		
			154 lines
		
	
	
		
			7.0 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			154 lines
		
	
	
		
			7.0 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
| # Models
 | |
| Model is a pythonic class representing database table in your code.
 | |
|  It also defines an interface (methods) to perform operations on this table
 | |
|  and describes its configuration inside framework.
 | |
|  
 | |
| This library operates 2 kinds of models:  
 | |
| * DjangoModel, describing tables in source relational database (PostgreSQL, MySQL, etc.)  
 | |
| * ClickHouseModel, describing models in [ClickHouse](https://clickhouse.yandex/docs/en) database
 | |
|   
 | |
| In order to distinguish them, I will refer them as ClickHouseModel and DjangoModel in further documentation.
 | |
| 
 | |
| ## DjangoModel
 | |
| Django provides a [model system](https://docs.djangoproject.com/en/3.0/topics/db/models/) 
 | |
|  to interact with relational databases. 
 | |
|  In order to perform [synchronization](synchronization.md) we need to "catch" all [DML operations](https://en.wikipedia.org/wiki/Data_manipulation_language)
 | |
|  on source django model and save information about them in [storage](storages.md).
 | |
|  To achieve this, library introduces abstract `django_clickhouse.models.ClickHouseSyncModel` class.
 | |
|  Each model, inherited from `ClickHouseSyncModel` will automatically save information, needed to sync to storage.  
 | |
| Read [synchronization](synchronization.md) section for more info.
 | |
| 
 | |
| `ClickHouseSyncModel` saves information about:
 | |
| * `Model.objects.create()`, `Model.objects.bulk_create()`
 | |
| * `Model.save()`, `Model.delete()`
 | |
| * `QuerySet.update()`, `QuerySet.delete()`
 | |
| * All queries of [django-pg-returning](https://pypi.org/project/django-pg-returning/) library
 | |
| * All queries of [django-pg-bulk-update](https://pypi.org/project/django-pg-bulk-update/) library
 | |
| 
 | |
| You can also combine your custom django manager and queryset using mixins from `django_clickhouse.models` package:
 | |
|   
 | |
| **Important note**: Operations are saved in [transaction.on_commit()](https://docs.djangoproject.com/en/2.2/topics/db/transactions/#django.db.transaction.on_commit). 
 | |
|  The goal is avoiding syncing operations, not committed to relational database.
 | |
|  But this may also provide bad effect: situation, when transaction is committed,
 | |
|  but it hasn't been registered, if something went wrong during registration. 
 | |
| 
 | |
| Example:
 | |
| ```python
 | |
| from django_clickhouse.models import ClickHouseSyncModel
 | |
| from django.db import models
 | |
| from datetime import date
 | |
| 
 | |
| class User(ClickHouseSyncModel):
 | |
|     first_name = models.CharField(max_length=50)
 | |
|     age = models.IntegerField()
 | |
|     birthday = models.DateField()
 | |
| 
 | |
| # All operations will be registered to sync with ClickHouse models:
 | |
| User.objects.create(first_name='Alice', age=16, birthday=date(2003, 6, 1))
 | |
| User(first_name='Bob', age=17, birthday=date(2002, 1, 1)).save()
 | |
| User.objects.update(first_name='Candy')
 | |
| 
 | |
| # Custom manager
 | |
| 
 | |
| ```
 | |
| 
 | |
| ## ClickHouseModel
 | |
| This kind of model is based on [infi.clickhouse_orm Model](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/models_and_databases.md#defining-models)
 | |
|  and represents table in [ClickHouse database](https://clickhouse.yandex/docs/en).
 | |
| 
 | |
| You should define `ClickHouseModel` subclass for each table you want to access and sync in ClickHouse.
 | |
| Each model should be inherited from `django_clickhouse.clickhouse_models.ClickHouseModel`.
 | |
| By default, models are searched in `clickhouse_models` module of each django app.
 | |
| You can change modules name, using setting [CLICKHOUSE_MODELS_MODULE](configuration.md#clickhouse_models_module)
 | |
|  
 | |
| You can read more about creating models and fields [here](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/models_and_databases.md#defining-models):
 | |
| all capabilities are supported. At the same time, django-clickhouse libraries adds:
 | |
| * [routing attributes and methods](routing.md)
 | |
| * [sync attributes and methods](synchronization.md)
 | |
| 
 | |
| Example:
 | |
| ```python
 | |
| from django_clickhouse.clickhouse_models import ClickHouseModel
 | |
| from django_clickhouse.engines import MergeTree
 | |
| from infi.clickhouse_orm import fields
 | |
| from my_app.models import User
 | |
| 
 | |
| 
 | |
| class HeightData(ClickHouseModel):
 | |
|     django_model = User
 | |
| 
 | |
|     first_name = fields.StringField()
 | |
|     birthday = fields.DateField()
 | |
|     height = fields.Float32Field()
 | |
| 
 | |
|     engine = MergeTree('birthday', ('first_name', 'last_name', 'birthday'))
 | |
| 
 | |
| 
 | |
| class AgeData(ClickHouseModel):
 | |
|     django_model = User
 | |
| 
 | |
|     first_name = fields.StringField()
 | |
|     birthday = fields.DateField()
 | |
|     age = fields.UInt32Field()
 | |
| 
 | |
|     engine = MergeTree('birthday', ('first_name', 'last_name', 'birthday'))
 | |
| ```
 | |
| 
 | |
| ### ClickHouseMultiModel
 | |
| In some cases you may need to sync single DjangoModel to multiple ClickHouse models.
 | |
| This model gives ability to reduce number of relational database operations.
 | |
| You can read more in [sync](synchronization.md) section.
 | |
| 
 | |
| Example:
 | |
| ```python
 | |
| from django_clickhouse.clickhouse_models import ClickHouseMultiModel
 | |
| from my_app.models import User
 | |
| 
 | |
| class MyMultiModel(ClickHouseMultiModel):
 | |
|     django_model = User
 | |
|     sub_models = [AgeData, HeightData]
 | |
| ```
 | |
| 
 | |
| ## ClickHouseModel namedtuple form
 | |
| [infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm) stores data rows in special Model objects.
 | |
| It works well on hundreds of records. 
 | |
| But when you sync 100k records in a batch, initializing 100k model instances will be slow.  
 | |
| Too optimize this process `ClickHouseModel` class have `get_tuple_class()` method.
 | |
| It generates a [namedtuple](https://docs.python.org/3/library/collections.html#collections.namedtuple) class,
 | |
| with same data fields a model has. 
 | |
| Initializing such tuples takes much less time, then initializing Model objects.
 | |
| 
 | |
| ## Engines
 | |
| Engine is a way of storing, indexing, replicating and sorting data ClickHouse ([docs](https://clickhouse.yandex/docs/en/operations/table_engines/)).  
 | |
| Engine system is based on [infi.clickhouse_orm engine system](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/table_engines.md#table-engines).  
 | |
| This library extends original engine classes as each engine can have it's own synchronization mechanics. 
 | |
| Engines are defined in `django_clickhouse.engines` module.
 | |
| 
 | |
| Currently supported engines (with all infi functionality, [more info](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/table_engines.md#data-replication)):
 | |
| * `MergeTree`
 | |
| * `ReplacingMergeTree`
 | |
| * `SummingMergeTree`
 | |
| * `CollapsingMergeTree`
 | |
| 
 | |
| 
 | |
| ## Serializers
 | |
| Serializer is a class which translates django model instances to [namedtuples, inserted into ClickHouse](#clickhousemodel-namedtuple-form).
 | |
| `django_clickhouse.serializers.Django2ClickHouseModelSerializer` is used by default in all models.
 | |
|  All serializers must inherit this class. 
 | |
| 
 | |
| Serializer must implement next interface:
 | |
| ```python
 | |
| from django_clickhouse.serializers import Django2ClickHouseModelSerializer
 | |
| from django.db.models import Model as DjangoModel
 | |
| from typing import *
 | |
| 
 | |
| class CustomSerializer(Django2ClickHouseModelSerializer):
 | |
|     def __init__(self, model_cls: Type['ClickHouseModel'], fields: Optional[Iterable[str]] = None,
 | |
|                  exclude_fields: Optional[Iterable[str]] = None, writable: bool = False,
 | |
|                  defaults: Optional[dict] = None) -> None:
 | |
|         super().__init__(model_cls, fields=fields, exclude_fields=exclude_fields, writable=writable, defaults=defaults)
 | |
| 
 | |
|     def serialize(self, obj: DjangoModel) -> NamedTuple:
 | |
|         pass
 | |
| ```
 |