mirror of
https://github.com/carrotquest/django-clickhouse.git
synced 2024-11-24 18:13:46 +03:00
Added more docs
parent
c0afa7b53a
commit
f2dc978634
|
@ -1,9 +1,9 @@
|
|||
# Basic information
|
||||
## <a name="about">About</a>
|
||||
## About
|
||||
The goal of this project is to integrate the [Yandex ClickHouse](https://clickhouse.yandex/) database into a [Django](https://www.djangoproject.com/) project.
|
||||
It is based on [infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhouse_orm) library.
|
||||
|
||||
## <a name="features">Features</a>
|
||||
## Features
|
||||
* Multiple ClickHouse database configuration in [settings.py](https://docs.djangoproject.com/en/2.1/ref/settings/)
|
||||
* ORM to create and manage ClickHouse models.
|
||||
* ClickHouse migration system.
|
||||
|
@ -11,26 +11,26 @@ It is based on [infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhous
|
|||
* Effective periodical synchronization of django models to ClickHouse without losing data.
|
||||
* Synchronization process monitoring.
|
||||
|
||||
## <a name="requirements">Requirements</a>
|
||||
## Requirements
|
||||
* [Python 3](https://www.python.org/downloads/)
|
||||
* [Django](https://docs.djangoproject.com/) 1.7+
|
||||
* [Yandex ClickHouse](https://clickhouse.yandex/)
|
||||
* [infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhouse_orm)
|
||||
* pytz
|
||||
* six
|
||||
* typing
|
||||
* psycopg2
|
||||
* celery
|
||||
* statsd
|
||||
* [pytz](https://pypi.org/project/pytz/)
|
||||
* [six](https://pypi.org/project/six/)
|
||||
* [typing](https://pypi.org/project/typing/)
|
||||
* [psycopg2](https://www.psycopg.org/)
|
||||
* [celery](http://www.celeryproject.org/)
|
||||
* [statsd](https://pypi.org/project/statsd/)
|
||||
|
||||
### Optional libraries
|
||||
* [redis-py](https://redis-py.readthedocs.io/en/latest/) for [RedisStorage](storages.md#redis_storage)
|
||||
* [redis-py](https://redis-py.readthedocs.io/en/latest/) for [RedisStorage](storages.md#redisstorage)
|
||||
* [django-pg-returning](https://github.com/M1hacka/django-pg-returning)
|
||||
for optimizing registering updates in [PostgreSQL](https://www.postgresql.org/)
|
||||
* [django-pg-bulk-update](https://github.com/M1hacka/django-pg-bulk-update)
|
||||
for performing effective bulk update operation in [PostgreSQL](https://www.postgresql.org/)
|
||||
for performing effective bulk update and create operations in [PostgreSQL](https://www.postgresql.org/)
|
||||
|
||||
## <a name="installation">Installation</a>
|
||||
## Installation
|
||||
Install via pip:
|
||||
`pip install django-clickhouse` ([not released yet](https://github.com/carrotquest/django-clickhouse/issues/3))
|
||||
or via setup.py:
|
||||
|
|
|
@ -3,19 +3,18 @@
|
|||
Library configuration is made in settings.py. All parameters start with `CLICKHOUSE_` prefix.
|
||||
Prefix can be changed using `CLICKHOUSE_SETTINGS_PREFIX` parameter.
|
||||
|
||||
### <a name="databases">CLICKHOUSE_SETTINGS_PREFIX</a>
|
||||
### CLICKHOUSE_SETTINGS_PREFIX
|
||||
Defaults to: `'CLICKHOUSE_'`
|
||||
You can change the `CLICKHOUSE_` prefix in settings to anything you like using this parameter.
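
As an illustration, assuming the prefix is simply prepended to each parameter name when settings are read, a custom prefix could be resolved as in the sketch below. The `FakeSettings` class and the `get_setting` helper are hypothetical names made up for this example; they are not part of django-clickhouse.

```python
class FakeSettings:
    """Hypothetical stand-in for django.conf.settings, used only in this sketch."""
    CLICKHOUSE_SETTINGS_PREFIX = 'CH_'
    CH_SYNC_DELAY = 10


def get_setting(settings, name, default=None):
    """Look up a parameter by its short name, using the configured prefix.

    This lookup logic is an assumption made for illustration.
    """
    prefix = getattr(settings, 'CLICKHOUSE_SETTINGS_PREFIX', 'CLICKHOUSE_')
    return getattr(settings, prefix + name, default)


# CH_SYNC_DELAY is defined, so its value is used
sync_delay = get_setting(FakeSettings, 'SYNC_DELAY', default=5)
# CH_SYNC_BATCH_SIZE is not defined, so the default is returned
batch_size = get_setting(FakeSettings, 'SYNC_BATCH_SIZE', default=10000)
```

With such a prefix, every other parameter in this document would be written as `CH_DATABASES`, `CH_SYNC_STORAGE` and so on.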
|
||||
|
||||
### <a name="databases">CLICKHOUSE_DATABASES</a>
|
||||
### CLICKHOUSE_DATABASES
|
||||
Defaults to: `{}`
|
||||
A dictionary, defining databases in django-like style.
|
||||
<!--- TODO Add link --->
|
||||
Key is an alias to communicate with this database in [connections]() and [using]().
|
||||
Value is a configuration dict with parameters:
|
||||
* [infi.clickhouse_orm database parameters](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/class_reference.md#database)
|
||||
<!--- TODO Add link --->
|
||||
* `migrate: bool` - indicates if this database should be migrated. See [migrations]().
|
||||
* `migrate: bool` - indicates if this database should be migrated. See [migrations](migrations.md).
|
||||
|
||||
Example:
|
||||
```python
|
||||
|
@ -24,22 +23,28 @@ CLICKHOUSE_DATABASES = {
|
|||
'db_name': 'test',
|
||||
'username': 'default',
|
||||
'password': ''
|
||||
},
|
||||
'reader': {
|
||||
'db_name': 'read_only',
|
||||
'username': 'reader',
|
||||
'readonly': True,
|
||||
'password': ''
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### <a name="default_db_alias">CLICKHOUSE_DEFAULT_DB_ALIAS</a>
|
||||
### CLICKHOUSE_DEFAULT_DB_ALIAS
|
||||
Defaults to: `'default'`
|
||||
<!--- TODO Add link --->
|
||||
A database alias to use in [QuerySets]() if direct [using]() is not specified.
|
||||
|
||||
### <a name="sync_storage">CLICKHOUSE_SYNC_STORAGE</a>
|
||||
### CLICKHOUSE_SYNC_STORAGE
|
||||
Defaults to: `'django_clickhouse.storages.RedisStorage'`
|
||||
An intermediate storage class to use. Can be a string or class. [More info about storages](storages.md).
|
||||
|
||||
### <a name="redis_config">CLICKHOUSE_REDIS_CONFIG</a>
|
||||
### CLICKHOUSE_REDIS_CONFIG
|
||||
Defaults to: `None`
|
||||
Redis configuration for [RedisStorage](storages.md#redis_storage).
|
||||
Redis configuration for [RedisStorage](storages.md#redisstorage).
|
||||
If given, should be a dictionary of parameters to pass to [redis-py](https://redis-py.readthedocs.io/en/latest/#redis.Redis).
|
||||
|
||||
Example:
|
||||
|
@ -52,45 +57,42 @@ CLICKHOUSE_REDIS_CONFIG = {
|
|||
}
|
||||
```
|
||||
|
||||
### <a name="sync_batch_size">CLICKHOUSE_SYNC_BATCH_SIZE</a>
|
||||
### CLICKHOUSE_SYNC_BATCH_SIZE
|
||||
Defaults to: `10000`
|
||||
Maximum number of operations, fetched by sync process from intermediate storage per sync round.
|
||||
|
||||
### <a name="sync_delay">CLICKHOUSE_SYNC_DELAY</a>
|
||||
### CLICKHOUSE_SYNC_DELAY
|
||||
Defaults to: `5`
|
||||
A delay in seconds between the starts of two consecutive sync rounds.
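
A minimal sketch of how such a delay could gate sync rounds, assuming a round is due once the configured number of seconds has passed since the previous round started. The `need_sync` helper is hypothetical and only illustrates the idea; it is not the library's code.

```python
import datetime
from typing import Optional


def need_sync(last_sync_start: Optional[datetime.datetime],
              sync_delay: int,
              now: datetime.datetime) -> bool:
    """Return True if a new sync round should start at the moment `now`."""
    if last_sync_start is None:
        # The model has never been synced, so start immediately
        return True
    return (now - last_sync_start).total_seconds() >= sync_delay


started = datetime.datetime(2020, 1, 1, 12, 0, 0)
too_early = need_sync(started, 5, started + datetime.timedelta(seconds=2))
due = need_sync(started, 5, started + datetime.timedelta(seconds=5))
```

Under this assumption, a planner that runs more often than the delay simply skips the rounds that are not due yet.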
|
||||
|
||||
### <a name="models_module">CLICKHOUSE_MODELS_MODULE</a>
|
||||
### CLICKHOUSE_MODELS_MODULE
|
||||
Defaults to: `'clickhouse_models'`
|
||||
<!--- TODO Add link --->
|
||||
Module name inside [django app](https://docs.djangoproject.com/en/2.2/intro/tutorial01/),
|
||||
where [ClickHouseModel]() classes are search during migrations.
|
||||
Module name inside [django app](https://docs.djangoproject.com/en/3.0/intro/tutorial01/),
|
||||
where [ClickHouseModel](models.md#clickhousemodel) classes are searched during migrations.
|
||||
|
||||
### <a name="database_router">CLICKHOUSE_DATABASE_ROUTER</a>
|
||||
### CLICKHOUSE_DATABASE_ROUTER
|
||||
Defaults to: `'django_clickhouse.routers.DefaultRouter'`
|
||||
<!--- TODO Add link --->
|
||||
A dotted path to class, representing [database router]().
|
||||
A dotted path to class, representing [database router](routing.md#router).
|
||||
|
||||
### <a name="migrations_package">CLICKHOUSE_MIGRATIONS_PACKAGE</a>
|
||||
### CLICKHOUSE_MIGRATIONS_PACKAGE
|
||||
Defaults to: `'clickhouse_migrations'`
|
||||
A python package name inside [django app](https://docs.djangoproject.com/en/2.2/intro/tutorial01/),
|
||||
A python package name inside [django app](https://docs.djangoproject.com/en/3.0/intro/tutorial01/),
|
||||
where migration files are searched.
|
||||
|
||||
### <a name="migration_history_model">CLICKHOUSE_MIGRATION_HISTORY_MODEL</a>
|
||||
### CLICKHOUSE_MIGRATION_HISTORY_MODEL
|
||||
Defaults to: `'django_clickhouse.migrations.MigrationHistory'`
|
||||
<!--- TODO Add link --->
|
||||
A dotted name of a ClickHouseModel subclass (including module path), representing [MigrationHistory]() model.
|
||||
A dotted name of a ClickHouseModel subclass (including module path),
|
||||
representing [MigrationHistory model](migrations.md#migrationhistory-clickhousemodel).
|
||||
|
||||
### <a name="migrate_with_default_db">CLICKHOUSE_MIGRATE_WITH_DEFAULT_DB</a>
|
||||
### CLICKHOUSE_MIGRATE_WITH_DEFAULT_DB
|
||||
Defaults to: `True`
|
||||
A boolean flag enabling automatic ClickHouse migration,
|
||||
when you call [`migrate`](https://docs.djangoproject.com/en/2.2/ref/django-admin/#django-admin-migrate) on default database.
|
||||
when you call [`migrate`](https://docs.djangoproject.com/en/2.2/ref/django-admin/#django-admin-migrate) on `default` database.
|
||||
|
||||
### <a name="statd_prefix">CLICKHOUSE_STATSD_PREFIX</a>
|
||||
### CLICKHOUSE_STATSD_PREFIX
|
||||
Defaults to: `clickhouse`
|
||||
<!--- TODO Add link --->
|
||||
A prefix in [statsd](https://pythonhosted.org/python-statsd/) added to each library metric. See [metrics]()
|
||||
A prefix in [statsd](https://pythonhosted.org/python-statsd/) added to each library metric. See [monitoring](monitoring.md).
|
||||
|
||||
### <a name="celery_queue">CLICKHOUSE_CELERY_QUEUE</a>
|
||||
### CLICKHOUSE_CELERY_QUEUE
|
||||
Defaults to: `'celery'`
|
||||
A name of a queue, used by celery to plan library sync tasks.
|
||||
|
|
|
@ -5,7 +5,9 @@
|
|||
* [Features](basic_information.md#features)
|
||||
* [Requirements](basic_information.md#requirements)
|
||||
* [Installation](basic_information.md#installation)
|
||||
* [Design motivation](motivation.md)
|
||||
* Usage
|
||||
* [Overview](overview.md)
|
||||
* [Models](models.md)
|
||||
* [DjangoModel](models.md#DjangoModel)
|
||||
* [ClickHouseModel](models.md#ClickHouseModel)
|
||||
|
@ -14,4 +16,6 @@
|
|||
* [Migrations](migrations.md)
|
||||
* [Synchronization](synchronization.md)
|
||||
* [Storages](storages.md)
|
||||
* [RedisStorage](storages.md#redis_storage)
|
||||
* [RedisStorage](storages.md#redisstorage)
|
||||
* [Monitoring](monitoring.md)
|
||||
* [Performance notes](performance.md)
|
||||
|
|
|
@ -5,7 +5,7 @@ but makes it a little bit more django-like.
|
|||
|
||||
## File structure
|
||||
Each django app can have optional `clickhouse_migrations` package.
|
||||
This is a default package name, it can be changed with [CLICKHOUSE_MIGRATIONS_PACKAGE](configuration.md#migrations_package) setting.
|
||||
This is a default package name, it can be changed with [CLICKHOUSE_MIGRATIONS_PACKAGE](configuration.md#clickhouse_migrations_package) setting.
|
||||
|
||||
Package contains py files, starting with 4-digit number.
|
||||
A number gives an order in which migrations will be applied.
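
The ordering rule can be sketched as follows. This is an illustration under the assumption that file names look like `NNNN_name.py`; the `ordered_migrations` helper is hypothetical and is not the library's loader.

```python
import re

# Four digits, an underscore, a name, and the .py extension
MIGRATION_RE = re.compile(r'^(\d{4})_\w+\.py$')


def ordered_migrations(file_names):
    """Return migration module names (without .py), sorted by their 4-digit number."""
    found = []
    for file_name in file_names:
        match = MIGRATION_RE.match(file_name)
        if match:
            found.append((int(match.group(1)), file_name[:-3]))
    # Files such as __init__.py do not match the pattern and are skipped
    return [name for _, name in sorted(found)]


names = ordered_migrations([
    '0002_add_new_field_to_my_model.py',
    '__init__.py',
    '0001_initial.py',
])
```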
|
||||
|
@ -17,24 +17,27 @@ my_app
|
|||
>>>> __init__.py
|
||||
>>>> 0001_initial.py
|
||||
>>>> 0002_add_new_field_to_my_model.py
|
||||
>> clickhouse_models.py
|
||||
>> urls.py
|
||||
>> views.py
|
||||
```
|
||||
|
||||
## Migration files
|
||||
Each file must contain a `Migration` class, inherited from `django_clickhouse.migrations.Migration`.
|
||||
The class should define an `operations` attribute - a list of operations to apply one by one.
|
||||
Operation is one of operations, supported by [infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/schema_migrations.md).
|
||||
Operation is one of [operations, supported by infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/schema_migrations.md).
|
||||
|
||||
```python
|
||||
from django_clickhouse import migrations
|
||||
from my_app.clickhouse_models import ClickHouseUser
|
||||
|
||||
class Migration(migrations.Migration):
|
||||
operations = [
|
||||
migrations.CreateTable(ClickHouseTestModel),
|
||||
migrations.CreateTable(ClickHouseCollapseTestModel)
|
||||
migrations.CreateTable(ClickHouseUser)
|
||||
]
|
||||
```
|
||||
|
||||
## MigrationHistory ClickHouse model
|
||||
## MigrationHistory ClickHouseModel
|
||||
This model stores information about applied migrations.
|
||||
By default, library uses `django_clickhouse.migrations.MigrationHistory` model,
|
||||
but this can be changed using `CLICKHOUSE_MIGRATION_HISTORY_MODEL` setting.
|
||||
|
@ -45,23 +48,26 @@ MigrationHistory model is stored in default database.
|
|||
|
||||
## Automatic migrations
|
||||
When library is installed, it tries applying migrations every time,
|
||||
you call `python manage.py migrate`. If you want to disable this, use [CLICKHOUSE_MIGRATE_WITH_DEFAULT_DB](configuration.md#migrate_with_default_db) settings.
|
||||
you call [django migrate](https://docs.djangoproject.com/en/3.0/ref/django-admin/#django-admin-migrate). If you want to disable this, use [CLICKHOUSE_MIGRATE_WITH_DEFAULT_DB](configuration.md#clickhouse_migrate_with_default_db) setting.
|
||||
|
||||
Note: migrations are only applied, when `default` database is migrated.
|
||||
By default migrations are applied to all [CLICKHOUSE_DATABASES](configuration.md#clickhouse_databases), which have no flags:
|
||||
* `'migrate': False`
|
||||
* `'readonly': True`
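
As a sketch of this selection rule, assuming `migrate` defaults to `True` and `readonly` defaults to `False` when a flag is omitted. The `databases_to_migrate` helper and the `archive` alias are hypothetical and used only for illustration.

```python
def databases_to_migrate(databases):
    """Return aliases of databases that are neither read-only nor excluded from migration."""
    return [
        alias for alias, config in databases.items()
        if config.get('migrate', True) and not config.get('readonly', False)
    ]


CLICKHOUSE_DATABASES = {
    'default': {'db_name': 'test', 'username': 'default', 'password': ''},
    'reader': {'db_name': 'read_only', 'username': 'reader', 'readonly': True, 'password': ''},
    'archive': {'db_name': 'archive', 'migrate': False},  # hypothetical alias
}

aliases = databases_to_migrate(CLICKHOUSE_DATABASES)
```

Here only the `default` alias is selected: `reader` is read-only and `archive` opts out with `'migrate': False`.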
|
||||
|
||||
Note: migrations are only applied when the django `default` database is migrated.
|
||||
So if you call `python manage.py migrate --database=secondary` they wouldn't be applied.
|
||||
|
||||
## Migration algorithm
|
||||
- Gets a list of databases from `CLICKHOUSE_DATABASES` settings. Migrate them one by one.
|
||||
- Find all django apps from `INSTALLED_APPS` settings, which have no `readonly=True` setting and have `migrate=True` settings.
|
||||
Migrate them one by one.
|
||||
* Iterate over `INSTAALLED_APPS`, searching for `clickhouse_migrations` package
|
||||
- Get a list of databases from `CLICKHOUSE_DATABASES` setting. Migrate them one by one.
|
||||
- Find all django apps from `INSTALLED_APPS` setting, which have no `readonly=True` attribute and have `migrate=True` attribute. Migrate them one by one.
|
||||
* Iterate over `INSTALLED_APPS`, searching for [clickhouse_migrations package](#file-structure)
|
||||
* If package was not found, skip app.
|
||||
* Get a list of migrations applied from `MigrationHistory` model
|
||||
* Get a list of migrations applied from [MigrationHistory model](#migrationhistory-clickhousemodel)
|
||||
* Get a list of unapplied migrations
|
||||
* Get `Migration` class from each migration and call it `apply()` method
|
||||
* `apply()` iterates operations, checking if it should be applied with [router](router.md)
|
||||
* Get [Migration class](#migration-files) from each migration and call its `apply()` method
|
||||
* `apply()` iterates operations, checking if it should be applied with [router](routing.md)
|
||||
* If migration should be applied, it is applied
|
||||
* Mark migration as applied in `MigrationHistory` model
|
||||
* Mark migration as applied in [MigrationHistory model](#migrationhistory-clickhousemodel)
|
||||
|
||||
## Security notes
|
||||
1) Unlike the relational databases used with django, ClickHouse has no transaction system.
|
||||
|
|
|
@ -1,20 +1,20 @@
|
|||
# Models
|
||||
Model is a pythonic class representing database table in your code.
|
||||
It also defined an interface (methods) to perform operations on this table
|
||||
It also defines an interface (methods) to perform operations on this table
|
||||
and describes its configuration inside framework.
|
||||
|
||||
This library operates 2 kinds of models:
|
||||
* Django model, describing tables in source relational model
|
||||
* DjangoModel, describing tables in source relational database (PostgreSQL, MySQL, etc.)
|
||||
* ClickHouseModel, describing models in [ClickHouse](https://clickhouse.yandex/docs/en) database
|
||||
|
||||
In order to distinguish them, I will refer to them as ClickHouseModel and DjangoModel in further documentation.
|
||||
|
||||
## DjangoModel
|
||||
Django provides a [model system](https://docs.djangoproject.com/en/2.2/topics/db/models/)
|
||||
Django provides a [model system](https://docs.djangoproject.com/en/3.0/topics/db/models/)
|
||||
to interact with relational databases.
|
||||
In order to perform [synchronization](synchronization.md) we need to "catch" all DML operations
|
||||
on source django model and save information about it in [storage](storages.md).
|
||||
To achieve this library introduces abstract `django_clickhouse.models.ClickHouseSyncModel` class.
|
||||
In order to perform [synchronization](synchronization.md) we need to "catch" all [DML operations](https://en.wikipedia.org/wiki/Data_manipulation_language)
|
||||
on source django model and save information about them in [storage](storages.md).
|
||||
To achieve this, library introduces abstract `django_clickhouse.models.ClickHouseSyncModel` class.
|
||||
Each model, inherited from `ClickHouseSyncModel` will automatically save information, needed to sync to storage.
|
||||
Read [synchronization](synchronization.md) section for more info.
|
||||
|
||||
|
@ -25,7 +25,7 @@ Read [synchronization](synchronization.md) section for more info.
|
|||
* All queries of [django-pg-returning](https://pypi.org/project/django-pg-returning/) library
|
||||
* All queries of [django-pg-bulk-update](https://pypi.org/project/django-pg-bulk-update/) library
|
||||
|
||||
You can also combine your custom django manager and queryset using mixins from `django_clickhouse.models` package.
|
||||
You can also combine your custom django manager and queryset using mixins from `django_clickhouse.models` package:
|
||||
|
||||
**Important note**: Operations are saved in [transaction.on_commit()](https://docs.djangoproject.com/en/2.2/topics/db/transactions/#django.db.transaction.on_commit).
|
||||
The goal is to avoid syncing operations that were not committed to the relational database.
|
||||
|
@ -44,9 +44,12 @@ class User(ClickHouseSyncModel):
|
|||
birthday = models.DateField()
|
||||
|
||||
# All operations will be registered to sync with ClickHouse models:
|
||||
MyModel.objects.create(first_name='Alice', age=16, , birthday=date(2003, 6, 1))
|
||||
MyModel(first_name='Bob', age=17, birthday=date(2002, 1, 1)).save()
|
||||
MyModel.objects.update(first_name='Candy')
|
||||
User.objects.create(first_name='Alice', age=16, birthday=date(2003, 6, 1))
|
||||
User(first_name='Bob', age=17, birthday=date(2002, 1, 1)).save()
|
||||
User.objects.update(first_name='Candy')
|
||||
|
||||
# Custom manager
|
||||
|
||||
```
|
||||
|
||||
## ClickHouseModel
|
||||
|
@ -56,10 +59,10 @@ This kind of model is based on [infi.clickhouse_orm Model](https://github.com/In
|
|||
You should define `ClickHouseModel` subclass for each table you want to access and sync in ClickHouse.
|
||||
Each model should be inherited from `django_clickhouse.clickhouse_models.ClickHouseModel`.
|
||||
By default, models are searched in `clickhouse_models` module of each django app.
|
||||
You can change modules name, using stting [CLICKHOUSE_MODELS_MODULE](configuration.md#models_module)
|
||||
You can change the module name using the [CLICKHOUSE_MODELS_MODULE](configuration.md#clickhouse_models_module) setting.
|
||||
|
||||
You can read more about creating models and fields [here](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/models_and_databases.md#defining-models):
|
||||
all capabilites are supported. At the same time, django-clickhouse libraries adds:
|
||||
all capabilities are supported. At the same time, the django-clickhouse library adds:
|
||||
* [routing attributes and methods](routing.md)
|
||||
* [sync attributes and methods](synchronization.md)
|
||||
|
||||
|
@ -68,6 +71,8 @@ Example:
|
|||
from django_clickhouse.clickhouse_models import ClickHouseModel
|
||||
from django_clickhouse.engines import MergeTree
|
||||
from infi.clickhouse_orm import fields
|
||||
from my_app.models import User
|
||||
|
||||
|
||||
class HeightData(ClickHouseModel):
|
||||
django_model = User
|
||||
|
@ -84,7 +89,7 @@ class AgeData(ClickHouseModel):
|
|||
|
||||
first_name = fields.StringField()
|
||||
birthday = fields.DateField()
|
||||
age = fields.IntegerField()
|
||||
age = fields.UInt32Field()
|
||||
|
||||
engine = MergeTree('birthday', ('first_name', 'last_name', 'birthday'))
|
||||
```
|
||||
|
@ -97,6 +102,7 @@ You can read more in [sync](synchronization.md) section.
|
|||
Example:
|
||||
```python
|
||||
from django_clickhouse.clickhouse_models import ClickHouseMultiModel
|
||||
from my_app.models import User
|
||||
|
||||
class MyMultiModel(ClickHouseMultiModel):
|
||||
django_model = User
|
||||
|
@ -104,7 +110,13 @@ class MyMultiModel(ClickHouseMultiModel):
|
|||
```
|
||||
|
||||
## Engines
|
||||
Engine is a way of storing, indexing, replicating and sorting data in [ClickHouse](https://clickhouse.yandex/docs/en/operations/table_engines/).
|
||||
Engine system is based on [infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/table_engines.md#table-engines).
|
||||
django-clickhouse extends original engine classes, as each engine can have it's own synchronization mechanics.
|
||||
Engine is a way of storing, indexing, replicating and sorting data in ClickHouse ([docs](https://clickhouse.yandex/docs/en/operations/table_engines/)).
|
||||
Engine system is based on [infi.clickhouse_orm engine system](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/table_engines.md#table-engines).
|
||||
This library extends original engine classes as each engine can have its own synchronization mechanics.
|
||||
Engines are defined in `django_clickhouse.engines` module.
|
||||
|
||||
Currently supported engines (with all infi functionality, [more info](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/table_engines.md#data-replication)):
|
||||
* `MergeTree`
|
||||
* `ReplacingMergeTree`
|
||||
* `SummingMergeTree`
|
||||
* `CollapsingMergeTree`
|
||||
|
|
56
docs/monitoring.md
Normal file
|
@ -0,0 +1,56 @@
|
|||
# Monitoring
|
||||
In order to monitor [synchronization](synchronization.md) process, [statsd](https://pypi.org/project/statsd/) is used.
|
||||
Data from statsd then can be used by [Prometheus exporter](https://github.com/prometheus/statsd_exporter)
|
||||
or [Graphite](https://graphite.readthedocs.io/en/latest/).
|
||||
|
||||
## Configuration
|
||||
Library expects statsd to be configured as written in [statsd docs for django](https://statsd.readthedocs.io/en/latest/configure.html#in-django).
|
||||
You can set a common prefix for all keys in this library using [CLICKHOUSE_STATSD_PREFIX](configuration.md#clickhouse_statsd_prefix) parameter.
|
||||
|
||||
## Exported metrics
|
||||
## Gauges
|
||||
* `<prefix>.sync.<model_name>.queue`
|
||||
Number of elements in [intermediate storage](storages.md) queue waiting for import.
|
||||
<!--- TODO Add link --->
|
||||
Queue should not be big. It depends on [sync_delay]() configured and time for syncing single batch.
|
||||
It is a good parameter to watch and alert on.
|
||||
|
||||
## Timers
|
||||
All time is sent in milliseconds.
|
||||
|
||||
* `<prefix>.sync.<model_name>.total`
|
||||
Total time of single batch task execution.
|
||||
|
||||
* `<prefix>.sync.<model_name>.steps.<step_name>`
|
||||
`<step_name>` is one of `pre_sync`, `get_operations`, `get_sync_objects`, `get_insert_batch`, `get_final_versions`,
|
||||
`insert`, `post_sync`. Read [here](synchronization.md) for more details.
|
||||
Time of each sync step. Can be useful to debug reasons of long sync process.
|
||||
|
||||
* `<prefix>.inserted_tuples.<model_name>`
|
||||
Time of inserting batch of data into ClickHouse.
|
||||
It excludes as much python code as it could to distinguish real INSERT time from python data preparation.
|
||||
|
||||
* `<prefix>.sync.<model_name>.register_operations`
|
||||
Time of inserting sync operations into storage.
|
||||
|
||||
## Counters
|
||||
* `<prefix>.sync.<model_name>.register_operations.<op_name>`
|
||||
`<op_name>` is one of `create`, `update`, `delete`.
|
||||
Number of DML operations added by DjangoModel methods calls to sync queue.
|
||||
|
||||
* `<prefix>.sync.<model_name>.operations`
|
||||
Number of operations, fetched from [storage](storages.md) for sync in one batch.
|
||||
|
||||
* `<prefix>.sync.<model_name>.import_objects`
|
||||
Number of objects, fetched from relational storage (based on operations) in order to sync with ClickHouse models.
|
||||
|
||||
* `<prefix>.inserted_tuples.<model_name>`
|
||||
Number of rows inserted to ClickHouse.
|
||||
|
||||
* `<prefix>.sync.<model_name>.lock.timeout`
|
||||
Number of locks in [RedisStorage](storages.md#redisstorage), not acquired and skipped by timeout.
|
||||
This value should be zero. If not, it means your model sync takes longer than the sync task call interval.
|
||||
|
||||
* `<prefix>.sync.<model_name>.lock.hard_release`
|
||||
Number of locks in [RedisStorage](storages.md#redisstorage) that were forcibly released (because the process which acquired the lock is dead).
|
||||
This value should be zero. If not, it means your sync tasks are forcibly killed during the sync process (by OutOfMemory killer, for instance).
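
The `<prefix>.sync.<model_name>...` keys listed in this document can be composed as in the sketch below, for example when writing alert rules or dashboards. The `sync_metric_key` helper is hypothetical, and the exact form of `<model_name>` (here a class name) is an assumption.

```python
def sync_metric_key(prefix, model_name, *parts):
    """Build a key of the form <prefix>.sync.<model_name>.<part>[.<part>...]."""
    return '.'.join([prefix, 'sync', model_name, *parts])


queue_key = sync_metric_key('clickhouse', 'ClickHouseUser', 'queue')
step_key = sync_metric_key('clickhouse', 'ClickHouseUser', 'steps', 'insert')
lock_key = sync_metric_key('clickhouse', 'ClickHouseUser', 'lock', 'timeout')
```

Note that the `<prefix>.inserted_tuples.<model_name>` keys do not contain the `sync` segment, so this helper does not cover them.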
|
35
docs/motivation.md
Normal file
|
@ -0,0 +1,35 @@
|
|||
# Design motivation
|
||||
## Separate from django database setting, QuerySet and migration system
|
||||
ClickHouse's SQL dialect, including its DML statements, is close to the standard but does not follow it exactly ([docs](https://clickhouse.tech/docs/en/introduction/distinctive_features/#sql-support)).
|
||||
As a result, it can not be easily integrated into django query subsystem as it expects databases to support:
|
||||
1. Transactions.
|
||||
2. INNER/OUTER JOINS by condition.
|
||||
3. Full featured updates and deletes.
|
||||
4. Per database replication (ClickHouse has per table replication)
|
||||
5. Other features, not supported in ClickHouse.
|
||||
|
||||
In order to have more functionality, [infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhouse_orm)
|
||||
is used as the base library for databases, querysets and migrations. Most of it is compatible and can be used without any changes.
|
||||
|
||||
## Sync over intermediate storage
|
||||
This library has several goals which lead to intermediate storage:
|
||||
1. Failure-resistant import, regardless of the cause of the failure:
|
||||
a ClickHouse failure, a network failure, or the import process being killed by the system (OOM, for instance).
|
||||
2. ClickHouse does not like single row inserts: [docs](https://clickhouse.tech/docs/en/introduction/performance/#performance-when-inserting-data).
|
||||
So it's worth batching data somewhere before inserting it.
|
||||
ClickHouse provides BufferEngine for this, but it can lose data if ClickHouse fails, and no one will know about it.
|
||||
3. Better scalability. Different intermediate storages may be implemented in the future, based on databases, queue systems or even BufferEngine.
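
The batching idea from point 2 can be sketched in plain Python. This is a generic illustration, not the library's storage code; the `batches` helper is hypothetical.

```python
from itertools import islice


def batches(operations, batch_size):
    """Split an iterable of operations into lists of at most batch_size items."""
    iterator = iter(operations)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Seven queued operations with a batch size of three give three inserts instead of seven
result = list(batches(range(7), 3))
```

Each resulting batch would then be written to ClickHouse with a single insert instead of one insert per row.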
|
||||
|
||||
## Replication and routing
|
||||
In primitive cases people just have single database or cluster with same tables on each replica.
|
||||
But as ClickHouse has per table replication a more complicated structure can be built:
|
||||
1. Model A is stored on servers 1 and 2
|
||||
2. Model B is stored on servers 2, 3 and 5
|
||||
3. Model C is stored on servers 1, 3 and 4
|
||||
|
||||
Moreover, migration operations in ClickHouse can also be auto-replicated (`ALTER TABLE`, for instance) or not (`CREATE TABLE`).
|
||||
|
||||
In order to make replication scheme scalable:
|
||||
1. Each model has its own read / write / migrate [routing configuration](routing.md#clickhousemodel-routing-attributes).
|
||||
2. You can use [router](routing.md#router) like django does to set basic routing rules for all models or model groups.
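
To make the per-model placement concrete, here is a minimal sketch that uses the A / B / C example above. The `MODEL_SERVERS` mapping and both functions are hypothetical; they only illustrate the idea of per-model routing and do not reproduce the library's router interface.

```python
import random

# Which servers hold which model (taken from the example above)
MODEL_SERVERS = {
    'A': ['server1', 'server2'],
    'B': ['server2', 'server3', 'server5'],
    'C': ['server1', 'server3', 'server4'],
}


def db_for_read(model_name, rng=random):
    """Pick any server that stores the given model."""
    return rng.choice(MODEL_SERVERS[model_name])


def allow_migrate(model_name, server):
    """A migration makes sense only on servers that store the model."""
    return server in MODEL_SERVERS[model_name]


chosen = db_for_read('B', rng=random.Random(0))
```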
|
||||
|
141
docs/overview.md
Normal file
|
@ -0,0 +1,141 @@
|
|||
# Usage overview
|
||||
## Requirements
|
||||
At the beginning, I expect that you already have:
|
||||
1. [ClickHouse](https://clickhouse.tech/docs/en/) (with [ZooKeeper](https://zookeeper.apache.org/), if you use replication)
|
||||
2. Relational database used with [Django](https://www.djangoproject.com/). For instance, [PostgreSQL](https://www.postgresql.org/)
|
||||
3. [Django database set up](https://docs.djangoproject.com/en/3.0/ref/databases/)
|
||||
4. [Intermediate storage](storages.md) set up. For instance, [Redis](https://redis.io/).
|
||||
|
||||
## Configuration
|
||||
Add required parameters to [Django settings.py](https://docs.djangoproject.com/en/3.0/topics/settings/):
|
||||
1. [CLICKHOUSE_DATABASES](configuration.md#clickhouse_databases)
|
||||
2. [Intermediate storage](storages.md) configuration. For instance, [RedisStorage](storages.md#redisstorage)
|
||||
3. It's recommended to change [CLICKHOUSE_CELERY_QUEUE](configuration.md#clickhouse_celery_queue)
|
||||
4. Add sync task to [celerybeat schedule](http://docs.celeryproject.org/en/v2.3.3/userguide/periodic-tasks.html).
|
||||
Note, that executing planner every 2 seconds doesn't mean sync is executed every 2 seconds.
|
||||
Sync time depends on model sync_delay attribute value and [CLICKHOUSE_SYNC_DELAY](configuration.md#clickhouse_sync_delay) configuration parameter.
|
||||
You can read more in [sync section](synchronization.md).
|
||||
|
||||
You can also change other [configuration parameters](configuration.md) depending on your project.
|
||||
|
||||
#### Example
|
||||
```python
|
||||
# django-clickhouse library setup
|
||||
CLICKHOUSE_DATABASES = {
|
||||
# Connection name to refer in using(...) method
|
||||
'default': {
|
||||
'db_name': 'test',
|
||||
'username': 'default',
|
||||
'password': ''
|
||||
}
|
||||
}
|
||||
CLICKHOUSE_REDIS_CONFIG = {
|
||||
'host': '127.0.0.1',
|
||||
'port': 6379,
|
||||
'db': 8,
|
||||
'socket_timeout': 10
|
||||
}
|
||||
CLICKHOUSE_CELERY_QUEUE = 'clickhouse'
|
||||
|
||||
# If you have no celerybeat tasks yet, define a new dictionary
|
||||
# More info: http://docs.celeryproject.org/en/v2.3.3/userguide/periodic-tasks.html
|
||||
from datetime import timedelta
|
||||
CELERYBEAT_SCHEDULE = {
|
||||
'clickhouse_auto_sync': {
|
||||
'task': 'django_clickhouse.tasks.clickhouse_auto_sync',
|
||||
'schedule': timedelta(seconds=2), # Every 2 seconds
|
||||
'options': {'expires': 1, 'queue': CLICKHOUSE_CELERY_QUEUE}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Adopting django model
|
||||
Read [ClickHouseSyncModel](models.md#djangomodel) section.
|
||||
Inherit all [django models](https://docs.djangoproject.com/en/3.0/topics/db/models/)
|
||||
you want to sync with ClickHouse from `django_clickhouse.models.ClickHouseSyncModel` or sync mixins.
|
||||
|
||||
```python
|
||||
from django_clickhouse.models import ClickHouseSyncModel
|
||||
from django.db import models
|
||||
|
||||
class User(ClickHouseSyncModel):
|
||||
first_name = models.CharField(max_length=50)
|
||||
visits = models.IntegerField(default=0)
|
||||
birthday = models.DateField()
|
||||
```
|
||||
|
||||
## Create ClickHouseModel
|
||||
1. Read [ClickHouseModel section](models.md#clickhousemodel)
|
||||
2. Create `clickhouse_models.py` in your django app.
|
||||
3. Add `ClickHouseModel` class there:
|
||||
```python
|
||||
from django_clickhouse.clickhouse_models import ClickHouseModel
|
||||
from django_clickhouse.engines import MergeTree
|
||||
from infi.clickhouse_orm import fields
|
||||
from my_app.models import User
|
||||
|
||||
class ClickHouseUser(ClickHouseModel):
|
||||
django_model = User
|
||||
sync_delay = 5
|
||||
|
||||
id = fields.UInt32Field()
|
||||
first_name = fields.StringField()
|
||||
birthday = fields.DateField()
|
||||
visits = fields.UInt32Field(default=0)
|
||||
|
||||
engine = MergeTree('birthday', ('birthday',))
|
||||
```
|
||||
|
||||
## Migration to create table in ClickHouse
|
||||
1. Read [migrations](migrations.md) section
|
||||
2. Create `clickhouse_migrations` package in your django app
|
||||
3. Create `0001_initial.py` file inside the created package. Result structure should be:
|
||||
```
|
||||
my_app
|
||||
>> clickhouse_migrations
|
||||
>>>> __init__.py
|
||||
>>>> 0001_initial.py
|
||||
>> clickhouse_models.py
|
||||
>> models.py
|
||||
```
|
||||
|
||||
4. Add content to file `0001_initial.py`:
|
||||
```python
|
||||
from django_clickhouse import migrations
|
||||
from my_app.clickhouse_models import ClickHouseUser
|
||||
|
||||
class Migration(migrations.Migration):
|
||||
operations = [
|
||||
migrations.CreateTable(ClickHouseUser)
|
||||
]
|
||||
```
|
||||
|
||||
## Run migrations
|
||||
Call [django migrate](https://docs.djangoproject.com/en/3.0/ref/django-admin/#django-admin-migrate)
|
||||
to apply created migration and create table in ClickHouse.
|
||||
|
||||
## Set up and run celery sync process
|
||||
Set up [celery worker](https://docs.celeryproject.org/en/latest/userguide/workers.html#starting-the-worker) for [CLICKHOUSE_CELERY_QUEUE](configuration.md#clickhouse_celery_queue) and [celerybeat](https://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html#starting-the-scheduler).
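
As a rough illustration of the celerybeat side, a periodic task entry in `settings.py` might look like the sketch below. The task path `django_clickhouse.tasks.clickhouse_auto_sync`, the 2-second interval, the `expires` option and the `'celery'` queue name are assumptions made for this example; they are not confirmed by this page and should be checked against the configuration docs.

```python
from datetime import timedelta

# Assumed queue name; it should match the value of CLICKHOUSE_CELERY_QUEUE.
CLICKHOUSE_CELERY_QUEUE = 'celery'

# Assumed task path and interval, shown for illustration only.
CELERYBEAT_SCHEDULE = {
    'clickhouse_auto_sync': {
        'task': 'django_clickhouse.tasks.clickhouse_auto_sync',
        'schedule': timedelta(seconds=2),
        # Route the periodic task to the queue the sync worker listens on.
        'options': {'expires': 1, 'queue': CLICKHOUSE_CELERY_QUEUE},
    },
}
```

The worker then has to consume the same queue that the schedule entry routes to, otherwise the sync task is never picked up.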
|
||||
|
||||
## Test sync and write analytics queries
|
||||
1. Read [monitoring section](monitoring.md) in order to set up your monitoring system.
|
||||
2. Read [query section](queries.md) to understand how to query database.
|
||||
3. Create some data in the source table with Django.
|
||||
4. Check that it is synced.
|
||||
|
||||
#### Example
|
||||
```python
|
||||
import datetime
|
||||
import time
|
||||
from my_app.models import User
|
||||
from my_app.clickhouse_models import ClickHouseUser
|
||||
|
||||
u = User.objects.create(first_name='Alice', birthday=datetime.date(1987, 1, 1), visits=1)
|
||||
|
||||
# Wait for the celery task to be executed at least once
|
||||
time.sleep(6)
|
||||
|
||||
assert ClickHouseUser.objects.filter(id=u.id).count() == 1, "Sync is not working"
|
||||
```
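
A fixed `time.sleep(6)` can be flaky if the worker is slow to start. One possible alternative, sketched here with a hypothetical helper named `wait_until` (not part of django-clickhouse), is to poll the check until it passes or a timeout expires:

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            # Give up: the condition did not become true in time.
            return False
        time.sleep(interval)
```

With the models from the example above, the check could then be written as `assert wait_until(lambda: ClickHouseUser.objects.filter(id=u.id).count() == 1), "Sync is not working"`.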
|
||||
|
||||
## Congratulations
|
||||
Tune your integration to achieve better performance if needed: [docs](performance.md).
|
3
docs/performance.md
Normal file
|
@ -0,0 +1,3 @@
|
|||
# Sync performance
|
||||
|
||||
TODO
|
|
@ -1,4 +1,13 @@
|
|||
# Making queries
|
||||
|
||||
## Motivation
|
||||
ClickHouse's SQL dialect is close to the standard, but does not follow it exactly ([docs](https://clickhouse.tech/docs/en/introduction/distinctive_features/#sql-support)).
|
||||
It cannot be easily integrated into Django's query subsystem, which expects databases to support standard SQL features such as transactions and INNER/OUTER JOINs by condition.
|
||||
|
||||
In order to fit it
|
||||
|
||||
|
||||
|
||||
The library's query system extends [infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/querysets.md).
|
||||
|
||||
TODO
|
||||
|
|
|
@ -15,9 +15,9 @@ Unlike traditional relational databases, [ClickHouse](https://clickhouse.yandex/
|
|||
3) To make system more extendable we need default routing, per model routing and router class for complex cases.
|
||||
|
||||
## Introduction
|
||||
All database connections are defined in [CLICKHOUSE_DATABASES](configuration.md#databases) setting.
|
||||
All database connections are defined in [CLICKHOUSE_DATABASES](configuration.md#clickhouse_databases) setting.
|
||||
Each connection has its own alias name to refer to it by.
|
||||
If no routing is configured, [CLICKHOUSE_DEFAULT_DB_ALIAS](configuration.md#default_db_alias) is used.
|
||||
If no routing is configured, [CLICKHOUSE_DEFAULT_DB_ALIAS](configuration.md#clickhouse_default_db_alias) is used.
|
||||
|
||||
## Router
|
||||
Router is a class, defining 3 methods:
|
||||
|
@ -29,7 +29,7 @@ Router is a class, defining 3 methods:
|
|||
Checks if migration `operation` should be applied in django application `app_label` on database `db_alias`.
|
||||
Optional `model` field can be used to determine migrations on concrete model.
|
||||
|
||||
By default [CLICKHOUSE_DATABASE_ROUTER](configuration.md#database_router) is used.
|
||||
By default [CLICKHOUSE_DATABASE_ROUTER](configuration.md#clickhouse_database_router) is used.
|
||||
It gets routing information from model fields, described below.
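
To make the router contract more concrete, here is a minimal sketch of a custom router. The method names `db_for_read`, `db_for_write` and `allow_migrate`, the model attributes `read_db_aliases` and `write_db_aliases`, and the migration policy are assumptions modelled on Django's database router convention; they are not taken from this page, and `FakeModel` is a stand-in used only so the sketch runs on its own.

```python
import random


class AttributeRouter:
    """Hypothetical router that reads routing information from model attributes."""

    def db_for_read(self, model, **hints) -> str:
        # Pick any alias the model declares as readable.
        return random.choice(model.read_db_aliases)

    def db_for_write(self, model, **hints) -> str:
        # Pick any alias the model declares as writable.
        return random.choice(model.write_db_aliases)

    def allow_migrate(self, db_alias, app_label, operation, model=None, **hints) -> bool:
        # Assumed policy: operations without a model run everywhere,
        # model operations run only on the model's write databases.
        if model is None:
            return True
        return db_alias in model.write_db_aliases


class FakeModel:
    """Stand-in for a ClickHouseModel subclass (illustration only)."""
    read_db_aliases = ('replica',)
    write_db_aliases = ('primary',)
```

A class like this would presumably be referenced by its dotted path in the `CLICKHOUSE_DATABASE_ROUTER` setting.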
|
||||
|
||||
## ClickHouseModel routing attributes
|
||||
|
@ -54,7 +54,8 @@ class MyModel(ClickHouseModel):
|
|||
```
|
||||
|
||||
## Setting database in QuerySet
|
||||
Database can be set in each [QuerySet](# TODO) explicitly by using one of methods:
|
||||
<!--- TODO Add link --->
|
||||
Database can be set in each [QuerySet]() explicitly by using one of the following methods:
|
||||
* With [infi approach](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/querysets.md#querysets): `MyModel.objects_in(db_object).filter(id__in=[1,2,3]).count()`
|
||||
* With `using()` method: `MyModel.objects.filter(id__in=[1,2,3]).using(db_alias).count()`
|
||||
|
||||
|
|
|
@ -49,18 +49,18 @@ Each method of abstract `Storage` class takes `kwargs` parameters, which can be
|
|||
|
||||
* `post_sync_failed(import_key: str, exception: Exception, **kwargs) -> None:`
|
||||
Called if any exception has occurred during import process. It cleans storage after unsuccessful import.
|
||||
Note that if import process is hardly killed (with OOM, for instance) this method is not called.
|
||||
Note that if the import process is killed forcibly (by the OOM killer, for instance), this method is not called.
|
||||
|
||||
* `flush() -> None`
|
||||
*Dangerous*. Drops all data kept by the storage. It is used for cleaning up between tests.
|
||||
|
||||
|
||||
## Predefined storages
|
||||
### <a name="redis_storage">RedisStorage</a>
|
||||
### RedisStorage
|
||||
This storage uses [Redis database](https://redis.io/) as intermediate storage.
|
||||
To communicate with Redis it uses [redis-py](https://redis-py.readthedocs.io/en/latest/) library.
|
||||
It is not required, but should be installed to use RedisStorage.
|
||||
In order to use RedisStorage you must also fill [CLICKHOUSE_REDIS_CONFIG](configuration.md#redis_config) parameter.
|
||||
In order to use RedisStorage you must also fill [CLICKHOUSE_REDIS_CONFIG](configuration.md#clickhouse_redis_config) parameter.
|
||||
|
||||
Stored operation contains:
|
||||
* Django database alias where original record can be found.
|
||||
|
|
|
@ -1 +1,3 @@
|
|||
# Synchronization
|
||||
|
||||
TODO
|
|
@ -188,7 +188,7 @@ class ClickHouseSyncModel(DjangoModel):
|
|||
|
||||
@receiver(post_save)
|
||||
def post_save(sender, instance, **kwargs):
|
||||
statsd.incr('clickhouse.sync.post_save'.format('post_save'), 1)
|
||||
statsd.incr('%s.sync.post_save' % config.STATSD_PREFIX, 1)
|
||||
if issubclass(sender, ClickHouseSyncModel):
|
||||
instance.post_save(kwargs.get('created', False), using=kwargs.get('using'))
|
||||
|
||||
|
|