diff --git a/README.md b/README.md
index e3a58ff..2d550c9 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,2 @@
-# django-clickhouse
\ No newline at end of file
+# django-clickhouse
+Documentation is [here](docs/index.md)
\ No newline at end of file
diff --git a/docs/basic_information.md b/docs/basic_information.md
index 05ec48b..f0bac10 100644
--- a/docs/basic_information.md
+++ b/docs/basic_information.md
@@ -1,9 +1,9 @@
# Basic information
-## About
+## About
This project's goal is to build [Yandex ClickHouse](https://clickhouse.yandex/) database into [Django](https://www.djangoproject.com/) project.
It is based on [infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhouse_orm) library.
-## Features
+## Features
* Multiple ClickHouse database configuration in [settings.py](https://docs.djangoproject.com/en/2.1/ref/settings/)
* ORM to create and manage ClickHouse models.
* ClickHouse migration system.
@@ -11,26 +11,26 @@ It is based on [infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhous
* Effective periodical synchronization of django models to ClickHouse without loosing data.
* Synchronization process monitoring.
-## Requirements
+## Requirements
* [Python 3](https://www.python.org/downloads/)
* [Django](https://docs.djangoproject.com/) 1.7+
* [Yandex ClickHouse](https://clickhouse.yandex/)
* [infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhouse_orm)
-* pytz
-* six
-* typing
-* psycopg2
-* celery
-* statsd
+* [pytz](https://pypi.org/project/pytz/)
+* [six](https://pypi.org/project/six/)
+* [typing](https://pypi.org/project/typing/)
+* [psycopg2](https://www.psycopg.org/)
+* [celery](http://www.celeryproject.org/)
+* [statsd](https://pypi.org/project/statsd/)
### Optional libraries
-* [redis-py](https://redis-py.readthedocs.io/en/latest/) for [RedisStorage](storages.md#redis_storage)
+* [redis-py](https://redis-py.readthedocs.io/en/latest/) for [RedisStorage](storages.md#redisstorage)
* [django-pg-returning](https://github.com/M1hacka/django-pg-returning)
for optimizing registering updates in [PostgreSQL](https://www.postgresql.org/)
* [django-pg-bulk-update](https://github.com/M1hacka/django-pg-bulk-update)
- for performing effective bulk update operation in [PostgreSQL](https://www.postgresql.org/)
+ for performing effective bulk update and create operations in [PostgreSQL](https://www.postgresql.org/)
-## Installation
+## Installation
Install via pip:
`pip install django-clickhouse` ([not released yet](https://github.com/carrotquest/django-clickhouse/issues/3))
or via setup.py:
diff --git a/docs/configuration.md b/docs/configuration.md
index 07b4b29..2737a18 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -3,19 +3,18 @@
Library configuration is made in settings.py. All parameters start with `CLICKHOUSE_` prefix.
Prefix can be changed using `CLICKHOUSE_SETTINGS_PREFIX` parameter.
-### CLICKHOUSE_SETTINGS_PREFIX
+### CLICKHOUSE_SETTINGS_PREFIX
Defaults to: `'CLICKHOUSE_'`
You can change `CLICKHOUSE_` prefix in settings using this parameter to anything your like.
-### CLICKHOUSE_DATABASES
+### CLICKHOUSE_DATABASES
Defaults to: `{}`
A dictionary, defining databases in django-like style.
Key is an alias to communicate with this database in [connections]() and [using]().
Value is a configuration dict with parameters:
* [infi.clickhouse_orm database parameters](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/class_reference.md#database)
-
-* `migrate: bool` - indicates if this database should be migrated. See [migrations]().
+* `migrate: bool` - indicates if this database should be migrated. See [migrations](migrations.md).
Example:
```python
@@ -24,22 +23,28 @@ CLICKHOUSE_DATABASES = {
'db_name': 'test',
'username': 'default',
'password': ''
- }
+ },
+ 'reader': {
+ 'db_name': 'read_only',
+ 'username': 'reader',
+ 'readonly': True,
+ 'password': ''
+ }
}
```
-### CLICKHOUSE_DEFAULT_DB_ALIAS
+### CLICKHOUSE_DEFAULT_DB_ALIAS
Defaults to: `'default'`
A database alias to use in [QuerySets]() if direct [using]() is not specified.
-### CLICKHOUSE_SYNC_STORAGE
+### CLICKHOUSE_SYNC_STORAGE
Defaults to: `'django_clickhouse.storages.RedisStorage'`
An intermediate storage class to use. Can be a string or class. [More info about storages](storages.md).
-### CLICKHOUSE_REDIS_CONFIG
+### CLICKHOUSE_REDIS_CONFIG
Default to: `None`
-Redis configuration for [RedisStorage](storages.md#redis_storage).
+Redis configuration for [RedisStorage](storages.md#redisstorage).
If given, should be a dictionary of parameters to pass to [redis-py](https://redis-py.readthedocs.io/en/latest/#redis.Redis).
Example:
@@ -52,45 +57,42 @@ CLICKHOUSE_REDIS_CONFIG = {
}
```
-### CLICKHOUSE_SYNC_BATCH_SIZE
+### CLICKHOUSE_SYNC_BATCH_SIZE
Defaults to: `10000`
Maximum number of operations, fetched by sync process from intermediate storage per sync round.
-### CLICKHOUSE_SYNC_DELAY
+### CLICKHOUSE_SYNC_DELAY
Defaults to: `5`
A delay in seconds between two sync rounds start.
-### CLICKHOUSE_MODELS_MODULE
+### CLICKHOUSE_MODELS_MODULE
Defaults to: `'clickhouse_models'`
-
-Module name inside [django app](https://docs.djangoproject.com/en/2.2/intro/tutorial01/),
-where [ClickHouseModel]() classes are search during migrations.
+Module name inside [django app](https://docs.djangoproject.com/en/3.0/intro/tutorial01/),
+where [ClickHouseModel](models.md#clickhousemodel) classes are searched during migrations.
-### CLICKHOUSE_DATABASE_ROUTER
+### CLICKHOUSE_DATABASE_ROUTER
Defaults to: `'django_clickhouse.routers.DefaultRouter'`
-
-A dotted path to class, representing [database router]().
+A dotted path to class, representing [database router](routing.md#router).
-### CLICKHOUSE_MIGRATIONS_PACKAGE
+### CLICKHOUSE_MIGRATIONS_PACKAGE
Defaults to: `'clickhouse_migrations'`
-A python package name inside [django app](https://docs.djangoproject.com/en/2.2/intro/tutorial01/),
+A python package name inside [django app](https://docs.djangoproject.com/en/3.0/intro/tutorial01/),
where migration files are searched.
-### CLICKHOUSE_MIGRATION_HISTORY_MODEL
+### CLICKHOUSE_MIGRATION_HISTORY_MODEL
Defaults to: `'django_clickhouse.migrations.MigrationHistory'`
-
-A dotted name of a ClickHouseModel subclass (including module path), representing [MigrationHistory]() model.
+A dotted name of a ClickHouseModel subclass (including module path),
+ representing [MigrationHistory model](migrations.md#migrationhistory-clickhousemodel).
-### CLICKHOUSE_MIGRATE_WITH_DEFAULT_DB
+### CLICKHOUSE_MIGRATE_WITH_DEFAULT_DB
Defaults to: `True`
A boolean flag enabling automatic ClickHouse migration,
-when you call [`migrate`](https://docs.djangoproject.com/en/2.2/ref/django-admin/#django-admin-migrate) on default database.
+when you call [`migrate`](https://docs.djangoproject.com/en/2.2/ref/django-admin/#django-admin-migrate) on `default` database.
-### CLICKHOUSE_STATSD_PREFIX
+### CLICKHOUSE_STATSD_PREFIX
Defaults to: `clickhouse`
-
-A prefix in [statsd](https://pythonhosted.org/python-statsd/) added to each library metric. See [metrics]()
+A prefix in [statsd](https://pythonhosted.org/python-statsd/) added to each library metric. See [monitoring](monitoring.md).
-### CLICKHOUSE_CELERY_QUEUE
+### CLICKHOUSE_CELERY_QUEUE
Defaults to: `'celery'`
A name of a queue, used by celery to plan library sync tasks.
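
Review sketch for `docs/configuration.md`: the prefix lookup described at the top of this file could be resolved roughly as below. This is a minimal sketch, not the library's actual code. `get_setting`, `DEFAULTS` and the `SimpleNamespace` stand-in for `django.conf.settings` are hypothetical names, and it assumes `CLICKHOUSE_SETTINGS_PREFIX` itself is always read under its fixed name.

```python
from types import SimpleNamespace

# Hypothetical defaults table; the two values are the documented defaults.
DEFAULTS = {
    'SYNC_BATCH_SIZE': 10000,
    'SYNC_DELAY': 5,
}


def get_setting(settings, name, default_prefix='CLICKHOUSE_'):
    """Return a library setting, honouring a custom settings prefix."""
    # Assumption: the prefix parameter keeps its fixed name.
    prefix = getattr(settings, 'CLICKHOUSE_SETTINGS_PREFIX', default_prefix)
    return getattr(settings, prefix + name, DEFAULTS.get(name))


# Stand-in for django.conf.settings so the sketch runs without Django.
settings = SimpleNamespace(CLICKHOUSE_SETTINGS_PREFIX='CH_', CH_SYNC_DELAY=2)

print(get_setting(settings, 'SYNC_DELAY'))       # 2
print(get_setting(settings, 'SYNC_BATCH_SIZE'))  # 10000
```

Parameters that are not overridden fall back to the documented defaults, so changing the prefix does not require redefining every parameter.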
diff --git a/docs/index.md b/docs/index.md
index 8afcfce..e3dfdd2 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -5,7 +5,9 @@
* [Features](basic_information.md#features)
* [Requirements](basic_information.md#requirements)
* [Installation](basic_information.md#installation)
+ * [Design motivation](motivation.md)
* Usage
+ * [Overview](overview.md)
* [Models](models.md)
* [DjangoModel](models.md#DjangoModel)
* [ClickHouseModel](models.md#ClickHouseModel)
@@ -14,4 +16,6 @@
* [Migrations](migrations.md)
* [Synchronization](synchronization.md)
* [Storages](storages.md)
- * [RedisStorage](storages.md#redis_storage)
+ * [RedisStorage](storages.md#redisstorage)
+ * [Monitoring](monitoring.md)
+ * [Performance notes](performance.md)
diff --git a/docs/migrations.md b/docs/migrations.md
index dfe050d..f71d77b 100644
--- a/docs/migrations.md
+++ b/docs/migrations.md
@@ -5,7 +5,7 @@ but makes it a little bit more django-like.
## File structure
Each django app can have optional `clickhouse_migrations` package.
- This is a default package name, it can be changed with [CLICKHOUSE_MIGRATIONS_PACKAGE](configuration.md#migrations_package) setting.
+ This is a default package name, it can be changed with [CLICKHOUSE_MIGRATIONS_PACKAGE](configuration.md#clickhouse_migrations_package) setting.
Package contains py files, starting with 4-digit number.
A number gives an order in which migrations will be applied.
@@ -17,24 +17,27 @@ my_app
>>>> __init__.py
>>>> 0001_initial.py
>>>> 0002_add_new_field_to_my_model.py
+>> clickhouse_models.py
+>> urls.py
+>> views.py
```
## Migration files
Each file must contain a `Migration` class, inherited from `django_clickhouse.migrations.Migration`.
The class should define an `operations` attribute - a list of operations to apply one by one.
-Operation is one of operations, supported by [infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/schema_migrations.md).
+Operation is one of [operations, supported by infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/schema_migrations.md).
```python
from django_clickhouse import migrations
+from my_app.clickhouse_models import ClickHouseUser
class Migration(migrations.Migration):
operations = [
- migrations.CreateTable(ClickHouseTestModel),
- migrations.CreateTable(ClickHouseCollapseTestModel)
+ migrations.CreateTable(ClickHouseUser)
]
```
-## MigrationHistory ClickHouse model
+## MigrationHistory ClickHouseModel
This model stores information about applied migrations.
By default, library uses `django_clickhouse.migrations.MigrationHistory` model,
but this can be changed using `CLICKHOUSE_MIGRATION_HISTORY_MODEL` setting.
@@ -45,27 +48,30 @@ MigrationHistory model is stored in default database.
## Automatic migrations
When library is installed, it tries applying migrations every time,
-you call `python manage.py migrate`. If you want to disable this, use [CLICKHOUSE_MIGRATE_WITH_DEFAULT_DB](configuration.md#migrate_with_default_db) settings.
+you call [django migrate](https://docs.djangoproject.com/en/3.0/ref/django-admin/#django-admin-migrate). If you want to disable this, use [CLICKHOUSE_MIGRATE_WITH_DEFAULT_DB](configuration.md#clickhouse_migrate_with_default_db) setting.
+
+By default migrations are applied to all [CLICKHOUSE_DATABASES](configuration.md#clickhouse_databases), which have no flags:
+* `'migrate': False`
+* `'readonly': True`
-Note: migrations are only applied, when `default` database is migrated.
+Note: migrations are only applied when django `default` database is migrated.
So if you call `python manage.py migrate --database=secondary` they wouldn't be applied.
## Migration algorithm
-- Gets a list of databases from `CLICKHOUSE_DATABASES` settings. Migrate them one by one.
- - Find all django apps from `INSTALLED_APPS` settings, which have no `readonly=True` setting and have `migrate=True` settings.
- Migrate them one by one.
- * Iterate over `INSTAALLED_APPS`, searching for `clickhouse_migrations` package
+- Get a list of databases from `CLICKHOUSE_DATABASES` setting, which have neither `readonly=True` nor `migrate=False` attribute. Migrate them one by one.
+ - Find all django apps from `INSTALLED_APPS` setting. Migrate them one by one.
+ * Iterate over `INSTALLED_APPS`, searching for [clickhouse_migrations package](#file-structure)
* If package was not found, skip app.
- * Get a list of migrations applied from `MigrationHistory` model
+ * Get a list of migrations applied from [MigrationHistory model](#migrationhistory-clickhousemodel)
* Get a list of unapplied migrations
- * Get `Migration` class from each migration and call it `apply()` method
- * `apply()` iterates operations, checking if it should be applied with [router](router.md)
+ * Get [Migration class](#migration-files) from each migration and call its `apply()` method
+ * `apply()` iterates operations, checking if it should be applied with [router](routing.md)
* If migration should be applied, it is applied
- * Mark migration as applied in `MigrationHistory` model
+ * Mark migration as applied in [MigrationHistory model](#migrationhistory-clickhousemodel)
## Security notes
1) ClickHouse has no transaction system, as django relational databases.
As a result, if migration fails, it would be partially applied and there's no correct way to rollback.
I recommend to make migrations as small as possible, so it should be easier to determine and correct the result if something goes wrong.
2) Unlike django, this library is enable to unapply migrations.
- This functionality may be implemented in the future.
\ No newline at end of file
+ This functionality may be implemented in the future.
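
Review sketch for `docs/migrations.md`: the database and file selection steps of the algorithm above can be illustrated as follows. This is a simplified sketch under stated assumptions, not the library's implementation. `databases_to_migrate`, `unapplied_migrations` and the `archive` alias are hypothetical, and applied migrations are modelled as a plain set of module names instead of the `MigrationHistory` model.

```python
import re

# Migration files start with a 4-digit number, e.g. 0001_initial.py
MIGRATION_FILE = re.compile(r'^(\d{4})_\w+\.py$')


def databases_to_migrate(databases):
    """Aliases that are neither read-only nor excluded with 'migrate': False."""
    return [
        alias for alias, conf in databases.items()
        if conf.get('migrate', True) and not conf.get('readonly', False)
    ]


def unapplied_migrations(file_names, applied):
    """Migration module names not applied yet, ordered by their number."""
    names = [f[:-3] for f in file_names if MIGRATION_FILE.match(f)]
    # Zero-padded prefixes make lexicographic order equal to numeric order.
    return sorted(n for n in names if n not in applied)


databases = {
    'default': {'db_name': 'test'},
    'reader': {'db_name': 'read_only', 'readonly': True},
    'archive': {'db_name': 'archive', 'migrate': False},
}
files = ['__init__.py', '0002_add_new_field_to_my_model.py', '0001_initial.py']

print(databases_to_migrate(databases))                # ['default']
print(unapplied_migrations(files, {'0001_initial'}))  # ['0002_add_new_field_to_my_model']
```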
diff --git a/docs/models.md b/docs/models.md
index fdbc105..86d6415 100644
--- a/docs/models.md
+++ b/docs/models.md
@@ -1,20 +1,20 @@
# Models
Model is a pythonic class representing database table in your code.
- It also defined an interface (methods) to perform operations on this table
+ It also defines an interface (methods) to perform operations on this table
and describes its configuration inside framework.
This library operates 2 kinds of models:
-* Django model, describing tables in source relational model
+* DjangoModel, describing tables in source relational database (PostgreSQL, MySQL, etc.)
* ClickHouseModel, describing models in [ClickHouse](https://clickhouse.yandex/docs/en) database
In order to distinguish them, I will refer them as ClickHouseModel and DjangoModel in further documentation.
## DjangoModel
-Django provides a [model system](https://docs.djangoproject.com/en/2.2/topics/db/models/)
+Django provides a [model system](https://docs.djangoproject.com/en/3.0/topics/db/models/)
to interact with relational databases.
- In order to perform [synchronization](synchronization.md) we need to "catch" all DML operations
- on source django model and save information about it in [storage](storages.md).
- To achieve this library introduces abstract `django_clickhouse.models.ClickHouseSyncModel` class.
+ In order to perform [synchronization](synchronization.md) we need to "catch" all [DML operations](https://en.wikipedia.org/wiki/Data_manipulation_language)
+ on source django model and save information about them in [storage](storages.md).
+ To achieve this, library introduces abstract `django_clickhouse.models.ClickHouseSyncModel` class.
Each model, inherited from `ClickHouseSyncModel` will automatically save information, needed to sync to storage.
Read [synchronization](synchronization.md) section for more info.
@@ -25,7 +25,7 @@ Read [synchronization](synchronization.md) section for more info.
* All queries of [django-pg-returning](https://pypi.org/project/django-pg-returning/) library
* All queries of [django-pg-bulk-update](https://pypi.org/project/django-pg-bulk-update/) library
-You can also combine your custom django manager and queryset using mixins from `django_clickhouse.models` package.
+You can also combine your custom django manager and queryset using mixins from `django_clickhouse.models` package:
**Important note**: Operations are saved in [transaction.on_commit()](https://docs.djangoproject.com/en/2.2/topics/db/transactions/#django.db.transaction.on_commit).
The goal is avoiding syncing operations, not committed to relational database.
@@ -44,9 +44,12 @@ class User(ClickHouseSyncModel):
birthday = models.DateField()
# All operations will be registered to sync with ClickHouse models:
-MyModel.objects.create(first_name='Alice', age=16, , birthday=date(2003, 6, 1))
-MyModel(first_name='Bob', age=17, birthday=date(2002, 1, 1)).save()
-MyModel.objects.update(first_name='Candy')
+User.objects.create(first_name='Alice', age=16, birthday=date(2003, 6, 1))
+User(first_name='Bob', age=17, birthday=date(2002, 1, 1)).save()
+User.objects.update(first_name='Candy')
+
+# Custom manager
+
```
## ClickHouseModel
@@ -56,10 +59,10 @@ This kind of model is based on [infi.clickhouse_orm Model](https://github.com/In
You should define `ClickHouseModel` subclass for each table you want to access and sync in ClickHouse.
Each model should be inherited from `django_clickhouse.clickhouse_models.ClickHouseModel`.
By default, models are searched in `clickhouse_models` module of each django app.
-You can change modules name, using stting [CLICKHOUSE_MODELS_MODULE](configuration.md#models_module)
+You can change module name, using setting [CLICKHOUSE_MODELS_MODULE](configuration.md#clickhouse_models_module).
You can read more about creating models and fields [here](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/models_and_databases.md#defining-models):
- all capabilites are supported. At the same time, django-clickhouse libraries adds:
+all capabilities are supported. At the same time, django-clickhouse library adds:
* [routing attributes and methods](routing.md)
* [sync attributes and methods](synchronization.md)
@@ -68,6 +71,8 @@ Example:
from django_clickhouse.clickhouse_models import ClickHouseModel
from django_clickhouse.engines import MergeTree
from infi.clickhouse_orm import fields
+from my_app.models import User
+
class HeightData(ClickHouseModel):
django_model = User
@@ -84,7 +89,7 @@ class AgeData(ClickHouseModel):
first_name = fields.StringField()
birthday = fields.DateField()
- age = fields.IntegerField()
+ age = fields.UInt32Field()
engine = MergeTree('birthday', ('first_name', 'last_name', 'birthday'))
```
@@ -97,6 +102,7 @@ You can read more in [sync](synchronization.md) section.
Example:
```python
from django_clickhouse.clickhouse_models import ClickHouseMultiModel
+from my_app.models import User
class MyMultiModel(ClickHouseMultiModel):
django_model = User
@@ -104,7 +110,13 @@ class MyMultiModel(ClickHouseMultiModel):
```
## Engines
-Engine is a way of storing, indexing, replicating and sorting data in [ClickHouse](https://clickhouse.yandex/docs/en/operations/table_engines/).
-Engine system is based on [infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/table_engines.md#table-engines).
-django-clickhouse extends original engine classes, as each engine can have it's own synchronization mechanics.
+Engine is a way of storing, indexing, replicating and sorting data in ClickHouse ([docs](https://clickhouse.yandex/docs/en/operations/table_engines/)).
+Engine system is based on [infi.clickhouse_orm engine system](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/table_engines.md#table-engines).
+This library extends original engine classes as each engine can have its own synchronization mechanics.
Engines are defined in `django_clickhouse.engines` module.
+
+Currently supported engines (with all infi functionality, [more info](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/table_engines.md#data-replication)):
+* `MergeTree`
+* `ReplacingMergeTree`
+* `SummingMergeTree`
+* `CollapsingMergeTree`
diff --git a/docs/monitoring.md b/docs/monitoring.md
new file mode 100644
index 0000000..fadac7a
--- /dev/null
+++ b/docs/monitoring.md
@@ -0,0 +1,56 @@
+# Monitoring
+In order to monitor [synchronization](synchronization.md) process, [statsd](https://pypi.org/project/statsd/) is used.
+Data from statsd then can be used by [Prometheus exporter](https://github.com/prometheus/statsd_exporter)
+ or [Graphite](https://graphite.readthedocs.io/en/latest/).
+
+## Configuration
+Library expects statsd to be configured as written in [statsd docs for django](https://statsd.readthedocs.io/en/latest/configure.html#in-django).
+You can set a common prefix for all keys in this library using [CLICKHOUSE_STATSD_PREFIX](configuration.md#clickhouse_statsd_prefix) parameter.
+
+## Exported metrics
+## Gauges
+* `<prefix>.sync.<model_name>.queue`
+ Number of elements in [intermediate storage](storages.md) queue waiting for import.
+
+ Queue should not be big. It depends on [sync_delay]() configured and time for syncing single batch.
+ It is a good parameter to watch and alert on.
+
+## Timers
+All time is sent in milliseconds.
+
+* `<prefix>.sync.<model_name>.total`
+ Total time of single batch task execution.
+
+* `<prefix>.sync.<model_name>.steps.<step_name>`
+ `<step_name>` is one of `pre_sync`, `get_operations`, `get_sync_objects`, `get_insert_batch`, `get_final_versions`,
+ `insert`, `post_sync`. Read [here](synchronization.md) for more details.
+ Time of each sync step. Can be useful to debug reasons of long sync process.
+
+* `<prefix>.inserted_tuples.<model_name>`
+ Time of inserting batch of data into ClickHouse.
+ It excludes as much python code as it could to distinguish real INSERT time from python data preparation.
+
+* `<prefix>.sync.<model_name>.register_operations`
+ Time of inserting sync operations into storage.
+
+## Counters
+* `<prefix>.sync.<model_name>.register_operations.<op_name>`
+ `<op_name>` is one of `create`, `update`, `delete`.
+ Number of DML operations added by DjangoModel methods calls to sync queue.
+
+* `<prefix>.sync.<model_name>.operations`
+ Number of operations, fetched from [storage](storages.md) for sync in one batch.
+
+* `<prefix>.sync.<model_name>.import_objects`
+ Number of objects, fetched from relational storage (based on operations) in order to sync with ClickHouse models.
+
+* `<prefix>.inserted_tuples.<model_name>`
+ Number of rows inserted to ClickHouse.
+
+* `<prefix>.sync.<model_name>.lock.timeout`
+ Number of locks in [RedisStorage](storages.md#redisstorage), not acquired and skipped by timeout.
+ This value should be zero. If not, it means your model sync takes longer than sync task call interval.
+
+* `<prefix>.sync.<model_name>.lock.hard_release`
+ Number of locks in [RedisStorage](storages.md#redisstorage), released forcibly (as the process which acquired the lock is dead).
+ This value should be zero. If not, it means your sync tasks are killed forcibly during the sync process (by OutOfMemory killer, for instance).
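
Review sketch for `docs/monitoring.md`: the metric names listed above are dot-joined paths that start with the configured prefix, in the same way the `post_save` counter in `models.py` is built from `config.STATSD_PREFIX`. The helper below is hypothetical and only illustrates the naming. It assumes the model segment is the ClickHouseModel class name, which this patch does not confirm.

```python
def metric_key(prefix, *parts):
    """Join the statsd prefix and metric path segments with dots."""
    return '.'.join([prefix, *parts])


# 'clickhouse' is the documented default of CLICKHOUSE_STATSD_PREFIX.
print(metric_key('clickhouse', 'sync', 'ClickHouseUser', 'queue'))
# clickhouse.sync.ClickHouseUser.queue
print(metric_key('clickhouse', 'sync', 'ClickHouseUser', 'steps', 'insert'))
# clickhouse.sync.ClickHouseUser.steps.insert
```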
diff --git a/docs/motivation.md b/docs/motivation.md
new file mode 100644
index 0000000..88c36c0
--- /dev/null
+++ b/docs/motivation.md
@@ -0,0 +1,35 @@
+# Design motivation
+## Separate from django database setting, QuerySet and migration system
+ClickHouse SQL and DML language is near to standard, but does not follow it exactly ([docs](https://clickhouse.tech/docs/en/introduction/distinctive_features/#sql-support)).
+As a result, it can not be easily integrated into django query subsystem as it expects databases to support:
+1. Transactions.
+2. INNER/OUTER JOINS by condition.
+3. Full featured updates and deletes.
+4. Per database replication (ClickHouse has per table replication)
+5. Other features, not supported in ClickHouse.
+
+In order to have more functionality, [infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhouse_orm)
+ is used as base library for databases, querysets and migrations. Most of it is compatible and can be used without any changes.
+
+## Sync over intermediate storage
+This library has several goals which lead to intermediate storage:
+1. Fail resistant import, does not matter what the fail reason is:
+ ClickHouse fail, network fail, killing import process by system (OOM, for instance).
+2. ClickHouse does not like single row inserts: [docs](https://clickhouse.tech/docs/en/introduction/performance/#performance-when-inserting-data).
+ So it's worth batching data somewhere before inserting it.
+ ClickHouse provides BufferEngine for this, but it can lose data if ClickHouse fails, and no one will know about it.
+3. Better scalability. Different intermediate storages may be implemented in the future, based on databases, queue systems or even BufferEngine.
+
+## Replication and routing
+In primitive cases people just have single database or cluster with same tables on each replica.
+But as ClickHouse has per table replication a more complicated structure can be built:
+1. Model A is stored on servers 1 and 2
+2. Model B is stored on servers 2, 3 and 5
+3. Model C is stored on servers 1, 3 and 4
+
+Moreover, migration operations in ClickHouse can also be auto-replicated (`ALTER TABLE`, for instance) or not (`CREATE TABLE`).
+
+In order to make replication scheme scalable:
+1. Each model has its own read / write / migrate [routing configuration](routing.md#clickhousemodel-routing-attributes).
+2. You can use [router](routing.md#router) like django does to set basic routing rules for all models or model groups.
+
\ No newline at end of file
diff --git a/docs/overview.md b/docs/overview.md
new file mode 100644
index 0000000..8ea3f3f
--- /dev/null
+++ b/docs/overview.md
@@ -0,0 +1,141 @@
+# Usage overview
+## Requirements
+At the beginning I expect that you already have:
+1. [ClickHouse](https://clickhouse.tech/docs/en/) (with [ZooKeeper](https://zookeeper.apache.org/), if you use replication)
+2. Relational database used with [Django](https://www.djangoproject.com/). For instance, [PostgreSQL](https://www.postgresql.org/)
+3. [Django database set up](https://docs.djangoproject.com/en/3.0/ref/databases/)
+4. [Intermediate storage](storages.md) set up. For instance, [Redis](https://redis.io/).
+
+## Configuration
+Add required parameters to [Django settings.py](https://docs.djangoproject.com/en/3.0/topics/settings/):
+1. [CLICKHOUSE_DATABASES](configuration.md#clickhouse_databases)
+2. [Intermediate storage](storages.md) configuration. For instance, [RedisStorage](storages.md#redisstorage)
+3. It's recommended to change [CLICKHOUSE_CELERY_QUEUE](configuration.md#clickhouse_celery_queue)
+4. Add sync task to [celerybeat schedule](http://docs.celeryproject.org/en/v2.3.3/userguide/periodic-tasks.html).
+ Note, that executing planner every 2 seconds doesn't mean sync is executed every 2 seconds.
+ Sync time depends on model sync_delay attribute value and [CLICKHOUSE_SYNC_DELAY](configuration.md#clickhouse_sync_delay) configuration parameter.
+ You can read more in [sync section](synchronization.md).
+
+You can also change other [configuration parameters](configuration.md) depending on your project.
+
+#### Example
+```python
+# django-clickhouse library setup
+CLICKHOUSE_DATABASES = {
+ # Connection name to refer in using(...) method
+ 'default': {
+ 'db_name': 'test',
+ 'username': 'default',
+ 'password': ''
+ }
+}
+CLICKHOUSE_REDIS_CONFIG = {
+ 'host': '127.0.0.1',
+ 'port': 6379,
+ 'db': 8,
+ 'socket_timeout': 10
+}
+CLICKHOUSE_CELERY_QUEUE = 'clickhouse'
+
+# If you have no celerybeat tasks yet, define a new dictionary
+# More info: http://docs.celeryproject.org/en/v2.3.3/userguide/periodic-tasks.html
+from datetime import timedelta
+CELERYBEAT_SCHEDULE = {
+ 'clickhouse_auto_sync': {
+ 'task': 'django_clickhouse.tasks.clickhouse_auto_sync',
+ 'schedule': timedelta(seconds=2), # Every 2 seconds
+ 'options': {'expires': 1, 'queue': CLICKHOUSE_CELERY_QUEUE}
+ }
+}
+```
+
+## Adopting django model
+Read [ClickHouseSyncModel](models.md#djangomodel) section.
+Inherit all [django models](https://docs.djangoproject.com/en/3.0/topics/db/models/)
+ you want to sync with ClickHouse from `django_clickhouse.models.ClickHouseSyncModel` or sync mixins.
+
+```python
+from django_clickhouse.models import ClickHouseSyncModel
+from django.db import models
+
+class User(ClickHouseSyncModel):
+ first_name = models.CharField(max_length=50)
+ visits = models.IntegerField(default=0)
+ birthday = models.DateField()
+```
+
+## Create ClickHouseModel
+1. Read [ClickHouseModel section](models.md#clickhousemodel)
+2. Create `clickhouse_models.py` in your django app.
+3. Add `ClickHouseModel` class there:
+```python
+from django_clickhouse.clickhouse_models import ClickHouseModel
+from django_clickhouse.engines import MergeTree
+from infi.clickhouse_orm import fields
+from my_app.models import User
+
+class ClickHouseUser(ClickHouseModel):
+ django_model = User
+ sync_delay = 5
+
+ id = fields.UInt32Field()
+ first_name = fields.StringField()
+ birthday = fields.DateField()
+ visits = fields.UInt32Field(default=0)
+
+ engine = MergeTree('birthday', ('birthday',))
+```
+
+## Migration to create table in ClickHouse
+1. Read [migrations](migrations.md) section
+2. Create `clickhouse_migrations` package in your django app
+3. Create `0001_initial.py` file inside the created package. Result structure should be:
+ ```
+ my_app
+ >> clickhouse_migrations
+ >>>> __init__.py
+ >>>> 0001_initial.py
+ >> clickhouse_models.py
+ >> models.py
+ ```
+
+4. Add content to file `0001_initial.py`:
+ ```python
+ from django_clickhouse import migrations
+ from my_app.clickhouse_models import ClickHouseUser
+
+ class Migration(migrations.Migration):
+ operations = [
+ migrations.CreateTable(ClickHouseUser)
+ ]
+ ```
+
+## Run migrations
+Call [django migrate](https://docs.djangoproject.com/en/3.0/ref/django-admin/#django-admin-migrate)
+ to apply created migration and create table in ClickHouse.
+
+## Set up and run celery sync process
+Set up [celery worker](https://docs.celeryproject.org/en/latest/userguide/workers.html#starting-the-worker) for [CLICKHOUSE_CELERY_QUEUE](configuration.md#clickhouse_celery_queue) and [celerybeat](https://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html#starting-the-scheduler).
+
+## Test sync and write analytics queries
+1. Read [monitoring section](monitoring.md) in order to set up your monitoring system.
+2. Read [query section](queries.md) to understand how to query database.
+3. Create some data in source table with django.
+4. Check if it is synced.
+
+#### Example
+```python
+import datetime
+import time
+from my_app.models import User
+from my_app.clickhouse_models import ClickHouseUser
+
+u = User.objects.create(first_name='Alice', birthday=datetime.date(1987, 1, 1), visits=1)
+
+# Wait until the celery task is executed at least once
+time.sleep(6)
+
+assert ClickHouseUser.objects.filter(id=u.id).count() == 1, "Sync is not working"
+```
+
+## Congratulations
+Tune your integration to achieve better performance if needed: [docs](performance.md).
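
Review sketch for `docs/overview.md`: the note that running the planner every 2 seconds does not mean every model is synced every 2 seconds can be pictured as below. This is only an illustration of the idea, since the synchronization docs are still marked TODO. `models_due`, the timestamps and the `ClickHouseOrder` model are hypothetical, and the fallback to `CLICKHOUSE_SYNC_DELAY` when a model defines no `sync_delay` is an assumption.

```python
def models_due(now, last_sync, sync_delays, default_delay=5):
    """Names of models whose own delay has elapsed since their last sync."""
    due = []
    for name, last in last_sync.items():
        delay = sync_delays.get(name)
        if delay is None:
            # Assumed fallback to the CLICKHOUSE_SYNC_DELAY value.
            delay = default_delay
        if now - last >= delay:
            due.append(name)
    return due


last_sync = {'ClickHouseUser': 100.0, 'ClickHouseOrder': 100.0}
sync_delays = {'ClickHouseUser': 5, 'ClickHouseOrder': 30}

# The planner ticks every 2 seconds, but nothing is due after the first tick.
print(models_due(102.0, last_sync, sync_delays))  # []
print(models_due(106.0, last_sync, sync_delays))  # ['ClickHouseUser']
```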
diff --git a/docs/performance.md b/docs/performance.md
new file mode 100644
index 0000000..ba33ada
--- /dev/null
+++ b/docs/performance.md
@@ -0,0 +1,3 @@
+# Sync performance
+
+TODO
\ No newline at end of file
diff --git a/docs/queries.md b/docs/queries.md
index d0178a8..2d66add 100644
--- a/docs/queries.md
+++ b/docs/queries.md
@@ -1,4 +1,13 @@
# Making queries
+
+## Motivation
+ClickHouse SQL language is near to standard, but does not follow it exactly ([docs](https://clickhouse.tech/docs/en/introduction/distinctive_features/#sql-support)).
+It can not be easily integrated into django query subsystem as it expects databases to support standard SQL language features like transactions and INNER/OUTER JOINS by condition.
+
+In order to fit it
+
+
+
Libraries query system extends [infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/querysets.md).
TODO
diff --git a/docs/routing.md b/docs/routing.md
index 69d399d..c1b7b9c 100644
--- a/docs/routing.md
+++ b/docs/routing.md
@@ -15,9 +15,9 @@ Unlike traditional relational databases, [ClickHouse](https://clickhouse.yandex/
3) To make system more extendable we need default routing, per model routing and router class for complex cases.
## Introduction
-All database connections are defined in [CLICKHOUSE_DATABASES](configuration.md#databases) setting.
+All database connections are defined in [CLICKHOUSE_DATABASES](configuration.md#clickhouse_databases) setting.
Each connection has it's alias name to refer with.
- If no routing is configured, [CLICKHOUSE_DEFAULT_DB_ALIAS](configuration.md#default_db_alias) is used.
+ If no routing is configured, [CLICKHOUSE_DEFAULT_DB_ALIAS](configuration.md#clickhouse_default_db_alias) is used.
## Router
Router is a class, defining 3 methods:
@@ -29,7 +29,7 @@ Router is a class, defining 3 methods:
Checks if migration `operation` should be applied in django application `app_label` on database `db_alias`.
Optional `model` field can be used to determine migrations on concrete model.
-By default [CLICKHOUSE_DATABASE_ROUTER](configuration.md#database_router) is used.
+By default [CLICKHOUSE_DATABASE_ROUTER](configuration.md#clickhouse_database_router) is used.
It gets routing information from model fields, described below.
## ClickHouseModel routing attributes
@@ -54,7 +54,8 @@ class MyModel(ClickHouseModel):
```
## Settings database in QuerySet
-Database can be set in each [QuerySet](# TODO) explicitly by using one of methods:
+
+Database can be set in each [QuerySet]() explicitly by using one of methods:
* With [infi approach](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/querysets.md#querysets): `MyModel.objects_in(db_object).filter(id__in=[1,2,3]).count()`
* With `using()` method: `MyModel.objects.filter(id__in=[1,2,3]).using(db_alias).count()`
diff --git a/docs/storages.md b/docs/storages.md
index d7e429b..e8e1f1d 100644
--- a/docs/storages.md
+++ b/docs/storages.md
@@ -49,18 +49,18 @@ Each method of abstract `Storage` class takes `kwargs` parameters, which can be
* `post_sync_failed(import_key: str, exception: Exception, **kwargs) -> None:`
Called if any exception has occurred during import process. It cleans storage after unsuccessful import.
- Note that if import process is hardly killed (with OOM, for instance) this method is not called.
+ Note that if import process is killed forcibly (by OOM killer, for instance) this method is not called.
* `flush() -> None`
*Dangerous*. Drops all data, kept by storage. It is used for cleaning up between tests.
## Predefined storages
-### RedisStorage
+### RedisStorage
This storage uses [Redis database](https://redis.io/) as intermediate storage.
To communicate with Redis it uses [redis-py](https://redis-py.readthedocs.io/en/latest/) library.
It is not required, but should be installed to use RedisStorage.
-In order to use RedisStorage you must also fill [CLICKHOUSE_REDIS_CONFIG](configuration.md#redis_config) parameter.
+In order to use RedisStorage you must also fill [CLICKHOUSE_REDIS_CONFIG](configuration.md#clickhouse_redis_config) parameter.
Stored operation contains:
* Django database alias where original record can be found.
diff --git a/docs/synchronization.md b/docs/synchronization.md
index ef0d0ee..3acc749 100644
--- a/docs/synchronization.md
+++ b/docs/synchronization.md
@@ -1 +1,3 @@
# Synchronization
+
+TODO
\ No newline at end of file
diff --git a/src/django_clickhouse/models.py b/src/django_clickhouse/models.py
index f430275..44c0f34 100644
--- a/src/django_clickhouse/models.py
+++ b/src/django_clickhouse/models.py
@@ -188,7 +188,7 @@ class ClickHouseSyncModel(DjangoModel):
@receiver(post_save)
def post_save(sender, instance, **kwargs):
- statsd.incr('clickhouse.sync.post_save'.format('post_save'), 1)
+ statsd.incr('%s.sync.post_save' % config.STATSD_PREFIX, 1)
if issubclass(sender, ClickHouseSyncModel):
instance.post_save(kwargs.get('created', False), using=kwargs.get('using'))