# Usage overview ## Requirements At the begging I expect, that you already have: 1. [ClickHouse](https://clickhouse.tech/docs/en/) (with [ZooKeeper](https://zookeeper.apache.org/), if you use replication) 2. Relational database used with [Django](https://www.djangoproject.com/). For instance, [PostgreSQL](https://www.postgresql.org/) 3. [Django database set up](https://docs.djangoproject.com/en/3.0/ref/databases/) 4. [Intermediate storage](storages.md) set up. For instance, [Redis](https://redis.io/). ## Configuration Add required parameters to [Django settings.py](https://docs.djangoproject.com/en/3.0/topics/settings/): 1. Add `'django_clickhouse'` to `INSTALLED_APPS` 2. [CLICKHOUSE_DATABASES](configuration.md#clickhouse_databases) 3. [Intermediate storage](storages.md) configuration. For instance, [RedisStorage](storages.md#redisstorage) 4. It's recommended to change [CLICKHOUSE_CELERY_QUEUE](configuration.md#clickhouse_celery_queue) 5. Add sync task to [celerybeat schedule](http://docs.celeryproject.org/en/v2.3.3/userguide/periodic-tasks.html). Note, that executing planner every 2 seconds doesn't mean sync is executed every 2 seconds. Sync time depends on model sync_delay attribute value and [CLICKHOUSE_SYNC_DELAY](configuration.md#clickhouse_sync_delay) configuration parameter. You can read more in [sync section](synchronization.md). You can also change other [configuration parameters](configuration.md) depending on your project. #### Example * if you already have a celery work flow: ```python # django-clickhouse library setup CLICKHOUSE_DATABASES = { # Connection name to refer in using(...) method 'default': { 'db_name': 'test', 'username': 'default', 'password': '' } } CLICKHOUSE_REDIS_CONFIG = { 'host': '127.0.0.1', 'port': 6379, 'db': 8, 'socket_timeout': 10 } CLICKHOUSE_CELERY_QUEUE = 'clickhouse' # If you have no any celerybeat tasks, define a new dictionary # More info: http://docs.celeryproject.org/en/v2.3.3/userguide/periodic-tasks.html from datetime import timedelta CELERYBEAT_SCHEDULE = { 'clickhouse_auto_sync': { 'task': 'django_clickhouse.tasks.clickhouse_auto_sync', 'schedule': timedelta(seconds=2), # Every 2 seconds 'options': {'expires': 1, 'queue': CLICKHOUSE_CELERY_QUEUE} } } ``` * if you don't have a celery workflow: create a `celery.py` file in `mysite/mysite/celery.py`: ```python import os from datetime import timedelta from celery import Celery # set the default Django settings module for the 'celery' program. os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings") app = Celery("podafarini") # Using a string here means the worker doesn't have to serialize # the configuration object to child processes. # - namespace='CELERY' means all celery-related configuration keys # should have a `CELERY_` prefix. app.config_from_object("django.conf:settings", namespace="CELERY") # Load task modules from all registered Django app configs. app.autodiscover_tasks() app.conf.beat_schedule = { "clickhouse_auto_sync": { "task": "django_clickhouse.tasks.clickhouse_auto_sync", "schedule": timedelta(seconds=2), # Every 2 seconds "options": {"expires": 1, "queue": "clickhouse"}, } } ``` `mysite/mysite/__init__.py`: ```python # This will make sure the app is always imported when # Django starts so that shared_task will use this app. from .celery import app as celery_app __all__ = ("celery_app",) ``` ## Adopting django model Read [ClickHouseSyncModel](models.md#djangomodel) section. Inherit all [django models](https://docs.djangoproject.com/en/3.0/topics/db/models/) you want to sync with ClickHouse from `django_clickhouse.models.ClickHouseSyncModel` or sync mixins. ```python from django_clickhouse.models import ClickHouseSyncModel from django.db import models class User(ClickHouseSyncModel): first_name = models.CharField(max_length=50) visits = models.IntegerField(default=0) birthday = models.DateField() ``` ## Create ClickHouseModel 1. Read [ClickHouseModel section](models.md#clickhousemodel) 2. Create `clickhouse_models.py` in your django app. 3. Add `ClickHouseModel` class there: ```python from django_clickhouse.clickhouse_models import ClickHouseModel from django_clickhouse.engines import MergeTree from infi.clickhouse_orm import fields from my_app.models import User class ClickHouseUser(ClickHouseModel): django_model = User # Uncomment the line below if you want your models to sync between databases # need_sync = True id = fields.UInt32Field() first_name = fields.StringField() birthday = fields.DateField() visits = fields.UInt32Field(default=0) engine = MergeTree('birthday', ('birthday',)) ``` ## Migration to create table in ClickHouse 1. Read [migrations](migrations.md) section 2. Create `clickhouse_migrations` package in your django app 3. Create `0001_initial.py` file inside the created package. Result structure should be: ``` my_app | clickhouse_migrations |-- __init__.py |-- 0001_initial.py | clickhouse_models.py | models.py ``` 4. Add content to file `0001_initial.py`: ```python from django_clickhouse import migrations from my_app.cilckhouse_models import ClickHouseUser class Migration(migrations.Migration): operations = [ migrations.CreateTable(ClickHouseUser) ] ``` ## Run migrations Call [django migrate](https://docs.djangoproject.com/en/3.0/ref/django-admin/#django-admin-migrate) to apply created migration and create table in ClickHouse. ## Set up and run celery sync process Set up [celery worker](https://docs.celeryproject.org/en/latest/userguide/workers.html#starting-the-worker) for [CLICKHOUSE_CELERY_QUEUE](configuration.md#clickhouse_celery_queue) and [celerybeat](https://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html#starting-the-scheduler). ## Test sync and write analytics queries 1. Read [monitoring section](monitoring.md) in order to set up your monitoring system. 2. Read [query section](queries.md) to understand how to query database. 2. Create some data in source table with django. 3. Check, if it is synced. #### Example ```python import time from my_app.models import User from my_app.clickhouse_models import ClickHouseUser u = User.objects.create(first_name='Alice', birthday=datetime.date(1987, 1, 1), visits=1) # Wait for celery task is executed at list once time.sleep(6) assert ClickHouseUser.objects.filter(id=u.id).count() == 1, "Sync is not working" ``` ## Congratulations Tune your integration to achieve better performance if needed: [docs](performance.md).