django-clickhouse/docs/overview.md
Mohammadreza Varasteh 41060440d6 ADD some info to docs
2021-03-14 11:05:53 +03:30

Usage overview

Requirements

Before you begin, you should already have:

  1. ClickHouse (with ZooKeeper, if you use replication)
  2. A relational database used with Django, for instance PostgreSQL
  3. The Django database set up
  4. Intermediate storage set up, for instance Redis

Configuration

Add the required parameters to Django settings.py:

  1. Add django_clickhouse to INSTALLED_APPS
  2. Set CLICKHOUSE_DATABASES
  3. Configure intermediate storage, for instance RedisStorage
  4. Change CLICKHOUSE_CELERY_QUEUE (recommended)
  5. Add the sync task to the celerybeat schedule.
    Note that running the planner every 2 seconds doesn't mean a sync is executed every 2 seconds. Sync timing depends on the model's sync_delay attribute and the CLICKHOUSE_SYNC_DELAY configuration parameter. You can read more in the sync section.

You can also change other configuration parameters depending on your project.
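
To make the timing note above concrete, here is a minimal sketch of the idea. It is not django-clickhouse's actual implementation: the function `due_models` and the model names and delays are hypothetical, chosen only to show that a planner ticking every 2 seconds syncs a model only once that model's own delay has elapsed.

```python
def due_models(now, last_sync, sync_delay):
    """Return the models whose sync delay has elapsed at time `now`.

    last_sync:  {model_name: timestamp of the last sync, in seconds}
    sync_delay: {model_name: minimum seconds between syncs}
    """
    return [
        name
        for name, delay in sync_delay.items()
        if now - last_sync.get(name, float("-inf")) >= delay
    ]


# Hypothetical models: one syncs at most every 5 seconds, the other every 10.
sync_delay = {"ClickHouseUser": 5, "ClickHouseOrder": 10}
last_sync = {}
history = []

# Simulate planner ticks every 2 seconds for 12 seconds.
for now in range(0, 13, 2):
    due = due_models(now, last_sync, sync_delay)
    for name in due:
        last_sync[name] = now
    history.append((now, due))

for now, due in history:
    print(now, due)
```

In this simulation the planner ticks seven times, yet the first model is synced three times and the second only twice.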

Example

  • If you already have a Celery workflow:

    # django-clickhouse library setup
    CLICKHOUSE_DATABASES = {
        # Connection name to refer to in the using(...) method
        'default': {
            'db_name': 'test',
            'username': 'default',
            'password': ''
        }
    }
    CLICKHOUSE_REDIS_CONFIG = {
        'host': '127.0.0.1',
        'port': 6379,
        'db': 8,
        'socket_timeout': 10
    }
    CLICKHOUSE_CELERY_QUEUE = 'clickhouse'
    
    # If you have no celerybeat tasks yet, define a new dictionary
    # More info: http://docs.celeryproject.org/en/v2.3.3/userguide/periodic-tasks.html
    from datetime import timedelta
    CELERYBEAT_SCHEDULE = {
        'clickhouse_auto_sync': {
            'task': 'django_clickhouse.tasks.clickhouse_auto_sync',
            'schedule': timedelta(seconds=2),  # Every 2 seconds
            'options': {'expires': 1, 'queue': CLICKHOUSE_CELERY_QUEUE}
        }
    }
    
  • If you don't have a Celery workflow:

    Create a celery.py file at mysite/mysite/celery.py:

    import os
    from datetime import timedelta
    from celery import Celery
    
    # set the default Django settings module for the 'celery' program.
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")
    
    app = Celery("mysite")
    
    # Using a string here means the worker doesn't have to serialize
    # the configuration object to child processes.
    # - namespace='CELERY' means all celery-related configuration keys
    #   should have a `CELERY_` prefix.
    app.config_from_object("django.conf:settings", namespace="CELERY")
    # Load task modules from all registered Django app configs.
    app.autodiscover_tasks()
    app.conf.beat_schedule = {
        "clickhouse_auto_sync": {
            "task": "django_clickhouse.tasks.clickhouse_auto_sync",
            "schedule": timedelta(seconds=2),  # Every 2 seconds
            "options": {"expires": 1, "queue": "clickhouse"},
        }
    }
    

    Then import the app in mysite/mysite/__init__.py:

    # This will make sure the app is always imported when
    # Django starts so that shared_task will use this app.
    from .celery import app as celery_app
    
    __all__ = ("celery_app",)
    
    

Adopting django model

Read the ClickHouseSyncModel section. Every Django model you want to sync with ClickHouse should inherit from django_clickhouse.models.ClickHouseSyncModel or use the sync mixins.

from django_clickhouse.models import ClickHouseSyncModel
from django.db import models

class User(ClickHouseSyncModel):
    first_name = models.CharField(max_length=50)
    visits = models.IntegerField(default=0)
    birthday = models.DateField()

Create ClickHouseModel

  1. Read the ClickHouseModel section.
  2. Create clickhouse_models.py in your Django app.
  3. Add a ClickHouseModel class there:

from django_clickhouse.clickhouse_models import ClickHouseModel
from django_clickhouse.engines import MergeTree
from infi.clickhouse_orm import fields
from my_app.models import User

class ClickHouseUser(ClickHouseModel):
    django_model = User
    # Uncomment the line below if you want your models to sync between databases
    # need_sync = True
    id = fields.UInt32Field()
    first_name = fields.StringField()
    birthday = fields.DateField()
    visits = fields.UInt32Field(default=0)

    engine = MergeTree('birthday', ('birthday',))

Migration to create table in ClickHouse

  1. Read the migrations section.

  2. Create a clickhouse_migrations package in your Django app.

  3. Create a 0001_initial.py file inside the created package. The resulting structure should be:

    my_app
    | clickhouse_migrations
    |-- __init__.py
    |-- 0001_initial.py
    | clickhouse_models.py
    | models.py
    
  4. Add the following content to 0001_initial.py:

    from django_clickhouse import migrations
    from my_app.clickhouse_models import ClickHouseUser
    
    class Migration(migrations.Migration):
        operations = [
            migrations.CreateTable(ClickHouseUser)
        ]
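
The package layout from step 3 can also be created with a short script. This is a sketch under assumptions: the helper `scaffold_clickhouse_migrations` is hypothetical and not part of django-clickhouse, and it only creates empty placeholder files, so the content of 0001_initial.py still has to be filled in as shown in step 4.

```python
import tempfile
from pathlib import Path


def scaffold_clickhouse_migrations(app_dir):
    """Create the clickhouse_migrations package inside `app_dir` if missing."""
    package = Path(app_dir) / "clickhouse_migrations"
    package.mkdir(parents=True, exist_ok=True)
    created = []
    for filename in ("__init__.py", "0001_initial.py"):
        path = package / filename
        if not path.exists():
            path.touch()  # empty placeholder; fill in 0001_initial.py by hand
            created.append(path.name)
    return created


# Demonstrate on a throwaway directory rather than a real app.
with tempfile.TemporaryDirectory() as tmp:
    app_dir = Path(tmp) / "my_app"
    first_run = scaffold_clickhouse_migrations(app_dir)
    second_run = scaffold_clickhouse_migrations(app_dir)
    print(first_run, second_run)
```

Running the helper a second time creates nothing, so it is safe to call on an app that already has the package.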
    

Run migrations

Run Django's migrate command to apply the created migration and create the table in ClickHouse.

Set up and run celery sync process

Set up a Celery worker for CLICKHOUSE_CELERY_QUEUE and start celerybeat.
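
As a rough illustration, the snippet below builds the two command lines this step usually needs: a worker listening on the ClickHouse queue and the beat scheduler. It assumes the Celery app lives in the `mysite` package, as in the configuration example, and that CLICKHOUSE_CELERY_QUEUE is set to 'clickhouse'. The helper `celery_commands` is hypothetical; `-A`, `-Q` and `-l` are standard Celery command-line options.

```python
import shlex


def celery_commands(app="mysite", queue="clickhouse", loglevel="INFO"):
    """Build argv lists for a worker bound to `queue` and for the beat scheduler."""
    base = ["celery", "-A", app]
    return {
        "worker": base + ["worker", "-Q", queue, "-l", loglevel],
        "beat": base + ["beat", "-l", loglevel],
    }


for name, argv in celery_commands().items():
    # shlex.join produces a shell-safe command line to copy into a terminal
    # or a process manager configuration.
    print(f"{name}: {shlex.join(argv)}")
```

Each printed command is meant to run as its own long-lived process, for example in separate terminals or under a process manager.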

Test sync and write analytics queries

  1. Read the monitoring section to set up your monitoring system.
  2. Read the query section to understand how to query the database.
  3. Create some data in the source table with Django.
  4. Check that it has been synced.

Example

import datetime
import time
from my_app.models import User
from my_app.clickhouse_models import ClickHouseUser

u = User.objects.create(first_name='Alice', birthday=datetime.date(1987, 1, 1), visits=1)

# Wait until the celery task has been executed at least once
time.sleep(6)

assert ClickHouseUser.objects.filter(id=u.id).count() == 1, "Sync is not working"
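
A fixed sleep can be too short on a busy system and needlessly long on a fast one. A possible alternative, sketched here with a hypothetical `wait_until` helper that is not part of django-clickhouse, is to poll the check until it passes or a timeout expires. The stand-in predicate below exists only so the sketch runs on its own.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll `predicate` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in check so the sketch runs on its own; in a project this would be
# something like: lambda: ClickHouseUser.objects.filter(id=u.id).count() == 1
calls = {"n": 0}


def synced_after_three_checks():
    calls["n"] += 1
    return calls["n"] >= 3


ok = wait_until(synced_after_three_checks, timeout=2.0, interval=0.01)
print("synced" if ok else "timed out")
```

In a real check, the timeout should comfortably exceed the model's sync delay plus the planner interval.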

Congratulations

Tune your integration to achieve better performance if needed: docs.