django-clickhouse/docs/overview.md
2021-03-15 13:25:11 +05:00

Usage overview

Requirements

To begin with, I expect that you already have:

  1. ClickHouse (with ZooKeeper, if you use replication)
  2. Relational database used with Django. For instance, PostgreSQL
  3. Django database set up
  4. Intermediate storage set up. For instance, Redis
  5. Celery set up in order to sync data automatically.

Configuration

Add required parameters to Django settings.py:

  1. Add 'django_clickhouse' to INSTALLED_APPS
  2. CLICKHOUSE_DATABASES
  3. Intermediate storage configuration. For instance, RedisStorage
  4. It's recommended to change CLICKHOUSE_CELERY_QUEUE
  5. Add sync task to celerybeat schedule.
    Note that running the planner every 2 seconds does not mean a sync is executed every 2 seconds. Sync timing depends on the model's sync_delay attribute and the CLICKHOUSE_SYNC_DELAY configuration parameter. You can read more in the sync section.
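
The difference between the planner interval and the actual sync interval can be illustrated with a small standalone sketch. It is a simplified model, not the library's implementation: the helper `is_sync_due` and its arguments are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta


def is_sync_due(last_sync: datetime, now: datetime, sync_delay: int) -> bool:
    """Return True once at least sync_delay seconds have passed since last_sync.

    Hypothetical helper: it only models the idea that the planner tick
    and the per-model delay are independent of each other.
    """
    return now - last_sync >= timedelta(seconds=sync_delay)


# The planner wakes up every 2 seconds, but with sync_delay=5 a model
# is only synced on the ticks where the delay has already elapsed.
start = datetime(2021, 1, 1)
ticks = [start + timedelta(seconds=s) for s in range(0, 14, 2)]

last_sync = start
synced_at = []  # seconds since start at which a sync would run
for tick in ticks:
    if is_sync_due(last_sync, tick, sync_delay=5):
        synced_at.append((tick - start).seconds)
        last_sync = tick
```

In this model the planner ticks seven times, yet a sync would only run on some of those ticks, because the delay has to elapse first.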

You can also change other configuration parameters depending on your project.

Example

INSTALLED_APPS = (
    # Your apps may go here
    'django_clickhouse',
    # Your apps may go here
)

# django-clickhouse library setup
CLICKHOUSE_DATABASES = {
    # Connection name to refer to in the using(...) method
    'default': {
        'db_name': 'test',
        'username': 'default',
        'password': ''
    }
}
CLICKHOUSE_REDIS_CONFIG = {
    'host': '127.0.0.1',
    'port': 6379,
    'db': 8,
    'socket_timeout': 10
}
CLICKHOUSE_CELERY_QUEUE = 'clickhouse'

# If you have no celerybeat tasks yet, define a new dictionary
# More info: http://docs.celeryproject.org/en/v2.3.3/userguide/periodic-tasks.html
from datetime import timedelta
CELERYBEAT_SCHEDULE = {
    'clickhouse_auto_sync': {
        'task': 'django_clickhouse.tasks.clickhouse_auto_sync',
        'schedule': timedelta(seconds=2),  # Every 2 seconds
        'options': {'expires': 1, 'queue': CLICKHOUSE_CELERY_QUEUE}
    }
}
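
If your settings already define a CELERYBEAT_SCHEDULE dictionary, the sync entry can be added to it instead of redefining the whole dictionary. A minimal sketch, assuming an existing entry named 'my_existing_task' (a hypothetical name) and the same queue name as in the example above:

```python
from datetime import timedelta

CLICKHOUSE_CELERY_QUEUE = 'clickhouse'

# Hypothetical schedule that the project already defines
CELERYBEAT_SCHEDULE = {
    'my_existing_task': {
        'task': 'my_app.tasks.my_existing_task',
        'schedule': timedelta(minutes=5),
    }
}

# Add the sync planner without overwriting the existing entries
CELERYBEAT_SCHEDULE['clickhouse_auto_sync'] = {
    'task': 'django_clickhouse.tasks.clickhouse_auto_sync',
    'schedule': timedelta(seconds=2),  # Every 2 seconds
    'options': {'expires': 1, 'queue': CLICKHOUSE_CELERY_QUEUE},
}
```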

Adapting a Django model

Read the ClickHouseSyncModel section. Make every Django model you want to sync with ClickHouse inherit from django_clickhouse.models.ClickHouseSyncModel or from the sync mixins.

from django_clickhouse.models import ClickHouseSyncModel
from django.db import models

class User(ClickHouseSyncModel):
    first_name = models.CharField(max_length=50)
    visits = models.IntegerField(default=0)
    birthday = models.DateField()

Create ClickHouseModel

  1. Read the ClickHouseModel section.
  2. Create clickhouse_models.py in your Django app.
  3. Add a ClickHouseModel class there:
from django_clickhouse.clickhouse_models import ClickHouseModel
from django_clickhouse.engines import MergeTree
from infi.clickhouse_orm import fields
from my_app.models import User

class ClickHouseUser(ClickHouseModel):
    django_model = User
    
    # Uncomment the line below if you want your models to be synced automatically
    # sync_enabled = True
    
    id = fields.UInt32Field()
    first_name = fields.StringField()
    birthday = fields.DateField()
    visits = fields.UInt32Field(default=0)

    engine = MergeTree('birthday', ('birthday',))

Migration to create table in ClickHouse

  1. Read the migrations section

  2. Create a clickhouse_migrations package in your Django app

  3. Create a 0001_initial.py file inside the created package. The resulting structure should be:

    my_app
    | clickhouse_migrations
    |-- __init__.py
    |-- 0001_initial.py
    | clickhouse_models.py
    | models.py
    
  4. Add content to file 0001_initial.py:

    from django_clickhouse import migrations
    from my_app.clickhouse_models import ClickHouseUser
    
    class Migration(migrations.Migration):
        operations = [
            migrations.CreateTable(ClickHouseUser)
        ]
    

Run migrations

Run the Django migrate command to apply the created migration and create the table in ClickHouse.

Set up and run celery sync process

Set up a Celery worker that consumes the CLICKHOUSE_CELERY_QUEUE queue, and run celerybeat.

Test sync and write analytics queries

  1. Read the monitoring section in order to set up your monitoring system.
  2. Read the query section to understand how to query the database.
  3. Create some data in the source table with Django.
  4. Check whether it is synced.

Example

import datetime
import time
from my_app.models import User
from my_app.clickhouse_models import ClickHouseUser

u = User.objects.create(first_name='Alice', birthday=datetime.date(1987, 1, 1), visits=1)

# Wait until the celery task has been executed at least once
time.sleep(6)

assert ClickHouseUser.objects.filter(id=u.id).count() == 1, "Sync is not working"
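
A fixed sleep can make this check flaky when the worker is slow or the sync delay is longer than expected. One alternative is to poll until the row appears or a timeout expires. The helper below is a hypothetical sketch and not part of django-clickhouse; `wait_until`, `timeout` and `interval` are names chosen for illustration.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.5) -> bool:
    """Poll predicate until it returns True or timeout seconds have passed.

    Returns True if the predicate succeeded in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With the models from the example above, the check could then be written as `assert wait_until(lambda: ClickHouseUser.objects.filter(id=u.id).count() == 1), "Sync is not working"`.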

Congratulations

Tune your integration to achieve better performance if needed: docs.