mirror of
https://github.com/carrotquest/django-clickhouse.git
synced 2024-11-14 05:16:34 +03:00
Usage overview
Requirements
To begin with, I assume that you already have:
- ClickHouse (with ZooKeeper, if you use replication)
- Relational database used with Django. For instance, PostgreSQL
- Django database set up
- Intermediate storage set up. For instance, Redis.
Configuration
Add required parameters to Django settings.py:
- CLICKHOUSE_DATABASES
- Intermediate storage configuration. For instance, RedisStorage
- It's recommended to change CLICKHOUSE_CELERY_QUEUE
- Add sync task to celerybeat schedule.
Note that running the planner every 2 seconds doesn't mean a sync is executed every 2 seconds. The actual sync time depends on the model's sync_delay attribute and the CLICKHOUSE_SYNC_DELAY configuration parameter. You can read more in the sync section.
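The difference between how often the planner runs and how often a model is actually synced can be illustrated with a minimal sketch. This is not the library's implementation: the function name due_models, its arguments, and the default of 5 seconds (standing in for CLICKHOUSE_SYNC_DELAY) are assumptions made only for illustration.

```python
from datetime import datetime, timedelta


def due_models(last_sync, sync_delays, now, default_delay=5):
    """Return the names of models whose sync delay has elapsed.

    Hypothetical helper: last_sync maps a model name to the time of its
    last sync, sync_delays maps a model name to its sync_delay in seconds,
    and default_delay is an illustrative stand-in for CLICKHOUSE_SYNC_DELAY.
    """
    due = []
    for name, synced_at in last_sync.items():
        delay = timedelta(seconds=sync_delays.get(name, default_delay))
        # The planner may run every 2 seconds, but a model is only
        # picked up once its own delay has passed.
        if now - synced_at >= delay:
            due.append(name)
    return due
```

With a 2-second planner interval and a 5-second delay, most planner runs would select nothing for a given model, and only every second or third run would schedule a sync.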
You can also change other configuration parameters depending on your project.
Example
# django-clickhouse library setup
CLICKHOUSE_DATABASES = {
    # Connection name to refer to in the using(...) method
    'default': {
        'db_name': 'test',
        'username': 'default',
        'password': ''
    }
}

CLICKHOUSE_REDIS_CONFIG = {
    'host': '127.0.0.1',
    'port': 6379,
    'db': 8,
    'socket_timeout': 10
}

CLICKHOUSE_CELERY_QUEUE = 'clickhouse'

# If you don't have any celerybeat tasks yet, define a new dictionary
# More info: http://docs.celeryproject.org/en/v2.3.3/userguide/periodic-tasks.html
from datetime import timedelta
CELERYBEAT_SCHEDULE = {
    'clickhouse_auto_sync': {
        'task': 'django_clickhouse.tasks.clickhouse_auto_sync',
        'schedule': timedelta(seconds=2),  # Every 2 seconds
        'options': {'expires': 1, 'queue': CLICKHOUSE_CELERY_QUEUE}
    }
}
Adapting Django model
Read the ClickHouseSyncModel section.
Inherit all Django models you want to sync with ClickHouse from django_clickhouse.models.ClickHouseSyncModel or the sync mixins.
from django_clickhouse.models import ClickHouseSyncModel
from django.db import models

class User(ClickHouseSyncModel):
    first_name = models.CharField(max_length=50)
    visits = models.IntegerField(default=0)
    birthday = models.DateField()
Create ClickHouseModel
- Read the ClickHouseModel section
- Create clickhouse_models.py in your Django app.
- Add a ClickHouseModel class there:
from django_clickhouse.clickhouse_models import ClickHouseModel
from django_clickhouse.engines import MergeTree
from infi.clickhouse_orm import fields
from my_app.models import User

class ClickHouseUser(ClickHouseModel):
    django_model = User
    sync_delay = 5

    id = fields.UInt32Field()
    first_name = fields.StringField()
    birthday = fields.DateField()
    visits = fields.UInt32Field(default=0)

    engine = MergeTree('birthday', ('birthday',))
Migration to create table in ClickHouse
- Read the migrations section
- Create a clickhouse_migrations package in your Django app
- Create a 0001_initial.py file inside the created package. The resulting structure should be:
my_app
>> clickhouse_migrations
>>>> __init__.py
>>>> 0001_initial.py
>> clickhouse_models.py
>> models.py
- Add content to the file 0001_initial.py:
from django_clickhouse import migrations
from my_app.clickhouse_models import ClickHouseUser

class Migration(migrations.Migration):
    operations = [
        migrations.CreateTable(ClickHouseUser)
    ]
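If you prefer to create the package layout from the steps above with a script rather than by hand, a minimal sketch might look like the following. The helper name scaffold_clickhouse_migrations is hypothetical and not part of django-clickhouse; it only creates empty files and leaves the migration content for you to fill in.

```python
from pathlib import Path


def scaffold_clickhouse_migrations(app_dir):
    """Create the clickhouse_migrations package inside app_dir.

    Hypothetical helper for illustration only. Existing files are left
    untouched; the names of newly created files are returned.
    """
    package = Path(app_dir) / "clickhouse_migrations"
    package.mkdir(parents=True, exist_ok=True)

    created = []
    for name in ("__init__.py", "0001_initial.py"):
        path = package / name
        if not path.exists():
            path.touch()  # empty file; add the Migration class yourself
            created.append(name)
    return created
```

Calling scaffold_clickhouse_migrations('my_app') a second time returns an empty list, so the helper is safe to re-run.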
Run migrations
Run Django's migrate command to apply the created migration and create the table in ClickHouse.
Set up and run celery sync process
Set up celery worker for CLICKHOUSE_CELERY_QUEUE and celerybeat.
Test sync and write analytics queries
- Read the monitoring section in order to set up your monitoring system.
- Read the query section to understand how to query the database.
- Create some data in the source table with Django.
- Check that it is synced.
Example
import datetime
import time

from my_app.models import User
from my_app.clickhouse_models import ClickHouseUser

u = User.objects.create(first_name='Alice', birthday=datetime.date(1987, 1, 1), visits=1)

# Wait until the celery task has been executed at least once
time.sleep(6)

assert ClickHouseUser.objects.filter(id=u.id).count() == 1, "Sync is not working"
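A fixed time.sleep(6) can be either too short on a busy worker or needlessly long in tests. As one possible alternative, the check can poll until the row appears or a timeout expires. The wait_until helper below is a generic sketch written for this guide, not something django-clickhouse provides, and the timeout and interval values are arbitrary.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate repeatedly until it returns True or timeout expires.

    Hypothetical helper: returns True as soon as predicate() is truthy,
    or False if the timeout (in seconds) is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Possible usage with the models from the example above:
# synced = wait_until(lambda: ClickHouseUser.objects.filter(id=u.id).count() == 1)
# assert synced, "Sync is not working"
```

The predicate is always evaluated at least once, so a row that is already synced is detected without any waiting.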
Congratulations
Tune your integration to achieve better performance if needed: docs.