mirror of
https://github.com/carrotquest/django-clickhouse.git
synced 2024-11-22 09:06:43 +03:00
1) Started writing the docs
2)
parent
10fae9220c
commit
c2beabbc07
35
docs/basic_information.md
Normal file
@@ -0,0 +1,35 @@
# Basic information
## <a name="about">About</a>
This project's goal is to integrate the [Yandex ClickHouse](https://clickhouse.yandex/) database into a [Django](https://www.djangoproject.com/) project.
It is based on the [infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhouse_orm) library.

## <a name="features">Features</a>
* Multiple ClickHouse database configuration in [settings.py](https://docs.djangoproject.com/en/2.1/ref/settings/).
* ORM to create and manage ClickHouse models.
* ClickHouse migration system.
* Scalable serialization of Django model instances to ORM model instances.
* Efficient periodic synchronization of Django models to ClickHouse without losing data.
* Synchronization process monitoring.

## <a name="requirements">Requirements</a>
* [Python 3](https://www.python.org/downloads/)
* [Django](https://docs.djangoproject.com/) 1.7+
* [Yandex ClickHouse](https://clickhouse.yandex/)
* [infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhouse_orm)
* pytz
* six
* typing
* psycopg2
* celery
* statsd

### Optional libraries
* [redis-py](https://redis-py.readthedocs.io/en/latest/) for [RedisStorage](storages.md#redis_storage)
* [django-pg-returning](https://travis-ci.com/M1hacka/django-pg-returning)
  for optimizing the registration of updates in [PostgreSQL](https://www.postgresql.org/)

## <a name="installation">Installation</a>
Install via pip:
`pip install django-clickhouse` ([not released yet](https://github.com/carrotquest/django-clickhouse/issues/3))
or via setup.py:
`python setup.py install`
96
docs/configuration.md
Normal file
@@ -0,0 +1,96 @@
# Configuration

Library configuration is done in settings.py. All parameter names start with the `CLICKHOUSE_` prefix.
The prefix can be changed using the `CLICKHOUSE_SETTINGS_PREFIX` parameter.

### <a name="settings_prefix">CLICKHOUSE_SETTINGS_PREFIX</a>
Defaults to: `'CLICKHOUSE_'`
Use this parameter to change the `CLICKHOUSE_` prefix in settings to anything you like.

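A minimal sketch of how a prefixed lookup could behave. The helper `get_library_setting` and the `SimpleNamespace` stand-in for `django.conf.settings` are hypothetical and only illustrate the idea; the sketch also assumes that the prefix parameter itself is always read under its full `CLICKHOUSE_SETTINGS_PREFIX` name.

```python
from types import SimpleNamespace

DEFAULT_PREFIX = 'CLICKHOUSE_'


def get_library_setting(settings, name, default=None):
    """Look up a library parameter under the configured prefix (hypothetical helper)."""
    # Assumption: the prefix parameter is always read under its full name.
    prefix = getattr(settings, 'CLICKHOUSE_SETTINGS_PREFIX', DEFAULT_PREFIX)
    return getattr(settings, prefix + name, default)


# Stand-in for django.conf.settings with a custom prefix.
settings = SimpleNamespace(
    CLICKHOUSE_SETTINGS_PREFIX='CH_',
    CH_SYNC_BATCH_SIZE=500,
)

# Found under the custom prefix.
batch_size = get_library_setting(settings, 'SYNC_BATCH_SIZE', 10000)
# Not set anywhere, so the supplied default is used.
sync_delay = get_library_setting(settings, 'SYNC_DELAY', 5)
```

With the prefix set to `'CH_'`, a parameter documented here as `CLICKHOUSE_SYNC_BATCH_SIZE` would be written as `CH_SYNC_BATCH_SIZE` in settings.py.
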
### <a name="databases">CLICKHOUSE_DATABASES</a>
Defaults to: `{}`
A dictionary that defines databases in a Django-like style.
<!--- TODO Add link --->
Each key is an alias used to refer to the database in [connections]() and [using]().
Each value is a configuration dict with the following parameters:
* [infi.clickhouse_orm database parameters](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/class_reference.md#database)
<!--- TODO Add link --->
* `migrate: bool` - indicates whether this database should be migrated. See [migrations]().

Example:
```python
CLICKHOUSE_DATABASES = {
    'default': {
        'db_name': 'test',
        'username': 'default',
        'password': ''
    }
}
```

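As a further illustration, here is a sketch of a configuration with two aliases, one of which opts out of migrations. The alias `replica`, the database name and the host are made up for the example, and the sketch assumes that `migrate` is treated as `True` when omitted; `db_url` is one of the infi.clickhouse_orm database parameters.

```python
CLICKHOUSE_DATABASES = {
    # Primary database: migrations are applied here.
    'default': {
        'db_name': 'analytics',
        'db_url': 'http://localhost:8123/',
        'username': 'default',
        'password': '',
        'migrate': True,
    },
    # Hypothetical second database whose schema is managed elsewhere.
    'replica': {
        'db_name': 'analytics',
        'db_url': 'http://replica.example.com:8123/',
        'username': 'default',
        'password': '',
        'migrate': False,
    },
}

# Aliases that would be migrated (assuming `migrate` defaults to True).
migrated_aliases = [
    alias for alias, conf in CLICKHOUSE_DATABASES.items() if conf.get('migrate', True)
]
```
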
### <a name="default_db_alias">CLICKHOUSE_DEFAULT_DB_ALIAS</a>
Defaults to: `'default'`
<!--- TODO Add link --->
The database alias used in [QuerySets]() when [using]() is not specified explicitly.

### <a name="sync_storage">CLICKHOUSE_SYNC_STORAGE</a>
Defaults to: `'django_clickhouse.storages.RedisStorage'`
The intermediate storage class to use. Can be a string (a dotted path, as in the default) or a class. [More info about storages](storages.md).

### <a name="redis_config">CLICKHOUSE_REDIS_CONFIG</a>
Defaults to: `None`
Redis configuration for [RedisStorage](storages.md#redis_storage).
If given, it should be a dictionary of parameters to pass to [redis-py](https://redis-py.readthedocs.io/en/latest/#redis.Redis).

Example:
```python
CLICKHOUSE_REDIS_CONFIG = {
    'host': '127.0.0.1',
    'port': 6379,
    'db': 8,
    'socket_timeout': 10
}
```

### <a name="sync_batch_size">CLICKHOUSE_SYNC_BATCH_SIZE</a>
Defaults to: `10000`
The maximum number of operations the sync process fetches from intermediate storage per sync round.

### <a name="sync_delay">CLICKHOUSE_SYNC_DELAY</a>
Defaults to: `5`
The delay, in seconds, between the starts of two consecutive sync rounds.

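Taken together, the batch size and the delay bound how fast a backlog can be drained. The arithmetic below is a rough sketch, assuming one batch per round and that every round finishes within the delay; the helper names are illustrative and not part of the library.

```python
import math

SYNC_BATCH_SIZE = 10000  # default CLICKHOUSE_SYNC_BATCH_SIZE
SYNC_DELAY = 5           # default CLICKHOUSE_SYNC_DELAY, in seconds


def max_ops_per_second(batch_size, delay):
    """Upper bound on operations synced per second (one batch per round)."""
    return batch_size / delay


def seconds_to_drain(backlog, batch_size, delay):
    """Rough upper estimate of the time to drain a backlog if nothing new arrives."""
    rounds = math.ceil(backlog / batch_size)
    # Counts a full delay for every round, including the last one.
    return rounds * delay


rate = max_ops_per_second(SYNC_BATCH_SIZE, SYNC_DELAY)
drain_time = seconds_to_drain(25000, SYNC_BATCH_SIZE, SYNC_DELAY)
```

If operations arrive faster than this bound for long periods, the queue in intermediate storage keeps growing, which suggests raising the batch size or lowering the delay.
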
### <a name="models_module">CLICKHOUSE_MODELS_MODULE</a>
Defaults to: `'clickhouse_models'`
<!--- TODO Add link --->
The name of the module inside a [django app](https://docs.djangoproject.com/en/2.2/intro/tutorial01/)
where [ClickHouseModel]() classes are searched for during migrations.

### <a name="database_router">CLICKHOUSE_DATABASE_ROUTER</a>
Defaults to: `'django_clickhouse.routers.DefaultRouter'`
<!--- TODO Add link --->
A dotted path to the class representing the [database router]().

### <a name="migrations_package">CLICKHOUSE_MIGRATIONS_PACKAGE</a>
Defaults to: `'clickhouse_migrations'`
The name of the Python package inside a [django app](https://docs.djangoproject.com/en/2.2/intro/tutorial01/)
where migration files are searched for.

### <a name="migration_history_model">CLICKHOUSE_MIGRATION_HISTORY_MODEL</a>
Defaults to: `'django_clickhouse.migrations.MigrationHistory'`
<!--- TODO Add link --->
A dotted name (including the module path) of the ClickHouseModel subclass representing the [MigrationHistory]() model.

### <a name="migrate_with_default_db">CLICKHOUSE_MIGRATE_WITH_DEFAULT_DB</a>
Defaults to: `True`
A boolean flag that enables automatic ClickHouse migration
when you call [`migrate`](https://docs.djangoproject.com/en/2.2/ref/django-admin/#django-admin-migrate) on the default database.

### <a name="statd_prefix">CLICKHOUSE_STATSD_PREFIX</a>
Defaults to: `'clickhouse'`
<!--- TODO Add link --->
A prefix added to each library metric in [statsd](https://pythonhosted.org/python-statsd/). See [metrics]().

### <a name="celery_queue">CLICKHOUSE_CELERY_QUEUE</a>
Defaults to: `'celery'`
The name of the queue that celery uses to schedule library sync tasks.
10
docs/index.md
Normal file
@@ -0,0 +1,10 @@
# Table of contents

* [Basic information](basic_information.md)
  * [About](basic_information.md#about)
  * [Features](basic_information.md#features)
  * [Requirements](basic_information.md#requirements)
  * [Installation](basic_information.md#installation)
* Usage
  * [Storages](storages.md)
    * [RedisStorage](storages.md#redis_storage)
70
docs/storages.md
Normal file
@@ -0,0 +1,70 @@
# Storages
The storage class is a facade that stores information about operations which were performed on Django models.
It has three main purposes:
* Storage should be fast at inserting single records. It accumulates a batch of data, which is then inserted into ClickHouse.
* Storage guarantees that no data is lost.
  Intermediate data in storage is deleted only after the batch import finishes successfully.
  If the import fails at some point, the next import process should import the failed data again.
* Storage keeps information about the sync process, for instance the last time the model sync was called.

To distinguish models from each other, storage uses `import_key`.
By default, it is generated by the `ClickHouseModel.get_import_key()` method and is equal to the class name.

Each method of the abstract `Storage` class takes `kwargs` parameters, which can be used in a concrete storage.

## Storage methods
* `register_operations(import_key: str, operation: str, *pks: Any) -> int`
  Saves a new operation performed in the source database to storage. This method should be fast.
  It is called after the source database transaction is committed.
  The method returns the number of operations registered.
  `operation` is one of `insert`, `update` or `delete`.
  `pks` is an iterable of strings sufficient to select the needed records from the source database.

* `get_last_sync_time(import_key: str) -> Optional[datetime.datetime]`
  Returns the last time a model sync was called. If no sync has been done, returns `None`.

* `set_last_sync_time(import_key: str, dt: datetime.datetime) -> None`
  Saves the datetime at which the sync process was last called.

* `register_operations_wrapped(import_key: str, operation: str, *pks: Any) -> int`
  A wrapper for `register_operations`. Its goal is to write metrics and logs.

* `pre_sync(import_key: str, **kwargs) -> None`
  Called before the import process starts. It initializes storage for importing a new batch.

* `operations_count(import_key: str, **kwargs) -> int`
  Counts how many operations are waiting in storage to be imported.

* `get_operations(import_key: str, count: int, **kwargs) -> List[Tuple[str, str]]`
  Returns the next batch of operations to import. The `count` parameter sets the number of operations to return.
  Each operation is a tuple `(operation, primary_key)`, where `operation` is one of `insert`, `update` or `delete`
  and `primary_key` is a string sufficient to select the record from the source database.

* `post_sync(import_key: str, **kwargs) -> None`
  Called after the import process has finished. It cleans up storage after importing a batch.

* `post_batch_removed(import_key: str, batch_size: int) -> None`
  This method should be called by the `post_sync` method after data is removed from storage.
  By default, it records the queue size metric.

* `post_sync_failed(import_key: str, exception: Exception, **kwargs) -> None`
  Called if any exception occurs during the import process. It cleans up storage after an unsuccessful import.
  Note that this method is not called if the import process is killed abruptly (by the OOM killer, for instance).

* `flush() -> None`
  *Dangerous*. Drops all data kept by storage. It is used for cleaning up between tests.


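To make the contract above more concrete, here is a toy sketch of a storage that keeps operations in process memory. It is a standalone class written for illustration: `InMemoryStorage` is a hypothetical name, it does not subclass the library's `Storage`, and it leaves out metrics and logging. What it shows is the "no data is lost" rule: `get_operations` only reads a batch, and the batch is removed in `post_sync` after a successful import.

```python
import datetime
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple


class InMemoryStorage:
    """Toy storage that keeps operations in process memory (illustration only)."""

    def __init__(self) -> None:
        self._queues: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
        self._batch_sizes: Dict[str, int] = {}
        self._last_sync: Dict[str, datetime.datetime] = {}

    def register_operations(self, import_key: str, operation: str, *pks: Any) -> int:
        if operation not in ('insert', 'update', 'delete'):
            raise ValueError('Unknown operation: %s' % operation)
        self._queues[import_key].extend((operation, str(pk)) for pk in pks)
        return len(pks)

    def operations_count(self, import_key: str, **kwargs) -> int:
        return len(self._queues[import_key])

    def pre_sync(self, import_key: str, **kwargs) -> None:
        # Forget the size of any batch left over from a previous round.
        self._batch_sizes.pop(import_key, None)

    def get_operations(self, import_key: str, count: int, **kwargs) -> List[Tuple[str, str]]:
        # Read the batch without removing it, so a failed import loses nothing.
        batch = self._queues[import_key][:count]
        self._batch_sizes[import_key] = len(batch)
        return batch

    def post_sync(self, import_key: str, **kwargs) -> None:
        # The import succeeded: only now is the batch removed.
        size = self._batch_sizes.pop(import_key, 0)
        del self._queues[import_key][:size]

    def post_sync_failed(self, import_key: str, exception: Exception, **kwargs) -> None:
        # Keep the queued operations; the next round reads the same batch again.
        self._batch_sizes.pop(import_key, None)

    def get_last_sync_time(self, import_key: str) -> Optional[datetime.datetime]:
        return self._last_sync.get(import_key)

    def set_last_sync_time(self, import_key: str, dt: datetime.datetime) -> None:
        self._last_sync[import_key] = dt

    def flush(self) -> None:
        self._queues.clear()
        self._batch_sizes.clear()
        self._last_sync.clear()


storage = InMemoryStorage()
storage.register_operations('MyModel', 'insert', 1, 2, 3)

# First round fails: nothing is removed from the queue.
storage.pre_sync('MyModel')
first_try = storage.get_operations('MyModel', 2)
storage.post_sync_failed('MyModel', RuntimeError('import failed'))

# Second round succeeds: the same batch is read again and then removed.
storage.pre_sync('MyModel')
second_try = storage.get_operations('MyModel', 2)
storage.post_sync('MyModel')
remaining = storage.operations_count('MyModel')
```

A real storage would also need to survive process restarts, which is why the predefined storage below keeps its data in an external database rather than in memory.
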
## Predefined storages
### <a name="redis_storage">RedisStorage</a>
This storage uses the [Redis database](https://redis.io/) as intermediate storage.
To communicate with Redis, it uses the [redis-py](https://redis-py.readthedocs.io/en/latest/) library.
redis-py is not a required dependency of the library, but it must be installed to use RedisStorage.
To use RedisStorage you must also set the [CLICKHOUSE_REDIS_CONFIG](configuration.md#redis_config) parameter.

A stored operation contains:
* The Django database alias where the original record can be found.
* The record's primary key.
* The operation performed (insert, update, delete).

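One way to picture such a record is as a single string holding the three fields listed above. The `operation:db_alias:pk` format below is purely hypothetical and is not taken from RedisStorage; `StoredOperation`, `encode` and `decode` are illustrative names. The sketch assumes that the operation name and the database alias contain no `:`.

```python
from typing import NamedTuple


class StoredOperation(NamedTuple):
    operation: str  # 'insert', 'update' or 'delete'
    db_alias: str   # Django database alias of the source record
    pk: str         # primary key of the source record, as a string


def encode(op: StoredOperation) -> str:
    """Pack an operation into one string (hypothetical format, not the library's)."""
    return '%s:%s:%s' % (op.operation, op.db_alias, op.pk)


def decode(value: str) -> StoredOperation:
    # Split at most twice so a primary key containing ':' survives the round trip.
    operation, db_alias, pk = value.split(':', 2)
    return StoredOperation(operation, db_alias, pk)


encoded = encode(StoredOperation('update', 'default', '42'))
decoded = decode(encoded)
```
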
This storage does not allow multi-threaded sync.
2
docs/usage.md
Normal file
@@ -0,0 +1,2 @@
# Usage
