Mirror of https://github.com/carrotquest/django-clickhouse.git (synced 2024-11-22 00:56:37 +03:00)

1) Started writing the docs
2)

Parent 10fae9220c, commit c2beabbc07

docs/basic_information.md (+35 lines, new file)
# Basic information
## <a name="about">About</a>
This project's goal is to integrate [Yandex ClickHouse](https://clickhouse.yandex/) into a [Django](https://www.djangoproject.com/) project.
It is based on the [infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhouse_orm) library.

## <a name="features">Features</a>
* Configuration of multiple ClickHouse databases in [settings.py](https://docs.djangoproject.com/en/2.1/ref/settings/)
* ORM to create and manage ClickHouse models.
* ClickHouse migration system.
* Scalable serialization of Django model instances to ORM model instances.
* Efficient periodic synchronization of Django models to ClickHouse without losing data.
* Synchronization process monitoring.

## <a name="requirements">Requirements</a>
* [Python 3](https://www.python.org/downloads/)
* [Django](https://docs.djangoproject.com/) 1.7+
* [Yandex ClickHouse](https://clickhouse.yandex/)
* [infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhouse_orm)
* pytz
* six
* typing
* psycopg2
* celery
* statsd

### Optional libraries
* [redis-py](https://redis-py.readthedocs.io/en/latest/) for [RedisStorage](storages.md#redis_storage)
* [django-pg-returning](https://travis-ci.com/M1hacka/django-pg-returning) for optimizing the registration of updates in [PostgreSQL](https://www.postgresql.org/)

## <a name="installation">Installation</a>
Install via pip:
`pip install django-clickhouse` ([not released yet](https://github.com/carrotquest/django-clickhouse/issues/3))
or via setup.py:
`python setup.py install`

docs/configuration.md (+96 lines, new file)
# Configuration
The library is configured in settings.py. All parameters start with the `CLICKHOUSE_` prefix.
The prefix can be changed using the `CLICKHOUSE_SETTINGS_PREFIX` parameter.

### <a name="settings_prefix">CLICKHOUSE_SETTINGS_PREFIX</a>
Defaults to: `'CLICKHOUSE_'`
You can change the `CLICKHOUSE_` prefix in settings to anything you like using this parameter.
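To illustrate what the prefix means in practice, here is a minimal sketch. `get_prefixed_setting` is a hypothetical helper, not part of django-clickhouse, and `SimpleNamespace` stands in for `django.conf.settings`. It assumes that `CLICKHOUSE_SETTINGS_PREFIX` itself is always read under its full name, while every other parameter is looked up as prefix plus parameter suffix.

```python
# Hypothetical helper for illustration; not django-clickhouse's own code.
from types import SimpleNamespace
from typing import Any

DEFAULT_PREFIX = 'CLICKHOUSE_'


def get_prefixed_setting(settings: Any, name: str, default: Any = None) -> Any:
    """Read `name` (e.g. 'SYNC_DELAY') under the currently configured prefix."""
    # The prefix parameter itself always keeps its full, unprefixed name.
    prefix = getattr(settings, 'CLICKHOUSE_SETTINGS_PREFIX', DEFAULT_PREFIX)
    return getattr(settings, prefix + name, default)


# SimpleNamespace stands in for django.conf.settings in this sketch.
settings = SimpleNamespace(CLICKHOUSE_SETTINGS_PREFIX='CH_', CH_SYNC_DELAY=10)
print(get_prefixed_setting(settings, 'SYNC_DELAY'))              # 10
print(get_prefixed_setting(settings, 'SYNC_BATCH_SIZE', 10000))  # 10000 (fallback)
```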
### <a name="databases">CLICKHOUSE_DATABASES</a>
Defaults to: `{}`
A dictionary defining databases in Django style.
<!--- TODO Add link --->
Each key is an alias used to refer to this database in [connections]() and [using]().
Each value is a configuration dict with the following parameters:
* [infi.clickhouse_orm database parameters](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/class_reference.md#database)
<!--- TODO Add link --->
* `migrate: bool` - indicates whether this database should be migrated. See [migrations]().

Example:
```python
CLICKHOUSE_DATABASES = {
    'default': {
        'db_name': 'test',
        'username': 'default',
        'password': ''
    }
}
```
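A hedged sketch of a two-database setup that also uses the `migrate` flag. The `'replica'` alias, its host name and the `readonly` choice are placeholders; `db_url` and `readonly` are taken from infi.clickhouse_orm's `Database` constructor parameters, and whether a given deployment needs them is an assumption.

```python
CLICKHOUSE_DATABASES = {
    'default': {
        'db_name': 'test',
        'db_url': 'http://localhost:8123/',  # ClickHouse HTTP interface
        'username': 'default',
        'password': '',
    },
    # Hypothetical second alias, excluded from migrations.
    'replica': {
        'db_name': 'test',
        'db_url': 'http://clickhouse-replica.example:8123/',  # placeholder host
        'username': 'default',
        'password': '',
        'readonly': True,
        'migrate': False,
    },
}
```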
### <a name="default_db_alias">CLICKHOUSE_DEFAULT_DB_ALIAS</a>
Defaults to: `'default'`
<!--- TODO Add link --->
The database alias used in [QuerySets]() when [using]() is not specified explicitly.

### <a name="sync_storage">CLICKHOUSE_SYNC_STORAGE</a>
Defaults to: `'django_clickhouse.storages.RedisStorage'`
The intermediate storage class to use. It can be given either as a dotted path string or as the class itself. [More info about storages](storages.md).
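Because the value may be either a string or a class, code that consumes it has to normalize it first. The sketch below uses a hypothetical `resolve_class` helper, not the library's internal code; in a Django project, `django.utils.module_loading.import_string` covers the dotted-path case. A standard-library class stands in for a storage class so the example runs without django-clickhouse installed.

```python
import importlib
from collections import OrderedDict
from typing import Union


def resolve_class(value: Union[str, type]) -> type:
    """Return `value` if it is already a class, else import it from a dotted path."""
    if isinstance(value, type):
        return value
    module_path, _, class_name = value.rpartition('.')
    if not module_path:
        raise ValueError(f'Expected a dotted path like "package.module.Class", got {value!r}')
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Both accepted forms resolve to the same class.
print(resolve_class('collections.OrderedDict') is OrderedDict)  # True
print(resolve_class(OrderedDict) is OrderedDict)                # True
```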
### <a name="redis_config">CLICKHOUSE_REDIS_CONFIG</a>
Defaults to: `None`
Redis configuration for [RedisStorage](storages.md#redis_storage).
If given, it should be a dictionary of parameters to pass to [redis-py](https://redis-py.readthedocs.io/en/latest/#redis.Redis).

Example:
```python
CLICKHOUSE_REDIS_CONFIG = {
    'host': '127.0.0.1',
    'port': 6379,
    'db': 8,
    'socket_timeout': 10
}
```
### <a name="sync_batch_size">CLICKHOUSE_SYNC_BATCH_SIZE</a>
Defaults to: `10000`
Maximum number of operations fetched by the sync process from the intermediate storage per sync round.

### <a name="sync_delay">CLICKHOUSE_SYNC_DELAY</a>
Defaults to: `5`
Delay in seconds between the starts of two consecutive sync rounds.
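Taken together, these two settings bound how fast operations can be synchronized. As a back-of-envelope sketch, assuming one round per model starts every `CLICKHOUSE_SYNC_DELAY` seconds and each round finishes before the next one is due, the ceiling is batch size divided by delay: with the defaults, 10000 / 5 = 2000 operations per second. The function name is illustrative only.

```python
def max_sync_rate(batch_size: int = 10000, delay_seconds: float = 5) -> float:
    """Upper bound on operations synced per second for one model (illustrative)."""
    # At most `batch_size` operations are fetched per round, and rounds
    # start at most once every `delay_seconds` seconds.
    return batch_size / delay_seconds


print(max_sync_rate())          # 2000.0
print(max_sync_rate(50000, 5))  # 10000.0
```

Under the same assumption, if operations are registered faster than this rate, the intermediate storage queue keeps growing.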
### <a name="models_module">CLICKHOUSE_MODELS_MODULE</a>
Defaults to: `'clickhouse_models'`
<!--- TODO Add link --->
Module name inside a [django app](https://docs.djangoproject.com/en/2.2/intro/tutorial01/),
where [ClickHouseModel]() classes are searched for during migrations.
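The following is a hedged sketch of how such a per-app lookup can work: for every app package, try `<app>.<CLICKHOUSE_MODELS_MODULE>` and skip apps that do not provide it. `find_app_modules` is hypothetical and is not the library's discovery code; Django itself offers `django.utils.module_loading.autodiscover_modules` for this pattern. Standard-library packages stand in for Django apps.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Iterable, List


def find_app_modules(app_packages: Iterable[str],
                     module_name: str = 'clickhouse_models') -> List[ModuleType]:
    """Import `<app>.<module_name>` for every app package that provides it."""
    found = []
    for package in app_packages:
        full_name = f'{package}.{module_name}'
        # find_spec imports the parent package and returns None
        # when the requested submodule does not exist.
        if importlib.util.find_spec(full_name) is not None:
            found.append(importlib.import_module(full_name))
    return found


# 'json' has a 'decoder' submodule, 'email' does not.
print([m.__name__ for m in find_app_modules(['json', 'email'], 'decoder')])
# ['json.decoder']
```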
### <a name="database_router">CLICKHOUSE_DATABASE_ROUTER</a>
Defaults to: `'django_clickhouse.routers.DefaultRouter'`
<!--- TODO Add link --->
A dotted path to a class representing the [database router]().

### <a name="migrations_package">CLICKHOUSE_MIGRATIONS_PACKAGE</a>
Defaults to: `'clickhouse_migrations'`
A Python package name inside a [django app](https://docs.djangoproject.com/en/2.2/intro/tutorial01/),
where migration files are searched for.

### <a name="migration_history_model">CLICKHOUSE_MIGRATION_HISTORY_MODEL</a>
Defaults to: `'django_clickhouse.migrations.MigrationHistory'`
<!--- TODO Add link --->
A dotted name of a ClickHouseModel subclass (including the module path) representing the [MigrationHistory]() model.

### <a name="migrate_with_default_db">CLICKHOUSE_MIGRATE_WITH_DEFAULT_DB</a>
Defaults to: `True`
A boolean flag enabling automatic ClickHouse migration
when you call [`migrate`](https://docs.djangoproject.com/en/2.2/ref/django-admin/#django-admin-migrate) on the default database.

### <a name="statd_prefix">CLICKHOUSE_STATSD_PREFIX</a>
Defaults to: `'clickhouse'`
<!--- TODO Add link --->
A prefix added to each library metric in [statsd](https://pythonhosted.org/python-statsd/). See [metrics]().

### <a name="celery_queue">CLICKHOUSE_CELERY_QUEUE</a>
Defaults to: `'celery'`
The name of the Celery queue used to schedule the library's sync tasks.

docs/index.md (+10 lines, new file)
# Table of contents
* [Basic information](basic_information.md)
  * [About](basic_information.md#about)
  * [Features](basic_information.md#features)
  * [Requirements](basic_information.md#requirements)
  * [Installation](basic_information.md#installation)
* Usage
  * [Storages](storages.md)
    * [RedisStorage](storages.md#redis_storage)

docs/storages.md (+70 lines, new file)
# Storages
A storage class is a facade that stores information about operations performed on Django models.
It has three main purposes:
* Inserting single records into the storage should be fast. The storage forms a batch of data, which is then inserted into ClickHouse.
* The storage guarantees that no data is lost.
  Intermediate data is deleted from the storage only after a batch import finishes successfully.
  If the import fails at some point, a new import process should import the failed data again.
* The storage keeps information about the sync process, for instance, the last time a model sync was called.

To distinguish models from each other, the storage uses an `import_key`.
By default, it is generated by the `ClickHouseModel.get_import_key()` method and equals the class name.

Each method of the abstract `Storage` class accepts `kwargs` parameters, which a concrete storage can use.

## Storage methods
* `register_operations(import_key: str, operation: str, *pks: Any) -> int`
  Saves a new operation performed in the source database to the storage. This method should be fast.
  It is called after the source database transaction is committed.
  Returns the number of operations registered.
  `operation` is one of `insert`, `update` or `delete`.
  `pks` is an iterable of strings, enough to select the needed records from the source database.

* `get_last_sync_time(import_key: str) -> Optional[datetime.datetime]`
  Returns the last time a model sync was called. If no sync has been done yet, returns `None`.

* `set_last_sync_time(import_key: str, dt: datetime.datetime) -> None`
  Saves the datetime when a sync process was last called.

* `register_operations_wrapped(import_key: str, operation: str, *pks: Any) -> int`
  A wrapper for `register_operations`. Its goal is to write metrics and logs.

* `pre_sync(import_key: str, **kwargs) -> None`
  Called before the import process starts. It initializes the storage for importing a new batch.

* `operations_count(import_key: str, **kwargs) -> int`
  Counts how many operations are waiting for import in the storage.

* `get_operations(import_key: str, count: int, **kwargs) -> List[Tuple[str, str]]`
  Returns the next batch of operations to import. The `count` parameter sets the number of operations to return.
  An operation is a tuple `(operation, primary_key)`, where `operation` is one of `insert`, `update` or `delete`
  and `primary_key` is a string enough to select the record from the source database.

* `post_sync(import_key: str, **kwargs) -> None`
  Called after the import process has finished. It cleans the storage after importing a batch.

* `post_batch_removed(import_key: str, batch_size: int) -> None`
  This method should be called by the `post_sync` method after data is removed from the storage.
  By default, it marks the queue size metric.

* `post_sync_failed(import_key: str, exception: Exception, **kwargs) -> None`
  Called if any exception has occurred during the import process. It cleans the storage after an unsuccessful import.
  Note that if the import process is killed abruptly (by the OOM killer, for instance), this method is not called.

* `flush() -> None`
  *Dangerous*. Drops all data kept by the storage. It is used for cleaning up between tests.
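To make the contract above concrete, here is a minimal in-memory sketch. `InMemoryStorage`, `sync_round` and `Operation` are hypothetical names: the class mirrors the method list but does not subclass the library's `Storage`, so `register_operations_wrapped` and metrics are left out. The call order in `sync_round` (`pre_sync`, `get_operations`, then `post_sync` or `post_sync_failed`) and the point where `set_last_sync_time` is called are assumptions drawn from the descriptions, not from the library's code. Operations are removed only in `post_sync`, so a failed round leaves the batch queued for the next one.

```python
# Hypothetical in-memory storage: a sketch of the method contract listed
# above, NOT django-clickhouse's Storage base class or RedisStorage.
import datetime
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Tuple

Operation = Tuple[str, str]  # (operation, primary_key)


class InMemoryStorage:
    def __init__(self) -> None:
        self._queues: Dict[str, List[Operation]] = defaultdict(list)
        self._in_progress: Dict[str, int] = {}
        self._last_sync: Dict[str, datetime.datetime] = {}

    def register_operations(self, import_key: str, operation: str, *pks: Any) -> int:
        if operation not in ('insert', 'update', 'delete'):
            raise ValueError(f'Unknown operation {operation!r}')
        self._queues[import_key].extend((operation, str(pk)) for pk in pks)
        return len(pks)

    def operations_count(self, import_key: str, **kwargs) -> int:
        return len(self._queues[import_key])

    def pre_sync(self, import_key: str, **kwargs) -> None:
        self._in_progress[import_key] = 0

    def get_operations(self, import_key: str, count: int, **kwargs) -> List[Operation]:
        # Peek only: operations stay queued until post_sync confirms success.
        batch = self._queues[import_key][:count]
        self._in_progress[import_key] = len(batch)
        return batch

    def post_sync(self, import_key: str, **kwargs) -> None:
        done = self._in_progress.pop(import_key, 0)
        del self._queues[import_key][:done]
        self.post_batch_removed(import_key, done)

    def post_batch_removed(self, import_key: str, batch_size: int) -> None:
        pass  # a real storage could report its queue size metric here

    def post_sync_failed(self, import_key: str, exception: Exception, **kwargs) -> None:
        # Leave the queue untouched so the next round imports the batch again.
        self._in_progress.pop(import_key, None)

    def get_last_sync_time(self, import_key: str) -> Optional[datetime.datetime]:
        return self._last_sync.get(import_key)

    def set_last_sync_time(self, import_key: str, dt: datetime.datetime) -> None:
        self._last_sync[import_key] = dt

    def flush(self) -> None:
        self._queues.clear()
        self._in_progress.clear()
        self._last_sync.clear()


def sync_round(storage: InMemoryStorage, import_key: str, batch_size: int,
               insert: Callable[[List[Operation]], None]) -> None:
    """One illustrative sync round; `insert` stands in for the ClickHouse write."""
    # Assumption: the sync time is recorded when the round is called.
    storage.set_last_sync_time(import_key, datetime.datetime.now(datetime.timezone.utc))
    storage.pre_sync(import_key)
    try:
        insert(storage.get_operations(import_key, batch_size))
    except Exception as exc:
        storage.post_sync_failed(import_key, exc)
        raise
    storage.post_sync(import_key)


storage = InMemoryStorage()
storage.register_operations('MyModel', 'insert', 1, 2, 3)
received: List[Operation] = []
sync_round(storage, 'MyModel', 2, received.extend)
print(received)                             # [('insert', '1'), ('insert', '2')]
print(storage.operations_count('MyModel'))  # 1
```

One consequence of this "delete only after success" design is at-least-once delivery: if the write to ClickHouse partly succeeds before raising, the retried batch can repeat some operations.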
## Predefined storages
### <a name="redis_storage">RedisStorage</a>
This storage uses a [Redis database](https://redis.io/) as the intermediate storage.
To communicate with Redis, it uses the [redis-py](https://redis-py.readthedocs.io/en/latest/) library.
redis-py is not a required dependency, but it must be installed to use RedisStorage.
In order to use RedisStorage, you must also fill in the [CLICKHOUSE_REDIS_CONFIG](configuration.md#redis_config) parameter.

Each stored operation contains:
* the Django database alias where the original record can be found;
* the record primary key;
* the operation performed (insert, update, delete).

This storage does not allow multi-threaded sync.

docs/usage.md (+2 lines, new file)
# Usage