Added docs about databases and queries

M1ha 2020-02-06 16:17:45 +05:00
parent f2dc978634
commit 951c13ad7d
5 changed files with 104 additions and 13 deletions

@ -10,8 +10,7 @@ You can change `CLICKHOUSE_` prefix in settings using this parameter to anything
### CLICKHOUSE_DATABASES
Defaults to: `{}`
A dictionary, defining databases in django-like style.
<!--- TODO Add link --->
Key is an alias to communicate with this database in [connections]() and [using]().
Key is an alias to communicate with this database in [connections](databases.md#getting-database-objects) and [using](routing.md#settings-database-in-queryset).
Value is a configuration dict with parameters:
* [infi.clickhouse_orm database parameters](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/class_reference.md#database)
* `migrate: bool` - indicates if this database should be migrated. See [migrations](migrations.md).
@ -35,8 +34,7 @@ CLICKHOUSE_DATABASES = {
### CLICKHOUSE_DEFAULT_DB_ALIAS
Defaults to: `'default'`
<!--- TODO Add link --->
A database alias to use in [QuerySets]() if direct [using]() is not specified.
A database alias to use in [QuerySets](queries.md) if direct [using](routing.md#settings-database-in-queryset) is not specified.
### CLICKHOUSE_SYNC_STORAGE
Defaults to: `'django_clickhouse.storages.RedisStorage'`

docs/databases.md Normal file

@ -0,0 +1,40 @@
# Databases
Direct usage of `Database` objects is not expected in this library. But in some cases, you may still need them.
This section describes `Database` objects and their usage.
`django_clickhouse.database.Database` is a class, describing a ClickHouse database connection.
## Getting database objects
To get a `Database` object by its alias name in [CLICKHOUSE_DATABASES](configuration.md#clickhouse_databases)
use `django_clickhouse.database.connections` object.
This object is a `django_clickhouse.database.ConnectionProxy` instance:
it creates `Database` objects when they are used for the first time and stores them in memory.
Example:
```python
from django_clickhouse.database import connections
# Database objects are initialized on first access
db = connections['default']
secondary = connections['secondary']
# Already initialized - the object is returned from memory
db_link = connections['default']
```
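The create-on-first-use behaviour described above can be illustrated with a minimal, library-independent sketch. The `LazyConnections` class, its `factory` argument and the `make_db` helper are hypothetical names used only for illustration; the real `ConnectionProxy` may be implemented differently.

```python
class LazyConnections:
    """Hypothetical stand-in for a connection proxy: builds objects lazily and caches them."""

    def __init__(self, factory):
        self._factory = factory  # callable that builds an object for a given alias
        self._cache = {}         # alias -> already created object

    def __getitem__(self, alias):
        # Create the object only on first access, then reuse it
        if alias not in self._cache:
            self._cache[alias] = self._factory(alias)
        return self._cache[alias]


created = []


def make_db(alias):
    # A plain dict stands in for a Database object (assumption for this sketch)
    created.append(alias)
    return {'alias': alias}


proxy = LazyConnections(make_db)
db = proxy['default']       # created here
db_link = proxy['default']  # returned from the cache, not created again
```

In this sketch `db` and `db_link` refer to the same object, and `make_db` is called only once for the `'default'` alias.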
## Database object
Database class is based on [infi.clickhouse_orm Database object](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/models_and_databases.md#models-and-databases),
but extends it with some extra attributes and methods:
### Database migrations are restricted
This library's [migration system](migrations.md) is expected to be used.
Direct database migration will lead to migration information errors.
### `insert_tuples` and `select_tuples` methods
[infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm) stores data rows in Model objects.
This works well for hundreds of records.
But when you sync 100k records in a batch, initializing 100k model instances is slow.
To optimize this process, the `ClickHouseModel` class has a `get_tuple_class()` method.
It generates a [namedtuple](https://docs.python.org/3/library/collections.html#collections.namedtuple) class
with the same data fields as the model.
Initializing such tuples takes much less time than initializing Model objects.
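As a rough illustration of the idea, the standard library can build a namedtuple class from a list of field names. This is only a sketch: the field names and the `UserTuple` name are hypothetical, and it does not show how `get_tuple_class()` itself is implemented.

```python
from collections import namedtuple

# Hypothetical field names; a real model would supply its own
field_names = ('id', 'first_name', 'visits')

# Build a lightweight tuple class with one attribute per field
UserTuple = namedtuple('UserTuple', field_names)

# Rows can be created positionally or by keyword, without any model machinery
row = UserTuple(id=1, first_name='Alice', visits=1)
same_row = UserTuple(1, 'Alice', 1)

# Fields are accessible both by name and by index
first_name = row.first_name
first_value = row[0]
```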

@ -12,7 +12,8 @@
* [DjangoModel](models.md#DjangoModel)
* [ClickHouseModel](models.md#ClickHouseModel)
* [Making queries](queries.md)
* [Database routing](routing.md)
* [Databases](databases.md)
* [Routing](routing.md)
* [Migrations](migrations.md)
* [Synchronization](synchronization.md)
* [Storages](storages.md)

@ -1,13 +1,66 @@
# Making queries
## Motivation
The ClickHouse SQL language is close to the standard, but does not follow it exactly ([docs](https://clickhouse.tech/docs/en/introduction/distinctive_features/#sql-support)).
It cannot be easily integrated into the Django query subsystem, which expects databases to support standard SQL features like transactions and INNER/OUTER JOINs by condition.
QuerySet system used by this library looks very similar to django, but it is implemented separately.
You can read reasons for this design [here](motivation.md#separate-from-django-database-setting-queryset-and-migration-system).
In order to fit it
## Usage
The library's query system extends the [infi.clickhouse-orm QuerySet system](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/querysets.md) and supports all its features.
In most cases you have no need to create querysets explicitly - just use `objects` attribute or `objects_in(db)` method of `ClickHouseModel`.
At the same time `django-clickhouse` adds some extra features to `QuerySet` and `AggregateQuerySet`.
They are available if your model inherits `django_clickhouse.clickhouse_models.ClickHouseModel`.
## Extra features
### Django-like routing system
There's no need to set database object explicitly with `objects_in(...)` method, as original QuerySet expects.
Database is determined based on library configuration and [router](routing.md#router) used.
If you want to set the database explicitly, you can use either of these approaches:
* [infi approach](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/querysets.md#querysets)
* Django-like `QuerySet.using(db_alias)` method
Libraries query system extends [infi.clickhouse-orm](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/querysets.md).
Example:
```python
from django_clickhouse.database import connections
from my_app.clickhouse_models import ClickHouseUser
TODO
# This query will choose database using current router.
# By default django_clickhouse.routers.DefaultRouter is used.
# It gets one random database, from ClickHouseUser.read_db_aliases for read queries
ClickHouseUser.objects.filter(id__in=[1,2,3]).count()
# These queries do the same thing, using 'secondary' connection from CLICKHOUSE_DATABASES setting
ClickHouseUser.objects_in(connections['secondary']).filter(id__in=[1,2,3]).count()
ClickHouseUser.objects.filter(id__in=[1,2,3]).using('secondary').count()
# You can get the database to use with the get_database(for_write: bool = False) method.
# Note that if the model settings list multiple databases,
# DefaultRouter can return any of them on each call: the function is stateless
ClickHouseUser.objects.get_database(for_write=False)
```
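The stateless random choice mentioned in the comments above can be sketched without the library. The `pick_database` function and its arguments are hypothetical and only mirror the described behaviour (a random alias from the read aliases for reads, from the write aliases for writes); the actual `DefaultRouter` code may differ.

```python
import random


def pick_database(read_db_aliases, write_db_aliases, for_write=False):
    """Hypothetical sketch: pick one alias at random, keeping no state between calls."""
    aliases = write_db_aliases if for_write else read_db_aliases
    return random.choice(aliases)


# Assumed example aliases, not taken from any real configuration
read_aliases = ('default', 'secondary')
write_aliases = ('default',)

# May return a different alias on each call
read_alias = pick_database(read_aliases, write_aliases)
# Only one write alias is configured here, so it is always chosen
write_alias = pick_database(read_aliases, write_aliases, for_write=True)
```

Because nothing is remembered between calls, two consecutive read queries may go to different databases; use `using(...)` or `objects_in(...)` when a specific database is required.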
### QuerySet create methods
This library adds methods to add objects like django does without direct Database object usage.
Example:
```python
from datetime import date
from my_app.clickhouse_models import ClickHouseUser
# These queries will choose the database using the current router.
# By default django_clickhouse.routers.DefaultRouter is used.
# It gets one random database, from ClickHouseUser.write_db_aliases for write queries
# You can set database explicitly with using(...) or objects_in(...) methods
instance = ClickHouseUser.objects.create(id=1, first_name='Alice', visits=1, birthday=date(2003, 6, 1))
objs = ClickHouseUser.objects.bulk_create([
    ClickHouseUser(id=2, first_name='Bob', visits=2, birthday=date(2001, 5, 1)),
    ClickHouseUser(id=3, first_name='Jhon', visits=3, birthday=date(2002, 7, 11))
], batch_size=10)
```
### Getting all objects
The `QuerySet.all()` method returns a copy of the current QuerySet:
```python
from my_app.clickhouse_models import ClickHouseUser
qs = ClickHouseUser.objects.all()
```

@ -54,8 +54,7 @@ class MyModel(ClickHouseModel):
```
## Settings database in QuerySet
<!--- TODO Add link --->
Database can be set in each [QuerySet]() explicitly by using one of methods:
The database can be set in each [QuerySet](queries.md) explicitly by using one of these methods:
* With [infi approach](https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/querysets.md#querysets): `MyModel.objects_in(db_object).filter(id__in=[1,2,3]).count()`
* With `using()` method: `MyModel.objects.filter(id__in=[1,2,3]).using(db_alias).count()`