Improve docs

Itai Shirav 2020-02-08 12:38:23 +02:00
parent 4ffc27100d
commit 93747f7758
16 changed files with 1464 additions and 167 deletions


docs/expressions.md

@ -0,0 +1,85 @@
Expressions
===========
One of the ORM's core concepts is _expressions_, which are composed using functions, operators and model fields. Expressions are used in multiple places in the ORM:
- When defining [field options](field_options.md) - `default`, `alias` and `materialized`.
- In [table engine](table_engines.md) parameters for engines in the `MergeTree` family.
- In [queryset](querysets.md) methods such as `filter`, `exclude`, `order_by`, `extra`, `aggregate` and `limit_by`.
Using Expressions
-----------------
Expressions usually include ClickHouse database functions, which are made available by the `F` class. Here's a simple function:
```python
from infi.clickhouse_orm.models import F
expr = F.today()
```
Functions that accept arguments can be composed, just like when using SQL:
```python
expr = F.toDayOfWeek(F.today())
```
You can see the SQL that an ORM expression represents by calling its `to_sql` method or by passing it to `repr()`:
```python
>>> print(expr.to_sql())
toDayOfWeek(today())
```
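To make the idea concrete, here is a minimal pure-Python sketch of how a function-call expression could render itself to SQL. `Func` is a hypothetical stand-in written for this illustration; it is not the ORM's `F` class, which handles many more argument types.

```python
class Func:
    """Toy stand-in for a function expression (hypothetical, not the real F class)."""

    def __init__(self, name, *args):
        self.name = name
        self.args = args

    def to_sql(self):
        # Render nested expressions recursively, quote strings, fall back to str()
        parts = []
        for arg in self.args:
            if isinstance(arg, Func):
                parts.append(arg.to_sql())
            elif isinstance(arg, str):
                parts.append("'" + arg.replace("'", "\\'") + "'")
            else:
                parts.append(str(arg))
        return self.name + '(' + ', '.join(parts) + ')'

    def __repr__(self):
        # repr() shows the same SQL text as to_sql()
        return self.to_sql()


expr = Func('toDayOfWeek', Func('today'))
print(expr.to_sql())  # toDayOfWeek(today())
```

Nothing is sent to the database at this point: the expression is only a tree of Python objects that knows how to describe itself.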
### Operators
ORM expressions support Python's standard arithmetic operators, so you can compose expressions using `+`, `-`, `*`, `/` and `%`. For example:
```python
# A random integer between 1 and 10
F.rand() % 10 + 1
```
There is also support for comparison operators (`<`, `<=`, `==`, `>=`, `>`, `!=`) and logical operators (`&`, `|`, `~`, `^`) which are often used for filtering querysets:
```python
# Is it Friday the 13th? (toDayOfWeek returns 1 for Monday through 7 for Sunday)
(F.toDayOfWeek(F.today()) == 5) & (F.toDayOfMonth(F.today()) == 13)
```
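Operator support of this kind is typically built on Python's special methods. The sketch below assumes a hypothetical `Expr` class that only records SQL text; the ORM's own classes and the exact SQL they emit may differ.

```python
class Expr:
    """Toy expression that records SQL text (hypothetical; not the ORM's classes)."""

    def __init__(self, sql):
        self.sql = sql

    @staticmethod
    def _wrap(value):
        # Plain Python values become SQL literals via str()
        return value if isinstance(value, Expr) else Expr(str(value))

    def _binary(self, op, other):
        return Expr('(' + self.sql + ' ' + op + ' ' + Expr._wrap(other).sql + ')')

    def __add__(self, other):
        return self._binary('+', other)

    def __mod__(self, other):
        return self._binary('%', other)

    def __eq__(self, other):
        # Returns a new expression instead of a bool, so it can be rendered as SQL
        return self._binary('=', other)

    def __and__(self, other):
        return self._binary('AND', other)


random_1_to_10 = Expr('rand()') % 10 + 1
print(random_1_to_10.sql)  # ((rand() % 10) + 1)
```

In Python, `&` binds more tightly than `==`, which is why each comparison in the Friday the 13th example is wrapped in parentheses.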
### Referring to model fields
To refer to a model field inside an expression, use `<class>.<field>` syntax, for example:
```python
# Convert the temperature from Celsius to Fahrenheit
Sensor.temperature * 1.8 + 32
```
Inside model class definitions omit the class name:
```python
class Person(Model):
    height_cm = fields.Float32Field()
    height_inch = fields.Float32Field(alias=height_cm/2.54)
    ...
```
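The bare name works because a Python class body runs like ordinary code, so `height_cm` is already bound when the next line is evaluated. A minimal sketch, using a hypothetical `ToyField` class rather than the ORM's field classes:

```python
class ToyField:
    """Hypothetical field that remembers the expression it was built from."""

    def __init__(self, alias=None):
        self.alias = alias

    def __truediv__(self, other):
        # Dividing a field yields a description of the operation, not a number
        return ('divide', self, other)


class Person:
    height_cm = ToyField()
    # height_cm is a local name here, so no "Person." prefix is needed
    # (Person itself does not exist yet while its body is executing)
    height_inch = ToyField(alias=height_cm / 2.54)


op, field, divisor = Person.height_inch.alias
print(op, field is Person.height_cm, divisor)  # divide True 2.54
```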
### Creating new "functions"
Since expressions are just Python objects until they get converted to SQL, it is possible to invent new "functions" by combining existing ones into useful building blocks. For example, we can create a reusable expression that takes a string and trims whitespace, converts it to uppercase, and changes blanks to underscores:
```python
def normalize_string(s):
    return F.replaceAll(F.upper(F.trimBoth(s)), ' ', '_')
```
Then we can use this expression anywhere we need it:
```python
class Event(Model):
    code = fields.StringField()
    normalized_code = fields.StringField(materialized=normalize_string(code))
```
### Which functions are available?
ClickHouse has many hundreds of functions, and new ones often get added. If you encounter a function that the database supports but is not available in the `F` class, please report this via a GitHub issue. You can still use the function by providing its name:
```python
expr = F("someFunctionName", arg1, arg2, ...)
```
---
[<< Models and Databases](models_and_databases.md) | [Table of Contents](toc.md) | [Querysets >>](querysets.md)

docs/field_options.md

@ -0,0 +1,112 @@
Field Options
=============
All field types accept the following arguments:
- default
- alias
- materialized
- readonly
- codec
Note that `default`, `alias` and `materialized` are mutually exclusive - you cannot use more than one of them in a single field.
## default
Specifies a default value to use for the field. If not given, the field will have a default value based on its type: empty string for string fields, zero for numeric fields, etc.
The default value can be a Python value suitable for the field type, or an expression. For example:
```python
class Event(models.Model):
    name = fields.StringField(default="EVENT")
    repeated = fields.UInt32Field(default=1)
    created = fields.DateTimeField(default=F.now())
    engine = engines.Memory()
    ...
```
When creating a model instance, any fields you do not specify get their default value. Fields that use a default expression are assigned a sentinel value of `infi.clickhouse_orm.models.NO_VALUE` instead. For example:
```python
>>> event = Event()
>>> print(event.to_dict())
{'name': 'EVENT', 'repeated': 1, 'created': <NO_VALUE>}
```
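The sentinel idea can be sketched in a few lines of plain Python. Everything below is an assumption made for illustration: `NoValue` and `initial_value` are hypothetical names, and the ORM's actual `NO_VALUE` may be implemented differently.

```python
class NoValue:
    """Hypothetical sentinel type for values that only the database can compute."""

    def __repr__(self):
        return '<NO_VALUE>'


NO_VALUE = NoValue()


def initial_value(default):
    # Assumption for this sketch: a callable default stands in for a database-side
    # expression that cannot be evaluated in Python, so the sentinel is used instead
    return NO_VALUE if callable(default) else default


values = {'name': initial_value('EVENT'), 'created': initial_value(lambda: 'now()')}
print(values)  # {'name': 'EVENT', 'created': <NO_VALUE>}
```

A dedicated sentinel is checked with `is`, and it stays distinguishable from `None`, which can be a legitimate value for a nullable field.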
:warning: Due to a bug in ClickHouse versions prior to 20.1.2.4, insertion of records with expressions for default values may fail.
## alias / materialized
The `alias` and `materialized` attributes expect an expression that gets calculated by the database. The difference is that `alias` fields are calculated on the fly, while `materialized` fields are calculated when the record is inserted, and are stored on disk.
You can use any expression, and can refer to other model fields. For example:
```python
class Event(models.Model):
    created = fields.DateTimeField()
    # toDate() returns a date, so the materialized field is a DateField
    created_date = fields.DateField(materialized=F.toDate(created))
    name = fields.StringField()
    normalized_name = fields.StringField(alias=F.upper(F.trim(name)))
    engine = engines.Memory()
```
For backwards compatibility with older versions of the ORM, you can pass the expression as an SQL string:
```python
created_date = fields.DateTimeField(materialized="toDate(created)")
```
Neither field type can be inserted into the database directly, so both are ignored by the `Database.insert()` method. ClickHouse also does not return their values for `"SELECT * FROM ..."` queries - you have to list these field names explicitly in the query.
Usage:
```python
from datetime import datetime
obj = Event(created=datetime.now(), name='MyEvent')
db = Database('my_test_db')
db.insert([obj])
# All values will be retrieved from the database
db.select('SELECT created, created_date, name, normalized_name FROM $db.event', model_class=Event)
# created_date and normalized_name will contain a default value
db.select('SELECT * FROM $db.event', model_class=Event)
```
When creating a model instance, any alias or materialized fields are assigned a sentinel value of `infi.clickhouse_orm.models.NO_VALUE` since their real values can only be known after insertion to the database.
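The insert and select rules described in this section can be sketched in plain Python. The `EVENT_FIELDS` mapping and both helper functions are hypothetical and exist only for this illustration; they are not part of the ORM.

```python
# Hypothetical field metadata: name -> readonly flag
# (alias and materialized fields are readonly)
EVENT_FIELDS = {
    'created': False,
    'created_date': True,     # materialized
    'name': False,
    'normalized_name': True,  # alias
}


def insertable_columns(fields):
    """Columns a writer would send to the database: everything that is not readonly."""
    return [name for name, readonly in fields.items() if not readonly]


def explicit_select(table, fields):
    """SELECT that names every column, so alias/materialized values are returned too."""
    return 'SELECT ' + ', '.join(fields) + ' FROM ' + table


print(insertable_columns(EVENT_FIELDS))
print(explicit_select('$db.event', EVENT_FIELDS))
```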
## codec
This attribute specifies the compression algorithm to use for the field (instead of the default data compression algorithm defined in server settings).
Supported compression algorithms:
| Codec | Argument | Comment
| -------------------- | -------------------------------------------| ----------------------------------------------------
| NONE | None | No compression.
| LZ4 | None | LZ4 compression.
| LZ4HC(`level`) | Possible `level` range: [3, 12]. | Default value: 9. Greater values stand for better compression and higher CPU usage. Recommended value range: [4, 9].
| ZSTD(`level`) | Possible `level` range: [1, 22]. | Default value: 1. Greater values stand for better compression and higher CPU usage. Levels >= 20 should be used with caution, as they require more memory.
| Delta(`delta_bytes`) | Possible `delta_bytes` values: 1, 2, 4, 8. | Default value for `delta_bytes` is `sizeof(type)` if it is equal to 1, 2, 4 or 8, and 1 otherwise.
Codecs can be combined by separating their names with commas. The default database codec is not included in the pipeline (if it should be applied to a field, you have to specify it explicitly in the pipeline).
Recommended usage for codecs:
- When the values of a particular metric do not differ significantly from point to point, delta encoding reduces disk space usage significantly.
- DateTime works well with a pipeline of Delta, ZSTD; the column can be compressed to 2-3% of its original size (given smooth datetime data).
- Numeric types usually get the best compression rates with ZSTD.
- String types get good compression rates with LZ4HC.
Example:
```python
class Stats(models.Model):
    id = fields.UInt64Field(codec='ZSTD(10)')
    timestamp = fields.DateTimeField(codec='Delta,ZSTD')
    timestamp_date = fields.DateField(codec='Delta(4),ZSTD(22)')
    metadata_id = fields.Int64Field(codec='LZ4')
    status = fields.StringField(codec='LZ4HC(10)')
    calculation = fields.NullableField(fields.Float32Field(), codec='ZSTD')
    alerts = fields.ArrayField(fields.FixedStringField(length=15), codec='Delta(2),LZ4HC')
    engine = MergeTree('timestamp_date', ('id', 'timestamp'))
```
Note: This feature is supported on ClickHouse version 19.1.16 and above. Codec arguments will be ignored by the ORM for older versions of ClickHouse.
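Since codec strings are passed through as text, a typo only surfaces when the table is created. A small pre-check along the following lines can catch obvious mistakes earlier. `check_codec` is a hypothetical helper, not part of the ORM, and its limits are copied from the table above; ClickHouse itself remains the authority on what is valid.

```python
import re

# Allowed argument values, copied from the table above (None means "takes no argument")
ALLOWED_ARGS = {
    'NONE': None,
    'LZ4': None,
    'LZ4HC': range(3, 13),
    'ZSTD': range(1, 23),
    'Delta': (1, 2, 4, 8),
}


def check_codec(codec):
    """Return a list of problems found in a codec string such as 'Delta(4),ZSTD(22)'."""
    problems = []
    for part in codec.split(','):
        match = re.fullmatch(r'\s*(\w+)(?:\((\d+)\))?\s*', part)
        if not match or match.group(1) not in ALLOWED_ARGS:
            problems.append('unknown codec: ' + part.strip())
            continue
        name, arg = match.group(1), match.group(2)
        allowed = ALLOWED_ARGS[name]
        if arg is not None and (allowed is None or int(arg) not in allowed):
            problems.append('bad argument for ' + name + ': ' + arg)
    return problems


print(check_codec('Delta(4),ZSTD(22)'))  # []
print(check_codec('Delta(3),LZ4(1)'))
```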
## readonly
This attribute is set automatically for fields with `alias` or `materialized` attributes; you do not need to pass it yourself.
---
[<< Querysets](querysets.md) | [Table of Contents](toc.md) | [Field Types >>](field_types.md)


@ -33,112 +33,6 @@ The following field types are supported:
| ArrayField | Array | list | See below | ArrayField | Array | list | See below
| NullableField | Nullable | See below | See below | NullableField | Nullable | See below | See below
Field Options
----------------
All field types accept the following arguments:
- default
- alias
- materialized
- readonly
- codec
Note that `default`, `alias` and `materialized` are mutually exclusive - you cannot use more than one of them in a single field.
### default
Specifies a default value to use for the field. If not given, the field will have a default value based on its type: empty string for string fields, zero for numeric fields, etc.
The default value can be a Python value suitable for the field type, or an expression. For example:
```python
class Event(models.Model):
name = fields.StringField(default="EVENT")
repeated = fields.UInt32Field(default=1)
created = fields.DateTimeField(default=F.now())
engine = engines.Memory()
...
```
When creating a model instance, any fields you do not specify get their default value. Fields that use a default expression are assigned a sentinel value of `infi.clickhouse_orm.models.NO_VALUE` instead. For example:
```python
>>> event = Event()
>>> print(event.to_dict())
{'name': 'EVENT', 'repeated': 1, 'created': <NO_VALUE>}
```
:warning: Due to a bug in ClickHouse versions prior to 20.1.2.4, insertion of records with expressions for default values may fail.
### alias / materialized
The `alias` and `materialized` attributes expect an expression that gets calculated by the database. The difference is that `alias` fields are calculated on the fly, while `materialized` fields are calculated when the record is inserted, and are stored on disk.
You can use any expression, and can refer to other model fields. For example:
```python
class Event(models.Model):
created = fields.DateTimeField()
created_date = fields.DateTimeField(materialized=F.toDate(created))
name = fields.StringField()
normalized_name = fields.StringField(alias=F.upper(F.trim(name)))
engine = engines.Memory()
```
For backwards compatibility with older versions of the ORM, you can pass the expression as an SQL string:
```python
created_date = fields.DateTimeField(materialized="toDate(created)")
```
Both field types can't be inserted into the database directly, so they are ignored when using the `Database.insert()` method. ClickHouse does not return the field values if you use `"SELECT * FROM ..."` - you have to list these field names explicitly in the query.
Usage:
```python
obj = Event(created=datetime.now(), name='MyEvent')
db = Database('my_test_db')
db.insert([obj])
# All values will be retrieved from database
db.select('SELECT created, created_date, username, name FROM $db.event', model_class=Event)
# created_date and username will contain a default value
db.select('SELECT * FROM $db.event', model_class=Event)
```
When creating a model instance, any alias or materialized fields are assigned a sentinel value of `infi.clickhouse_orm.models.NO_VALUE` since their real values can only be known after insertion to the database.
### readonly
This attribute is set automatically for fields with `alias` or `materialized` attributes, you do not need to pass it yourself.
### codec
This attribute specifies the compression algorithm to use for the field (instead of the default data compression algorithm defined in server settings).
Supported compression algorithms:
| Codec | Argument | Comment
| -------------------- | -------------------------------------------| ----------------------------------------------------
| NONE | None | No compression.
| LZ4 | None | LZ4 compression.
| LZ4HC(`level`) | Possible `level` range: [3, 12]. | Default value: 9. Greater values stands for better compression and higher CPU usage. Recommended value range: [4,9].
| ZSTD(`level`) | Possible `level`range: [1, 22]. | Default value: 1. Greater values stands for better compression and higher CPU usage. Levels >= 20, should be used with caution, as they require more memory.
| Delta(`delta_bytes`) | Possible `delta_bytes` range: 1, 2, 4 , 8. | Default value for `delta_bytes` is `sizeof(type)` if it is equal to 1, 2,4 or 8 and equals to 1 otherwise.
Codecs can be combined by separating their names with commas. The default database codec is not included into pipeline (if it should be applied to a field, you have to specify it explicitly in pipeline).
Recommended usage for codecs:
- When values for particular metric do not differ significantly from point to point, delta-encoding allows to reduce disk space usage significantly.
- DateTime works great with pipeline of Delta, ZSTD and the column size can be compressed to 2-3% of its original size (given a smooth datetime data)
- Numeric types usually enjoy best compression rates with ZSTD
- String types enjoy good compression rates with LZ4HC
Example:
```python
class Stats(models.Model):
id = fields.UInt64Field(codec='ZSTD(10)')
timestamp = fields.DateTimeField(codec='Delta,ZSTD')
timestamp_date = fields.DateField(codec='Delta(4),ZSTD(22)')
metadata_id = fields.Int64Field(codec='LZ4')
status = fields.StringField(codec='LZ4HC(10)')
calculation = fields.NullableField(fields.Float32Field(), codec='ZSTD')
alerts = fields.ArrayField(fields.FixedStringField(length=15), codec='Delta(2),LZ4HC')
engine = MergeTree('timestamp_date', ('id', 'timestamp'))
```
Note: This feature is supported on ClickHouse version 19.1.16 and above. Codec arguments will be ignored by the ORM for older versions of ClickHouse.
DateTimeField and Time Zones DateTimeField and Time Zones
---------------------------- ----------------------------
@ -294,4 +188,4 @@ class BooleanField(Field):
--- ---
[<< Querysets](querysets.md) | [Table of Contents](toc.md) | [Table Engines >>](table_engines.md) [<< Field Options](field_options.md) | [Table of Contents](toc.md) | [Table Engines >>](table_engines.md)


@ -31,6 +31,8 @@ Each field has a "natural" default value - empty string for string fields, zero
first_name = fields.StringField(default="anonymous") first_name = fields.StringField(default="anonymous")
For additional details see [here](field_options.md).
### Null values ### Null values
To allow null values in a field, wrap it inside a `NullableField`: To allow null values in a field, wrap it inside a `NullableField`:
@ -39,25 +41,27 @@ To allow null values in a field, wrap it inside a `NullableField`:
In this case, the default value for that field becomes `null` unless otherwise specified. In this case, the default value for that field becomes `null` unless otherwise specified.
For more information about `NullableField` see [Field Types](field_types.md).
### Materialized fields ### Materialized fields
The value of a materialized field is calculated from other fields in the model. For example: The value of a materialized field is calculated from other fields in the model. For example:
year_born = fields.Int16Field(materialized="toYear(birthday)") year_born = fields.Int16Field(materialized=F.toYear(birthday))
Materialized fields are read-only, meaning that their values are not sent to the database when inserting records. Materialized fields are read-only, meaning that their values are not sent to the database when inserting records.
It is not possible to specify a default value for a materialized field. For additional details see [here](field_options.md).
### Alias fields ### Alias fields
An alias field is a field whose value is calculated by ClickHouse on the fly, as a function of other fields. It is not physically stored by the database. For example: An alias field is a field whose value is calculated by ClickHouse on the fly, as a function of other fields. It is not physically stored by the database. For example:
weekday_born = field.UInt8Field(alias="toDayOfWeek(birthday)") weekday_born = field.UInt8Field(alias=F.toDayOfWeek(birthday))
Alias fields are read-only, meaning that their values are not sent to the database when inserting records. Alias fields are read-only, meaning that their values are not sent to the database when inserting records.
It is not possible to specify a default value for an alias field. For additional details see [here](field_options.md).
### Table Names ### Table Names
@ -121,19 +125,19 @@ Reading from the Database
Loading model instances from the database is simple: Loading model instances from the database is simple:
for person in db.select("SELECT * FROM my_test_db.person", model_class=Person): for person in db.select("SELECT * FROM my_test_db.person", model_class=Person):
print person.first_name, person.last_name print(person.first_name, person.last_name)
Do not include a `FORMAT` clause in the query, since the ORM automatically sets the format to `TabSeparatedWithNamesAndTypes`. Do not include a `FORMAT` clause in the query, since the ORM automatically sets the format to `TabSeparatedWithNamesAndTypes`.
It is possible to select only a subset of the columns, and the rest will receive their default values: It is possible to select only a subset of the columns, and the rest will receive their default values:
for person in db.select("SELECT first_name FROM my_test_db.person WHERE last_name='Smith'", model_class=Person): for person in db.select("SELECT first_name FROM my_test_db.person WHERE last_name='Smith'", model_class=Person):
print person.first_name print(person.first_name)
The ORM provides a way to build simple queries without writing SQL by hand. The previous snippet can be written like this: The ORM provides a way to build simple queries without writing SQL by hand. The previous snippet can be written like this:
for person in Person.objects_in(db).filter(last_name='Smith').only('first_name'): for person in Person.objects_in(db).filter(last_name='Smith').only('first_name'):
print person.first_name print(person.first_name)
See [Querysets](querysets.md) for more information. See [Querysets](querysets.md) for more information.
@ -144,7 +148,7 @@ Reading without a Model
When running a query, specifying a model class is not required. In case you do not provide a model class, an ad-hoc class will be defined based on the column names and types returned by the query: When running a query, specifying a model class is not required. In case you do not provide a model class, an ad-hoc class will be defined based on the column names and types returned by the query:
for row in db.select("SELECT max(height) as max_height FROM my_test_db.person"): for row in db.select("SELECT max(height) as max_height FROM my_test_db.person"):
print row.max_height print(row.max_height)
This is a very convenient feature that saves you the need to define a model for each query, while still letting you work with Pythonic column values and an elegant syntax. This is a very convenient feature that saves you the need to define a model for each query, while still letting you work with Pythonic column values and an elegant syntax.
@ -180,9 +184,9 @@ It is possible to paginate through model instances:
>>> order_by = 'first_name, last_name' >>> order_by = 'first_name, last_name'
>>> page = db.paginate(Person, order_by, page_num=1, page_size=10) >>> page = db.paginate(Person, order_by, page_num=1, page_size=10)
>>> print page.number_of_objects >>> print(page.number_of_objects)
2507 2507
>>> print page.pages_total >>> print(page.pages_total)
251 251
>>> for person in page.objects: >>> for person in page.objects:
>>> # do something >>> # do something
@ -204,4 +208,4 @@ Note that `order_by` must be chosen so that the ordering is unique, otherwise th
--- ---
[<< Overview](index.md) | [Table of Contents](toc.md) | [Querysets >>](querysets.md) [<< Overview](index.md) | [Table of Contents](toc.md) | [Expressions >>](expressions.md)


@ -8,7 +8,7 @@ A queryset is an object that represents a database query using a specific Model.
This queryset matches all Person instances in the database. You can get these instances using iteration: This queryset matches all Person instances in the database. You can get these instances using iteration:
for person in qs: for person in qs:
print person.first_name, person.last_name print(person.first_name, person.last_name)
Filtering Filtering
--------- ---------
@ -128,7 +128,7 @@ Adds a DISTINCT clause to the query, meaning that any duplicate rows in the resu
Final Final
-------- --------
This method can be used only with CollapsingMergeTree engine. This method can be used only with `CollapsingMergeTree` engine.
Adds a FINAL modifier to the query, meaning data is selected fully "collapsed" by sign field. Adds a FINAL modifier to the query, meaning data is selected fully "collapsed" by sign field.
>>> Person.objects_in(database).count() >>> Person.objects_in(database).count()
@ -162,9 +162,9 @@ Similar to `Database.paginate`, you can go over the queryset results one page at
>>> qs = Person.objects_in(database).order_by('last_name', 'first_name') >>> qs = Person.objects_in(database).order_by('last_name', 'first_name')
>>> page = qs.paginate(page_num=1, page_size=10) >>> page = qs.paginate(page_num=1, page_size=10)
>>> print page.number_of_objects >>> print(page.number_of_objects)
2507 2507
>>> print page.pages_total >>> print(page.pages_total)
251 251
>>> for person in page.objects: >>> for person in page.objects:
>>> # do something >>> # do something
@ -185,9 +185,9 @@ Aggregation
It is possible to use aggregation functions over querysets using the `aggregate` method. The simplest form of aggregation works over all rows in the queryset: It is possible to use aggregation functions over querysets using the `aggregate` method. The simplest form of aggregation works over all rows in the queryset:
>>> qs = Person.objects_in(database).aggregate(average_height='avg(height)') >>> qs = Person.objects_in(database).aggregate(average_height='avg(height)')
>>> print qs.count() >>> print(qs.count())
1 1
>>> for row in qs: print row.average_height >>> for row in qs: print(row.average_height)
1.71 1.71
The returned row or rows are no longer instances of the base model (`Person` in this example), but rather instances of an ad-hoc model that includes only the fields specified in the call to `aggregate`. The returned row or rows are no longer instances of the base model (`Person` in this example), but rather instances of an ad-hoc model that includes only the fields specified in the call to `aggregate`.
@ -215,7 +215,7 @@ To achieve this, you can use `with_totals` method. It will return extra row (las
values aggregated for all rows suitable for filters. values aggregated for all rows suitable for filters.
qs = Person.objects_in(database).aggregate('first_name', num='count()').with_totals().order_by('-count')[:3] qs = Person.objects_in(database).aggregate('first_name', num='count()').with_totals().order_by('-count')[:3]
>>> print qs.count() >>> print(qs.count())
4 4
>>> for row in qs: >>> for row in qs:
>>> print("'{}': {}".format(row.first_name, row.count)) >>> print("'{}': {}".format(row.first_name, row.count))
@ -225,4 +225,4 @@ values aggregated for all rows suitable for filters.
--- ---
[<< Models and Databases](models_and_databases.md) | [Table of Contents](toc.md) | [Field Types >>](field_types.md) [<< Expressions](expressions.md) | [Table of Contents](toc.md) | [Field Options >>](field_options.md)


@ -1,7 +1,7 @@
Table Engines Table Engines
============= =============
See: [ClickHouse Documentation](https://clickhouse.yandex/docs/en/table_engines/) See: [ClickHouse Documentation](https://clickhouse.tech/docs/en/operations/table_engines/)
Each model must have an engine instance, used when creating the table in ClickHouse. Each model must have an engine instance, used when creating the table in ClickHouse.


@ -30,13 +30,17 @@
* [Pagination](querysets.md#pagination) * [Pagination](querysets.md#pagination)
* [Aggregation](querysets.md#aggregation) * [Aggregation](querysets.md#aggregation)
* [Field Options](field_options.md#field-options)
* [default](field_options.md#default)
* [alias / materialized](field_options.md#alias-/-materialized)
* [codec](field_options.md#codec)
* [readonly](field_options.md#readonly)
* [Field Types](field_types.md#field-types) * [Field Types](field_types.md#field-types)
* [DateTimeField and Time Zones](field_types.md#datetimefield-and-time-zones) * [DateTimeField and Time Zones](field_types.md#datetimefield-and-time-zones)
* [Working with enum fields](field_types.md#working-with-enum-fields) * [Working with enum fields](field_types.md#working-with-enum-fields)
* [Working with array fields](field_types.md#working-with-array-fields) * [Working with array fields](field_types.md#working-with-array-fields)
* [Working with materialized and alias fields](field_types.md#working-with-materialized-and-alias-fields)
* [Working with nullable fields](field_types.md#working-with-nullable-fields) * [Working with nullable fields](field_types.md#working-with-nullable-fields)
* [Working with field compression codecs](field_types.md#working-with-field-compression-codecs)
* [Working with LowCardinality fields](field_types.md#working-with-lowcardinality-fields) * [Working with LowCardinality fields](field_types.md#working-with-lowcardinality-fields)
* [Creating custom field types](field_types.md#creating-custom-field-types) * [Creating custom field types](field_types.md#creating-custom-field-types)
@ -84,6 +88,8 @@
* [FixedStringField](class_reference.md#fixedstringfield) * [FixedStringField](class_reference.md#fixedstringfield)
* [Float32Field](class_reference.md#float32field) * [Float32Field](class_reference.md#float32field)
* [Float64Field](class_reference.md#float64field) * [Float64Field](class_reference.md#float64field)
* [IPv4Field](class_reference.md#ipv4field)
* [IPv6Field](class_reference.md#ipv6field)
* [Int16Field](class_reference.md#int16field) * [Int16Field](class_reference.md#int16field)
* [Int32Field](class_reference.md#int32field) * [Int32Field](class_reference.md#int32field)
* [Int64Field](class_reference.md#int64field) * [Int64Field](class_reference.md#int64field)
@ -111,4 +117,8 @@
* [infi.clickhouse_orm.query](class_reference.md#infi.clickhouse_orm.query) * [infi.clickhouse_orm.query](class_reference.md#infi.clickhouse_orm.query)
* [QuerySet](class_reference.md#queryset) * [QuerySet](class_reference.md#queryset)
* [AggregateQuerySet](class_reference.md#aggregatequeryset) * [AggregateQuerySet](class_reference.md#aggregatequeryset)
* [infi.clickhouse_orm.funcs](class_reference.md#infi.clickhouse_orm.funcs)
* [F](class_reference.md#f)
* [infi.clickhouse_orm.system_models](class_reference.md#infi.clickhouse_orm.system_models)
* [SystemPart](class_reference.md#systempart)


@ -125,6 +125,8 @@ if __name__ == '__main__':
from infi.clickhouse_orm import engines from infi.clickhouse_orm import engines
from infi.clickhouse_orm import models from infi.clickhouse_orm import models
from infi.clickhouse_orm import query from infi.clickhouse_orm import query
from infi.clickhouse_orm import funcs
from infi.clickhouse_orm import system_models
print('Class Reference') print('Class Reference')
print('===============') print('===============')
@ -134,3 +136,5 @@ if __name__ == '__main__':
module_doc(sorted([fields.Field] + all_subclasses(fields.Field), key=lambda x: x.__name__), False) module_doc(sorted([fields.Field] + all_subclasses(fields.Field), key=lambda x: x.__name__), False)
module_doc([engines.Engine] + all_subclasses(engines.Engine), False) module_doc([engines.Engine] + all_subclasses(engines.Engine), False)
module_doc([query.QuerySet, query.AggregateQuerySet]) module_doc([query.QuerySet, query.AggregateQuerySet])
module_doc([funcs.F])
module_doc([system_models.SystemPart])


@ -9,6 +9,7 @@ printf "# Table of Contents\n\n" > toc.md
generate_one "index.md" generate_one "index.md"
generate_one "models_and_databases.md" generate_one "models_and_databases.md"
generate_one "querysets.md" generate_one "querysets.md"
generate_one "field_options.md"
generate_one "field_types.md" generate_one "field_types.md"
generate_one "table_engines.md" generate_one "table_engines.md"
generate_one "schema_migrations.md" generate_one "schema_migrations.md"


@ -1,4 +1,4 @@
from HTMLParser import HTMLParser from html.parser import HTMLParser
import sys import sys
@ -18,7 +18,7 @@ class HeadersToMarkdownParser(HTMLParser):
if tag.lower() in HEADER_TAGS: if tag.lower() in HEADER_TAGS:
indent = ' ' * int(self.inside[1]) indent = ' ' * int(self.inside[1])
fragment = self.text.lower().replace(' ', '-') fragment = self.text.lower().replace(' ', '-')
print '%s* [%s](%s#%s)' % (indent, self.text, sys.argv[1], fragment) print('%s* [%s](%s#%s)' % (indent, self.text, sys.argv[1], fragment))
self.inside = None self.inside = None
self.text = '' self.text = ''
@ -28,4 +28,4 @@ class HeadersToMarkdownParser(HTMLParser):
HeadersToMarkdownParser().feed(sys.stdin.read()) HeadersToMarkdownParser().feed(sys.stdin.read())
print print('')


@ -216,11 +216,11 @@ class Distributed(Engine):
""" """
def __init__(self, cluster, table=None, sharding_key=None): def __init__(self, cluster, table=None, sharding_key=None):
""" """
:param cluster: what cluster to access data from - `cluster`: what cluster to access data from
:param table: underlying table that actually stores data. - `table`: underlying table that actually stores data.
If you are not specifying any table here, ensure that it can be inferred If you are not specifying any table here, ensure that it can be inferred
from your model's superclass (see models.DistributedModel.fix_engine_table) from your model's superclass (see models.DistributedModel.fix_engine_table)
:param sharding_key: how to distribute data among shards when inserting - `sharding_key`: how to distribute data among shards when inserting
straightly into Distributed table, optional straightly into Distributed table, optional
""" """
self.cluster = cluster self.cluster = cluster


@ -74,9 +74,10 @@ class Field(FunctionOperatorsMixin):
def get_sql(self, with_default_expression=True, db=None): def get_sql(self, with_default_expression=True, db=None):
''' '''
Returns an SQL expression describing the field (e.g. for CREATE TABLE). Returns an SQL expression describing the field (e.g. for CREATE TABLE).
:param with_default_expression: If True, adds default value to sql.
- `with_default_expression`: If True, adds default value to sql.
It doesn't affect fields with alias and materialized values. It doesn't affect fields with alias and materialized values.
:param db: Database, used for checking supported features. - `db`: Database, used for checking supported features.
''' '''
sql = self.db_type sql = self.db_type
if with_default_expression: if with_default_expression:
@ -102,8 +103,10 @@ class Field(FunctionOperatorsMixin):
""" """
Checks if the instance if one of the types provided or if any of the inner_field child is one of the types Checks if the instance if one of the types provided or if any of the inner_field child is one of the types
provided, returns True if field or any inner_field is one of ths provided, False otherwise provided, returns True if field or any inner_field is one of ths provided, False otherwise
:param types: Iterable of types to check inclusion of instance
:return: Boolean - `types`: Iterable of types to check inclusion of instance
Returns: Boolean
""" """
if isinstance(self, types): if isinstance(self, types):
return True return True
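
The docstring above describes a type check that also looks through nested `inner_field` wrappers. The following is a minimal sketch of that idea, not the library's code: the `MiniField`, `MiniString`, `MiniNullable` and `MiniArray` classes and the `is_or_wraps` method name are hypothetical stand-ins, and the loop over `inner_field` is an assumption about how the rest of the method might continue beyond the hunk.

```python
class MiniField:
    """Hypothetical stand-in for a field class; not the ORM's Field."""
    inner_field = None

    def is_or_wraps(self, types):
        # First check the field itself, as in the hunk above.
        if isinstance(self, types):
            return True
        # Assumption: then walk down the chain of wrapped inner fields.
        inner = getattr(self, 'inner_field', None)
        while inner is not None:
            if isinstance(inner, types):
                return True
            inner = getattr(inner, 'inner_field', None)
        return False


class MiniString(MiniField):
    pass


class MiniNullable(MiniField):
    def __init__(self, inner_field):
        self.inner_field = inner_field


class MiniArray(MiniField):
    def __init__(self, inner_field):
        self.inner_field = inner_field


# An array of nullable strings: the string type is two levels down.
field = MiniArray(MiniNullable(MiniString()))
```

With this sketch, `field.is_or_wraps(MiniString)` is true even though the outermost object is a `MiniArray`, which is the behaviour the docstring describes.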


@ -110,10 +110,17 @@ class F(Cond, FunctionOperatorsMixin):
It doubles as a query condition when the function returns a boolean result. It doubles as a query condition when the function returns a boolean result.
""" """
def __init__(self, name, *args): def __init__(self, name, *args):
"""
Initializer.
"""
self.name = name self.name = name
self.args = args self.args = args
self.is_binary_operator = False self.is_binary_operator = False
def __repr__(self):
return self.to_sql()
def to_sql(self, *args): # FIXME why *args ? def to_sql(self, *args): # FIXME why *args ?
""" """
Generates an SQL string for this function and its arguments. Generates an SQL string for this function and its arguments.
@ -128,11 +135,11 @@ class F(Cond, FunctionOperatorsMixin):
else: else:
prefix = self.name prefix = self.name
sep = ', ' sep = ', '
arg_strs = (F.arg_to_sql(arg) for arg in self.args) arg_strs = (F._arg_to_sql(arg) for arg in self.args)
return prefix + '(' + sep.join(arg_strs) + ')' return prefix + '(' + sep.join(arg_strs) + ')'
@staticmethod @staticmethod
def arg_to_sql(arg): def _arg_to_sql(arg):
""" """
Converts a function argument to SQL string according to its type. Converts a function argument to SQL string according to its type.
Supports functions, model fields, strings, dates, datetimes, booleans, Supports functions, model fields, strings, dates, datetimes, booleans,
@ -156,7 +163,7 @@ class F(Cond, FunctionOperatorsMixin):
if arg is None: if arg is None:
return 'NULL' return 'NULL'
if is_iterable(arg): if is_iterable(arg):
return '[' + comma_join(F.arg_to_sql(x) for x in arg) + ']' return '[' + comma_join(F._arg_to_sql(x) for x in arg) + ']'
return str(arg) return str(arg)
# Arithmetic functions # Arithmetic functions
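
To illustrate how the pieces in this hunk fit together (`__repr__` delegating to `to_sql`, and `_arg_to_sql` recursing into nested functions and iterables), here is a simplified, self-contained sketch. `MiniF` is a hypothetical class written for this example. It covers only nested functions, `None`, strings, lists and tuples, and its string escaping is an assumption rather than the ORM's actual rule.

```python
class MiniF:
    """Hypothetical, simplified stand-in for the ORM's F class."""

    def __init__(self, name, *args):
        self.name = name
        self.args = args

    def __repr__(self):
        # Same idea as the new __repr__ in the diff: show the SQL.
        return self.to_sql()

    def to_sql(self):
        arg_strs = (MiniF._arg_to_sql(arg) for arg in self.args)
        return self.name + '(' + ', '.join(arg_strs) + ')'

    @staticmethod
    def _arg_to_sql(arg):
        if isinstance(arg, MiniF):
            return arg.to_sql()          # nested function call
        if arg is None:
            return 'NULL'
        if isinstance(arg, str):
            # Assumed escaping: backslashes first, then single quotes.
            escaped = arg.replace('\\', '\\\\').replace("'", "\\'")
            return "'" + escaped + "'"
        if isinstance(arg, (list, tuple)):
            # Iterables become array literals, converted element by element.
            return '[' + ', '.join(MiniF._arg_to_sql(x) for x in arg) + ']'
        return str(arg)                  # numbers and anything else


expr = MiniF('toDayOfWeek', MiniF('today'))
print(repr(expr))  # toDayOfWeek(today())
```

Strings are checked before lists and tuples because a string is itself iterable. A converter that tests for general iterability has to handle strings first for the same reason.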


@ -205,7 +205,7 @@ class Q(object):
def is_empty(self): def is_empty(self):
""" """
Checks if there are any conditions in Q object Checks if there are any conditions in Q object
:return: Boolean Returns: Boolean
""" """
return not bool(self._conds or self._children) return not bool(self._conds or self._children)
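
As a small illustration of the `is_empty` check, the sketch below uses a hypothetical `MiniQ` class that keeps the same two attributes, `_conds` and `_children`. The constructor and the `&` operator are assumptions made for the example and are not taken from the ORM.

```python
class MiniQ:
    """Hypothetical stand-in for a Q object with conditions and children."""

    def __init__(self, **filter_fields):
        self._conds = list(filter_fields.items())
        self._children = []

    def is_empty(self):
        # Same expression as in the diff: empty means no conditions
        # of its own and no child Q objects.
        return not bool(self._conds or self._children)

    def __and__(self, other):
        # Assumed combination rule: keep only the non-empty operands.
        combined = MiniQ()
        combined._children = [q for q in (self, other) if not q.is_empty()]
        return combined


empty = MiniQ()
by_id = MiniQ(id=1)
```

Here `empty.is_empty()` is true, `by_id.is_empty()` is false, and combining two empty objects with `&` yields another empty object.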


@ -60,10 +60,12 @@ class SystemPart(Model):
def _partition_operation_sql(self, operation, settings=None, from_part=None): def _partition_operation_sql(self, operation, settings=None, from_part=None):
""" """
Performs some operation over partition Performs some operation over partition
:param db: Database object to execute operation on
:param operation: Operation to execute from SystemPart.OPERATIONS set - `db`: Database object to execute operation on
:param settings: Settings for executing request to ClickHouse over db.raw() method - `operation`: Operation to execute from SystemPart.OPERATIONS set
:return: Operation execution result - `settings`: Settings for executing request to ClickHouse over db.raw() method
Returns: Operation execution result
""" """
operation = operation.upper() operation = operation.upper()
assert operation in self.OPERATIONS, "operation must be in [%s]" % comma_join(self.OPERATIONS) assert operation in self.OPERATIONS, "operation must be in [%s]" % comma_join(self.OPERATIONS)
@ -76,41 +78,51 @@ class SystemPart(Model):
def detach(self, settings=None): def detach(self, settings=None):
""" """
Move a partition to the 'detached' directory and forget it. Move a partition to the 'detached' directory and forget it.
:param settings: Settings for executing request to ClickHouse over db.raw() method
:return: SQL Query - `settings`: Settings for executing request to ClickHouse over db.raw() method
Returns: SQL Query
""" """
return self._partition_operation_sql('DETACH', settings=settings) return self._partition_operation_sql('DETACH', settings=settings)
def drop(self, settings=None): def drop(self, settings=None):
""" """
Delete a partition Delete a partition
:param settings: Settings for executing request to ClickHouse over db.raw() method
:return: SQL Query - `settings`: Settings for executing request to ClickHouse over db.raw() method
Returns: SQL Query
""" """
return self._partition_operation_sql('DROP', settings=settings) return self._partition_operation_sql('DROP', settings=settings)
def attach(self, settings=None): def attach(self, settings=None):
""" """
Add a new part or partition from the 'detached' directory to the table. Add a new part or partition from the 'detached' directory to the table.
:param settings: Settings for executing request to ClickHouse over db.raw() method
:return: SQL Query - `settings`: Settings for executing request to ClickHouse over db.raw() method
Returns: SQL Query
""" """
return self._partition_operation_sql('ATTACH', settings=settings) return self._partition_operation_sql('ATTACH', settings=settings)
def freeze(self, settings=None): def freeze(self, settings=None):
""" """
Create a backup of a partition. Create a backup of a partition.
:param settings: Settings for executing request to ClickHouse over db.raw() method
:return: SQL Query - `settings`: Settings for executing request to ClickHouse over db.raw() method
Returns: SQL Query
""" """
return self._partition_operation_sql('FREEZE', settings=settings) return self._partition_operation_sql('FREEZE', settings=settings)
def fetch(self, zookeeper_path, settings=None): def fetch(self, zookeeper_path, settings=None):
""" """
Download a partition from another server. Download a partition from another server.
:param zookeeper_path: Path in zookeeper to fetch from
:param settings: Settings for executing request to ClickHouse over db.raw() method - `zookeeper_path`: Path in zookeeper to fetch from
:return: SQL Query - `settings`: Settings for executing request to ClickHouse over db.raw() method
Returns: SQL Query
""" """
return self._partition_operation_sql('FETCH', settings=settings, from_part=zookeeper_path) return self._partition_operation_sql('FETCH', settings=settings, from_part=zookeeper_path)
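
The partition methods above all delegate to one helper that upper-cases the operation name and validates it against `OPERATIONS`. The sketch below shows one way such a helper could assemble an `ALTER TABLE ... PARTITION` statement. It is an illustration under stated assumptions: the function name `build_partition_sql`, its parameters, the contents of `OPERATIONS` and the exact SQL layout are all hypothetical, and the sketch only returns a string where the real method may also execute it.

```python
# Assumed set of supported operations, mirroring the methods in the diff.
OPERATIONS = frozenset({'DETACH', 'DROP', 'ATTACH', 'FREEZE', 'FETCH'})


def build_partition_sql(db_name, table, partition, operation, from_part=None):
    """Hypothetical helper: build the SQL for a partition operation."""
    operation = operation.upper()
    assert operation in OPERATIONS, \
        "operation must be in [%s]" % ', '.join(sorted(OPERATIONS))
    sql = "ALTER TABLE `%s`.`%s` %s PARTITION %s" % (
        db_name, table, operation, partition)
    if from_part is not None:
        # Used by FETCH, which needs a source path.
        sql += " FROM %s" % from_part
    return sql


print(build_partition_sql('analytics', 'events', '202002', 'detach'))
# ALTER TABLE `analytics`.`events` DETACH PARTITION 202002
```

Passing an unsupported operation name raises `AssertionError`, matching the assertion shown in `_partition_operation_sql`.
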
@ -118,9 +130,11 @@ class SystemPart(Model):
def get(cls, database, conditions=""): def get(cls, database, conditions=""):
""" """
Get all data from system.parts table Get all data from system.parts table
:param database: A database object to fetch data from.
:param conditions: WHERE clause conditions. Database condition is added automatically - `database`: A database object to fetch data from.
:return: A list of SystemPart objects - `conditions`: WHERE clause conditions. Database condition is added automatically
Returns: A list of SystemPart objects
""" """
assert isinstance(database, Database), "database must be database.Database class instance" assert isinstance(database, Database), "database must be database.Database class instance"
assert isinstance(conditions, str), "conditions must be a string" assert isinstance(conditions, str), "conditions must be a string"
@ -134,9 +148,11 @@ class SystemPart(Model):
def get_active(cls, database, conditions=""): def get_active(cls, database, conditions=""):
""" """
Gets active data from system.parts table Gets active data from system.parts table
:param database: A database object to fetch data from.
:param conditions: WHERE clause conditions. Database and active conditions are added automatically - `database`: A database object to fetch data from.
:return: A list of SystemPart objects - `conditions`: WHERE clause conditions. Database and active conditions are added automatically
Returns: A list of SystemPart objects
""" """
if conditions: if conditions:
conditions += ' AND ' conditions += ' AND '