RAMEN-206 Support LowCardinality in infi.clickhouse_orm

added documentation
This commit is contained in:
Roy Belio 2019-06-25 13:24:06 +03:00
parent 7fcbad44b9
commit 8d5e47a957
2 changed files with 33 additions and 0 deletions

View File

@ -188,6 +188,37 @@ class Stats(models.Model):
```
:exclamation:**_This feature is supported on ClickHouse version 19.1.16 and above, codec arguments will be ignored by the ORM for ClickHouse versions lower than 19.1.16_**
Working with LowCardinality fields
----------------------------------
Starting with version 19.0 ClickHouse offers a new type of field to improve the performance of queries
and compaction of columns for low entropy data.
[More specifically](https://github.com/yandex/ClickHouse/issues/4074) LowCardinality data type builds dictionaries automatically. It can use multiple different dictionaries if necessarily.
If the number of distinct values is pretty large, the dictionaries become local, several different dictionaries will be used for different ranges of data. For example, if you have too many distinct values in total, but only less than about a million values each day - then the queries by day will be processed efficiently, and queries for larger ranges will be processed rather efficiently.
LowCardinality works independently of (generic) fields compression.
LowCardinality fields are subsequently compressed as usual.
The compression ratios of LowCardinality fields for text data may be significantly better than without LowCardinality.
LowCardinality will give performance boost, in the form of processing speed, if the number of distinct values is less than a few millions. This is because data is processed in dictionary encoded form.
You can find further information about LowCardinality in [this presentation](https://github.com/yandex/clickhouse-presentations/blob/master/meetup19/string_optimization.pdf).
Usage example:
```python
class LowCardinalityModel(models.Model):
date = fields.DateField()
int32 = fields.LowCardinalityField(fields.Int32Field())
float32 = fields.LowCardinalityField(fields.Float32Field())
string = fields.LowCardinalityField(fields.StringField())
nullable = fields.LowCardinalityField(fields.NullableField(fields.StringField()))
array = fields.ArrayField(fields.LowCardinalityField(fields.UInt64Field()))
engine = MergeTree('date', ('date',))
```
:exclamation:**_LowCardinality field with inner array field is not supported. Use Array field with LowCardinality inner field as seen in the example._**
Creating custom field types
---------------------------
Sometimes it is convenient to use data types that are supported in Python, but have no corresponding column type in ClickHouse. In these cases it is possible to define a custom field class that knows how to convert the Pythonic object to a suitable representation in the database, and vice versa.

View File

@ -37,6 +37,8 @@
* [Working with materialized and alias fields](field_types.md#working-with-materialized-and-alias-fields)
* [Working with nullable fields](field_types.md#working-with-nullable-fields)
* [Working with field compression codecs](field_types.md#working-with-field-compression-codecs)
* [Working with LowCardinality fields](field_types.md#working-with-lowcardinality-fields)
* [Creating custom field types](field_types.md#creating-custom-field-types)
* [Table Engines](table_engines.md#table-engines)
* [Simple Engines](table_engines.md#simple-engines)