diff --git a/README b/README index a1320b1..636c9cb 100644 --- a/README +++ b/README @@ -1 +1,57 @@ -README.rst +Introduction +============ + +This project is simple ORM for working with the [ClickHouse database](https://clickhouse.yandex/). +It allows you to define model classes whose instances can be written to the database and read from it. + +Let's jump right in with a simple example of monitoring CPU usage. First we need to define the model class, +connect to the database and create a table for the model: + +```python +from infi.clickhouse_orm.database import Database +from infi.clickhouse_orm.models import Model +from infi.clickhouse_orm.fields import * +from infi.clickhouse_orm.engines import Memory + +class CPUStats(Model): + + timestamp = DateTimeField() + cpu_id = UInt16Field() + cpu_percent = Float32Field() + + engine = Memory() + +db = Database('demo') +db.create_table(CPUStats) +``` + +Now we can collect usage statistics per CPU, and write them to the database: + +```python +import psutil, time, datetime + +psutil.cpu_percent(percpu=True) # first sample should be discarded +while True: + time.sleep(1) + stats = psutil.cpu_percent(percpu=True) + timestamp = datetime.datetime.now() + db.insert([ + CPUStats(timestamp=timestamp, cpu_id=cpu_id, cpu_percent=cpu_percent) + for cpu_id, cpu_percent in enumerate(stats) + ]) +``` + +Querying the table is easy, using either the query builder or raw SQL: + +```python +# Calculate what percentage of the time CPU 1 was over 95% busy +total = CPUStats.objects_in(db).filter(cpu_id=1).count() +busy = CPUStats.objects_in(db).filter(cpu_id=1, cpu_percent__gt=95).count() +print 'CPU 1 was busy {:.2f}% of the time'.format(busy * 100.0 / total) + +# Calculate the average usage per CPU +for row in db.select('SELECT cpu_id, avg(cpu_percent) AS average FROM demo.cpustats GROUP BY cpu_id'): + print 'CPU {row.cpu_id}: {row.average:.2f}%'.format(row=row) +``` + +To learn more please visit the [documentation](docs/index.md). \ No newline at end of file diff --git a/README.rst b/README.rst deleted file mode 100644 index 0951776..0000000 --- a/README.rst +++ /dev/null @@ -1,436 +0,0 @@ -Overview -======== - -This project is simple ORM for working with the `ClickHouse database `_. -It allows you to define model classes whose instances can be written to the database and read from it. - -Installation -============ - -To install infi.clickhouse_orm:: - - pip install infi.clickhouse_orm - -Usage -===== - -Defining Models ---------------- - -Models are defined in a way reminiscent of Django's ORM:: - - from infi.clickhouse_orm import models, fields, engines - - class Person(models.Model): - - first_name = fields.StringField() - last_name = fields.StringField() - birthday = fields.DateField() - height = fields.Float32Field() - - engine = engines.MergeTree('birthday', ('first_name', 'last_name', 'birthday')) - -It is possible to provide a default value for a field, instead of its "natural" default (empty string for string fields, zero for numeric fields etc.). -Alternatively it is possible to pass alias or materialized parameters (see below for usage examples). -Only one of ``default``, ``alias`` and ``materialized`` parameters can be provided. - -See below for the supported field types and table engines. - -Table Names -*********** - -The table name used for the model is its class name, converted to lowercase. To override the default name, -implement the ``table_name`` method:: - - class Person(models.Model): - - ... - - @classmethod - def table_name(cls): - return 'people' - -Using Models ------------- - -Once you have a model, you can create model instances:: - - >>> dan = Person(first_name='Dan', last_name='Schwartz') - >>> suzy = Person(first_name='Suzy', last_name='Jones') - >>> dan.first_name - u'Dan' - -When values are assigned to model fields, they are immediately converted to their Pythonic data type. -In case the value is invalid, a ``ValueError`` is raised:: - - >>> suzy.birthday = '1980-01-17' - >>> suzy.birthday - datetime.date(1980, 1, 17) - >>> suzy.birthday = 0.5 - ValueError: Invalid value for DateField - 0.5 - >>> suzy.birthday = '1922-05-31' - ValueError: DateField out of range - 1922-05-31 is not between 1970-01-01 and 2038-01-19 - -Inserting to the Database -------------------------- - -To write your instances to ClickHouse, you need a ``Database`` instance:: - - from infi.clickhouse_orm.database import Database - - db = Database('my_test_db') - -This automatically connects to http://localhost:8123 and creates a database called my_test_db, unless it already exists. -If necessary, you can specify a different database URL and optional credentials:: - - db = Database('my_test_db', db_url='http://192.168.1.1:8050', username='scott', password='tiger') - -Using the ``Database`` instance you can create a table for your model, and insert instances to it:: - - db.create_table(Person) - db.insert([dan, suzy]) - -The ``insert`` method can take any iterable of model instances, but they all must belong to the same model class. - -Creating a read-only database is also supported. Such a ``Database`` instance can only read data, and cannot -modify data or schemas:: - - db = Database('my_test_db', readonly=True) - -Reading from the Database -------------------------- - -Loading model instances from the database is simple:: - - for person in db.select("SELECT * FROM my_test_db.person", model_class=Person): - print person.first_name, person.last_name - -Do not include a ``FORMAT`` clause in the query, since the ORM automatically sets the format to ``TabSeparatedWithNamesAndTypes``. - -It is possible to select only a subset of the columns, and the rest will receive their default values:: - - for person in db.select("SELECT first_name FROM my_test_db.person WHERE last_name='Smith'", model_class=Person): - print person.first_name - -SQL Placeholders -**************** - -There are a couple of special placeholders that you can use inside the SQL to make it easier to write: -``$db`` and ``$table``. The first one is replaced by the database name, and the second is replaced by -the database name plus table name (but is available only when the model is specified). - -So instead of this:: - - db.select("SELECT * FROM my_test_db.person", model_class=Person) - -you can use:: - - db.select("SELECT * FROM $db.person", model_class=Person) - -or even:: - - db.select("SELECT * FROM $table", model_class=Person) - -Ad-Hoc Models -************* - -Specifying a model class is not required. In case you do not provide a model class, an ad-hoc class will -be defined based on the column names and types returned by the query:: - - for row in db.select("SELECT max(height) as max_height FROM my_test_db.person"): - print row.max_height - -This is a very convenient feature that saves you the need to define a model for each query, while still letting -you work with Pythonic column values and an elegant syntax. - -Counting --------- - -The ``Database`` class also supports counting records easily:: - - >>> db.count(Person) - 117 - >>> db.count(Person, conditions="height > 1.90") - 6 - -Pagination ----------- - -It is possible to paginate through model instances:: - - >>> order_by = 'first_name, last_name' - >>> page = db.paginate(Person, order_by, page_num=1, page_size=10) - >>> print page.number_of_objects - 2507 - >>> print page.pages_total - 251 - >>> for person in page.objects: - >>> # do something - -The ``paginate`` method returns a ``namedtuple`` containing the following fields: - -- ``objects`` - the list of objects in this page -- ``number_of_objects`` - total number of objects in all pages -- ``pages_total`` - total number of pages -- ``number`` - the page number, starting from 1; the special value -1 may be used to retrieve the last page -- ``page_size`` - the number of objects per page - -You can optionally pass conditions to the query:: - - >>> page = db.paginate(Person, order_by, page_num=1, page_size=100, conditions='height > 1.90') - -Note that ``order_by`` must be chosen so that the ordering is unique, otherwise there might be -inconsistencies in the pagination (such as an instance that appears on two different pages). - - -System models -------------- - -`Clickhouse docs `_. - -System models are read only models for implementing part of the system's functionality, -and for providing access to information about how the system is working. - -Currently the following system models are supported: - -=================== ============ =================================================== -Class DB Table Comments -=================== ============ =================================================== -SystemPart system.parts Gives methods to work with partitions. See below. -=================== ============ =================================================== - - -Partitions and parts --------------------- - -`ClickHouse docs `_. - -A partition in a table is data for a single calendar month. Table "system.parts" contains information about each part. - -=================== ======================= ============================================================================================= -Method Parameters Comments -=================== ======================= ============================================================================================= -get(static) database, conditions="" Gets database partitions, filtered by conditions -get_active(static) database, conditions="" Gets only active (not detached or dropped) partitions, filtered by conditions -detach settings=None Detaches the partition. Settings is a dict of params to pass to http request -drop settings=None Drops the partition. Settings is a dict of params to pass to http request -attach settings=None Attaches already detached partition. Settings is a dict of params to pass to http request -freeze settings=None Freezes (makes backup) of the partition. Settings is a dict of params to pass to http request -fetch settings=None Fetches partition. Settings is a dict of params to pass to http request -=================== ======================= ============================================================================================= - -Usage example:: - - from infi.clickhouse_orm.database import Database - from infi.clickhouse_orm.system_models import SystemPart - db = Database('my_test_db', db_url='http://192.168.1.1:8050', username='scott', password='tiger') - partitions = SystemPart.get_active(db, conditions='') # Getting all active partitions of the database - if len(partitions) > 0: - partitions = sorted(partitions, key=lambda obj: obj.name) # Partition name is YYYYMM, so we can sort so - partitions[0].freeze() # Make a backup in /opt/clickhouse/shadow directory - partitions[0].drop() # Dropped partition - -``Note``: system.parts stores information for all databases. To be correct, -SystemPart model was designed to receive only given database parts. - - -Schema Migrations ------------------ - -Over time, your models may change and the database will have to be modified accordingly. -Migrations allow you to describe these changes succinctly using Python, and to apply them -to the database. A migrations table automatically keeps track of which migrations were already applied. - -For details please refer to the MIGRATIONS.rst document. - -Field Types ------------ - -Currently the following field types are supported: - -=================== ======== ================= =================================================== -Class DB Type Pythonic Type Comments -=================== ======== ================= =================================================== -StringField String unicode Encoded as UTF-8 when written to ClickHouse -FixedStringField String unicode Encoded as UTF-8 when written to ClickHouse -DateField Date datetime.date Range 1970-01-01 to 2038-01-19 -DateTimeField DateTime datetime.datetime Minimal value is 1970-01-01 00:00:00; Always in UTC -Int8Field Int8 int Range -128 to 127 -Int16Field Int16 int Range -32768 to 32767 -Int32Field Int32 int Range -2147483648 to 2147483647 -Int64Field Int64 int/long Range -9223372036854775808 to 9223372036854775807 -UInt8Field UInt8 int Range 0 to 255 -UInt16Field UInt16 int Range 0 to 65535 -UInt32Field UInt32 int Range 0 to 4294967295 -UInt64Field UInt64 int/long Range 0 to 18446744073709551615 -Float32Field Float32 float -Float64Field Float64 float -Enum8Field Enum8 Enum See below -Enum16Field Enum16 Enum See below -ArrayField Array list See below -=================== ======== ================= =================================================== - -DateTimeField and Time Zones -**************************** - -A ``DateTimeField`` can be assigned values from one of the following types: - -- datetime -- date -- integer - number of seconds since the Unix epoch -- string in ``YYYY-MM-DD HH:MM:SS`` format - -The assigned value always gets converted to a timezone-aware ``datetime`` in UTC. If the assigned -value is a timezone-aware ``datetime`` in another timezone, it will be converted to UTC. Otherwise, the assigned value is assumed to already be in UTC. - -DateTime values that are read from the database are also converted to UTC. ClickHouse formats them according to the -timezone of the server, and the ORM makes the necessary conversions. This requires a ClickHouse version which is new -enough to support the ``timezone()`` function, otherwise it is assumed to be using UTC. In any case, we recommend -settings the server timezone to UTC in order to prevent confusion. - -Working with enum fields -************************ - -``Enum8Field`` and ``Enum16Field`` provide support for working with ClickHouse enum columns. They accept -strings or integers as values, and convert them to the matching Pythonic Enum member. - -Python 3.4 and higher supports Enums natively. When using previous Python versions you -need to install the `enum34` library. - -Example of a model with an enum field:: - - Gender = Enum('Gender', 'male female unspecified') - - class Person(models.Model): - - first_name = fields.StringField() - last_name = fields.StringField() - birthday = fields.DateField() - gender = fields.Enum32Field(Gender) - - engine = engines.MergeTree('birthday', ('first_name', 'last_name', 'birthday')) - - suzy = Person(first_name='Suzy', last_name='Jones', gender=Gender.female) - -Working with array fields -************************* - -You can create array fields containing any data type, for example:: - - class SensorData(models.Model): - - date = fields.DateField() - temperatures = fields.ArrayField(fields.Float32Field()) - humidity_levels = fields.ArrayField(fields.UInt8Field()) - - engine = engines.MergeTree('date', ('date',)) - - data = SensorData(date=date.today(), temperatures=[25.5, 31.2, 28.7], humidity_levels=[41, 39, 66]) - - -Working with materialized and alias fields -****************************************** - -ClickHouse provides an opportunity to create MATERIALIZED and ALIAS Fields. - -See documentation `here `_. - -Both field types can't be inserted into the database directly, so they are ignored when using the ``Database.insert()`` method. -ClickHouse does not return the field values if you use ``"SELECT * FROM ..."`` - you have to list these field -names explicitly in the query. - -Usage:: - - class Event(models.Model): - - created = fields.DateTimeField() - created_date = fields.DateTimeField(materialized='toDate(created)') - name = fields.StringField() - username = fields.StringField(alias='name') - - engine = engines.MergeTree('created_date', ('created_date', 'created')) - - obj = Event(created=datetime.now(), name='MyEvent') - db = Database('my_test_db') - db.insert([obj]) - # All values will be retrieved from database - db.select('SELECT created, created_date, username, name FROM $db.event', model_class=Event) - # created_date and username will contain a default value - db.select('SELECT * FROM $db.event', model_class=Event) - - -Table Engines -------------- - -Each model must have an engine instance, used when creating the table in ClickHouse. - -To define a ``MergeTree`` engine, supply the date column name and the names (or expressions) for the key columns:: - - engine = engines.MergeTree('EventDate', ('CounterID', 'EventDate')) - -You may also provide a sampling expression:: - - engine = engines.MergeTree('EventDate', ('CounterID', 'EventDate'), sampling_expr='intHash32(UserID)') - -A ``CollapsingMergeTree`` engine is defined in a similar manner, but requires also a sign column:: - - engine = engines.CollapsingMergeTree('EventDate', ('CounterID', 'EventDate'), 'Sign') - -For a ``SummingMergeTree`` you can optionally specify the summing columns:: - - engine = engines.SummingMergeTree('EventDate', ('OrderID', 'EventDate', 'BannerID'), - summing_cols=('Shows', 'Clicks', 'Cost')) - -For a ``ReplacingMergeTree`` you can optionally specify the version column:: - - engine = engines.ReplacingMergeTree('EventDate', ('OrderID', 'EventDate', 'BannerID'), ver_col='Version') - -A ``Buffer`` engine is available for BufferModels. (See below how to use BufferModel). You can specify following parameters:: - - engine = engines.Buffer(Person) # you need to initialize engine with main Model. Other default parameters will be used - # or: - engine = engines.Buffer(Person, num_layers=16, min_time=10, - max_time=100, min_rows=10000, max_rows=1000000, - min_bytes=10000000, max_bytes=100000000) - -Buffer Models -------------- -Here's how you can define Model for Buffer Engine. The Buffer Model should be inherited from models.BufferModel and main Model:: - - class PersonBuffer(models.BufferModel, Person): - - engine = engines.Buffer(Person) - -Then you can insert objects into Buffer model and they will be handled by Clickhouse properly:: - - db.create_table(PersonBuffer) - suzy = PersonBuffer(first_name='Suzy', last_name='Jones') - dan = PersonBuffer(first_name='Dan', last_name='Schwartz') - db.insert([dan, suzy]) - - -Data Replication -**************** - -Any of the above engines can be converted to a replicated engine (e.g. ``ReplicatedMergeTree``) by adding two parameters, ``replica_table_path`` and ``replica_name``:: - - engine = engines.MergeTree('EventDate', ('CounterID', 'EventDate'), - replica_table_path='/clickhouse/tables/{layer}-{shard}/hits', - replica_name='{replica}') - -Development -=========== - -After cloning the project, run the following commands:: - - easy_install -U infi.projector - cd infi.clickhouse_orm - projector devenv build - -To run the tests, ensure that the ClickHouse server is running on http://localhost:8123/ (this is the default), and run:: - - bin/nosetests - -To see test coverage information run:: - - bin/nosetests --with-coverage --cover-package=infi.clickhouse_orm diff --git a/scripts/docs2html.sh b/scripts/docs2html.sh index 4916530..b8d87dd 100755 --- a/scripts/docs2html.sh +++ b/scripts/docs2html.sh @@ -3,5 +3,8 @@ mkdir -p ../htmldocs find ./ -iname "*.md" -type f -exec sh -c 'echo "Converting ${0}"; pandoc "${0}" -s -o "../htmldocs/${0%.md}.html"' {} \; +echo "Converting README" +pandoc ../README -s -o "../htmldocs/README.html" + echo "Fixing links" sed -i 's/\.md/\.html/g' ../htmldocs/*.html