refactor documentation

Itai Shirav 2017-04-26 15:47:02 +03:00
parent abbe334875
commit 78bb857c8a
14 changed files with 830 additions and 0 deletions

.gitignore

@@ -57,3 +57,5 @@ buildout.in

     src/infi/clickhouse_orm/__version__.py
     bootstrap.py
    +htmldocs/

docs/contributing.md (new file)

@@ -0,0 +1,16 @@

Contributing
============

After cloning the project, run the following commands:

    easy_install -U infi.projector
    cd infi.clickhouse_orm
    projector devenv build

To run the tests, ensure that the ClickHouse server is running on <http://localhost:8123/> (this is the default), and run:

    bin/nosetests

To see test coverage information run:

    bin/nosetests --with-coverage --cover-package=infi.clickhouse_orm

docs/field_types.md (new file)

@@ -0,0 +1,104 @@

Field Types
===========

Currently the following field types are supported:

| Class            | DB Type  | Pythonic Type     | Comments                                            |
| ---------------- | -------- | ----------------- | --------------------------------------------------- |
| StringField      | String   | unicode           | Encoded as UTF-8 when written to ClickHouse         |
| FixedStringField | String   | unicode           | Encoded as UTF-8 when written to ClickHouse         |
| DateField        | Date     | datetime.date     | Range 1970-01-01 to 2038-01-19                      |
| DateTimeField    | DateTime | datetime.datetime | Minimal value is 1970-01-01 00:00:00; always in UTC |
| Int8Field        | Int8     | int               | Range -128 to 127                                   |
| Int16Field       | Int16    | int               | Range -32768 to 32767                               |
| Int32Field       | Int32    | int               | Range -2147483648 to 2147483647                     |
| Int64Field       | Int64    | int/long          | Range -9223372036854775808 to 9223372036854775807   |
| UInt8Field       | UInt8    | int               | Range 0 to 255                                      |
| UInt16Field      | UInt16   | int               | Range 0 to 65535                                    |
| UInt32Field      | UInt32   | int               | Range 0 to 4294967295                               |
| UInt64Field      | UInt64   | int/long          | Range 0 to 18446744073709551615                     |
| Float32Field     | Float32  | float             |                                                     |
| Float64Field     | Float64  | float             |                                                     |
| Enum8Field       | Enum8    | Enum              | See below                                           |
| Enum16Field      | Enum16   | Enum              | See below                                           |
| ArrayField       | Array    | list              | See below                                           |

DateTimeField and Time Zones
----------------------------

A `DateTimeField` can be assigned values from one of the following types:

- datetime
- date
- integer - number of seconds since the Unix epoch
- string in `YYYY-MM-DD HH:MM:SS` format

The assigned value always gets converted to a timezone-aware `datetime` in UTC. If the assigned value is a timezone-aware `datetime` in another timezone, it is converted to UTC. Otherwise, the assigned value is assumed to already be in UTC.

DateTime values that are read from the database are also converted to UTC. ClickHouse formats them according to the timezone of the server, and the ORM makes the necessary conversions. This requires a ClickHouse version that is new enough to support the `timezone()` function; otherwise it is assumed to be using UTC. In any case, we recommend setting the server timezone to UTC in order to prevent confusion.
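
For example, here is a minimal sketch of the conversion behavior (the `Event` model and its field names are illustrative, and the usual `models`, `fields` and `engines` imports are assumed):

    from datetime import datetime
    import pytz

    class Event(models.Model):
        day = fields.DateField()
        moment = fields.DateTimeField()
        engine = engines.MergeTree('day', ('day',))

    e = Event(moment='2017-04-26 12:00:00')  # naive string, assumed to be in UTC
    e.moment = 1493208000                    # number of seconds since the Unix epoch
    e.moment = pytz.timezone('Asia/Jerusalem').localize(datetime(2017, 4, 26, 15, 0))
    # in all three cases e.moment ends up as a timezone-aware datetime in UTC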

Working with enum fields
------------------------

`Enum8Field` and `Enum16Field` provide support for working with ClickHouse enum columns. They accept strings or integers as values, and convert them to the matching Pythonic Enum member.

Python 3.4 and higher supports Enums natively. When using previous Python versions you need to install the enum34 library.

Example of a model with an enum field:

    Gender = Enum('Gender', 'male female unspecified')

    class Person(models.Model):
        first_name = fields.StringField()
        last_name = fields.StringField()
        birthday = fields.DateField()
        gender = fields.Enum8Field(Gender)
        engine = engines.MergeTree('birthday', ('first_name', 'last_name', 'birthday'))

    suzy = Person(first_name='Suzy', last_name='Jones', gender=Gender.female)

Working with array fields
-------------------------

You can create array fields containing any data type, for example:

    class SensorData(models.Model):
        date = fields.DateField()
        temperatures = fields.ArrayField(fields.Float32Field())
        humidity_levels = fields.ArrayField(fields.UInt8Field())
        engine = engines.MergeTree('date', ('date',))

    data = SensorData(date=date.today(), temperatures=[25.5, 31.2, 28.7], humidity_levels=[41, 39, 66])

Working with materialized and alias fields
------------------------------------------

ClickHouse supports MATERIALIZED and ALIAS columns; see the ClickHouse documentation [here](https://clickhouse.yandex/reference_en.html#Default%20values).

Neither field type can be inserted into the database directly, so such fields are ignored when using the `Database.insert()` method. ClickHouse also does not return their values for `"SELECT * FROM ..."` queries - you have to list these field names explicitly in the query.

Usage:

    class Event(models.Model):
        created = fields.DateTimeField()
        created_date = fields.DateTimeField(materialized='toDate(created)')
        name = fields.StringField()
        username = fields.StringField(alias='name')
        engine = engines.MergeTree('created_date', ('created_date', 'created'))

    obj = Event(created=datetime.now(), name='MyEvent')
    db = Database('my_test_db')
    db.insert([obj])
    # All values will be retrieved from the database
    db.select('SELECT created, created_date, username, name FROM $db.event', model_class=Event)
    # created_date and username will contain a default value
    db.select('SELECT * FROM $db.event', model_class=Event)

docs/index.md (new file)

@@ -0,0 +1,11 @@

Overview
========

This project is a simple ORM for working with the [ClickHouse database](https://clickhouse.yandex/). It allows you to define model classes whose instances can be written to the database and read from it.

Installation
------------

To install infi.clickhouse_orm:

    pip install infi.clickhouse_orm

docs/models_and_databases.md (new file)

@@ -0,0 +1,172 @@

Models and Databases
====================

Models represent ClickHouse tables, allowing you to work with them using familiar pythonic syntax.

Database instances connect to a specific ClickHouse database for running queries, inserting data and other operations.

Defining Models
---------------

Models are defined in a way reminiscent of Django's ORM:

    from infi.clickhouse_orm import models, fields, engines

    class Person(models.Model):
        first_name = fields.StringField()
        last_name = fields.StringField()
        birthday = fields.DateField()
        height = fields.Float32Field()
        engine = engines.MergeTree('birthday', ('first_name', 'last_name', 'birthday'))

It is possible to provide a default value for a field, instead of its "natural" default (empty string for string fields, zero for numeric fields etc.). Alternatively it is possible to pass alias or materialized parameters (see the sketch below for usage examples). Only one of `default`, `alias` and `materialized` parameters can be provided.
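
Here is a minimal sketch of all three options on one model (the field names and SQL expressions are illustrative):

    class Person(models.Model):
        first_name = fields.StringField()
        birthday = fields.DateField()
        height = fields.Float32Field(default=1.70)  # explicit default instead of 0
        # ALIAS column - computed on SELECT, not stored in the table
        nickname = fields.StringField(alias='first_name')
        # MATERIALIZED column - computed on INSERT and stored in the table
        birth_year = fields.UInt16Field(materialized='toYear(birthday)')
        engine = engines.MergeTree('birthday', ('first_name', 'birthday'))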

For more details see [Field Types](field_types.md) and [Table Engines](table_engines.md).

### Table Names

The table name used for the model is its class name, converted to lowercase. To override the default name, implement the `table_name` method:

    class Person(models.Model):
        ...
        @classmethod
        def table_name(cls):
            return 'people'

Using Models
------------

Once you have a model, you can create model instances:

    >>> dan = Person(first_name='Dan', last_name='Schwartz')
    >>> suzy = Person(first_name='Suzy', last_name='Jones')
    >>> dan.first_name
    u'Dan'

When values are assigned to model fields, they are immediately converted to their Pythonic data type. In case the value is invalid, a `ValueError` is raised:

    >>> suzy.birthday = '1980-01-17'
    >>> suzy.birthday
    datetime.date(1980, 1, 17)
    >>> suzy.birthday = 0.5
    ValueError: Invalid value for DateField - 0.5
    >>> suzy.birthday = '1922-05-31'
    ValueError: DateField out of range - 1922-05-31 is not between 1970-01-01 and 2038-01-19

Inserting to the Database
-------------------------

To write your instances to ClickHouse, you need a `Database` instance:

    from infi.clickhouse_orm.database import Database

    db = Database('my_test_db')

This automatically connects to <http://localhost:8123> and creates a database called my_test_db, unless it already exists. If necessary, you can specify a different database URL and optional credentials:

    db = Database('my_test_db', db_url='http://192.168.1.1:8050', username='scott', password='tiger')

Using the `Database` instance you can create a table for your model, and insert instances to it:

    db.create_table(Person)
    db.insert([dan, suzy])

The `insert` method can take any iterable of model instances, but they all must belong to the same model class.
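
Since any iterable will do, you can for example insert rows from a generator, without first building a list in memory (a sketch with made-up data):

    def generate_people():
        for i in range(100000):
            yield Person(first_name='Name%d' % i, last_name='Smith')

    db.insert(generate_people())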

Creating a read-only database is also supported. Such a `Database` instance can only read data, and cannot modify data or schemas:

    db = Database('my_test_db', readonly=True)

Reading from the Database
-------------------------

Loading model instances from the database is simple:

    for person in db.select("SELECT * FROM my_test_db.person", model_class=Person):
        print person.first_name, person.last_name

Do not include a `FORMAT` clause in the query, since the ORM automatically sets the format to `TabSeparatedWithNamesAndTypes`.

It is possible to select only a subset of the columns, and the rest will receive their default values:

    for person in db.select("SELECT first_name FROM my_test_db.person WHERE last_name='Smith'", model_class=Person):
        print person.first_name

The ORM provides a way to build simple queries without writing SQL by hand. The previous snippet can be written like this:

    for person in Person.objects_in(db).filter(last_name='Smith').only('first_name'):
        print person.first_name

See [Querysets](querysets.md) for more information.

Reading without a Model
-----------------------

When running a query, specifying a model class is not required. In case you do not provide a model class, an ad-hoc class will be defined based on the column names and types returned by the query:

    for row in db.select("SELECT max(height) as max_height FROM my_test_db.person"):
        print row.max_height

This is a very convenient feature that saves you the need to define a model for each query, while still letting you work with Pythonic column values and an elegant syntax.

SQL Placeholders
----------------

There are a couple of special placeholders that you can use inside the SQL to make it easier to write: `$db` and `$table`. The first one is replaced by the database name, and the second is replaced by the database name plus table name (but is available only when the model is specified).

So instead of this:

    db.select("SELECT * FROM my_test_db.person", model_class=Person)

you can use:

    db.select("SELECT * FROM $db.person", model_class=Person)

or even:

    db.select("SELECT * FROM $table", model_class=Person)

Counting
--------

The `Database` class also supports counting records easily:

    >>> db.count(Person)
    117
    >>> db.count(Person, conditions="height > 1.90")
    6

Pagination
----------

It is possible to paginate through model instances:

    >>> order_by = 'first_name, last_name'
    >>> page = db.paginate(Person, order_by, page_num=1, page_size=10)
    >>> print page.number_of_objects
    2507
    >>> print page.pages_total
    251
    >>> for person in page.objects:
    ...     # do something

The `paginate` method returns a `namedtuple` containing the following fields:

- `objects` - the list of objects in this page
- `number_of_objects` - total number of objects in all pages
- `pages_total` - total number of pages
- `number` - the page number, starting from 1; the special value -1 may be used to retrieve the last page
- `page_size` - the number of objects per page

You can optionally pass conditions to the query:

    >>> page = db.paginate(Person, order_by, page_num=1, page_size=100, conditions='height > 1.90')

Note that `order_by` must be chosen so that the ordering is unique, otherwise there might be inconsistencies in the pagination (such as an instance that appears on two different pages).
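
For example, here is a minimal sketch that iterates over all the pages, ordering by the full primary key of the `Person` model so that the ordering is unique:

    page_num = 1
    while True:
        page = db.paginate(Person, 'first_name, last_name, birthday', page_num=page_num, page_size=100)
        for person in page.objects:
            print person.first_name, person.last_name
        if page_num >= page.pages_total:
            break
        page_num += 1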

docs/querysets.md (new file)

@@ -0,0 +1,96 @@

Querysets
=========

A queryset is an object that represents a database query using a specific Model. It is lazy, meaning that it does not hit the database until you iterate over its matching rows (model instances). To create a base queryset for a model class, use:

    qs = Person.objects_in(database)

This queryset matches all Person instances in the database. You can get these instances using iteration:

    for person in qs:
        print person.first_name, person.last_name

Filtering
---------

The `filter` and `exclude` methods are used for filtering the matching instances. Calling these methods returns a new queryset instance, with the added conditions. For example:

    >>> qs = Person.objects_in(database)
    >>> qs = qs.filter(first_name__startswith='V').exclude(birthday__lt='2000-01-01')
    >>> qs.conditions_as_sql()
    u"first_name LIKE 'V%' AND NOT (birthday < '2000-01-01') "

It is possible to specify several fields to filter or exclude by:

    >>> qs = Person.objects_in(database).filter(last_name='Smith', height__gt=1.75)
    >>> qs.conditions_as_sql()
    u"last_name = 'Smith' AND height > 1.75"

There are different operators that can be used, by passing `<fieldname>__<operator>=<value>` (two underscores separate the field name from the operator). In case no operator is given, `eq` is used by default. Below are all the supported operators.

| Operator      | Equivalent SQL                               | Comments               |
| ------------- | -------------------------------------------- | ---------------------- |
| `eq`          | `field = value`                              |                        |
| `gt`          | `field > value`                              |                        |
| `gte`         | `field >= value`                             |                        |
| `lt`          | `field < value`                              |                        |
| `lte`         | `field <= value`                             |                        |
| `in`          | `field IN (values)`                          | See below              |
| `contains`    | `field LIKE '%value%'`                       | For string fields only |
| `startswith`  | `field LIKE 'value%'`                        | For string fields only |
| `endswith`    | `field LIKE '%value'`                        | For string fields only |
| `icontains`   | `lowerUTF8(field) LIKE lowerUTF8('%value%')` | For string fields only |
| `istartswith` | `lowerUTF8(field) LIKE lowerUTF8('value%')`  | For string fields only |
| `iendswith`   | `lowerUTF8(field) LIKE lowerUTF8('%value')`  | For string fields only |
| `iexact`      | `lowerUTF8(field) = lowerUTF8(value)`        | For string fields only |

### Using the `in` Operator

The `in` operator expects one of three types of values:

* A list or tuple of simple values
* A string, which is used verbatim as the contents of the parentheses
* Another queryset (subquery)

For example if we want to select only people with Irish last names:

    # A list of simple values
    qs = Person.objects_in(database).filter(last_name__in=["Murphy", "O'Sullivan"])

    # A string
    subquery = "SELECT name from $db.irishlastname"
    qs = Person.objects_in(database).filter(last_name__in=subquery)

    # A queryset
    subquery = IrishLastName.objects_in(database).only("name")
    qs = Person.objects_in(database).filter(last_name__in=subquery)

Counting and Checking Existence
-------------------------------

Use the `count` method to get the number of matches:

    Person.objects_in(database).count()

To check if there are any matches at all, you can use any of the following equivalent options:

    if qs.count(): ...
    if bool(qs): ...
    if qs: ...

Ordering
--------

The sorting order of the results can be controlled using the `order_by` method:

    qs = Person.objects_in(database).order_by('last_name', 'first_name')

The default order is ascending. To use descending order, add a minus sign before the field name:

    qs = Person.objects_in(database).order_by('-height')

Omitting Fields
---------------

When not all model fields are needed, it is more efficient to omit them from the query. This is especially true when there are large fields that may slow the query down. Use the `only` method to specify which fields to retrieve:

    qs = Person.objects_in(database).only('first_name', 'birthday')

docs/schema_migrations.md (new file)

@@ -0,0 +1,60 @@

Schema Migrations
=================

Over time, the ORM models in your application may change. Migrations provide a way to modify the database tables according to the changes in your models, without writing raw SQL.

The migrations that were applied to the database are recorded in the `infi_clickhouse_orm_migrations` table, so migrating the database will only apply any missing migrations.

Writing Migrations
------------------

To write migrations, create a Python package. Then create a Python file for the initial migration. The migration files must begin with a four-digit number, and will be applied in sequence. For example:

    analytics
       |
       +-- analytics_migrations
              |
              +-- __init__.py
              |
              +-- 0001_initial.py
              |
              +-- 0002_add_user_agents_table.py

Each migration file is expected to contain a list of `operations`, for example:

    from infi.clickhouse_orm import migrations
    from analytics import models

    operations = [
        migrations.CreateTable(models.Visits),
        migrations.CreateTable(models.Visitors)
    ]

The following operations are supported:

**CreateTable**

A migration operation that creates a table for a given model class.

**DropTable**

A migration operation that drops the table of a given model class.

**AlterTable**

A migration operation that compares the table of a given model class to the model's fields, and alters the table to match the model. The operation can:

- add new columns
- drop obsolete columns
- modify column types

Default values are not altered by this operation.
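
For example, a later migration file that uses the other operations might look like this (a sketch, reusing the model names from the example above):

    from infi.clickhouse_orm import migrations
    from analytics import models

    operations = [
        migrations.AlterTable(models.Visits),   # add/drop/modify columns to match the current model
        migrations.DropTable(models.Visitors)   # this table is no longer needed
    ]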

Running Migrations
------------------

To migrate a database, create a `Database` instance and call its `migrate` method with the package name containing your migrations:

    Database('analytics_db').migrate('analytics.analytics_migrations')

Note that you may have more than one migrations package.
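
In that case, call `migrate` once for each package (the second package name here is hypothetical):

    db = Database('analytics_db')
    db.migrate('analytics.analytics_migrations')
    db.migrate('billing.billing_migrations')  # hypothetical second package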

docs/system_models.md (new file)

@@ -0,0 +1,42 @@

System models
=============

[Clickhouse docs](https://clickhouse.yandex/reference_en.html#System%20tables).

System models are read-only models for implementing part of the system's functionality, and for providing access to information about how the system is working.

Currently the following system models are supported:

| Class      | DB Table     | Comments                                          |
| ---------- | ------------ | ------------------------------------------------- |
| SystemPart | system.parts | Gives methods to work with partitions. See below. |

Partitions and parts
--------------------

[ClickHouse docs](https://clickhouse.yandex/reference_en.html#Manipulations%20with%20partitions%20and%20parts).

A partition in a table is data for a single calendar month. The `system.parts` table contains information about each part.

| Method              | Parameters              | Comments                                                                          |
| ------------------- | ----------------------- | ---------------------------------------------------------------------------------- |
| get (static)        | database, conditions="" | Gets database partitions, filtered by conditions                                    |
| get_active (static) | database, conditions="" | Gets only active (not detached or dropped) partitions, filtered by conditions       |
| detach              | settings=None           | Detaches the partition. Settings is a dict of params to pass to the http request    |
| drop                | settings=None           | Drops the partition. Settings is a dict of params to pass to the http request       |
| attach              | settings=None           | Attaches an already detached partition. Settings is a dict of params to pass to the http request |
| freeze              | settings=None           | Freezes (makes a backup of) the partition. Settings is a dict of params to pass to the http request |
| fetch               | settings=None           | Fetches the partition. Settings is a dict of params to pass to the http request     |

Usage example:

    from infi.clickhouse_orm.database import Database
    from infi.clickhouse_orm.system_models import SystemPart

    db = Database('my_test_db', db_url='http://192.168.1.1:8050', username='scott', password='tiger')
    partitions = SystemPart.get_active(db, conditions='')  # Get all active partitions of the database
    if len(partitions) > 0:
        partitions = sorted(partitions, key=lambda obj: obj.name)  # Partition name is YYYYMM, so it sorts chronologically
        partitions[0].freeze()  # Make a backup in the /opt/clickhouse/shadow directory
        partitions[0].drop()  # Drop the partition

Note: `system.parts` stores information for all databases. To be correct, the `SystemPart` model is designed to return only parts belonging to the given `Database` instance.

docs/table_engines.md (new file)

@@ -0,0 +1,58 @@

Table Engines
=============

Each model must have an engine instance, used when creating the table in ClickHouse.

To define a `MergeTree` engine, supply the date column name and the names (or expressions) for the key columns:

    engine = engines.MergeTree('EventDate', ('CounterID', 'EventDate'))

You may also provide a sampling expression:

    engine = engines.MergeTree('EventDate', ('CounterID', 'EventDate'), sampling_expr='intHash32(UserID)')

A `CollapsingMergeTree` engine is defined in a similar manner, but also requires a sign column:

    engine = engines.CollapsingMergeTree('EventDate', ('CounterID', 'EventDate'), 'Sign')

For a `SummingMergeTree` you can optionally specify the summing columns:

    engine = engines.SummingMergeTree('EventDate', ('OrderID', 'EventDate', 'BannerID'),
                                      summing_cols=('Shows', 'Clicks', 'Cost'))

For a `ReplacingMergeTree` you can optionally specify the version column:

    engine = engines.ReplacingMergeTree('EventDate', ('OrderID', 'EventDate', 'BannerID'), ver_col='Version')

A `Buffer` engine is available for buffer models (see below how to use `BufferModel`). You can specify the following parameters:

    engine = engines.Buffer(Person)  # you need to initialize the engine with the main Model; other parameters get default values
    # or:
    engine = engines.Buffer(Person, num_layers=16, min_time=10,
                            max_time=100, min_rows=10000, max_rows=1000000,
                            min_bytes=10000000, max_bytes=100000000)

Buffer Models
-------------

Here's how you can define a model for a Buffer engine. The buffer model should inherit from both `models.BufferModel` and the main model:

    class PersonBuffer(models.BufferModel, Person):
        engine = engines.Buffer(Person)

Then you can insert objects into the buffer model and they will be handled by ClickHouse properly:

    db.create_table(PersonBuffer)
    suzy = PersonBuffer(first_name='Suzy', last_name='Jones')
    dan = PersonBuffer(first_name='Dan', last_name='Schwartz')
    db.insert([dan, suzy])

Data Replication
----------------

Any of the above engines can be converted to a replicated engine (e.g. `ReplicatedMergeTree`) by adding two parameters, `replica_table_path` and `replica_name`:

    engine = engines.MergeTree('EventDate', ('CounterID', 'EventDate'),
                               replica_table_path='/clickhouse/tables/{layer}-{shard}/hits',
                               replica_name='{replica}')

docs/toc.md (new file)

@@ -0,0 +1,33 @@

* [Overview](index.md#overview)
    * [Installation](index.md#installation)
* [Models and Databases](models_and_databases.md#models-and-databases)
    * [Defining Models](models_and_databases.md#defining-models)
        * [Table Names](models_and_databases.md#table-names)
    * [Using Models](models_and_databases.md#using-models)
    * [Inserting to the Database](models_and_databases.md#inserting-to-the-database)
    * [Reading from the Database](models_and_databases.md#reading-from-the-database)
    * [Reading without a Model](models_and_databases.md#reading-without-a-model)
    * [SQL Placeholders](models_and_databases.md#sql-placeholders)
    * [Counting](models_and_databases.md#counting)
    * [Pagination](models_and_databases.md#pagination)
* [Field Types](field_types.md#field-types)
    * [DateTimeField and Time Zones](field_types.md#datetimefield-and-time-zones)
    * [Working with enum fields](field_types.md#working-with-enum-fields)
    * [Working with array fields](field_types.md#working-with-array-fields)
    * [Working with materialized and alias fields](field_types.md#working-with-materialized-and-alias-fields)
* [Table Engines](table_engines.md#table-engines)
    * [Buffer Models](table_engines.md#buffer-models)
    * [Data Replication](table_engines.md#data-replication)
* [Schema Migrations](schema_migrations.md#schema-migrations)
    * [Writing Migrations](schema_migrations.md#writing-migrations)
    * [Running Migrations](schema_migrations.md#running-migrations)
* [System models](system_models.md#system-models)
    * [Partitions and parts](system_models.md#partitions-and-parts)
* [Contributing](contributing.md#contributing)

scripts/README.md (new file)

@@ -0,0 +1,22 @@

This directory contains various scripts for use while developing.

docs2html
---------

Converts markdown docs to html for preview. Requires Pandoc.

Usage:

    cd docs
    ../scripts/docs2html.sh

gh-md-toc
---------

Used by docs2html to generate the table of contents.

test_python3
------------

Creates a Python 3 virtualenv, clones the project into it, and runs the tests.

Usage:

    ./test_python3.sh

scripts/docs2html.sh (new executable file)

@@ -0,0 +1,19 @@

```bash
mkdir -p ../htmldocs

echo "Generating table of contents"
../scripts/gh-md-toc \
    index.md \
    models_and_databases.md \
    querysets.md \
    field_types.md \
    table_engines.md \
    schema_migrations.md \
    system_models.md \
    contributing.md \
    > toc.md

find ./ -iname "*.md" -type f -exec sh -c 'echo "Converting ${0}"; pandoc "${0}" -s -o "../htmldocs/${0%.md}.html"' {} \;

echo "Fixing links"
sed -i 's/\.md/\.html/g' ../htmldocs/*.html
```

scripts/gh-md-toc (new executable file)

@@ -0,0 +1,185 @@

```bash
#!/usr/bin/env bash

#
# Source: https://github.com/ekalinin/github-markdown-toc
#
# Steps:
#
#  1. Download corresponding html file for some README.md:
#       curl -s $1
#
#  2. Discard rows where no substring 'user-content-' (github's markup):
#       awk '/user-content-/ { ...
#
#  3.1 Get last number in each row like ' ... </span></a>sitemap.js</h1'.
#      It's a level of the current header:
#       substr($0, length($0), 1)
#
#  3.2 Get level from 3.1 and insert corresponding number of spaces before '*':
#       sprintf("%*s", substr($0, length($0), 1)*2, " ")
#
#  4. Find head's text and insert it inside "* [ ... ]":
#       substr($0, match($0, /a>.*<\/h/)+2, RLENGTH-5)
#
#  5. Find anchor and insert it inside "(...)":
#       substr($0, match($0, "href=\"[^\"]+?\" ")+6, RLENGTH-8)
#

gh_toc_version="0.4.8"

gh_user_agent="gh-md-toc v$gh_toc_version"

#
# Download rendered into html README.md by its url.
#
gh_toc_load() {
    local gh_url=$1

    if type curl &>/dev/null; then
        curl --user-agent "$gh_user_agent" -s "$gh_url"
    elif type wget &>/dev/null; then
        wget --user-agent="$gh_user_agent" -qO- "$gh_url"
    else
        echo "Please, install 'curl' or 'wget' and try again."
        exit 1
    fi
}

#
# Converts local md file into html by GitHub
#
# ➥ curl -X POST --data '{"text": "Hello world github/linguist#1 **cool**, and #1!"}' https://api.github.com/markdown
# <p>Hello world github/linguist#1 <strong>cool</strong>, and #1!</p>'"
gh_toc_md2html() {
    local gh_file_md=$1
    curl -s --user-agent "$gh_user_agent" \
        --data-binary @"$gh_file_md" -H "Content-Type:text/plain" \
        https://api.github.com/markdown/raw
}

#
# Is passed string url
#
gh_is_url() {
    if [[ $1 == https* || $1 == http* ]]; then
        echo "yes"
    else
        echo "no"
    fi
}

#
# TOC generator
#
gh_toc(){
    local gh_src=$1
    local gh_src_copy=$1
    local gh_ttl_docs=$2

    if [ "$gh_src" = "" ]; then
        echo "Please, enter URL or local path for a README.md"
        exit 1
    fi

    # Show "TOC" string only if working with one document
    if [ "$gh_ttl_docs" = "1" ]; then
        echo "Table of Contents"
        echo "================="
        echo ""
        gh_src_copy=""
    fi

    if [ "$(gh_is_url "$gh_src")" == "yes" ]; then
        gh_toc_load "$gh_src" | gh_toc_grab "$gh_src_copy"
    else
        gh_toc_md2html "$gh_src" | gh_toc_grab "$gh_src_copy"
    fi
}

#
# Grabber of the TOC from rendered html
#
# $1 — a source url of document.
#      It's needed if TOC is generated for multiple documents.
#
gh_toc_grab() {
    # if closed <h[1-6]> is on the new line, then move it on the prev line
    # for example:
    #   was: The command <code>foo1</code>
    #        </h1>
    #   became: The command <code>foo1</code></h1>
    sed -e ':a' -e 'N' -e '$!ba' -e 's/\n<\/h/<\/h/g' |
    # find strings that corresponds to template
    grep -E -o '<a\s*id="user-content-[^"]*".*</h[1-6]' |
    # remove code tags
    sed 's/<code>//' | sed 's/<\/code>//' |
    # now all rows are like:
    #   <a id="user-content-..." href="..."><span ...></span></a> ... </h1
    # format result line
    #   * $0 — whole string
    echo -e "$(awk -v "gh_url=$1" '{
        print sprintf("%*s", substr($0, length($0), 1)*3, " ") "* [" substr($0, match($0, /a>.*<\/h/)+2, RLENGTH-5)"](" gh_url substr($0, match($0, "href=\"[^\"]+?\" ")+6, RLENGTH-8) ")"}' | sed 'y/+/ /; s/%/\\x/g')"
}

#
# Returns filename only from full path or url
#
gh_toc_get_filename() {
    echo "${1##*/}"
}

#
# Options handlers
#
gh_toc_app() {
    local app_name="gh-md-toc"

    if [ "$1" = '--help' ] || [ $# -eq 0 ] ; then
        echo "GitHub TOC generator ($app_name): $gh_toc_version"
        echo ""
        echo "Usage:"
        echo "  $app_name src [src]   Create TOC for a README file (url or local path)"
        echo "  $app_name -           Create TOC for markdown from STDIN"
        echo "  $app_name --help      Show help"
        echo "  $app_name --version   Show version"
        return
    fi

    if [ "$1" = '--version' ]; then
        echo "$gh_toc_version"
        return
    fi

    if [ "$1" = "-" ]; then
        if [ -z "$TMPDIR" ]; then
            TMPDIR="/tmp"
        elif [ -n "$TMPDIR" -a ! -d "$TMPDIR" ]; then
            mkdir -p "$TMPDIR"
        fi
        local gh_tmp_md
        gh_tmp_md=$(mktemp $TMPDIR/tmp.XXXXXX)
        while read input; do
            echo "$input" >> "$gh_tmp_md"
        done
        gh_toc_md2html "$gh_tmp_md" | gh_toc_grab ""
        return
    fi

    for md in "$@"
    do
        echo ""
        gh_toc "$md" "$#"
    done

    #echo ""
    #echo "Created by [gh-md-toc](https://github.com/ekalinin/github-markdown-toc)"
}

#
# Entry point
#
gh_toc_app "$@"
```

scripts/test_python3.sh (new file)

@@ -0,0 +1,10 @@

```bash
cd /tmp
rm -rf /tmp/orm_env*
virtualenv -p python3 /tmp/orm_env
cd /tmp/orm_env
source bin/activate
pip install infi.projector
git clone https://github.com/Infinidat/infi.clickhouse_orm.git
cd infi.clickhouse_orm
projector devenv build
bin/nosetests
```