From cdc1a2710421cd053b17fcfb0dfb17bd66a1e1af Mon Sep 17 00:00:00 2001
From: Matthew Honnibal
Date: Tue, 30 Dec 2014 21:20:34 +1100
Subject: [PATCH] * Update docs

---
 docs/source/api.rst                      |  4 +
 docs/source/how/api/index.rst            |  8 --
 docs/source/how/api/lexicon.rst          |  6 --
 docs/source/how/api/tokenizers/en.rst    | 94 ------------------------
 docs/source/how/api/tokenizers/index.rst |  8 --
 docs/source/how/index.rst                | 13 ----
 docs/source/index.rst                    |  2 +
 docs/source/what/index.rst               | 31 --------
 docs/source/why/index.rst                | 28 -------
 9 files changed, 6 insertions(+), 188 deletions(-)
 delete mode 100644 docs/source/how/api/index.rst
 delete mode 100644 docs/source/how/api/lexicon.rst
 delete mode 100644 docs/source/how/api/tokenizers/en.rst
 delete mode 100644 docs/source/how/api/tokenizers/index.rst
 delete mode 100644 docs/source/how/index.rst
 delete mode 100644 docs/source/what/index.rst
 delete mode 100644 docs/source/why/index.rst

diff --git a/docs/source/api.rst b/docs/source/api.rst
index 1a576c211..7e2b519c2 100644
--- a/docs/source/api.rst
+++ b/docs/source/api.rst
@@ -23,6 +23,10 @@ under spacy.en.defs.
 .. autommodule:: spacy.en.pos
     :members:
 
+.. automodule:: spacy.en.attrs
+    :members:
+    :undoc-members:
+
 The Tokens Class
 ----------------
 
diff --git a/docs/source/how/api/index.rst b/docs/source/how/api/index.rst
deleted file mode 100644
index 0bc6afeae..000000000
--- a/docs/source/how/api/index.rst
+++ /dev/null
@@ -1,8 +0,0 @@
-API
-===
-
-.. toctree::
-    :maxdepth: 2
-
-    tokenizers/index.rst
-    lexicon.rst
diff --git a/docs/source/how/api/lexicon.rst b/docs/source/how/api/lexicon.rst
deleted file mode 100644
index d1aeed990..000000000
--- a/docs/source/how/api/lexicon.rst
+++ /dev/null
@@ -1,6 +0,0 @@
-spacy.word.Lexeme
-=================
-
-
-.. autoclass:: spacy.word.Lexeme
-    :members:
diff --git a/docs/source/how/api/tokenizers/en.rst b/docs/source/how/api/tokenizers/en.rst
deleted file mode 100644
index 3c278c151..000000000
--- a/docs/source/how/api/tokenizers/en.rst
+++ /dev/null
@@ -1,94 +0,0 @@
-spacy.en.EN
-============
-
-.. automodule:: spacy.en
-
-Tokenizer API
--------------
-
-.. automethod:: spacy.en.EN.tokenize
-    :noindex:
-
-.. automethod:: spacy.en.EN.lookup
-    :noindex:
-
-Lexeme Features Flag IDs
-------------------------
-
-A number of boolean features are computed for English Lexemes. To access a feature,
-pass its ID to the :py:meth:`spacy.word.Lexeme.check_flag` function.
-
-Orthographic Features
----------------------
-
-These features describe the `orthographic` (lettering) type of the word. The
-function used to compute the value is listed along with the flag.
-
-.. data:: IS_ALPHA
-
-    :py:func:`spacy.orth.is_alpha`
-
-.. data:: IS_DIGIT
-
-    :py:func:`spacy.orth.is_digit`
-
-.. data:: IS_UPPER
-
-    :py:func:`spacy.orth.is_upper`
-
-.. data:: IS_PUNCT
-
-    :py:func:`spacy.orth.is_punct`
-
-.. data:: IS_SPACE
-
-    :py:func:`spacy.orth.is_space`
-
-.. data:: IS_ASCII
-
-    :py:func:`spacy.orth.is_ascii`
-
-.. data:: IS_TITLE
-
-    :py:func:`spacy.orth.is_title`
-
-.. data:: IS_LOWER
-
-    :py:func:`spacy.orth.is_lower`
-
-.. data:: IS_UPPER
-
-    :py:func:`spacy.orth.is_upper`
-
-Distributional Orthographic Features
-------------------------------------
-
-These features describe how often the lower-cased form of the word appears
-in various case-styles in a large sample of English text. See :py:func:`spacy.orth.oft_case`
-
-.. data:: OFT_UPPER
-.. data:: OFT_LOWER
-.. data:: OFT_TITLE
-
-
-Tag Dictionary Features
------------------------
-
-These features describe whether the word commonly occurs with a given
-part-of-speech, in a large text corpus, using a part-of-speech tagger designed
-to reduce the tag-dictionary bias of its training corpus. See
-:py:func:`spacy.orth.can_tag`.
-
-.. data:: CAN_PUNCT
-.. data:: CAN_CONJ
-.. data:: CAN_NUM
-.. data:: CAN_DET
-.. data:: CAN_ADP
-.. data:: CAN_ADJ
-.. data:: CAN_ADV
-.. data:: CAN_VERB
-.. data:: CAN_NOUN
-.. data:: CAN_PDT
-.. data:: CAN_POS
-.. data:: CAN_PRON
-.. data:: CAN_PRT
diff --git a/docs/source/how/api/tokenizers/index.rst b/docs/source/how/api/tokenizers/index.rst
deleted file mode 100644
index b19ca207d..000000000
--- a/docs/source/how/api/tokenizers/index.rst
+++ /dev/null
@@ -1,8 +0,0 @@
-Tokenizers
-===================================
-
-Each module listed here implements a different tokenization scheme, usually
-intended for a specific language.
-
-.. toctree::
-    en.rst
diff --git a/docs/source/how/index.rst b/docs/source/how/index.rst
deleted file mode 100644
index bd995f8c8..000000000
--- a/docs/source/how/index.rst
+++ /dev/null
@@ -1,13 +0,0 @@
-How
-===
-
-Tutorial
---------
-
-Installation
-------------
-
-API
----
-
-
diff --git a/docs/source/index.rst b/docs/source/index.rst
index ecfb9af37..e409fd88b 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -79,9 +79,11 @@ you'll find NLTK etc much more expensive, because what you save on license
 cost, you'll lose many times over in lost productivity. $5000 does not buy you
 much developer time.
 
+
 .. toctree::
     :hidden:
     :maxdepth: 3
 
     features.rst
     license_stories.rst
+    api.rst
diff --git a/docs/source/what/index.rst b/docs/source/what/index.rst
deleted file mode 100644
index 8f263fe5f..000000000
--- a/docs/source/what/index.rst
+++ /dev/null
@@ -1,31 +0,0 @@
-What
-====
-
-Overview
---------
-
-Feature List
-------------
-
-License (for the code)
--------
-
-+------------------+------+
-| Non-commercial   | $0   |
-+------------------+------+
-| Trial commercial | $0   |
-+------------------+------+
-| Full commercial  | $500 |
-+------------------+------+
-
-spaCy is non-free software. Its source is published, but the copyright is
-retained by the author (Matthew Honnibal). Licenses are currently under preparation.
-
-There is currently a gap between the output of academic NLP researchers, and
-the needs of a small software companiess. I left academia to try to correct this.
-My idea is that non-commercial and trial commercial use should "feel" just like
-free software. But, if you do use the code in a commercial product, a small
-fixed license-fee will apply, in order to fund development.
-
-Pricing (for the data)
-----------------------
diff --git a/docs/source/why/index.rst b/docs/source/why/index.rst
deleted file mode 100644
index 8f1f78272..000000000
--- a/docs/source/why/index.rst
+++ /dev/null
@@ -1,28 +0,0 @@
-Why
-===
-
-Benchmarks
-----------
-
-Efficiency
-----------
-
-+--------+-------+--------------+--------------+
-| System | Time  | Words/second | Speed Factor |
-+--------+-------+--------------+--------------+
-| NLTK   | 6m4s  | 89,000       | 1.00         |
-+--------+-------+--------------+--------------+
-| spaCy  | 9.5s  | 3,093,000    | 38.30        |
-+--------+-------+--------------+--------------+
-
-
-Accuracy
---------
-
-The comparison refers to 30 million words from the English Gigaword, on
-a Maxbook Air. For context, calling string.split() on the data completes in
-about 5s.
-
-Pros and Cons
--------------
-