* Update docs

2025-06-30 09:53:04 +03:00 · 2014-12-30 21:20:34 +11:00 · cdc1a27104
commit cdc1a27104
parent bb0b00f819
9 changed files with 6 additions and 188 deletions
--- a/docs/source/api.rst
+++ b/docs/source/api.rst
@@ -23,6 +23,10 @@ under spacy.en.defs.
 .. automodule:: spacy.en.pos
   :members:
 .. automodule:: spacy.en.attrs
   :members:
   :undoc-members:
 The Tokens Class
 ----------------
--- a/docs/source/how/api/index.rst
+++ b/docs/source/how/api/index.rst
@@ -1,8 +0,0 @@
 API
 ===
 .. toctree::
    :maxdepth: 2
    tokenizers/index.rst
    lexicon.rst
--- a/docs/source/how/api/lexicon.rst
+++ b/docs/source/how/api/lexicon.rst
@@ -1,6 +0,0 @@
 spacy.word.Lexeme
 =================
 .. autoclass:: spacy.word.Lexeme
    :members:
--- a/docs/source/how/api/tokenizers/en.rst
+++ b/docs/source/how/api/tokenizers/en.rst
@@ -1,94 +0,0 @@
 spacy.en.EN
 ============
 .. automodule:: spacy.en
 Tokenizer API
 -------------
 .. automethod:: spacy.en.EN.tokenize
    :noindex:
 .. automethod:: spacy.en.EN.lookup
    :noindex:
 Lexeme Features Flag IDs
 ------------------------
 A number of boolean features are computed for English Lexemes. To access a feature,
 pass its ID to the :py:meth:`spacy.word.Lexeme.check_flag` method.
 Orthographic Features
 ---------------------
 These features describe the `orthographic` (lettering) type of the word. The
 function used to compute the value is listed along with the flag.
 .. data:: IS_ALPHA
    :py:func:`spacy.orth.is_alpha`
 .. data:: IS_DIGIT
    :py:func:`spacy.orth.is_digit`
 .. data:: IS_UPPER
    :py:func:`spacy.orth.is_upper`
 .. data:: IS_PUNCT
    :py:func:`spacy.orth.is_punct`
 .. data:: IS_SPACE
    :py:func:`spacy.orth.is_space`
 .. data:: IS_ASCII
    :py:func:`spacy.orth.is_ascii`
 .. data:: IS_TITLE
    :py:func:`spacy.orth.is_title`
 .. data:: IS_LOWER
    :py:func:`spacy.orth.is_lower`
 .. data:: IS_UPPER
    :py:func:`spacy.orth.is_upper`
 Distributional Orthographic Features
 ------------------------------------
 These features describe how often the lower-cased form of the word appears
 in various case-styles in a large sample of English text. See :py:func:`spacy.orth.oft_case`.
 .. data:: OFT_UPPER
 .. data:: OFT_LOWER
 .. data:: OFT_TITLE
 Tag Dictionary Features
 -----------------------
 These features describe whether the word commonly occurs with a given
 part-of-speech, in a large text corpus, using a part-of-speech tagger designed
 to reduce the tag-dictionary bias of its training corpus. See
 :py:func:`spacy.orth.can_tag`.
 .. data:: CAN_PUNCT
 .. data:: CAN_CONJ
 .. data:: CAN_NUM
 .. data:: CAN_DET
 .. data:: CAN_ADP
 .. data:: CAN_ADJ
 .. data:: CAN_ADV
 .. data:: CAN_VERB
 .. data:: CAN_NOUN
 .. data:: CAN_PDT
 .. data:: CAN_POS
 .. data:: CAN_PRON
 .. data:: CAN_PRT
--- a/docs/source/how/api/tokenizers/index.rst
+++ b/docs/source/how/api/tokenizers/index.rst
@@ -1,8 +0,0 @@
 Tokenizers
 ===================================
 Each module listed here implements a different tokenization scheme, usually
 intended for a specific language.
 .. toctree::
    en.rst
--- a/docs/source/how/index.rst
+++ b/docs/source/how/index.rst
@@ -1,13 +0,0 @@
 How
 ===
 Tutorial
 --------
 Installation
 ------------
 API
 ---
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -79,9 +79,11 @@ you'll find NLTK etc much more expensive, because what you save on license
 cost, you'll lose many times over in lost productivity. $5000 does not buy you
 much developer time.
 .. toctree::
    :hidden:
    :maxdepth: 3
    features.rst
    license_stories.rst 
    api.rst
--- a/docs/source/what/index.rst
+++ b/docs/source/what/index.rst
@@ -1,31 +0,0 @@
 What
 ====
 Overview
 --------
 Feature List
 ------------
 License (for the code)
 ----------------------
 +------------------+------+
 | Non-commercial   | $0   |
 +------------------+------+
 | Trial commercial | $0   |
 +------------------+------+
 | Full commercial  | $500 |
 +------------------+------+
 spaCy is non-free software. Its source is published, but the copyright is
 retained by the author (Matthew Honnibal).  Licenses are currently under preparation.
 There is currently a gap between the output of academic NLP researchers, and
 the needs of small software companies. I left academia to try to correct this.
 My idea is that non-commercial and trial commercial use should "feel" just like
 free software. But, if you do use the code in a commercial product, a small
 fixed license-fee will apply, in order to fund development. 
 Pricing (for the data)
 ----------------------
--- a/docs/source/why/index.rst
+++ b/docs/source/why/index.rst
@@ -1,28 +0,0 @@
 Why
 ===
 Benchmarks
 ----------
 Efficiency
 ----------
 +--------+-------+--------------+--------------+
 | System | Time  | Words/second | Speed Factor |
 +--------+-------+--------------+--------------+
 | NLTK   | 6m4s  | 89,000       | 1.00         |
 +--------+-------+--------------+--------------+
 | spaCy  | 9.5s  | 3,093,000    | 38.30        |
 +--------+-------+--------------+--------------+
 Accuracy
 --------
 The comparison refers to 30 million words from the English Gigaword, on
 a MacBook Air.  For context, calling string.split() on the data completes in
 about 5s.
 Pros and Cons
 -------------