* Upd docs

Matthew Honnibal 2015-01-17 16:19:21 +11:00
parent 2e14f09d2f
commit 7d9306c3bd
3 changed files with 163 additions and 142 deletions


@@ -1,15 +1,152 @@
===============
API Reference
Documentation
===============
spaCy provides a number of text-processing components, which can be arranged
into pipelines. The pipeline should first construct a Tokens object, which
subsequent components will then annotate with further information. Access is then
Quick Ref
---------
Most users will want to use a pre-prepared pipeline for a given language. This
page first lists these pipelines and their relevant APIs, before listing the
APIs for the individual components.

.. class:: spacy.en.__init__.English(self, data_dir=join(dirname(__file__), 'data'))
   :noindex:

   .. method:: __call__(self, text: unicode, tag=True, parse=False) -> Tokens

   +-----------+--------------+---------------+
   | Attribute | Type         | Useful method |
   +===========+==============+===============+
   | strings   | StringStore  | __getitem__   |
   +-----------+--------------+---------------+
   | vocab     | Vocab        | __getitem__   |
   +-----------+--------------+---------------+
   | tokenizer | Tokenizer    | __call__      |
   +-----------+--------------+---------------+
   | tagger    | EnPosTagger  | __call__      |
   +-----------+--------------+---------------+
   | parser    | GreedyParser | __call__      |
   +-----------+--------------+---------------+
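
The attribute table above lists the stages of the English pipeline: the tokenizer builds a Tokens object, and the tagger and parser then annotate it. The sketch below imitates that flow in plain Python 3. ``ToyTokens``, ``toy_tokenizer``, ``toy_tagger`` and ``ToyPipeline`` are hypothetical stand-ins, not spaCy classes, and the tagging rule is a placeholder; the point is only the shape, in which one call builds the container and later calls enrich it in place and return None.

.. code-block:: python

   from typing import Callable, List, Optional


   class ToyTokens:
       """Hypothetical stand-in for spacy.tokens.Tokens: words plus a tag slot each."""

       def __init__(self, words: List[str]) -> None:
           self.words = words
           self.tags: List[Optional[str]] = [None] * len(words)


   def toy_tokenizer(text: str) -> ToyTokens:
       # Whitespace splitting only; a real tokenizer also splits off punctuation.
       return ToyTokens(text.split())


   def toy_tagger(tokens: ToyTokens) -> None:
       # Placeholder rule: digits are NUM, everything else X. Writes tags in place.
       for i, word in enumerate(tokens.words):
           tokens.tags[i] = "NUM" if word.isdigit() else "X"


   class ToyPipeline:
       def __init__(self, tokenizer: Callable[[str], ToyTokens],
                    annotators: List[Callable[[ToyTokens], None]]) -> None:
           self.tokenizer = tokenizer
           self.annotators = annotators

       def __call__(self, text: str) -> ToyTokens:
           tokens = self.tokenizer(text)     # build the container first
           for annotate in self.annotators:  # then let each stage enrich it
               annotate(tokens)
           return tokens


   nlp = ToyPipeline(toy_tokenizer, [toy_tagger])
   doc = nlp("I saw 3 dogs")
   print(list(zip(doc.words, doc.tags)))

Because each stage mutates the same object, a later stage such as a parser can read the annotations an earlier stage wrote.
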

.. py:class:: spacy.tokens.Tokens(self, vocab: Vocab, string_length=0)

   .. py:method:: __getitem__(self, i) -> Token

   .. py:method:: __iter__(self) -> Iterator[Token]

   .. py:method:: __len__(self) -> int

   .. py:method:: to_array(self, attr_ids: List[int]) -> numpy.ndarray[ndim=2, dtype=int32]

   .. py:method:: count_by(self, attr_id: int) -> Dict[int, int]

   +---------------+-------------+---------------+
   | Attribute     | Type        | Useful method |
   +===============+=============+===============+
   | vocab         | Vocab       | __getitem__   |
   +---------------+-------------+---------------+
   | vocab.strings | StringStore | __getitem__   |
   +---------------+-------------+---------------+
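
Judging by their signatures, ``Tokens.to_array`` returns one row per token and one column per requested attribute ID, and ``count_by`` maps each value of one attribute to the number of tokens that have it. The pure-Python sketch below reproduces those shapes on a hypothetical list of per-token attribute dicts. ``LENGTH`` and ``SHAPE`` are made-up attribute IDs and the values are invented; spaCy's methods read its own token structures, and ``to_array`` returns a numpy array rather than nested lists.

.. code-block:: python

   from collections import Counter
   from typing import Dict, List

   # Hypothetical attribute IDs standing in for spaCy's integer constants.
   LENGTH, SHAPE = 0, 1

   # Each token reduced to {attribute ID: integer value}; the values are invented.
   tokens: List[Dict[int, int]] = [
       {LENGTH: 3, SHAPE: 10},
       {LENGTH: 5, SHAPE: 11},
       {LENGTH: 3, SHAPE: 10},
   ]


   def to_array(tokens: List[Dict[int, int]], attr_ids: List[int]) -> List[List[int]]:
       # One row per token, one column per requested attribute, in request order.
       return [[tok[attr] for attr in attr_ids] for tok in tokens]


   def count_by(tokens: List[Dict[int, int]], attr_id: int) -> Dict[int, int]:
       # How many tokens carry each value of a single attribute.
       return dict(Counter(tok[attr_id] for tok in tokens))


   print(to_array(tokens, [SHAPE, LENGTH]))  # [[10, 3], [11, 5], [10, 3]]
   print(count_by(tokens, LENGTH))           # {3: 2, 5: 1}

Because every attribute is an integer, a single homogeneous int32 array can hold any combination of columns, which fits the 2-D numpy return type documented for ``to_array``.
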

.. py:class:: spacy.tokens.Token(self, parent: Tokens, i: int)

   .. py:method:: __unicode__(self) -> unicode

   .. py:method:: __len__(self) -> int

   .. py:method:: nbor(self, i=1) -> Token

   .. py:method:: child(self, i=1) -> Token

   .. py:method:: sibling(self, i=1) -> Token

   .. py:attribute:: head: Token

   .. py:method:: check_flag(self, attr_id: int) -> bool

   +-----------+------+-----------+---------+-----------+-------+
   | Attribute | Type | Attribute | Type    | Attribute | Type  |
   +===========+======+===========+=========+===========+=======+
   | sic       | int  | sic_      | unicode | idx       | int   |
   +-----------+------+-----------+---------+-----------+-------+
   | lemma     | int  | lemma_    | unicode | cluster   | int   |
   +-----------+------+-----------+---------+-----------+-------+
   | norm1     | int  | norm1_    | unicode | length    | int   |
   +-----------+------+-----------+---------+-----------+-------+
   | norm2     | int  | norm2_    | unicode | prob      | float |
   +-----------+------+-----------+---------+-----------+-------+
   | shape     | int  | shape_    | unicode | sentiment | float |
   +-----------+------+-----------+---------+-----------+-------+
   | prefix    | int  | prefix_   | unicode |                   |
   +-----------+------+-----------+---------+-------------------+
   | suffix    | int  | suffix_   | unicode |                   |
   +-----------+------+-----------+---------+-------------------+
   | pos       | int  | pos_      | unicode |                   |
   +-----------+------+-----------+---------+-------------------+
   | fine_pos  | int  | fine_pos_ | unicode |                   |
   +-----------+------+-----------+---------+-------------------+
   | dep_tag   | int  | dep_tag_  | unicode |                   |
   +-----------+------+-----------+---------+-------------------+
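
Every integer attribute in the table above has a unicode twin with a trailing underscore, such as ``lemma`` and ``lemma_``. The usual reading in spaCy is that the integer is an ID into the shared string table and the underscore form is the decoded text. The sketch below illustrates that convention, together with ``nbor()``, using a hypothetical ``ToyToken`` class and a plain list as the string table; it is not spaCy's implementation.

.. code-block:: python

   from typing import List


   class ToyToken:
       """Hypothetical token that stores an integer lemma ID and decodes it on demand."""

       def __init__(self, parent: List["ToyToken"], i: int,
                    strings: List[str], lemma: int) -> None:
           self.parent = parent      # the containing sequence, like Token's parent
           self.i = i                # this token's position in the parent
           self._strings = strings   # shared ID -> string table
           self.lemma = lemma        # integer ID: cheap to store and compare

       @property
       def lemma_(self) -> str:
           # The underscore variant turns the ID back into text.
           return self._strings[self.lemma]

       def nbor(self, i: int = 1) -> "ToyToken":
           # Token i positions away; negative i looks backwards.
           j = self.i + i
           if not 0 <= j < len(self.parent):
               raise IndexError("neighbour out of range")
           return self.parent[j]


   strings = ["", "be", "dog"]  # ID 0 is unused here
   doc: List[ToyToken] = []
   for position, lemma_id in enumerate([2, 1]):  # lemma IDs for the tokens "dogs are"
       doc.append(ToyToken(doc, position, strings, lemma_id))

   print(doc[0].lemma, doc[0].lemma_, doc[0].nbor().lemma_)  # 2 dog be

Storing IDs keeps per-token data small and turns comparisons into integer comparisons; the text is only produced when an underscore attribute is read.
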

.. py:class:: spacy.vocab.Vocab(self, data_dir=None, lex_props_getter=None)

   .. py:method:: __len__(self) -> int

   .. py:method:: __getitem__(self, id: int) -> unicode

   .. py:method:: __getitem__(self, string: unicode) -> int

   .. py:method:: __setitem__(self, py_str: unicode, props: Dict[str, int|float]) -> None

   .. py:method:: dump(self, loc: unicode) -> None

   .. py:method:: load_lexemes(self, loc: unicode) -> None

   .. py:method:: load_vectors(self, loc: unicode) -> None

.. py:class:: spacy.strings.StringStore(self)

   .. py:method:: __len__(self) -> int

   .. py:method:: __getitem__(self, id: int) -> unicode

   .. py:method:: __getitem__(self, string: bytes) -> int

   .. py:method:: __getitem__(self, string: unicode) -> int

   .. py:method:: dump(self, loc: unicode) -> None

   .. py:method:: load(self, loc: unicode) -> None
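
``StringStore.__getitem__`` is listed three times above: an integer ID returns its string, while a byte or unicode string returns an ID. The sketch below shows that type-dispatched lookup in Python 3, where unicode is ``str``. ``ToyStringStore`` is hypothetical, and two details are assumptions rather than documented behaviour: ID 0 is reserved for the empty string, and looking up an unseen string assigns it the next free ID.

.. code-block:: python

   from typing import Dict, List, Union


   class ToyStringStore:
       """Hypothetical bidirectional string <-> integer ID table."""

       def __init__(self) -> None:
           self._strings: List[str] = [""]   # assumption: ID 0 is the empty string
           self._ids: Dict[str, int] = {"": 0}

       def __len__(self) -> int:
           return len(self._strings)

       def __getitem__(self, key: Union[int, bytes, str]) -> Union[str, int]:
           if isinstance(key, int):
               return self._strings[key]       # ID -> string
           if isinstance(key, bytes):
               key = key.decode("utf8")        # bytes are decoded, then looked up
           if key not in self._ids:            # assumption: unseen strings get a new ID
               self._ids[key] = len(self._strings)
               self._strings.append(key)
           return self._ids[key]


   store = ToyStringStore()
   apple = store["apple"]
   print(apple, store[apple], store[b"apple"], len(store))  # 1 apple 1 2

Keeping a list for ID-to-string and a dict for string-to-ID makes both directions cheap lookups; the price is two indexes that must be kept in sync.
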

.. py:class:: spacy.tokenizer.Tokenizer(self, vocab: Vocab, rules, prefix_re, suffix_re, infix_re, pos_tags, tag_names)

   .. py:method:: tokens_from_list(self, strings: List[unicode]) -> spacy.tokens.Tokens

   .. py:method:: __call__(self, string: unicode) -> spacy.tokens.Tokens

   .. py:attribute:: vocab: spacy.vocab.Vocab
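
The Tokenizer constructor above takes ``prefix_re``, ``suffix_re`` and ``infix_re``. One common way to apply such patterns, sketched here as an assumption rather than a description of spaCy's code, is to split text on whitespace, peel prefixes off the front and suffixes off the back of each chunk, and then cut what remains at infix matches. The patterns and the ``split_chunk`` and ``tokenize`` functions are hypothetical; the real Tokenizer presumably also consults its special-case ``rules``, which this sketch omits.

.. code-block:: python

   import re
   from typing import List

   # Hypothetical patterns; spaCy's English data defines its own.
   PREFIX_RE = re.compile(r"""^[("']""")
   SUFFIX_RE = re.compile(r"""[)"'.,!?]$""")
   INFIX_RE = re.compile(r"-")


   def split_chunk(chunk: str) -> List[str]:
       prefixes: List[str] = []
       suffixes: List[str] = []
       # Peel prefixes off the front, but never consume the whole chunk.
       while True:
           m = PREFIX_RE.search(chunk)
           if not m or m.end() == len(chunk):
               break
           prefixes.append(chunk[:m.end()])
           chunk = chunk[m.end():]
       # Peel suffixes off the back; insert at 0 keeps them in text order.
       while True:
           m = SUFFIX_RE.search(chunk)
           if not m or m.start() == 0:
               break
           suffixes.insert(0, chunk[m.start():])
           chunk = chunk[:m.start()]
       # Cut the remainder at each infix, keeping the infix as its own token.
       middle: List[str] = []
       start = 0
       for m in INFIX_RE.finditer(chunk):
           if m.start() > start:
               middle.append(chunk[start:m.start()])
           middle.append(m.group())
           start = m.end()
       if start < len(chunk):
           middle.append(chunk[start:])
       return prefixes + middle + suffixes


   def tokenize(text: str) -> List[str]:
       return [tok for chunk in text.split() for tok in split_chunk(chunk)]


   print(tokenize('("well-known" words!)'))
   # ['(', '"', 'well', '-', 'known', '"', 'words', '!', ')']

Stopping when a match would consume the whole chunk keeps a lone punctuation mark, such as "(", as a single token instead of leaving an empty string behind.
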

.. py:class:: spacy.en.pos.EnPosTagger(self, strings: spacy.strings.StringStore, data_dir: unicode)

   .. py:method:: __call__(self, tokens: spacy.tokens.Tokens)

   .. py:method:: train(self, tokens: spacy.tokens.Tokens, golds: List[int]) -> int

   .. py:method:: load_morph_exceptions(self, exc: Dict[unicode, Dict])

.. py:class:: GreedyParser(self, model_dir: unicode)

   .. py:method:: __call__(self, tokens: spacy.tokens.Tokens) -> None

   .. py:method:: train(self, tokens: spacy.tokens.Tokens) -> None

spaCy is designed to extend easily to multiple languages, although presently
only English components are implemented. The components are organised into
a pipeline in the spacy.en.English class.

Usually, you will only want to create one spacy.en.English object, and pass it
around your application. It manages the string-to-integer mapping, and you
will usually want only a single mapping for all strings.
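
The reason to share one English object is that integer IDs only mean something relative to the table that issued them. The sketch below uses a hypothetical ``make_interner`` helper, a first-come, first-served dict rather than spaCy's actual scheme, to show two independent tables assigning the same ID to different strings.

.. code-block:: python

   from typing import Callable, Dict


   def make_interner() -> Callable[[str], int]:
       """Hypothetical helper: returns a function that maps strings to integer IDs."""
       table: Dict[str, int] = {}

       def to_id(s: str) -> int:
           # len(table) is evaluated first, so a new string gets the next free ID.
           return table.setdefault(s, len(table))

       return to_id


   a = make_interner()
   b = make_interner()
   a("cat")
   a("dog")
   b("dog")
   print(a("dog"), b("dog"))    # 1 0: same string, different IDs
   print(a("cat") == b("dog"))  # True: same ID, different strings

Any integer array built with one table would be decoded incorrectly with the other, which is why a single shared mapping is the safe default.
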
English Pipeline
----------------
@@ -27,9 +164,9 @@ under spacy.en.defs.
   :members:
   :undoc-members:

The Tokens Class
----------------
Tokens
------
.. autoclass:: spacy.tokens.Tokens
   :members:

@@ -37,17 +174,26 @@ The Tokens Class
.. autoclass:: spacy.tokens.Token
   :members:

Generic Classes
---------------
.. autoclass:: spacy.lexeme.Lexeme
   :members:

Lexicon
-------
.. automodule:: spacy.vocab
   :members:

.. automodule:: spacy.strings
   :members:

Tokenizer
---------
.. automodule:: spacy.tokenizer
   :members:

.. automodule:: spacy.tagger
   :members:

Parser
------
.. automodule:: spacy.syntax.parser
   :members:

@@ -57,5 +203,3 @@ Utility Functions
.. automodule:: spacy.orth
   :members:


@@ -180,7 +180,8 @@ Accuracy
.. toctree::
   :maxdepth: 3

   license.rst
   index.rst
   quickstart.rst
   features.rst
   api.rst
   howworks.rst
   license.rst


@@ -1,124 +0,0 @@
=======
License
=======
I've been writing spaCy for six months now, and I'm very excited to release it.
I think it's the most valuable thing I could have built. When I was in
academia, I noticed that small companies couldn't really make use of our work.
Meanwhile the tech giants have been hiring *everyone*, and putting this stuff
into production. I think spaCy can change that.

+------------+-----------+----------+-------------------------------------+
| License    | Price     | Term     | Suitable for                        |
+============+===========+==========+=====================================+
| Commercial | $5,000    | Life     | Production use                      |
+------------+-----------+----------+-------------------------------------+
| Trial      | $1        | 90 days  | Evaluation, seed startup            |
+------------+-----------+----------+-------------------------------------+
| AGPLv3     | Free      | Life     | Research, teaching, hobbyists, FOSS |
+------------+-----------+----------+-------------------------------------+

To make spaCy as valuable as possible, licenses to it are for life. You get
complete transparency, certainty and control. There is much less risk this
way. And if you're ever in acquisition or IPO talks, the story is simple.
spaCy can also be used as free open-source software, under the Affero GPL
(AGPL) license. If you use it this way, you must comply with the AGPL license
terms. When you distribute your project, or offer it as a network service, you
must distribute the source code, and grant users an AGPL license to it.

.. I left academia in June 2014, just when I should have been submitting my first
   grant proposal. Grant writing seemed a bad business model. I wasn't sure
   exactly what I would do instead, but I knew that the work I could do was
   valuable, and that it would make sense for people to pay me to do it, and that
   it's often easy to convince smart people of things that are true.

.. I left because I don't like the grant system. It's not the
   best way to create value, and it's not the best way to get paid.

Examples
--------
To clarify how spaCy's license structure might apply to you, I've
written a few examples, in the form of user stories.
Ashley and Casey: Seed stage start-up
#####################################
Ashley and Casey have an idea for a start-up. To explore their idea, they want
to build a minimum viable product they can put in front of potential users and
investors.
They have two options.

1. **Trial commercial license.** With a simple form, they can use spaCy for 90
   days, for a nominal fee of $1. They are free to modify spaCy, and they
   will own the copyright to their modifications for the duration of the license.
   After the trial period elapses, they can either pay the license fee, stop
   using spaCy, or release their project under the AGPL.

2. **AGPL.** Ashley and Casey can instead use spaCy under the AGPL license.
   However, they must then release any code that statically or dynamically
   links to spaCy under the AGPL as well (e.g. if they import the module, or
   import a module that imports it, etc.). Nor can they avoid this obligation by
   running spaCy as a network service: closing that loophole is what the "A"
   part of the AGPL is designed to do.

Ashley and Casey find the AGPL license unattractive for commercial use.
They decide to take up the trial commercial license.
However, over the next 90 days, Ashley has to move house twice, and Casey gets
sick. By the time the trial expires, they still don't have a demo they can show
investors. They send an email explaining the situation, and a 90-day extension
to their trial license is granted.
By the time the extension period has elapsed, spaCy has helped them secure
funding, and they even have a little revenue. They are glad to pay the $5,000
commercial license fee.
spaCy is now permanently licensed for the product Ashley and Casey are
developing. They own the copyright to any modifications they make to spaCy,
but not to the original spaCy code.
No additional fees will be due when they hire new developers, run spaCy on
additional internal servers, etc. If their company is acquired, the license will
be transferred to the company acquiring them. However, to use spaCy in another
product, they will have to buy a second license.
Alex and Sasha: University Academics
####################################
Alex and Sasha are post-doctoral researchers working for a university. Part of
their funding comes from a grant from Google, but Google will not own any part
of the work that they produce. Their mission is just to write papers.
Alex and Sasha find spaCy convenient, so they use it in their system under the
AGPL. This means that their system must also be released under the AGPL, but they're
cool with that: they were going to release their code anyway, as it's the only
way to ensure their experiments are properly repeatable.
Alex and Sasha find and fix a few bugs in spaCy. They must release these
modifications, and they ask for them to be accepted into the main spaCy repo.
To do this, they must sign a contributor agreement, ceding their
copyright. When commercial licenses to spaCy are sold, Alex and Sasha will
not be able to claim any royalties from their contributions.
Later, Alex and Sasha implement new features in spaCy for another paper. The
code was quite rushed, and they don't want to take the time to put together a
proper pull request. They must release their modifications under the AGPL, but
they are not obliged to contribute them to the spaCy repository, or to cede their
copyright.
Phuong and Jessie: Open Source developers
#########################################
Phuong and Jessie use Calibre to manage their e-book libraries. They have an
idea for a search feature, and they want to use spaCy to implement it. Calibre is
released under the GPLv3. The AGPL has additional restrictions for projects
used as a network resource, but they don't apply to this project, so Phuong and
Jessie can use spaCy to improve Calibre. They'll have to release their code, but
that was always their intention anyway.