mirror of
https://github.com/explosion/spaCy.git
synced 2024-11-10 19:57:17 +03:00
* Upd docs
This commit is contained in:
parent
2e14f09d2f
commit
7d9306c3bd
|
@ -1,15 +1,152 @@
|
|||
===============
|
||||
API Reference
|
||||
Documentation
|
||||
===============
|
||||
|
||||
spaCy provides a number of text-processing components, which can be arranged
|
||||
into text-processing pipelines. The pipeline should first construct a Tokens
|
||||
object, which will then be enhanced with various information by subsequent
|
||||
components. Access is then
|
||||
Quick Ref
|
||||
---------
|
||||
|
||||
Most users will want to use a pre-prepared pipeline for a given language. This
|
||||
page first lists these pipelines and their relevant APIs, before listing the
|
||||
APIs for the individual components.
|
||||
.. class:: spacy.en.__init__.English(self, data_dir=join(dirname(__file__), 'data'))
|
||||
:noindex:
|
||||
|
||||
.. method:: __call__(self, unicode text, tag=True, parse=False) --> Tokens
|
||||
|
||||
+-----------+--------------+--------------+
| Attribute | Type         | Useful       |
+===========+==============+==============+
| strings   | StringStore  | __getitem__  |
+-----------+--------------+--------------+
| vocab     | Vocab        | __getitem__  |
+-----------+--------------+--------------+
| tokenizer | Tokenizer    | __call__     |
+-----------+--------------+--------------+
| tagger    | EnPosTagger  | __call__     |
+-----------+--------------+--------------+
| parser    | GreedyParser | __call__     |
+-----------+--------------+--------------+
|
||||
|
||||
.. py:class:: spacy.tokens.Tokens(self, vocab: Vocab, string_length=0)
|
||||
|
||||
.. py:method:: __getitem__(self, i) --> Token
|
||||
|
||||
.. py:method:: __iter__(self) --> Iterator[Token]
|
||||
|
||||
.. py:method:: __len__(self) --> int
|
||||
|
||||
.. py:method:: to_array(self, attr_ids: List[int]) --> numpy.ndarray[ndim=2, dtype=int32]
|
||||
|
||||
.. py:method:: count_by(self, attr_id: int) --> Dict[int, int]
|
||||
|
||||
+---------------+-------------+-------------+
| Attribute     | Type        | Useful      |
+===============+=============+=============+
| vocab         | Vocab       | __getitem__ |
+---------------+-------------+-------------+
| vocab.strings | StringStore | __getitem__ |
+---------------+-------------+-------------+
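
The signatures of ``to_array`` and ``count_by`` above suggest that a Tokens object can be viewed as a table of integer attribute values. The following is a minimal sketch of that idea in plain Python, not spaCy's implementation. The attribute IDs ``LEMMA`` and ``POS``, the ``rows`` data, and the one-row-per-token layout are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical attribute IDs; the real ID constants are not shown in this section.
LEMMA, POS = 0, 1

# Toy stand-in for a Tokens object: one row per token, one column per attribute.
rows = [
    [10, 1],
    [11, 2],
    [10, 1],
    [12, 2],
]


def to_array(rows, attr_ids):
    """Select the requested attribute columns for every token."""
    return [[row[attr_id] for attr_id in attr_ids] for row in rows]


def count_by(rows, attr_id):
    """Map each value of one attribute to the number of tokens carrying it."""
    return dict(Counter(row[attr_id] for row in rows))


lemma_counts = count_by(rows, LEMMA)
pos_column = to_array(rows, [POS])
```

Here ``lemma_counts`` maps each integer lemma ID to its frequency, which is the shape that the ``Dict[int, int]`` return type describes.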
|
||||
|
||||
.. py:class:: spacy.tokens.Token(self, parent: Tokens, i: int)
|
||||
|
||||
.. py:method:: __unicode__(self) --> unicode
|
||||
|
||||
.. py:method:: __len__(self) --> int
|
||||
|
||||
.. py:method:: nbor(self, i=1) --> Token
|
||||
|
||||
.. py:method:: child(self, i=1) --> Token
|
||||
|
||||
.. py:method:: sibling(self, i=1) --> Token
|
||||
|
||||
.. py:attribute:: head: Token
|
||||
|
||||
.. py:method:: check_flag(self, attr_id: int) --> bool
|
||||
|
||||
|
||||
+-----------+------+-----------+---------+-----------+-------+
| Attribute | Type | Attribute | Type    | Attribute | Type  |
+===========+======+===========+=========+===========+=======+
| sic       | int  | sic_      | unicode | idx       | int   |
+-----------+------+-----------+---------+-----------+-------+
| lemma     | int  | lemma_    | unicode | cluster   | int   |
+-----------+------+-----------+---------+-----------+-------+
| norm1     | int  | norm1_    | unicode | length    | int   |
+-----------+------+-----------+---------+-----------+-------+
| norm2     | int  | norm2_    | unicode | prob      | float |
+-----------+------+-----------+---------+-----------+-------+
| shape     | int  | shape_    | unicode | sentiment | float |
+-----------+------+-----------+---------+-----------+-------+
| prefix    | int  | prefix_   | unicode |                   |
+-----------+------+-----------+---------+-------------------+
| suffix    | int  | suffix_   | unicode |                   |
+-----------+------+-----------+---------+-------------------+
| pos       | int  | pos_      | unicode |                   |
+-----------+------+-----------+---------+-------------------+
| fine_pos  | int  | fine_pos_ | unicode |                   |
+-----------+------+-----------+---------+-------------------+
| dep_tag   | int  | dep_tag_  | unicode |                   |
+-----------+------+-----------+---------+-------------------+
|
||||
|
||||
|
||||
.. py:class:: spacy.vocab.Vocab(self, data_dir=None, lex_props_getter=None)
|
||||
|
||||
.. py:method:: __len__(self) --> int
|
||||
|
||||
.. py:method:: __getitem__(self, id: int) --> unicode
|
||||
|
||||
.. py:method:: __getitem__(self, string: unicode) --> int
|
||||
|
||||
.. py:method:: __setitem__(self, py_str: unicode, props: Dict[str, int|float]) --> None
|
||||
|
||||
.. py:method:: dump(self, loc: unicode) --> None
|
||||
|
||||
.. py:method:: load_lexemes(self, loc: unicode) --> None
|
||||
|
||||
.. py:method:: load_vectors(self, loc: unicode) --> None
|
||||
|
||||
.. py:class:: spacy.strings.StringStore(self)
|
||||
|
||||
.. py:method:: __len__(self) --> int
|
||||
|
||||
.. py:method:: __getitem__(self, id: int) --> unicode
|
||||
|
||||
.. py:method:: __getitem__(self, string: bytes) --> int
|
||||
|
||||
.. py:method:: __getitem__(self, string: unicode) --> int
|
||||
|
||||
.. py:method:: dump(self, loc: unicode) --> None
|
||||
|
||||
.. py:method:: load(self, loc: unicode) --> None
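
The overloaded ``__getitem__`` signatures above describe a two-way lookup: an integer key returns a string, and a string key returns an integer. A rough sketch of how such a mapping might behave is given below. ``ToyStringStore`` is a hypothetical stand-in, and details such as assigning IDs from zero and adding unseen strings on lookup are assumptions, not documented behaviour of ``spacy.strings.StringStore``.

```python
class ToyStringStore:
    """Illustrative two-way mapping between strings and integer IDs."""

    def __init__(self):
        self._ids = {}       # string -> int
        self._strings = []   # int -> string, indexed by ID

    def __len__(self):
        return len(self._strings)

    def __getitem__(self, key):
        # Integer keys are decoded back to the stored string.
        if isinstance(key, int):
            return self._strings[key]
        # Byte strings are normalised to text before lookup.
        if isinstance(key, bytes):
            key = key.decode("utf8")
        # Assumption: unseen strings are assigned the next free ID.
        if key not in self._ids:
            self._ids[key] = len(self._strings)
            self._strings.append(key)
        return self._ids[key]


store = ToyStringStore()
apple_id = store["apple"]
```

Looking up ``store[apple_id]`` returns the original string, and ``store[b"apple"]`` resolves to the same ID as the text form.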
|
||||
|
||||
.. py:class:: spacy.tokenizer.Tokenizer(self, Vocab vocab, rules, prefix_re, suffix_re, infix_re, pos_tags, tag_names)
|
||||
|
||||
.. py:method:: tokens_from_list(self, List[unicode]) --> spacy.tokens.Tokens
|
||||
|
||||
.. py:method:: __call__(self, string: unicode) --> spacy.tokens.Tokens
|
||||
|
||||
.. py:attribute:: vocab: spacy.vocab.Vocab
|
||||
|
||||
.. py:class:: spacy.en.pos.EnPosTagger(self, strings: spacy.strings.StringStore, data_dir: unicode)
|
||||
|
||||
.. py:method:: __call__(self, tokens: spacy.tokens.Tokens)
|
||||
|
||||
.. py:method:: train(self, tokens: spacy.tokens.Tokens, golds: List[int]) --> int
|
||||
|
||||
.. py:method:: load_morph_exceptions(self, exc: Dict[unicode, Dict])
|
||||
|
||||
.. py:class:: GreedyParser(self, model_dir: unicode)
|
||||
|
||||
.. py:method:: __call__(self, tokens: spacy.tokens.Tokens) --> None
|
||||
|
||||
.. py:method:: train(self, tokens: spacy.tokens.Tokens) --> None
|
||||
|
||||
|
||||
|
||||
spaCy is designed to easily extend to multiple languages, although presently
|
||||
only English components are implemented. The components are organised into
|
||||
a pipeline in the spacy.en.English class.
|
||||
|
||||
Usually, you will only want to create one spacy.en.English object, and pass it
|
||||
around your application. It manages the string-to-integers mapping, and you
|
||||
will usually want only a single mapping for all strings.
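
To see why a single mapping matters, consider what happens when two independent string-to-integer tables are built. The sketch below uses a hypothetical ``make_interner`` helper, not any spaCy API, and assumes IDs are handed out in order of first appearance.

```python
def make_interner():
    """Return a function that assigns each new string the next free integer."""
    table = {}

    def intern(string):
        # setdefault keeps the existing ID, or stores the next free one.
        return table.setdefault(string, len(table))

    return intern


# Two independent mappings, standing in for two separate pipeline objects.
first = make_interner()
second = make_interner()

first("the")
first("cat")
second("cat")

id_in_first = first("cat")
id_in_second = second("cat")
```

The same string ends up with a different integer in each table, so integer attributes produced by one pipeline object could not safely be decoded or compared using another.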
|
||||
|
||||
English Pipeline
|
||||
----------------
|
||||
|
@ -27,9 +164,9 @@ under spacy.en.defs.
|
|||
:members:
|
||||
:undoc-members:
|
||||
|
||||
The Tokens Class
|
||||
----------------
|
||||
|
||||
Tokens
|
||||
------
|
||||
|
||||
.. autoclass:: spacy.tokens.Tokens
|
||||
:members:
|
||||
|
@ -37,17 +174,26 @@ The Tokens Class
|
|||
.. autoclass:: spacy.tokens.Token
|
||||
:members:
|
||||
|
||||
Generic Classes
|
||||
---------------
|
||||
.. autoclass:: spacy.lexeme.Lexeme
|
||||
:members:
|
||||
|
||||
Lexicon
|
||||
-------
|
||||
|
||||
.. automodule:: spacy.vocab
|
||||
:members:
|
||||
|
||||
.. automodule:: spacy.strings
|
||||
:members:
|
||||
|
||||
Tokenizer
|
||||
---------
|
||||
|
||||
.. automodule:: spacy.tokenizer
|
||||
:members:
|
||||
|
||||
.. automodule:: spacy.tagger
|
||||
:members:
|
||||
Parser
|
||||
------
|
||||
|
||||
.. automodule:: spacy.syntax.parser
|
||||
:members:
|
||||
|
@ -57,5 +203,3 @@ Utility Functions
|
|||
|
||||
.. automodule:: spacy.orth
|
||||
:members:
|
||||
|
||||
|
||||
|
|
|
@ -180,7 +180,8 @@ Accuracy
|
|||
.. toctree::
|
||||
:maxdepth: 3
|
||||
|
||||
license.rst
|
||||
index.rst
|
||||
quickstart.rst
|
||||
features.rst
|
||||
api.rst
|
||||
howworks.rst
|
||||
license.rst
|
||||
|
|
|
@ -1,124 +0,0 @@
|
|||
=======
|
||||
License
|
||||
=======
|
||||
|
||||
I've been writing spaCy for six months now, and I'm very excited to release it.
|
||||
I think it's the most valuable thing I could have built. When I was in
|
||||
academia, I noticed that small companies couldn't really make use of our work.
|
||||
Meanwhile the tech giants have been hiring *everyone*, and putting this stuff
|
||||
into production. I think spaCy can change that.
|
||||
|
||||
|
||||
+------------+-----------+----------+-------------------------------------+
| License    | Price     | Term     | Suitable for                        |
+============+===========+==========+=====================================+
| Commercial | $5,000    | Life     | Production use                      |
+------------+-----------+----------+-------------------------------------+
| Trial      | $1        | 90 days  | Evaluation, seed startup            |
+------------+-----------+----------+-------------------------------------+
| AGPLv3     | Free      | Life     | Research, teaching, hobbyists, FOSS |
+------------+-----------+----------+-------------------------------------+
|
||||
|
||||
To make spaCy as valuable as possible, licenses to it are for life. You get
|
||||
complete transparency, certainty and control. There is much less risk this
|
||||
way. And if you're ever in acquisition or IPO talks, the story is simple.
|
||||
|
||||
spaCy can also be used as free open-source software, under the Affero GPL
|
||||
license. If you use it this way, you must comply with the AGPL license terms.
|
||||
When you distribute your project, or offer it as a network service, you must
|
||||
distribute the source-code, and grant users an AGPL license to it.
|
||||
|
||||
|
||||
.. I left academia in June 2014, just when I should have been submitting my first
|
||||
grant proposal. Grant writing seemed a bad business model. I wasn't sure
|
||||
exactly what I would do instead, but I knew that the work I could do was
|
||||
valuable, and that it would make sense for people to pay me to do it, and that
|
||||
it's often easy to convince smart people of things that are true.
|
||||
|
||||
.. I left because I don't like the grant system. It's not the
|
||||
best way to create value, and it's not the best way to get paid.
|
||||
|
||||
|
||||
Examples
|
||||
--------
|
||||
|
||||
In order to clarify how spaCy's license structure might apply to you, I've
|
||||
written a few examples, in the form of user-stories.
|
||||
|
||||
Ashley and Casey: Seed stage start-up
|
||||
#####################################
|
||||
|
||||
Ashley and Casey have an idea for a start-up. To explore their idea, they want
|
||||
to build a minimum viable product they can put in front of potential users and
|
||||
investors.
|
||||
|
||||
They have two options.
|
||||
|
||||
1. **Trial commercial license.** With a simple form, they can use spaCy for 90
|
||||
days, for a nominal fee of $1. They are free to modify spaCy, and they
|
||||
will own the copyright to their modifications for the duration of the license.
|
||||
After the trial period elapses, they can either pay the license fee, stop
|
||||
using spaCy, or release their project under the AGPL.
|
||||
|
||||
2. **AGPL.** Ashley and Casey can instead use spaCy under the AGPL license.
|
||||
However, they must then release any code that statically or dynamically
|
||||
links to spaCy under the AGPL as well (e.g. if they import the module, or
|
||||
import a module that imports it, etc). They also cannot avoid this
obligation by running spaCy as a network service --- this is the
loophole that the "A" part of the AGPL is designed to close.
|
||||
|
||||
Ashley and Casey find the AGPL license unattractive for commercial use.
|
||||
They decide to take up the trial commercial license.
|
||||
However, over the next 90 days, Ashley has to move house twice, and Casey gets
|
||||
sick. By the time the trial expires, they still don't have a demo they can show
|
||||
investors. They send an email explaining the situation, and a 90-day extension
|
||||
to their trial license is granted.
|
||||
|
||||
By the time the extension period has elapsed, spaCy has helped them secure
|
||||
funding, and they even have a little revenue. They are glad to pay the $5,000
|
||||
commercial license fee.
|
||||
|
||||
spaCy is now permanently licensed for the product Ashley and Casey are
|
||||
developing. They own the copyright to any modifications they make to spaCy,
|
||||
but not to the original spaCy code.
|
||||
|
||||
No additional fees will be due when they hire new developers, run spaCy on
|
||||
additional internal servers, etc. If their company is acquired, the license will
|
||||
be transferred to the company acquiring them. However, to use spaCy in another
|
||||
product, they will have to buy a second license.
|
||||
|
||||
|
||||
Alex and Sasha: University Academics
|
||||
####################################
|
||||
|
||||
Alex and Sasha are post-doctoral researchers working for a university. Part of
|
||||
their funding comes from a grant from Google, but Google will not own any part
|
||||
of the work that they produce. Their mission is just to write papers.
|
||||
|
||||
Alex and Sasha find spaCy convenient, so they use it in their system under the
|
||||
AGPL. This means that their system must also be released under the AGPL, but they're
|
||||
cool with that --- they were going to release their code anyway, as it's the only
|
||||
way to ensure their experiments are properly repeatable.
|
||||
|
||||
Alex and Sasha find and fix a few bugs in spaCy. They must release these
|
||||
modifications, and they ask that they be accepted into the main spaCy repo.
|
||||
In order to do this, they must sign a contributor agreement, ceding their
|
||||
copyright. When commercial licenses to spaCy are sold, Alex and Sasha will
|
||||
not be able to claim any royalties from their contributions.
|
||||
|
||||
Later, Alex and Sasha implement new features in spaCy, for another paper. The
|
||||
code was quite rushed, and they don't want to take the time to put together a
|
||||
proper pull request. They must release their modifications under the AGPL, but
|
||||
they are not obliged to contribute them to the spaCy repository, or cede their
|
||||
copyright.
|
||||
|
||||
|
||||
Phuong and Jessie: Open Source developers
|
||||
#########################################
|
||||
|
||||
Phuong and Jessie use Calibre to manage their e-book libraries. They have an
|
||||
idea for a search feature, and they want to use spaCy to implement it. Calibre is
|
||||
released under the GPLv3. The AGPL has additional restrictions for projects
|
||||
used as a network resource, but they don't apply to this project, so Phuong and
|
||||
Jessie can use spaCy to improve Calibre. They'll have to release their code, but
|
||||
that was always their intention anyway.
|