Merge branch 'develop' of https://github.com/explosion/spaCy into develop

Matthew Honnibal 2017-04-14 23:54:57 +02:00
commit d13f0a7017
16 changed files with 149 additions and 45 deletions

View File

@ -7,11 +7,12 @@ Following the v1.0 release, it's time to welcome more contributors into the spaC
## Table of contents
1. [Issues and bug reports](#issues-and-bug-reports)
2. [Contributing to the code base](#contributing-to-the-code-base)
3. [Adding tests](#adding-tests)
4. [Updating the website](#updating-the-website)
5. [Submitting a tutorial](#submitting-a-tutorial)
6. [Submitting a project to the showcase](#submitting-a-project-to-the-showcase)
7. [Code of conduct](#code-of-conduct)
3. [Code conventions](#code-conventions)
4. [Adding tests](#adding-tests)
5. [Updating the website](#updating-the-website)
6. [Submitting a tutorial](#submitting-a-tutorial)
7. [Submitting a project to the showcase](#submitting-a-project-to-the-showcase)
8. [Code of conduct](#code-of-conduct)
## Issues and bug reports
@ -50,13 +51,17 @@ To distinguish issues that are opened by us, the maintainers, we usually add a
You don't have to be an NLP expert or Python pro to contribute, and we're happy to help you get started. If you're new to spaCy, a good place to start is the [`help wanted (easy)`](https://github.com/explosion/spaCy/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted+%28easy%29%22) label, which we use to tag bugs and feature requests that are easy and self-contained. If you've decided to take on one of these problems and you're making good progress, don't forget to add a quick comment to the issue. You can also use the issue to ask questions, or share your work in progress.
### Conventions for Python
### What belongs in spaCy?
Coming soon.
Every library has a different inclusion philosophy — a policy of what should be shipped in the core library, and what could be provided in other packages. Our philosophy is to prefer a smaller core library. We generally ask the following questions:
### Conventions for Cython
* **What would this feature look like if implemented in a separate package?** Some features would be very difficult to implement externally. For instance, anything that requires a change to the `Token` class really needs to be implemented within spaCy, because there's no convenient way to make spaCy return custom `Token` objects. In contrast, a library of word alignment functions could easily live as a separate package that depended on spaCy — there's little difference between writing `import word_aligner` and `import spacy.word_aligner`.
Coming soon.
* **Would the feature be easier to implement if it relied on "heavy" dependencies spaCy doesn't currently require?** Python has a very rich ecosystem. Libraries like scikit-learn, SciPy, Gensim and Keras do lots of useful things, but we don't want to have them as dependencies. If the feature requires functionality in one of these libraries, it's probably better to break it out into a different package.
* **Is the feature orthogonal to the current spaCy functionality, or overlapping?** spaCy strongly prefers to avoid having 6 different ways of doing the same thing. As better techniques are developed, we prefer to drop support for "the old way". However, it's rare that one approach *entirely* dominates another. It's very common that there's still a use-case for the "obsolete" approach. For instance, [WordNet](https://wordnet.princeton.edu/) is still very useful — but word vectors are better for most use-cases, and the two approaches to lexical semantics do a lot of the same things. spaCy therefore only supports word vectors, and support for WordNet is currently left for other packages.
* **Do you need the feature to get basic things done?** We do want spaCy to be at least somewhat self-contained. If we keep needing some feature in our recipes, that does provide some argument for bringing it "in house".
### Developer resources
@ -76,6 +81,67 @@ Next, create a test file named `test_issue[ISSUE NUMBER].py` in the [`spacy/test
📖 **For more information on how to add tests, check out the [tests README](spacy/tests/README.md).**
## Code conventions
Code should loosely follow [PEP 8](https://www.python.org/dev/peps/pep-0008/). Regular line length is **80 characters**, with some tolerance for lines up to 90 characters if the alternative would be worse. For instance, if your list comprehension comes to 82 characters, it's better not to split it over two lines.
### Python conventions
All Python code must be written in an **intersection of Python 2 and Python 3**. This is easy in Cython, but somewhat ugly in Python. We could use some extra utilities for this. Please pay particular attention to code that serialises JSON objects.
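As a minimal sketch of the JSON point: `json.dumps` returns a byte string on Python 2 by default and a text string on Python 3, so a small wrapper can normalise the result to text. The helper name `json_dumps_text` and the example `meta` dictionary are hypothetical and not part of spaCy's API.

```python
import json
import sys


def json_dumps_text(data):
    """Serialise `data` to JSON and always return a text (unicode) string."""
    dumped = json.dumps(data, indent=2)
    if sys.version_info[0] == 2:
        # On Python 2, json.dumps returns a byte string by default.
        dumped = dumped.decode("utf8")
    return dumped


# Hypothetical metadata, used only to exercise the helper.
meta = {"name": "example_model", "lang": "xx"}
text = json_dumps_text(meta)
```

Callers can then write `text` to a file opened in text mode without branching on the Python version.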
Code that interacts with the file-system should accept objects that follow the `pathlib.Path` API, without assuming that the object inherits from `pathlib.Path`. If the function is user-facing and takes a path as an argument, it should check whether the path is provided as a string. Strings should be converted to `pathlib.Path` objects.
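One way to read the rule above is sketched here, assuming a hypothetical helper called `ensure_path` (the name is illustrative, not a confirmed spaCy function). Strings are converted to `pathlib.Path`, and anything else is passed through untouched, so duck-typed path objects keep working.

```python
from pathlib import Path

try:
    string_types = (basestring,)  # Python 2: covers str and unicode
except NameError:
    string_types = (str,)  # Python 3


def ensure_path(path):
    """Convert strings to pathlib.Path; pass other objects through unchanged."""
    if isinstance(path, string_types):
        return Path(path)
    # Anything else is assumed to follow the pathlib.Path API already,
    # without having to inherit from pathlib.Path.
    return path


model_dir = ensure_path("data/en")
```

A user-facing function would call `ensure_path` once at the top and use only `Path` methods such as `.open()` or `.exists()` after that.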
At the time of writing (v1.7), spaCy's serialization and deserialization functions are inconsistent about accepting paths vs accepting file-like objects. The correct answer is "file-like objects": that's what we want going forward, as it makes the library IO-agnostic. Working on buffers makes the code more general, easier to test, and compatible with Python 3's asynchronous IO.
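To illustrate why file-like objects are easier to test, here is a small sketch with two hypothetical functions, `write_meta` and `read_meta`, which are not part of spaCy. They only assume that the object they receive has `.write()` or `.read()` methods working on bytes, so an in-memory `io.BytesIO` buffer can stand in for a real file.

```python
import io
import json


def write_meta(file_, meta):
    """Write `meta` as UTF-8 encoded JSON to any binary file-like object."""
    file_.write(json.dumps(meta, indent=2, sort_keys=True).encode("utf8"))


def read_meta(file_):
    """Read UTF-8 encoded JSON back from any binary file-like object."""
    return json.loads(file_.read().decode("utf8"))


# No file system access is needed: an in-memory buffer is enough.
buffer_ = io.BytesIO()
write_meta(buffer_, {"lang": "xx", "name": "example"})
buffer_.seek(0)
loaded = read_meta(buffer_)
```

The same functions work unchanged with a file opened via `open(path, "wb")` or `open(path, "rb")`.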
Although spaCy uses a lot of classes, inheritance is viewed with some suspicion — it's seen as a mechanism of last resort. You should discuss plans to extend the class hierarchy before implementing.
### Cython conventions
spaCy's core data structures are implemented as [Cython](http://cython.org/) `cdef` classes. Memory is managed through the `cymem.cymem.Pool` class, which allows you to allocate memory which will be freed when the `Pool` object is garbage collected. This means you usually don't have to worry about freeing memory. You just have to decide which Python object owns the memory, and make it own the `Pool`. When that object goes out of scope, the memory will be freed. You do have to take care that no pointers outlive the object that owns them — but this is generally quite easy.
All Cython modules should have the `# cython: infer_types=True` compiler directive at the top of the file. This makes the code much cleaner, as it avoids the need for many type declarations. If possible, you should prefer to declare your functions `nogil`, even if you don't especially care about multi-threading. The reason is that `nogil` functions help the Cython compiler reason about your code quite a lot: you're telling the compiler that no Python dynamics are possible. This lets the compiler raise many more errors for you, and ensures your function will run at C speed.
Cython gives you many choices of sequences: you could have a Python list, a numpy array, a memory view, a C++ vector, or a pointer. Pointers are preferred, because they are fastest, have the most explicit semantics, and let the compiler check your code more strictly. C++ vectors are also great — but you should only use them internally in functions. It's less friendly to accept a vector as an argument, because that asks the user to do much more work.
Here's how to get a pointer from a numpy array, memory view or vector:
```cython
cdef void get_pointers(np.ndarray[int, mode='c'] numpy_array, vector[int] cpp_vector, int[::1] memory_view) nogil:
    pointer1 = <int*>numpy_array.data
    pointer2 = cpp_vector.data()
    pointer3 = &memory_view[0]
```
Both C arrays and C++ vectors reassure the compiler that no Python operations are possible on your variable. This is a big advantage: it lets the Cython compiler raise many more errors for you.
When getting a pointer from a numpy array or memoryview, take care that the data is actually stored in C-contiguous order — otherwise you'll get a pointer to nonsense. The type-declarations in the code above should generate runtime errors if buffers with incorrect memory layouts are passed in.
To iterate over the array, the following style is preferred:
```cython
cdef int c_total(const int* int_array, int length) nogil:
    total = 0
    for item in int_array[:length]:
        total += item
    return total
```
If this is confusing, consider that the compiler couldn't deal with `for item in int_array:`, because there's no length attached to a raw pointer, so how could we figure out where to stop? The length is provided in the slice notation as a solution to this. Note that we don't have to declare the type of `item` in the code above, because the compiler can easily infer it. This gives us tidy code that looks quite like Python, but is exactly as fast as C, because we've made sure the compilation to C is trivial.
Your functions cannot be declared `nogil` if they need to create Python objects or call Python functions. This is perfectly okay — you shouldn't torture your code just to get `nogil` functions. However, if your function isn't `nogil`, you should compile your module with `cython -a --cplus my_module.pyx` and open the resulting `my_module.html` file in a browser. This will let you see how Cython is compiling your code. Calls into the Python run-time will be in bright yellow. This lets you easily see whether Cython is able to correctly type your code, or whether there are unexpected problems.
Finally, if you're new to Cython, you should expect to find the first steps a bit frustrating. It's a very large language, since it's essentially a superset of Python and C++, with additional complexity and syntax from numpy. The [documentation](http://docs.cython.org/en/latest/) isn't great, and there are many "traps for new players". Help is available on [Gitter](https://gitter.im/explosion/spaCy).
Working in Cython is very rewarding once you're over the initial learning curve. As with C and C++, the first way you write something in Cython will often be the performance-optimal approach. In contrast, Python optimisation generally requires a lot of experimentation. Is it faster to have an `if item in my_dict` check, or to use `.get()`? What about `try`/`except`? Does this numpy operation create a copy? There's no way to guess the answers to these questions, and you'll usually be dissatisfied with your results — so there's no way to know when to stop this process. In the worst case, you'll make a mess that invites the next reader to try their luck too. This is like one of those [volcanic gas-traps](http://www.wemjournal.org/article/S1080-6032%2809%2970088-2/abstract), where the rescuers keep passing out from low oxygen, causing another rescuer to follow — only to succumb themselves. In short, just say no to optimizing your Python. If it's not fast enough the first time, just switch to Cython.
### Resources to get you started
* [PEP 8 Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/) (python.org)
* [Official Cython documentation](http://docs.cython.org/en/latest/) (cython.org)
* [Writing C in Cython](https://explosion.ai/blog/writing-c-in-cython) (explosion.ai)
* [Multi-threading spaCy's parser and named entity recogniser](https://explosion.ai/blog/multithreading-with-cython) (explosion.ai)
## Adding tests
spaCy uses the [pytest](http://doc.pytest.org/) framework for testing. For more info on this, see the [pytest documentation](http://docs.pytest.org/en/latest/contents.html). Tests for spaCy modules and classes live in their own directories of the same name. For example, tests for the `Tokenizer` can be found in [`/spacy/tests/tokenizer`](spacy/tests/tokenizer). To be interpreted and run, all test files and test functions need to be prefixed with `test_`.
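As a rough illustration of the naming rule, the sketch below shows what a minimal test file could look like. The function `whitespace_tokenize` is a hypothetical stand-in used only to keep the example self-contained; real tests would exercise spaCy's own `Tokenizer`. The file would need a name starting with `test_`, for example `test_whitespace_tokenize.py`.

```python
def whitespace_tokenize(text):
    # Hypothetical stand-in for a tokenizer, not part of spaCy.
    return text.split()


# pytest collects functions whose names start with `test_`.
def test_whitespace_tokenize_splits_on_spaces():
    assert whitespace_tokenize("hello world") == ["hello", "world"]


def test_whitespace_tokenize_ignores_extra_spaces():
    assert whitespace_tokenize("  spaced   out  ") == ["spaced", "out"]
```

Running `pytest` on the file executes both functions and reports each failing `assert` with the values involved.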

View File

@ -8,6 +8,8 @@ English and German, as well as tokenization for Chinese, Spanish, Italian, Fren
Portuguese, Dutch, Swedish, Finnish, Hungarian, Bengali and Hebrew. It's commercial
open-source software, released under the MIT license.
📊 **Help us improve the library!** `Take the spaCy user survey <https://survey.spacy.io>`_.
💫 **Version 1.7 out now!** `Read the release notes here. <https://github.com/explosion/spaCy/releases/>`_
.. image:: https://img.shields.io/travis/explosion/spaCy/master.svg?style=flat-square

View File

@ -1,8 +1,8 @@
# coding: utf8
from __future__ import unicode_literals, division, print_function
from __future__ import unicode_literals
import io
from pathlib import Path, PurePosixPath
from pathlib import Path
from .converters import conllu2json
from .. import util

View File

@ -1,5 +1,5 @@
# coding: utf8
from __future__ import unicode_literals, division, print_function
from __future__ import unicode_literals
import json
from ...gold import read_json_file, merge_sents

View File

@ -49,5 +49,6 @@ def list_models():
# won't show up in list, but it seems worth it
exclude = ['cache', 'pycache', '__pycache__']
data_path = util.get_data_path()
models = [f.parts[-1] for f in data_path.iterdir() if f.is_dir()]
return [m for m in models if m not in exclude]
if data_path:
models = [f.parts[-1] for f in data_path.iterdir() if f.is_dir()]
return [m for m in models if m not in exclude]

View File

@ -46,8 +46,18 @@ def symlink(model_path, link_name, force):
# Add workaround for Python 2 on Windows (see issue #909)
if util.is_python2() and util.is_windows():
import subprocess
command = ['mklink', '/d', link_path, model_path]
subprocess.call(command, shell=True)
command = ['mklink', '/d', unicode(link_path), unicode(model_path)]
try:
subprocess.call(command, shell=True)
except:
# This is quite dirty, but just making sure other Windows-specific
# errors are caught so users at least see a proper error message.
util.sys_exit(
"Creating a symlink in spacy/data failed. You can still import "
"the model as a Python package and call its load() method, or "
"create the symlink manually:",
"{a} --> {b}".format(a=unicode(model_path), b=unicode(link_path)),
title="Error: Couldn't link model to '{l}'".format(l=link_name))
else:
link_path.symlink_to(model_path)

View File

@ -95,7 +95,7 @@ def read_clusters(clusters_path):
return clusters
def populate_vocab(vocab, clusters, probs, oov_probs):
def populate_vocab(vocab, clusters, probs, oov_prob):
# Ensure probs has entries for all words seen during clustering.
for word in clusters:
if word not in probs:

View File

@ -6,9 +6,15 @@ import shutil
import requests
from pathlib import Path
import six
from .. import about
from .. import util
if six.PY2:
json_dumps = lambda data: json.dumps(data, indent=2).decode("utf8")
elif six.PY3:
json_dumps = lambda data: json.dumps(data, indent=2)
def package(input_dir, output_dir, force):
input_path = Path(input_dir)
@ -27,7 +33,7 @@ def package(input_dir, output_dir, force):
create_dirs(package_path, force)
shutil.copytree(input_path.as_posix(), (package_path / model_name_v).as_posix())
create_file(main_path / 'meta.json', json.dumps(meta, indent=2))
create_file(main_path / 'meta.json', json_dumps(meta))
create_file(main_path / 'setup.py', template_setup)
create_file(main_path / 'MANIFEST.in', template_manifest)
create_file(package_path / '__init__.py', template_init)

View File

@ -55,7 +55,7 @@
}
},
"V_CSS": "1.3",
"V_CSS": "1.4",
"V_JS": "1.2",
"DEFAULT_SYNTAX": "python",
"ANALYTICS": "UA-58931649-1",

View File

@ -134,3 +134,8 @@ mixin landing-header()
.c-landing__wrapper
.c-landing__content
block
mixin landing-badge(url, graphic, alt, size)
+a(url)(aria-label=alt title=alt).c-landing__badge
+svg("graphics", graphic, size || 225)

View File

@ -18,3 +18,11 @@
.c-landing__title
color: $color-back
text-align: center
.c-landing__badge
transform: rotate(7deg)
display: block
text-align: center
@include breakpoint(min, md)
@include position(absolute, top, right, 16rem, 6rem)

View File

@ -1,19 +1,27 @@
<svg style="position: absolute; width: 0; height: 0;" width="0" height="0" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<defs>
<symbol id="usersurvey" viewBox="0 0 200 111">
<title>spaCy user survey 2017</title>
<path fill="#ddd" d="M183.3 89.2l-164.6-40-1-29.2 164.6 40M3.8 106.8l41.6-1.4-1-29.2-41.6 1.4L13.2 92"/>
<path fill="#a3cad3" d="M45.4 105.4L19.6 94.6l25.4-1"/>
<path fill="#ddd" d="M196.6 2L155 3.4l1 29.2 41.6-1.4L187.2 17"/>
<path fill="#a3cad3" d="M155 3.4l25.8 10.8-25.4 1"/>
<path fill="#fff" d="M17.6 19.4l163-5.6 1 29.2-163 5.6zM19.2 65.6l163-5.6 1 29.2-163 5.6z"/>
<path fill="#008EBC" d="M56.8 29h-3.6v-2.4l10-.4.2 2.5h-3.6l.4 10.8h-3L56.8 29zM71 36l-4 .2-.6 3.2h-3L67 26.2h3.6l4.6 13-3.2.2-1-3zm-.6-2.3l-.4-1.2-1.2-4.2-1 4.3-.3 1.2h3zM76 25.8h3l.3 5.3 4-5.4h3.2l-3.8 5.3 5 7.7h-3.3L81 33.4l-1.5 2V39h-3l-.4-13.2zM88.5 25.4l8.3-.3v2.6l-5.2.2v2.6l4.6-.2v2.5L92 33v3l5.6-.3v2.5l-8.4.3-.5-13zM106.4 27.3h-3.6V25l10-.5.2 2.5h-3.6l.4 10.8h-3l-.4-10.5zM115 24.5h3v5l4.7-.2-.2-5 3-.2.5 13.3h-3l-.2-5.4h-4.6l.2 5.6h-3l-.5-13zM128.5 24l8.3-.3v2.5l-5.2.2V29l4.6-.2v2.5l-4.4.2v3l5.6-.2v2.5l-8.4.3-.5-13z"/>
<path fill="#1A1E23" d="M44.5 73h3l.3 7.4c0 2.6 1 3.4 2.4 3.4s2.3-1 2.2-3.6l-.3-7.4h3l.2 7c.2 4.4-1.6 6.4-5 6.5-3.4 0-5.3-1.7-5.5-6.2l-.2-7zM59 82c1 1 2.2 1.4 3.3 1.4 1.2 0 1.8-.5 1.8-1.3 0-.7-.7-1-2-1.4l-1.6-.7c-1.4-.6-2.7-1.7-2.8-3.6 0-2.2 1.8-4 4.6-4 1.5-.2 3 .4 4.3 1.5L65 75.7c-1-.6-1.7-1-2.8-1-1 0-1.7.5-1.6 1.2 0 .8 1 1 2 1.5l1.8.6c1.6.7 2.7 1.7 2.7 3.6.2 2.2-1.6 4-4.7 4.3-1.7 0-3.6-.6-5-1.8l1.7-2zM69 72.3l8.3-.3v2.5l-5.2.2.2 2.7 4.5-.2v2.5l-4.4.2v3l5.6-.3v2.5l-8.5.3-.5-13.2zM87.6 84.8L85 80l-1.7.2.2 4.8h-3L80 72l4.8-.3c2.8 0 5 .8 5.2 4 0 1.8-.8 3-2.2 3.8l3.2 5.2h-3.4zm-4.4-7h1.5c1.6 0 2.4-.7 2.3-2 0-1.3-1-1.7-2.5-1.7l-1.5.2.2 3.7zM98 80.8c1 .8 2 1.3 3.2 1.3 1.2 0 1.8-.4 1.8-1.2 0-.8-.8-1-2-1.5l-1.7-.7C98 78 96.6 77 96.5 75c0-2 1.8-4 4.6-4 1.6 0 3.2.5 4.4 1.6l-1.4 2c-1-.7-1.7-1-2.8-1-1 0-1.7.4-1.6 1 0 1 1 1.2 2 1.6l1.8.6c1.6.6 2.7 1.6 2.7 3.5.2 2.2-1.6 4-4.7 4.3-1.7 0-3.6-.5-5-1.7l1.6-2.2zM107.8 71l3-.2.3 7.4c.2 2.6 1 3.4 2.5 3.4s2.3-1 2.2-3.6l-.3-7.4h3v7c.3 4.4-1.5 6.4-5 6.5-3.3.2-5.2-1.6-5.4-6l-.2-7zM129 83.4l-2.8-4.7h-1.6l.2 5h-3l-.5-13.2 4.8-.2c3 0 5.2.8 5.3 4 0 1.8-.8 3-2.2 3.8l3.3 5.3H129zm-4.5-7h1.5c1.6-.2 2.4-1 2.3-2 0-1.4-1-1.8-2.5-1.8h-1.5l.2 3.8zM131.6 70h3.2l1.8 6 1.2 4.3c.5-1.5.7-2.8 1-4.3l1.3-6.2h3.2L139.7 83H136l-4.4-13zM144.6 69.7l8.3-.3V72h-5.3V75l4.6-.2V77l-4.4.3v3l5.5-.2v2.6l-8.4.3-.4-13.3zM158.2 77.7l-4.3-8.4h3l1.4 3 1.2 2.8c.4-1 .8-1.8 1-3l1.2-3h3l-3.6 8.5.2 4.7h-3l-.2-4.5z"/>
</symbol>
<symbol id="brain" viewBox="0 0 300 150">
<!-- by Kemal Sanli: https://dribbble.com/kemal -->
<title>brain</title>
<path stroke-width="4" stroke-miterlimit="10" fill="none" stroke="currentColor" d="M187.2 76.1h-5c-1.6 0-2.9-1.3-2.9-2.9V62.7c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 3-2.9 3zM221.1 63.8h-5c-1.6 0-2.9-1.3-2.9-2.9V50.5c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM221.1 88.3h-5c-1.6 0-2.9-1.3-2.9-2.9V74.9c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM221.1 112.7h-5c-1.6 0-2.9-1.3-2.9-2.9V99.4c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM221.1 39.4h-5c-1.6 0-2.9-1.3-2.9-2.9V26.1c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM263.2 63.8h-5c-1.6 0-2.9-1.3-2.9-2.9V50.5c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM263.2 88.3h-5c-1.6 0-2.9-1.3-2.9-2.9V74.9c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM263.2 112.7h-5c-1.6 0-2.9-1.3-2.9-2.9V99.4c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM263.2 39.4h-5c-1.6 0-2.9-1.3-2.9-2.9V26.1c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM191.5 54.3L207.8 34M195.5 61.1l12.3-4M191.5 80.1l16.3 20.4M195.5 73.3l12.3 4.1M236 39.1l11.4 13.6c1.4 1.7 1.5 4.2.1 6l-15.6 19.7c-1.4 1.8-1.4 4.3.1 6L243.4 98c2.6 3.1.4 7.8-3.7 7.8-4 0-6.3-4.7-3.7-7.8l11.4-13.6c1.4-1.7 1.5-4.2.1-6L232 58.8c-1.4-1.8-1.4-4.4.2-6.1l12-13.5c2.7-3.1.6-7.9-3.6-7.9h-.9c-4.1 0-6.3 4.7-3.7 7.8z"
/>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M96.1 124.1H63v-11.7c0-12.6-3.7-25-10.7-35.5l-5.9-8.8c-3.2-4.8-4.9-10.4-4.9-16.1 0-22.3 18.1-40.4 40.4-40.4 17.6 0 33.1 11.4 38.5 28.1l10.8 33.8h-11v16.9c0 3.7-3 6.7-6.7 6.7h-12v12.3H77.3V90.2c0-.8-.2-1.6-.5-2.3l-4.5-11.3c-1.7-4.1 1.4-8.6 5.8-8.6 2 0 4-1 5.1-2.7L91.8 53h15.6c0-14-11.3-25.3-25.3-25.3h-.3c-14 0-25.3 11.3-25.3 25.3v1c0 4 3.2 7.2 7.2 7.2 2.4 0 4.6-1.2 6-3.2l11.2-16.8h10.8M139 68.7h29.4"
/>
<path stroke-width="4" stroke-miterlimit="10" fill="none" stroke="currentColor" d="M187.2 76.1h-5c-1.6 0-2.9-1.3-2.9-2.9V62.7c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 3-2.9 3zM221.1 63.8h-5c-1.6 0-2.9-1.3-2.9-2.9V50.5c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM221.1 88.3h-5c-1.6 0-2.9-1.3-2.9-2.9V74.9c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM221.1 112.7h-5c-1.6 0-2.9-1.3-2.9-2.9V99.4c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM221.1 39.4h-5c-1.6 0-2.9-1.3-2.9-2.9V26.1c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM263.2 63.8h-5c-1.6 0-2.9-1.3-2.9-2.9V50.5c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM263.2 88.3h-5c-1.6 0-2.9-1.3-2.9-2.9V74.9c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM263.2 112.7h-5c-1.6 0-2.9-1.3-2.9-2.9V99.4c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM263.2 39.4h-5c-1.6 0-2.9-1.3-2.9-2.9V26.1c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM191.5 54.3L207.8 34M195.5 61.1l12.3-4M191.5 80.1l16.3 20.4M195.5 73.3l12.3 4.1M236 39.1l11.4 13.6c1.4 1.7 1.5 4.2.1 6l-15.6 19.7c-1.4 1.8-1.4 4.3.1 6L243.4 98c2.6 3.1.4 7.8-3.7 7.8-4 0-6.3-4.7-3.7-7.8l11.4-13.6c1.4-1.7 1.5-4.2.1-6L232 58.8c-1.4-1.8-1.4-4.4.2-6.1l12-13.5c2.7-3.1.6-7.9-3.6-7.9h-.9c-4.1 0-6.3 4.7-3.7 7.8z"/>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M96.1 124.1H63v-11.7c0-12.6-3.7-25-10.7-35.5l-5.9-8.8c-3.2-4.8-4.9-10.4-4.9-16.1 0-22.3 18.1-40.4 40.4-40.4 17.6 0 33.1 11.4 38.5 28.1l10.8 33.8h-11v16.9c0 3.7-3 6.7-6.7 6.7h-12v12.3H77.3V90.2c0-.8-.2-1.6-.5-2.3l-4.5-11.3c-1.7-4.1 1.4-8.6 5.8-8.6 2 0 4-1 5.1-2.7L91.8 53h15.6c0-14-11.3-25.3-25.3-25.3h-.3c-14 0-25.3 11.3-25.3 25.3v1c0 4 3.2 7.2 7.2 7.2 2.4 0 4.6-1.2 6-3.2l11.2-16.8h10.8M139 68.7h29.4"/>
</symbol>
<symbol id="computer" viewBox="0 0 300 150">
<!-- by Kemal Sanli: https://dribbble.com/kemal -->
<title>computer</title>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M56.2 87.7h-5c-1.6 0-2.9-1.3-2.9-2.9V74.4c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM90.1 75.5h-5c-1.6 0-2.9-1.3-2.9-2.9V62.2c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM90.1 99.9h-5c-1.6 0-2.9-1.3-2.9-2.9V86.6c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9V97c0 1.6-1.3 2.9-2.9 2.9zM90.1 124.4h-5c-1.6 0-2.9-1.3-2.9-2.9V111c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM90.1 51.1h-5c-1.6 0-2.9-1.3-2.9-2.9V37.7c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM132.2 75.5h-5c-1.6 0-2.9-1.3-2.9-2.9V62.2c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM132.2 99.9h-5c-1.6 0-2.9-1.3-2.9-2.9V86.6c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9V97c0 1.6-1.3 2.9-2.9 2.9zM132.2 124.4h-5c-1.6 0-2.9-1.3-2.9-2.9V111c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM132.2 51.1h-5c-1.6 0-2.9-1.3-2.9-2.9V37.7c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM60.5 66l16.3-20.3M64.5 72.8l12.3-4.1M60.5 91.8l16.3 20.4M64.5 85l12.3 4.1M105 50.8l11.4 13.6c1.4 1.7 1.5 4.2.1 6l-15.6 19.7c-1.4 1.8-1.4 4.3.1 6l11.4 13.6c2.6 3.1.4 7.8-3.7 7.8-4 0-6.3-4.7-3.7-7.8l11.4-13.6c1.4-1.7 1.5-4.2.1-6L101 70.5c-1.4-1.8-1.4-4.4.2-6.1l12-13.5c2.7-3.1.6-7.9-3.6-7.9h-.9c-4.1-.1-6.3 4.7-3.7 7.8z"
/>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M56.2 87.7h-5c-1.6 0-2.9-1.3-2.9-2.9V74.4c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM90.1 75.5h-5c-1.6 0-2.9-1.3-2.9-2.9V62.2c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM90.1 99.9h-5c-1.6 0-2.9-1.3-2.9-2.9V86.6c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9V97c0 1.6-1.3 2.9-2.9 2.9zM90.1 124.4h-5c-1.6 0-2.9-1.3-2.9-2.9V111c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM90.1 51.1h-5c-1.6 0-2.9-1.3-2.9-2.9V37.7c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM132.2 75.5h-5c-1.6 0-2.9-1.3-2.9-2.9V62.2c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM132.2 99.9h-5c-1.6 0-2.9-1.3-2.9-2.9V86.6c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9V97c0 1.6-1.3 2.9-2.9 2.9zM132.2 124.4h-5c-1.6 0-2.9-1.3-2.9-2.9V111c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM132.2 51.1h-5c-1.6 0-2.9-1.3-2.9-2.9V37.7c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM60.5 66l16.3-20.3M64.5 72.8l12.3-4.1M60.5 91.8l16.3 20.4M64.5 85l12.3 4.1M105 50.8l11.4 13.6c1.4 1.7 1.5 4.2.1 6l-15.6 19.7c-1.4 1.8-1.4 4.3.1 6l11.4 13.6c2.6 3.1.4 7.8-3.7 7.8-4 0-6.3-4.7-3.7-7.8l11.4-13.6c1.4-1.7 1.5-4.2.1-6L101 70.5c-1.4-1.8-1.4-4.4.2-6.1l12-13.5c2.7-3.1.6-7.9-3.6-7.9h-.9c-4.1-.1-6.3 4.7-3.7 7.8z"/>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M195.1 42.4h49v40.5h-49z" />
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M251.9 116.7h-64.6c-2.2 0-4-1.8-4-4V34.6c0-2.2 1.8-4 4-4h64.6c2.2 0 4 1.8 4 4v78.1c0 2.2-1.8 4-4 4z" />
<path fill="currentColor" d="M191.8 103.2h6.8v6.8h-6.8zM235.6 91.3v3.4h-21.9v5.1h21.9v3.4h11.9V91.3" />
@ -25,10 +33,8 @@
<title>eye</title>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M40.7 37.1h95.7v71.7H40.7z" />
<path fill="currentColor" d="M30.4 43.9h10.2v13.7H30.4z" />
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M30.4 64.4h10.2v13.7H30.4zM30.4 88.3h10.2V102H30.4zM146 59.3h-9.7V45.6h9.7c3.1 0 5.7 2.5 5.7 5.7v2.3c0 3.2-2.5 5.7-5.7 5.7zM146 96.9h-9.7V83.2h9.7c3.1 0 5.7 2.5 5.7 5.7v2.3c0 3.1-2.5 5.7-5.7 5.7zM59.5 108.8v15.4M117.5 108.8v15.4M40.7 98.3h72V70.6H125M40.7 50.8h53.6M55.3 68.2h10.8v8.7H55.3zM74.7 68.2h10.8v8.7H74.7z"
/>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M101.3 77h-3.6c-2 0-3.6-1.6-3.6-3.6v-1.5c0-2 1.6-3.6 3.6-3.6h3.6c2 0 3.6 1.6 3.6 3.6v1.5c0 1.9-1.6 3.6-3.6 3.6zM40.7 88.3h58.8v-7M80.1 88.3v-7M60.7 88.3v-7M80.1 61.7V50.8M60.7 50.8v10.9M104.1 47.8c2.8 5.1-2.4 10.3-7.6 7.6-.7-.4-1.3-1-1.7-1.7-2.8-5.1 2.4-10.3 7.6-7.6.7.4 1.3 1 1.7 1.7z"
/>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M30.4 64.4h10.2v13.7H30.4zM30.4 88.3h10.2V102H30.4zM146 59.3h-9.7V45.6h9.7c3.1 0 5.7 2.5 5.7 5.7v2.3c0 3.2-2.5 5.7-5.7 5.7zM146 96.9h-9.7V83.2h9.7c3.1 0 5.7 2.5 5.7 5.7v2.3c0 3.1-2.5 5.7-5.7 5.7zM59.5 108.8v15.4M117.5 108.8v15.4M40.7 98.3h72V70.6H125M40.7 50.8h53.6M55.3 68.2h10.8v8.7H55.3zM74.7 68.2h10.8v8.7H74.7z"/>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M101.3 77h-3.6c-2 0-3.6-1.6-3.6-3.6v-1.5c0-2 1.6-3.6 3.6-3.6h3.6c2 0 3.6 1.6 3.6 3.6v1.5c0 1.9-1.6 3.6-3.6 3.6zM40.7 88.3h58.8v-7M80.1 88.3v-7M60.7 88.3v-7M80.1 61.7V50.8M60.7 50.8v10.9M104.1 47.8c2.8 5.1-2.4 10.3-7.6 7.6-.7-.4-1.3-1-1.7-1.7-2.8-5.1 2.4-10.3 7.6-7.6.7.4 1.3 1 1.7 1.7z"/>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M104.9 50.8H125V37.1M136.3 90H123" />
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M269.5 56.4c15.3 30-14.5 59.8-44.5 44.5-4.9-2.5-8.9-6.5-11.4-11.4C198.2 59.5 228 29.7 258 45c4.9 2.5 8.9 6.5 11.5 11.4z" />
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M269.5 56.4c15.3 30-14.5 59.8-44.5 44.5-4.9-2.5-8.9-6.5-11.4-11.4C198.2 59.5 228 29.7 258 45c4.9 2.5 8.9 6.5 11.5 11.4z" />
@ -43,12 +49,10 @@
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M88.2 68.6c1.2 9.2-6.6 17-15.8 15.8-6.3-.8-11.4-5.9-12.2-12.2C59 63 66.8 55.2 76 56.4c6.3.9 11.4 5.9 12.2 12.2z" />
<path fill="currentColor" d="M77.7 70.5c-1.9 0-3.5-1.6-3.5-3.5 0-1 .4-1.9 1.1-2.5-.4-.1-.7-.1-1.1-.1-3.4 0-6.2 2.8-6.2 6.2 0 3.4 2.8 6.2 6.2 6.2s6.2-2.8 6.2-6.2c0-.4 0-.7-.1-1.1-.7.5-1.6 1-2.6 1z" />
<path d="M43.9 38.3h60.5v62.6H43.9z" fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" />
<path d="M43.9 112.3c0 5.8 4.7 10.5 10.5 10.5h39.4c5.8 0 10.5-4.7 10.5-10.5v-11.4H43.9v11.4zM93.9 20.2H54.5c-5.8 0-10.5 4.7-10.5 10.5v7.6h60.5v-7.6c-.1-5.8-4.8-10.5-10.6-10.5z" fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10"
/>
<path d="M43.9 112.3c0 5.8 4.7 10.5 10.5 10.5h39.4c5.8 0 10.5-4.7 10.5-10.5v-11.4H43.9v11.4zM93.9 20.2H54.5c-5.8 0-10.5 4.7-10.5 10.5v7.6h60.5v-7.6c-.1-5.8-4.8-10.5-10.6-10.5z" fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10"/>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M79.3 110.3c.8 3.8-2.5 7.1-6.3 6.3-1.9-.4-3.5-2-3.9-3.9-.8-3.8 2.5-7.1 6.3-6.3 1.9.5 3.4 2 3.9 3.9zM69.3 30.1h9.8" />
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M264 41h-93c-2.3 0-4.2 1.9-4.2 4.2v42.3c0 2.3 1.9 4.2 4.2 4.2h7v22.5l22.5-22.5H264c2.3 0 4.2-1.9 4.2-4.2V45.2c0-2.3-1.9-4.2-4.2-4.2z" />
<path fill="currentColor" d="M183.4 53.8c1.2 2.5-1.3 5-3.8 3.8-.5-.2-1-.7-1.2-1.2-1.2-2.5 1.3-5 3.8-3.8.5.2 1 .6 1.2 1.2zM189.4 52.2h16.9v5.6h-16.9zM211.9 52.2h33.8v5.6h-33.8zM178.1 74.8h5.6v5.6h-5.6zM189.4 74.8h33.8v5.6h-33.8zM240.1 74.8H257v5.6h-16.9zM251.3 52.2h5.6v5.6h-5.6zM178.1 63.5h22.5v5.6h-22.5zM217.5 63.5h12.7v5.6h-12.7zM234.4 63.5h22.5v5.6h-22.5zM209.2 69.1h-.3c-1.5 0-2.7-1.2-2.7-2.7v-.3c0-1.5 1.2-2.7 2.7-2.7h.3c1.5 0 2.7 1.2 2.7 2.7v.3c0 1.5-1.2 2.7-2.7 2.7zM234.1 76.3c1.2 2.5-1.3 5-3.8 3.8-.5-.2-1-.7-1.2-1.2-1.2-2.5 1.3-5 3.8-3.8.6.2 1 .7 1.2 1.2z"
/>
<path fill="currentColor" d="M183.4 53.8c1.2 2.5-1.3 5-3.8 3.8-.5-.2-1-.7-1.2-1.2-1.2-2.5 1.3-5 3.8-3.8.5.2 1 .6 1.2 1.2zM189.4 52.2h16.9v5.6h-16.9zM211.9 52.2h33.8v5.6h-33.8zM178.1 74.8h5.6v5.6h-5.6zM189.4 74.8h33.8v5.6h-33.8zM240.1 74.8H257v5.6h-16.9zM251.3 52.2h5.6v5.6h-5.6zM178.1 63.5h22.5v5.6h-22.5zM217.5 63.5h12.7v5.6h-12.7zM234.4 63.5h22.5v5.6h-22.5zM209.2 69.1h-.3c-1.5 0-2.7-1.2-2.7-2.7v-.3c0-1.5 1.2-2.7 2.7-2.7h.3c1.5 0 2.7 1.2 2.7 2.7v.3c0 1.5-1.2 2.7-2.7 2.7zM234.1 76.3c1.2 2.5-1.3 5-3.8 3.8-.5-.2-1-.7-1.2-1.2-1.2-2.5 1.3-5 3.8-3.8.6.2 1 .7 1.2 1.2z"/>
</symbol>
<symbol id="spacy" viewBox="0 0 675 215">

View File

@ -398,11 +398,12 @@ p
| vectors files, you can use the
| #[+src(gh("spacy-dev-resources", "training/init.py")) init.py]
| script from our
| #[+a(gh("spacy-dev-resources")) developer resources] to create a
| spaCy data directory:
| #[+a(gh("spacy-dev-resources")) developer resources], or use the new
| #[+a("/docs/usage/cli#model") #[code model] command] to create a data
| directory:
+code(false, "bash").
python training/init.py xx your_data_directory/ my_data/word_freqs.txt my_data/clusters.txt my_data/word_vectors.bz2
python -m spacy model [lang] [model_dir] [freqs_data] [clusters_data] [vectors_data]
+aside-code("your_data_directory", "yaml").
├── vocab/
@ -421,17 +422,14 @@ p
p
| This creates a spaCy data directory with a vocabulary model, ready to be
| loaded. By default, the
| #[+src(gh("spacy-dev-resources", "training/init.py")) init.py]
| script expects to be able to find your language class using
| #[code spacy.util.get_lang_class(lang_id)]. You can edit the script to
| help it find your language class if necessary.
| loaded. By default, the command expects to be able to find your language
| class using #[code spacy.util.get_lang_class(lang_id)].
+h(3, "word-frequencies") Word frequencies
p
| The #[+src(gh("spacy-dev-resources", "training/init.py")) init.py]
| script expects a tab-separated word frequencies file with three columns:
| The #[+a("/docs/usage/cli#model") #[code model] command] expects a
| tab-separated word frequencies file with three columns:
+list("numbers")
+item The number of times the word occurred in your language sample.

View File

@ -145,7 +145,9 @@ p
+h(2, "model") Model
+tag experimental
p Initialise a new model and its data directory.
p
| Initialise a new model and its data directory. For more info on this, see
| the documentation on #[+a("/docs/usage/adding-languages") adding languages].
+code(false, "bash").
python -m spacy model [lang] [model_dir] [freqs_data] [clusters_data] [vectors_data]

View File

@ -57,7 +57,7 @@ p
doc.ents = [Span(doc, 0, 1, label=doc.vocab.strings['GPE'])]
assert doc[0].ent_type_ == 'GPE'
doc.ents = []
doc.ents = [(u'LondonCity', doc.vocab.strings['GPE']), 0, 1)]
doc.ents = [(u'LondonCity', doc.vocab.strings['GPE'], 0, 1)]
p
| The value you assign should be a sequence, the values of which

View File

@ -11,6 +11,8 @@ include _includes/_mixins
h2.c-landing__title.o-block.u-heading-1
| in Python
+landing-badge("https://survey.spacy.io", "usersurvey", "Take the user survey!")
+grid.o-content
+grid-col("third").o-card
+h(2) Fastest in the world