Update CONTRIBUTING.md [ci skip]

Ines Montani 2021-01-31 12:51:28 +11:00
parent 1f1fbdba14
commit 6a7ffffeb3

@@ -3,7 +3,9 @@
 # Contribute to spaCy
 Thanks for your interest in contributing to spaCy 🎉 The project is maintained
-by [@honnibal](https://github.com/honnibal) and [@ines](https://github.com/ines),
+by **[@honnibal](https://github.com/honnibal)**,
+**[@ines](https://github.com/ines)**, **[@svlandeg](https://github.com/svlandeg)** and
+**[@adrianeboyd](https://github.com/adrianeboyd)**,
 and we'll do our best to help you get started. This page will give you a quick
 overview of how things are organized and most importantly, how to get involved.
@@ -50,8 +52,7 @@ issue body. A few more tips:
 parts and don't just dump your entire script. This will make it easier for us to
 reproduce the error.
-- **Getting info about your spaCy installation and environment:** If you're
-using spaCy v1.7+, you can use the command line interface to print details and
+- **Getting info about your spaCy installation and environment:** You can use the command line interface to print details and
 even format them as Markdown to copy-paste into GitHub issues:
 `python -m spacy info --markdown`.
@@ -60,7 +61,7 @@ issue body. A few more tips:
 model is incompatible with your spaCy installation. In spaCy v2.0+, you can check
 this on the command line by running `python -m spacy validate`.
-- **Sharing a model's output, like dependencies and entities:** spaCy v2.0+
+- **Sharing a model's output, like dependencies and entities:** spaCy
 comes with [built-in visualizers](https://spacy.io/usage/visualizers) that
 you can run from within your script or a Jupyter notebook. For some issues, it's
 helpful to **include a screenshot** of the visualization. You can simply drag and
@@ -99,7 +100,7 @@ questions:
 changes to spaCy's built-in methods. In contrast, a library of word
 alignment functions could easily live as a separate package that depended on
 spaCy — there's little difference between writing `import word_aligner` and
-`import spacy.word_aligner`. spaCy v2.0+ makes it easy to implement
+`import spacy.word_aligner`. spaCy makes it easy to implement
 [custom pipeline components](https://spacy.io/usage/processing-pipelines#custom-components),
 and add your own attributes, properties and methods to the `Doc`, `Token` and
 `Span`. If you're looking to implement a new spaCy feature, starting with a
@@ -109,8 +110,8 @@ questions:
 library later.
 - **Would the feature be easier to implement if it relied on "heavy" dependencies spaCy doesn't currently require?**
-Python has a very rich ecosystem. Libraries like scikit-learn, SciPy, Gensim or
-TensorFlow/Keras do lots of useful things — but we don't want to have them as
+Python has a very rich ecosystem. Libraries like PyTorch, TensorFlow, scikit-learn, SciPy or Gensim
+do lots of useful things — but we don't want to have them as default
 dependencies. If the feature requires functionality in one of these libraries,
 it's probably better to break it out into a different package.
@@ -137,19 +138,7 @@ files, a compiler, [pip](https://pip.pypa.io/en/latest/installing/),
 [virtualenv](https://virtualenv.pypa.io/en/stable/) and
 [git](https://git-scm.com) installed. The compiler is usually the trickiest part.
-```
-python -m pip install -U pip
-git clone https://github.com/explosion/spaCy
-cd spaCy
-python -m venv .env
-source .env/bin/activate
-export PYTHONPATH=`pwd`
-pip install -r requirements.txt
-python setup.py build_ext --inplace
-```
-If you've made changes to `.pyx` files, you need to recompile spaCy before you
+If you've made changes to `.pyx` files, you need to **recompile spaCy** before you
 can test your changes by re-running `python setup.py build_ext --inplace`.
 Changes to `.py` files will be effective immediately.
@@ -184,7 +173,7 @@ sure your test passes and reference the issue in your commit message.
 ## Code conventions
 Code should loosely follow [pep8](https://www.python.org/dev/peps/pep-0008/).
-As of `v2.1.0`, spaCy uses [`black`](https://github.com/ambv/black) for code
+spaCy uses [`black`](https://github.com/ambv/black) for code
 formatting and [`flake8`](http://flake8.pycqa.org/en/latest/) for linting its
 Python modules. If you've built spaCy from source, you'll already have both
 tools installed.
@@ -216,8 +205,7 @@ list of available editor integrations.
 #### Disabling formatting
 There are a few cases where auto-formatting doesn't improve readability for
-example, in some of the language data files like the `tag_map.py`, or in
-the tests that construct `Doc` objects from lists of words and other labels.
+example, in some of the language data files or in the tests that construct `Doc` objects from lists of words and other labels.
 Wrapping a block in `# fmt: off` and `# fmt: on` lets you disable formatting
 for that particular code. Here's an example:
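
The file's own example falls outside this hunk. As a rough illustration only, a block wrapped in the two markers might look like the sketch below. The `words` and `tags` lists are made-up data, not taken from spaCy's tests.

```python
# Hypothetical test data: the manual column alignment is what black
# would otherwise reflow, so formatting is switched off around it.
# fmt: off
words = ["I",   "like", "New", "York"]
tags  = ["PRP", "VBP",  "NNP", "NNP"]
# fmt: on

# Outside the markers, black formats the code as usual.
pairs = list(zip(words, tags))
```

Keeping the disabled region as small as possible means the rest of the module still gets formatted automatically.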
@@ -281,6 +269,9 @@ except: # noqa: E722
 ### Python conventions
 All Python code must be written **compatible with Python 3.6+**.
+#### I/O and handling paths
 Code that interacts with the file-system should accept objects that follow the
 `pathlib.Path` API, without assuming that the object inherits from `pathlib.Path`.
 If the function is user-facing and takes a path as an argument, it should check
@@ -290,14 +281,18 @@ accept **file-like objects**, as it makes the library IO-agnostic. Working on
 buffers makes the code more general, easier to test, and compatible with Python
 3's asynchronous IO.
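
One possible reading of the I/O guidance, as a minimal sketch: convert strings to `Path` at the user-facing boundary, and let the core logic work on any file-like object. The names `to_path` and `count_lines` are illustrative assumptions, not spaCy functions.

```python
import io
from pathlib import Path


def to_path(path):
    """Hypothetical helper: turn a string into a Path, pass anything else through.

    Non-string objects are returned unchanged, so any object that follows
    the pathlib.Path API works without having to inherit from pathlib.Path.
    """
    if isinstance(path, str):
        return Path(path)
    return path


def count_lines(file_):
    """Hypothetical helper: count lines in any file-like object."""
    return sum(1 for _ in file_)


# The same function works on an in-memory buffer, which keeps tests
# independent of the file system.
buffer = io.StringIO("first\nsecond\n")
n_lines = count_lines(buffer)
```

A caller that does have a path on disk can open it and pass the handle, for example `with to_path("data.txt").open() as f: count_lines(f)`.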
+#### Composition vs. inheritance
 Although spaCy uses a lot of classes, **inheritance is viewed with some suspicion**
 — it's seen as a mechanism of last resort. You should discuss plans to extend
 the class hierarchy before implementing.
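
To make the preference concrete, here is a small generic sketch of composition: one object holds another and delegates to it instead of subclassing it. Both classes are invented for illustration and do not correspond to spaCy's own classes.

```python
class WhitespaceSplitter:
    """Hypothetical component that splits text on whitespace."""

    def __call__(self, text):
        return text.split()


class LowercasePipeline:
    """Hypothetical wrapper that holds a splitter rather than inheriting from it."""

    def __init__(self, splitter):
        # The collaborator is injected, so it can be swapped without
        # touching the class hierarchy.
        self.splitter = splitter

    def __call__(self, text):
        return [word.lower() for word in self.splitter(text)]


pipeline = LowercasePipeline(WhitespaceSplitter())
tokens = pipeline("Hello spaCy World")
```

Any callable that takes a string and returns a list of strings can be passed in, which is the flexibility that subclassing would otherwise be used for.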
+#### Naming conventions
 We have a number of conventions around variable naming that are still being
 documented, and aren't 100% strict. A general policy is that instances of the
-class `Doc` should by default be called `doc`, `Token` `token`, `Lexeme` `lex`,
-`Vocab` `vocab` and `Language` `nlp`. You should avoid naming variables that are
+class `Doc` should by default be called `doc`, `Token` → `token`, `Lexeme` → `lex`,
+`Vocab` → `vocab` and `Language` → `nlp`. You should avoid naming variables that are
 of other types these names. For instance, don't name a text string `doc` — you
 should usually call this `text`. Two general code style preferences further help
 with naming. First, **lean away from introducing temporary variables**, as these