Update CONTRIBUTING.md [ci skip]

Ines Montani 2021-01-31 12:51:28 +11:00
parent 1f1fbdba14
commit 6a7ffffeb3


@ -3,7 +3,9 @@
# Contribute to spaCy
Thanks for your interest in contributing to spaCy 🎉 The project is maintained
by **[@honnibal](https://github.com/honnibal)**,
**[@ines](https://github.com/ines)**, **[@svlandeg](https://github.com/svlandeg)** and
**[@adrianeboyd](https://github.com/adrianeboyd)**,
and we'll do our best to help you get started. This page will give you a quick
overview of how things are organized and most importantly, how to get involved.
@ -50,8 +52,7 @@ issue body. A few more tips:
parts and don't just dump your entire script. This will make it easier for us to
reproduce the error.
- **Getting info about your spaCy installation and environment:** You can use the command line interface to print details and
even format them as Markdown to copy-paste into GitHub issues:
`python -m spacy info --markdown`.
@ -60,7 +61,7 @@ issue body. A few more tips:
model is incompatible with your spaCy installation. In spaCy v2.0+, you can check
this on the command line by running `python -m spacy validate`.
- **Sharing a model's output, like dependencies and entities:** spaCy
comes with [built-in visualizers](https://spacy.io/usage/visualizers) that
you can run from within your script or a Jupyter notebook. For some issues, it's
helpful to **include a screenshot** of the visualization. You can simply drag and
@ -99,7 +100,7 @@ questions:
changes to spaCy's built-in methods. In contrast, a library of word
alignment functions could easily live as a separate package that depended on
spaCy — there's little difference between writing `import word_aligner` and
`import spacy.word_aligner`. spaCy makes it easy to implement
[custom pipeline components](https://spacy.io/usage/processing-pipelines#custom-components),
and add your own attributes, properties and methods to the `Doc`, `Token` and
`Span`. If you're looking to implement a new spaCy feature, starting with a
@ -109,8 +110,8 @@ questions:
library later.
- **Would the feature be easier to implement if it relied on "heavy" dependencies spaCy doesn't currently require?**
Python has a very rich ecosystem. Libraries like PyTorch, TensorFlow, scikit-learn, SciPy or Gensim
do lots of useful things — but we don't want to have them as default
dependencies. If the feature requires functionality in one of these libraries,
it's probably better to break it out into a different package.
@ -137,19 +138,7 @@ files, a compiler, [pip](https://pip.pypa.io/en/latest/installing/),
[virtualenv](https://virtualenv.pypa.io/en/stable/) and
[git](https://git-scm.com) installed. The compiler is usually the trickiest part.
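```bash
# Clone the repository and enter it
git clone https://github.com/explosion/spaCy
cd spaCy
# Create and activate a virtual environment
python -m venv .env
source .env/bin/activate
# Upgrade pip inside the virtual environment
python -m pip install -U pip
# Make the source checkout importable while developing
export PYTHONPATH=$(pwd)
# Install the build and development dependencies
pip install -r requirements.txt
# Compile the Cython extensions in place
python setup.py build_ext --inplace
```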
If you've made changes to `.pyx` files, you need to **recompile spaCy** before you
can test your changes by re-running `python setup.py build_ext --inplace`.
Changes to `.py` files will be effective immediately.
@ -184,7 +173,7 @@ sure your test passes and reference the issue in your commit message.
## Code conventions
Code should loosely follow [pep8](https://www.python.org/dev/peps/pep-0008/).
spaCy uses [`black`](https://github.com/ambv/black) for code
formatting and [`flake8`](http://flake8.pycqa.org/en/latest/) for linting its
Python modules. If you've built spaCy from source, you'll already have both
tools installed.
@ -216,8 +205,7 @@ list of available editor integrations.
#### Disabling formatting
There are a few cases where auto-formatting doesn't improve readability, for
example in some of the language data files or in the tests that construct `Doc`
objects from lists of words and other labels.
Wrapping a block in `# fmt: off` and `# fmt: on` lets you disable formatting
for that particular code. Here's an example:
@ -281,6 +269,9 @@ except: # noqa: E722
### Python conventions
All Python code must be **compatible with Python 3.6+**.
#### I/O and handling paths
Code that interacts with the file-system should accept objects that follow the
`pathlib.Path` API, without assuming that the object inherits from `pathlib.Path`.
If the function is user-facing and takes a path as an argument, it should check
@ -290,14 +281,18 @@ accept **file-like objects**, as it makes the library IO-agnostic. Working on
buffers makes the code more general, easier to test, and compatible with Python
3's asynchronous IO.
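As a rough illustration of these I/O conventions, here is a minimal sketch using only the standard library. The helpers `ensure_path`, `to_disk`, `from_disk`, `to_file` and `from_file` are hypothetical names for this example, not spaCy's API, and converting plain strings to `pathlib.Path` is an assumption about what the user-facing check should do. The conversion only tests for `str`, so any other object with a `Path`-like `open()` method is passed through untouched.

```python
import io
from pathlib import Path
from typing import IO


def ensure_path(path):
    """Convert a plain string to a Path; return anything else unchanged.

    There is deliberately no isinstance(path, Path) check, so any object that
    follows the pathlib.Path API (e.g. a custom path class) is accepted as-is.
    """
    if isinstance(path, str):
        return Path(path)
    return path


def to_file(data: str, file_: IO[str]) -> None:
    """Serialize to any writable file-like object (disk file, buffer, ...)."""
    file_.write(data)


def from_file(file_: IO[str]) -> str:
    """Deserialize from any readable file-like object."""
    return file_.read()


def to_disk(data: str, path) -> None:
    """User-facing wrapper: accepts a str or a Path-like object."""
    path = ensure_path(path)
    with path.open("w", encoding="utf8") as file_:
        to_file(data, file_)


def from_disk(path) -> str:
    """User-facing wrapper: accepts a str or a Path-like object."""
    path = ensure_path(path)
    with path.open("r", encoding="utf8") as file_:
        return from_file(file_)


if __name__ == "__main__":
    # The serialization logic can be exercised entirely in memory.
    buffer = io.StringIO()
    to_file("hello", buffer)
    buffer.seek(0)
    print(from_file(buffer))
```

Because `to_file` and `from_file` only rely on the file-like interface, they can be tested against an in-memory `io.StringIO` buffer without touching the disk, while the thin `to_disk`/`from_disk` wrappers handle paths.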
#### Composition vs. inheritance
Although spaCy uses a lot of classes, **inheritance is viewed with some suspicion**
— it's seen as a mechanism of last resort. You should discuss plans to extend
the class hierarchy before implementing.
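To illustrate the preference for composition, here is a small sketch. The `LabelCounter` class is hypothetical and made up for this example: instead of subclassing `collections.Counter`, it keeps one as a private attribute and exposes only the methods it needs.

```python
from collections import Counter
from typing import Iterable, List, Tuple


class LabelCounter:
    """Hypothetical example: counts labels by *holding* a Counter rather than
    inheriting from it, so only the methods defined here are public API."""

    def __init__(self) -> None:
        self._counts = Counter()  # wrapped implementation detail

    def update(self, labels: Iterable[str]) -> None:
        """Add one occurrence for each label in `labels`."""
        self._counts.update(labels)

    def most_common(self, n: int) -> List[Tuple[str, int]]:
        """Return the `n` most frequent labels with their counts."""
        return self._counts.most_common(n)

    def __len__(self) -> int:
        """Number of distinct labels seen so far."""
        return len(self._counts)


if __name__ == "__main__":
    counter = LabelCounter()
    counter.update(["PERSON", "ORG", "PERSON"])
    print(counter.most_common(1))  # [('PERSON', 2)]
```

A subclass would silently inherit every `Counter` method (`subtract`, the arithmetic operators and so on) as part of its public API. Wrapping keeps that surface small and makes it easier to swap out the underlying data structure later.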
#### Naming conventions
We have a number of conventions around variable naming that are still being
documented, and aren't 100% strict. A general policy is that instances of the
class `Doc` should by default be called `doc`, `Token` → `token`, `Lexeme` → `lex`,
`Vocab` → `vocab` and `Language` → `nlp`. You should avoid giving these names to
variables of other types. For instance, don't name a text string `doc`; you
should usually call this `text`. Two general code style preferences further help
with naming. First, **lean away from introducing temporary variables**, as these