# Code Conventions For a general overview of code conventions for contributors, see the [section in the contributing guide](https://github.com/explosion/spaCy/blob/master/CONTRIBUTING.md#code-conventions). 1. [Code compatibility](#code-compatibility) 2. [Auto-formatting](#auto-formatting) 3. [Linting](#linting) 4. [Documenting code](#documenting-code) 5. [Type hints](#type-hints) 6. [Structuring logic](#structuring-logic) 7. [Naming](#naming) 8. [Error handling](#error-handling) 9. [Writing tests](#writing-tests) ## Code compatibility spaCy supports **Python 3.6** and above, so all code should be written compatible with 3.6. This means that there are certain new syntax features that we won't be able to use until we drop support for older Python versions. Some newer features provide backports that we can conditionally install for older versions, although we only want to do this if it's absolutely necessary. If we need to use conditional imports based on the Python version or other custom compatibility-specific helpers, those should live in `compat.py`. ## Auto-formatting spaCy uses `black` for auto-formatting (which is also available as a pre-commit hook). It's recommended to configure your editor to perform this automatically, either triggered manually or whenever you save a file. We also have a GitHub action that regularly formats the code base and submits a PR if changes are available. Note that auto-formatting is currently only available for `.py` (Python) files, not for `.pyx` (Cython). As a rule of thumb, if the auto-formatting produces output that looks messy, it can often indicate that there's a better way to structure the code to make it more concise. ```diff - range_suggester = registry.misc.get("spacy.ngram_range_suggester.v1")( - min_size=1, max_size=3 - ) + suggester_factory = registry.misc.get("spacy.ngram_range_suggester.v1") + range_suggester = suggester_factory(min_size=1, max_size=3) ``` In some specific cases, e.g. in the tests, it can make sense to disable auto-formatting for a specific block. You can do this by wrapping the code in `# fmt: off` and `# fmt: on`: ```diff + # fmt: off text = "I look forward to using Thingamajig. I've been told it will make my life easier..." deps = ["nsubj", "ROOT", "advmod", "prep", "pcomp", "dobj", "punct", "", "nsubjpass", "aux", "auxpass", "ROOT", "nsubj", "aux", "ccomp", "poss", "nsubj", "ccomp", "punct"] + # fmt: on ``` ## Linting [`flake8`](http://flake8.pycqa.org/en/latest/) is a tool for enforcing code style. It scans one or more files and outputs errors and warnings. This feedback can help you stick to general standards and conventions, and can be very useful for spotting potential mistakes and inconsistencies in your code. Code you write should be compatible with our flake8 rules and not cause any warnings. ```bash flake8 spacy ``` The most common problems surfaced by linting are: - **Trailing or missing whitespace.** This is related to formatting and should be fixed automatically by running `black`. - **Unused imports.** Those should be removed if the imports aren't actually used. If they're required, e.g. to expose them so they can be imported from the given module, you can add a comment and `# noqa: F401` exception (see details below). - **Unused variables.** This can often indicate bugs, e.g. a variable that's declared and not correctly passed on or returned. To prevent ambiguity here, your code shouldn't contain unused variables. If you're unpacking a list of tuples and end up with variables you don't need, you can call them `_` to indicate that they're unused. - **Redefinition of function.** This can also indicate bugs, e.g. a copy-pasted function that you forgot to rename and that now replaces the original function. - **Repeated dictionary keys.** This either indicates a bug or unnecessary duplication. - **Comparison with `True`, `False`, `None`**. This is mostly a stylistic thing: when checking whether a value is `True`, `False` or `None`, you should be using `is` instead of `==`. For example, `if value is None`. ### Ignoring linter rules for special cases To ignore a given line, you can add a comment like `# noqa: F401`, specifying the code of the error or warning we want to ignore. It's also possible to ignore several comma-separated codes at once, e.g. `# noqa: E731,E123`. In general, you should always **specify the code(s)** you want to ignore – otherwise, you may end up missing actual problems. ```python # The imported class isn't used in this file, but imported here, so it can be # imported *from* here by another module. from .submodule import SomeClass # noqa: F401 try: do_something() except: # noqa: E722 # This bare except is justified, for some specific reason do_something_else() ``` ## Documenting code All functions and methods you write should be documented with a docstring inline. The docstring can contain a simple summary, and an overview of the arguments and their (simplified) types. Modern editors will show this information to users when they call the function or method in their code. If it's part of the public API and there's a documentation section available, we usually add the link as `DOCS:` at the end. This allows us to keep the docstrings simple and concise, while also providing additional information and examples if necessary. ```python def has_pipe(self, name: str) -> bool: """Check if a component name is present in the pipeline. Equivalent to `name in nlp.pipe_names`. name (str): Name of the component. RETURNS (bool): Whether a component of the name exists in the pipeline. DOCS: https://spacy.io/api/language#has_pipe """ ... ``` We specifically chose this approach of maintaining the docstrings and API reference separately, instead of auto-generating the API docs from the docstrings like other packages do. We want to be able to provide extensive explanations and examples in the documentation and use our own custom markup for it that would otherwise clog up the docstrings. We also want to be able to update the documentation independently of the code base. It's slightly more work, but it's absolutely worth it in terms of user and developer experience. ### Inline code comments We don't expect you to add inline comments for everything you're doing – this should be obvious from reading the code. If it's not, the first thing to check is whether your code can be improved to make it more explicit. That said, if your code includes complex logic or aspects that may be unintuitive at first glance (or even included a subtle bug that you ended up fixing), you should leave a quick comment that provides more context. ```diff token_index = indices[value] + # Index describes Token.i of last token but Span indices are inclusive span = doc[prev_token_index:token_index + 1] ``` ```diff + # To create the components we need to use the final interpolated config + # so all values are available (if component configs use variables). + # Later we replace the component config with the raw config again. interpolated = filled.interpolate() if not filled.is_interpolated else filled ``` Don't be shy about including comments for tricky parts that _you_ found hard to implement or get right – those may come in handy for the next person working on this code, or even future you! If your change implements a fix to a specific issue, it can often be helpful to include the issue number in the comment, especially if it's a relatively straightforward adjustment: ```diff + # Ensure object is a Span, not a Doc (#1234) if isinstance(obj, Doc): obj = obj[obj.start:obj.end] ``` ### Including TODOs It's fine to include code comments that indicate future TODOs, using the `TODO:` prefix. Modern editors typically format this in a different color, so it's easy to spot. TODOs don't necessarily have to be things that are absolutely critical to fix fight now – those should already be addressed in your pull request once it's ready for review. But they can include notes about potential future improvements. ```diff + # TODO: this is currently pretty slow dir_checksum = hashlib.md5() for sub_file in sorted(fp for fp in path.rglob("*") if fp.is_file()): dir_checksum.update(sub_file.read_bytes()) ``` If any of the TODOs you've added are important and should be fixed soon, you should add a task for this on Explosion's internal Ora board or an issue on the public issue tracker to make sure we don't forget to address it. ## Type hints We use Python type hints across the `.py` files wherever possible. This makes it easy to understand what a function expects and returns, and modern editors will be able to show this information to you when you call an annotated function. Type hints are not currently used in the `.pyx` (Cython) code, except for definitions of registered functions and component factories, where they're used for config validation. If possible, you should always use the more descriptive type hints like `List[str]` or even `List[Any]` instead of only `list`. We also annotate arguments and return types of `Callable` – although, you can simplify this if the type otherwise gets too verbose (e.g. functions that return factories to create callbacks). Remember that `Callable` takes two values: a **list** of the argument type(s) in order, and the return values. ```diff - def func(some_arg: dict) -> None: + def func(some_arg: Dict[str, Any]) -> None: ... ``` ```python def create_callback(some_arg: bool) -> Callable[[str, int], List[str]]: def callback(arg1: str, arg2: int) -> List[str]: ... return callback ``` For model architectures, Thinc also provides a collection of [custom types](https://thinc.ai/docs/api-types), including more specific types for arrays and model inputs/outputs. Even outside of static type checking, using these types will make the code a lot easier to read and follow, since it's always clear what array types are expected (and what might go wrong if the output is different from the expected type). ```python def build_tagger_model( tok2vec: Model[List[Doc], List[Floats2d]], nO: Optional[int] = None ) -> Model[List[Doc], List[Floats2d]]: ... ``` If you need to use a type hint that refers to something later declared in the same module, or the class that a method belongs to, you can use a string value instead: ```python class SomeClass: def from_bytes(self, data: bytes) -> "SomeClass": ... ``` In some cases, you won't be able to import a class from a different module to use it as a type hint because it'd cause circular imports. For instance, `spacy/util.py` includes various helper functions that return an instance of `Language`, but we couldn't import it, because `spacy/language.py` imports `util` itself. In this case, we can provide `"Language"` as a string and make the import conditional on `typing.TYPE_CHECKING` so it only runs when the code is evaluated by a type checker: ```python from typing TYPE_CHECKING if TYPE_CHECKING: from .language import Language def load_model(name: str) -> "Language": ... ``` ## Structuring logic ### Positional and keyword arguments We generally try to avoid writing functions and methods with too many arguments, and use keyword-only arguments wherever possible. Python lets you define arguments as keyword-only by separating them with a `, *`. If you're writing functions with additional arguments that customize the behavior, you typically want to make those arguments keyword-only, so their names have to be provided explicitly. ```diff - def do_something(name: str, validate: bool = False): + def do_something(name: str, *, validate: bool = False): ... - do_something("some_name", True) + do_something("some_name", validate=True) ``` This makes the function calls easier to read, because it's immediately clear what the additional values mean. It also makes it easier to extend arguments or change their order later on, because you don't end up with any function calls that depend on a specific positional order. ### Avoid mutable default arguments A common Python gotcha are [mutable default arguments](https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments): if your argument defines a mutable default value like `[]` or `{}` and then goes and mutates it, the default value is created _once_ when the function is created and the same object is then mutated every time the function is called. This can be pretty unintuitive when you first encounter it. We therefore avoid writing logic that does this. If your arguments need to default to an empty list or dict, you can use the `SimpleFrozenList` and `SimpleFrozenDict` helpers provided by spaCy. They are simple frozen implementations that raise an error if they're being mutated to prevent bugs and logic that accidentally mutates default arguments. ```diff - def to_bytes(self, *, exclude: List[str] = []): + def to_bytes(self, *, exclude: List[str] = SimpleFrozenList()): ... ``` ```diff def do_something(values: List[str] = SimpleFrozenList()): if some_condition: - values.append("foo") # raises an error + values = [*values, "foo"] return values ``` ### Don't use `try`/`except` for control flow We strongly discourage using `try`/`except` blocks for anything that's not third-party error handling or error handling that we otherwise have little control over. There's typically always a way to anticipate the _actual_ problem and **check for it explicitly**, which makes the code easier to follow and understand, and prevents bugs: ```diff - try: - token = doc[i] - except IndexError: - token = doc[-1] + if i < len(doc): + token = doc[i] + else: + token = doc[-1] ``` Even if you end up having to check for multiple conditions explicitly, this is still preferred over a catch-all `try`/`except`. It can be very helpful to think about the exact scenarios you need to cover, and what could go wrong at each step, which often leads to better code and fewer bugs. `try/except` blocks can also easily mask _other_ bugs and problems that raise the same errors you're catching, which is obviously bad. If you have to use `try`/`except`, make sure to only include what's **absolutely necessary** in the `try` block and define the exception(s) explicitly. Otherwise, you may end up masking very different exceptions caused by other bugs. ```diff - try: - value1 = get_some_value() - value2 = get_some_other_value() - score = external_library.compute_some_score(value1, value2) - except: - score = 0.0 + value1 = get_some_value() + value2 = get_some_other_value() + try: + score = external_library.compute_some_score(value1, value2) + except ValueError: + score = 0.0 ``` ### Avoid lambda functions `lambda` functions can be useful for defining simple anonymous functions in a single line, but they also introduce problems: for instance, they require [additional logic](https://stackoverflow.com/questions/25348532/can-python-pickle-lambda-functions) in order to be pickled and are pretty ugly to type-annotate. So we typically avoid them in the code base and only use them in the serialization handlers and within tests for simplicity. Instead of `lambda`s, check if your code can be refactored to not need them, or use helper functions instead. ```diff - split_string: Callable[[str], List[str]] = lambda value: [v.strip() for v in value.split(",")] + def split_string(value: str) -> List[str]: + return [v.strip() for v in value.split(",")] ``` ### Iteration and comprehensions We generally avoid using built-in functions like `filter` or `map` in favor of list or generator comprehensions. ```diff - filtered = filter(lambda x: x in ["foo", "bar"], values) + filtered = (x for x in values if x in ["foo", "bar"]) - filtered = list(filter(lambda x: x in ["foo", "bar"], values)) + filtered = [x for x in values if x in ["foo", "bar"]] - result = map(lambda x: { x: x in ["foo", "bar"]}, values) + result = ({x: x in ["foo", "bar"]} for x in values) - result = list(map(lambda x: { x: x in ["foo", "bar"]}, values)) + result = [{x: x in ["foo", "bar"]} for x in values] ``` If your logic is more complex, it's often better to write a loop instead, even if it adds more lines of code in total. The result will be much easier to follow and understand. ```diff - result = [{"key": key, "scores": {f"{i}": score for i, score in enumerate(scores)}} for key, scores in values] + result = [] + for key, scores in values: + scores_dict = {f"{i}": score for i, score in enumerate(scores)} + result.append({"key": key, "scores": scores_dict}) ``` ### Composition vs. inheritance Although spaCy uses a lot of classes, **inheritance is viewed with some suspicion** — it's seen as a mechanism of last resort. You should discuss plans to extend the class hierarchy before implementing. Unless you're implementing a new data structure or pipeline component, you typically shouldn't have to use classes at all. ### Don't use `print` The core library never `print`s anything. While we encourage using `print` statements for simple debugging (it's the most straightforward way of looking at what's happening), make sure to clean them up once you're ready to submit your pull request. If you want to output warnings or debugging information for users, use the respective dedicated mechanisms for this instead (see sections on warnings and logging for details). The only exceptions are the CLI functions, which pretty-print messages for the user, and methods that are explicitly intended for printing things, e.g. `Language.analyze_pipes` with `pretty=True` enabled. For this, we use our lightweight helper library [`wasabi`](https://github.com/ines/wasabi). ## Naming Naming is hard and often a topic of long internal discussions. We don't expect you to come up with the perfect names for everything you write – finding the right names is often an iterative and collaborative process. That said, we do try to follow some basic conventions. Consistent with general Python conventions, we use `CamelCase` for class names including dataclasses, `snake_case` for methods, functions and variables, and `UPPER_SNAKE_CASE` for constants, typically defined at the top of a module. We also avoid using variable names that shadow the names of built-in functions, e.g. `input`, `help` or `list`. ### Naming variables Variable names should always make it clear _what exactly_ the variable is and what it's used for. Instances of common classes should use the same consistent names. For example, you should avoid naming a text string (or anything else that's not a `Doc` object) `doc`. The most common class-to-variable mappings are: | Class | Variable | Example | | ---------- | --------------------- | ------------------------------------------- | | `Language` | `nlp` | `nlp = spacy.blank("en")` | | `Doc` | `doc` | `doc = nlp("Some text")` | | `Span` | `span`, `ent`, `sent` | `span = doc[1:4]`, `ent = doc.ents[0]` | | `Token` | `token` | `token = doc[0]` | | `Lexeme` | `lexeme`, `lex` | `lex = nlp.vocab["foo"]` | | `Vocab` | `vocab` | `vocab = Vocab()` | | `Example` | `example`, `eg` | `example = Example.from_dict(doc, gold)` | | `Config` | `config`, `cfg` | `config = Config().from_disk("config.cfg")` | We try to avoid introducing too many temporary variables, as these clutter your namespace. It's okay to re-assign to an existing variable, but only if the value has the same type. ```diff ents = get_a_list_of_entities() ents = [ent for ent in doc.ents if ent.label_ == "PERSON"] - ents = {(ent.start, ent.end): ent.label_ for ent in ents} + ent_mappings = {(ent.start, ent.end): ent.label_ for ent in ents} ``` ### Naming methods and functions Try choosing short and descriptive names wherever possible and imperative verbs for methods that do something, e.g. `disable_pipes`, `add_patterns` or `get_vector`. Private methods and functions that are not intended to be part of the user-facing API should be prefixed with an underscore `_`. It's often helpful to look at the existing classes for inspiration. Objects that can be serialized, e.g. data structures and pipeline components, should implement the same consistent methods for serialization. Those usually include at least `to_disk`, `from_disk`, `to_bytes` and `from_bytes`. Some objects can also implement more specific methods like `{to/from}_dict` or `{to/from}_str`. ## Error handling We always encourage writing helpful and detailed custom error messages for everything we can anticipate going wrong, and including as much detail as possible. spaCy provides a directory of error messages in `errors.py` with unique codes for each message. This allows us to keep the code base more concise and avoids long and nested blocks of texts throughout the code that disrupt the reading flow. The codes make it easy to find references to the same error in different places, and also helps identify problems reported by users (since we can just search for the error code). Errors can be referenced via their code, e.g. `Errors.E123`. Messages can also include placeholders for values, that can be populated by formatting the string with `.format()`. ```python class Errors: E123 = "Something went wrong" E456 = "Unexpected value: {value}" ``` ```diff if something_went_wrong: - raise ValueError("Something went wrong!") + raise ValueError(Errors.E123) if not isinstance(value, int): - raise ValueError(f"Unexpected value: {value}") + raise ValueError(Errors.E456.format(value=value)) ``` As a general rule of thumb, all error messages raised within the **core library** should be added to `Errors`. The only place where we write errors and messages as strings is `spacy.cli`, since these functions typically pretty-print and generate a lot of output that'd otherwise be very difficult to separate from the actual logic. ### Re-raising exceptions If we anticipate possible errors in third-party code that we don't control, or our own code in a very different context, we typically try to provide custom and more specific error messages if possible. If we need to re-raise an exception within a `try`/`except` block, we can re-raise a custom exception. [Re-raising `from`](https://docs.python.org/3/tutorial/errors.html#exception-chaining) the original caught exception lets us chain the exceptions, so the user sees both the original error, as well as the custom message with a note "The above exception was the direct cause of the following exception". ```diff try: run_third_party_code_that_might_fail() except ValueError as e: + raise ValueError(Errors.E123) from e ``` In some cases, it makes sense to suppress the original exception, e.g. if we know what it is and know that it's not particularly helpful. In that case, we can raise `from None`. This prevents clogging up the user's terminal with multiple and irrelevant chained exceptions. ```diff try: run_our_own_code_that_might_fail_confusingly() except ValueError: + raise ValueError(Errors.E123) from None ``` ### Avoid using naked `assert` During development, it can sometimes be helpful to add `assert` statements throughout your code to make sure that the values you're working with are what you expect. However, as you clean up your code, those should either be removed or replaced by more explicit error handling: ```diff - assert score >= 0.0 + if score < 0.0: + raise ValueError(Errors.789.format(score=score)) ``` Otherwise, the user will get to see a naked `AssertionError` with no further explanation, which is very unhelpful. Instead of adding an error message to `assert`, it's always better to `raise` more explicit errors for specific conditions. If you're checking for something that _has to be right_ and would otherwise be a bug in spaCy, you can express this in the error message: ```python E161 = ("Found an internal inconsistency when predicting entity links. " "This is likely a bug in spaCy, so feel free to open an issue: " "https://github.com/explosion/spaCy/issues") ``` ### Warnings Instead of raising an error, some parts of the code base can raise warnings to notify the user of a potential problem. This is done using Python's `warnings.warn` and the messages defined in `Warnings` in the `errors.py`. Whether or not warnings are shown can be controlled by the user, including custom filters for disabling specific warnings using a regular expression matching our internal codes, e.g. `W123`. ```diff - print("Warning: No examples provided for validation") + warnings.warn(Warnings.W123) ``` When adding warnings, make sure you're not calling `warnings.warn` repeatedly, e.g. in a loop, which will clog up the terminal output. Instead, you can collect the potential problems first and then raise a single warning. If the problem is critical, consider raising an error instead. ```diff + n_empty = 0 for spans in lots_of_annotations: if len(spans) == 0: - warnings.warn(Warnings.456) + n_empty += 1 + warnings.warn(Warnings.456.format(count=n_empty)) ``` ### Logging Log statements can be added via spaCy's `logger`, which uses Python's native `logging` module under the hood. We generally only use logging for debugging information that **the user may choose to see** in debugging mode or that's **relevant during training** but not at runtime. ```diff + logger.info("Set up nlp object from config") config = nlp.config.interpolate() ``` `spacy train` and similar CLI commands will enable all log statements of level `INFO` by default (which is not the case at runtime). This allows outputting specific information within certain parts of the core library during training, without having it shown at runtime. `DEBUG`-level logs are only shown if the user enables `--verbose` logging during training. They can be used to provide more specific and potentially more verbose details, especially in areas that can indicate bugs or problems, or to surface more details about what spaCy does under the hood. You should only use logging statements if absolutely necessary and important. ## Writing tests spaCy uses the [`pytest`](http://doc.pytest.org/) framework for testing. Tests for spaCy modules and classes live in their own directories of the same name and all test files should be prefixed with `test_`. Tests included in the core library only cover the code and do not depend on any trained pipelines. When implementing a new feature or fixing a bug, it's usually good to start by writing some tests that describe what _should_ happen. As you write your code, you can then keep running the relevant tests until all of them pass. ### Test suite structure When adding tests, make sure to use descriptive names and only test for one behavior at a time. Tests should be grouped into modules dedicated to the same type of functionality and some test modules are organized as directories of test files related to the same larger area of the library, e.g. `matcher` or `tokenizer`. Regression tests are tests that refer to bugs reported in specific issues. They should live in the relevant module of the test suite, named according to the issue number (e.g., `test_issue1234.py`), and [marked](https://docs.pytest.org/en/6.2.x/example/markers.html#working-with-custom-markers) appropriately (e.g. `@pytest.mark.issue(1234)`). This system allows us to relate tests for specific bugs back to the original reported issue, which is especially useful if we introduce a regression and a previously passing regression tests suddenly fails again. When fixing a bug, it's often useful to create a regression test for it first. The test suite also provides [fixtures](https://github.com/explosion/spaCy/blob/master/spacy/tests/conftest.py) for different language tokenizers that can be used as function arguments of the same name and will be passed in automatically. Those should only be used for tests related to those specific languages. We also have [test utility functions](https://github.com/explosion/spaCy/blob/master/spacy/tests/util.py) for common operations, like creating a temporary file. ### Constructing objects and state Test functions usually follow the same simple structure: they set up some state, perform the operation you want to test and `assert` conditions that you expect to be true, usually before and after the operation. Tests should focus on exactly what they're testing and avoid dependencies on other unrelated library functionality wherever possible. If all your test needs is a `Doc` object with certain annotations set, you should always construct it manually: ```python def test_doc_creation_with_pos(): doc = Doc(Vocab(), words=["hello", "world"], pos=["NOUN", "VERB"]) assert doc[0].pos_ == "NOUN" assert doc[1].pos_ == "VERB" ``` ### Parametrizing tests If you need to run the same test function over different input examples, you usually want to parametrize the test cases instead of using a loop within your test. This lets you keep a better separation between test cases and test logic, and it'll result in more useful output because `pytest` will be able to tell you which exact test case failed. The `@pytest.mark.parametrize` decorator takes two arguments: a string defining one or more comma-separated arguments that should be passed to the test function and a list of corresponding test cases (or a list of tuples to provide multiple arguments). ```python @pytest.mark.parametrize("words", [["hello", "world"], ["this", "is", "a", "test"]]) def test_doc_length(words): doc = Doc(Vocab(), words=words) assert len(doc) == len(words) ``` ```python @pytest.mark.parametrize("text,expected_len", [("hello world", 2), ("I can't!", 4)]) def test_token_length(en_tokenizer, text, expected_len): # en_tokenizer is a fixture doc = en_tokenizer(text) assert len(doc) == expected_len ``` You can also stack `@pytest.mark.parametrize` decorators, although this is not recommended unless it's absolutely needed or required for the test. When stacking decorators, keep in mind that this will run the test with all possible combinations of the respective parametrized values, which is often not what you want and can slow down the test suite. ### Handling failing tests `xfail` means that a test **should pass but currently fails**, i.e. is expected to fail. You can mark a test as currently xfailing by adding the `@pytest.mark.xfail` decorator. This should only be used for tests that don't yet work, not for logic that cause errors we raise on purpose (see the section on testing errors for this). It's often very helpful to implement tests for edge cases that we don't yet cover and mark them as `xfail`. You can also provide a `reason` keyword argument to the decorator with an explanation of why the test currently fails. ```diff + @pytest.mark.xfail(reason="Issue #225 - not yet implemented") def test_en_tokenizer_splits_em_dash_infix(en_tokenizer): doc = en_tokenizer("Will this road take me to Puddleton?\u2014No.") assert doc[8].text == "\u2014" ``` When you run the test suite, you may come across tests that are reported as `xpass`. This means that they're marked as `xfail` but didn't actually fail. This is worth looking into: sometimes, it can mean that we have since fixed a bug that caused the test to previously fail, so we can remove the decorator. In other cases, especially when it comes to machine learning model implementations, it can also indicate that the **test is flaky**: it sometimes passes and sometimes fails. This can be caused by a bug, or by constraints being too narrowly defined. If a test shows different behavior depending on whether its run in isolation or not, this can indicate that it reacts to global state set in a previous test, which is unideal and should be avoided. ### Writing slow tests If a test is useful but potentially quite slow, you can mark it with the `@pytest.mark.slow` decorator. This is a special marker we introduced and tests decorated with it only run if you run the test suite with `--slow`, but not as part of the main CI process. Before introducing a slow test, double-check that there isn't another and more efficient way to test for the behavior. You should also consider adding a simpler test with maybe only a subset of the test cases that can always run, so we at least have some coverage. ### Skipping tests The `@pytest.mark.skip` decorator lets you skip tests entirely. You only want to do this for failing tests that may be slow to run or cause memory errors or segfaults, which would otherwise terminate the entire process and wouldn't be caught by `xfail`. We also sometimes use the `skip` decorator for old and outdated regression tests that we want to keep around but that don't apply anymore. When using the `skip` decorator, make sure to provide the `reason` keyword argument with a quick explanation of why you chose to skip this test. ### Testing errors and warnings `pytest` lets you check whether a given error is raised by using the `pytest.raises` contextmanager. This is very useful when implementing custom error handling, so make sure you're not only testing for the correct behavior but also for errors resulting from incorrect inputs. If you're testing errors, you should always check for `pytest.raises` explicitly and not use `xfail`. ```python words = ["a", "b", "c", "d", "e"] ents = ["Q-PERSON", "I-PERSON", "O", "I-PERSON", "I-GPE"] with pytest.raises(ValueError): Doc(Vocab(), words=words, ents=ents) ``` You can also use the `pytest.warns` contextmanager to check that a given warning type is raised. The first argument is the warning type or `None` (which will capture a list of warnings that you can `assert` is empty). ```python def test_phrase_matcher_validation(en_vocab): doc1 = Doc(en_vocab, words=["Test"], deps=["ROOT"]) doc2 = Doc(en_vocab, words=["Test"]) matcher = PhraseMatcher(en_vocab, validate=True) with pytest.warns(UserWarning): # Warn about unnecessarily parsed document matcher.add("TEST1", [doc1]) with pytest.warns(None) as record: matcher.add("TEST2", [docs]) assert not record.list ``` Keep in mind that your tests will fail if you're using the `pytest.warns` contextmanager with a given warning and the warning is _not_ shown. So you should only use it to check that spaCy handles and outputs warnings correctly. If your test outputs a warning that's expected but not relevant to what you're testing, you can use the `@pytest.mark.filterwarnings` decorator and ignore specific warnings starting with a given code: ```python @pytest.mark.filterwarnings("ignore:\\[W036") def test_matcher_empty(en_vocab): matcher = Matcher(en_vocab) matcher(Doc(en_vocab, words=["test"])) ``` ### Testing trained pipelines Our regular test suite does not depend on any of the trained pipelines, since their outputs can vary and aren't generally required to test the library functionality. We test pipelines separately using the tests included in the [`spacy-models`](https://github.com/explosion/spacy-models) repository, which run whenever we train a new suite of models. The tests here mostly focus on making sure that the packages can be loaded and that the predictions seam reasonable, and they include checks for common bugs we encountered previously. If your test does not primarily focus on verifying a model's predictions, it should be part of the core library tests and construct the required objects manually, instead of being added to the models tests. Keep in mind that specific predictions may change, and we can't test for all incorrect predictions reported by users. Different models make different mistakes, so even a model that's significantly more accurate overall may end up making wrong predictions that it previously didn't. However, some surprising incorrect predictions may indicate deeper bugs that we definitely want to investigate.