diff --git a/.github/contributors/merrcury.md b/.github/contributors/merrcury.md
new file mode 100644
index 000000000..056a790eb
--- /dev/null
+++ b/.github/contributors/merrcury.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+ * you hereby assign to us joint ownership, and to the extent that such
+ assignment is or becomes invalid, ineffective or unenforceable, you hereby
+ grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+ royalty-free, unrestricted license to exercise all rights under those
+ copyrights. This includes, at our option, the right to sublicense these same
+ rights to third parties through multiple levels of sublicensees or other
+ licensing arrangements;
+
+ * you agree that each of us can do all things in relation to your
+ contribution as if each of us were the sole owners, and if one of us makes
+ a derivative work of your contribution, the one who makes the derivative
+ work (or has it made) will be the sole owner of that derivative work;
+
+ * you agree that you will not assert any moral rights in your contribution
+ against us, our licensees or transferees;
+
+ * you agree that we may register a copyright in your contribution and
+ exercise all ownership rights associated with it; and
+
+ * you agree that neither of us has any duty to consult with, obtain the
+ consent of, pay or render an accounting to the other for any use or
+ distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+ * make, have made, use, sell, offer to sell, import, and otherwise transfer
+ your contribution in whole or in part, alone or in combination with or
+ included in any product, work or materials arising out of the project to
+ which your contribution was submitted, and
+
+ * at our option, to sublicense these same rights to third parties through
+ multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+ * Each contribution that you submit is and shall be an original work of
+ authorship and you can legally grant the rights set out in this SCA;
+
+ * to the best of your knowledge, each contribution will not violate any
+ third party's copyrights, trademarks, patents, or other intellectual
+ property rights; and
+
+ * each contribution shall be in compliance with U.S. export control laws and
+ other applicable export and import laws. You agree to notify us if you
+ become aware of any circumstance which would make any of the foregoing
+ representations inaccurate in any respect. We may publicly disclose your
+ participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+ * [X] I am signing on behalf of myself as an individual and no other person
+ or entity, including my employer, has or will have rights with respect to my
+ contributions.
+
+ * [ ] I am signing on behalf of my employer or a legal entity and I have the
+ actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field | Entry |
+|------------------------------- | -------------------- |
+| Name | Himanshu Garg |
+| Company name (if applicable) | |
+| Title or role (if applicable) | |
+| Date | 2020-03-10 |
+| GitHub username | merrcury |
+| Website (optional) | |
diff --git a/.github/contributors/pinealan.md b/.github/contributors/pinealan.md
new file mode 100644
index 000000000..699b405e2
--- /dev/null
+++ b/.github/contributors/pinealan.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+ * you hereby assign to us joint ownership, and to the extent that such
+ assignment is or becomes invalid, ineffective or unenforceable, you hereby
+ grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+ royalty-free, unrestricted license to exercise all rights under those
+ copyrights. This includes, at our option, the right to sublicense these same
+ rights to third parties through multiple levels of sublicensees or other
+ licensing arrangements;
+
+ * you agree that each of us can do all things in relation to your
+ contribution as if each of us were the sole owners, and if one of us makes
+ a derivative work of your contribution, the one who makes the derivative
+ work (or has it made) will be the sole owner of that derivative work;
+
+ * you agree that you will not assert any moral rights in your contribution
+ against us, our licensees or transferees;
+
+ * you agree that we may register a copyright in your contribution and
+ exercise all ownership rights associated with it; and
+
+ * you agree that neither of us has any duty to consult with, obtain the
+ consent of, pay or render an accounting to the other for any use or
+ distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+ * make, have made, use, sell, offer to sell, import, and otherwise transfer
+ your contribution in whole or in part, alone or in combination with or
+ included in any product, work or materials arising out of the project to
+ which your contribution was submitted, and
+
+ * at our option, to sublicense these same rights to third parties through
+ multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+ * Each contribution that you submit is and shall be an original work of
+ authorship and you can legally grant the rights set out in this SCA;
+
+ * to the best of your knowledge, each contribution will not violate any
+ third party's copyrights, trademarks, patents, or other intellectual
+ property rights; and
+
+ * each contribution shall be in compliance with U.S. export control laws and
+ other applicable export and import laws. You agree to notify us if you
+ become aware of any circumstance which would make any of the foregoing
+ representations inaccurate in any respect. We may publicly disclose your
+ participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+ * [x] I am signing on behalf of myself as an individual and no other person
+ or entity, including my employer, has or will have rights with respect to my
+ contributions.
+
+ * [ ] I am signing on behalf of my employer or a legal entity and I have the
+ actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field | Entry |
+|------------------------------- | -------------------- |
+| Name | Alan Chan |
+| Company name (if applicable) | |
+| Title or role (if applicable) | |
+| Date | 2020-03-15 |
+| GitHub username | pinealan |
+| Website (optional) | http://pinealan.xyz |
diff --git a/.github/contributors/sloev.md b/.github/contributors/sloev.md
new file mode 100644
index 000000000..d151d4606
--- /dev/null
+++ b/.github/contributors/sloev.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+ * you hereby assign to us joint ownership, and to the extent that such
+ assignment is or becomes invalid, ineffective or unenforceable, you hereby
+ grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+ royalty-free, unrestricted license to exercise all rights under those
+ copyrights. This includes, at our option, the right to sublicense these same
+ rights to third parties through multiple levels of sublicensees or other
+ licensing arrangements;
+
+ * you agree that each of us can do all things in relation to your
+ contribution as if each of us were the sole owners, and if one of us makes
+ a derivative work of your contribution, the one who makes the derivative
+ work (or has it made) will be the sole owner of that derivative work;
+
+ * you agree that you will not assert any moral rights in your contribution
+ against us, our licensees or transferees;
+
+ * you agree that we may register a copyright in your contribution and
+ exercise all ownership rights associated with it; and
+
+ * you agree that neither of us has any duty to consult with, obtain the
+ consent of, pay or render an accounting to the other for any use or
+ distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+ * make, have made, use, sell, offer to sell, import, and otherwise transfer
+ your contribution in whole or in part, alone or in combination with or
+ included in any product, work or materials arising out of the project to
+ which your contribution was submitted, and
+
+ * at our option, to sublicense these same rights to third parties through
+ multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+ * Each contribution that you submit is and shall be an original work of
+ authorship and you can legally grant the rights set out in this SCA;
+
+ * to the best of your knowledge, each contribution will not violate any
+ third party's copyrights, trademarks, patents, or other intellectual
+ property rights; and
+
+ * each contribution shall be in compliance with U.S. export control laws and
+ other applicable export and import laws. You agree to notify us if you
+ become aware of any circumstance which would make any of the foregoing
+ representations inaccurate in any respect. We may publicly disclose your
+ participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+ * [x] I am signing on behalf of myself as an individual and no other person
+ or entity, including my employer, has or will have rights with respect to my
+ contributions.
+
+ * [ ] I am signing on behalf of my employer or a legal entity and I have the
+ actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field | Entry |
+|------------------------------- | ------------------------ |
+| Name | Johannes Valbjørn |
+| Company name (if applicable) | |
+| Title or role (if applicable) | |
+| Date | 2020-03-13 |
+| GitHub username | sloev |
+| Website (optional) | https://sloev.github.io |
diff --git a/LICENSE b/LICENSE
index 11221f687..87b814ce4 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,6 +1,6 @@
The MIT License (MIT)
-Copyright (C) 2016-2019 ExplosionAI GmbH, 2016 spaCy GmbH, 2015 Matthew Honnibal
+Copyright (C) 2016-2020 ExplosionAI GmbH, 2016 spaCy GmbH, 2015 Matthew Honnibal
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
diff --git a/spacy/about.py b/spacy/about.py
index 365c2adbb..84dc86aa8 100644
--- a/spacy/about.py
+++ b/spacy/about.py
@@ -1,6 +1,6 @@
# fmt: off
__title__ = "spacy"
-__version__ = "2.2.4.dev0"
+__version__ = "2.2.4"
__release__ = True
__download_url__ = "https://github.com/explosion/spacy-models/releases/download"
__compatibility__ = "https://raw.githubusercontent.com/explosion/spacy-models/master/compatibility.json"
diff --git a/spacy/tests/util.py b/spacy/tests/util.py
index 52768dd41..a0d6273a9 100644
--- a/spacy/tests/util.py
+++ b/spacy/tests/util.py
@@ -116,8 +116,7 @@ def assert_docs_equal(doc1, doc2):
assert [t.head.i for t in doc1] == [t.head.i for t in doc2]
assert [t.dep for t in doc1] == [t.dep for t in doc2]
- if doc1.is_parsed and doc2.is_parsed:
- assert [s for s in doc1.sents] == [s for s in doc2.sents]
+ assert [t.is_sent_start for t in doc1] == [t.is_sent_start for t in doc2]
assert [t.ent_type for t in doc1] == [t.ent_type for t in doc2]
assert [t.ent_iob for t in doc1] == [t.ent_iob for t in doc2]
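Note on the `assert_docs_equal` change above: the old check compared sentence spans only when both docs were parsed, while the new one compares the per-token `is_sent_start` flags unconditionally. A minimal sketch of that comparison, using a hypothetical `StubToken` class in place of spaCy's `Token` (the stub and the helper name `sent_starts_match` are assumptions for illustration, not part of this change):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StubToken:
    """Hypothetical stand-in for a spaCy Token (illustration only)."""
    text: str
    is_sent_start: Optional[bool] = None  # None: boundary unknown


def sent_starts_match(doc1: List[StubToken], doc2: List[StubToken]) -> bool:
    # Mirrors the new assertion: compare the per-token flags directly,
    # without requiring either document to have been parsed.
    return [t.is_sent_start for t in doc1] == [t.is_sent_start for t in doc2]


same_a = [StubToken("Hello", True), StubToken("world", False), StubToken("Bye", True)]
same_b = [StubToken("Hello", True), StubToken("world", False), StubToken("Bye", True)]
differs = [StubToken("Hello", True), StubToken("world", False), StubToken("Bye", False)]
unset_a = [StubToken("Hello"), StubToken("world")]
unset_b = [StubToken("Hello"), StubToken("world")]
```

Under this sketch, two docs with no sentence boundaries set still compare equal, because their flag lists are identical.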
diff --git a/spacy/tokens/doc.pyx b/spacy/tokens/doc.pyx
index 0c90929c3..ec0cd66b8 100644
--- a/spacy/tokens/doc.pyx
+++ b/spacy/tokens/doc.pyx
@@ -260,7 +260,7 @@ cdef class Doc:
def is_nered(self):
"""Check if the document has named entities set. Will return True if
*any* of the tokens has a named entity tag set (even if the others are
- unknown values).
+ unknown values), or if the document is empty.
"""
if len(self) == 0:
return True
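Note on the `is_nered` docstring above: the documented behavior can be sketched in plain Python as follows. This is an illustrative stand-in, not the Cython implementation; the function name `is_nered_sketch` is hypothetical, and it assumes the `ent_iob` convention in which code 0 means no entity tag is set.

```python
from typing import Sequence

NO_ENT_TAG = 0  # assumption: ent_iob code 0 means "no entity tag set"


def is_nered_sketch(ent_iob_codes: Sequence[int]) -> bool:
    # An empty document is treated as "nered".
    if len(ent_iob_codes) == 0:
        return True
    # Otherwise a single token with any tag set is enough,
    # even if the remaining tokens are still unknown.
    return any(code != NO_ENT_TAG for code in ent_iob_codes)
```

Under this reading, a document whose tokens are all tagged as outside an entity still counts as "nered", since an "outside" tag is a set value rather than an unknown one.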
diff --git a/website/docs/api/cli.md b/website/docs/api/cli.md
index e47695efb..f067ba5a7 100644
--- a/website/docs/api/cli.md
+++ b/website/docs/api/cli.md
@@ -109,9 +109,9 @@ links) and check whether they are compatible with the currently installed
version of spaCy. Should be run after upgrading spaCy via `pip install -U spacy`
to ensure that all installed models are can be used with the new version. The
command is also useful to detect out-of-sync model links resulting from links
-created in different virtual environments. It will a list of models, the
-installed versions, the latest compatible version (if out of date) and the
-commands for updating.
+created in different virtual environments. It will show a list of models and
+their installed versions. If any model is out of date, the latest compatible
+version and the command for updating it are shown.
> #### Automated validation
>
@@ -176,7 +176,7 @@ All output files generated by this command are compatible with
## Debug data {#debug-data new="2.2"}
-Analyze, debug and validate your training and development data, get useful
+Analyze, debug, and validate your training and development data. Get useful
stats, and find problems like invalid entity annotations, cyclic dependencies,
low data labels and more.
diff --git a/website/docs/api/doc.md b/website/docs/api/doc.md
index 87b854a8c..ab85c1deb 100644
--- a/website/docs/api/doc.md
+++ b/website/docs/api/doc.md
@@ -657,10 +657,10 @@ The L2 norm of the document's vector representation.
| `user_data` | - | A generic storage area, for user custom data. |
| `lang` 2.1 | int | Language of the document's vocabulary. |
| `lang_` 2.1 | unicode | Language of the document's vocabulary. |
-| `is_tagged` | bool | A flag indicating that the document has been part-of-speech tagged. |
-| `is_parsed` | bool | A flag indicating that the document has been syntactically parsed. |
-| `is_sentenced` | bool | A flag indicating that sentence boundaries have been applied to the document. |
-| `is_nered` 2.1 | bool | A flag indicating that named entities have been set. Will return `True` if _any_ of the tokens has an entity tag set, even if the others are unknown. |
+| `is_tagged` | bool | A flag indicating that the document has been part-of-speech tagged. Returns `True` if the `Doc` is empty. |
+| `is_parsed` | bool | A flag indicating that the document has been syntactically parsed. Returns `True` if the `Doc` is empty. |
+| `is_sentenced` | bool | A flag indicating that sentence boundaries have been applied to the document. Returns `True` if the `Doc` is empty. |
+| `is_nered` 2.1 | bool | A flag indicating that named entities have been set. Will return `True` if the `Doc` is empty, or if _any_ of the tokens has an entity tag set, even if the others are unknown. |
| `sentiment` | float | The document's positivity/negativity score, if available. |
| `user_hooks` | dict | A dictionary that allows customization of the `Doc`'s properties. |
| `user_token_hooks` | dict | A dictionary that allows customization of properties of `Token` children. |
diff --git a/website/docs/usage/adding-languages.md b/website/docs/usage/adding-languages.md
index 4b12c6be1..70411ec0b 100644
--- a/website/docs/usage/adding-languages.md
+++ b/website/docs/usage/adding-languages.md
@@ -622,13 +622,13 @@ categorizer is to use the [`spacy train`](/api/cli#train) command-line utility.
In order to use this, you'll need training and evaluation data in the
[JSON format](/api/annotation#json-input) spaCy expects for training.
-You can now train the model using a corpus for your language annotated with If
-your data is in one of the supported formats, the easiest solution might be to
-use the [`spacy convert`](/api/cli#convert) command-line utility. This supports
-several popular formats, including the IOB format for named entity recognition,
-the JSONL format produced by our annotation tool [Prodigy](https://prodi.gy),
-and the [CoNLL-U](http://universaldependencies.org/docs/format.html) format used
-by the [Universal Dependencies](http://universaldependencies.org/) corpus.
+If your data is in one of the supported formats, the easiest solution might be
+to use the [`spacy convert`](/api/cli#convert) command-line utility. This
+supports several popular formats, including the IOB format for named entity
+recognition, the JSONL format produced by our annotation tool
+[Prodigy](https://prodi.gy), and the
+[CoNLL-U](http://universaldependencies.org/docs/format.html) format used by the
+[Universal Dependencies](http://universaldependencies.org/) corpus.
One thing to keep in mind is that spaCy expects to train its models from **whole
documents**, not just single sentences. If your corpus only contains single
diff --git a/website/docs/usage/rule-based-matching.md b/website/docs/usage/rule-based-matching.md
index f8866aec1..0ab74034e 100644
--- a/website/docs/usage/rule-based-matching.md
+++ b/website/docs/usage/rule-based-matching.md
@@ -1119,7 +1119,7 @@ entityruler = EntityRuler(nlp)
patterns = [{"label": "TEST", "pattern": str(i)} for i in range(100000)]
other_pipes = [p for p in nlp.pipe_names if p != "tagger"]
-with nlp.disable_pipes(*disable_pipes):
+with nlp.disable_pipes(*other_pipes):
entityruler.add_patterns(patterns)
```
diff --git a/website/docs/usage/saving-loading.md b/website/docs/usage/saving-loading.md
index 70983198f..8e2c30d82 100644
--- a/website/docs/usage/saving-loading.md
+++ b/website/docs/usage/saving-loading.md
@@ -94,7 +94,7 @@ docs = list(doc_bin.get_docs(nlp.vocab))
If `store_user_data` is set to `True`, the `Doc.user_data` will be serialized as
well, which includes the values of
-[extension attributes](/processing-pipelines#custom-components-attributes) (if
+[extension attributes](/usage/processing-pipelines#custom-components-attributes) (if
they're serializable with msgpack).
diff --git a/website/meta/universe.json b/website/meta/universe.json
index 0ff622521..91361e234 100644
--- a/website/meta/universe.json
+++ b/website/meta/universe.json
@@ -1965,6 +1965,41 @@
},
"category": ["pipeline"],
"tags": ["phrase extraction", "ner", "summarization", "graph algorithms", "textrank"]
+ },
+ {
+ "id": "spacy_syllables",
+ "title": "Spacy Syllables",
+ "slogan": "Multilingual syllable annotations",
+ "description": "Spacy Syllables is a pipeline component that adds multilingual syllable annotations to Tokens. It uses Pyphen under the hood and has support for a long list of languages.",
+ "github": "sloev/spacy-syllables",
+ "pip": "spacy_syllables",
+ "code_example": [
+ "import spacy",
+ "from spacy_syllables import SpacySyllables",
+ "",
+ "nlp = spacy.load('en_core_web_sm')",
+ "syllables = SpacySyllables(nlp)",
+ "nlp.add_pipe(syllables, after='tagger')",
+ "",
+ "doc = nlp('terribly long')",
+ "",
+ "data = [",
+ " (token.text, token._.syllables, token._.syllables_count)",
+ " for token in doc",
+ "]",
+ "",
+ "assert data == [",
+ " ('terribly', ['ter', 'ri', 'bly'], 3),",
+ " ('long', ['long'], 1)",
+ "]"
+ ],
+ "thumb": "https://raw.githubusercontent.com/sloev/spacy-syllables/master/logo.png",
+ "author": "Johannes Valbjørn",
+ "author_links": {
+ "github": "sloev"
+ },
+ "category": ["pipeline"],
+ "tags": ["syllables", "multilingual"]
}
],