Commit Graph

15522 Commits

Author SHA1 Message Date
Peter Baumgartner
e7b498fb1f remove cuda116 extra from install widget (#11012) 2022-06-23 08:15:45 +02:00
jademlc
a2840ecd61 Update serialization methods code block (#11004)
* Update serialization methods code block

* Update website/docs/usage/saving-loading.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-22 20:45:53 +02:00
Sofie Van Landeghem
e2652f0fc0
the 'new' indicator wants a 'number' (#10997) 2022-06-21 22:00:07 +02:00
Lucaterre
706fa6094a correct typo in universe.json for 'code_example' key : pipe name 'entityfishing' 2022-06-21 21:13:22 +02:00
Lucaterre
a60036afd8 updated spacy universe for spacyfishing 2022-06-21 21:13:11 +02:00
Victoria
4520caed94 Update linguistic-features.md (#10993)
Change link for downloading fasttext word vectors
2022-06-21 15:04:33 +09:00
Gor Arakelyan
1aad20bfb0 Add "Aim-spaCy" to spaCy Universe (#10943)
* Add Aim-spaCy to spaCy universe

* Update Aim thumbnail

* Fix author links

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-06-10 18:33:54 +09:00
Paul O'Leary McCann
7c6aa72c0a Add note about multiple patterns (#10826)
* Add note about multiple patterns

* Move note to the top of method docs

* Remove EntityRuler note
2022-06-08 16:24:56 +02:00
Sofie Van Landeghem
c360953edd Fix version in SpanRuler docs (#10925)
* SpanRuler is new since 3.3.1

* update SpanRuler version since 3.3.1
2022-06-08 16:13:24 +02:00
Adriane Boyd
cc0de154de Set version to v3.3.1 (#10901) 2022-06-07 19:04:21 +02:00
Madeesh Kannan
d0899b9fbd Avoid pickling Doc inputs passed to Language.pipe() (#10864)
* `Language.pipe()`: Serialize `Doc` objects to bytes when using multiprocessing to avoid pickling overhead

* `Doc.to_dict()`: Serialize `_context` attribute (keeping in line with `(un)pickle_doc()`

* Correct type annotations

* Fix typo

* `Doc`: Do not serialize `_context`

* `Language.pipe`: Send context objects to child processes, Simplify `as_tuples` handling

* Fix type annotation

* `Language.pipe`: Simplify `as_tuple` multiprocessor handling

* Cleanup code, fix typos

* MyPy fixes

* Move doc preparation function into `_multiprocessing_pipe`
Whitespace changes

* Remove superfluous comma

* Rename `prepare_doc` to `prepare_input`

* Update spacy/errors.py

* Undo renaming for error

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-07 19:04:21 +02:00
Adriane Boyd
00e9bfdf3d Extend typing_extensions to <4.2.0 (#10899) 2022-06-07 19:04:21 +02:00
single-fingal
cee9437f06 Fix: De/Serialize SpanGroups including the SpanGroup keys (#10707)
* fix: De/Serialize `SpanGroups` including the SpanGroup keys

This prevents the loss of `SpanGroup`s that have the same .name as other `SpanGroup`s within the same `SpanGroups` object (upon de/serialization of the `SpanGroups`).

Fixes #10685

* Maintain backwards compatibility for serialized `SpanGroups`

(serialized as: a list of `SpanGroup`s, or b'')

* Add tests for `SpanGroups` deserialization backwards-compatibility

* Move a `SpanGroups` de/serialization test (test_issue10685)
  to tests/serialize/test_serialize_spangroups.py

* Output a warning if deserializing a `SpanGroups` with duplicate .name-d `SpanGroup`s

* Minor refactor

* `SpanGroups.from_bytes` handles only `list` and `dict` types with
`dict` as the expected default
* For lists, keep first rather than last value encountered
* Update error message
* Rename and update tests

* Update to preserve list serialization of SpanGroups

To avoid breaking compatibility of serialized `Doc` and `DocBin` with
earlier versions of spacy v3, revert back to a list-only serialization,
but update the names just for serialization so that the SpanGroups keys
override the SpanGroup names.

* Preserve object identity and current key overwrite

* Preserve SpanGroup object identity
* Preserve last rather than first span group from SpanGroup list
  format without SpanGroups keys

* Update inline comments

* Fix types

* Add type info for SpanGroup.copy

* Deserialize `SpanGroup`s as copies

when a single SpanGroup is the value for more than 1 `SpanGroups` key.
This is because we serialize `SpanGroups` as dicts (to maintain backward-
and forward-compatibility) and we can't assume `SpanGroup`s with the same
bytes/serialization were the same (identical) object, pre-serialization.

* Update spacy/tokens/_dict_proxies.py

* Add more SpanGroups serialization tests

Test that serialized SpanGroups maintain their Span order

* small clarification on older spaCy version

* Update spacy/tests/serialize/test_serialize_span_groups.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-06-07 19:04:21 +02:00
Adriane Boyd
78744362d1 Fix schemas import in Doc (#10898) 2022-06-07 19:04:21 +02:00
Raphael Mitsch
eed3f3ed66 Add Doc.from_json() (#10688)
* Implement Doc.from_json: rough draft.

* Implement Doc.from_json: first draft with tests.

* Implement Doc.from_json: added documentation on website for Doc.to_json(), Doc.from_json().

* Implement Doc.from_json: formatting changes.

* Implement Doc.to_json(): reverting unrelated formatting changes.

* Implement Doc.to_json(): fixing entity and span conversion. Moving fixture and doc <-> json conversion tests into single file.

* Implement Doc.from_json(): replaced entity/span converters with doc.char_span() calls.

* Implement Doc.from_json(): handling sentence boundaries in spans.

* Implementing Doc.from_json(): added parser-free sentence boundaries transfer.

* Implementing Doc.from_json(): added parser-free sentence boundaries transfer.

* Implementing Doc.from_json(): incorporated various PR feedback.

* Renaming fixture for document without dependencies.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implementing Doc.from_json(): using two sent_starts instead of one.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implementing Doc.from_json(): doc_without_dependency_parser() -> doc_without_deps.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implementing Doc.from_json(): incorporating various PR feedback. Rebased on latest master.

* Implementing Doc.from_json(): refactored Doc.from_json() to work with annotation IDs instead of their string representations.

* Implement Doc.from_json(): reverting unwanted formatting/rebasing changes.

* Implement Doc.from_json(): added check for char_span() calculation for entities.

* Update spacy/tokens/doc.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): minor refactoring, additional check for token attribute consistency with corresponding test.

* Implement Doc.from_json(): removed redundancy in annotation type key naming.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): Simplifying setting annotation values.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement doc.from_json(): renaming annot_types to token_attrs.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): adjustments for renaming of annot_types to token_attrs.

* Implement Doc.from_json(): removing default categories.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): simplifying lexeme initialization.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): simplifying lexeme initialization.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): refactoring to only have keys for present annotations.

* Implement Doc.from_json(): fix check for tokens' HEAD attributes.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): refactoring Doc.from_json().

* Implement Doc.from_json(): fixing span_group retrieval.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): fixing span retrieval.

* Implement Doc.from_json(): added schema for Doc JSON format. Minor refactoring in Doc.from_json().

* Implement Doc.from_json(): added comment regarding Token and Span extension support.

* Implement Doc.from_json(): renaming inconsistent_props to partial_attrs..

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): adjusting error message.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): extending E1038 message.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): added params to E1038 raises.

* Implement Doc.from_json(): combined attribute collection with partial attributes check.

* Implement Doc.from_json(): added optional schema validation.

* Implement Doc.from_json(): fixed optional fields in schema, tests.

* Implement Doc.from_json(): removed redundant None check for DEP.

* Implement Doc.from_json(): added passing of schema validatoin message to E1037..

* Implement Doc.from_json(): removing redundant error E1040.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): changing message for E1037.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): adjusted website docs and docstring of Doc.from_json().

* Update spacy/tests/doc/test_json_doc_conversion.py

* Implement Doc.from_json(): docstring update.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): docstring update.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): website docs update.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): docstring formatting.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): docstring formatting.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): fixing Doc reference in website docs.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): reformatted website/docs/api/doc.md.

* Implement Doc.from_json(): bumped IDs of new errors to avoid merge conflicts.

* Implement Doc.from_json(): fixing bug in tests.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): fix setting of sentence starts for docs without DEP.

* Implement Doc.from_json(): add check for valid char spans when manually setting sentence boundaries. Refactor sentence boundary setting slightly. Move error message for lack of support for partial token annotations to errors.py.

* Implement Doc.from_json(): simplify token sentence start manipulation.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Combine related error messages

* Update spacy/tests/doc/test_json_doc_conversion.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-07 19:04:21 +02:00
Adriane Boyd
3d2a6dbf33 Add SpanRuler component (#9880)
* Add SpanRuler component

Add a `SpanRuler` component similar to `EntityRuler` that saves a list
of matched spans to `Doc.spans[spans_key]`. The matches from the token
and phrase matchers are deduplicated and sorted before assignment but
are not otherwise filtered.

* Update spacy/pipeline/span_ruler.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix cast

* Add self.key property

* Use number of patterns as length

* Remove patterns kwarg from init

* Update spacy/tests/pipeline/test_span_ruler.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Add options for spans filter and setting to ents

* Add `spans_filter` option as a registered function'
* Make `spans_key` optional and if `None`, set to `doc.ents` instead of
`doc.spans[spans_key]`.

* Update and generalize tests

* Add test for setting doc.ents, fix key property type

* Fix typing

* Allow independent doc.spans and doc.ents

* If `spans_key` is set, set `doc.spans` with `spans_filter`.
* If `annotate_ents` is set, set `doc.ents` with `ents_fitler`.
  * Use `util.filter_spans` by default as `ents_filter`.
  * Use a custom warning if the filter does not work for `doc.ents`.

* Enable use of SpanC.id in Span

* Support id in SpanRuler as Span.id

* Update types

* `id` can only be provided as string (already by `PatternType`
definition)

* Update all uses of Span.id/ent_id in Doc

* Rename Span id kwarg to span_id

* Update types and docs

* Add ents filter to mimic EntityRuler overwrite_ents

* Refactor `ents_filter` to take `entities, spans` args for more
  filtering options
* Give registered filters more descriptive names
* Allow registered `filter_spans` filter
  (`spacy.first_longest_spans_filter.v1`) to take any number of
  `Iterable[Span]` objects as args so it can be used for spans filter
  or ents filter

* Implement future entity ruler as span ruler

Implement a compatible `entity_ruler` as `future_entity_ruler` using
`SpanRuler` as the underlying component:
* Add `sort_key` and `sort_reverse` to allow the sorting behavior to be
  customized. (Necessary for the same sorting/filtering as in
  `EntityRuler`.)
* Implement `overwrite_overlapping_ents_filter` and
  `preserve_existing_ents_filter` to support
  `EntityRuler.overwrite_ents` settings.
* Add `remove_by_id` to support `EntityRuler.remove` functionality.
* Refactor `entity_ruler` tests to parametrize all tests to test both
  `entity_ruler` and `future_entity_ruler`
* Implement `SpanRuler.token_patterns` and `SpanRuler.phrase_patterns`
  properties.

Additional changes:

* Move all config settings to top-level attributes to avoid duplicating
  settings in the config vs. `span_ruler/cfg`. (Also avoids a lot of
  casting.)

* Format

* Fix filter make method name

* Refactor to use same error for removing by label or ID

* Also provide existing spans to spans filter

* Support ids property

* Remove token_patterns and phrase_patterns

* Update docstrings

* Add span ruler docs

* Fix types

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Move sorting into filters

* Check for all tokens in seen tokens in entity ruler filters

* Remove registered sort key

* Set Token.ent_id in a backwards-compatible way in Doc.set_ents

* Remove sort options from API docs

* Update docstrings

* Rename entity ruler filters

* Fix and parameterize scoring

* Add id to Span API docs

* Fix typo in API docs

* Include explicit labeled=True for scorer

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-06-07 19:04:21 +02:00
Sofie Van Landeghem
73418c88c5 fix typo + CI slow testing (#10835)
* fix typo

* one more typo
2022-06-07 19:04:21 +02:00
Madeesh Kannan
d4f1e780b9 Add test_slow_gpu explosion-bot command (#10858) 2022-06-07 19:04:21 +02:00
github-actions[bot]
d2536cfa70 Auto-format code with black (#10857)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-06-07 19:04:21 +02:00
kadarakos
fa1a065b48 Better errors for has_annotation and Matcher (#10830)
* Show input argument instead of None

* catch invalid attr early

* moved error message from code to errors.py

* Update spacy/errors.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/errors.py

* update E153 and E154

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-07 19:04:21 +02:00
Paul O'Leary McCann
cef69b360a Fix Entity Linker with tokenization mismatches (fix #9575) (#10457)
* Add failing test

* Partial fix for issue

This kind of works. The issue with token length mismatches is gone. The
problem is that when you get empty lists of encodings to compare, it
fails because the sizes are not the same, even though they're both zero:
(0, 3) vs (0,). Not sure why that happens...

* Short circuit on empties

* Remove spurious check

The check here isn't needed now the the short circuit is fixed.

* Update spacy/tests/pipeline/test_entity_linker.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Use "eg", not "example"

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-06-07 19:04:21 +02:00
Lj Miranda
e9eb601005 Add spacy-span-analyzer to debug data (#10668)
* Rename to spans_key for consistency

* Implement spans length in debug data

* Implement how span bounds and spans are obtained

In this commit, I implemented how span boundaries (the tokens) around a
given span and spans are obtained. I've put them in the compile_gold()
function so that it's accessible later on. I will do the actual
computation of the span and boundary distinctiveness in the main
function above.

* Compute for p_spans and p_bounds

* Add computation for SD and BD

* Fix mypy issues

* Add weighted average computation

* Fix compile_gold conditional logic

* Add test for frequency distribution computation

* Add tests for kl-divergence computation

* Fix weighted average computation

* Make tables more compact by rounding them

* Add more descriptive checks for spans

* Modularize span computation methods

In this commit, I added the _get_span_characteristics and
_print_span_characteristics functions so that they can be reusable
anywhere.

* Remove unnecessary arguments and make fxs more compact

* Update a few parameter arguments

* Add tests for print_span and get_span methods

* Update API to talk about span characteristics in brief

* Add better reporting of spans_length

* Add test for span length reporting

* Update formatting of span length report

Removed '' to indicate that it's not a string, then
sort the n-grams by their length, not by their frequency.

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Show all frequency distribution when -V

In this commit, I displayed the full frequency distribution of the
span lengths when --verbose is passed. To make things simpler, I
rewrote some of the formatter functions so that I can call them
whenever.

Another notable change is that instead of showing percentages as
Integers, I showed them as floats (max 2-decimal places). I did this
because it looks weird when it displays (0%).

* Update logic on how total is computed

The way the 90% thresholding is computed now is that we keep
adding the percentages until we reach >= 90%. I also updated the wording
and used the term "At least" to denote that >= 90% of your spans have
these distributions.

* Fix display when showing the threshold percentage

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add better phrasing for span information

* Update spacy/cli/debug_data.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add minor edits for whitespaces etc.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-07 19:04:21 +02:00
Madeesh Kannan
fc09badd6b Disable weekly GPU/slow tests on forks (#10831) 2022-06-07 19:04:21 +02:00
Paul O'Leary McCann
eb3a7a6653 Add glossary entry for root (#10821)
* Add glossary entry for root

There was already one but it was lower case, maybe that should be
removed?

* remove lowercase root

On reflection, that was probably just a mistake.

* Add lowercase root back

It's harmless to leave it there.
2022-06-07 19:04:21 +02:00
Raphael Mitsch
2d12342abd Fuzz tokenizer.explain: draft for fuzzy tests. (#10771)
* Fuzz tokenizer.explain: draft for fuzzy tests.

* Fuzz tokenizer.explain: xignoring tokenizer.explain() tests. Removed deadline modification. Removed LANGUAGES_WITHOUT_TOKENIZERS.

* Fuzz tokenizer.explain: changed tokenizer initialization to avoid failus in Azure runs.

* Fuzz tokenizer.explain: type hint for tokenizer in test.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-07 19:04:21 +02:00
github-actions[bot]
d47f428102 Auto-format code with black (#10795)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-06-07 19:04:21 +02:00
kadarakos
6b36809055 bugfix parser labels (#10797) 2022-06-07 19:04:21 +02:00
Patrick Düggelin
9a10264dac Fix PhraseMatcher remove overlapping terms (#10734)
* Add regression test for issue 10643

* Improve overlapping terms testcase

* Fix removing overlapping terms in phrase matcher (#10643)
2022-06-07 19:04:21 +02:00
Raphael Mitsch
df9304afac Ignore overrides for pipe names in config argument (#10779)
* Pipe name override in config: added check with warning, added removal of name override from config, extended tests.

* Pipoe name override in config: added pytest UserWarning.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-07 19:04:21 +02:00
Adriane Boyd
c73fcc2908 Override SpanGroups.setdefault to provide default SpanGroup (#10772)
* Fix mistake in SpanGroup API docs

* Restrict SpanGroups.setdefault to SpanGroup only

* Refactor to support default span iterable
2022-06-07 19:04:21 +02:00
Raphael Mitsch
307e66215e Allow assets to be optional in spacy project (#10714)
* Allow assets to be optional in spacy project: draft for optional flag/download_all options.

* Allow assets to be optional in spacy project: added OPTIONAL_DEFAULT reflecting default asset optionality.

* Allow assets to be optional in spacy project: renamed --all to --extra.

* Allow assets to be optional in spacy project: included optional flag in project config test.

* Allow assets to be optional in spacy project: added documentation.

* Allow assets to be optional in spacy project: fixing deprecated --all reference.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Allow assets to be optional in spacy project: fixed project_assets() docstring.

* Allow assets to be optional in spacy project: adjusted wording in justification of optional assets.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: switched to  as keyword in project.yml. Updated docs.

* Allow assets to be optional in spacy project: updated comment.

* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in output.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in docstring..

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in test..

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in test.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: renamed OPTIONAL_DEFAULT to EXTRA_DEFAULT.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-06-07 19:04:21 +02:00
Sofie Van Landeghem
50a1745235 Add test for old architectures (#10751)
* add v1 and v2 tests for tok2vec architectures

* textcat architectures are not "layers"

* test older textcat architectures

* test older parser architecture
2022-06-07 19:04:21 +02:00
Luca Dorigo
b0becebf48 Fix StringStore.__getitem__ return type depending on parameter types (#10741)
* Fix StringStore.__getitem__ return type depending on parameter types

Small fix using  `@overload` so that `StringStore.__getitem__` returns an `int` when given a `str` or `bytes` and a `str` when given an `int`.

* Update spacy/strings.pyi

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-07 19:04:21 +02:00
Raphael Mitsch
ece27a0424 Refactor error messages to remove hardcoded strings (#10729)
* Use custom error msg instead of hardcoded string: replaced remaining hardcoded error message strings.

* Use custom error msg instead of hardcoded string: fixing faulty Errors import.
2022-06-07 19:04:21 +02:00
Madeesh Kannan
1a4a8f14f7 Remove vestigial debug print statement in walk_head_nodes (#10718)
* `graph`: Remove vestigial debug print statement in `walk_head_nodes`

* Revert whitespace changes

* Remove more debug print statements
2022-06-07 19:04:21 +02:00
Ilya Nikitin
1ff0c695d4 token.md: Fix documentation of Token.ancestors (#10917) 2022-06-06 14:37:15 +09:00
vincent d warmerdam
21b2c3ebdb Add spacy-report to universe (#10910)
* Add spacy-report to universe

* Remove extra comma

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-06-05 18:58:46 +09:00
richardpaulhudson
c7f798ac23 Update Holmes entry in universe.json 2022-05-31 09:09:35 +02:00
Max Tarlov
837c522258 Update documentation for displacy style kwargs (#10841)
* Update docs for displacy style kwargs

Added "span" to the accepted values for the style kwarg in the displacy.serve and displacy.render top-level functions. These styles are new as of SpaCy 3.3, so I added the "new" tag for that option only

* restored alpha ordering
2022-05-30 09:12:23 +02:00
Peter Baumgartner
fcdf2bd08b add doc cleaner to menu (#10862) 2022-05-30 08:52:31 +02:00
Freddy Heppell
9dffd93a0e Fix misspelt keyword in StringStore example 2022-05-30 14:23:38 +09:00
Sofie Van Landeghem
83dd42163a Remove NBSP's across tables in the docs (#10842) 2022-05-25 09:49:11 +02:00
Peter Baumgartner
494df895d3 add floret to static vectors docs (#10833) 2022-05-23 09:16:51 +02:00
kadarakos
e9a4011f77 oov confusion fix (#10828) 2022-05-23 09:16:08 +02:00
Adriane Boyd
3171f2d80c Remove cuda extras for non-linux arm in install widget (#10796)
* Remove cuda extras for non-linux arm platforms in install widget
* Extend cuda versions install widget
* Update GPU install docs to clarify cuda
2022-05-20 09:58:11 +02:00
schaeran
439bfbbfb5 update spaCy Universe: spacytextblob (code example) 2022-05-13 12:10:37 +09:00
Richard Hudson
8b56383bee Add documentation tip about overriding variables (#10780) 2022-05-11 10:17:34 +02:00
Madeesh Kannan
f45ae27d9d training.md: Fix typos (#10775) 2022-05-09 19:45:16 +02:00
Raphael Mitsch
d8a5444e96 Document different ways to create a pipeline (#10762)
* Document different ways to create a pipeline: moved up/slightly modified paragraph on pipeline creation.

* Document different ways to create a pipeline: changed Finnish to Ukrainian in example for language without trained pipeline.

* Document different ways to create a pipeline: added explanation of blank pipeline.

* Document different ways to create a pipeline: exchanged Ukrainian with Yoruba.
2022-05-06 15:42:15 +02:00
Richard Hudson
b2d3ac8222 Updated Coreferee Universe entry 2022-05-06 14:27:24 +02:00