Commit Graph

2935 Commits

Author SHA1 Message Date
Daniël de Kok
3937abd2e7 Merge remote-tracking branch 'upstream/v4' into store-activations 2022-08-30 10:11:04 +02:00
Adriane Boyd
2a558a7cdc
Switch to mecab-ko as default Korean tokenizer (#11294)
* Switch to mecab-ko as default Korean tokenizer

Switch to the (confusingly-named) mecab-ko python module for default Korean
tokenization.

Maintain the previous `natto-py` tokenizer as
`spacy.KoreanNattoTokenizer.v1`.

* Temporarily run tests with mecab-ko tokenizer

* Fix types

* Fix duplicate test names

* Update requirements test

* Revert "Temporarily run tests with mecab-ko tokenizer"

This reverts commit d2083e7044.

* Add mecab_args setting, fix pickle for KoreanNattoTokenizer

* Fix length check

* Update docs

* Formatting

* Update natto-py error message

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-08-26 10:11:18 +02:00
Adriane Boyd
740c33fe58 Merge remote-tracking branch 'upstream/develop' into chore/update-v4-from-develop 2022-08-24 20:43:07 +02:00
Adriane Boyd
81874265e9 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5-1 2022-08-24 12:47:42 +02:00
Adriane Boyd
c44d243f25 Merge remote-tracking branch 'upstream/master' into chore/update-v4-from-master 2022-08-24 07:15:41 +02:00
Tobius Saul
c09d2fa25b
luganda language extension (#10847)
* luganda language extension

* __init__.py changes

* New enhancements

* Lexical attribute changed

* punctuaction and sentence additions

* Remove comment header

* Fix typos, reformat

* reformated version

* Add tokenizer test

* Remove contractions from stop words

* Format

* Add Luganda to website

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-23 13:09:36 +02:00
Tal Zussman
7e75327893
Fix menu order in linguistic-features.md (#11364)
Swap 'Vectors & Similarity' and 'Mappings & Exceptions' in menu to match order in body
2022-08-23 14:40:38 +09:00
Adriane Boyd
bb0e178878
Make Span/Doc.ents more consistent for ent_kb_id and ent_id (#11328)
* Map `Span.id` to `Token.ent_id` in all cases when setting `Doc.ents`
* Reset `Token.ent_id` and `Token.ent_kb_id` when setting `Doc.ents`
* Make `Span.ent_id` an alias of `Span.id` rather than a read-only view
of the root token's `ent_id` annotation
2022-08-22 20:28:57 +02:00
Adriane Boyd
04c6e5cb95
Improve floret vectors display in pipeline docs (#11343) 2022-08-22 11:28:13 +02:00
Adriane Boyd
5fa8f4faca
Switch ru and uk lemmatizers to pymorphy3 (#11345)
* Switch ru and uk lemmatizers to pymorphy3

* Switch to pymorphy3 in tests
2022-08-22 11:27:14 +02:00
Adriane Boyd
09b3118b26
Add uk pipelines to website (#11332) 2022-08-18 14:04:57 +02:00
Peter Baumgartner
db7b9938a4
Docs: displaCy documentation - data types, parse_{deps,ents,spans}, spans example (#10950)
* add in spans example and parse references

* rm autoformatter

* rm extra ents copy

* TypedDict draft

* type fixes

* restore non-documentation files

* docs update

* fix spans example

* fix hyperlinks

* add parse example

* example fix + argument fix

* fix api arg in docs

* fix bad variable replacement

* fix spacing in style

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* fix spacing on table

* fix spacing on table

* rm temp files

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-08-16 11:23:34 -04:00
Sofie Van Landeghem
5d54c0e32a
Rename modules for consistency (#11286)
* rename Python module to entity_ruler

* rename Python module to attribute_ruler
2022-08-10 11:44:05 +02:00
stefawolf
23749cfc91
adding spans to doc_annotation in Example.to_dict (#11261)
* adding spans to doc_annotation in Example.to_dict

* to_dict compatible with from_dict: tuples instead of spans

* use strings for label and kb_id

* Simplify test

* Update data formats docs

Co-authored-by: Stefanie Wolf <stefanie.wolf@vitecsoftware.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-05 12:26:38 +02:00
Jules Belveze
cd09614ab2
chore: add 'concepCy' to spacy universe (#11255)
* chore: add 'concepCy' to spacy universe

* docs: add 'slogan' to concepCy
2022-08-04 15:42:38 +09:00
Lj Miranda
d993df41e5
Update docs for pipeline initialize() methods (#11221)
* Update documentation for dependency parser

* Update documentation for trainable_lemmatizer

* Update documentation for entity_linker

* Update documentation for ner

* Update documentation for morphologizer

* Update documentation for senter

* Update documentation for spancat

* Update documentation for tagger

* Update documentation for textcat

* Update documentation for tok2vec

* Run prettier on edited files

* Apply similar changes in transformer docs

* Remove need to say annotated example explicitly

I removed the need to say "Must contain at least one annotated Example"
because it's often a given that Examples will contain some gold-standard
annotation.

* Run prettier on transformer docs
2022-08-03 16:53:02 +02:00
Adriane Boyd
d0578c2ede
Add scorer to textcat API docs config settings (#11263) 2022-08-03 16:41:20 +02:00
Daniël de Kok
288d27e17e Merge remote-tracking branch 'upstream/v4' into store-activations 2022-08-01 09:30:04 +02:00
Daniël de Kok
1ff683a50b Merge remote-tracking branch 'upstream/master' into merge-master-v4-20220728 2022-07-28 13:53:59 +02:00
ninjalu
95a1b8aca6
add additional REL_OP (#10371)
* add additional  REL_OP

* change to condition and new rel_op symbols

* add operators to docs

* add the anchor while we're in here

* add tests

Co-authored-by: Peter Baumgartner <5107405+pmbaumgartner@users.noreply.github.com>
2022-07-27 13:16:44 +02:00
Paul O'Leary McCann
1c12812d1a
Replace link to old label (#11188) 2022-07-25 16:39:34 +09:00
Adriane Boyd
7a99fe3c65
Move sent-patterns to correct section of universe.json (#11192) 2022-07-25 09:14:50 +02:00
0xpeIpeI
93960dc4b5
[universe project] create English interpretation project (#11184)
* [add] my universe  project setting

* [modify] A few adjustments

* [Modify] change package description
2022-07-24 19:01:04 +09:00
Dan Radenkovic
a5aa3a818f
fix docs (#11123) 2022-07-24 17:16:36 +09:00
Lucas Terriel
7ff52c02a1
Update meta for spacyfishing in spaCy Universe (#11185)
* add new logo for spacyfishing to update spacy universe

* change logo location
2022-07-24 17:10:29 +09:00
Maarten Grootendorst
1caa2d1d16
Added BERTopic to Spacy Universe (#11159)
* Added BERTopic to Spacy Universe

* Fix no render of visualization
2022-07-19 19:37:18 +09:00
Madeesh Kannan
ba18d2913d
Morphology/Morphologizer optimizations and refactoring (#11024)
* `Morphology`: Refactor to use C types, reduce allocations, remove unused code

* `Morphologzier`: Avoid unnecessary sorting of morpho features

* `Morphologizer`: Remove execessive reallocations of labels, improve hash lookups of labels, coerce `numpy` numeric types to native ints
Update docs

* Remove unused method

* Replace `unique_ptr` usage with `shared_ptr`

* Add type annotations to internal Python methods, rename `hash` variable, fix typos

* Add comment to clarify implementation detail

* Fix return type

* `Morphology`: Stop early when splitting fields and values
2022-07-15 11:14:08 +02:00
Adriane Boyd
2235e3520c
Update binder version in docs (#11124) 2022-07-12 15:20:33 +02:00
Adriane Boyd
11f859c132
Docs for v3.4 (#11057)
* Add draft of v3.4 usage

* Add Croatian models

* Add Matcher min/max

* Update release notes

* Minor edits

* Add updates, tables

* Update pydantic/mypy versions

* Update version in README

* Fix sidebar
2022-07-11 15:36:31 +02:00
Adriane Boyd
3701039c1f
Tweak build jobs setting, update install docs (#11077)
* Restrict SPACY_NUM_BUILD_JOBS to only override if set

* Update install docs
2022-07-08 19:21:17 +02:00
Richard Hudson
dc38a0f079
Change demo URL (#11102) 2022-07-08 19:19:48 +02:00
Adriane Boyd
be9e17c0e4
Add docs for compiling with build constraints (#11081) 2022-07-08 11:45:56 +02:00
Nipun Sadvilkar
bb3e11b9a1
Github Action for spaCy universe project alert (#11090) 2022-07-07 17:50:30 +05:30
Kenneth Enevoldsen
7b220afc29
Added asent to spacy universe (#11078)
* Added asent to spacy universe

* Update addition of asent following correction
2022-07-07 13:25:25 +09:00
Schero1994
c7c3fb1d0c
Merge pull request #11074 from Schero1994/feature/remove
Batch #2 | spaCy universe cleanup
2022-07-06 10:39:04 +02:00
Raphael Mitsch
e9eb59699f
NEL confidence threshold (#11016)
* Add base for NEL abstention threshold mechanism.

* Add abstention threshold to entity linker. Add test.

* Fix entity linking tests.

* Changed abstention default threshold from 0 to None.

* Fix default values for abstention thresholds.

* Fix mypy errors.

* Replace assertion with raise of proper error code.

* Simplify threshold check. Remove thresholding from EntityLinker_v1.

* Rename test.

* Update spacy/pipeline/entity_linker.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/pipeline/entity_linker.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Make E1043 configurable.

* Update docs.

* Rephrase description in docs. Adjusting error code message.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-07-04 17:05:21 +02:00
schaeran
b3165db41b remove universe object: spacy-langdetect 2022-07-04 16:07:18 +02:00
schaeran
4e8a5994df remove universe object: NLPre 2022-07-04 16:06:58 +02:00
schaeran
0e4a835468 remove universe object: num_fh 2022-07-04 16:06:38 +02:00
schaeran
5000a08a20 remove universe object: adam_qas 2022-07-04 16:06:20 +02:00
schaeran
60a35a2bb2 remove universe object: spacy_kenlm 2022-07-04 16:06:02 +02:00
schaeran
224f30c563 remove universe object: spacy-raspberry 2022-07-04 16:05:34 +02:00
schaeran
a9062ebf17 remove universe object: spacy-lookup 2022-07-04 16:05:11 +02:00
schaeran
9b823fc9e9 remove universe object: NeuroNER 2022-07-04 16:04:50 +02:00
schaeran
b94bcaa62f remove universe object: spacy-vis 2022-07-04 16:04:29 +02:00
schaeran
880e7db44e remove universe object: spacy_grammar 2022-07-04 16:04:06 +02:00
schaeran
6c036d1e25 remove universe object: spacy_hunspell 2022-07-04 16:03:30 +02:00
Paul O'Leary McCann
e8fdbfc65e Minor fix in Lemmatizer docs 2022-07-01 14:28:03 +09:00
Adriane Boyd
3bc1fe0a78
Update cupy extras (#11055)
* Add cuda116 and cuda117 extras

* Revert "remove `cuda116` extra from install widget (#11012)"

This reverts commit e7b498fb1f.

* Add cuda117 to quickstart
2022-06-30 11:24:37 +02:00
Shen Qin
be00db6645
Addition of min_max quantifier in matcher {n,m} (#10981)
* Min_max_operators
1. Modified API and Usage for spaCy website to include min_max operator
2. Modified matcher.pyx to include min_max function {n,m} and its variants
3. Modified schemas.py to include min_max validation error
4. Added test cases to test_matcher_api.py, test_matcher_logic.py and test_pattern_validation.py

* attempt to fix mypy/pydantic compat issue

* formatting

* Update spacy/tests/matcher/test_pattern_validation.py

Co-authored-by: Source-Shen <82353723+Source-Shen@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-30 11:01:58 +02:00