Daniël de Kok
cd6e4fa8f4
Rename activations
...
- "probs" -> "probabilities"
- "guesses" -> "label_ids", except in the edit tree lemmatizer, where "guesses" -> "tree_ids".
2022-08-31 11:18:40 +02:00
Daniël de Kok
d245da08e1
docs: tag (save_)activations as new in spaCy 4.0
2022-08-30 10:32:18 +02:00
Daniël de Kok
51c87c5f98
Change wording in API docs after store -> save change
2022-08-30 10:24:11 +02:00
Daniël de Kok
2290a04d55
Rename TrainablePipe.store_activations to save_activations
2022-08-30 10:20:59 +02:00
Daniël de Kok
3937abd2e7
Merge remote-tracking branch 'upstream/v4' into store-activations
2022-08-30 10:11:04 +02:00
Adriane Boyd
2a558a7cdc
Switch to mecab-ko as default Korean tokenizer ( #11294 )
...
* Switch to mecab-ko as default Korean tokenizer
Switch to the (confusingly-named) mecab-ko python module for default Korean
tokenization.
Maintain the previous `natto-py` tokenizer as
`spacy.KoreanNattoTokenizer.v1`.
* Temporarily run tests with mecab-ko tokenizer
* Fix types
* Fix duplicate test names
* Update requirements test
* Revert "Temporarily run tests with mecab-ko tokenizer"
This reverts commit d2083e7044.
* Add mecab_args setting, fix pickle for KoreanNattoTokenizer
* Fix length check
* Update docs
* Formatting
* Update natto-py error message
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-08-26 10:11:18 +02:00
Adriane Boyd
740c33fe58
Merge remote-tracking branch 'upstream/develop' into chore/update-v4-from-develop
2022-08-24 20:43:07 +02:00
Adriane Boyd
81874265e9
Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5-1
2022-08-24 12:47:42 +02:00
Adriane Boyd
c44d243f25
Merge remote-tracking branch 'upstream/master' into chore/update-v4-from-master
2022-08-24 07:15:41 +02:00
Tobius Saul
c09d2fa25b
Luganda language extension ( #10847 )
...
* Luganda language extension
* __init__.py changes
* New enhancements
* Lexical attribute changed
* punctuation and sentence additions
* Remove comment header
* Fix typos, reformat
* reformatted version
* Add tokenizer test
* Remove contractions from stop words
* Format
* Add Luganda to website
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-23 13:09:36 +02:00
Tal Zussman
7e75327893
Fix menu order in linguistic-features.md ( #11364 )
...
Swap 'Vectors & Similarity' and 'Mappings & Exceptions' in menu to match order in body
2022-08-23 14:40:38 +09:00
Adriane Boyd
bb0e178878
Make Span/Doc.ents more consistent for ent_kb_id and ent_id ( #11328 )
...
* Map `Span.id` to `Token.ent_id` in all cases when setting `Doc.ents`
* Reset `Token.ent_id` and `Token.ent_kb_id` when setting `Doc.ents`
* Make `Span.ent_id` an alias of `Span.id` rather than a read-only view
of the root token's `ent_id` annotation
2022-08-22 20:28:57 +02:00
Adriane Boyd
04c6e5cb95
Improve floret vectors display in pipeline docs ( #11343 )
2022-08-22 11:28:13 +02:00
Adriane Boyd
5fa8f4faca
Switch ru and uk lemmatizers to pymorphy3 ( #11345 )
...
* Switch ru and uk lemmatizers to pymorphy3
* Switch to pymorphy3 in tests
2022-08-22 11:27:14 +02:00
Adriane Boyd
09b3118b26
Add uk pipelines to website ( #11332 )
2022-08-18 14:04:57 +02:00
Peter Baumgartner
db7b9938a4
Docs: displaCy documentation - data types, parse_{deps,ents,spans}, spans example ( #10950 )
...
* add in spans example and parse references
* rm autoformatter
* rm extra ents copy
* TypedDict draft
* type fixes
* restore non-documentation files
* docs update
* fix spans example
* fix hyperlinks
* add parse example
* example fix + argument fix
* fix api arg in docs
* fix bad variable replacement
* fix spacing in style
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* fix spacing on table
* fix spacing on table
* rm temp files
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-08-16 11:23:34 -04:00
Sofie Van Landeghem
5d54c0e32a
Rename modules for consistency ( #11286 )
...
* rename Python module to entity_ruler
* rename Python module to attribute_ruler
2022-08-10 11:44:05 +02:00
stefawolf
23749cfc91
adding spans to doc_annotation in Example.to_dict ( #11261 )
...
* adding spans to doc_annotation in Example.to_dict
* to_dict compatible with from_dict: tuples instead of spans
* use strings for label and kb_id
* Simplify test
* Update data formats docs
Co-authored-by: Stefanie Wolf <stefanie.wolf@vitecsoftware.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-05 12:26:38 +02:00
Jules Belveze
cd09614ab2
chore: add 'concepCy' to spacy universe ( #11255 )
...
* chore: add 'concepCy' to spacy universe
* docs: add 'slogan' to concepCy
2022-08-04 15:42:38 +09:00
Lj Miranda
d993df41e5
Update docs for pipeline initialize() methods ( #11221 )
...
* Update documentation for dependency parser
* Update documentation for trainable_lemmatizer
* Update documentation for entity_linker
* Update documentation for ner
* Update documentation for morphologizer
* Update documentation for senter
* Update documentation for spancat
* Update documentation for tagger
* Update documentation for textcat
* Update documentation for tok2vec
* Run prettier on edited files
* Apply similar changes in transformer docs
* Remove need to say annotated example explicitly
I removed the need to say "Must contain at least one annotated Example"
because it's often a given that Examples will contain some gold-standard
annotation.
* Run prettier on transformer docs
2022-08-03 16:53:02 +02:00
Adriane Boyd
d0578c2ede
Add scorer to textcat API docs config settings ( #11263 )
2022-08-03 16:41:20 +02:00
Daniël de Kok
288d27e17e
Merge remote-tracking branch 'upstream/v4' into store-activations
2022-08-01 09:30:04 +02:00
Daniël de Kok
1ff683a50b
Merge remote-tracking branch 'upstream/master' into merge-master-v4-20220728
2022-07-28 13:53:59 +02:00
ninjalu
95a1b8aca6
add additional REL_OP ( #10371 )
...
* add additional REL_OP
* change to condition and new rel_op symbols
* add operators to docs
* add the anchor while we're in here
* add tests
Co-authored-by: Peter Baumgartner <5107405+pmbaumgartner@users.noreply.github.com>
2022-07-27 13:16:44 +02:00
Paul O'Leary McCann
1c12812d1a
Replace link to old label ( #11188 )
2022-07-25 16:39:34 +09:00
Adriane Boyd
7a99fe3c65
Move sent-patterns to correct section of universe.json ( #11192 )
2022-07-25 09:14:50 +02:00
0xpeIpeI
93960dc4b5
[universe project] create English interpretation project ( #11184 )
...
* [add] my universe project setting
* [modify] A few adjustments
* [Modify] change package description
2022-07-24 19:01:04 +09:00
Dan Radenkovic
a5aa3a818f
fix docs ( #11123 )
2022-07-24 17:16:36 +09:00
Lucas Terriel
7ff52c02a1
Update meta for spacyfishing in spaCy Universe ( #11185 )
...
* add new logo for spacyfishing to update spacy universe
* change logo location
2022-07-24 17:10:29 +09:00
Maarten Grootendorst
1caa2d1d16
Added BERTopic to Spacy Universe ( #11159 )
...
* Added BERTopic to Spacy Universe
* Fix no render of visualization
2022-07-19 19:37:18 +09:00
Madeesh Kannan
ba18d2913d
Morphology/Morphologizer optimizations and refactoring ( #11024 )
...
* `Morphology`: Refactor to use C types, reduce allocations, remove unused code
* `Morphologizer`: Avoid unnecessary sorting of morpho features
* `Morphologizer`: Remove excessive reallocations of labels, improve hash lookups of labels, coerce `numpy` numeric types to native ints
Update docs
* Remove unused method
* Replace `unique_ptr` usage with `shared_ptr`
* Add type annotations to internal Python methods, rename `hash` variable, fix typos
* Add comment to clarify implementation detail
* Fix return type
* `Morphology`: Stop early when splitting fields and values
2022-07-15 11:14:08 +02:00
Adriane Boyd
2235e3520c
Update binder version in docs ( #11124 )
2022-07-12 15:20:33 +02:00
Adriane Boyd
11f859c132
Docs for v3.4 ( #11057 )
...
* Add draft of v3.4 usage
* Add Croatian models
* Add Matcher min/max
* Update release notes
* Minor edits
* Add updates, tables
* Update pydantic/mypy versions
* Update version in README
* Fix sidebar
2022-07-11 15:36:31 +02:00
Adriane Boyd
3701039c1f
Tweak build jobs setting, update install docs ( #11077 )
...
* Restrict SPACY_NUM_BUILD_JOBS to only override if set
* Update install docs
2022-07-08 19:21:17 +02:00
Richard Hudson
dc38a0f079
Change demo URL ( #11102 )
2022-07-08 19:19:48 +02:00
Adriane Boyd
be9e17c0e4
Add docs for compiling with build constraints ( #11081 )
2022-07-08 11:45:56 +02:00
Nipun Sadvilkar
bb3e11b9a1
Github Action for spaCy universe project alert ( #11090 )
2022-07-07 17:50:30 +05:30
Kenneth Enevoldsen
7b220afc29
Added asent to spacy universe ( #11078 )
...
* Added asent to spacy universe
* Update addition of asent following correction
2022-07-07 13:25:25 +09:00
Schero1994
c7c3fb1d0c
Merge pull request #11074 from Schero1994/feature/remove
...
Batch #2 | spaCy universe cleanup
2022-07-06 10:39:04 +02:00
Raphael Mitsch
e9eb59699f
NEL confidence threshold ( #11016 )
...
* Add base for NEL abstention threshold mechanism.
* Add abstention threshold to entity linker. Add test.
* Fix entity linking tests.
* Changed abstention default threshold from 0 to None.
* Fix default values for abstention thresholds.
* Fix mypy errors.
* Replace assertion with raise of proper error code.
* Simplify threshold check. Remove thresholding from EntityLinker_v1.
* Rename test.
* Update spacy/pipeline/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/pipeline/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Make E1043 configurable.
* Update docs.
* Rephrase description in docs. Adjusting error code message.
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-07-04 17:05:21 +02:00
schaeran
b3165db41b
remove universe object: spacy-langdetect
2022-07-04 16:07:18 +02:00
schaeran
4e8a5994df
remove universe object: NLPre
2022-07-04 16:06:58 +02:00
schaeran
0e4a835468
remove universe object: num_fh
2022-07-04 16:06:38 +02:00
schaeran
5000a08a20
remove universe object: adam_qas
2022-07-04 16:06:20 +02:00
schaeran
60a35a2bb2
remove universe object: spacy_kenlm
2022-07-04 16:06:02 +02:00
schaeran
224f30c563
remove universe object: spacy-raspberry
2022-07-04 16:05:34 +02:00
schaeran
a9062ebf17
remove universe object: spacy-lookup
2022-07-04 16:05:11 +02:00
schaeran
9b823fc9e9
remove universe object: NeuroNER
2022-07-04 16:04:50 +02:00
schaeran
b94bcaa62f
remove universe object: spacy-vis
2022-07-04 16:04:29 +02:00
schaeran
880e7db44e
remove universe object: spacy_grammar
2022-07-04 16:04:06 +02:00