Commit Graph

15606 Commits

Author SHA1 Message Date
Lj Miranda
9a35b24b48 Implement _allow_extra_label to use _n_labels
To ensure that spancat / spancat_exclusive cannot be resized after
initialization, I inherited the _allow_extra_label() method from
spacy/pipeline/trainable_pipe.pyx and used self._n_labels instead
of len(self.labels) for checking.

I think that changing it locally is a better solution than forcing
each class that inherits from TrainablePipe to use the self._n_labels
attribute.

Also note that I turned off black formatting in this block of code
because it reads better without the overhang.
2022-11-18 13:48:18 +08:00
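
The check described in the commit above can be illustrated with a minimal sketch. This is not spaCy's implementation: `ToyPipe`, `ToyExclusivePipe`, and `n_outputs` are hypothetical stand-ins, and the assumption that the exclusive variant reserves one extra output (for a "no label" class) is made purely for illustration.

```python
class ToyPipe:
    """Hypothetical stand-in for a trainable component (not spaCy's class)."""

    def __init__(self, labels, n_outputs=None, is_resizable=False):
        self.labels = list(labels)
        # Stands in for the output dimension of the underlying model.
        self.n_outputs = n_outputs
        self.is_resizable = is_resizable

    @property
    def _n_labels(self):
        # Number of outputs the model needs for the current labels.
        return len(self.labels)

    def _allow_extra_label(self):
        """Raise if the output layer is already full and cannot be resized."""
        if self.n_outputs is not None and self.n_outputs == self._n_labels:
            if not self.is_resizable:
                raise ValueError(
                    f"cannot add a label: output size is fixed at {self.n_outputs}"
                )


class ToyExclusivePipe(ToyPipe):
    @property
    def _n_labels(self):
        # Assumption for this sketch: one extra output is reserved for a
        # negative ("no label") class.
        return len(self.labels) + 1
```

With two labels and three outputs, comparing against `len(self.labels)` would let another label through, whereas comparing against `_n_labels` makes `ToyExclusivePipe(["A", "B"], n_outputs=3)._allow_extra_label()` raise `ValueError`.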
Lj Miranda
c9036a6d79 Include zero_init.v1 for spancat 2022-11-18 13:16:33 +08:00
Lj Miranda
e23034365a Import Suggester from spancat 2022-11-18 12:34:44 +08:00
Lj Miranda
b667ab56a0
Update spacy/pipeline/spancat_exclusive.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-18 12:31:09 +08:00
Lj Miranda
7ac46058a2 Fix init call for exclusive spancat 2022-11-02 13:05:56 +08:00
Lj Miranda
7021dbaff3 Revert documentation link to spancat 2022-11-02 12:43:26 +08:00
Lj Miranda
8548e2c311 Inherit from SpanCat instead of TrainablePipe
This commit changes the inheritance structure of Exclusive_Spancat,
which now inherits from SpanCategorizer rather than TrainablePipe. This
allows me to remove duplicate methods that are already present in
the parent class.
2022-11-02 12:30:41 +08:00
Lj Miranda
bdf2a1d1fe Add _n_labels property to SpanCategorizer
Instead of using len(self.labels) in initialize(), I am using a private
property, self._n_labels. This achieves implementation parity and allows
me to delete the whole initialize() method for spancat_exclusive (since
it's now the same as in spancat).
2022-11-02 12:27:54 +08:00
Lj Miranda
023a1a6c04 Add scorer to docstring 2022-11-02 12:10:49 +08:00
Lj Miranda
60a8df7c5f Merge branch 'add/exclusive-spancat' of github.com:ljvmiranda921/spaCy into add/exclusive-spancat 2022-10-26 11:09:03 +08:00
Lj Miranda
1533a4ef5a Update component versions to v2 2022-10-26 11:08:49 +08:00
Lj Miranda
1b1afd2251
Update spacy/pipeline/spancat_exclusive.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-10-26 11:07:57 +08:00
Sofie Van Landeghem
95c5bfcc78
avoid multiplication with 1.0
Co-authored-by: kadarakos <kadar.akos@gmail.com>
2022-10-03 17:05:55 +02:00
Lj Miranda
2b7eb85e36 Fix mypy errors
However, I ignored line 370 because it opened up a bunch of type errors
that might be trickier to solve and might lead to a more complicated
codebase.
2022-09-05 15:42:34 +08:00
Lj Miranda
dbfb3a7739 Cache the label map 2022-09-05 14:34:49 +08:00
Lj Miranda
2bbab641e9 Use Softmax v2 directly from thinc 2022-09-05 11:28:30 +08:00
Lj Miranda
43bf05275f [ci skip] Small updates 2022-08-25 16:26:03 +08:00
Lj Miranda
b728eaae18
Update spacy/pipeline/spancat_exclusive.py
Co-authored-by: kadarakos <kadar.akos@gmail.com>
2022-08-25 16:08:15 +08:00
Lj Miranda
826c1d3ca3 Use spacy.SpanCategorizer.v1 as default archi 2022-08-25 13:31:36 +08:00
Lj Miranda
d6e56b62b9 [ci skip] Add breakpoint for debugging 2022-08-25 13:23:15 +08:00
Lj Miranda
5452e71b05 [WIP] Update 2022-08-25 13:08:37 +08:00
Lj Miranda
3d07c05cba Add spancat_exclusive to pipeline 2022-08-25 12:40:48 +08:00
Lj Miranda
527a1818e5 Fix all imports 2022-08-25 11:24:37 +08:00
Lj Miranda
1db65b8e78 [wip] Update 2022-08-24 17:54:34 +08:00
Lj Miranda
6f08d83731 Add initial port 2022-08-24 16:47:56 +08:00
Lj Miranda
e7e845b5ed [wip] Update 2022-08-24 11:35:26 +08:00
Lj Miranda
176ef9840e [wip] Update 2022-08-24 11:20:22 +08:00
Edward
5afa98aabf
Support custom attributes for tokens and spans in json conversion (#11125)
* Add token and span custom attributes to to_json()

* Change logic for to_json

* Add functionality to from_json

* Small adjustments

* Move token/span attributes to new dict key

* Fix test

* Fix the same test but much better

* Add backwards compatibility tests and adjust logic

* Add test to check if attributes not set in underscore are not saved in the json

* Add tests for json compatibility

* Adjust test names

* Fix tests and clean up code

* Fix assert json tests

* small adjustment

* adjust naming and code readability

* Adjust naming, added more tests and changed logic

* Fix typo

* Adjust errors, naming, and small test optimization

* Fix byte tests

* Fix bytes tests

* Change naming and json structure

* update schema

* Update spacy/schemas.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/tokens/doc.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/tokens/doc.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/schemas.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update schema for underscore attributes

* Adjust underscore schema

* adjust schema tests

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-23 10:05:02 +02:00
Tal Zussman
7e75327893
Fix menu order in linguistic-features.md (#11364)
Swap 'Vectors & Similarity' and 'Mappings & Exceptions' in menu to match order in body
2022-08-23 14:40:38 +09:00
Sofie Van Landeghem
6e20842370
dev docs: numeric comparators (#11334)
* add section on numeric comparators

* edit

* prettier

* Update extra/DEVELOPER_DOCS/Code Conventions.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* note on typing imports

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-22 15:52:53 +02:00
Adriane Boyd
f55bb7470d
Clean up warnings in the test suite (#11331) 2022-08-22 12:04:30 +02:00
Paul O'Leary McCann
0f07defe2c
Remove reference to voting on issue (#11335)
Not clear which issue this refers to, we don't suggest this for any
other issues, and we don't use votes in general.
2022-08-22 11:29:05 +02:00
Adriane Boyd
04c6e5cb95
Improve floret vectors display in pipeline docs (#11343) 2022-08-22 11:28:13 +02:00
Adriane Boyd
3e4cf1bbe1
Check for . in factory names (#11336) 2022-08-19 09:52:12 +02:00
Adriane Boyd
09b3118b26
Add uk pipelines to website (#11332) 2022-08-18 14:04:57 +02:00
Sofie Van Landeghem
cab263791f
include span_ruler for default warning filter (#11333) 2022-08-17 19:55:54 +02:00
Peter Baumgartner
db7b9938a4
Docs: displaCy documentation - data types, parse_{deps,ents,spans}, spans example (#10950)
* add in spans example and parse references

* rm autoformatter

* rm extra ents copy

* TypedDict draft

* type fixes

* restore non-documentation files

* docs update

* fix spans example

* fix hyperlinks

* add parse example

* example fix + argument fix

* fix api arg in docs

* fix bad variable replacement

* fix spacing in style

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* fix spacing on table

* fix spacing on table

* rm temp files

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-08-16 11:23:34 -04:00
Adriane Boyd
ed4ad309e6
Fix Dutch noun chunks to skip overlapping spans (#11275)
* Add test for overlapping noun chunks

* Skip overlapping noun chunks

* Update spacy/tests/lang/nl/test_noun_chunks.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-08-10 09:49:08 +02:00
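
As a rough illustration of what "skip overlapping noun chunks" can mean, here is a minimal sketch. It is not the Dutch noun chunk iterator itself: `skip_overlapping` is a hypothetical helper, chunks are assumed to be `(start, end)` token offsets with an exclusive end, and they are assumed to arrive in document order.

```python
from typing import Iterable, Iterator, Tuple

# (start, end) token offsets; end is exclusive. Assumed layout for this sketch.
Chunk = Tuple[int, int]


def skip_overlapping(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Yield chunks in order, dropping any that start inside the last kept one."""
    prev_end = 0
    for start, end in chunks:
        if start < prev_end:
            # This chunk overlaps the previously yielded chunk, so skip it.
            continue
        yield (start, end)
        prev_end = end
```

For example, `list(skip_overlapping([(0, 3), (2, 5), (5, 7)]))` keeps `(0, 3)` and `(5, 7)` and drops `(2, 5)`.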
Paul O'Leary McCann
231a17817d
Clean up automated label-based issue handling (#11284)
* Clean up automated label-based issue handling

1. upgrade tiangolo/issue-manager to latest
2. move needs-more-info to tiangolo
3. change needs-more-info close time to 7 days
4. delete old needs-more-info config

* Use old, longer message

* Fix label name
2022-08-09 14:50:50 +02:00
Adriane Boyd
e700358ba0
Add W605 to the errors raised by flake8 in the CI (#11283) 2022-08-09 12:15:13 +02:00
Adriane Boyd
fc4246558b
Fix regex invalid escape sequences (#11276) 2022-08-09 10:59:36 +02:00
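
For context, flake8's W605 flags backslash sequences such as `\d` inside ordinary (non-raw) string literals. The usual remedy is a raw string or an explicitly doubled backslash; the sketch below shows both spellings. `find_numbers` is a hypothetical helper and is not taken from the spaCy codebase.

```python
import re

# Raw string literal: the backslash reaches the regex engine as written.
DIGITS = re.compile(r"\d+")

# Equivalent non-raw spelling with the backslash escaped explicitly.
DIGITS_ESCAPED = re.compile("\\d+")


def find_numbers(text: str) -> list:
    """Return every run of digits found in `text`."""
    return DIGITS.findall(text)
```

Both patterns compile to the same expression; the raw-string form is generally easier to read.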
stefawolf
23749cfc91
adding spans to doc_annotation in Example.to_dict (#11261)
* adding spans to doc_annotation in Example.to_dict

* to_dict compatible with from_dict: tuples instead of spans

* use strings for label and kb_id

* Simplify test

* Update data formats docs

Co-authored-by: Stefanie Wolf <stefanie.wolf@vitecsoftware.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-05 12:26:38 +02:00
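
The bullet "tuples instead of spans" and the following one on string labels can be sketched in plain Python. This is an illustration only: `ToySpan` and `spans_to_dict` are hypothetical names, and the tuple layout `(start_char, end_char, label, kb_id)` is an assumption made for this example rather than something stated in the commit message.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ToySpan:
    """Hypothetical stand-in for a span object (not spaCy's Span)."""

    start_char: int
    end_char: int
    label: str = ""
    kb_id: str = ""


# Assumed layout for this sketch: (start_char, end_char, label, kb_id).
SpanTuple = Tuple[int, int, str, str]


def spans_to_dict(groups: Dict[str, List[ToySpan]]) -> Dict[str, List[SpanTuple]]:
    """Convert each span group into plain tuples with string label and kb_id."""
    return {
        key: [(s.start_char, s.end_char, s.label, s.kb_id) for s in spans]
        for key, spans in groups.items()
    }
```

Plain tuples of integers and strings can be serialized and read back without needing the original document object, which is one plausible reason to prefer them over span objects in a dictionary export.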
Adriane Boyd
b07708d5d0
Support full prerelease versions in the compat table (#11228)
* Support full prerelease versions in the compat table

* Fix types
2022-08-04 15:14:19 +02:00
Jules Belveze
cd09614ab2
chore: add 'concepCy' to spacy universe (#11255)
* chore: add 'concepCy' to spacy universe

* docs: add 'slogan' to concepCy
2022-08-04 15:42:38 +09:00
Lj Miranda
d993df41e5
Update docs for pipeline initialize() methods (#11221)
* Update documentation for dependency parser

* Update documentation for trainable_lemmatizer

* Update documentation for entity_linker

* Update documentation for ner

* Update documentation for morphologizer

* Update documentation for senter

* Update documentation for spancat

* Update documentation for tagger

* Update documentation for textcat

* Update documentation for tok2vec

* Run prettier on edited files

* Apply similar changes in transformer docs

* Remove need to say annotated example explicitly

I removed the need to say "Must contain at least one annotated Example"
because it's often a given that Examples will contain some gold-standard
annotation.

* Run prettier on transformer docs
2022-08-03 16:53:02 +02:00
Adriane Boyd
d0578c2ede
Add scorer to textcat API docs config settings (#11263) 2022-08-03 16:41:20 +02:00
Paul O'Leary McCann
2d89dd9db8
Update natto-py version spec (#11222)
* Update natto-py version spec

* Update setup.cfg

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-07-28 07:45:02 +02:00
ninjalu
95a1b8aca6
add additional REL_OP (#10371)
* add additional REL_OP

* change to condition and new rel_op symbols

* add operators to docs

* add the anchor while we're in here

* add tests

Co-authored-by: Peter Baumgartner <5107405+pmbaumgartner@users.noreply.github.com>
2022-07-27 13:16:44 +02:00
Madeesh Kannan
1829d7120a
ExplosionBot: Add note about case-sensitivity (#11211) 2022-07-27 14:24:22 +09:00
Edward
360a702ecd
Add parent argument (#11210) 2022-07-26 14:35:18 +02:00