Commit Graph

9318 Commits

Author SHA1 Message Date
kadarakos
fe964e7831 remove near duplicate redundant method 2023-06-01 13:02:12 +00:00
kadarakos
09b5f61e7d remove comment 2023-06-01 12:20:45 +00:00
kadarakos
4c2f80cf17 typo 2023-06-01 12:18:17 +00:00
kadarakos
2a1cb13069 remove debug lines 2023-06-01 10:23:35 +00:00
kadarakos
af802257f2 black 2023-06-01 10:20:14 +00:00
kadarakos
6f750d0da6 only use a single spans_key like in spancat 2023-06-01 10:19:22 +00:00
kadarakos
90af16af76 failing overfit test 2023-05-31 17:30:56 +00:00
kadarakos
f599bd5a4d return correct variable 2023-05-31 17:30:17 +00:00
kadarakos
6e46ecfa2c handle misaligned tokenization 2023-05-31 16:56:01 +00:00
kadarakos
11a17976ec black 2023-05-09 11:32:02 +00:00
kadarakos
e5b2a8b506 Merge branch 'add-span-finder' of https://github.com/kadarakos/spaCy into add-span-finder 2023-05-08 13:45:57 +00:00
kadarakos
103654211f adjust test 2023-05-08 13:21:37 +00:00
kadarakos
530d812aed remove todo 2023-05-08 13:20:49 +00:00
kadarakos
b7ce3ab42f new error for ensuring reference and predicted texts are the same 2023-05-08 13:14:34 +00:00
kadarakos
d5da2df4c9 enforce that the gold and predicted documents have the same text 2023-05-08 11:18:06 +00:00
kadarakos
9d6793604e update docstring 2023-05-08 10:24:56 +00:00
kadarakos
2c8408f4f2
Update spacy/errors.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-05-08 12:07:26 +02:00
kadarakos
6fe7c66f36 black 2023-05-03 13:33:18 +00:00
kadarakos
82f6a813f0 revert to not interleaving (realized it's faster) 2023-05-03 13:32:41 +00:00
kadarakos
a5b9e63664 black 2023-05-03 13:26:53 +00:00
kadarakos
db361db874 interleave thresholding with span creation 2023-05-03 13:26:31 +00:00
kadarakos
6b2e8363fc avoid two for loops over all docs by not precomputing 2023-05-03 13:08:44 +00:00
kadarakos
fe4c094d86 implement all comparison operators for inf int 2023-05-03 11:57:18 +00:00
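Note on the commit above: an "infinite" integer bound only works in length checks if every comparison operator is defined, including the reflected ones Python falls back to when a plain `int` is on the left-hand side. The following is a minimal sketch of that idea, not the code from this commit; the names `IntInfinity` and `within_bounds` are hypothetical, and the lower bound of 1 is an assumption.

```python
from typing import Optional


class IntInfinity:
    """Hypothetical stand-in for an unbounded integer upper limit."""

    def __eq__(self, other):
        return isinstance(other, IntInfinity)

    def __ne__(self, other):
        return not self == other

    def __lt__(self, other):
        # Infinity is never smaller than anything.
        return False

    def __le__(self, other):
        return self == other

    def __gt__(self, other):
        return not self == other

    def __ge__(self, other):
        # Infinity is at least as large as everything.
        return True

    def __hash__(self):
        return hash("IntInfinity")


def within_bounds(
    length: int,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
) -> bool:
    """Check a span length against optional bounds (None means unbounded)."""
    lower = 1 if min_length is None else min_length  # assumed default
    upper = IntInfinity() if max_length is None else max_length
    # `length <= upper` falls back to IntInfinity.__ge__ when upper is infinite.
    return lower <= length <= upper


print(within_bounds(10_000))           # no upper bound given
print(within_bounds(5, max_length=3))  # exceeds the explicit upper bound
```

Spelling out all six operators keeps `5 < IntInfinity()` and `IntInfinity() > 5` consistent, which is what a strict type checker and chained comparisons both rely on.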
kadarakos
02d8d62053 revert line order 2023-05-03 11:46:26 +00:00
kadarakos
02c1bc03c8 mypy fix for integer type infinity 2023-05-03 11:44:01 +00:00
kadarakos
9252e62477 black 2023-05-03 11:10:44 +00:00
kadarakos
4ef70c094c max_length and min_length as Optional[int] and strict checking 2023-05-03 11:00:18 +00:00
kadarakos
3b41a988b0 Merge branch 'master' of https://github.com/explosion/spaCy into add-span-finder 2023-04-26 18:34:54 +00:00
Patrick J. Burns
ab4ba04c32
Update LatinDefaults for lang 'la' (#12538)
* Add noun chunking to la syntax iterators

* Expand list of numeral, ordinal words

* Expand abbreviations in la tokenizer_exceptions

* Add example sents

* Update spacy/lang/la/syntax_iterators.py

Reorganize la syntax iterators

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Minor updates based on review

* fix call

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-04-20 16:55:40 +02:00
Adriane Boyd
b60b027927
Add default option to MorphAnalysis.get (#12545)
* Add default to MorphAnalysis.get

Similar to `dict`, allow a `default` option for `MorphAnalysis.get` for
the user to provide a default return value if the field is not found.
The default return value remains `[]`, which is not the same as
`dict.get`, but is already established as this method's default return
value with the return type `List[str]`. However the new `default` option
does not enforce that the user-provided default is actually `List[str]`.

* Restore test case
2023-04-20 14:06:32 +02:00
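Note on the commit above: the described semantics can be illustrated without spaCy. This is a minimal sketch of the behaviour stated in the commit message, not the library's implementation; the function name `morph_get` and the plain `dict` standing in for a morphological analysis are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def morph_get(
    feats: Dict[str, List[str]],
    field: str,
    default: Optional[List[str]] = None,
) -> List[str]:
    """Return the values for `field`; if missing, return `default`, else []."""
    if field in feats:
        return feats[field]
    # Unlike dict.get, a missing field with no default yields [] rather than None.
    return [] if default is None else default


feats = {"Case": ["Nom"], "Number": ["Sing"]}
print(morph_get(feats, "Case"))
print(morph_get(feats, "Tense"))
print(morph_get(feats, "Tense", default=["Unknown"]))
```

As the commit message points out, nothing forces the caller's `default` to be a list of strings; the type hint here is advisory only.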
Adriane Boyd
dc0a1a9808
Load exceptions last in Tokenizer.from_bytes (#12553)
In `Tokenizer.from_bytes`, the exceptions should be loaded last so that
they are only processed once as part of loading the model.

The exceptions are tokenized as phrase matcher patterns in the
background and the internal tokenization needs to be synced with all the
remaining tokenizer settings. If the exceptions are not loaded last,
there are speed regressions for `Tokenizer.from_bytes/disk` vs.
`Tokenizer.add_special_case` as the caches are reloaded more than
necessary during deserialization.
2023-04-20 11:30:34 +02:00
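Note on the commit above: the cost of load order can be shown with a toy model. The sketch below is not spaCy's tokenizer; `ToyTokenizer`, `load`, and the counter are hypothetical, and it simply assumes that changing any setting forces the stored exceptions to be reprocessed.

```python
class ToyTokenizer:
    """Toy model: every settings change reprocesses the stored exceptions."""

    def __init__(self):
        self.settings = {}
        self.exceptions = {}
        self.reprocess_count = 0

    def _reprocess_exceptions(self):
        # Stand-in for re-tokenizing all exceptions with the current settings.
        if self.exceptions:
            self.reprocess_count += 1

    def set_setting(self, name, value):
        self.settings[name] = value
        self._reprocess_exceptions()

    def set_exceptions(self, exceptions):
        self.exceptions = dict(exceptions)
        self._reprocess_exceptions()


def load(data, exceptions_last):
    """Restore a ToyTokenizer from `data`, loading exceptions first or last."""
    tok = ToyTokenizer()
    if not exceptions_last:
        tok.set_exceptions(data["exceptions"])
    for name, value in data.items():
        if name != "exceptions":
            tok.set_setting(name, value)
    if exceptions_last:
        tok.set_exceptions(data["exceptions"])
    return tok


data = {
    "prefix": r"^\(",
    "suffix": r"\)$",
    "infix": r"-",
    "exceptions": {"don't": ["do", "n't"]},
}
print(load(data, exceptions_last=False).reprocess_count)  # once per setting, plus the initial load
print(load(data, exceptions_last=True).reprocess_count)   # exactly once
```

In this model, loading the exceptions first triggers one reprocessing pass per remaining setting, while loading them last triggers a single pass, which mirrors the regression the commit message describes.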
Sofie Van Landeghem
8e6a3d58d8
fix typo (#12543) 2023-04-19 10:59:33 +02:00
kadarakos
19bae0900d rename 2023-04-17 14:20:22 +00:00
kadarakos
1796cf3b2b rename 2023-04-17 14:17:54 +00:00
kadarakos
58f1aa29a0
Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-17 16:11:10 +02:00
TAN Long
923d24e885
perf(REL_OP): Replace some token.children with token.rights or token.lefts (#12528)
Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-04-17 13:16:34 +02:00
kadarakos
d544c903b6 black 2023-04-13 10:04:35 +00:00
kadarakos
85dd4d4c3b default spankey constant 2023-04-13 10:01:12 +00:00
kadarakos
4d88616c4f black 2023-04-06 13:52:54 +00:00
kadarakos
8fc84fb6ab isort 2023-04-06 13:43:15 +00:00
kadarakos
9cfcbdd0ad black 2023-04-06 13:40:31 +00:00
kadarakos
638ac9f666 span finder integrated into spacy from experimental 2023-04-06 13:27:08 +00:00
Edward
de32011e4c
Add model-last saving mechanism to pretraining (#12459)
* Adjust pretrain command

* change naming and add finally block

* Add unit test

* Add unit test assertions

* Update spacy/training/pretrain.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* change finally block

* Add to docs

* Update website/docs/usage/embeddings-transformers.mdx

* Add flag to skip saving model-last

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:24:03 +02:00
Adriane Boyd
4a1ec332de
Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set (#12493)
* Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set

* Format
2023-04-03 15:11:12 +02:00
Adriane Boyd
4538ceb507
Remove redundant strings.add for Doc.char_span (#12429) 2023-04-03 11:38:56 +02:00
Adriane Boyd
69e20ce03d
Fix pickle for ngram suggester (#12486) 2023-03-31 13:43:51 +02:00
Adriane Boyd
140d53649d
Convert values to numpy for label smoothing tests (#12472) 2023-03-31 13:41:41 +02:00
Ye Lei (叶磊)
ce258670b7
Allow passing a Span to displacy.parse_deps (#12477)
* Allow passing a Span to displacy.parse_deps

* Update docstring

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update API docs

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-31 09:44:01 +02:00
Raphael Mitsch
d85df9d577
Fix Span.sents for edge case of Span being the only Span in the last sentence of a Doc. (#12484) 2023-03-29 18:54:47 +02:00
kadarakos
372a90885e
Fix spancat-singlelabel score (#12469)
* debug argmax sort and add span scores

* add missing tests for spanscores
2023-03-29 08:38:11 +02:00