Commit Graph

9322 Commits

Author SHA1 Message Date
kadarakos
37c4ad5007 only test suggester and test result exhaustively 2023-06-02 08:54:45 +00:00
kadarakos
658c4aee35 flaky test fix suggestion, hand set bias terms 2023-06-01 16:44:55 +00:00
kadarakos
56de1076a1
Update spacy/pipeline/span_finder.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-06-01 17:40:25 +02:00
kadarakos
8c7c34d4f4 use the 'spans_key' variable name everywhere 2023-06-01 13:09:25 +00:00
kadarakos
fe964e7831 remove near-duplicate redundant method 2023-06-01 13:02:12 +00:00
kadarakos
09b5f61e7d remove comment 2023-06-01 12:20:45 +00:00
kadarakos
4c2f80cf17 typo 2023-06-01 12:18:17 +00:00
kadarakos
2a1cb13069 remove debug lines 2023-06-01 10:23:35 +00:00
kadarakos
af802257f2 black 2023-06-01 10:20:14 +00:00
kadarakos
6f750d0da6 only use a single spans_key like in spancat 2023-06-01 10:19:22 +00:00
kadarakos
90af16af76 failing overfit test 2023-05-31 17:30:56 +00:00
kadarakos
f599bd5a4d return correct variable 2023-05-31 17:30:17 +00:00
kadarakos
6e46ecfa2c handle misaligned tokenization 2023-05-31 16:56:01 +00:00
kadarakos
11a17976ec black 2023-05-09 11:32:02 +00:00
kadarakos
e5b2a8b506 Merge branch 'add-span-finder' of https://github.com/kadarakos/spaCy into add-span-finder 2023-05-08 13:45:57 +00:00
kadarakos
103654211f adjust test 2023-05-08 13:21:37 +00:00
kadarakos
530d812aed remove todo 2023-05-08 13:20:49 +00:00
kadarakos
b7ce3ab42f new error for ensuring reference and predicted texts are the same 2023-05-08 13:14:34 +00:00
kadarakos
d5da2df4c9 enforce that the gold and predicted documents have the same text 2023-05-08 11:18:06 +00:00
kadarakos
9d6793604e update docstring 2023-05-08 10:24:56 +00:00
kadarakos
2c8408f4f2
Update spacy/errors.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-05-08 12:07:26 +02:00
kadarakos
6fe7c66f36 black 2023-05-03 13:33:18 +00:00
kadarakos
82f6a813f0 revert to not interleaving (realized it's faster) 2023-05-03 13:32:41 +00:00
kadarakos
a5b9e63664 black 2023-05-03 13:26:53 +00:00
kadarakos
db361db874 interleave thresholding with span creation 2023-05-03 13:26:31 +00:00
kadarakos
6b2e8363fc avoid two for loops over all docs by not precomputing 2023-05-03 13:08:44 +00:00
kadarakos
fe4c094d86 implement all comparison operators for inf int 2023-05-03 11:57:18 +00:00
kadarakos
02d8d62053 revert line order 2023-05-03 11:46:26 +00:00
kadarakos
02c1bc03c8 mypy fix for integer type infinity 2023-05-03 11:44:01 +00:00
kadarakos
9252e62477 black 2023-05-03 11:10:44 +00:00
kadarakos
4ef70c094c max_length and min_length as Optional[int] and strict checking 2023-05-03 11:00:18 +00:00
kadarakos
3b41a988b0 Merge branch 'master' of https://github.com/explosion/spaCy into add-span-finder 2023-04-26 18:34:54 +00:00
Patrick J. Burns
ab4ba04c32
Update LatinDefaults for lang 'la' (#12538)
* Add noun chunking to la syntax iterators

* Expand list of numeral, ordinal words

* Expand abbreviations in la tokenizer_exceptions

* Add example sents

* Update spacy/lang/la/syntax_iterators.py

Reorganize la syntax iterators

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Minor updates based on review

* fix call

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-04-20 16:55:40 +02:00
Adriane Boyd
b60b027927
Add default option to MorphAnalysis.get (#12545)
* Add default to MorphAnalysis.get

Similar to `dict`, allow a `default` option for `MorphAnalysis.get` for
the user to provide a default return value if the field is not found.
The default return value remains `[]`, which is not the same as
`dict.get`, but is already established as this method's default return
value with the return type `List[str]`. However the new `default` option
does not enforce that the user-provided default is actually `List[str]`.

* Restore test case
2023-04-20 14:06:32 +02:00
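
Note on b60b027927: the `default` semantics described in that message can be sketched roughly as below. This is a pure-Python stand-in, not spaCy's implementation; the class name `MorphSketch` and its string parsing are assumptions made only for illustration. It shows the two points the message makes: a missing field still returns `[]` when no default is given, and a user-provided default is returned without any type check.

```python
from typing import Dict, List, Optional


class MorphSketch:
    """Hypothetical stand-in for a morphological analysis; not spaCy's class."""

    def __init__(self, feats: str) -> None:
        # Parse a string such as "Case=Nom|Number=Sing" into field -> values.
        self._fields: Dict[str, List[str]] = {}
        for item in filter(None, feats.split("|")):
            field, _, values = item.partition("=")
            self._fields[field] = values.split(",")

    def get(self, field: str, default: Optional[List[str]] = None) -> List[str]:
        if field in self._fields:
            return list(self._fields[field])
        # Unlike dict.get, a missing field yields [] unless a default is given.
        # The default is returned as-is; its type is not validated.
        return [] if default is None else default


morph = MorphSketch("Case=Nom|Number=Sing")
found = morph.get("Case")                           # field present
missing = morph.get("Tense")                        # field absent, no default
fallback = morph.get("Tense", default=["Unknown"])  # field absent, default given
```

Keeping `[]` as the fallback preserves the established `List[str]` return type for existing callers, which appears to be the design choice the commit message describes.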
Adriane Boyd
dc0a1a9808
Load exceptions last in Tokenizer.from_bytes (#12553)
In `Tokenizer.from_bytes`, the exceptions should be loaded last so that
they are only processed once as part of loading the model.

The exceptions are tokenized as phrase matcher patterns in the
background and the internal tokenization needs to be synced with all the
remaining tokenizer settings. If the exceptions are not loaded last,
there are speed regressions for `Tokenizer.from_bytes/disk` vs.
`Tokenizer.add_special_case` as the caches are reloaded more than
necessary during deserialization.
2023-04-20 11:30:34 +02:00
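
Note on dc0a1a9808: the load-order argument in that message can be illustrated with a toy model. Everything here is an assumption for illustration (the class `TokenizerSketch`, its methods, and the counter); it does not reproduce spaCy's `Tokenizer` internals. The model only assumes that changing any setting forces already-loaded exceptions to be reprocessed, so loading exceptions last means they are processed once.

```python
from typing import Dict, List


class TokenizerSketch:
    """Toy model: any settings change forces special cases to be reprocessed."""

    def __init__(self) -> None:
        self.settings: Dict[str, str] = {}
        self.exceptions: List[str] = []
        self.reprocess_count = 0  # total number of exception (re)tokenizations

    def set_setting(self, name: str, value: str) -> None:
        self.settings[name] = value
        # Changing a setting invalidates earlier work on loaded exceptions.
        self.reprocess_count += len(self.exceptions)

    def set_exceptions(self, exceptions: List[str]) -> None:
        self.exceptions = list(exceptions)
        self.reprocess_count += len(self.exceptions)


def load(order: List[str]) -> int:
    """Apply deserialized fields in the given order and report the work done."""
    tok = TokenizerSketch()
    for key in order:
        if key == "exceptions":
            tok.set_exceptions(["don't", "U.S.", "e.g."])
        else:
            tok.set_setting(key, "pattern")
    return tok.reprocess_count


exceptions_first = load(["exceptions", "prefix", "suffix", "infix"])
exceptions_last = load(["prefix", "suffix", "infix", "exceptions"])
```

In this toy model, loading the three exceptions first costs 12 tokenizations, while loading them last costs 3.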
Sofie Van Landeghem
8e6a3d58d8
fix typo (#12543) 2023-04-19 10:59:33 +02:00
kadarakos
19bae0900d rename 2023-04-17 14:20:22 +00:00
kadarakos
1796cf3b2b rename 2023-04-17 14:17:54 +00:00
kadarakos
58f1aa29a0
Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-17 16:11:10 +02:00
TAN Long
923d24e885
perf(REL_OP): Replace some token.children with token.rights or token.lefts (#12528)
Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-04-17 13:16:34 +02:00
kadarakos
d544c903b6 black 2023-04-13 10:04:35 +00:00
kadarakos
85dd4d4c3b default spankey constant 2023-04-13 10:01:12 +00:00
kadarakos
4d88616c4f black 2023-04-06 13:52:54 +00:00
kadarakos
8fc84fb6ab isort 2023-04-06 13:43:15 +00:00
kadarakos
9cfcbdd0ad black 2023-04-06 13:40:31 +00:00
kadarakos
638ac9f666 span finder integrated into spacy from experimental 2023-04-06 13:27:08 +00:00
Edward
de32011e4c
Add model-last saving mechanism to pretraining (#12459)
* Adjust pretrain command

* change naming and add finally block

* Add unit test

* Add unit test assertions

* Update spacy/training/pretrain.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* change finally block

* Add to docs

* Update website/docs/usage/embeddings-transformers.mdx

* Add flag to skip saving model-last

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:24:03 +02:00
Adriane Boyd
4a1ec332de
Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set (#12493)
* Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set

* Format
2023-04-03 15:11:12 +02:00
Adriane Boyd
4538ceb507
Remove redundant strings.add for Doc.char_span (#12429) 2023-04-03 11:38:56 +02:00
Adriane Boyd
69e20ce03d
Fix pickle for ngram suggester (#12486) 2023-03-31 13:43:51 +02:00