kadarakos
37c4ad5007
only test suggester and test result exhaustively
2023-06-02 08:54:45 +00:00
kadarakos
658c4aee35
flaky test fix suggestion, hand-set bias terms
2023-06-01 16:44:55 +00:00
kadarakos
56de1076a1
Update spacy/pipeline/span_finder.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-06-01 17:40:25 +02:00
kadarakos
8c7c34d4f4
use the 'spans_key' variable name everywhere
2023-06-01 13:09:25 +00:00
kadarakos
fe964e7831
remove near-duplicate redundant method
2023-06-01 13:02:12 +00:00
kadarakos
09b5f61e7d
remove comment
2023-06-01 12:20:45 +00:00
kadarakos
4c2f80cf17
typo
2023-06-01 12:18:17 +00:00
kadarakos
2a1cb13069
remove debug lines
2023-06-01 10:23:35 +00:00
kadarakos
af802257f2
black
2023-06-01 10:20:14 +00:00
kadarakos
6f750d0da6
only use a single spans_key like in spancat
2023-06-01 10:19:22 +00:00
kadarakos
90af16af76
failing overfit test
2023-05-31 17:30:56 +00:00
kadarakos
f599bd5a4d
return correct variable
2023-05-31 17:30:17 +00:00
kadarakos
6e46ecfa2c
handle misaligned tokenization
2023-05-31 16:56:01 +00:00
kadarakos
11a17976ec
black
2023-05-09 11:32:02 +00:00
kadarakos
e5b2a8b506
Merge branch 'add-span-finder' of https://github.com/kadarakos/spaCy into add-span-finder
2023-05-08 13:45:57 +00:00
kadarakos
103654211f
adjust test
2023-05-08 13:21:37 +00:00
kadarakos
530d812aed
remove todo
2023-05-08 13:20:49 +00:00
kadarakos
b7ce3ab42f
new error for ensuring reference and predicted texts are the same
2023-05-08 13:14:34 +00:00
kadarakos
d5da2df4c9
enforce that the gold and predicted documents have the same text
2023-05-08 11:18:06 +00:00
kadarakos
9d6793604e
update docstring
2023-05-08 10:24:56 +00:00
kadarakos
2c8408f4f2
Update spacy/errors.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-05-08 12:07:26 +02:00
kadarakos
6fe7c66f36
black
2023-05-03 13:33:18 +00:00
kadarakos
82f6a813f0
revert to not interleaving (realized it's faster)
2023-05-03 13:32:41 +00:00
kadarakos
a5b9e63664
black
2023-05-03 13:26:53 +00:00
kadarakos
db361db874
interleave thresholding with span creation
2023-05-03 13:26:31 +00:00
kadarakos
6b2e8363fc
avoid two for loops over all docs by not precomputing
2023-05-03 13:08:44 +00:00
kadarakos
fe4c094d86
implement all comparison operators for inf int
2023-05-03 11:57:18 +00:00
kadarakos
02d8d62053
revert line order
2023-05-03 11:46:26 +00:00
kadarakos
02c1bc03c8
mypy fix for integer type infinity
2023-05-03 11:44:01 +00:00
kadarakos
9252e62477
black
2023-05-03 11:10:44 +00:00
kadarakos
4ef70c094c
max_length and min_length as Optional[int] and strict checking
2023-05-03 11:00:18 +00:00
kadarakos
3b41a988b0
Merge branch 'master' of https://github.com/explosion/spaCy into add-span-finder
2023-04-26 18:34:54 +00:00
Patrick J. Burns
ab4ba04c32
Update LatinDefaults for lang 'la' (#12538)
* Add noun chunking to la syntax iterators
* Expand list of numeral, ordinal words
* Expand abbreviations in la tokenizer_exceptions
* Add example sents
* Update spacy/lang/la/syntax_iterators.py
Reorganize la syntax iterators
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Minor updates based on review
* fix call
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-04-20 16:55:40 +02:00
Adriane Boyd
b60b027927
Add default option to MorphAnalysis.get (#12545)
* Add default to MorphAnalysis.get
Similar to `dict`, allow a `default` option for `MorphAnalysis.get` for
the user to provide a default return value if the field is not found.
The default return value remains `[]`, which is not the same as
`dict.get`, but is already established as this method's default return
value with the return type `List[str]`. However, the new `default` option
does not enforce that the user-provided default is actually `List[str]`.
* Restore test case
2023-04-20 14:06:32 +02:00
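The lookup semantics described in #12545 can be illustrated with a small self-contained stand-in. `ToyMorphAnalysis` is hypothetical: it only mimics the described behavior (fall back to `[]` when no default is given, otherwise return the caller's default unchecked) and is not spaCy's implementation; parsing a `Field=Val1,Val2|...` feature string is a simplifying assumption.

```python
from typing import Any, Dict, List, Optional


class ToyMorphAnalysis:
    """Hypothetical stand-in for MorphAnalysis; not spaCy's real class."""

    def __init__(self, feats: str) -> None:
        # Assumed input format: "Case=Nom|Number=Sing,Plur".
        self._fields: Dict[str, List[str]] = {}
        if feats:
            for pair in feats.split("|"):
                field, values = pair.split("=")
                self._fields[field] = values.split(",")

    def get(self, field: str, default: Optional[Any] = None) -> Any:
        values = self._fields.get(field)
        if values:
            return list(values)
        # Unlike dict.get, fall back to [] (not None) when no default is given.
        # A user-provided default is returned as-is, with no List[str] check.
        return [] if default is None else default


if __name__ == "__main__":
    m = ToyMorphAnalysis("Case=Nom|Number=Sing")
    print(m.get("Case"))
    print(m.get("Tense"))
    print(m.get("Tense", default=["Pres"]))
```

As in the commit message, the fallback stays `[]` for backwards compatibility, so only callers that pass `default` see different behavior.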
Adriane Boyd
dc0a1a9808
Load exceptions last in Tokenizer.from_bytes (#12553)
In `Tokenizer.from_bytes`, the exceptions should be loaded last so that
they are only processed once as part of loading the model.
The exceptions are tokenized as phrase matcher patterns in the
background and the internal tokenization needs to be synced with all the
remaining tokenizer settings. If the exceptions are not loaded last,
there are speed regressions for `Tokenizer.from_bytes/disk` vs.
`Tokenizer.add_special_case` as the caches are reloaded more than
necessary during deserialization.
2023-04-20 11:30:34 +02:00
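Why load order matters in #12553 can be shown with a toy model. `ToyTokenizer` and `load` are hypothetical and only assume what the commit message states: once exceptions are present, every other settings change forces the exceptions to be re-synced. A counter records how often that expensive step runs.

```python
from typing import Dict, List


class ToyTokenizer:
    """Hypothetical tokenizer whose special cases depend on all other settings."""

    def __init__(self) -> None:
        self.settings: Dict[str, str] = {}
        self.special_cases: Dict[str, List[str]] = {}
        self.reloads = 0  # number of times the special-case cache was rebuilt

    def _reload_special_cases(self) -> None:
        # Stands in for re-tokenizing every exception as a phrase matcher pattern.
        self.reloads += 1

    def set_setting(self, name: str, value: str) -> None:
        self.settings[name] = value
        if self.special_cases:
            # Existing exceptions were tokenized with the old settings: re-sync.
            self._reload_special_cases()

    def set_special_cases(self, cases: Dict[str, List[str]]) -> None:
        self.special_cases = dict(cases)
        self._reload_special_cases()


def load(settings: Dict[str, str], exceptions: Dict[str, List[str]],
         exceptions_last: bool) -> ToyTokenizer:
    """Hypothetical deserializer that restores settings and exceptions."""
    tok = ToyTokenizer()
    if not exceptions_last:
        tok.set_special_cases(exceptions)
    for name, value in settings.items():
        tok.set_setting(name, value)
    if exceptions_last:
        tok.set_special_cases(exceptions)
    return tok


if __name__ == "__main__":
    settings = {"prefix": "p", "suffix": "s", "infix": "i"}
    exc = {"don't": ["do", "n't"]}
    print(load(settings, exc, exceptions_last=False).reloads)
    print(load(settings, exc, exceptions_last=True).reloads)
```

With three settings, loading exceptions first rebuilds the cache four times, while loading them last rebuilds it once.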
Sofie Van Landeghem
8e6a3d58d8
fix typo (#12543)
2023-04-19 10:59:33 +02:00
kadarakos
19bae0900d
rename
2023-04-17 14:20:22 +00:00
kadarakos
1796cf3b2b
rename
2023-04-17 14:17:54 +00:00
kadarakos
58f1aa29a0
Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-17 16:11:10 +02:00
TAN Long
923d24e885
perf(REL_OP): Replace some token.children with token.rights or token.lefts (#12528)
Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-04-17 13:16:34 +02:00
kadarakos
d544c903b6
black
2023-04-13 10:04:35 +00:00
kadarakos
85dd4d4c3b
default spankey constant
2023-04-13 10:01:12 +00:00
kadarakos
4d88616c4f
black
2023-04-06 13:52:54 +00:00
kadarakos
8fc84fb6ab
isort
2023-04-06 13:43:15 +00:00
kadarakos
9cfcbdd0ad
black
2023-04-06 13:40:31 +00:00
kadarakos
638ac9f666
span finder integrated into spacy from experimental
2023-04-06 13:27:08 +00:00
Edward
de32011e4c
Add model-last saving mechanism to pretraining (#12459)
* Adjust pretrain command
* change naming and add finally block
* Add unit test
* Add unit test assertions
* Update spacy/training/pretrain.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* change finally block
* Add to docs
* Update website/docs/usage/embeddings-transformers.mdx
* Add flag to skip saving model-last
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:24:03 +02:00
Adriane Boyd
4a1ec332de
Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set (#12493)
* Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set
* Format
2023-04-03 15:11:12 +02:00
Adriane Boyd
4538ceb507
Remove redundant strings.add for Doc.char_span (#12429)
2023-04-03 11:38:56 +02:00
Adriane Boyd
69e20ce03d
Fix pickle for ngram suggester (#12486)
2023-03-31 13:43:51 +02:00
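The #12486 message does not say how the pickling problem was fixed. The sketch below shows one common cause and remedy for this class of bug: a suggester built as a nested function cannot be pickled, while a module-level function wrapped in `functools.partial` can. All names here (`ngram_spans`, `build_closure_suggester`, `build_partial_suggester`) are hypothetical and are not spaCy's API.

```python
import pickle
from functools import partial
from typing import Callable, List, Sequence, Tuple


def ngram_spans(tokens: Sequence[str], *, sizes: Sequence[int]) -> List[Tuple[int, int]]:
    """Toy suggester: (start, end) offsets of every n-gram whose length is in `sizes`."""
    spans = []
    for size in sizes:
        for start in range(len(tokens) - size + 1):
            spans.append((start, start + size))
    return spans


def build_closure_suggester(sizes: Sequence[int]) -> Callable:
    # Pickle stores functions by qualified name, and
    # "build_closure_suggester.<locals>.suggester" cannot be looked up on load.
    def suggester(tokens: Sequence[str]) -> List[Tuple[int, int]]:
        return ngram_spans(tokens, sizes=sizes)
    return suggester


def build_partial_suggester(sizes: Sequence[int]) -> Callable:
    # A partial over a module-level function pickles the function reference
    # plus its bound keyword arguments, both of which are picklable.
    return partial(ngram_spans, sizes=list(sizes))


if __name__ == "__main__":
    restored = pickle.loads(pickle.dumps(build_partial_suggester([1, 2])))
    print(restored(["a", "b", "c"]))
```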