Commit Graph

15911 Commits

Author SHA1 Message Date
kadarakos
6fe7c66f36 black 2023-05-03 13:33:18 +00:00
kadarakos
82f6a813f0 revert to not interleaving (relized its faster) 2023-05-03 13:32:41 +00:00
kadarakos
a5b9e63664 black 2023-05-03 13:26:53 +00:00
kadarakos
db361db874 interleave thresholding with span creation 2023-05-03 13:26:31 +00:00
kadarakos
6b2e8363fc avoid two for loops over all docs by not precomputing 2023-05-03 13:08:44 +00:00
kadarakos
fe4c094d86 implement all comparison operators for inf int 2023-05-03 11:57:18 +00:00
kadarakos
02d8d62053 revert line order 2023-05-03 11:46:26 +00:00
kadarakos
02c1bc03c8 mypy fix for integer type infinity 2023-05-03 11:44:01 +00:00
kadarakos
9252e62477 black 2023-05-03 11:10:44 +00:00
kadarakos
4ef70c094c max_length and min_length as Optional[int] and strict checking 2023-05-03 11:00:18 +00:00
kadarakos
3b41a988b0 Merge branch 'master' of https://github.com/explosion/spaCy into add-span-finder 2023-04-26 18:34:54 +00:00
Victoria
a8dfc66135
Add spacy-wasm to universe (#12572)
* add spacy-wasm to universe

* add tag
2023-04-26 14:18:40 +02:00
moxley01
070fa16545
add spacysee project (#12568) 2023-04-25 12:30:19 +02:00
Adriane Boyd
68da580a4c
CI: Disable Azure (#12560) 2023-04-21 15:05:53 +02:00
Victoria
e115408514
remove survey link (#12559) 2023-04-21 10:22:26 +02:00
Patrick J. Burns
ab4ba04c32
Update LatinDefaults for lang 'la' (#12538)
* Add noun chunking to la syntax iterators

* Expand list of numeral, ordinal words

* Expand abbreviations in la tokenizer_exceptions

* Add example sents

* Update spacy/lang/la/syntax_iterators.py

Reorganize la syntax iterators

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Minor updates based on review

* fix call

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-04-20 16:55:40 +02:00
Adriane Boyd
b60b027927
Add default option to MorphAnalysis.get (#12545)
* Add default to MorphAnalysis.get

Similar to `dict`, allow a `default` option for `MorphAnalysis.get` for
the user to provide a default return value if the field is not found.
The default return value remains `[]`, which is not the same as
`dict.get`, but is already established as this method's default return
value with the return type `List[str]`. However the new `default` option
does not enforce that the user-provided default is actually `List[str]`.

* Restore test case
2023-04-20 14:06:32 +02:00
Adriane Boyd
dc0a1a9808
Load exceptions last in Tokenizer.from_bytes (#12553)
In `Tokenizer.from_bytes`, the exceptions should be loaded last so that
they are only processed once as part of loading the model.

The exceptions are tokenized as phrase matcher patterns in the
background and the internal tokenization needs to be synced with all the
remaining tokenizer settings. If the exceptions are not loaded last,
there are speed regressions for `Tokenizer.from_bytes/disk` vs.
`Tokenizer.add_special_case` as the caches are reloaded more than
necessary during deserialization.
2023-04-20 11:30:34 +02:00
Sofie Van Landeghem
8e6a3d58d8
fix typo (#12543) 2023-04-19 10:59:33 +02:00
kadarakos
19bae0900d rename 2023-04-17 14:20:22 +00:00
kadarakos
1796cf3b2b rename 2023-04-17 14:17:54 +00:00
kadarakos
58f1aa29a0
Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-17 16:11:10 +02:00
TAN Long
923d24e885
perf(REL_OP): Replace some token.children with token.rights or token.lefts (#12528)
Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-04-17 13:16:34 +02:00
TAN Long
119f959218
docs(REL_OP): modify docs for REL_OPs to match Semgrex's update on CoreNLP v4.5.2 (#12531)
Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-04-17 13:14:01 +02:00
kadarakos
d544c903b6 black 2023-04-13 10:04:35 +00:00
kadarakos
85dd4d4c3b default spankey constant 2023-04-13 10:01:12 +00:00
andyjessen
02259fa195
Add category to spaCy project (#12506)
ScispaCy fits within biomedical domain. Consider adding this category.
2023-04-07 15:31:04 +02:00
kadarakos
4d88616c4f black 2023-04-06 13:52:54 +00:00
kadarakos
8fc84fb6ab isort 2023-04-06 13:43:15 +00:00
kadarakos
9cfcbdd0ad black 2023-04-06 13:40:31 +00:00
kadarakos
21d5360e67 Merge branch 'master' of https://github.com/explosion/spaCy into add-span-finder 2023-04-06 13:27:19 +00:00
kadarakos
638ac9f666 span finder integrated into spacy from experimental 2023-04-06 13:27:08 +00:00
Madeesh Kannan
6db20b354f
Docs: Fix rule-based matching example that expands named entities (#12495) 2023-04-06 11:45:58 +02:00
Edward
c95d320d28
Add more information to custom code docs (#12491)
* Add info to sections

* Update website/docs/usage/training.mdx

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-06 11:45:19 +02:00
Will Frey
8d4129e177
Fix invalid ConsoleLogger.v3 example config (#12498)
Replace `progress_bar = "all_steps"` with `progress_bar = "eval"`, which is consistent with the default behavior for `spacy.ConsoleLogger.v1` and `spacy.ConsoleLogger.v2`.
2023-04-04 20:53:07 +02:00
Edward
de32011e4c
Add model-last saving mechanism to pretraining (#12459)
* Adjust pretrain command

* chane naming and add finally block

* Add unit test

* Add unit test assertions

* Update spacy/training/pretrain.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* change finally block

* Add to docs

* Update website/docs/usage/embeddings-transformers.mdx

* Add flag to skip saving model-last

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:24:03 +02:00
Adriane Boyd
4a1ec332de
Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set (#12493)
* Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set

* Format
2023-04-03 15:11:12 +02:00
Adriane Boyd
4538ceb507
Remove redundant strings.add for Doc.char_span (#12429) 2023-04-03 11:38:56 +02:00
Adriane Boyd
476a2e7a0a
Allow cupy 12.0 for extras (#12490) 2023-03-31 13:48:15 +02:00
Adriane Boyd
69e20ce03d
Fix pickle for ngram suggester (#12486) 2023-03-31 13:43:51 +02:00
Adriane Boyd
140d53649d
Convert values to numpy for label smoothing tests (#12472) 2023-03-31 13:41:41 +02:00
Ye Lei (叶磊)
ce258670b7
Allow passing a Span to displacy.parse_deps (#12477)
* Allow passing a Span to displacy.parse_deps

* Update docstring

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update API docs

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-31 09:44:01 +02:00
Raphael Mitsch
d85df9d577
Fix Span.sents for edge case of Span being the only Span in the last sentence of a Doc. (#12484) 2023-03-29 18:54:47 +02:00
kadarakos
372a90885e
Fix spancat-singlelabel score (#12469)
* debug argmax sort and add span scores

* add missing tests for spanscores
2023-03-29 08:38:11 +02:00
Edward
dba4e7bece
Add info to stringstore and vocab (#12471) 2023-03-27 13:15:14 +02:00
Adriane Boyd
2fba21be63
Restrict github workflows to explosion (#12470) 2023-03-27 12:44:04 +02:00
sloev / Johannes Valbjørn
fd072533e7
add spacy_onnx_sentiment_english to universe (#12422)
* add spacy_onnx_sentiment_english to universe

* rename to sentimental-onix

* fix comma json error

* fix typo

* typo fix

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* mention need to download model before example works

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-27 11:35:14 +02:00
Prajakta Darade
ae7779e830
corrected example code (#12466) 2023-03-27 11:32:49 +02:00
kadarakos
d1474fdd91
add explanation about overwriting behaviour (#12464)
* add explanation about overwriting behaviour

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* format

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-27 10:27:11 +02:00
Adriane Boyd
fac457a509
Support floret for PretrainVectors (#12435)
* Support floret for PretrainVectors

* Format
2023-03-24 16:28:51 +01:00