Ines Montani
15e6578f7d
Adjust formatting
2021-07-17 10:49:13 +10:00
Mario Šaško
47c5a63a83
Add TakeLab/spacy-udpipe to Universe ( #8698 )
...
* Add TakeLab/spacy-udpipe to universe
* Add SCA
* Sign SCA
2021-07-16 11:18:09 +02:00
Mario Šaško
1ba2e8a646
Add TakeLab/spacy-udpipe to Universe ( #8698 )
...
* Add TakeLab/spacy-udpipe to universe
* Add SCA
* Sign SCA
2021-07-16 11:15:52 +02:00
explosion-bot
eff3d1088b
Auto-format code with black
2021-07-16 08:03:36 +00:00
Adriane Boyd
e76e2addd1
Remove TrainablePipe as base class for Lemmatizer in API docs ( #8725 )
2021-07-15 16:42:14 +02:00
Adriane Boyd
f5acc48111
Remove TrainablePipe as base class for Lemmatizer in API docs ( #8725 )
2021-07-15 16:41:36 +02:00
Adriane Boyd
ac45c7c045
Add pre-commit to ignored requirements ( #8728 )
2021-07-15 16:41:15 +02:00
Paul O'Leary McCann
9b63cbb775
Add extract spans import
2021-07-15 18:16:53 +09:00
jmyerston
993b0fab0e
Added ancient Greek language support ( #8606 )
...
* Add ancient Greek language support
Initial commit
* Contributor Agreement
* grc tokenizer test added and files formatted with black, unnecessary import removed
Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Commas in lists fixed. __init__.py added to test
* Update lex_attrs.py
* Update stop_words.py
* Update stop_words.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-07-15 10:27:17 +02:00
Sofie Van Landeghem
77859beb99
spacy.ngram_range_suggester.v1 ( #8699 )
2021-07-15 10:01:22 +02:00
Julien Rossi
e117573822
Adding noun_chunks to the DUTCH language model (nl) ( #8529 )
...
* ✨ implement noun_chunks for dutch language
* copy/paste FR and SV syntax iterators to accommodate UD tags
* added tests with dutch text
* signed contributor agreement
* 🐛 fix noun chunks generator
* built from scratch
* define noun chunk as a single Noun-Phrase
* includes some corner cases debugging (incorrect POS tagging)
* test with provided annotated sample (POS, DEP)
* ✅ fix failing test
* CI pipeline did not like the added sample file
* add the sample as a pytest fixture
* Update spacy/lang/nl/syntax_iterators.py
* Update spacy/lang/nl/syntax_iterators.py
Code readability
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/tests/lang/nl/test_noun_chunks.py
correct comment
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* finalize code
* change "if next_word" into "if next_word is not None"
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-07-14 14:01:02 +02:00
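The last bullet of e117573822 (changing "if next_word" into "if next_word is not None") matters whenever an object can be falsy without being absent. The sketch below uses a hypothetical ToyToken class, not spaCy's own Token, and assumes only that truthiness follows the object's length.

```python
class ToyToken:
    """Hypothetical stand-in for a token; its truthiness follows its length."""

    def __init__(self, text):
        self.text = text

    def __len__(self):
        return len(self.text)


def has_next_truthy(next_word):
    # Treats any falsy object (for example a zero-length token) as missing.
    return bool(next_word)


def has_next_explicit(next_word):
    # Only the absence of a token (None) counts as missing.
    return next_word is not None
```

With an empty-text token the two checks disagree: the truthiness check reports that no next word exists, while the explicit None check reports that one does.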
Paul O'Leary McCann
e9626e38c1
Fix serialization test
...
This test was failing not because the thing it was testing wasn't
working, but because of the way span equality works. Span equality
relies on doc equality, and doc equality is object identity, so spans
from different docs will never be equal.
2021-07-14 18:37:34 +09:00
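The equality behaviour described in e9626e38c1 can be sketched with toy classes. ToyDoc, ToySpan and span_key are hypothetical names used only for illustration; they mimic the rule stated in the commit message (span equality depends on doc equality, and doc equality is object identity) and are not spaCy's implementation.

```python
class ToyDoc:
    """Hypothetical doc; no __eq__ is defined, so equality is object identity."""

    def __init__(self, words):
        self.words = list(words)


class ToySpan:
    """Hypothetical span whose equality relies on doc equality."""

    def __init__(self, doc, start, end):
        self.doc = doc
        self.start = start
        self.end = end

    def __eq__(self, other):
        if not isinstance(other, ToySpan):
            return NotImplemented
        return (
            self.doc == other.doc  # identity check, since ToyDoc has no __eq__
            and self.start == other.start
            and self.end == other.end
        )


def span_key(span):
    # A doc-independent key that a round-trip test could compare instead.
    return (span.start, span.end, tuple(span.doc.words[span.start:span.end]))
```

Two spans covering the same tokens in two separately created docs compare unequal, while their span_key values match, which is the kind of comparison a serialization test can rely on.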
Paul O'Leary McCann
4a9dc00d86
Use relative indices for mentions
...
Was using batch absolute indices to manage mentions, but extract_spans
expects doc-relative ones.
2021-07-14 18:36:18 +09:00
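The index conversion mentioned in 4a9dc00d86 can be sketched as follows. The helper to_doc_relative is a hypothetical name; it assumes spans are (start, end) token offsets over the concatenated batch with an exclusive end, and that each span lies inside a single doc. It does not reproduce the actual coref code.

```python
from bisect import bisect_right
from itertools import accumulate


def to_doc_relative(spans, doc_lengths):
    """Map batch-absolute (start, end) spans to (doc_index, start, end)."""
    # offsets[i] is the batch-absolute index of the first token of doc i.
    offsets = [0] + list(accumulate(doc_lengths))
    relative = []
    for start, end in spans:
        # Find the last doc whose first token is at or before `start`.
        doc_index = bisect_right(offsets, start) - 1
        base = offsets[doc_index]
        relative.append((doc_index, start - base, end - base))
    return relative
```

For a batch of two docs with 3 and 4 tokens, the batch-absolute span (4, 6) becomes tokens 1 to 3 of the second doc.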
Paul O'Leary McCann
3684f7fdfd
Remove comment from fixed test
2021-07-14 18:22:14 +09:00
Paul O'Leary McCann
f1796e4af7
Fix mention list bug
...
There was an off-by-one error in how mentions are generated that would
affect mentions at the end of a sentence. This was pretty nasty.
2021-07-14 18:19:00 +09:00
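The message of f1796e4af7 does not show the faulty code, so the following is only a generic illustration of how end-of-sentence mentions are typically lost. It assumes exclusive end offsets; candidate_mentions and max_width are hypothetical names. Iterating the end offset over range(start + 1, last_end) instead of range(start + 1, last_end + 1) would silently drop every span that finishes on the last token of the sentence.

```python
def candidate_mentions(sent_start, sent_end, max_width):
    """Enumerate (start, end) spans inside one sentence, with end exclusive."""
    spans = []
    for start in range(sent_start, sent_end):
        # The widest span may end exactly at the sentence boundary.
        last_end = min(start + max_width, sent_end)
        # The "+ 1" keeps spans whose exclusive end equals last_end.
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans
```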
Ines Montani
8ca6c58625
Merge pull request #8703 from thomashacker/update/spacy-stanza [ci skip]
...
Update spacy-stanza universe.json
2021-07-13 19:03:56 +10:00
Ines Montani
2a8eeed5da
Merge pull request #8703 from thomashacker/update/spacy-stanza [ci skip]
...
Update spacy-stanza universe.json
2021-07-13 19:03:42 +10:00
thomashacker
aafb89df78
Update universe.json code_example
2021-07-13 10:22:49 +02:00
KennethEnevoldsen
e5127992a0
added agreement
2021-07-13 10:11:02 +02:00
Kenneth Enevoldsen
94ce904e10
added missing comma
2021-07-13 09:59:34 +02:00
Kenneth Enevoldsen
a81fcc81b0
added dacy to universe
2021-07-13 09:54:08 +02:00
Adriane Boyd
f9fd2889b7
Use 0-vector for OOV lexemes ( #8639 )
2021-07-13 14:48:12 +10:00
Edward
8233359225
Fix preservation of spacy package meta ( #8663 )
...
* update package meta with existing_meta and nlp_meta
* Add spaCy contributor agreement
* Added more info when creating readme
2021-07-12 11:18:52 +02:00
Paul O'Leary McCann
80a17071d3
Remove unused code
2021-07-11 18:46:39 +09:00
Paul O'Leary McCann
447c7070e3
Fix loss
...
Accidentally deleted it
2021-07-10 22:45:25 +09:00
Paul O'Leary McCann
c25ec292a9
Cleanup
2021-07-10 22:42:55 +09:00
Paul O'Leary McCann
e00bd422d9
Fix span embeds
...
Some of the lengths and backprop weren't right.
Also various cleanup.
2021-07-10 21:38:53 +09:00
Paul O'Leary McCann
d7d317a1b5
Clean up span embedding code
...
This is now cleaner and significantly faster. There are still some messy
parts in the code (particularly variable names); will get to that later.
2021-07-10 19:59:08 +09:00
Paul O'Leary McCann
dc1f974d39
Merge branch 'master' into feature/coref
2021-07-10 18:10:40 +09:00
Paul O'Leary McCann
f34915c1e8
Use scatter_add to speed up span embed backprop
...
This was the slowest part of the code, and using scatter_add here
probably reduces the runtime by 50%.
2021-07-10 18:08:51 +09:00
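The operation named in f34915c1e8 has simple semantics: each row of values is added into the table row selected by the matching index, and repeated indices accumulate. The pure-Python sketch below shows only those semantics; it is not the vectorised routine the commit refers to, and it assumes that the backward pass needs to sum the gradients of overlapping spans back onto token rows.

```python
def scatter_add(table, indices, values):
    """Add values[i] into table[indices[i]] in place; repeated indices accumulate."""
    for index, row in zip(indices, values):
        target = table[index]
        for column, value in enumerate(row):
            target[column] += value
    return table
```

A token that belongs to two spans receives the sum of both gradient rows, which is why a plain indexed assignment would not be a correct substitute.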
Paul O'Leary McCann
1c70c87daf
Fix autoblack
...
The conditional needs double equals.
2021-07-10 16:02:39 +09:00
Ines Montani
616f4de034
Merge pull request #8674 from polm/fix/autoblack-no-forks [ci skip]
...
Make the autoblack job not run on forks
2021-07-10 16:41:59 +10:00
Paul O'Leary McCann
b8cdbb4bb6
Make the autoblack job not run on forks
...
The autoblack job is an occasional cleanup job. If it runs on forks and
those PRs are accepted, the git history will be weird, and that doesn't
help anyone.
The way to make the job not run on forks is a little non-obvious, but it
is based on this thread:
https://github.com/prisma/prisma/issues/3539
2021-07-10 15:38:20 +09:00
Ines Montani
d8ae5750a6
Merge pull request #8665 from rynoV/patch-1 [ci skip]
2021-07-10 10:52:39 +10:00
Ines Montani
d4fecdfb82
Merge pull request #8665 from rynoV/patch-1 [ci skip]
2021-07-10 10:52:15 +10:00
Ines Montani
50000d37e4
Avoid double parentheses [ci skip]
2021-07-10 10:52:01 +10:00
Calum Sieppert
e2d53aa1a6
Typo fixes
2021-07-09 10:25:56 -06:00
Adriane Boyd
d8805a1073
Fix ru/uk lemmatizer mp with spawn ( #8657 )
...
Use an instance variable instead of a class variable for the morphological
analyzer so that multiprocessing with spawn is possible.
2021-07-09 15:36:56 +02:00
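A likely reason behind d8805a1073, sketched under assumptions: with the spawn start method a worker begins in a fresh interpreter, so state assigned to a class attribute at runtime in the parent is not carried over, whereas an instance's own attributes travel with it when it is pickled. The class names, make_analyzer and pickled_state below are hypothetical and do not reproduce the ru/uk lemmatizer code.

```python
def make_analyzer():
    # Placeholder for an expensive morphological analyzer object.
    return {"name": "toy-analyzer"}


class ClassLevelLemmatizer:
    _analyzer = None  # shared class attribute, filled in lazily at runtime

    def __init__(self):
        if ClassLevelLemmatizer._analyzer is None:
            ClassLevelLemmatizer._analyzer = make_analyzer()


class InstanceLevelLemmatizer:
    def __init__(self):
        self._analyzer = make_analyzer()  # stored on the instance itself


def pickled_state(obj):
    # By default, pickle serializes an instance's __dict__ and refers to the
    # class by name; attributes set on the class are not part of that state.
    return dict(vars(obj))
```

Only the instance-level variant has the analyzer in the state that would be sent to a spawned worker.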
Adriane Boyd
b8e720fdb9
Fix Azerbaijani init, extend lang init tests ( #8656 )
...
* Extend langs in initialize tests
* Fix az init
2021-07-09 15:36:35 +02:00
Ines Montani
1c0ed22d1e
Merge pull request #8573 from julien-talkair/code-quality-pre-commit
2021-07-09 23:09:24 +10:00
Ines Montani
bbca56687f
Merge pull request #8655 from explosion/autoblack
...
Auto-format code with black
2021-07-09 23:08:05 +10:00
explosion-bot
334f1f98d8
Auto-format code with black
2021-07-09 08:06:06 +00:00
Adriane Boyd
363230de19
Add Macedonian models to website ( #8637 )
2021-07-08 09:32:41 +02:00
Adriane Boyd
1ee5bee29d
Add Macedonian models to website ( #8637 )
2021-07-08 09:32:14 +02:00
Paul O'Leary McCann
d0b041aff4
Switch to using Thinc tuplify
...
The tuplify code here was added to Thinc proper and that's been
released, so no need to have it here any more.
2021-07-08 16:08:36 +09:00
Suqi Sun
20a2beafb5
Update pip
2021-07-08 15:09:52 +09:00
Suqi Sun
c61ecb6f7c
Update pip and code example
2021-07-08 15:09:52 +09:00
Suqi Sun
f011126ebd
Add forte to universe.json
2021-07-08 15:09:52 +09:00
Paul O'Leary McCann
1d9209d43a
Merge pull request #8547 from mylibrar/update-universe
...
Add forte to universe.json
2021-07-08 14:59:49 +09:00
Ines Montani
cdc0d669c1
Add code preview for textcat_multilabel [ci skip]
2021-07-08 13:33:33 +10:00