muratjumashev
4418ec2eee
Add punctuation
2021-01-23 21:31:31 +06:00
muratjumashev
101d265778
Add stopwords
2021-01-23 21:25:28 +06:00
muratjumashev
28d06ab860
Add tokenizer_exceptions
2021-01-22 23:08:41 +06:00
Sofie Van Landeghem
5ace559201
ensure span.text works for an empty span (#6772)
2021-01-21 23:18:46 +08:00
Sofie Van Landeghem
fdf8c77630
support IS_SENT_START in PhraseMatcher (#6771)
...
* support IS_SENT_START in PhraseMatcher
* add unit test and friendlier error
* use IDS.get instead
2021-01-21 09:59:17 +01:00
Adriane Boyd
bc7d83d4be
Skip 0-length matches (#6759)
...
Add hack to prevent matcher from returning 0-length matches.
2021-01-19 07:38:11 +08:00
Santiago Castro
28256522c8
Fix spacy.util.minibatch when the size iterator is finished (#6745)
2021-01-17 19:48:43 +08:00
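One way the failure named in the commit above can arise: if a generator calls `next()` on a finite size iterator and that iterator runs out, the resulting `StopIteration` is turned into a `RuntimeError` on Python 3.7+ (PEP 479). The sketch below is an illustration only, not the code from the pull request; the name `minibatch_sketch` and the choice to stop batching once the sizes run out are assumptions.

```python
import itertools


def minibatch_sketch(items, size=8):
    """Yield lists of items, taking each batch size from `size`.

    Hypothetical helper for illustration; not spaCy's implementation.
    `size` may be an int or an iterable of batch sizes.
    """
    sizes = itertools.repeat(size) if isinstance(size, int) else iter(size)
    items = iter(items)
    # Iterating with `for` ends cleanly when the size iterator is
    # exhausted, instead of leaking StopIteration from a bare next().
    for batch_size in sizes:
        batch = list(itertools.islice(items, int(batch_size)))
        if not batch:
            break
        yield batch
```

With an integer size the helper batches until the items run out; with a finite size iterator it stops after the last size is consumed (an assumption of this sketch, which leaves any remaining items unbatched).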
Adriane Boyd
e649242927
Prevent overlapping noun chunks for Spanish (#6712)
...
* Prevent overlapping noun chunks in Spanish noun chunk iterator
* Clean up similar code in Danish noun chunk iterator
2021-01-14 17:33:31 +11:00
Adriane Boyd
9957ed7897
Override language defaults for null token and URL match (#6705)
...
* Override language defaults for null token and URL match
When the serialized `token_match` or `url_match` is `None`, override the
language defaults to preserve `None` on deserialization.
* Fix fixtures in tests
2021-01-14 17:31:29 +11:00
Ines Montani
29c3ca7e34
Fix SVG integration [ci skip]
2021-01-14 13:33:41 +11:00
Antonio Miras
b4bd8f347a
spaCy Universe: New project; SpacyDotNet (#6702)
...
* Universe: SpacyDotNet a .NET Core spaCy wrapper
* Signed contributor agreement
Co-authored-by: Antonio Miras <antonio@amiras.net>
2021-01-13 12:47:30 +11:00
Alex Combessie
9cc880014c
Remove questionable French stopwords (#6310)
...
* Remove questionable French stopwords
* Create alexcombessie.md
2021-01-08 11:36:22 +11:00
Cristiana S Parada
7a0222f260
Update stop_words.py in Portuguese (a,o,e) (#6345)
...
* Update stop_words.py
Added three additional stopwords: "a" and "o", which mean "the", and "e", which means "and"
* Create cristianasp.md
* zero edit to push CI
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-01-08 11:35:38 +11:00
Lorena Ciutacu
f11002f1f1
add new Romanian stopwords (#6621)
...
* add contributor agreement
* update ro stopwords list
* add new stopwords
2021-01-08 11:34:47 +11:00
ophelielacroix
e3222fdec9
Add (noun chunks) syntax iterators for Danish (#6246)
...
* add syntax iterators for danish
* add test noun chunks for danish syntax iterators
* add contributor agreement
* update da syntax iterators to remove nested chunks
* add tests for da noun chunks
* Fix test
* add missing import
* fix example
* Prevent overlapping noun chunks
Prevent overlapping noun chunks by tracking the end index of the
previous noun chunk span.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-01-07 16:33:00 +11:00
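The approach described in the commit above, tracking the end index of the previous noun chunk span, can be sketched in plain Python. This is a minimal illustration rather than the code from the pull request: the function name `drop_overlapping` and the half-open `(start, end)` tuple representation are assumptions, and candidates are assumed to arrive in document order.

```python
def drop_overlapping(candidates):
    """Yield (start, end) spans that do not overlap an earlier yielded span.

    Hypothetical helper for illustration. Spans are half-open token
    ranges [start, end) and must be sorted by start offset.
    """
    prev_end = 0  # exclusive end index of the last span that was kept
    for start, end in candidates:
        if start < prev_end:
            # This candidate begins inside the previous chunk: skip it.
            continue
        prev_end = end
        yield start, end
```

For example, given the candidates `(0, 3)`, `(2, 5)`, `(5, 7)` and `(6, 8)`, only `(0, 3)` and `(5, 7)` are kept, because the other two start before the previously kept chunk has ended.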
Sofie Van Landeghem
6f7e7d88b9
remove cause without apostrophe from norm exceptions (#6636)
2021-01-06 12:30:30 +08:00
Sofie Van Landeghem
87562e470d
fix backticks in docs (#6635)
2020-12-27 22:12:37 +01:00
Sofie Van Landeghem
8df5b7f513
fix documentation of 'path' in tokenizer.to_disk (#6634)
2020-12-27 22:01:06 +01:00
Yosi
cf52510631
Add Amharic አማርኛ Language support (#6583)
...
* Add Amharic to spaCy
* clean up
* Add some PRON_LEMMA
* add Tigrinya support
* remove text_noun_chunks
* Tigrinya Support
* added some more details for ti
* fix unit test
* add amharic char range
* changes from review
* amharic and tigrinya share same unicode block
* get rid of _amharic/_tigrinya in char_classes
Co-authored-by: Josiah Solomon <jsolomon@meteorcomm.com>
2020-12-22 16:50:34 +01:00
Tim Gates
292c1d6a73
docs: fix simple typo, speficied -> specified (#6611)
...
There is a small typo in spacy/cli/info.py.
Should read `specified` rather than `speficied`.
2020-12-22 09:14:10 +01:00
Gareth Sparks
efc229c3f4
Doc.char_span arg: alignment_mode (#6591)
...
Currently labeled "mode", actually "alignment_mode"
2020-12-18 09:54:56 +01:00
Ines Montani
7c9a2f298c
Merge pull request #6578 from jenojp/master [ci skip]
2020-12-16 17:31:55 +11:00
Ines Montani
d8aa113d16
Merge pull request #6566 from rafguns/cite-zenodo [ci skip]
2020-12-16 16:40:50 +11:00
Ines Montani
4feef6bf9f
Update citation
2020-12-16 15:59:57 +11:00
Jeno Pizarro
a6fe35a0f9
Update universe.json
2020-12-15 21:53:20 -05:00
Jeno Pizarro
343a44abe9
Merge branch 'master' of https://github.com/explosion/spaCy
2020-12-15 21:49:46 -05:00
Thomas Bird
f6e4378942
Add SCA for @thomasbird (#6576)
2020-12-15 20:59:47 +01:00
Raf Guns
ec876c9713
Merge branch 'master' of https://github.com/explosion/spaCy into cite-zenodo
2020-12-14 22:03:58 +01:00
Raf Guns
db2a34d610
Update CITATION to Zenodo
2020-12-14 22:01:24 +01:00
Raf Guns
a90ca0e1fb
Add contributor agreement
2020-12-14 22:01:14 +01:00
Ines Montani
1d4b1dea25
Update contributing guide and issue template [ci skip]
2020-12-11 13:39:26 +11:00
Ines Montani
37c5d7e826
Merge pull request #6542 from adrianeboyd/chore/prepare-v2.3.5
...
Set version to v2.3.5
2020-12-11 10:33:18 +11:00
Ines Montani
fb43a30a71
Merge pull request #6545 from svlandeg/feature/discussions [ci skip]
2020-12-11 10:20:35 +11:00
Ines Montani
76cfd89dea
Update site.json
2020-12-11 10:19:42 +11:00
Ines Montani
c9b67b02f8
Update issue templates
2020-12-11 10:05:47 +11:00
Ines Montani
43a69eecb7
Update site.json
2020-12-11 10:05:21 +11:00
Ines Montani
73896fcbc8
Update README.md
2020-12-11 10:05:19 +11:00
Ines Montani
25186fa431
Merge pull request #6543 from adrianeboyd/docs/install-v2
...
Docs and extras updates for v2.3.5
2020-12-11 09:53:53 +11:00
svlandeg
4afcd9567e
refer to GH discussions
2020-12-10 20:56:12 +01:00
svlandeg
d156b423ae
remove gitter and reddit links
2020-12-10 20:41:02 +01:00
svlandeg
5afa567767
replace gitter with discussions in 101
2020-12-10 20:17:36 +01:00
svlandeg
ae1ccf2b04
update link to discussion forum
2020-12-10 20:02:49 +01:00
svlandeg
52cdb12d26
add GH discussions to readme
2020-12-10 19:58:43 +01:00
Adriane Boyd
27bb75e2a0
Docs and extras updates for v2.3.5
...
* Update install instructions for updated packages
* Add `cuda110` and `cuda111` extras, remove upper `cupy` pins (only
compatible with `thinc>=7.4.4`)
2020-12-10 15:34:34 +01:00
Adriane Boyd
7b277661f6
Set version to v2.3.5
2020-12-10 13:32:10 +01:00
Koichi Yasuoka
0afb54ac93
JapaneseTokenizer.pipe added (#6515)
...
* JapaneseTokenizer.pipe added
For [spacymoji](https://spacy.io/universe/project/spacymoji) with `Japanese()`.
* DummyTokenizer.pipe added instead
2020-12-08 20:02:23 +01:00
Adriane Boyd
df4891bed1
Remove blis python version constraints (#6522)
...
* Remove blis version constraints
After updating the blis sdist in v0.7.4, remove python version
constraints for blis build and install dependencies.
* Install sdist with --prefer-binary for python 3.5
* Fix duplicate sdist install steps
* Fix sdist install step types
* Fix blis pins in requirements.txt
* Remove wheel hack for python 3.5 from CI
2020-12-08 15:25:19 +01:00
Ines Montani
4e77349106
Merge pull request #6524 from adrianeboyd/bugfix/entity-ruler-subsequent
...
Fix subsequent pipe detection in EntityRuler
2020-12-08 22:17:28 +11:00
Adriane Boyd
6c221d4841
Fix subsequent pipe detection in EntityRuler
...
Fix subsequent pipe detection to detect the position of the current
object by comparing the component itself rather than from the factory
name.
2020-12-08 10:01:30 +01:00
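The difference between locating a component by factory name and by the component object itself can be shown with a small sketch. It assumes a pipeline modelled as a list of `(name, component)` pairs; the class `Component` and both helper functions are hypothetical and are not taken from spaCy's source.

```python
class Component:
    """Hypothetical stand-in for a pipeline component."""

    def __init__(self, factory):
        self.factory = factory


def names_after_by_factory(pipeline, component):
    # Finds the FIRST component with the same factory name, which is
    # the wrong position when two components share a factory.
    for i, (_, pipe) in enumerate(pipeline):
        if pipe.factory == component.factory:
            return [name for name, _ in pipeline[i + 1:]]
    return []


def names_after_by_identity(pipeline, component):
    # Compares the object itself, so the position is always that of
    # this exact component.
    for i, (_, pipe) in enumerate(pipeline):
        if pipe is component:
            return [name for name, _ in pipeline[i + 1:]]
    return []


ruler_a = Component("entity_ruler")
ner = Component("ner")
ruler_b = Component("entity_ruler")
pipeline = [("ruler_a", ruler_a), ("ner", ner), ("ruler_b", ruler_b)]
```

In this setup `ruler_b` is last in the pipeline, so nothing follows it. The identity check reports that correctly, while the factory-name lookup matches `ruler_a` first and reports `ner` and `ruler_b` as subsequent pipes.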
Ines Montani
b87793a89a
Merge pull request #6523 from adrianeboyd/bugfix/remove-use-chars
...
Remove non-working --use-chars from train CLI
2020-12-08 09:30:48 +01:00