Commit Graph

11732 Commits

Author SHA1 Message Date
muratjumashev
101d265778 Add stopwords 2021-01-23 21:25:28 +06:00
muratjumashev
28d06ab860 Add tokenizer_exceptions 2021-01-22 23:08:41 +06:00
Sofie Van Landeghem
5ace559201
ensure span.text works for an empty span (#6772) 2021-01-21 23:18:46 +08:00
Sofie Van Landeghem
fdf8c77630
support IS_SENT_START in PhraseMatcher (#6771)
* support IS_SENT_START in PhraseMatcher

* add unit test and friendlier error

* use IDS.get instead
2021-01-21 09:59:17 +01:00
Adriane Boyd
bc7d83d4be
Skip 0-length matches (#6759)
Add hack to prevent matcher from returning 0-length matches.
2021-01-19 07:38:11 +08:00
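Note on the commit above: the message does not show how the matcher drops 0-length matches, so the following is only a minimal sketch of the idea, not the actual change. It assumes matches are `(match_id, start, end)` tuples with an exclusive `end`; the helper name `drop_empty_matches` is hypothetical.

```python
from typing import List, Tuple

# Assumed match shape: (match_id, start, end), with `end` exclusive.
Match = Tuple[int, int, int]


def drop_empty_matches(matches: List[Match]) -> List[Match]:
    """Keep only matches that cover at least one token (end > start)."""
    return [(match_id, start, end) for match_id, start, end in matches if end > start]


# Illustrative data only: the middle match spans zero tokens.
matches = [(1, 0, 2), (1, 2, 2), (2, 3, 4)]
non_empty = drop_empty_matches(matches)
```

Filtering after matching keeps the sketch independent of how the patterns themselves are evaluated.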
Santiago Castro
28256522c8
Fix spacy.util.minibatch when the size iterator is finished (#6745) 2021-01-17 19:48:43 +08:00
Adriane Boyd
e649242927
Prevent overlapping noun chunks for Spanish (#6712)
* Prevent overlapping noun chunks in Spanish noun chunk iterator
* Clean up similar code in Danish noun chunk iterator
2021-01-14 17:33:31 +11:00
Adriane Boyd
9957ed7897
Override language defaults for null token and URL match (#6705)
* Override language defaults for null token and URL match

When the serialized `token_match` or `url_match` is `None`, override the
language defaults to preserve `None` on deserialization.

* Fix fixtures in tests
2021-01-14 17:31:29 +11:00
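Note on the commit above: one way to picture "preserve `None` on deserialization" is to distinguish a key that is absent from the serialized data (fall back to the language default) from a key that is present with the value `None` (keep `None`). The sketch below assumes the settings are stored as regex pattern strings; `DEFAULT_PATTERNS` and `restore_matchers` are hypothetical names, not spaCy's API.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical language defaults, stored as regex pattern strings.
DEFAULT_PATTERNS: Dict[str, str] = {
    "token_match": r"\d+",
    "url_match": r"https?://\S+",
}


def restore_matchers(serialized: Dict[str, Optional[str]]) -> Dict[str, Optional[Callable]]:
    """Rebuild matcher callables, keeping an explicitly serialized None as None."""
    restored: Dict[str, Optional[Callable]] = {}
    for key, default in DEFAULT_PATTERNS.items():
        if key in serialized:
            pattern = serialized[key]
            # An explicit None overrides the default instead of being replaced by it.
            restored[key] = re.compile(pattern).match if pattern is not None else None
        else:
            # Key missing entirely: fall back to the language default.
            restored[key] = re.compile(default).match
    return restored


restored = restore_matchers({"token_match": None})
```

Here `token_match` stays `None` because it was serialized that way, while `url_match` falls back to the default because it was never serialized.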
Ines Montani
29c3ca7e34 Fix SVG integration [ci skip] 2021-01-14 13:33:41 +11:00
Antonio Miras
b4bd8f347a
spaCy Universe: New project; SpacyDotNet (#6702)
* Universe: SpacyDotNet a .NET Core spaCy wrapper

* Signed contributor agreement

Co-authored-by: Antonio Miras <antonio@amiras.net>
2021-01-13 12:47:30 +11:00
Alex Combessie
9cc880014c
Remove questionable French stopwords (#6310)
* Remove questionable French stopwords

* Create alexcombessie.md
2021-01-08 11:36:22 +11:00
Cristiana S Parada
7a0222f260
Update stop_words.py in Portuguese (a,o,e) (#6345)
* Update stop_words.py

Added three additional stopwords: "a" and "o", which mean "the", and "e", which means "and"

* Create cristianasp.md

* zero edit to push CI

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-01-08 11:35:38 +11:00
Lorena Ciutacu
f11002f1f1
add new Romanian stopwords (#6621)
* add contributor agreement

* update ro stopwords list

* add new stopwords
2021-01-08 11:34:47 +11:00
ophelielacroix
e3222fdec9
Add (noun chunks) syntax iterators for Danish (#6246)
* add syntax iterators for danish

* add test noun chunks for danish syntax iterators

* add contributor agreement

* update da syntax iterators to remove nested chunks

* add tests for da noun chunks

* Fix test

* add missing import
* fix example

* Prevent overlapping noun chunks

Prevent overlapping noun chunks by tracking the end index of the
previous noun chunk span.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-01-07 16:33:00 +11:00
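Note on the commit above: the approach it describes, tracking the end index of the previous noun chunk span, can be sketched without spaCy. The version below assumes candidate chunks arrive in document order as `(start, end)` token offsets with an exclusive `end`; the function name `non_overlapping` is hypothetical and the real iterator works on dependency parses rather than plain tuples.

```python
from typing import Iterable, Iterator, Tuple

# Assumed span shape: (start, end) token offsets, with `end` exclusive.
Span = Tuple[int, int]


def non_overlapping(candidates: Iterable[Span]) -> Iterator[Span]:
    """Yield candidates in order, skipping any that start before the last yielded span ended."""
    prev_end = -1  # end index of the previously yielded span
    for start, end in candidates:
        if start < prev_end:
            continue  # overlaps the previous chunk, so drop it
        prev_end = end
        yield (start, end)


# Illustrative candidates: (2, 4) and (5, 6) overlap an earlier chunk.
chunks = list(non_overlapping([(0, 3), (2, 4), (4, 6), (5, 6)]))
```

Because only the previous end index is kept, the check is a single comparison per candidate and needs no lookback over earlier spans.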
Sofie Van Landeghem
6f7e7d88b9
remove cause without apostrophe from norm exceptions (#6636) 2021-01-06 12:30:30 +08:00
Sofie Van Landeghem
87562e470d
fix backticks in docs (#6635) 2020-12-27 22:12:37 +01:00
Sofie Van Landeghem
8df5b7f513
fix documentation of 'path' in tokenizer.to_disk (#6634) 2020-12-27 22:01:06 +01:00
Yosi
cf52510631
Add Amharic አማርኛ Language support (#6583)
* Add Amharic to spaCy

* clean up

* Add some PRON_LEMMA

* add Tigrinya support

* remove text_noun_chunks

* Tigrinya Support

* added some more details for ti

* fix unit test

* add amharic char range

* changes from review

* amharic and tigrinya share same unicode block

* get rid of _amharic/_tigrinya in char_classes

Co-authored-by: Josiah Solomon <jsolomon@meteorcomm.com>
2020-12-22 16:50:34 +01:00
Tim Gates
292c1d6a73
docs: fix simple typo, speficied -> specified (#6611)
There is a small typo in spacy/cli/info.py.

Should read `specified` rather than `speficied`.
2020-12-22 09:14:10 +01:00
Gareth Sparks
efc229c3f4
Doc.char_span arg: alignment_mode (#6591)
Currently labeled "mode", actually "alignment_mode"
2020-12-18 09:54:56 +01:00
Ines Montani
7c9a2f298c
Merge pull request #6578 from jenojp/master [ci skip] 2020-12-16 17:31:55 +11:00
Ines Montani
d8aa113d16
Merge pull request #6566 from rafguns/cite-zenodo [ci skip] 2020-12-16 16:40:50 +11:00
Ines Montani
4feef6bf9f
Update citation 2020-12-16 15:59:57 +11:00
Jeno Pizarro
a6fe35a0f9
Update universe.json 2020-12-15 21:53:20 -05:00
Jeno Pizarro
343a44abe9 Merge branch 'master' of https://github.com/explosion/spaCy 2020-12-15 21:49:46 -05:00
Thomas Bird
f6e4378942
Add SCA for @thomasbird (#6576) 2020-12-15 20:59:47 +01:00
Raf Guns
ec876c9713 Merge branch 'master' of https://github.com/explosion/spaCy into cite-zenodo 2020-12-14 22:03:58 +01:00
Raf Guns
db2a34d610 Update CITATION to Zenodo 2020-12-14 22:01:24 +01:00
Raf Guns
a90ca0e1fb Add contributor agreement 2020-12-14 22:01:14 +01:00
Ines Montani
1d4b1dea25 Update contributing guide and issue template [ci skip] 2020-12-11 13:39:26 +11:00
Ines Montani
37c5d7e826
Merge pull request #6542 from adrianeboyd/chore/prepare-v2.3.5
Set version to v2.3.5
2020-12-11 10:33:18 +11:00
Ines Montani
fb43a30a71
Merge pull request #6545 from svlandeg/feature/discussions [ci skip] 2020-12-11 10:20:35 +11:00
Ines Montani
76cfd89dea Update site.json 2020-12-11 10:19:42 +11:00
Ines Montani
c9b67b02f8 Update issue templates 2020-12-11 10:05:47 +11:00
Ines Montani
43a69eecb7 Update site.json 2020-12-11 10:05:21 +11:00
Ines Montani
73896fcbc8 Update README.md 2020-12-11 10:05:19 +11:00
Ines Montani
25186fa431
Merge pull request #6543 from adrianeboyd/docs/install-v2
Docs and extras updates for v2.3.5
2020-12-11 09:53:53 +11:00
svlandeg
4afcd9567e refer to GH discussions 2020-12-10 20:56:12 +01:00
svlandeg
d156b423ae remove gitter and reddit links 2020-12-10 20:41:02 +01:00
svlandeg
5afa567767 replace gitter with discussions in 101 2020-12-10 20:17:36 +01:00
svlandeg
ae1ccf2b04 update link to discussion forum 2020-12-10 20:02:49 +01:00
svlandeg
52cdb12d26 add GH discussions to readme 2020-12-10 19:58:43 +01:00
Adriane Boyd
27bb75e2a0 Docs and extras updates for v2.3.5
* Update install instructions for updated packages

* Add `cuda110` and `cuda111` extras, remove upper `cupy` pins (only
compatible with `thinc>=7.4.4`)
2020-12-10 15:34:34 +01:00
Adriane Boyd
7b277661f6 Set version to v2.3.5 2020-12-10 13:32:10 +01:00
Koichi Yasuoka
0afb54ac93
JapaneseTokenizer.pipe added (#6515)
* JapaneseTokenizer.pipe added

For [spacymoji](https://spacy.io/universe/project/spacymoji) with `Japanese()`.

* DummyTokenizer.pipe added instead
2020-12-08 20:02:23 +01:00
Adriane Boyd
df4891bed1
Remove blis python version constraints (#6522)
* Remove blis version constraints

After updating the blis sdist in v0.7.4, remove python version
constraints for blis build and install dependencies.

* Install sdist with --prefer-binary for python 3.5

* Fix duplicate sdist install steps

* Fix sdist install step types

* Fix blis pins in requirements.txt

* Remove wheel hack for python 3.5 from CI
2020-12-08 15:25:19 +01:00
Ines Montani
4e77349106
Merge pull request #6524 from adrianeboyd/bugfix/entity-ruler-subsequent
Fix subsequent pipe detection in EntityRuler
2020-12-08 22:17:28 +11:00
Adriane Boyd
6c221d4841 Fix subsequent pipe detection in EntityRuler
Fix subsequent pipe detection to detect the position of the current
object by comparing the component itself rather than from the factory
name.
2020-12-08 10:01:30 +01:00
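Note on the commit above: looking a component up by its factory name is ambiguous when the same factory appears more than once in a pipeline, whereas comparing the object itself is not. The sketch below illustrates that idea on a plain list of `(name, component)` pairs; `names_after` and the `Ruler` class are hypothetical stand-ins, not spaCy's implementation.

```python
from typing import Any, List, Tuple

# Assumed pipeline shape: ordered (name, component) pairs.
Pipeline = List[Tuple[str, Any]]


def names_after(pipeline: Pipeline, component: Any) -> List[str]:
    """Return the names of all pipes that come after `component`, located by identity."""
    for i, (_, proc) in enumerate(pipeline):
        if proc is component:  # compare the object itself, not its name
            return [name for name, _ in pipeline[i + 1:]]
    return []  # component is not in the pipeline


class Ruler:
    """Hypothetical stand-in for a component created twice from the same factory."""


first, second = Ruler(), Ruler()
pipeline = [("ruler_a", first), ("ner", object()), ("ruler_b", second), ("merger", object())]
after_first = names_after(pipeline, first)
after_second = names_after(pipeline, second)
```

With two rulers in the pipeline, each instance finds its own position, so the list of subsequent pipes differs for `first` and `second`.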
Ines Montani
b87793a89a
Merge pull request #6523 from adrianeboyd/bugfix/remove-use-chars
Remove non-working --use-chars from train CLI
2020-12-08 09:30:48 +01:00
Adriane Boyd
5ceac425ee Remove non-working --use-chars from train CLI
Remove the non-working `--use-chars` option from the train CLI. The
implementation of the option across component types and the CLI settings
could be fixed, but the `CharacterEmbed` model does not work on GPU in
v2 so it's better to remove it.
2020-12-08 08:30:00 +01:00