Commit Graph

11737 Commits

Author SHA1 Message Date
muratjumashev
fe3b5b8ff5 Add kyrgyz to char_classes 2021-01-23 21:53:41 +06:00
muratjumashev
e30bbf5432 Add examples 2021-01-23 21:49:08 +06:00
muratjumashev
2f385385a9 Remove comment 2021-01-23 21:36:28 +06:00
muratjumashev
d53724ba1d Add lex_attrs 2021-01-23 21:35:25 +06:00
muratjumashev
4418ec2eee Add punctuation 2021-01-23 21:31:31 +06:00
muratjumashev
101d265778 Add stopwords 2021-01-23 21:25:28 +06:00
muratjumashev
28d06ab860 Add tokenizer_exceptions 2021-01-22 23:08:41 +06:00
Sofie Van Landeghem
5ace559201
ensure span.text works for an empty span (#6772) 2021-01-21 23:18:46 +08:00
Sofie Van Landeghem
fdf8c77630
support IS_SENT_START in PhraseMatcher (#6771)
* support IS_SENT_START in PhraseMatcher

* add unit test and friendlier error

* use IDS.get instead
2021-01-21 09:59:17 +01:00
Adriane Boyd
bc7d83d4be
Skip 0-length matches (#6759)
Add hack to prevent matcher from returning 0-length matches.
2021-01-19 07:38:11 +08:00
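The effect of this change can be illustrated with a minimal sketch. It assumes matches are `(match_id, start, end)` token-offset tuples, which is the shape spaCy's `Matcher` returns. The helper `drop_empty_matches` is hypothetical and is not the actual patch, which lives inside the matcher itself.

```python
from typing import List, Tuple

# (match_id, start, end) token offsets, end exclusive
Match = Tuple[int, int, int]


def drop_empty_matches(matches: List[Match]) -> List[Match]:
    """Hypothetical helper: keep only matches that cover at least one token."""
    return [(match_id, start, end) for match_id, start, end in matches if end > start]


print(drop_empty_matches([(1, 0, 2), (1, 3, 3), (2, 4, 5)]))
```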
Santiago Castro
28256522c8
Fix spacy.util.minibatch when the size iterator is finished (#6745) 2021-01-17 19:48:43 +08:00
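The following is a minimal sketch of a `minibatch`-style generator that stops cleanly once a finite size iterator runs out. It assumes the original failure was an uncaught `StopIteration` from `next()` on the exhausted size iterator; under PEP 479, that exception becomes a `RuntimeError` inside a generator. This sketch is not the actual diff from #6745.

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar, Union

T = TypeVar("T")


def minibatch(items: Iterable[T], size: Union[int, Iterable[int]] = 8) -> Iterator[List[T]]:
    """Yield lists of items; `size` is a fixed int or an iterable of batch sizes."""
    sizes = itertools.repeat(size) if isinstance(size, int) else iter(size)
    items = iter(items)
    while True:
        try:
            batch_size = next(sizes)
        except StopIteration:
            # Letting StopIteration escape a generator raises RuntimeError
            # (PEP 479), so stop explicitly when the sizes are exhausted.
            return
        batch = list(itertools.islice(items, int(batch_size)))
        if not batch:
            return
        yield batch
```

With a finite size iterator such as `[2, 3]`, iteration ends after two batches instead of raising an error.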
Adriane Boyd
e649242927
Prevent overlapping noun chunks for Spanish (#6712)
* Prevent overlapping noun chunks in Spanish noun chunk iterator
* Clean up similar code in Danish noun chunk iterator
2021-01-14 17:33:31 +11:00
Adriane Boyd
9957ed7897
Override language defaults for null token and URL match (#6705)
* Override language defaults for null token and URL match

When the serialized `token_match` or `url_match` is `None`, override the
language defaults to preserve `None` on deserialization.

* Fix fixtures in tests
2021-01-14 17:31:29 +11:00
Ines Montani
29c3ca7e34 Fix SVG integration [ci skip] 2021-01-14 13:33:41 +11:00
Antonio Miras
b4bd8f347a
spaCy Universe: New project; SpacyDotNet (#6702)
* Universe: SpacyDotNet, a .NET Core spaCy wrapper

* Signed contributor agreement

Co-authored-by: Antonio Miras <antonio@amiras.net>
2021-01-13 12:47:30 +11:00
Alex Combessie
9cc880014c
Remove questionable French stopwords (#6310)
* Remove questionable French stopwords

* Create alexcombessie.md
2021-01-08 11:36:22 +11:00
Cristiana S Parada
7a0222f260
Update stop_words.py in Portuguese (a,o,e) (#6345)
* Update stop_words.py

Added three additional stopwords: "a" and "o", which mean "the", and "e", which means "and"

* Create cristianasp.md

* zero edit to push CI

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-01-08 11:35:38 +11:00
Lorena Ciutacu
f11002f1f1
add new Romanian stopwords (#6621)
* add contributor agreement

* update ro stopwords list

* add new stopwords
2021-01-08 11:34:47 +11:00
ophelielacroix
e3222fdec9
Add (noun chunks) syntax iterators for Danish (#6246)
* add syntax iterators for Danish

* add test noun chunks for Danish syntax iterators

* add contributor agreement

* update da syntax iterators to remove nested chunks

* add tests for da noun chunks

* Fix test

* add missing import

* fix example

* Prevent overlapping noun chunks

Prevent overlapping noun chunks by tracking the end index of the
previous noun chunk span.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-01-07 16:33:00 +11:00
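The overlap rule in the last bullet (track the end of the previous noun chunk and skip any candidate that starts before it) can be sketched independently of spaCy. This sketch assumes candidate chunks arrive in document order as `(start, end)` token offsets with an exclusive end. The real iterators work on `Doc` tokens and dependency labels, and the name `non_overlapping` is hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def non_overlapping(candidates: Iterable[Span]) -> Iterator[Span]:
    """Yield candidate spans in order, skipping any that overlap the last one kept."""
    prev_end = -1  # exclusive end of the previously yielded span
    for start, end in candidates:
        if start < prev_end:
            # Starts inside the previous chunk: would overlap or nest, so skip it.
            continue
        prev_end = end
        yield start, end
```

For candidates `[(0, 3), (1, 2), (3, 5), (4, 6)]`, only `(0, 3)` and `(3, 5)` are kept.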
Sofie Van Landeghem
6f7e7d88b9
remove cause without apostrophe from norm exceptions (#6636) 2021-01-06 12:30:30 +08:00
Sofie Van Landeghem
87562e470d
fix backticks in docs (#6635) 2020-12-27 22:12:37 +01:00
Sofie Van Landeghem
8df5b7f513
fix documentation of 'path' in tokenizer.to_disk (#6634) 2020-12-27 22:01:06 +01:00
Yosi
cf52510631
Add Amharic አማርኛ Language support (#6583)
* Add Amharic to spaCy

* clean up

* Add some PRON_LEMMA

* add Tigrinya support

* remove text_noun_chunks

* Tigrinya Support

* added some more details for ti

* fix unit test

* add amharic char range

* changes from review

* amharic and tigrinya share same unicode block

* get rid of _amharic/_tigrinya in char_classes

Co-authored-by: Josiah Solomon <jsolomon@meteorcomm.com>
2020-12-22 16:50:34 +01:00
Tim Gates
292c1d6a73
docs: fix simple typo, speficied -> specified (#6611)
There is a small typo in spacy/cli/info.py.

Should read `specified` rather than `speficied`.
2020-12-22 09:14:10 +01:00
Gareth Sparks
efc229c3f4
Doc.char_span arg: alignment_mode (#6591)
Currently documented as "mode"; the argument is actually named "alignment_mode"
2020-12-18 09:54:56 +01:00
Ines Montani
7c9a2f298c
Merge pull request #6578 from jenojp/master [ci skip] 2020-12-16 17:31:55 +11:00
Ines Montani
d8aa113d16
Merge pull request #6566 from rafguns/cite-zenodo [ci skip] 2020-12-16 16:40:50 +11:00
Ines Montani
4feef6bf9f
Update citation 2020-12-16 15:59:57 +11:00
Jeno Pizarro
a6fe35a0f9
Update universe.json 2020-12-15 21:53:20 -05:00
Jeno Pizarro
343a44abe9 Merge branch 'master' of https://github.com/explosion/spaCy 2020-12-15 21:49:46 -05:00
Thomas Bird
f6e4378942
Add SCA for @thomasbird (#6576) 2020-12-15 20:59:47 +01:00
Raf Guns
ec876c9713 Merge branch 'master' of https://github.com/explosion/spaCy into cite-zenodo 2020-12-14 22:03:58 +01:00
Raf Guns
db2a34d610 Update CITATION to Zenodo 2020-12-14 22:01:24 +01:00
Raf Guns
a90ca0e1fb Add contributor agreement 2020-12-14 22:01:14 +01:00
Ines Montani
1d4b1dea25 Update contributing guide and issue template [ci skip] 2020-12-11 13:39:26 +11:00
Ines Montani
37c5d7e826
Merge pull request #6542 from adrianeboyd/chore/prepare-v2.3.5
Set version to v2.3.5
2020-12-11 10:33:18 +11:00
Ines Montani
fb43a30a71
Merge pull request #6545 from svlandeg/feature/discussions [ci skip] 2020-12-11 10:20:35 +11:00
Ines Montani
76cfd89dea Update site.json 2020-12-11 10:19:42 +11:00
Ines Montani
c9b67b02f8 Update issue templates 2020-12-11 10:05:47 +11:00
Ines Montani
43a69eecb7 Update site.json 2020-12-11 10:05:21 +11:00
Ines Montani
73896fcbc8 Update README.md 2020-12-11 10:05:19 +11:00
Ines Montani
25186fa431
Merge pull request #6543 from adrianeboyd/docs/install-v2
Docs and extras updates for v2.3.5
2020-12-11 09:53:53 +11:00
svlandeg
4afcd9567e refer to GH discussions 2020-12-10 20:56:12 +01:00
svlandeg
d156b423ae remove gitter and reddit links 2020-12-10 20:41:02 +01:00
svlandeg
5afa567767 replace gitter with discussions in 101 2020-12-10 20:17:36 +01:00
svlandeg
ae1ccf2b04 update link to discussion forum 2020-12-10 20:02:49 +01:00
svlandeg
52cdb12d26 add GH discussions to readme 2020-12-10 19:58:43 +01:00
Adriane Boyd
27bb75e2a0 Docs and extras updates for v2.3.5
* Update install instructions for updated packages

* Add `cuda110` and `cuda111` extras, remove upper `cupy` pins (only
compatible with `thinc>=7.4.4`)
2020-12-10 15:34:34 +01:00
Adriane Boyd
7b277661f6 Set version to v2.3.5 2020-12-10 13:32:10 +01:00
Koichi Yasuoka
0afb54ac93
JapaneseTokenizer.pipe added (#6515)
* JapaneseTokenizer.pipe added

For [spacymoji](https://spacy.io/universe/project/spacymoji) with `Japanese()`.

* DummyTokenizer.pipe added instead
2020-12-08 20:02:23 +01:00