Ines Montani
8d293a4c4b
Update website to support legacy state [ci skip]
2021-01-30 18:27:31 +11:00
Ines Montani
8ddf53f8e1
Merge pull request #6857 from tupui/patch-1
2021-01-30 12:07:05 +11:00
Pamphile ROY
e496b8623f
SCA tupui
2021-01-29 15:46:53 +01:00
Pamphile ROY
41ee75ac6d
Remove --no-cache-dir when downloading models
When `--no-cache-dir` is present, it prevents caching from functioning properly.
Users who still want this behavior can pass such options through `user_pip_args`.
Options like these should not be enforced, though. In my case, this prevents some Docker builds (using BuildKit caching) from caching models properly.
2021-01-29 15:37:44 +01:00
Adriane Boyd
4096a79de7
Add alignment mode error and fix Doc.char_span docs (#6820)
* Raise an error on an unrecognized alignment mode rather than
defaulting to `strict`
* Fix the `Doc.char_span` API doc alignment mode details
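The first bullet replaces a silent fallback with an explicit error. Below is a minimal sketch of that validation pattern. It assumes the three modes documented for `Doc.char_span` (`strict`, `contract`, `expand`). The helper `check_alignment_mode` is hypothetical and is not spaCy's internal code.

```python
# Hypothetical helper illustrating the behavior change; not spaCy's actual code.
ALIGNMENT_MODES = ("strict", "contract", "expand")


def check_alignment_mode(alignment_mode: str) -> str:
    """Return the mode unchanged if it is recognized; otherwise raise a
    ValueError instead of silently falling back to "strict"."""
    if alignment_mode not in ALIGNMENT_MODES:
        raise ValueError(
            f"Unsupported alignment mode '{alignment_mode}'. "
            f"Supported modes: {', '.join(ALIGNMENT_MODES)}"
        )
    return alignment_mode
```

With this check, a typo such as `"expnd"` fails loudly rather than quietly producing a `strict` span.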
2021-01-27 23:40:42 +11:00
Ines Montani
d5ef245bb1
Merge pull request #6822 from jganseman/master [ci skip]
2021-01-27 13:04:30 +11:00
Ines Montani
560b7acece
Merge pull request #6802 from jumasheff/add-ky
2021-01-27 13:02:54 +11:00
jganseman
907bce7a78
Merge pull request #1 from jganseman/patch-1
Patch 1
2021-01-26 11:12:30 +01:00
jganseman
8bc57ec372
also update is_oov in lexeme docs
2021-01-26 11:09:16 +01:00
jganseman
c9103d60fa
Create jganseman.md
2021-01-26 11:02:31 +01:00
jganseman
1f2b0ec168
proposing a more concise explanation for is_oov
2021-01-26 10:53:39 +01:00
muratjumashev
2b19ebad59
Remove Kyrgyz chars from char_classes since Tatar ones already cover them
2021-01-25 00:46:45 +06:00
muratjumashev
7d0154a36e
Added language metadata
2021-01-25 00:42:19 +06:00
muratjumashev
79327197d1
Add contributor agreement
2021-01-25 00:34:12 +06:00
muratjumashev
87168eb81f
Add tests
2021-01-24 20:56:16 +06:00
muratjumashev
53abf759ad
Fix punctuation
2021-01-24 20:54:22 +06:00
muratjumashev
2a2646362b
Fix language subclass
2021-01-23 22:00:50 +06:00
muratjumashev
fe3b5b8ff5
Add Kyrgyz to char_classes
2021-01-23 21:53:41 +06:00
muratjumashev
e30bbf5432
Add examples
2021-01-23 21:49:08 +06:00
muratjumashev
2f385385a9
Remove comment
2021-01-23 21:36:28 +06:00
muratjumashev
d53724ba1d
Add lex_attrs
2021-01-23 21:35:25 +06:00
muratjumashev
4418ec2eee
Add punctuation
2021-01-23 21:31:31 +06:00
muratjumashev
101d265778
Add stopwords
2021-01-23 21:25:28 +06:00
muratjumashev
28d06ab860
Add tokenizer_exceptions
2021-01-22 23:08:41 +06:00
Sofie Van Landeghem
5ace559201
ensure span.text works for an empty span (#6772)
2021-01-21 23:18:46 +08:00
Sofie Van Landeghem
fdf8c77630
support IS_SENT_START in PhraseMatcher (#6771)
* support IS_SENT_START in PhraseMatcher
* add unit test and friendlier error
* use IDS.get instead
2021-01-21 09:59:17 +01:00
Adriane Boyd
bc7d83d4be
Skip 0-length matches (#6759)
Add hack to prevent matcher from returning 0-length matches.
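The commit describes a workaround inside the matcher itself. The sketch below only illustrates the intended effect as a post-filter over `(match_id, start, end)` tuples, which is the shape `Matcher` returns by default. The function `drop_empty_matches` is hypothetical.

```python
from typing import List, Tuple

# (match_id, start, end) with token offsets; end is exclusive.
Match = Tuple[int, int, int]


def drop_empty_matches(matches: List[Match]) -> List[Match]:
    """Keep only matches that cover at least one token (end > start).
    Hypothetical helper; spaCy applies its fix inside the matcher."""
    return [(m_id, start, end) for m_id, start, end in matches if end > start]
```

For example, `drop_empty_matches([(1, 0, 2), (1, 3, 3)])` keeps only `(1, 0, 2)`.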
2021-01-19 07:38:11 +08:00
Santiago Castro
28256522c8
Fix spacy.util.minibatch when the size iterator is finished (#6745)
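Below is a simplified sketch of a `minibatch` that accepts either a fixed size or an iterator of sizes and ends cleanly when the size iterator runs out. It illustrates the pattern only. It is not a copy of `spacy.util.minibatch`, and the actual change in #6745 may differ in detail. The key point is that under PEP 479, a `StopIteration` escaping a generator body is turned into a `RuntimeError`, so it has to be caught explicitly.

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar, Union

T = TypeVar("T")


def minibatch(items: Iterable[T], size: Union[int, Iterable[int]]) -> Iterator[List[T]]:
    """Yield lists of items. Batch sizes come from an int or an iterable of ints.
    Stops when either the items or the sizes are exhausted."""
    sizes = itertools.repeat(size) if isinstance(size, int) else iter(size)
    items = iter(items)
    while True:
        try:
            batch_size = next(sizes)
        except StopIteration:
            # Letting StopIteration escape would raise RuntimeError (PEP 479).
            return
        batch = list(itertools.islice(items, int(batch_size)))
        if not batch:
            return
        yield batch
```

With a fixed size, the last batch may be short. With a finite size sequence, batching stops once the sizes are used up, even if items remain.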
2021-01-17 19:48:43 +08:00
Adriane Boyd
e649242927
Prevent overlapping noun chunks for Spanish (#6712)
* Prevent overlapping noun chunks in Spanish noun chunk iterator
* Clean up similar code in Danish noun chunk iterator
2021-01-14 17:33:31 +11:00
Adriane Boyd
9957ed7897
Override language defaults for null token and URL match (#6705)
* Override language defaults for null token and URL match
When the serialized `token_match` or `url_match` is `None`, override the
language defaults to preserve `None` on deserialization.
* Fix fixtures in tests
2021-01-14 17:31:29 +11:00
Ines Montani
29c3ca7e34
Fix SVG integration [ci skip]
2021-01-14 13:33:41 +11:00
Antonio Miras
b4bd8f347a
spaCy Universe: New project; SpacyDotNet (#6702)
* Universe: SpacyDotNet, a .NET Core spaCy wrapper
* Signed contributor agreement
Co-authored-by: Antonio Miras <antonio@amiras.net>
2021-01-13 12:47:30 +11:00
Alex Combessie
9cc880014c
Remove questionable French stopwords (#6310)
* Remove questionable French stopwords
* Create alexcombessie.md
2021-01-08 11:36:22 +11:00
Cristiana S Parada
7a0222f260
Update stop_words.py in Portuguese (a, o, e) (#6345)
* Update stop_words.py
Added three additional stopwords: "a" and "o", which mean "the", and "e", which means "and"
* Create cristianasp.md
* zero edit to push CI
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-01-08 11:35:38 +11:00
Lorena Ciutacu
f11002f1f1
add new Romanian stopwords (#6621)
* add contributor agreement
* update ro stopwords list
* add new stopwords
2021-01-08 11:34:47 +11:00
ophelielacroix
e3222fdec9
Add (noun chunks) syntax iterators for Danish (#6246)
* add syntax iterators for Danish
* add test noun chunks for Danish syntax iterators
* add contributor agreement
* update da syntax iterators to remove nested chunks
* add tests for da noun chunks
* Fix test
* add missing import
* fix example
* Prevent overlapping noun chunks
Prevent overlapping noun chunks by tracking the end index of the
previous noun chunk span.
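The end-index tracking described above can be sketched as a greedy filter over candidate chunks. The sketch assumes candidates arrive in left-to-right order, as a noun chunk iterator produces them, as `(start, end)` token offsets with an exclusive end. The function `non_overlapping` is hypothetical. spaCy's Danish and Spanish iterators apply the same check inline while walking the parse rather than as a separate filter.

```python
from typing import Iterable, Iterator, Tuple

# (start, end) token offsets; end is exclusive.
Span = Tuple[int, int]


def non_overlapping(candidates: Iterable[Span]) -> Iterator[Span]:
    """Yield candidate chunks in order, skipping any chunk that starts
    before the previously yielded chunk ends (the first chunk wins)."""
    prev_end = 0  # exclusive end of the last yielded chunk
    for start, end in candidates:
        if start < prev_end:
            continue  # overlaps with, or is nested in, the previous chunk
        prev_end = end
        yield start, end
```

Only one integer of state is needed, so the check stays linear in the number of candidates.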
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-01-07 16:33:00 +11:00
Sofie Van Landeghem
6f7e7d88b9
remove "cause" (without apostrophe) from norm exceptions (#6636)
2021-01-06 12:30:30 +08:00
Sofie Van Landeghem
87562e470d
fix backticks in docs (#6635)
2020-12-27 22:12:37 +01:00
Sofie Van Landeghem
8df5b7f513
fix documentation of 'path' in tokenizer.to_disk (#6634)
2020-12-27 22:01:06 +01:00
Yosi
cf52510631
Add Amharic አማርኛ Language support (#6583)
* Add Amharic to spaCy
* clean up
* Add some PRON_LEMMA
* add Tigrinya support
* remove text_noun_chunks
* Tigrinya Support
* added some more details for ti
* fix unit test
* add amharic char range
* changes from review
* amharic and tigrinya share same unicode block
* get rid of _amharic/_tigrinya in char_classes
Co-authored-by: Josiah Solomon <jsolomon@meteorcomm.com>
2020-12-22 16:50:34 +01:00
Tim Gates
292c1d6a73
docs: fix simple typo, speficied -> specified (#6611)
There is a small typo in spacy/cli/info.py.
Should read `specified` rather than `speficied`.
2020-12-22 09:14:10 +01:00
Gareth Sparks
efc229c3f4
Doc.char_span arg: alignment_mode (#6591)
Currently labeled "mode"; the argument is actually "alignment_mode"
2020-12-18 09:54:56 +01:00
Ines Montani
7c9a2f298c
Merge pull request #6578 from jenojp/master [ci skip]
2020-12-16 17:31:55 +11:00
Ines Montani
d8aa113d16
Merge pull request #6566 from rafguns/cite-zenodo [ci skip]
2020-12-16 16:40:50 +11:00
Ines Montani
4feef6bf9f
Update citation
2020-12-16 15:59:57 +11:00
Jeno Pizarro
a6fe35a0f9
Update universe.json
2020-12-15 21:53:20 -05:00
Jeno Pizarro
343a44abe9
Merge branch 'master' of https://github.com/explosion/spaCy
2020-12-15 21:49:46 -05:00
Thomas Bird
f6e4378942
Add SCA for @thomasbird (#6576)
2020-12-15 20:59:47 +01:00
Raf Guns
ec876c9713
Merge branch 'master' of https://github.com/explosion/spaCy into cite-zenodo
2020-12-14 22:03:58 +01:00
Raf Guns
db2a34d610
Update CITATION to Zenodo
2020-12-14 22:01:24 +01:00