11648 Commits

Author SHA1 Message Date
Adriane Boyd
31de700b0f
Fix on_match callback and remove empty patterns (#6312)
For the `DependencyMatcher`:

* Fix on_match callback so that it is called once per matched pattern
* Fix results so that patterns with empty match lists are not returned
2020-11-05 09:16:26 +01:00
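The two fixes in this commit can be pictured with a small pure-Python sketch. It does not use spaCy: `collect_matches`, the `candidates` mapping, and the simplified `on_match(i, results)` signature are all hypothetical stand-ins chosen for illustration, not the library's API.

```python
def collect_matches(candidates, on_match=None):
    """Hypothetical model: map of pattern key -> list of matches (may be empty)."""
    # Patterns whose match list is empty are dropped from the results.
    results = [(key, matches) for key, matches in candidates.items() if matches]
    if on_match is not None:
        # The callback runs once per matched pattern, not once per individual match.
        for i in range(len(results)):
            on_match(i, results)
    return results


calls = []
results = collect_matches(
    {"founded": [[1, 0, 2]], "empty": [], "owned": [[4, 3], [7, 6]]},
    on_match=lambda i, matches: calls.append(matches[i][0]),
)
```

Here `"empty"` never reaches the results, and the callback fires twice even though `"owned"` has two matches.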
Adriane Boyd
45c9a68828
Identify final Matcher pattern node by quantifier (#6317)
Modify the internal pattern representation in `Matcher` patterns to
identify the final ID state using a unique quantifier rather than a
combination of other attributes.

It was insufficient to identify the final ID node based on an
uninitialized `quantifier` (which coincidentally has the same value as
`ZERO`) together with `nr_attr` being 0. (In addition, it was bug-prone
that `nr_attr` was set to 0 even though attrs were allocated.)

In the case of `{"OP": "!"}` (a valid, if pointless, pattern), `nr_attr`
is 0 and the quantifier is ZERO, so the previous methods for
incrementing to the ID node at the end of the pattern weren't able to
distinguish the final ID node from the `{"OP": "!"}` pattern.
2020-10-31 12:18:48 +01:00
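The ambiguity described in this commit message can be reproduced with a toy model. This is only a sketch under stated assumptions: `Quantifier`, `Node`, `FINAL_ID`, and the helper functions are hypothetical names, and the real internal representation in `Matcher` is a C struct that is not shown here.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Quantifier(Enum):
    # Hypothetical stand-ins for the internal quantifier values.
    ZERO = auto()      # "!": the token must not match
    ONE = auto()       # default: match exactly once
    FINAL_ID = auto()  # dedicated marker for the node holding the pattern ID


@dataclass
class Node:
    quantifier: Quantifier
    nr_attr: int


def is_final_old(node):
    # Old heuristic: "ZERO quantifier and no attributes" means final ID node.
    return node.quantifier is Quantifier.ZERO and node.nr_attr == 0


def is_final_new(node):
    # New check: only the unique quantifier marks the final ID node.
    return node.quantifier is Quantifier.FINAL_ID


def pattern_length(nodes, is_final):
    """Count the nodes that precede the first node recognised as final."""
    for i, node in enumerate(nodes):
        if is_final(node):
            return i
    raise ValueError("pattern has no final ID node")


negation = Node(Quantifier.ZERO, 0)  # models {"OP": "!"}
word = Node(Quantifier.ONE, 1)

old_pattern = [negation, word, Node(Quantifier.ZERO, 0)]
new_pattern = [negation, word, Node(Quantifier.FINAL_ID, 0)]

old_length = pattern_length(old_pattern, is_final_old)
new_length = pattern_length(new_pattern, is_final_new)
```

With the old heuristic the `{"OP": "!"}` node is mistaken for the end of the pattern, so `old_length` is 0; with the dedicated quantifier the scan reaches the real final node and `new_length` is 2.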
Duygu Altinok
0e55f806dd
Turkish tokenization improvements (#6268)
* added single and paired orth variants

* added token match

* added long text tokenization test

* inverted init

* normalized lemmas to lowercase

* more abbrevs

* tests for ordinals and abbrevs

* separated period abbrevs to another list

* fix typo

* added ordinal and abbrev tests

* added number tests for dates

* minor refinement

* added inflected abbrevs regex

* added percentage and inflection

* cosmetics

* added token match

* added url inflection tests

* excluded url tokens from custom pattern

* removed url match import
2020-10-29 09:43:17 +01:00
Adriane Boyd
8cc5ed6771 Add Macedonian to website languages 2020-10-29 08:49:56 +01:00
Ines Montani
1e4d7e059f Revert "Test FUNDING.yml [ci skip]"
This reverts commit 287be48ad0.
2020-10-28 17:42:23 +01:00
Ines Montani
287be48ad0 Test FUNDING.yml [ci skip] 2020-10-28 17:36:25 +01:00
Adriane Boyd
4dd86306e9
Add Nepali to supported languages on website (#6315) 2020-10-28 16:32:07 +01:00
Robert Šípek
260c29794a
Fill contributor agreement by robertsipek (#6285)
* Fill contributor agreement by robertsipek

* Fill contributor agreement by robertsipek
2020-10-22 22:13:17 +02:00
Kunal Sharma
01aec7a313
Adding MindMeld to Universe JSON (#6275)
* Adding Mindmeld to Universe JSON

Mindmeld is a conversational AI platform for deep-domain voice interfaces and chatbots. https://www.mindmeld.com/

* Signing contribution agreement.

Co-authored-by: kunshar2 <kunshar2@cisco.com>
2020-10-21 18:42:11 +02:00
Ines Montani
d7a4e8454b
Merge pull request #6274 from walterhenry/master
User contributor agreement
2020-10-19 16:30:58 +02:00
walterhenry
ff82644746 User contributor agreement
Here it is!
2020-10-19 16:25:09 +02:00
Ines Montani
3851300e80 Update landing [ci skip] 2020-10-16 11:46:33 +02:00
Borijan Georgievski
2311192ba1
Include Macedonian language (#6230)
* Include Macedonian language

* Fix indentation at char_classes.py

* Fix indentation at char_classes.py

* Add Macedonian tests, update lex_attrs and char_classes

* Import unicode literals for python 2
2020-10-15 15:55:01 +02:00
Ines Montani
bc027dc35c Update .gitignore [ci skip] 2020-10-15 12:43:35 +02:00
Ines Montani
a3b84c7656 Update netlify.toml [ci skip] 2020-10-15 12:42:30 +02:00
Ines Montani
07a976b036
Merge pull request #6221 from baranitharan2020/master 2020-10-13 11:03:49 +02:00
Ines Montani
7f92a5ee6a
Update spacy/lang/ta/examples.py 2020-10-13 11:03:35 +02:00
Baranitharan
d6037c1860
added sentence 2020-10-08 08:22:58 +05:30
Baranitharan
169857e0ec
Merge pull request #1 from baranitharan2020/baranitharan2020-patch-1
Update examples.py
2020-10-08 08:17:57 +05:30
Baranitharan
81afe9b19d
Update examples.py 2020-10-08 08:17:25 +05:30
Sofie Van Landeghem
241cd112f5
add reenabled pipe names back to the meta before serializing (#6219) 2020-10-08 00:44:16 +02:00
Sofie Van Landeghem
2998131416
Reproducibility for TextCat and Tok2Vec (#6218)
* ensure fixed seed in HashEmbed layers

* forgot about the joys of python 2
2020-10-08 00:43:46 +02:00
Wannaphong Phatthiyaphaibun
9fc8392b38
Add Thai tag map (LST20 Corpus) (#6163)
* Add Thai tag map (LST20 Corpus)

By @korakot

* Update tag_map.py

* Update tag_map.py

* Update tag_map.py
2020-10-07 11:12:01 +02:00
Duygu Altinok
7e821c2776
Turkish language syntax iterators (#6191)
* added tr_vocab to config

* basic test

* added syntax iterator to Turkish lang class

* first version for Turkish syntax iter, without flat

* added simple tests with nmod, amod, det

* more tests to amod and nmod

* separated noun chunks and parser test

* rearrangement after nchunk parser separation

* added recursive NPs

* tests with complicated recursive NPs

* tests with conjed NPs

* additional tests for conj NP

* small modification for shaving off conj from NP

* added tests with flat

* more tests with flat

* added examples with flats conjed

* added inner func for flat trick

* corrected parse

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-10-07 11:07:52 +02:00
Duygu Altinok
2ce6fc2611
Turkish tag map and morph rules addition (#6141)
* feat: added turkish tag map

* feat: morph rules cconj and sconj

* feat: more conjuncts

* feat: added popular postpositions

* feat: added adverbs

* feat: added personal pronouns

* feat: added reflexive pronouns

* minor: corrected case capital

* minor: fixed comma typo

* feat: added indef pronouns

* feat: added dict iter

* fixed comma typo

* updated language class with tag map and morph

* use default tag map instead

* removed tag map
2020-10-07 10:27:36 +02:00
Duygu Altinok
b95a11dd95
Ordinal numbers for Turkish (#6142)
* minor ordinal number addition

* fixed typo

* added corresponding lexical test
2020-10-07 10:25:37 +02:00
Rahul Gupta
1a00bff06d
Hindi: Adds tests for lexical attributes (norm and like_num) (#5829)
* Hindi: Adds tests for lexical attributes (norm and like_num)

* Signs and adds the contributor agreement

* Add ordinal numbers to be tagged as like_num

* Adds alternate pronunciation for 31 and 39
2020-10-07 10:23:32 +02:00
Nuccy90
c809b2c8e7
Update morph_rules.py (#6102)
* Update morph_rules.py

Added "dig" and "dej" ("you" in accusative form)

* Create Nuccy90.md

* Update Nuccy90.md
2020-10-06 15:14:47 +02:00
delzac
15ea401b39
Reflect in usage docs that the IS_SENT_START attribute exists (#6114)
* Reflect in usage docs that the IS_SENT_START attribute exists

* Create delzac.md
2020-10-06 15:11:01 +02:00
Šarūnas Navickas
047fb9f8b8
Website (Universe): An entry for rita-dsl (#6138)
* Create zaibacu.md

* Add RITA-DSL entry

* Update agreement

* Fix formatting
2020-10-06 11:19:36 +02:00
Florijan Stamenković
9db670b996
Fix Issue 6207 (#6208)
* Regression test for issue 6207

* Fix issue 6207

* Sign contributor agreement

* Minor adjustments to test

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-10-06 11:17:37 +02:00
Stanislav Schmidt
3589a64d44
Change type of texts argument in pipe to iterable (#6186)
* Change type of texts argument in pipe to iterable

* Add contributor agreement
2020-10-02 21:00:11 +02:00
Yohei Tamura
3243ddac8f
Fix/span.sent (#6083)
* add fail test

* fix test

* fix span.sent

* Remove incorrect implicit check

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-10-01 14:01:52 +02:00
Elijah Rippeth
4cbb954281
reorder so tagmap is replaced only if a custom file is provided. (#6164)
* reorder so tagmap is replaced only if a custom file is provided.

* Remove unneeded variable initialization

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-09-30 13:26:06 +02:00
Ines Montani
27c5795ea5 Fix version check in models directory [ci skip] 2020-09-25 09:23:29 +02:00
Muhammad Fahmi Rasyid
7489d02dea
Update Indonesian Example Phrases (#6124)
* create contributor agreement

* Update Indonesian example. (see #1107)

Update the Indonesian examples with more appropriate phrases. The current phrases contain sensitive and violent words.
2020-09-23 14:02:26 +02:00
Adriane Boyd
e4acb28658
Fix norm in retokenizer split (#6111)
Parallel to behavior in merge, reset norm on original token in
retokenizer split.
2020-09-22 21:53:33 +02:00
Adriane Boyd
9b4979407d
Fix overlapping German noun chunks (#6112)
Add a fix similar to the one in #5470 to prevent the German noun chunks
iterator from producing overlapping spans.
2020-09-22 21:52:42 +02:00
Adriane Boyd
4625029370
Add pin for pyrsistent<0.17.0 (#6116)
Add pin for pyrsistent<0.17.0 since pyrsistent>=0.17.1 is only
compatible with python3.5+.
2020-09-22 19:04:49 +02:00
Marek Grzenkowicz
a26f864ed3
Clarify how to choose pretrained weights files (closes #6027) [ci skip] (#6039) 2020-09-08 21:13:50 +02:00
Ines Montani
33d9c64977 Fix outbound link and update package lock [ci skip] 2020-09-04 14:44:38 +02:00
Ines Montani
ba6cf9821f Replace docs analytics [ci skip] 2020-09-04 14:28:28 +02:00
holubvl3
0a27fca557
Create examples.py (#5985)
* Create examples.py

* Create tag_map.py

* Delete tag_map.py

* Update examples.py

formatting: add empty line

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2020-09-04 11:00:14 +02:00
Brad Jascob
2160aafec6
Updates spaCy Universe for amrlib (#6020)
* Updates spaCy Universe for amrlib

* Updates to doc based on feedback
2020-09-04 10:03:35 +02:00
Marek Grzenkowicz
92d7832a86
Fix off-by-one error for best iteration calculation (closes #6014) (#6016) 2020-09-02 15:15:45 +02:00
Sofie Van Landeghem
f7a25d69f7
Bugfix in merge_entities (#6005)
* failing test

* bugfix
2020-09-01 21:57:52 +02:00
Juan Gutiérrez
9002bea29f
Update suffixes example (#5989)
* Update suffixes example

The current example will throw `TypeError: can only concatenate list (not "tuple") to list`

* Signing Contributor Agreement
2020-08-31 12:44:56 +02:00
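The `TypeError` quoted in this commit message is plain Python behaviour and can be reproduced without spaCy. In the sketch below, `default_suffixes` is a hypothetical stand-in for a defaults attribute stored as a list; the actual documentation example changed by the commit is not reproduced here.

```python
# Hypothetical stand-in for a list of default suffix patterns.
default_suffixes = [r"\.$", r",$"]
extra = (r"-+$",)  # a one-element tuple, as an older example might have used

try:
    combined = default_suffixes + extra  # list + tuple is not allowed
except TypeError as err:
    message = str(err)

# Converting the tuple to a list (or writing a list literal) avoids the error.
combined = default_suffixes + list(extra)
```

The concatenation only works when both operands have the same sequence type, so the example has to follow whatever type the defaults use.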
Adriane Boyd
caf23462eb
Add 3rd party licenses (#5959) 2020-08-26 15:23:59 +02:00
Adriane Boyd
7d7b65ffd4
Fix raw strings in URL pattern (#5972)
Add missing raw string specifiers.
2020-08-26 04:00:49 +02:00
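Why a missing raw string specifier matters in a regular expression can be shown with a short, generic example. The patterns below are illustrative only and are not the URL pattern touched by this commit.

```python
import re

# With the r prefix, the backslashes reach the regex engine unchanged.
raw = r"\d+\.\d+"
escaped = "\\d+\\.\\d+"  # the same pattern written without the r prefix
same_pattern = raw == escaped

# Without the r prefix, "\b" is a backspace character, not a word boundary.
boundary_raw = r"\bcat\b"
boundary_plain = "\bcat\b"

found_raw = re.search(boundary_raw, "a cat sat") is not None
found_plain = re.search(boundary_plain, "a cat sat") is not None
```

The raw pattern matches the word, while the plain string silently searches for backspace characters and finds nothing.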
Hiroshi Matsuda
332803eda9
fix ja leading spaces (#5969)
* change condition for space after

* add NAUGHTY_STRINGS test example
2020-08-25 14:16:24 +02:00