Commit Graph

11911 Commits

Author SHA1 Message Date
Sofie Van Landeghem
2af31a8c8d
Bugfix textcat reproducibility on GPU (#6411)
* add seed argument to ParametricAttention layer

* bump thinc to 7.4.3

* set thinc version range

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-11-23 12:29:35 +01:00
Adriane Boyd
cdca44ac11
Dynamically include numpy headers (#6418)
* Dynamically include numpy headers

* Add `build-constraints.txt` with numpy version pins for building wheels with `pip` and `wheelwright`
* Update `setup.py` to add current numpy include directory
* Assume `cython` and `numpy` are installed for `setup.py`
* Remove included numpy headers

* Fix typo in requirements.txt

* Use script in CI
2020-11-23 11:15:11 +01:00
Adriane Boyd
3f61f5eb54
Use int8_t instead of char in Matcher (#6413)
* Use signed char instead of char in Matcher

Remove unused char* utf8_t typedef

* Use int8_t instead of signed char
2020-11-23 10:26:47 +01:00
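The commit message does not state its motivation. One plausible reason, offered here as an assumption, is that the signedness of plain `char` in C is implementation-defined, so the same byte can read as negative on one platform and positive on another, whereas `int8_t` is always signed. A minimal Python sketch of that difference, using only the standard library (the variable names are illustrative and not taken from spaCy):

```python
# The same 8-bit pattern interpreted as signed vs. unsigned.
raw = bytes([0xFF])

# Reading the byte the way a signed 8-bit type (int8_t) would.
as_signed = int.from_bytes(raw, "little", signed=True)
# Reading the byte the way an unsigned 8-bit type would.
as_unsigned = int.from_bytes(raw, "little", signed=False)

print(as_signed, as_unsigned)

# A check such as "value < 0" only holds under the signed reading.
print(as_signed < 0, as_unsigned < 0)
```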
Adriane Boyd
4284605683
Remove Beam cleanup (#6414)
Beam cleanup is handled through the Beam finalization method.
2020-11-23 10:01:46 +01:00
Adriane Boyd
a8c2dad466
Add all vectors to vocab before pruning (#6408)
Add all vectors to the vocab before pruning to correct the selection of
vectors to prioritize.
2020-11-23 10:00:59 +01:00
Adriane Boyd
13f0676f04
Updates for python 3.9 (#6338)
* Update blis and thinc version ranges

* Update thinc version range

* Update setup.cfg for python 3.9

* Adjust blis and thinc ranges
* Add python 3.9 classifier

* Update CI for python 3.9

* Add --prefer-binary to CI sdist install

* Update CI python 3.7 mac image

* Add --prefer-binary to Travis CI

* Update install instructions in README

* Specify blis versions separately for < / >= 3.6

* Update --prefer-binary in README

* Test cleaner sdist install

* Also upgrade pip

(This is kind of unnecessary given --prefer-binary but may avoid other
issues related to sdist installs in the future.)

* Compile with -j 2

* Remove wheel from setup_requires

* Update to have separate CI uninstall step

* Remove wheel from pyproject.toml

* Recommend upgrading setuptools in addition to pip
2020-11-23 09:45:18 +01:00
Yusuke Mori
ee84f8f4cb
Avoid a SyntaxError in self-attentive-parser (#6428)
* Avoid a SyntaxError in self-attentive-parser

Fix a usage of quotation marks in the example of spaCy Universe self-attentive-parser

* Create forest1988.md

Fill in the spaCy contributor agreement
2020-11-22 22:02:03 +01:00
Yusuke Mori
e3ac90b035
Avoid a SyntaxError in self-attentive-parser (#6428)
* Avoid a SyntaxError in self-attentive-parser

Fix a usage of quotation marks in the example of spaCy Universe self-attentive-parser

* Create forest1988.md

Fill in the spaCy contributor agreement
2020-11-22 21:59:37 +01:00
M. Revuelta Espinosa
d940bc3f87
Update universe.json (include PatternOmatic) (#6399)
Request to include PatternOmatic in spaCy Universe

Adds @revuel to contributors
2020-11-19 13:25:48 +01:00
M. Revuelta Espinosa
51232ffb9e
Update universe.json (include PatternOmatic) (#6399)
Request to include PatternOmatic in spaCy Universe

Adds @revuel to contributors
2020-11-19 13:15:50 +01:00
Adriane Boyd
3cf6479467
Fix JSON in #6395
2020-11-17 15:25:41 +01:00
Adriane Boyd
c2eb0992ae
Fix JSON in #6395
2020-11-17 15:24:38 +01:00
Sam Edwardes
c3d9550f30
Added spaCyTextBlob to universe.json (#6395)
2020-11-17 14:38:59 +01:00
Sam Edwardes
78913a4f95
Added spaCyTextBlob to universe.json (#6395)
2020-11-17 14:38:34 +01:00
Adriane Boyd
320a8b1481
Add ent_id_ to strings serialized with Doc (#6353)
2020-11-10 20:16:07 +08:00
Ines Montani
d490428089
Update README.md [ci skip]
2020-11-10 09:51:20 +08:00
Alec Chapman
8b919d77c1
add medspacy to universe and fix example w/ cov-bsv
2020-11-10 09:49:39 +08:00
Ines Montani
4d337eedf2
Merge pull request #6322 from medspacy/master
2020-11-10 02:47:29 +01:00
Adriane Boyd
90550552a0
CI updates for python 3.5 (#6354)
* Update pip in CI

* Use --prefer-binary

* Use `--prefer-binary`
* Delete all installed packages before testing source install

* sdist install with --only-binary :all:
2020-11-06 13:35:51 +01:00
Daniel Vasic
20d72de986
Added Multext-East V5 tagset for Croatian language (#6248)
* Added Multext-East V5 tagset for Croatian language

* Create danielvasic.md

* Update danielvasic.md

* Update danielvasic.md

* Add tag map to CroatianDefaults

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-11-05 12:19:22 +01:00
Robert Šípek
6069efe57d
Add tag map to cs language (#6284)
2020-11-05 10:13:11 +01:00
Adriane Boyd
e4c3d6748c
Update TIGER link and tag description (#6344)
2020-11-05 09:33:45 +01:00
Adriane Boyd
8644ee3e3f
Update TIGER link and tag description (#6344)
2020-11-05 09:33:00 +01:00
Vu Ha
6d465ec52c
add oprd to the list of accepted deps for noun chunking (#6302)
* add oprd to the list of accepted deps for noun chunking

* add SCA
2020-11-05 09:17:35 +01:00
Adriane Boyd
31de700b0f
Fix on_match callback and remove empty patterns (#6312)
For the `DependencyMatcher`:

* Fix on_match callback so that it is called once per matched pattern
* Fix results so that patterns with empty match lists are not returned
2020-11-05 09:16:26 +01:00
Alec Chapman
204c7c8a00
fix thumbnail link to be github raw url
2020-11-01 07:53:48 -07:00
Adriane Boyd
45c9a68828
Identify final Matcher pattern node by quantifier (#6317)
Modify the internal pattern representation in `Matcher` patterns to
identify the final ID state using a unique quantifier rather than a
combination of other attributes.

It was insufficient to identify the final ID node based on an
uninitialized `quantifier` (coincidentally being the same as the `ZERO`)
with `nr_attr` as 0. (In addition, it was potentially bug-prone that
`nr_attr` was set to 0 even though attrs were allocated.)

In the case of `{"OP": "!"}` (a valid, if pointless, pattern), `nr_attr`
is 0 and the quantifier is ZERO, so the previous methods for
incrementing to the ID node at the end of the pattern weren't able to
distinguish the final ID node from the `{"OP": "!"}` pattern.
2020-10-31 12:18:48 +01:00
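The ambiguity described in this commit can be illustrated with a toy model. This is a sketch under stated assumptions, not spaCy's actual Cython implementation: the `Node` tuple, the `Quantifier` enum values, the `FINAL_ID` marker name, and both `find_final_*` helpers are hypothetical and exist only to show why a dedicated quantifier removes the clash with `{"OP": "!"}`.

```python
from enum import IntEnum
from typing import List, NamedTuple


class Quantifier(IntEnum):
    ZERO = 0       # "!": the token must not match
    ZERO_ONE = 1   # "?"
    ZERO_PLUS = 2  # "*"
    ONE = 3        # default: exactly one token
    FINAL_ID = 4   # hypothetical dedicated marker for the end-of-pattern node


class Node(NamedTuple):
    quantifier: Quantifier
    nr_attr: int  # number of attribute constraints on this node


def find_final_old(nodes: List[Node]) -> int:
    """Old-style heuristic: first node with quantifier ZERO and no attrs."""
    for i, node in enumerate(nodes):
        if node.quantifier == Quantifier.ZERO and node.nr_attr == 0:
            return i
    raise ValueError("no final node found")


def find_final_new(nodes: List[Node]) -> int:
    """New-style rule: the final node carries its own unique quantifier."""
    for i, node in enumerate(nodes):
        if node.quantifier == Quantifier.FINAL_ID:
            return i
    raise ValueError("no final node found")


# Toy encoding of [{"LOWER": "a"}, {"OP": "!"}, {"LOWER": "b"}] plus the
# trailing ID node. In the old encoding, {"OP": "!"} and the ID node are
# indistinguishable: both are (ZERO, 0).
old_encoding = [
    Node(Quantifier.ONE, 1),
    Node(Quantifier.ZERO, 0),  # {"OP": "!"}
    Node(Quantifier.ONE, 1),
    Node(Quantifier.ZERO, 0),  # final ID node
]
new_encoding = old_encoding[:-1] + [Node(Quantifier.FINAL_ID, 0)]

old_index = find_final_old(old_encoding)  # stops early, at the "!" node
new_index = find_final_new(new_encoding)  # reaches the real final node
print(old_index, new_index)
```

In this model the old heuristic stops at index 1, the `{"OP": "!"}` node, while the dedicated marker is found at index 3, the real end of the pattern.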
Alec Chapman
73d22d96ff
add medspacy to universe and fix example w/ cov-bsv
2020-10-29 07:53:56 -06:00
Duygu Altinok
0e55f806dd
Turkish tokenization improvements (#6268)
* added single and paired orth variants

* added token match

* added long text tokenization test

* inverted init

* normalized lemmas to lowercase

* more abbrevs

* tests for ordinals and abbrevs

* separated period abbrevs to another list

* fix typo

* added ordinal and abbrev tests

* added number tests for dates

* minor refinement

* added inflected abbrevs regex

* added percentage and inflection

* cosmetics

* added token match

* added url inflection tests

* excluded url tokens from custom pattern

* removed url match import
2020-10-29 09:43:17 +01:00
Adriane Boyd
58a7461cff
Add Macedonian to website languages
2020-10-29 08:51:26 +01:00
Adriane Boyd
94aa4c7410
Add Nepali to supported languages on website (#6315)
2020-10-29 08:51:15 +01:00
Kunal Sharma
1b8f1f6f1b
Adding MindMeld to Universe JSON (#6275)
* Adding Mindmeld to Universe JSON

Mindmeld is a conversational AI platform for deep-domain voice interfaces and chatbots. https://www.mindmeld.com/

* Signing contribution agreement.

Co-authored-by: kunshar2 <kunshar2@cisco.com>
2020-10-29 08:50:59 +01:00
Adriane Boyd
8cc5ed6771
Add Macedonian to website languages
2020-10-29 08:49:56 +01:00
Ines Montani
1e4d7e059f
Revert "Test FUNDING.yml [ci skip]"
This reverts commit 287be48ad0.
2020-10-28 17:42:23 +01:00
Ines Montani
287be48ad0
Test FUNDING.yml [ci skip]
2020-10-28 17:36:25 +01:00
Adriane Boyd
4dd86306e9
Add Nepali to supported languages on website (#6315)
2020-10-28 16:32:07 +01:00
Robert Šípek
260c29794a
Fill contributor agreement by robertsipek (#6285)
* Fill contributor agreement by robertsipek

* Fill contributor agreement by robertsipek
2020-10-22 22:13:17 +02:00
Kunal Sharma
01aec7a313
Adding MindMeld to Universe JSON (#6275)
* Adding Mindmeld to Universe JSON

Mindmeld is a conversational AI platform for deep-domain voice interfaces and chatbots. https://www.mindmeld.com/

* Signing contribution agreement.

Co-authored-by: kunshar2 <kunshar2@cisco.com>
2020-10-21 18:42:11 +02:00
Ines Montani
d7a4e8454b
Merge pull request #6274 from walterhenry/master
User contributor agreement
2020-10-19 16:30:58 +02:00
walterhenry
ff82644746
User contributor agreement
Here it is!
2020-10-19 16:25:09 +02:00
Ines Montani
9227f9a3ca
Update landing [ci skip]
2020-10-16 11:46:55 +02:00
Ines Montani
3851300e80
Update landing [ci skip]
2020-10-16 11:46:33 +02:00
Borijan Georgievski
2311192ba1
Include Macedonian language (#6230)
* Include Macedonian language

* Fix indentation at char_classes.py

* Fix indentation at char_classes.py

* Add Macedonian tests, update lex_attrs and char_classes

* Import unicode literals for python 2
2020-10-15 15:55:01 +02:00
Ines Montani
8eff159603
Merge branch 'spacy.io' of https://github.com/explosion/spaCy into spacy.io
2020-10-15 12:44:36 +02:00
Ines Montani
57eec9cc14
Update .gitignore
2020-10-15 12:44:32 +02:00
Ines Montani
ac77be48f2
Update netlify.toml [ci skip]
2020-10-15 12:44:03 +02:00
Ines Montani
bc027dc35c
Update .gitignore [ci skip]
2020-10-15 12:43:35 +02:00
Ines Montani
a3b84c7656
Update netlify.toml [ci skip]
2020-10-15 12:42:30 +02:00
Ines Montani
07a976b036
Merge pull request #6221 from baranitharan2020/master
2020-10-13 11:03:49 +02:00
Ines Montani
7f92a5ee6a
Update spacy/lang/ta/examples.py
2020-10-13 11:03:35 +02:00