Commit Graph

11773 Commits

Author SHA1 Message Date
Adriane Boyd
afd744bc05
Update Travis CI pip install steps (#6440) 2020-11-24 14:10:16 +01:00
Adriane Boyd
573f5c863f
Fix tag map clobbering in spacy train (#6437)
Fix bug from #5768 where the tag map is clobbered if a custom tag map
isn't provided.
2020-11-24 13:13:16 +01:00
Adriane Boyd
ce18fc6588 Set version to v2.3.3 2020-11-24 10:03:45 +01:00
Adriane Boyd
cd61d264ef Set version to v2.3.3.dev0 2020-11-23 13:51:59 +01:00
Sofie Van Landeghem
2af31a8c8d
Bugfix textcat reproducibility on GPU (#6411)
* add seed argument to ParametricAttention layer

* bump thinc to 7.4.3

* set thinc version range

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-11-23 12:29:35 +01:00
Adriane Boyd
cdca44ac11
Dynamically include numpy headers (#6418)
* Dynamically include numpy headers

* Add `build-constraints.txt` with numpy version pins for building wheels with `pip` and `wheelwright`
* Update `setup.py` to add current numpy include directory
* Assume `cython` and `numpy` are installed for `setup.py`
* Remove included numpy headers

* Fix typo in requirements.txt

* Use script in CI
2020-11-23 11:15:11 +01:00
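A minimal sketch of what "add current numpy include directory" in cdca44ac11 might look like inside a `setup.py`. `numpy.get_include()` is numpy's documented way to locate its headers. The helper name `numpy_include_dirs` is hypothetical, and the empty-list fallback is an assumption added only so the snippet runs without numpy; the commit itself assumes numpy is already installed.

```python
import importlib


def numpy_include_dirs():
    """Return numpy's header directory, resolved when setup.py runs.

    Hypothetical helper for illustration, not taken from the spaCy repository.
    """
    try:
        numpy = importlib.import_module("numpy")
    except ImportError:
        # Fallback only so this sketch runs anywhere; the commit assumes
        # numpy is present whenever setup.py is executed.
        return []
    return [numpy.get_include()]


# The result would typically be handed to a setuptools Extension, e.g.
# Extension(name, sources, include_dirs=numpy_include_dirs()).
include_dirs = numpy_include_dirs()
```

Resolving the directory at build time means the headers always match the numpy version used for the build, which is presumably why the bundled headers could be removed.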
Adriane Boyd
3f61f5eb54
Use int8_t instead of char in Matcher (#6413)
* Use signed char instead of char in Matcher

Remove unused char* utf8_t typedef

* Use int8_t instead of signed char
2020-11-23 10:26:47 +01:00
Adriane Boyd
4284605683
Remove Beam cleanup (#6414)
Beam cleanup is handled through the Beam finalization method.
2020-11-23 10:01:46 +01:00
Adriane Boyd
a8c2dad466
Add all vectors to vocab before pruning (#6408)
Add all vectors to the vocab before pruning to correct the selection of
vectors to prioritize.
2020-11-23 10:00:59 +01:00
Adriane Boyd
13f0676f04
Updates for python 3.9 (#6338)
* Update blis and thinc version ranges

* Update thinc version range

* Update setup.cfg for python 3.9

* Adjust blis and thinc ranges
* Add python 3.9 classifier

* Update CI for python 3.9

* Add --prefer-binary to CI sdist install

* Update CI python 3.7 mac image

* Add --prefer-binary to Travis CI

* Update install instructions in README

* Specify blis versions separately for < / >= 3.6

* Update --prefer-binary in README

* Test cleaner sdist install

* Also upgrade pip

(This is kind of unnecessary given --prefer-binary but may avoid other
issues related to sdist installs in the future.)

* Compile with -j 2

* Remove wheel from setup_requires

* Update to have separate CI uninstall step

* Remove wheel from pyproject.toml

* Recommend upgrading setuptools in addition to pip
2020-11-23 09:45:18 +01:00
Yusuke Mori
e3ac90b035
Avoid a SyntaxError in self-attentive-parser (#6428)
* Avoid a SyntaxError in self-attentive-parser

Fix a usage of quotation marks in the example of spaCy Universe self-attentive-parser

* Create forest1988.md

Fill in the spaCy contributor agreement
2020-11-22 21:59:37 +01:00
M. Revuelta Espinosa
51232ffb9e
Update universe.json (include PatternOmatic) (#6399)
Request to include PatternOmatic in spaCy Universe

Adds @revuel to contributors
2020-11-19 13:15:50 +01:00
Adriane Boyd
3cf6479467 Fix JSON in #6395 2020-11-17 15:25:41 +01:00
Sam Edwardes
78913a4f95
Added spaCyTextBlob to universe.json (#6395) 2020-11-17 14:38:34 +01:00
Adriane Boyd
320a8b1481
Add ent_id_ to strings serialized with Doc (#6353) 2020-11-10 20:16:07 +08:00
Ines Montani
d490428089 Update README.md [ci skip] 2020-11-10 09:51:20 +08:00
Ines Montani
4d337eedf2
Merge pull request #6322 from medspacy/master 2020-11-10 02:47:29 +01:00
Adriane Boyd
90550552a0
CI updates for python 3.5 (#6354)
* Update pip in CI

* Use --prefer-binary

* Use `--prefer-binary`
* Delete all installed packages before testing source install

* sdist install with --only-binary :all:
2020-11-06 13:35:51 +01:00
Daniel Vasic
20d72de986
Added Multext-East V5 tagset for Croatian language (#6248)
* Added Multext-East V5 tagset for Croatian language

* Create danielvasic.md

* Update danielvasic.md

* Update danielvasic.md

* Add tag map to CroatianDefaults

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-11-05 12:19:22 +01:00
Robert Šípek
6069efe57d
Add tag map to cs language (#6284) 2020-11-05 10:13:11 +01:00
Adriane Boyd
8644ee3e3f
Update TIGER link and tag description (#6344) 2020-11-05 09:33:00 +01:00
Vu Ha
6d465ec52c
add oprd to the list of accepted deps for noun chunking (#6302)
* add oprd to the list of accepted deps for noun chunking

* add SCA
2020-11-05 09:17:35 +01:00
Adriane Boyd
31de700b0f
Fix on_match callback and remove empty patterns (#6312)
For the `DependencyMatcher`:

* Fix on_match callback so that it is called once per matched pattern
* Fix results so that patterns with empty match lists are not returned
2020-11-05 09:16:26 +01:00
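The two `DependencyMatcher` fixes in 31de700b0f can be illustrated with a small standalone sketch. It does not use spaCy: `finalize_matches`, the dictionary layout, and the simplified callback signature `(doc, i, results)` are assumptions chosen for illustration, not the library's actual internals.

```python
def finalize_matches(matches_by_key, callbacks, doc=None):
    """Drop patterns with no matches, then fire each callback once per pattern."""
    # Keep only the patterns that actually matched something.
    results = [(key, found) for key, found in matches_by_key.items() if found]
    for i, (key, _found) in enumerate(results):
        on_match = callbacks.get(key)
        if on_match is not None:
            # Called once per matched pattern, not once per matched token.
            on_match(doc, i, results)
    return results


calls = []
matches = {"founded": [[1, 0, 3]], "acquired": []}
callbacks = {
    "founded": lambda doc, i, results: calls.append(results[i][0]),
    "acquired": lambda doc, i, results: calls.append(results[i][0]),
}
results = finalize_matches(matches, callbacks)
```

Here `"acquired"` has an empty match list, so it is neither returned nor passed to its callback, while `"founded"` triggers exactly one callback call.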
Alec Chapman
204c7c8a00 fix thumbnail link to be github raw url 2020-11-01 07:53:48 -07:00
Adriane Boyd
45c9a68828
Identify final Matcher pattern node by quantifier (#6317)
Modify the internal pattern representation in `Matcher` patterns to
identify the final ID state using a unique quantifier rather than a
combination of other attributes.

It was insufficient to identify the final ID node based on an
uninitialized `quantifier` (coincidentally being the same as the `ZERO`)
with `nr_attr` as 0. (In addition, it was potentially bug-prone that
`nr_attr` was set to 0 even though attrs were allocated.)

In the case of `{"OP": "!"}` (a valid, if pointless, pattern), `nr_attr`
is 0 and the quantifier is ZERO, so the previous methods for
incrementing to the ID node at the end of the pattern weren't able to
distinguish the final ID node from the `{"OP": "!"}` pattern.
2020-10-31 12:18:48 +01:00
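The ambiguity described in 45c9a68828 can be reproduced in a few lines of plain Python. This is a sketch under stated assumptions, not the `Matcher` implementation: the `Node` class, `compile_pattern`, and both predicate functions are hypothetical, and the quantifier names only mirror those mentioned in the commit message.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Quantifier(Enum):
    ZERO = auto()       # "!": the token must not match
    ZERO_ONE = auto()   # "?"
    ZERO_PLUS = auto()  # "*"
    ONE = auto()        # default when no "OP" is given
    ONE_PLUS = auto()   # "+"
    FINAL_ID = auto()   # sentinel reserved for the trailing ID node


OPS = {
    "!": Quantifier.ZERO,
    "?": Quantifier.ZERO_ONE,
    "*": Quantifier.ZERO_PLUS,
    "+": Quantifier.ONE_PLUS,
}


@dataclass
class Node:
    quantifier: Quantifier
    attrs: dict = field(default_factory=dict)
    pattern_id: str = ""


def compile_pattern(token_specs, pattern_id):
    """Turn token specs into nodes and append a final ID node."""
    nodes = []
    for spec in token_specs:
        attrs = {k: v for k, v in spec.items() if k != "OP"}
        nodes.append(Node(OPS.get(spec.get("OP"), Quantifier.ONE), attrs))
    nodes.append(Node(Quantifier.FINAL_ID, pattern_id=pattern_id))
    return nodes


def looks_final_old(node):
    # Heuristic the commit replaces: ZERO quantifier and no attributes.
    return node.quantifier is Quantifier.ZERO and not node.attrs


def is_final(node):
    # Check by the dedicated sentinel quantifier instead.
    return node.quantifier is Quantifier.FINAL_ID


nodes = compile_pattern([{"OP": "!"}, {"LOWER": "hello"}], "GREETING")
```

With these definitions, `looks_final_old(nodes[0])` is true for the `{"OP": "!"}` token even though it is not the end of the pattern, whereas `is_final` is true only for the last node.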
Alec Chapman
73d22d96ff add medspacy to universe and fix example w/ cov-bsv 2020-10-29 07:53:56 -06:00
Duygu Altinok
0e55f806dd
Turkish tokenization improvements (#6268)
* added single and paired orth variants

* added token match

* added long text tokenization test

* inverted init

* normalized lemmas to lowercase

* more abbrevs

* tests for ordinals and abbrevs

* separated period abbrevs to another list

* fix typo

* added ordinal and abbrev tests

* added number tests for dates

* minor refinement

* added inflected abbrevs regex

* added percentage and inflection

* cosmetics

* added token match

* added url inflection tests

* excluded url tokens from custom pattern

* removed url match import
2020-10-29 09:43:17 +01:00
Adriane Boyd
8cc5ed6771 Add Macedonian to website languages 2020-10-29 08:49:56 +01:00
Ines Montani
1e4d7e059f Revert "Test FUNDING.yml [ci skip]"
This reverts commit 287be48ad0.
2020-10-28 17:42:23 +01:00
Ines Montani
287be48ad0 Test FUNDING.yml [ci skip] 2020-10-28 17:36:25 +01:00
Adriane Boyd
4dd86306e9
Add Nepali to supported languages on website (#6315) 2020-10-28 16:32:07 +01:00
Robert Šípek
260c29794a
Fill contributor agreement by robertsipek (#6285)
* Fill contributor agreement by robertsipek

* Fill contributor agreement by robertsipek
2020-10-22 22:13:17 +02:00
Kunal Sharma
01aec7a313
Adding MindMeld to Universe JSON (#6275)
* Adding Mindmeld to Universe JSON

Mindmeld is a conversational AI platform for deep-domain voice interfaces and chatbots. https://www.mindmeld.com/

* Signing contribution agreement.

Co-authored-by: kunshar2 <kunshar2@cisco.com>
2020-10-21 18:42:11 +02:00
Ines Montani
d7a4e8454b
Merge pull request #6274 from walterhenry/master
User contributor agreement
2020-10-19 16:30:58 +02:00
walterhenry
ff82644746 User contributor agreement
Here it is!
2020-10-19 16:25:09 +02:00
Ines Montani
3851300e80 Update landing [ci skip] 2020-10-16 11:46:33 +02:00
Borijan Georgievski
2311192ba1
Include Macedonian language (#6230)
* Include Macedonian language

* Fix indentation at char_classes.py

* Fix indentation at char_classes.py

* Add Macedonian tests, update lex_attrs and char_classes

* Import unicode literals for python 2
2020-10-15 15:55:01 +02:00
Ines Montani
bc027dc35c Update .gitignore [ci skip] 2020-10-15 12:43:35 +02:00
Ines Montani
a3b84c7656 Update netlify.toml [ci skip] 2020-10-15 12:42:30 +02:00
Ines Montani
07a976b036
Merge pull request #6221 from baranitharan2020/master 2020-10-13 11:03:49 +02:00
Ines Montani
7f92a5ee6a
Update spacy/lang/ta/examples.py 2020-10-13 11:03:35 +02:00
Baranitharan
d6037c1860
added sentence 2020-10-08 08:22:58 +05:30
Baranitharan
169857e0ec
Merge pull request #1 from baranitharan2020/baranitharan2020-patch-1
Update examples.py
2020-10-08 08:17:57 +05:30
Baranitharan
81afe9b19d
Update examples.py 2020-10-08 08:17:25 +05:30
Sofie Van Landeghem
241cd112f5
add reenabled pipe names back to the meta before serializing (#6219) 2020-10-08 00:44:16 +02:00
Sofie Van Landeghem
2998131416
Reproducibility for TextCat and Tok2Vec (#6218)
* ensure fixed seed in HashEmbed layers

* forgot about the joys of python 2
2020-10-08 00:43:46 +02:00
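Why a fixed seed matters for a hashing embedding layer, as in 2998131416, can be shown with a generic sketch. The commit message does not describe the change in detail, so everything here is an assumption: `hash_row` and `HashEmbedSketch` are hypothetical names, and `hashlib.blake2b` stands in for whatever hash function the real layer uses.

```python
import hashlib


def hash_row(key: str, seed: int, n_rows: int) -> int:
    """Map a key to a table row using a hash salted with an explicit seed."""
    salt = seed.to_bytes(8, "little")  # seed must fit in 64 unsigned bits
    digest = hashlib.blake2b(key.encode("utf8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little") % n_rows


class HashEmbedSketch:
    """Toy stand-in for a hash embedding layer; it only computes row indices."""

    def __init__(self, n_rows: int, seed: int):
        self.n_rows = n_rows
        self.seed = seed

    def rows(self, keys):
        return [hash_row(k, self.seed, self.n_rows) for k in keys]


keys = ["spaCy", "textcat", "tok2vec"]
first_run = HashEmbedSketch(n_rows=1000, seed=7).rows(keys)
second_run = HashEmbedSketch(n_rows=1000, seed=7).rows(keys)
```

Two layers built with the same explicit seed map every key to the same row, so training runs start from the same key-to-row assignment. If the seed instead depended on construction order or global state, that assignment could differ between runs.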
Wannaphong Phatthiyaphaibun
9fc8392b38
Add Thai tag map (LST20 Corpus) (#6163)
* Add Thai tag map (LST20 Corpus)

By @korakot

* Update tag_map.py

* Update tag_map.py

* Update tag_map.py
2020-10-07 11:12:01 +02:00
Duygu Altinok
7e821c2776
Turkish language syntax iterators (#6191)
* added tr_vocab to config

* basic test

* added syntax iterator to Turkish lang class

* first version for Turkish syntax iter, without flat

* added simple tests with nmod, amod, det

* more tests to amod and nmod

* separated noun chunks and parser test

* rearrangement after nchunk parser separation

* added recursive NPs

* tests with complicated recursive NPs

* tests with conjed NPs

* additional tests for conj NP

* small modification for shaving off conj from NP

* added tests with flat

* more tests with flat

* added examples with flats conjed

* added inner func for flat trick

* corrected parse

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-10-07 11:07:52 +02:00
Duygu Altinok
2ce6fc2611
Turkish tag map and morph rules addition (#6141)
* feat: added turkish tag map

* feat: morph rules cconj and sconj

* feat: more conjuncts

* feat: added popular postpositions

* feat: added adverbs

* feat: added personal pronouns

* feat: added reflexive pronouns

* minor: corrected case capital

* minor: fixed comma typo

* feat: added indef pronouns

* feat: added dict iter

* fixed comma typo

* updated language class with tag map and morph

* use default tag map instead

* removed tag map
2020-10-07 10:27:36 +02:00
Duygu Altinok
b95a11dd95
Ordinal numbers for Turkish (#6142)
* minor ordinal number addition

* fixed typo

* added corresponding lexical test
2020-10-07 10:25:37 +02:00