fizban99
57d4a8bf3d
Create fizban99.md ( #3601 )
2019-04-17 11:22:19 +02:00
Matthew Honnibal
83511972d3
Set version to v2.1.4.dev0
2019-04-16 14:17:26 +02:00
Matthew Honnibal
8b5ae0733e
Merge branch 'master' of https://github.com/explosion/spaCy
2019-04-16 12:29:46 +02:00
Matthew Honnibal
d59b2e8a0c
Fix issue #3551 : Upper case lemmas
...
If the Morphology class tries to lemmatize a word that's not in the
string store, it's forced to just return it as-is. While loading
exceptions, the class could hit a case where these strings weren't in
the string store yet. The resulting lemmas could then be cached, leading
to some words receiving upper-case lemmas. Closes #3551 .
2019-04-16 12:27:15 +02:00
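The failure mode described in the commit above can be illustrated with a toy model. This is a minimal sketch, not spaCy's actual code: `ToyStringStore`, `ToyMorphology`, the `exceptions` table, and `lemma_after_loading` are hypothetical names invented for the illustration. It only shows how a lemma computed before a string reaches the store can get cached and then outlive the fix.

```python
class ToyStringStore:
    """Hypothetical stand-in for a string store: just a set of known strings."""

    def __init__(self):
        self._strings = set()

    def add(self, string):
        self._strings.add(string)

    def __contains__(self, string):
        return string in self._strings


class ToyMorphology:
    """Hypothetical lemmatizer with a cache (not spaCy's Morphology class)."""

    def __init__(self, strings, exceptions):
        self.strings = strings
        self.exceptions = exceptions
        self._cache = {}

    def lemmatize(self, word):
        if word in self._cache:
            return self._cache[word]
        if word not in self.strings:
            # Unknown string: forced to return the word as-is.
            lemma = word
        else:
            lemma = self.exceptions.get(word, word.lower())
        # The result is cached either way, including the as-is fallback.
        self._cache[word] = lemma
        return lemma


def lemma_after_loading(add_strings_first):
    """Simulate loading exceptions before or after the store knows the string."""
    strings = ToyStringStore()
    morphology = ToyMorphology(strings, {"THE": "the"})
    if add_strings_first:
        strings.add("THE")
    morphology.lemmatize("THE")  # called while loading exceptions
    strings.add("THE")           # the string arrives in the store eventually
    return morphology.lemmatize("THE")


print(lemma_after_loading(add_strings_first=False))
print(lemma_after_loading(add_strings_first=True))
```

In this toy version, the wrong ordering leaves the upper-case form in the cache, while adding the string first yields the lower-case lemma.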
BreakBB
5b8dbe4975
Fix symlink creation to show error message on failure ( #3589 ) ( resolves #3307 )
...
* Fix symlink creation to show error message on failure. Update tests to reflect those changes.
* Fix test to succeed on non-Windows systems.
2019-04-16 11:58:31 +02:00
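The idea of reporting a readable message when symlink creation fails can be sketched as follows. This is an assumption-laden illustration, not the code from the commit: `create_symlink` and the message wording are hypothetical, and only the standard-library call `os.symlink` is real.

```python
import os
import tempfile


def create_symlink(src, dest):
    """Return None on success, or a readable error message on failure.

    Hypothetical helper; the message text is an assumption.
    """
    try:
        os.symlink(src, dest)
    except OSError as err:
        return "Couldn't create symlink from {} to {}: {}".format(src, dest, err)
    return None


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target")
    link = os.path.join(tmp, "link")
    os.mkdir(target)
    # On systems that permit symlinks, the first call is expected to succeed.
    first = create_symlink(target, link)
    # The second call fails because the link path already exists.
    second = create_symlink(target, link)
    print(first)
    print(second)
```

On Windows, creating symlinks may require extra privileges, so the first call can also return a message there.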
Krzysztof Kowalczyk
cc1516ec26
Improved training and evaluation ( #3538 )
...
* Add early stopping
* Add return_score option to evaluate
* Fix missing str to path conversion
* Fix import + old python compatibility
* Fix bad beam_width setting during CPU evaluation in spacy train with the GPU option turned on
2019-04-15 12:04:36 +02:00
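The commit above only says that early stopping was added; it does not state the criterion. As a generic sketch of the technique, assuming a patience-based rule (stop once the score has not improved for a fixed number of consecutive evaluations), where `early_stopping_epoch` and `patience` are hypothetical names:

```python
def early_stopping_epoch(scores, patience):
    """Return (stop_epoch, best_score) for a sequence of per-epoch scores.

    Stops at the first epoch where the score has failed to improve on the
    best value seen so far for `patience` consecutive epochs. If that never
    happens, the last epoch index is returned.
    """
    best_score = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(scores):
        if score > best_score:
            best_score = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch, best_score
    return len(scores) - 1, best_score


print(early_stopping_epoch([0.5, 0.6, 0.59, 0.58, 0.7], patience=2))
```

With a patience of 2, this run stops before the late improvement to 0.7 is seen, which is the usual trade-off: a small patience saves training time but can cut off a run that would still have improved.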
svlandeg
6763e025e1
parse wp dump for links to determine prior probabilities
2019-04-15 11:41:57 +02:00
svlandeg
3163331b1e
wikipedia dump parser and mediawiki format regex cleanup
2019-04-14 21:52:01 +02:00
Ines Montani
5289dd1356
Fix formatting
2019-04-13 17:58:26 +02:00
Ines Montani
9e7deeaf48
Remove Datacamp
2019-04-13 17:46:32 +02:00
Shikhar Chauhan
bbf6f9f764
Change default output format from jsonl to json for cli convert ( #3583 ) ( closes #3523 )
...
* Changing default output format from jsonl to json for cli convert
* Adding Contributor Agreement
2019-04-12 11:31:23 +02:00
svlandeg
b31a390a9a
reading types, claims and sitelinks
2019-04-11 21:42:44 +02:00
svlandeg
6e997be4b4
reading wikidata descriptions and aliases
2019-04-11 21:08:22 +02:00
Omer Celik
531c0869b2
Added Turkish Lira symbol(₺) ( #3576 )
...
Added Turkish Lira symbol(₺)
https://en.wikipedia.org/wiki/Turkish_lira
2019-04-11 11:32:28 +02:00
Omer Celik
034a1f458b
Signed agreement ( #3577 )
2019-04-11 11:31:27 +02:00
Ivan Tham
71710e2454
Add myself to contributors ( #3575 )
2019-04-11 11:31:04 +02:00
oterrier
2854724e69
Added project gracyql to Universe ( #3570 ) ( resolves #3568 )
...
As discussed with Ines in https://github.com/explosion/spaCy/issues/3568 , adding a new project proposal for the community to the spaCy Universe website.
GracyQL: a tiny GraphQL wrapper around spaCy using graphene and starlette.
## Description
Change only in universe.json file to add a new project
### Types of change
New project reference in Universe
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-04-10 17:54:42 +02:00
svlandeg
9a7d534b1b
enable nogil for cython functions in kb.pxd
2019-04-10 17:25:10 +02:00
svlandeg
61a33f55d2
little fixes
2019-04-10 16:06:09 +02:00
Santiago Castro
86e4b68aa9
Fix website docs for Vectors.from_glove ( #3565 )
...
* Fix website docs for Vectors.from_glove
* Add myself as a contributor
2019-04-10 15:23:27 +02:00
Ines Montani
4d198a7e92
Ensure match pattern error isn't raised on empty errors ( closes #3549 )
2019-04-09 12:50:43 +02:00
Ines Montani
3ddb799f27
Merge branch 'master' of https://github.com/explosion/spaCy
2019-04-09 11:40:28 +02:00
Ines Montani
145c0b7e88
Tidy up and auto-format
2019-04-09 11:40:19 +02:00
Bharat Raghunathan
72820896d4
Fix typo in web docs cli.md ( #3559 )
2019-04-09 11:40:03 +02:00
Ines Montani
5f005adf61
Add xfailing test for #3555
2019-04-09 11:07:14 +02:00
Ines Montani
6ae3b5699e
Make sure path is string ( resolves #3546 )
2019-04-08 12:53:41 +02:00
Ines Montani
d0f5e015cb
Auto-format
2019-04-08 12:53:16 +02:00
pierremonico
0d26bfe677
Removes duplicate in table ( #3550 )
...
* Removes duplicate in table
Just fixing typos.
* Remove newline
Co-authored-by: Ines Montani <ines@ines.io>
2019-04-08 10:30:42 +02:00
Piero Molino
5198aa4ae6
Added Ludwig among the projects ( #3548 ) [ci skip]
...
* Added Ludwig among the projects
* Create w4nderlust.md
* Add Uber to logo wall
2019-04-07 13:01:26 +02:00
Dobita21
8bf6967eb7
Update Thai stop words ( #3545 )
...
* test spaCy commit to git fri 04052019 10:54
* change Data format from my format to master format
* ทัทั้งนี้ ---> ทั้งนี้
* delete stop_word translate from Eng
* Adjust formatting and readability
2019-04-05 12:06:38 +02:00
jeannefukumaru
f67d881b30
fix typos in tag_map flagged by python -m debug-data ( #3542 )
...
## Checklist
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
Co-authored-by: Ines Montani <ines@ines.io>
2019-04-05 12:06:09 +02:00
Ines Montani
cd21778bef
Merge pull request #3539 from jeannefukumaru/master
...
Added tags previously missing from Indonesian `tag_map.py`
2019-04-04 11:57:03 +02:00
Jeanne Choo
b6c9807431
Merge remote-tracking branch 'upstream/master'
2019-04-04 14:21:50 +08:00
Jeanne Choo
80e15af76c
fixed tag_map.py merge conflict
2019-04-04 14:18:27 +08:00
jeannefukumaru
eba4f77526
Merge pull request #2 from jeannefukumaru/update_indonesian_tag_map
...
updated tag map with missing tags
2019-04-04 06:49:04 +08:00
jeannefukumaru
876ce01567
updated tag map with missing tags
2019-04-03 23:09:11 +08:00
jeannefukumaru
99e04c4ce2
Merge pull request #1 from jeannefukumaru/added-indonesian-tag-map
...
Added indonesian tag map
2019-04-03 23:05:05 +08:00
Ines Montani
4faf62d515
Merge pull request #3530 from svlandeg/fix/issue_3521
...
Allow English stopwords with any type of apostrophe
2019-04-03 14:14:03 +02:00
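One way to allow stop words with any type of apostrophe is to expand each entry that contains a straight apostrophe into variants with typographic ones. The sketch below is an illustration under assumptions, not the merged code: `expand_apostrophes` is a hypothetical helper, and the set of apostrophe characters is chosen for the example.

```python
# Assumed set of apostrophe characters: straight, right single quotation
# mark (U+2019), and left single quotation mark (U+2018).
APOSTROPHES = ["'", "\u2019", "\u2018"]


def expand_apostrophes(stop_words):
    """Return the stop words plus a variant for every apostrophe character."""
    expanded = set(stop_words)
    for word in stop_words:
        if "'" in word:
            for apostrophe in APOSTROPHES:
                expanded.add(word.replace("'", apostrophe))
    return expanded


print(sorted(expand_apostrophes({"n't", "the"})))
```

Entries without an apostrophe pass through unchanged, so the expansion can be applied to a whole stop word list at once.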
Yves Peirsman
951825532c
Improved Dutch language resources and Dutch lemmatization ( #3409 )
...
* Improved Dutch language resources and Dutch lemmatization
* Fix conftest
* Update punctuation.py
* Auto-format
* Format and fix tests
* Remove unused test file
* Re-add deleted test
* removed redundant infix regex pattern for ','; note: brackets + simple hyphen remains
* Cleaner lemmatization files
2019-04-03 14:13:26 +02:00
svlandeg
4ff786e113
addressed all comments by Ines
2019-04-03 13:50:33 +02:00
Ines Montani
6a4575a56c
Don't make "settings" or "title" required in displaCy data ( closes #3531 )
2019-04-03 10:13:16 +02:00
Ines Montani
2f0f439c54
Remove non-existent example ( closes #3533 )
2019-04-03 09:59:17 +02:00
Kamolsit Mongkolsrisawat
dcc67f3f51
Update Thai tokenizer_exception list ( #3529 )
...
* add tokenizer_exceptions word (ก-น) from https://goo.gl/JpJ2qq
* update tokenizer_exceptions word list
* add contributor file
2019-04-03 09:13:36 +02:00
ivigamberdiev
5e5641616d
Update links and http -> https ( #3532 )
...
* update links and http -> https
* SCA
2019-04-02 17:36:22 +02:00
svlandeg
85b4319f33
specify encoding in files
2019-04-02 15:05:31 +02:00
svlandeg
673c81bbb4
unicode string for python 2.7
2019-04-02 13:52:07 +02:00
svlandeg
eca9cc5417
fixing Issue #3521 by adding all apostrophe variants for each stopword
2019-04-02 13:24:59 +02:00
svlandeg
e7062cf699
failing test for Issue #3521
2019-04-02 13:15:35 +02:00
svlandeg
1424b12b09
failing test for Issue #3449
2019-04-02 13:06:37 +02:00
Ines Montani
24cecdb44f
Update compatibility [ci skip]
2019-04-01 16:25:16 +02:00