Dobita21
8bf6967eb7
Update Thai stop words ( #3545 )
...
* test spaCy commit to git fri 04052019 10:54
* change Data format from my format to master format
* ทัทั้งนี้ ---> ทั้งนี้
* delete stop words translated from English
* Adjust formatting and readability
2019-04-05 12:06:38 +02:00
jeannefukumaru
f67d881b30
fix typos in tag_map flagged by python -m debug-data ( #3542 )
...
Co-authored-by: Ines Montani <ines@ines.io>
2019-04-05 12:06:09 +02:00
Jeanne Choo
b6c9807431
Merge remote-tracking branch 'upstream/master'
2019-04-04 14:21:50 +08:00
Jeanne Choo
80e15af76c
fixed tag_map.py merge conflict
2019-04-04 14:18:27 +08:00
jeannefukumaru
876ce01567
updated tag map with missing tags
2019-04-03 23:09:11 +08:00
Ines Montani
4faf62d515
Merge pull request #3530 from svlandeg/fix/issue_3521
...
Allow English stopwords with any type of apostrophe
2019-04-03 14:14:03 +02:00
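The idea behind "Allow English stopwords with any type of apostrophe" can be illustrated with a minimal sketch. It is not the code from the PR: the sample stop words, the list of variant characters, and the helper name `add_apostrophe_variants` are all assumptions made for illustration.

```python
# Hypothetical sample; the real stop list lives in spaCy's English language data.
STOP_WORDS = {"the", "n't", "'re", "'s"}

# Assumed variants: right and left single quotation marks, which often
# appear in text in place of the ASCII apostrophe.
APOSTROPHE_VARIANTS = ["\u2019", "\u2018"]


def add_apostrophe_variants(stop_words, variants=APOSTROPHE_VARIANTS):
    """Return a new set that also contains every apostrophe-bearing stop word
    rewritten with each variant character (hypothetical helper)."""
    expanded = set(stop_words)
    for word in stop_words:
        if "'" in word:
            for variant in variants:
                expanded.add(word.replace("'", variant))
    return expanded


EXPANDED_STOP_WORDS = add_apostrophe_variants(STOP_WORDS)
```

With the expanded set, a contraction suffix typed with a curly apostrophe is found by the same membership test as its ASCII form.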
Yves Peirsman
951825532c
Improved Dutch language resources and Dutch lemmatization ( #3409 )
...
* Improved Dutch language resources and Dutch lemmatization
* Fix conftest
* Update punctuation.py
* Auto-format
* Format and fix tests
* Remove unused test file
* Re-add deleted test
* removed redundant infix regex pattern for ','; note: brackets + simple hyphen remain
* Cleaner lemmatization files
2019-04-03 14:13:26 +02:00
svlandeg
4ff786e113
addressed all comments by Ines
2019-04-03 13:50:33 +02:00
Kamolsit Mongkolsrisawat
dcc67f3f51
Update Thai tokenizer_exception list ( #3529 )
...
* add tokenizer_exceptions word (ก-น) from https://goo.gl/JpJ2qq
* update tokenizer_exceptions word list
* add contributor file
2019-04-03 09:13:36 +02:00
svlandeg
673c81bbb4
unicode string for python 2.7
2019-04-02 13:52:07 +02:00
svlandeg
eca9cc5417
fixing Issue #3521 by adding all hyphen variants for each stopword
2019-04-02 13:24:59 +02:00
jeannefukumaru
6cdb7b2e04
added tag_map for indonesian ( #3515 )
...
* added tag_map for indonesian
* changed tag map from .py to .txt to see if tests pass
* added symbols import
* added utf8 encoding flag
* added missing SCONJ symbol
* Auto-format
* Remove unused imports
* Make tag map available in Indonesian defaults
2019-04-01 12:27:48 +02:00
Ines Montani
c23e234d65
Auto-format
2019-04-01 12:11:27 +02:00
Ines Montani
0a0b1087b0
Make tag map available in Indonesian defaults
2019-04-01 11:46:51 +02:00
Ines Montani
5d9212c44c
Remove unused imports
2019-04-01 11:46:25 +02:00
Ines Montani
8d6b544632
Auto-format
2019-04-01 11:45:43 +02:00
jeannefukumaru
6567f27849
added missing SCONJ symbol
2019-04-01 17:02:53 +08:00
jeannefukumaru
082a0a2232
added utf8 encoding flag
2019-04-01 16:37:11 +08:00
jeannefukumaru
a741bed7a7
added symbols import
2019-04-01 16:21:06 +08:00
jeannefukumaru
745cf0c914
changed tag map from .py to .txt to see if tests pass
2019-04-01 07:04:50 +08:00
jeannefukumaru
3cc897102f
added tag_map for indonesian
2019-04-01 00:00:08 +08:00
Duygu Altinok
5a7bc6b39d
Fix/irreg adverbs extension ( #3499 )
...
* extended list of irreg adverbs
* added test to exceptions
* fixed typo
2019-03-28 13:23:33 +01:00
Wannaphong Phatthiyaphaibun
297a051992
Update Thai tag map ( #3480 )
...
* Update Thai tag map
* Create wannaphongcom.md
2019-03-25 16:53:26 +01:00
Matthew Honnibal
c66bd61e88
Fix lemmas
2019-03-21 14:22:12 +01:00
Matthew Honnibal
04395ffa49
Bring English tag_map in line with UD Treebank
...
I wrote a small script to read the UD English training data and check
that our tag map and morph rules were resulting in the best POS map.
This hadn't been done for some time, and there have been various changes
to the UD schema since it was last done. After these changes we should
see much better agreement between our POS assignments and the UD POS
tags.
2019-03-21 13:53:44 +01:00
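The script mentioned in commit 04395ffa49 is not included in the log. The following is only a rough sketch of what such a check might look like, assuming the training data is in CoNLL-U format (UPOS in the fourth column, XPOS in the fifth) and simplifying the tag map to a plain dict from fine-grained tag to coarse POS string. The function names `count_tag_pairs` and `find_disagreements` are hypothetical.

```python
from collections import Counter, defaultdict


def count_tag_pairs(conllu_lines):
    """Count how often each fine-grained tag (XPOS) occurs with each UPOS tag."""
    counts = defaultdict(Counter)
    for line in conllu_lines:
        line = line.rstrip("\n")
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 5:
            continue
        token_id, upos, xpos = cols[0], cols[3], cols[4]
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in token_id or "." in token_id:
            continue
        counts[xpos][upos] += 1
    return counts


def find_disagreements(tag_map, counts):
    """Return {xpos: (mapped_pos, most_frequent_upos)} wherever the tag map
    (assumed here to be a simple str -> str dict) differs from the data."""
    disagreements = {}
    for xpos, upos_counts in counts.items():
        best_upos, _ = upos_counts.most_common(1)[0]
        mapped = tag_map.get(xpos)
        if mapped != best_upos:
            disagreements[xpos] = (mapped, best_upos)
    return disagreements
```

A usage example would pass the lines of a treebank file to `count_tag_pairs` and then review each reported disagreement by hand, since the most frequent UPOS for a tag is not always the right mapping.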
Mehdi Hamoumi
9211f30ee3
Tiny correction in french lookup dictionary ( #3427 )
2019-03-19 13:00:19 +01:00
Ines Montani
2912ddc9a6
Don't set extension attribute in Japanese ( closes #3398 )
2019-03-12 13:30:33 +01:00
Ines Montani
cdd418b93e
Auto-format [ci skip]
2019-03-11 17:10:50 +01:00
Matthew Honnibal
39a4741e26
Add support for vocab.writing_system property ( #3390 )
...
* Add xfail test for vocab.writing_system
* Add vocab.writing_system property
* Set Language.Defaults.writing_system
* Set default writing system
* Remove xfail on test_vocab_writing_system
2019-03-11 15:23:20 +01:00
Ines Montani
ee4f312e89
Add writing_system to ArabicDefaults (experimental)
2019-03-11 14:22:23 +01:00
Ines Montani
ef80cfde6f
Fix pickling of Japanese ( closes #3191 )
2019-03-11 13:34:23 +01:00
Matthew Honnibal
5d25ee52fb
Fix English tag map
2019-03-11 01:06:02 +01:00
Matthew Honnibal
7503e1e505
Improve English tag map. Re #593 , #3311
2019-03-10 23:50:00 +01:00
Ines Montani
610fb306bd
Revert hyphens
2019-03-09 12:51:53 +01:00
Ines Montani
bbabb6aaae
Escape more hyphens
2019-03-09 12:41:05 +01:00
Ines Montani
b8db219850
Auto-format
2019-03-09 12:40:58 +01:00
Ines Montani
a145bfe627
Try escaping hyphens again
2019-03-09 03:06:50 +01:00
Ines Montani
b9c71fc0f0
Fix flags
2019-03-09 02:46:04 +01:00
Ines Montani
ae09b6a6cf
Try fixing unicode inconsistencies on Python 2
2019-03-09 02:37:50 +01:00
Ines Montani
d957d7a697
Auto-format
2019-03-09 02:37:41 +01:00
Ines Montani
65402c3d02
Revert "Experiment with escaping hyphens"
...
This reverts commit 9b42e2d5dd.
2019-03-09 02:13:00 +01:00
Ines Montani
9b42e2d5dd
Experiment with escaping hyphens
2019-03-09 02:05:26 +01:00
Ines Montani
6bd34e9d54
Expose Japanese stop words ( closes #3346 )
2019-03-06 14:21:15 +01:00
Ines Montani
85deb96278
Fix whitespace
2019-03-06 14:20:34 +01:00
Ines Montani
23f6ebf0f3
Add missing " ( closes #3343 )
2019-02-27 16:37:03 +01:00
Ines Montani
48a2046d1c
Remove stray print statement ( closes #3342 )
2019-02-27 15:35:04 +01:00
Ines Montani
07d7c0a1af
Fix whitespace
2019-02-27 15:34:21 +01:00
Ines Montani
76ce8b2662
Merge branch 'master' into develop
2019-02-25 15:54:55 +01:00
Julia Makogon
f1c3108d52
Fixing pymorphy2 dependency issue ( #3329 ) ( closes #3327 )
...
* Classes for Ukrainian; small fix in Russian.
* Contributor agreement
* pymorphy2 initialization split for ru and uk ( #3327 )
* stop-words fixed
* Unit-tests updated
2019-02-25 15:48:17 +01:00
Ines Montani
2982f82934
Auto-format
2019-02-24 14:09:15 +01:00