Ines Montani | 6247c005a2 | Add test for tokenizer regular expressions | 2016-11-24 13:51:59 +01:00
Ines Montani | de747e39e7 | Reformat language data | 2016-11-24 13:51:32 +01:00
Ines Montani | dad2c6cae9 | Strip trailing whitespace | 2016-11-20 16:45:51 +01:00
Ines Montani | 3082e49326 | Update and reformat German stopwords | 2016-11-20 16:45:26 +01:00
Sourav Singh | 6745eac309 | Update language_data.py | 2016-11-20 19:52:02 +05:30
Sourav Singh | 4d9aae7d6a | Add German Stopwords | 2016-11-19 22:47:53 +05:30
Matthew Honnibal | 7afb2544a7 | Merge pull request #627 from sadovnychyi/patch-1: Remove duplicated line of vocab declaration | 2016-11-16 06:09:18 +11:00
Yanhao | 762169da29 | Fixed bug: eg.guess is a tag id, rather than tag | 2016-11-15 14:11:22 +08:00
Dmytro Sadovnychyi | e70a7050e1 | Remove duplicated line of vocab declaration (as already declared on line 211) | 2016-11-13 18:52:49 +08:00
Matthew Honnibal | f123f92e0c | Fix #617: Vocab.load() required Path. Should work with string as well. | 2016-11-10 22:48:48 +01:00
Matthew Honnibal | e86f440ca6 | Fix test for issue 617 | 2016-11-10 22:48:10 +01:00
Matthew Honnibal | faa7610c56 | Merge branch 'master' of ssh://github.com/explosion/spaCy | 2016-11-10 22:46:38 +01:00
Matthew Honnibal | a2c7de8329 | Test Issue #617 (spacy/tests/regression/test_issue617.py) | 2016-11-10 22:46:23 +01:00
tiago | 2a3e342c1f | Added a test case to cover the span.merge returning values | 2016-11-09 18:57:50 +00:00
tiago | b38cfd0ef9 | now span.merge returns token like it says on documentation | 2016-11-09 14:58:19 +00:00
Dmitry Sadovnychyi | 9488222e79 | Fix PhraseMatcher to work with updated Matcher (#613) | 2016-11-09 00:14:26 +08:00
Dmitry Sadovnychyi | 86c056ba64 | Add basic test for PhraseMatcher (#613) | 2016-11-09 00:10:32 +08:00
Matthew Honnibal | 3ea15b257f | Fix test for 605 | 2016-11-06 11:59:26 +01:00
Matthew Honnibal | efe7790439 | Test #590: Order dependence in Matcher rules. | 2016-11-06 11:21:36 +01:00
Matthew Honnibal | 5cd3acb265 | Fix #605: Acceptor now rejects matches as expected. | 2016-11-06 10:50:42 +01:00
Matthew Honnibal | 75805397dd | Test Issue #605 | 2016-11-06 10:42:32 +01:00
Matthew Honnibal | 014b6936ac | Fix #608 -- __version__ should be available at the base of the package. | 2016-11-04 21:21:02 +01:00
Matthew Honnibal | 42b0736db7 | Increment version | 2016-11-04 20:04:21 +01:00
Matthew Honnibal | 9f93386994 | Update version | 2016-11-04 19:28:16 +01:00
Matthew Honnibal | 1fb09c3dc1 | Fix morphology tagger | 2016-11-04 19:19:09 +01:00
Matthew Honnibal | a36353df47 | Temporarily put back the tokenize_from_strings method, while tests aren't updated yet. | 2016-11-04 19:18:07 +01:00
Matthew Honnibal | f0917b6808 | Fix Issue #376: and/or was tagged as a noun. | 2016-11-04 15:21:28 +01:00
Matthew Honnibal | 737816e86e | Fix #368: Tokenizer handled pattern 'unicode close quote, period' incorrectly. | 2016-11-04 15:16:20 +01:00
Matthew Honnibal | ab952b4756 | Fix #578 -- Sputnik had been purging all files on --force, not just the relevant one. | 2016-11-04 10:44:11 +01:00
Matthew Honnibal | 6e37ba1d82 | Fix #602, #603 --- Broken build | 2016-11-04 09:54:24 +01:00
Matthew Honnibal | 293c79c09a | Fix #595: Lemmatization was incorrect for base forms, because morphological analyser wasn't adding morphology properly. | 2016-11-04 00:29:07 +01:00
Matthew Honnibal | e30348b331 | Prefer to import from symbols instead of parts_of_speech | 2016-11-04 00:27:55 +01:00
Matthew Honnibal | 4a8a2b6001 | Test #595 -- Bug in lemmatization of base forms. | 2016-11-04 00:27:32 +01:00
Matthew Honnibal | f1605df2ec | Fix #588: Matcher should reject empty pattern. | 2016-11-03 00:16:44 +01:00
Matthew Honnibal | 72b9bd57ec | Test Issue #588: Matcher accepts invalid, empty patterns. | 2016-11-03 00:09:35 +01:00
Matthew Honnibal | 41a90a7fbb | Add tokenizer exception for 'Ph.D.', to fix 592. | 2016-11-03 00:03:34 +01:00
Matthew Honnibal | 532318e80b | Import Jieba inside zh.make_doc | 2016-11-02 23:49:19 +01:00
Matthew Honnibal | f292f7f0e6 | Fix Issue #599, by considering empty documents to be parsed and tagged. Implementation is a bit dodgy. | 2016-11-02 23:48:43 +01:00
Matthew Honnibal | b6b01d4680 | Remove deprecated tokens_from_list test. | 2016-11-02 23:47:21 +01:00
Matthew Honnibal | 3d6c79e595 | Test Issue #599: .is_tagged and .is_parsed attributes not reflected after deserialization for empty documents. | 2016-11-02 23:40:11 +01:00
Matthew Honnibal | 05a8b752a2 | Fix Issue #600: Missing setters for Token attribute. | 2016-11-02 23:28:59 +01:00
Matthew Honnibal | 125c910a8d | Test Issue #600 | 2016-11-02 23:24:13 +01:00
Matthew Honnibal | e0c9695615 | Fix doc strings for tokenizer | 2016-11-02 23:15:39 +01:00
Matthew Honnibal | 80824f6d29 | Fix test | 2016-11-02 20:48:40 +01:00
Matthew Honnibal | dbe47902bc | Add import fr | 2016-11-02 20:48:29 +01:00
Matthew Honnibal | 8f24dc1982 | Fix infixes in Italian | 2016-11-02 20:43:52 +01:00
Matthew Honnibal | 41a4766c1c | Fix infixes in spanish and portuguese | 2016-11-02 20:43:12 +01:00
Matthew Honnibal | 3d4bd96e8a | Fix infixes in french | 2016-11-02 20:41:43 +01:00
Matthew Honnibal | c09a8ce5bb | Add test for french tokenizer | 2016-11-02 20:40:31 +01:00
Matthew Honnibal | b012ae3044 | Add test for loading languages | 2016-11-02 20:38:48 +01:00