Commit Graph

1914 Commits

Author SHA1 Message Date
Matthew Honnibal
b8c4f5ea76 Allow German noun chunks to work on Span
Update the German noun chunks iterator, so that it also works on Span objects.
2016-11-24 23:30:15 +11:00
Pokey Rule
3e3bda142d Add noun_chunks to Span 2016-11-24 10:47:20 +00:00
Matthew Honnibal
09f68bc641 Fix Issue #639: stop words in language class not used. This patch is messy, but it's better not to change too much until the language data loading can be properly refactored. 2016-11-24 00:13:55 +01:00
Matthew Honnibal
48e1dc29d4 Fix default path loading. 2016-11-23 23:48:55 +01:00
Matthew Honnibal
e01c1875ee Work on test for #615 2016-11-23 23:48:41 +01:00
ExplodingCabbage
6c4f488e89 Fix syntax mistake 2016-11-23 15:12:45 +00:00
Matthew Honnibal
60eb2343ce Only try to load vectors if they exist. 2016-11-23 13:50:24 +01:00
Matthew Honnibal
618ac36093 Fix use of path argument in Language.__init__. Needs to be keyword arg, not positional. 2016-11-23 13:26:34 +01:00
Mark Amery
fbe19680a6 Fix another bug related to Language.__init__'s path parameter 2016-11-20 20:31:34 +00:00
Mark Amery
b0a07c21a0 Fix path param of Language.__init__ always being ignored
There was an explicitly-declared `path` keyword argument, so 'path'
would never be present in `**overrides`. This line just overwrote
any manually-specified value the user might've passed to the `path`
parameter.
2016-11-20 16:29:57 +00:00
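The bug described in b0a07c21a0 can be reproduced in isolation. The sketch below is a minimal illustration, not spaCy's actual code: the function names and DEFAULT_PATH are hypothetical. It only shows why a declared keyword parameter never appears in `**overrides`.

```python
DEFAULT_PATH = "default/data"  # hypothetical default, for illustration only


def resolve_path_buggy(path=None, **overrides):
    # `path` is a declared parameter, so Python binds it by name and never
    # places it in `overrides`. This lookup therefore always falls back to
    # the default and overwrites whatever the caller passed.
    path = overrides.get("path", DEFAULT_PATH)
    return path


def resolve_path_fixed(path=None, **overrides):
    # Read the declared parameter directly; fall back only when it is unset.
    if path is None:
        path = DEFAULT_PATH
    return path


buggy = resolve_path_buggy(path="my/data")
fixed = resolve_path_fixed(path="my/data")
```

With these definitions, the buggy variant ignores the caller's value while the fixed variant keeps it.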
Mark Amery
1988fce389 Merge remote-tracking branch 'origin/master' into specify-data-path 2016-11-20 16:07:14 +00:00
Mark Amery
3871007c72 Let --data-path be specified when running download.py scripts
Resolves https://github.com/explosion/spaCy/issues/637
2016-11-20 15:48:04 +00:00
Ines Montani
dad2c6cae9 Strip trailing whitespace 2016-11-20 16:45:51 +01:00
Ines Montani
3082e49326 Update and reformat German stopwords 2016-11-20 16:45:26 +01:00
Sourav Singh
6745eac309 Update language_data.py 2016-11-20 19:52:02 +05:30
Sourav Singh
4d9aae7d6a Add German Stopwords 2016-11-19 22:47:53 +05:30
Matthew Honnibal
7afb2544a7 Merge pull request #627 from sadovnychyi/patch-1
Remove duplicated line of vocab declaration
2016-11-16 06:09:18 +11:00
Yanhao
762169da29 Fixed bug: eg.guess is a tag id, rather than tag 2016-11-15 14:11:22 +08:00
Dmytro Sadovnychyi
e70a7050e1 Remove duplicated line of vocab declaration
As already declared on line 211.
2016-11-13 18:52:49 +08:00
Matthew Honnibal
f123f92e0c Fix #617: Vocab.load() required Path. Should work with string as well. 2016-11-10 22:48:48 +01:00
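One common way to let a loader accept either a string or a Path, as f123f92e0c describes for Vocab.load(), is to normalise the argument up front. The helper below is a hypothetical sketch of that pattern and is not taken from the spaCy source.

```python
from pathlib import Path


def ensure_path(location):
    """Hypothetical helper: return a Path whether given a str or a Path."""
    if isinstance(location, str):
        return Path(location)
    return location


from_string = ensure_path("data/vocab")
from_path = ensure_path(Path("data/vocab"))
```

Both calls yield equal Path objects, so the rest of the loader can rely on Path methods regardless of what the caller supplied.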
Matthew Honnibal
e86f440ca6 Fix test for issue 617 2016-11-10 22:48:10 +01:00
Matthew Honnibal
faa7610c56 Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-11-10 22:46:38 +01:00
Matthew Honnibal
a2c7de8329 spacy/tests/regression/test_issue617.py
Test Issue #617
2016-11-10 22:46:23 +01:00
tiago
2a3e342c1f Added a test case to cover the span.merge returning values 2016-11-09 18:57:50 +00:00
tiago
b38cfd0ef9 now span.merge returns token like it says on documentation 2016-11-09 14:58:19 +00:00
Dmitry Sadovnychyi
9488222e79 Fix PhraseMatcher to work with updated Matcher
#613
2016-11-09 00:14:26 +08:00
Dmitry Sadovnychyi
86c056ba64 Add basic test for PhraseMatcher
#613
2016-11-09 00:10:32 +08:00
Matthew Honnibal
3ea15b257f Fix test for 605 2016-11-06 11:59:26 +01:00
Matthew Honnibal
efe7790439 Test #590: Order dependence in Matcher rules. 2016-11-06 11:21:36 +01:00
Matthew Honnibal
5cd3acb265 Fix #605: Acceptor now rejects matches as expected. 2016-11-06 10:50:42 +01:00
Matthew Honnibal
75805397dd Test Issue #605 2016-11-06 10:42:32 +01:00
Matthew Honnibal
014b6936ac Fix #608 -- __version__ should be available at the base of the package. 2016-11-04 21:21:02 +01:00
Matthew Honnibal
42b0736db7 Increment version 2016-11-04 20:04:21 +01:00
Matthew Honnibal
9f93386994 Update version 2016-11-04 19:28:16 +01:00
Matthew Honnibal
1fb09c3dc1 Fix morphology tagger 2016-11-04 19:19:09 +01:00
Matthew Honnibal
a36353df47 Temporarily put back the tokenize_from_strings method, while tests aren't updated yet. 2016-11-04 19:18:07 +01:00
Matthew Honnibal
f0917b6808 Fix Issue #376: and/or was tagged as a noun. 2016-11-04 15:21:28 +01:00
Matthew Honnibal
737816e86e Fix #368: Tokenizer handled pattern 'unicode close quote, period' incorrectly. 2016-11-04 15:16:20 +01:00
Matthew Honnibal
ab952b4756 Fix #578 -- Sputnik had been purging all files on --force, not just the relevant one. 2016-11-04 10:44:11 +01:00
Matthew Honnibal
6e37ba1d82 Fix #602, #603 --- Broken build 2016-11-04 09:54:24 +01:00
Matthew Honnibal
293c79c09a Fix #595: Lemmatization was incorrect for base forms, because morphological analyser wasn't adding morphology properly. 2016-11-04 00:29:07 +01:00
Matthew Honnibal
e30348b331 Prefer to import from symbols instead of parts_of_speech 2016-11-04 00:27:55 +01:00
Matthew Honnibal
4a8a2b6001 Test #595 -- Bug in lemmatization of base forms. 2016-11-04 00:27:32 +01:00
Matthew Honnibal
f1605df2ec Fix #588: Matcher should reject empty pattern. 2016-11-03 00:16:44 +01:00
Matthew Honnibal
72b9bd57ec Test Issue #588: Matcher accepts invalid, empty patterns. 2016-11-03 00:09:35 +01:00
Matthew Honnibal
41a90a7fbb Add tokenizer exception for 'Ph.D.', to fix 592. 2016-11-03 00:03:34 +01:00
Matthew Honnibal
532318e80b Import Jieba inside zh.make_doc 2016-11-02 23:49:19 +01:00
Matthew Honnibal
f292f7f0e6 Fix Issue #599, by considering empty documents to be parsed and tagged. Implementation is a bit dodgy. 2016-11-02 23:48:43 +01:00
Matthew Honnibal
b6b01d4680 Remove deprecated tokens_from_list test. 2016-11-02 23:47:21 +01:00
Matthew Honnibal
3d6c79e595 Test Issue #599: .is_tagged and .is_parsed attributes not reflected after deserialization for empty documents. 2016-11-02 23:40:11 +01:00