Commit Graph

3510 Commits

Author SHA1 Message Date
Matthew Honnibal
4a8a2b6001 Test #595 -- Bug in lemmatization of base forms. 2016-11-04 00:27:32 +01:00
Matthew Honnibal
f1605df2ec Fix #588: Matcher should reject empty pattern. 2016-11-03 00:16:44 +01:00
Matthew Honnibal
72b9bd57ec Test Issue #588: Matcher accepts invalid, empty patterns. 2016-11-03 00:09:35 +01:00
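
The two #588 commits above add a guard and a regression test for empty Matcher patterns. A minimal sketch of the behaviour being enforced, written against the current Matcher API (the 2016-era `add` signature differed), with illustrative pattern names:

```python
from spacy.lang.en import English
from spacy.matcher import Matcher

nlp = English()
matcher = Matcher(nlp.vocab)

# An empty token-pattern list carries no constraints at all; after the fix
# the Matcher rejects it instead of silently accepting it.
try:
    matcher.add("EMPTY", [[]])
except ValueError as err:
    print("rejected:", err)

# A well-formed pattern, for contrast.
matcher.add("HELLO_WORLD", [[{"LOWER": "hello"}, {"LOWER": "world"}]])
doc = nlp("Hello world!")
print(matcher(doc))  # one match spanning the first two tokens
```
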
Matthew Honnibal
59a30325ba Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-11-03 00:04:12 +01:00
Ines Montani
cd89f6b602 Add line break 2016-11-03 00:03:37 +01:00
Matthew Honnibal
41a90a7fbb Add tokenizer exception for 'Ph.D.', to fix 592. 2016-11-03 00:03:34 +01:00
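
For 41a90a7fbb: the fix itself adds "Ph.D." to the English tokenizer exception data, but the same effect can be had at runtime through the tokenizer's special-case API, sketched here with the modern names:

```python
from spacy.lang.en import English
from spacy.attrs import ORTH

nlp = English()
# Register "Ph.D." as a single, unsplittable token so the punctuation rules
# don't break it apart on the periods.
nlp.tokenizer.add_special_case("Ph.D.", [{ORTH: "Ph.D."}])

print([t.text for t in nlp("She holds a Ph.D. in linguistics.")])
# -> ['She', 'holds', 'a', 'Ph.D.', 'in', 'linguistics', '.']
```
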
Ines Montani
eea9f1aab4 Make small changes and update StackOverflow text 2016-11-03 00:02:49 +01:00
Matthew Honnibal
532318e80b Import Jieba inside zh.make_doc 2016-11-02 23:49:19 +01:00
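
532318e80b moves the `jieba` import inside `zh.make_doc`, so Chinese support does not make jieba a hard dependency of spaCy (see also the draft Jieba tokenizer in 5363224395 further down). Roughly the shape of that pattern, simplified rather than the exact source:

```python
from spacy.language import Language
from spacy.tokens import Doc

class Chinese(Language):
    lang = "zh"

    def make_doc(self, text):
        # Imported lazily: users who never touch Chinese never need jieba.
        import jieba
        words = list(jieba.cut(text, cut_all=False))
        return Doc(self.vocab, words=words, spaces=[False] * len(words))

# Chinese().make_doc("...") would only require jieba at call time.
```
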
Matthew Honnibal
f292f7f0e6 Fix Issue #599, by considering empty documents to be parsed and tagged. Implementation is a bit dodgy. 2016-11-02 23:48:43 +01:00
Matthew Honnibal
b6b01d4680 Remove deprecated tokens_from_list test. 2016-11-02 23:47:21 +01:00
Matthew Honnibal
04fe18d23d Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-11-02 23:40:27 +01:00
Matthew Honnibal
3d6c79e595 Test Issue #599: .is_tagged and .is_parsed attributes not reflected after deserialization for empty documents. 2016-11-02 23:40:11 +01:00
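
f292f7f0e6 and 3d6c79e595 above deal with empty documents losing their tagged/parsed status across serialization. A rough reconstruction of the scenario; `Doc.is_tagged` / `Doc.is_parsed` are the spaCy 1.x/2.x flags the commits refer to (removed in v3), and a trained English pipeline such as `en_core_web_sm` is assumed to be installed:

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.load("en_core_web_sm")   # assumed available
doc = nlp("")                        # an empty document, run through the pipeline
data = doc.to_bytes()

doc2 = Doc(nlp.vocab).from_bytes(data)
assert len(doc2) == 0
# On spaCy 1.x/2.x the fix makes both flags survive the round trip:
#     assert doc2.is_tagged and doc2.is_parsed
```
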
Matthew Honnibal
05a8b752a2 Fix Issue #600: Missing setters for Token attribute. 2016-11-02 23:28:59 +01:00
Matthew Honnibal
125c910a8d Test Issue #600 2016-11-02 23:24:13 +01:00
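
05a8b752a2 and its test add missing setters on `Token`. The gist, using today's attribute names (the commit message does not list exactly which attributes were patched):

```python
from spacy.lang.en import English

nlp = English()
doc = nlp("gimme that")
token = doc[0]

# Writable string views; the underlying integer attributes stay in sync.
token.lemma_ = "give"
token.tag_ = "VB"
token.pos_ = "VERB"

print(token.text, token.lemma_, token.tag_, token.pos_)
```
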
Ines Montani
2515b32a74 Add documentation for Tokenizer API (see #600) 2016-11-02 23:18:02 +01:00
Ines Montani
309a8c2c0f Use more distinct color for table footers 2016-11-02 23:18:01 +01:00
Matthew Honnibal
fd2a012fd3 Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-11-02 23:15:52 +01:00
Matthew Honnibal
e0c9695615 Fix doc strings for tokenizer 2016-11-02 23:15:39 +01:00
Ines Montani
2329adf891 Fix typo 2016-11-02 21:34:56 +01:00
Matthew Honnibal
7b7660c903 Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-11-02 20:49:15 +01:00
Matthew Honnibal
80824f6d29 Fix test 2016-11-02 20:48:40 +01:00
Matthew Honnibal
dbe47902bc Add import fr 2016-11-02 20:48:29 +01:00
Matthew Honnibal
8f24dc1982 Fix infixes in Italian 2016-11-02 20:43:52 +01:00
Matthew Honnibal
41a4766c1c Fix infixes in spanish and portuguese 2016-11-02 20:43:12 +01:00
Matthew Honnibal
3d4bd96e8a Fix infixes in french 2016-11-02 20:41:43 +01:00
Matthew Honnibal
c09a8ce5bb Add test for french tokenizer 2016-11-02 20:40:31 +01:00
Matthew Honnibal
b012ae3044 Add test for loading languages 2016-11-02 20:38:48 +01:00
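
b012ae3044 adds a test that the newly stubbed languages can at least be loaded and used for tokenization. A sketch in the same spirit, using the modern `spacy.blank` entry point rather than the 2016 loading API:

```python
import spacy

# Every stub language should construct and split text without a trained model.
for code in ("fr", "es", "it", "pt"):
    nlp = spacy.blank(code)
    doc = nlp.make_doc("Test 123, s'il vous plaît.")
    assert len(doc) > 0
    print(code, [t.text for t in doc])
```
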
Matthew Honnibal
ad1c747c6b Fix stray POS in language stubs 2016-11-02 20:37:55 +01:00
Matthew Honnibal
e9e6fce576 Handle null prefix/suffix/infix search in tokenizer 2016-11-02 20:35:48 +01:00
Matthew Honnibal
22647c2423 Check that patterns aren't null before compiling regex for tokenizer 2016-11-02 20:35:29 +01:00
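
e9e6fce576 and 22647c2423 make the tokenizer tolerate languages that define no prefix/suffix/infix patterns: a regex is only compiled when there is something to compile, and `None` is passed otherwise. A simplified sketch of that guard using the public helpers (the `make_tokenizer` wrapper is illustrative, not spaCy's own loading code):

```python
from spacy.tokenizer import Tokenizer
from spacy.util import compile_infix_regex, compile_prefix_regex, compile_suffix_regex
from spacy.vocab import Vocab

def make_tokenizer(vocab, prefixes, suffixes, infixes, exceptions=None):
    # Only compile a regex when the language actually defines patterns;
    # otherwise hand the Tokenizer None so that search is skipped entirely.
    prefix_search = compile_prefix_regex(prefixes).search if prefixes else None
    suffix_search = compile_suffix_regex(suffixes).search if suffixes else None
    infix_finditer = compile_infix_regex(infixes).finditer if infixes else None
    return Tokenizer(vocab, rules=exceptions or {},
                     prefix_search=prefix_search,
                     suffix_search=suffix_search,
                     infix_finditer=infix_finditer)

# A language with no affix patterns at all still tokenizes (on whitespace only,
# so punctuation stays attached to the neighbouring word).
tok = make_tokenizer(Vocab(), prefixes=[], suffixes=[], infixes=[])
print([t.text for t in tok("no affix rules, just whitespace splitting")])
```
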
Matthew Honnibal
5ac735df33 Link languages in __init__.py 2016-11-02 20:05:14 +01:00
Matthew Honnibal
c68dfe2965 Stub out support for Italian 2016-11-02 20:03:24 +01:00
Matthew Honnibal
6dbf4f7ad7 Stub out support for French, Spanish, Italian and Portuguese 2016-11-02 20:02:41 +01:00
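
6dbf4f7ad7 and c68dfe2965 add bare-bones language classes. The general shape of such a stub, shown with illustrative data and class names rather than the verbatim 2016 files:

```python
from spacy.language import Language

class PortugueseDefaults(Language.Defaults):
    # A stub usually starts with little more than stop words and tokenizer
    # exceptions; this is a tiny illustrative subset only.
    stop_words = {"de", "a", "o", "que", "e"}

class Portuguese(Language):
    lang = "pt"
    Defaults = PortugueseDefaults

nlp = Portuguese()
print([t.text for t in nlp.make_doc("Isto é um exemplo simples.")])
```
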
Matthew Honnibal
6b8b05ef83 Specify that spacy.util is encoded in utf8 2016-11-02 19:58:00 +01:00
Matthew Honnibal
5363224395 Add draft Jieba tokenizer for Chinese 2016-11-02 19:57:38 +01:00
Matthew Honnibal
f7fee6c24b Check for class-defined make_docs method before assigning one provided as an argument 2016-11-02 19:57:13 +01:00
Matthew Honnibal
19c1e83d3d Work on draft Italian tokenizer 2016-11-02 19:56:32 +01:00
Ines Montani
b54219c35a Add "Contribute" link to documentation table 2016-11-02 19:19:11 +01:00
Ines Montani
5fa19806eb Add logo 2016-11-02 17:46:04 +01:00
Ines Montani
fafd4bbb42 Add CONTRIBUTING.md 2016-11-02 17:45:16 +01:00
Matthew Honnibal
7555aa5e63 Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-11-02 12:31:49 +01:00
Matthew Honnibal
9efe568177 Add missing unicode_literals to spacy.util. I think this was messing up the tokenizer regex for non-ascii characters in Python 2. Re Issue #596 2016-11-02 12:31:34 +01:00
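
9efe568177's message explains the suspicion: without `unicode_literals`, string literals in `spacy/util.py` are byte strings on Python 2, and tokenizer regexes built from them misbehave on non-ASCII characters. Together with 6b8b05ef83's utf8 coding declaration, the pattern being restored is just the standard future import (a no-op on Python 3). The `INFIX_SAMPLE` pattern below is illustrative:

```python
# -*- coding: utf8 -*-
from __future__ import unicode_literals

import re

# With the future import, this literal is unicode even on Python 2, so a
# regex built from it handles non-ASCII characters such as the ellipsis.
INFIX_SAMPLE = r"\.\.\.|…"
infix_re = re.compile(INFIX_SAMPLE)
print(infix_re.findall("Attends… j'arrive…"))
```
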
Ines Montani
adf04a6ad3 Adjust tutorial category name 2016-11-02 12:11:17 +01:00
Ines Montani
2c65c15d7a Fix typo 2016-11-02 11:25:09 +01:00
Ines Montani
823e47d946 Add language models to API docs (fixes #598) 2016-11-02 11:24:13 +01:00
Ines Montani
35ad353dc2 Fix odd row color to show scroll shadow 2016-11-02 11:21:17 +01:00
Ines Montani
0438137f2f Add language models to features (see #598) 2016-11-02 10:47:02 +01:00
Ines Montani
85b0dd9ad6 Change wording 2016-11-02 10:47:02 +01:00
Ines Montani
869570c2e7 Update features and add languages (see #598) 2016-11-02 10:45:29 +01:00
Ines Montani
74a6e63a6b Fix typo 2016-11-01 23:06:19 +01:00