Matthew Honnibal
|
1fb09c3dc1
|
Fix morphology tagger
|
2016-11-04 19:19:09 +01:00 |
|
Matthew Honnibal
|
a36353df47
|
Temporarily put back the tokenize_from_strings method, while tests aren't updated yet.
|
2016-11-04 19:18:07 +01:00 |
|
Matthew Honnibal
|
f0917b6808
|
Fix Issue #376: and/or was tagged as a noun.
|
2016-11-04 15:21:28 +01:00 |
|
Matthew Honnibal
|
737816e86e
|
Fix #368: Tokenizer handled pattern 'unicode close quote, period' incorrectly.
|
2016-11-04 15:16:20 +01:00 |
|
Matthew Honnibal
|
ab952b4756
|
Fix #578 -- Sputnik had been purging all files on --force, not just the relevant one.
|
2016-11-04 10:44:11 +01:00 |
|
Matthew Honnibal
|
6e37ba1d82
|
Fix #602, #603 --- Broken build
|
2016-11-04 09:54:24 +01:00 |
|
Matthew Honnibal
|
293c79c09a
|
Fix #595: Lemmatization was incorrect for base forms, because morphological analyser wasn't adding morphology properly.
|
2016-11-04 00:29:07 +01:00 |
|
Matthew Honnibal
|
e30348b331
|
Prefer to import from symbols instead of parts_of_speech
|
2016-11-04 00:27:55 +01:00 |
|
Matthew Honnibal
|
4a8a2b6001
|
Test #595 -- Bug in lemmatization of base forms.
|
2016-11-04 00:27:32 +01:00 |
|
Matthew Honnibal
|
f1605df2ec
|
Fix #588: Matcher should reject empty pattern.
|
2016-11-03 00:16:44 +01:00 |
|
Matthew Honnibal
|
72b9bd57ec
|
Test Issue #588: Matcher accepts invalid, empty patterns.
|
2016-11-03 00:09:35 +01:00 |
|
Matthew Honnibal
|
41a90a7fbb
|
Add tokenizer exception for 'Ph.D.', to fix #592.
|
2016-11-03 00:03:34 +01:00 |
|
Matthew Honnibal
|
532318e80b
|
Import Jieba inside zh.make_doc
|
2016-11-02 23:49:19 +01:00 |
|
Matthew Honnibal
|
f292f7f0e6
|
Fix Issue #599, by considering empty documents to be parsed and tagged. Implementation is a bit dodgy.
|
2016-11-02 23:48:43 +01:00 |
|
Matthew Honnibal
|
b6b01d4680
|
Remove deprecated tokens_from_list test.
|
2016-11-02 23:47:21 +01:00 |
|
Matthew Honnibal
|
3d6c79e595
|
Test Issue #599: .is_tagged and .is_parsed attributes not reflected after deserialization for empty documents.
|
2016-11-02 23:40:11 +01:00 |
|
Matthew Honnibal
|
05a8b752a2
|
Fix Issue #600: Missing setters for Token attribute.
|
2016-11-02 23:28:59 +01:00 |
|
Matthew Honnibal
|
125c910a8d
|
Test Issue #600
|
2016-11-02 23:24:13 +01:00 |
|
Matthew Honnibal
|
e0c9695615
|
Fix docstrings for tokenizer
|
2016-11-02 23:15:39 +01:00 |
|
Matthew Honnibal
|
80824f6d29
|
Fix test
|
2016-11-02 20:48:40 +01:00 |
|
Matthew Honnibal
|
dbe47902bc
|
Add import fr
|
2016-11-02 20:48:29 +01:00 |
|
Matthew Honnibal
|
8f24dc1982
|
Fix infixes in Italian
|
2016-11-02 20:43:52 +01:00 |
|
Matthew Honnibal
|
41a4766c1c
|
Fix infixes in Spanish and Portuguese
|
2016-11-02 20:43:12 +01:00 |
|
Matthew Honnibal
|
3d4bd96e8a
|
Fix infixes in French
|
2016-11-02 20:41:43 +01:00 |
|
Matthew Honnibal
|
c09a8ce5bb
|
Add test for French tokenizer
|
2016-11-02 20:40:31 +01:00 |
|
Matthew Honnibal
|
b012ae3044
|
Add test for loading languages
|
2016-11-02 20:38:48 +01:00 |
|
Matthew Honnibal
|
ad1c747c6b
|
Fix stray POS in language stubs
|
2016-11-02 20:37:55 +01:00 |
|
Matthew Honnibal
|
e9e6fce576
|
Handle null prefix/suffix/infix search in tokenizer
|
2016-11-02 20:35:48 +01:00 |
|
Matthew Honnibal
|
22647c2423
|
Check that patterns aren't null before compiling regex for tokenizer
|
2016-11-02 20:35:29 +01:00 |
|
Matthew Honnibal
|
5ac735df33
|
Link languages in __init__.py
|
2016-11-02 20:05:14 +01:00 |
|
Matthew Honnibal
|
c68dfe2965
|
Stub out support for Italian
|
2016-11-02 20:03:24 +01:00 |
|
Matthew Honnibal
|
6dbf4f7ad7
|
Stub out support for French, Spanish, Italian and Portuguese
|
2016-11-02 20:02:41 +01:00 |
|
Matthew Honnibal
|
6b8b05ef83
|
Specify that spacy.util is encoded in UTF-8
|
2016-11-02 19:58:00 +01:00 |
|
Matthew Honnibal
|
5363224395
|
Add draft Jieba tokenizer for Chinese
|
2016-11-02 19:57:38 +01:00 |
|
Matthew Honnibal
|
f7fee6c24b
|
Check for class-defined make_docs method before assigning one provided as an argument
|
2016-11-02 19:57:13 +01:00 |
|
Matthew Honnibal
|
19c1e83d3d
|
Work on draft Italian tokenizer
|
2016-11-02 19:56:32 +01:00 |
|
Matthew Honnibal
|
9efe568177
|
Add missing unicode_literals to spacy.util. I think this was messing up the tokenizer regex for non-ASCII characters in Python 2. Re Issue #596
|
2016-11-02 12:31:34 +01:00 |
|
Matthew Honnibal
|
d8db648ebf
|
Add __init__.py file for regression tests
|
2016-11-01 13:45:06 +01:00 |
|
Matthew Honnibal
|
11664b9f20
|
Fix variable error in token
|
2016-11-01 13:28:00 +01:00 |
|
Matthew Honnibal
|
8c4d1b46ce
|
Fix variable error in Span
|
2016-11-01 13:27:44 +01:00 |
|
Matthew Honnibal
|
e7af6b937f
|
Fix syntax error while fixing docstrings
|
2016-11-01 13:27:32 +01:00 |
|
Matthew Honnibal
|
62fc6b1afa
|
Use 32-bit hashes for OOV, re Issue #589, Issue #285
|
2016-11-01 13:27:13 +01:00 |
|
Matthew Honnibal
|
6977a2b8cd
|
Add test for Issue #589
|
2016-11-01 12:33:36 +01:00 |
|
Matthew Honnibal
|
b86f8af0c1
|
Fix docstrings
|
2016-11-01 12:25:36 +01:00 |
|
Matthew Honnibal
|
d563f1eadb
|
Fix Issue #587: Segfault in Matcher, due to simple error in the state machine.
|
2016-10-28 17:42:00 +02:00 |
|
Matthew Honnibal
|
7e5f63a595
|
Improve test slightly
|
2016-10-28 17:41:16 +02:00 |
|
Matthew Honnibal
|
782e4814f4
|
Test Issue #587: Matcher segfaults on particular input
|
2016-10-28 16:38:32 +02:00 |
|
Matthew Honnibal
|
708ea22208
|
Infer types in transition_system.pyx
|
2016-10-27 18:08:13 +02:00 |
|
Matthew Honnibal
|
18590eba94
|
Fix training evaluate method
|
2016-10-27 18:02:19 +02:00 |
|
Matthew Honnibal
|
301f3cc898
|
Fix Issue #429. Add an initialize_state method to the named entity recogniser that adds missing entity types. This is a messy place to add this, because it's strange to have the method mutate state. A better home for this logic could be found.
|
2016-10-27 18:01:55 +02:00 |
|