Ines Montani
52e7d634df
Remove trailing whitespace
2016-12-07 21:12:19 +01:00
Ines Montani
0d07d7fc80
Apply emoticon exceptions to tokenizer
2016-12-07 21:11:59 +01:00
Ines Montani
71f0f34cb3
Fix formatting
2016-12-07 21:11:29 +01:00
Ines Montani
9413bcd9ee
Declare encoding and unicode literals
2016-12-07 21:10:34 +01:00
Ines Montani
a280ff2657
Fix __all__
2016-12-07 21:10:12 +01:00
Ines Montani
ba8721953c
Add missing emoticons
2016-12-07 21:09:44 +01:00
Ines Montani
1285c4ba93
Update English language data
2016-12-07 20:33:28 +01:00
Ines Montani
79dce0aabe
Add emoticons
2016-12-07 20:33:28 +01:00
Ines Montani
a662a95294
Add line breaks
2016-12-07 20:33:28 +01:00
Ines Montani
07f0efb102
Add test for tokenizer regular expressions
2016-12-07 20:33:28 +01:00
Ines Montani
e0712d1b32
Reformat language data
2016-12-07 20:33:28 +01:00
Matthew Honnibal
0c0f4c965d
Increment version
2016-12-03 11:16:52 +01:00
Matthew Honnibal
f6e356aada
Add (and test) Span.sentiment attribute. By default we average token.sentiment, but can override with custom hook. Re Issue #667
2016-12-02 11:05:50 +01:00
Matthew Honnibal
296d33a4fc
Merge branch 'master' of ssh://github.com/explosion/spaCy
2016-11-26 12:36:18 +01:00
Matthew Honnibal
1f6c37c6f5
Fix create_tokenizer when nlp is None
2016-11-26 12:36:04 +01:00
Matthew Honnibal
c7889492f9
Fix model saving error for Python 3
2016-11-25 18:04:30 -06:00
Matthew Honnibal
bc0a202c9c
Fix unicode problem in nonproj module
2016-11-25 17:29:17 -06:00
Matthew Honnibal
6dd3b94fa6
Filter out deprecated attributes when reading special-case tokenization rules.
2016-11-25 09:57:18 -06:00
Matthew Honnibal
e879c79b8c
Merge branch 'master' of https://github.com/explosion/spaCy
2016-11-25 09:18:28 -06:00
Matthew Honnibal
a335c6dcc2
Exclude morphs from deprecated token attributes for now
2016-11-25 16:17:32 +01:00
Matthew Honnibal
f799a07f25
Merge branch 'master' of https://github.com/explosion/spaCy
2016-11-25 09:16:43 -06:00
Matthew Honnibal
159e8c46e1
Merge old training fixes with newer state
2016-11-25 09:16:36 -06:00
Matthew Honnibal
846e80f2f4
Exclude morphs from deprecated token attributes for now
2016-11-25 16:14:54 +01:00
Matthew Honnibal
664f2dd1c0
Allow dep to be None in scorer, for missing labels.
2016-11-25 09:02:49 -06:00
Matthew Honnibal
39341598bb
Fix NER label calculation
2016-11-25 09:02:22 -06:00
Matthew Honnibal
ca773a1f53
Tweak arc_eager n_gold to deal with negative costs, and improve error message.
2016-11-25 09:01:52 -06:00
Matthew Honnibal
a2f55e7015
Pass cfg through loading, for training.
2016-11-25 09:01:20 -06:00
Matthew Honnibal
608d8f5421
Pass cfg through parser, and have is_valid default to 1, not 0 when resetting state
2016-11-25 09:00:21 -06:00
Matthew Honnibal
cc7e607a8a
Fix gold.pyx for 1.0
2016-11-25 08:57:59 -06:00
root
080d29e092
Fix train.py for 1.0
2016-11-25 08:55:33 -06:00
Matthew Honnibal
6652f2a135
Test #656, #624: special case rules for tokenizer with attributes.
2016-11-25 12:44:13 +01:00
Matthew Honnibal
1e0f566d95
Fix #656, #624: Support arbitrary token attributes when adding special-case rules.
2016-11-25 12:43:24 +01:00
Matthew Honnibal
87613edf8f
Add set_struct_attr staticmethod to token
2016-11-25 12:41:47 +01:00
Matthew Honnibal
fb69aa648f
Merge branch 'master' of ssh://github.com/explosion/spaCy
2016-11-25 11:35:44 +01:00
Matthew Honnibal
9a03a3f85e
Add get_struct_attr staticmethod to Token, to match Lexeme.get_struct_attr.
2016-11-25 11:35:17 +01:00
Matthew Honnibal
53d8ca8f51
Add spacy.attrs.intify_attrs function, to normalize strings in token attribute dictionaries.
2016-11-25 11:34:30 +01:00
Matthew Honnibal
b8c4f5ea76
Allow German noun chunks to work on Span
Update the German noun chunks iterator, so that it also works on Span objects.
2016-11-24 23:30:15 +11:00
Pokey Rule
3e3bda142d
Add noun_chunks to Span
2016-11-24 10:47:20 +00:00
Matthew Honnibal
09f68bc641
Fix Issue #639: stop words in language class not used. This patch is messy, but it's better not to change too much until the language data loading can be properly refactored.
2016-11-24 00:13:55 +01:00
Matthew Honnibal
48e1dc29d4
Fix default path loading.
2016-11-23 23:48:55 +01:00
Matthew Honnibal
e01c1875ee
Work on test for #615
2016-11-23 23:48:41 +01:00
ExplodingCabbage
6c4f488e89
Fix syntax mistake
2016-11-23 15:12:45 +00:00
Matthew Honnibal
60eb2343ce
Only try to load vectors if they exist.
2016-11-23 13:50:24 +01:00
Matthew Honnibal
618ac36093
Fix use of path argument in Language.__init__. Needs to be keyword arg, not positional.
2016-11-23 13:26:34 +01:00
Mark Amery
fbe19680a6
Fix another bug related to Language.__init__'s path parameter
2016-11-20 20:31:34 +00:00
Mark Amery
b0a07c21a0
Fix `path` param of `Language.__init__` always being ignored
There was an explicitly-declared `path` keyword argument, so 'path'
would never be present in `**overrides`. This line just overwrote
any manually-specified value the user might've passed to the `path`
parameter.
2016-11-20 16:29:57 +00:00
Mark Amery
1988fce389
Merge remote-tracking branch 'origin/master' into specify-data-path
2016-11-20 16:07:14 +00:00
Mark Amery
3871007c72
Let --data-path be specified when running download.py scripts
Resolves https://github.com/explosion/spaCy/issues/637
2016-11-20 15:48:04 +00:00
Ines Montani
dad2c6cae9
Strip trailing whitespace
2016-11-20 16:45:51 +01:00
Ines Montani
3082e49326
Update and reformat German stopwords
2016-11-20 16:45:26 +01:00