Matthew Honnibal
|
a4eb5c2bff
|
Check POS key in lemmatizer, to update it for new data format
|
2016-12-18 13:28:20 +01:00 |
|
Matthew Honnibal
|
28d63ec58e
|
Restore missing '' character in tokenizer exceptions.
|
2016-12-18 05:34:51 +01:00 |
|
Ines Montani
|
a9421652c9
|
Remove duplicates in tag map
|
2016-12-17 22:44:31 +01:00 |
|
Ines Montani
|
69baf1c9a8
|
Fix tag map
|
2016-12-17 22:44:22 +01:00 |
|
Ines Montani
|
577adad945
|
Fix formatting
|
2016-12-17 14:00:52 +01:00 |
|
Ines Montani
|
fc4ad17136
|
Fix typo
|
2016-12-17 14:00:47 +01:00 |
|
Ines Montani
|
bb94e784dc
|
Fix typo
|
2016-12-17 13:59:30 +01:00 |
|
Ines Montani
|
afda532595
|
Use symbols in tag map
|
2016-12-17 13:56:24 +01:00 |
|
Ines Montani
|
07249145c9
|
Fix formatting
|
2016-12-17 13:34:46 +01:00 |
|
Ines Montani
|
dd55d085b6
|
Reformat dutch language data to match new style
|
2016-12-17 13:26:01 +01:00 |
|
Ines Montani
|
f2c48ef504
|
Resolve stopwords conflict to merge Dutch
|
2016-12-17 13:08:16 +01:00 |
|
Matthew Honnibal
|
ff03ade08f
|
Merge pull request #688 from nlesc-sherlock/dutch
Support for Dutch in SpaCy
|
2016-12-17 22:44:58 +11:00 |
|
Ines Montani
|
a22322187f
|
Add missing lemmas to tokenizer exceptions (fixes #674)
|
2016-12-17 12:42:41 +01:00 |
|
Ines Montani
|
5445074cbd
|
Expand tokenizer exceptions with unicode apostrophe (fixes #685)
|
2016-12-17 12:34:08 +01:00 |
|
Ines Montani
|
e0a7b5c612
|
Fix formatting
|
2016-12-17 12:33:09 +01:00 |
|
Ines Montani
|
08162dce67
|
Move shared functions and constants to global language data
|
2016-12-17 12:32:48 +01:00 |
|
Ines Montani
|
6a60a61086
|
Move update_exc to global language data utils
|
2016-12-17 12:29:02 +01:00 |
|
Ines Montani
|
f324311249
|
Add global language data utils
|
2016-12-17 12:27:41 +01:00 |
|
Ines Montani
|
487ce1e20a
|
Add encoding declaration
|
2016-12-17 12:25:44 +01:00 |
|
Ines Montani
|
d8d50a0334
|
Add tokenizer exception for "gonna" (fixes #691)
|
2016-12-17 11:59:28 +01:00 |
|
Ines Montani
|
c69b77d8aa
|
Revert "Add exception for "gonna""
This reverts commit 280c03f67b.
|
2016-12-17 11:56:44 +01:00 |
|
Ines Montani
|
280c03f67b
|
Add exception for "gonna"
|
2016-12-17 11:54:59 +01:00 |
|
Ines Montani
|
5031a015e2
|
Fix typo in stopwords (fixes #689)
|
2016-12-15 17:57:06 +01:00 |
|
Janneke van der Zwaan
|
4a3fdcce8a
|
Merge github.com:explosion/spaCy into dutch
|
2016-12-13 09:25:23 +01:00 |
|
Matthew Honnibal
|
5965d3c2a7
|
Revert "Add acl to symbols.pyx"
|
2016-12-12 10:10:28 +11:00 |
|
Matthew Honnibal
|
6dee76dfed
|
Update symbols.pxd
|
2016-12-12 10:09:58 +11:00 |
|
Pokey Rule
|
18a15c0777
|
Add acl to symbols.pyx
|
2016-12-11 20:00:07 +00:00 |
|
Ines Montani
|
63024466a9
|
Add Portuguese stopwords
|
2016-12-08 20:45:07 +01:00 |
|
Ines Montani
|
7bfe2d4abc
|
Update Portuguese language data
|
2016-12-08 20:41:41 +01:00 |
|
Ines Montani
|
c0c5f31950
|
Remove unused data and download script
|
2016-12-08 20:39:49 +01:00 |
|
Ines Montani
|
0a6d529104
|
Remove unused data
|
2016-12-08 20:36:56 +01:00 |
|
Ines Montani
|
1b3b043660
|
Add French stopwords
|
2016-12-08 20:12:43 +01:00 |
|
Ines Montani
|
8863e504eb
|
Update French language data
|
2016-12-08 20:07:14 +01:00 |
|
Ines Montani
|
7cb9f51be6
|
Add Italian stopwords
|
2016-12-08 20:05:25 +01:00 |
|
Ines Montani
|
470a0e0bea
|
Update Italian language data
|
2016-12-08 19:52:18 +01:00 |
|
Ines Montani
|
1a284d342e
|
Add Spanish language data
|
2016-12-08 19:47:03 +01:00 |
|
Ines Montani
|
0c39654786
|
Remove unused import
|
2016-12-08 19:46:53 +01:00 |
|
Ines Montani
|
e47ee94761
|
Split punctuation into its own file
|
2016-12-08 19:46:43 +01:00 |
|
Ines Montani
|
70b51ed7c8
|
Remove time from German language data
|
2016-12-08 19:45:50 +01:00 |
|
Ines Montani
|
e8ae588be9
|
Add emoticons
|
2016-12-08 19:45:18 +01:00 |
|
Ines Montani
|
5908c0ed9f
|
Fix formatting
|
2016-12-08 19:45:11 +01:00 |
|
Ines Montani
|
311b30ab35
|
Reorganize exceptions for English and German
|
2016-12-08 13:58:32 +01:00 |
|
Ines Montani
|
66c7348cda
|
Add update_exc util function
|
2016-12-08 13:58:12 +01:00 |
|
Ines Montani
|
1256232fad
|
Fix formatting
|
2016-12-08 13:56:40 +01:00 |
|
Ines Montani
|
8e977cc71c
|
Fix formatting
|
2016-12-08 13:56:17 +01:00 |
|
Ines Montani
|
0176b99004
|
Fix formatting
|
2016-12-08 12:48:02 +01:00 |
|
Ines Montani
|
877f09218b
|
Add more custom rules for abbreviations
|
2016-12-08 12:47:01 +01:00 |
|
Ines Montani
|
bfaa42636c
|
Update language data for German
|
2016-12-08 12:01:09 +01:00 |
|
Ines Montani
|
ec44bee321
|
Fix capitalization on morphological features
|
2016-12-08 12:00:54 +01:00 |
|
Ines Montani
|
ce979553df
|
Resolve conflict
|
2016-12-07 21:16:52 +01:00 |
|
Ines Montani
|
8350d65695
|
Change morphology and lemmatizer API
Take morphology features as object instead of keyword arguments
|
2016-12-07 21:12:49 +01:00 |
|
Ines Montani
|
52e7d634df
|
Remove trailing whitespace
|
2016-12-07 21:12:19 +01:00 |
|
Ines Montani
|
0d07d7fc80
|
Apply emoticon exceptions to tokenizer
|
2016-12-07 21:11:59 +01:00 |
|
Ines Montani
|
71f0f34cb3
|
Fix formatting
|
2016-12-07 21:11:29 +01:00 |
|
Ines Montani
|
9413bcd9ee
|
Declare encoding and unicode literals
|
2016-12-07 21:10:34 +01:00 |
|
Ines Montani
|
a280ff2657
|
Fix __all__
|
2016-12-07 21:10:12 +01:00 |
|
Ines Montani
|
ba8721953c
|
Add missing emoticons
|
2016-12-07 21:09:44 +01:00 |
|
Ines Montani
|
1285c4ba93
|
Update English language data
|
2016-12-07 20:33:28 +01:00 |
|
Ines Montani
|
79dce0aabe
|
Add emoticons
|
2016-12-07 20:33:28 +01:00 |
|
Ines Montani
|
a662a95294
|
Add line breaks
|
2016-12-07 20:33:28 +01:00 |
|
Ines Montani
|
07f0efb102
|
Add test for tokenizer regular expressions
|
2016-12-07 20:33:28 +01:00 |
|
Ines Montani
|
e0712d1b32
|
Reformat language data
|
2016-12-07 20:33:28 +01:00 |
|
Matthew Honnibal
|
0c0f4c965d
|
Increment version
|
2016-12-03 11:16:52 +01:00 |
|
Matthew Honnibal
|
f6e356aada
|
Add (and test) Span.sentiment attribute. By default we average token.span, but can override with custom hook. Re Issue #667
|
2016-12-02 11:05:50 +01:00 |
|
Janneke van der Zwaan
|
88869e0e07
|
Merge github.com:explosion/spaCy into dutch
|
2016-11-30 17:13:39 +01:00 |
|
Janneke van der Zwaan
|
51ade86b86
|
Update language data with tag map from UD_Dutch
|
2016-11-30 14:41:23 +01:00 |
|
Janneke van der Zwaan
|
90f6ff12c9
|
Update Dutch language data
- Use Dutch tag map
- remove tokenizer exceptions
|
2016-11-30 11:59:39 +01:00 |
|
dafnevk
|
7b8f4c49f2
|
Added language Dutch to init file
|
2016-11-29 16:42:05 +01:00 |
|
Matthew Honnibal
|
296d33a4fc
|
Merge branch 'master' of ssh://github.com/explosion/spaCy
|
2016-11-26 12:36:18 +01:00 |
|
Matthew Honnibal
|
1f6c37c6f5
|
Fix create_tokenizer when nlp is None
|
2016-11-26 12:36:04 +01:00 |
|
Matthew Honnibal
|
c7889492f9
|
Fix model saving error for Python 3
|
2016-11-25 18:04:30 -06:00 |
|
Matthew Honnibal
|
bc0a202c9c
|
Fix unicode problem in nonproj module
|
2016-11-25 17:29:17 -06:00 |
|
Matthew Honnibal
|
6dd3b94fa6
|
Filter out deprecated attributes when reading special-case tokenization rules.
|
2016-11-25 09:57:18 -06:00 |
|
Matthew Honnibal
|
e879c79b8c
|
Merge branch 'master' of https://github.com/explosion/spaCy
|
2016-11-25 09:18:28 -06:00 |
|
Matthew Honnibal
|
a335c6dcc2
|
Exclude morphs from deprecated token attributes for now
|
2016-11-25 16:17:32 +01:00 |
|
Matthew Honnibal
|
f799a07f25
|
Merge branch 'master' of https://github.com/explosion/spaCy
|
2016-11-25 09:16:43 -06:00 |
|
Matthew Honnibal
|
159e8c46e1
|
Merge old training fixes with newer state
|
2016-11-25 09:16:36 -06:00 |
|
Matthew Honnibal
|
846e80f2f4
|
Exclude morphs from deprecated token attributes for now
|
2016-11-25 16:14:54 +01:00 |
|
Matthew Honnibal
|
664f2dd1c0
|
Allow dep to be None in scorer, for missing labels.
|
2016-11-25 09:02:49 -06:00 |
|
Matthew Honnibal
|
39341598bb
|
Fix NER label calculation
|
2016-11-25 09:02:22 -06:00 |
|
Matthew Honnibal
|
ca773a1f53
|
Tweak arc_eager n_gold to deal with negative costs, and improve error message.
|
2016-11-25 09:01:52 -06:00 |
|
Matthew Honnibal
|
a2f55e7015
|
Pass cfg through loading, for training.
|
2016-11-25 09:01:20 -06:00 |
|
Matthew Honnibal
|
608d8f5421
|
Pass cfg through parser, and have is_valid default to 1, not 0 when resetting state
|
2016-11-25 09:00:21 -06:00 |
|
Matthew Honnibal
|
cc7e607a8a
|
Fix gold.pyx for 1.0
|
2016-11-25 08:57:59 -06:00 |
|
root
|
080d29e092
|
Fix train.py for 1.0
|
2016-11-25 08:55:33 -06:00 |
|
Matthew Honnibal
|
6652f2a135
|
Test #656, #624: special case rules for tokenizer with attributes.
|
2016-11-25 12:44:13 +01:00 |
|
Matthew Honnibal
|
1e0f566d95
|
Fix #656, #624: Support arbitrary token attributes when adding special-case rules.
|
2016-11-25 12:43:24 +01:00 |
|
Matthew Honnibal
|
87613edf8f
|
Add set_struct_attr staticmethod to token
|
2016-11-25 12:41:47 +01:00 |
|
Matthew Honnibal
|
fb69aa648f
|
Merge branch 'master' of ssh://github.com/explosion/spaCy
|
2016-11-25 11:35:44 +01:00 |
|
Matthew Honnibal
|
9a03a3f85e
|
Add get_struct_attr staticmethod to Token, to match Lexeme.get_struct_attr.
|
2016-11-25 11:35:17 +01:00 |
|
Matthew Honnibal
|
53d8ca8f51
|
Add spacy.attrs.intify_attrs function, to normalize strings in token attribute dictionaries.
|
2016-11-25 11:34:30 +01:00 |
|
Ines Montani
|
d21ad01840
|
Add emoticons
|
2016-11-24 19:13:00 +01:00 |
|
dafnevk
|
d8c7ac203a
|
Added nl module for dutch
|
2016-11-24 16:39:49 +01:00 |
|
dafnevk
|
3db8b0d322
|
Added language class and some language data (with some TODOs) for Dutch
|
2016-11-24 15:56:38 +01:00 |
|
Ines Montani
|
4dcfafde02
|
Add line breaks
|
2016-11-24 14:57:37 +01:00 |
|
Ines Montani
|
6247c005a2
|
Add test for tokenizer regular expressions
|
2016-11-24 13:51:59 +01:00 |
|
Ines Montani
|
de747e39e7
|
Reformat language data
|
2016-11-24 13:51:32 +01:00 |
|
Matthew Honnibal
|
b8c4f5ea76
|
Allow German noun chunks to work on Span
Update the German noun chunks iterator, so that it also works on Span objects.
|
2016-11-24 23:30:15 +11:00 |
|
Pokey Rule
|
3e3bda142d
|
Add noun_chunks to Span
|
2016-11-24 10:47:20 +00:00 |
|
Janneke van der Zwaan
|
83daade0e4
|
Add directory and initial (empty) files for language Dutch
|
2016-11-24 09:45:41 +01:00 |
|