| Author | Commit | Message | Date |
|---|---|---|---|
| Ines Montani | 28326649f3 | Fix typo | 2016-12-18 13:30:03 +01:00 |
| Ines Montani | a9421652c9 | Remove duplicates in tag map | 2016-12-17 22:44:31 +01:00 |
| Ines Montani | 577adad945 | Fix formatting | 2016-12-17 14:00:52 +01:00 |
| Ines Montani | bb94e784dc | Fix typo | 2016-12-17 13:59:30 +01:00 |
| Ines Montani | a22322187f | Add missing lemmas to tokenizer exceptions (fixes #674) | 2016-12-17 12:42:41 +01:00 |
| Ines Montani | 5445074cbd | Expand tokenizer exceptions with unicode apostrophe (fixes #685) | 2016-12-17 12:34:08 +01:00 |
| Ines Montani | e0a7b5c612 | Fix formatting | 2016-12-17 12:33:09 +01:00 |
| Ines Montani | 08162dce67 | Move shared functions and constants to global language data | 2016-12-17 12:32:48 +01:00 |
| Ines Montani | 6a60a61086 | Move update_exc to global language data utils | 2016-12-17 12:29:02 +01:00 |
| Ines Montani | 487ce1e20a | Add encoding declaration | 2016-12-17 12:25:44 +01:00 |
| Ines Montani | d8d50a0334 | Add tokenizer exception for "gonna" (fixes #691) | 2016-12-17 11:59:28 +01:00 |
| Ines Montani | c69b77d8aa | Revert "Add exception for "gonna""<br>This reverts commit 280c03f67b. | 2016-12-17 11:56:44 +01:00 |
| Ines Montani | 280c03f67b | Add exception for "gonna" | 2016-12-17 11:54:59 +01:00 |
| Ines Montani | c0c5f31950 | Remove unused data and download script | 2016-12-08 20:39:49 +01:00 |
| Ines Montani | 0c39654786 | Remove unused import | 2016-12-08 19:46:53 +01:00 |
| Ines Montani | e47ee94761 | Split punctuation into its own file | 2016-12-08 19:46:43 +01:00 |
| Ines Montani | 311b30ab35 | Reorganize exceptions for English and German | 2016-12-08 13:58:32 +01:00 |
| Ines Montani | 877f09218b | Add more custom rules for abbreviations | 2016-12-08 12:47:01 +01:00 |
| Ines Montani | ec44bee321 | Fix capitalization on morphological features | 2016-12-08 12:00:54 +01:00 |
| Ines Montani | ce979553df | Resolve conflict | 2016-12-07 21:16:52 +01:00 |
| Ines Montani | 0d07d7fc80 | Apply emoticon exceptions to tokenizer | 2016-12-07 21:11:59 +01:00 |
| Ines Montani | 71f0f34cb3 | Fix formatting | 2016-12-07 21:11:29 +01:00 |
| Ines Montani | 1285c4ba93 | Update English language data | 2016-12-07 20:33:28 +01:00 |
| Ines Montani | a662a95294 | Add line breaks | 2016-12-07 20:33:28 +01:00 |
| Ines Montani | e0712d1b32 | Reformat language data | 2016-12-07 20:33:28 +01:00 |
| Ines Montani | 4dcfafde02 | Add line breaks | 2016-11-24 14:57:37 +01:00 |
| Ines Montani | de747e39e7 | Reformat language data | 2016-11-24 13:51:32 +01:00 |
| Mark Amery | 1988fce389 | Merge remote-tracking branch 'origin/master' into specify-data-path | 2016-11-20 16:07:14 +00:00 |
| Mark Amery | 3871007c72 | Let --data-path be specified when running download.py scripts<br>Resolves https://github.com/explosion/spaCy/issues/637 | 2016-11-20 15:48:04 +00:00 |
| Ines Montani | dad2c6cae9 | Strip trailing whitespace | 2016-11-20 16:45:51 +01:00 |
| Matthew Honnibal | f0917b6808 | Fix Issue #376: and/or was tagged as a noun. | 2016-11-04 15:21:28 +01:00 |
| Matthew Honnibal | 737816e86e | Fix #368: Tokenizer handled pattern 'unicode close quote, period' incorrectly. | 2016-11-04 15:16:20 +01:00 |
| Matthew Honnibal | 41a90a7fbb | Add tokenizer exception for 'Ph.D.', to fix 592. | 2016-11-03 00:03:34 +01:00 |
| Matthew Honnibal | e7414cd064 | Try to fix weird install glitch. | 2016-10-23 19:46:28 +02:00 |
| Matthew Honnibal | 622b0a9674 | Tweak download script | 2016-10-19 00:52:16 +02:00 |
| Matthew Honnibal | edc45c19d6 | Update download script | 2016-10-19 00:41:14 +02:00 |
| Matthew Honnibal | 8c8f5c62c6 | Add LANG attribute to English and German | 2016-10-18 18:52:48 +02:00 |
| Matthew Honnibal | ea23b64cc8 | Refactor training, with new spacy.train module. Defaults still a little awkward. | 2016-10-09 12:24:24 +02:00 |
| Matthew Honnibal | 7db956133e | Move tokenizer data for German into spacy.de.language_data | 2016-09-25 15:37:33 +02:00 |
| Matthew Honnibal | 95aaea0d3f | Refactor so that the tokenizer data is read from Python data, rather than from disk | 2016-09-25 14:49:53 +02:00 |
| Matthew Honnibal | d7e9acdcdf | Add English language data, so that the tokenizer doesn't require the data download | 2016-09-25 14:49:00 +02:00 |
| Matthew Honnibal | fd65cf6cbb | Finish refactoring data loading | 2016-09-24 20:26:17 +02:00 |
| Henning Peters | 470cdf5bf9 | remove deprecated LOCAL_DATA_DIR | 2016-04-05 11:25:54 +02:00 |
| Henning Peters | a7d7ea3afa | first idea for supporting multiple langs in download script | 2016-03-24 11:19:43 +01:00 |
| Henning Peters | 9cc4f8d5b3 | avoid shadowing __name__ | 2016-02-15 01:33:39 +01:00 |
| Matthew Honnibal | 445164d5b4 | * Restore the LOCAL_DATA_DIR global in spacy/en/__init__.py, although this is now deprecated | 2016-01-19 02:54:56 +01:00 |
| Henning Peters | 5551052840 | fix py2/3 issue | 2016-01-16 12:44:53 +01:00 |
| Henning Peters | 235f094534 | untangle data_path/via | 2016-01-16 12:23:45 +01:00 |
| Henning Peters | 211913d689 | add about.py, adapt setup.py | 2016-01-15 18:57:01 +01:00 |
| Henning Peters | 780cb847c9 | add default_model to about | 2016-01-15 18:07:15 +01:00 |