Commit Graph

2853 Commits

Author SHA1 Message Date
Ines Montani
8676cd0135 Add newline 2017-05-03 09:38:07 +02:00
Yasuaki Uechi
c8f83aeb87 Add basic japanese support 2017-05-03 13:56:21 +09:00
Matthew Honnibal
31ec9e1371 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-27 13:21:39 +02:00
Matthew Honnibal
2da16adcc2 Add dropout option for parser and NER
Dropout can now be specified in the `Parser.update()` method via
the `drop` keyword argument, e.g.

    nlp.entity.update(doc, gold, drop=0.4)

This will randomly drop 40% of features, and multiply the value of the
others by 1. / 0.4. This may be useful for generalising from small data
sets.

This commit also patches the examples/training/train_new_entity_type.py
example, to use dropout and fix the output (previously it did not output
the learned entity).
2017-04-27 13:18:39 +02:00
Ines Montani
7da9cefd25 Merge pull request #1022 from luvogels/master
Initial support for Norwegian Bokmål
2017-04-27 11:16:06 +02:00
Ines Montani
c9e592ae6c Add newline 2017-04-27 11:15:41 +02:00
Ines Montani
5942adccc2 Add newline 2017-04-27 11:15:19 +02:00
Ines Montani
4cd9269aef Add newline 2017-04-27 11:15:04 +02:00
Ines Montani
ccf13ecc21 Add newline 2017-04-27 11:14:42 +02:00
Ines Montani
03d2b0cc05 Add newline 2017-04-27 11:14:26 +02:00
luvogels
d12a0b6431 Hooked up tokenizer tests 2017-04-26 23:21:41 +02:00
Matthew Honnibal
f0e1606d27 Increment version 2017-04-26 20:25:41 +02:00
luvogels
b331929a7e Merge branch 'master' of https://github.com/luvogels/spaCy 2017-04-26 19:15:48 +02:00
luvogels
8de59ce3b9 Added tokenizer tests 2017-04-26 19:10:18 +02:00
Matthew Honnibal
4d98511db7 Make Span hashable. Closes #1019 2017-04-26 19:01:05 +02:00
Matthew Honnibal
24c4c51f13 Try to make test999 less flakey 2017-04-26 18:42:06 +02:00
Leif Uwe Vogelsang
460094bf09 Update __init__.py 2017-04-26 18:27:55 +02:00
ines
527d51ac9a Fetch shortcuts from GitHub and improve error handling 2017-04-26 18:00:28 +02:00
Matthew Honnibal
c4be9c36fe Fix unicode header in tests 2017-04-24 10:09:01 +02:00
Matthew Honnibal
65f10b53e5 Fix test 2017-04-24 00:25:55 +02:00
Matthew Honnibal
70a43858e1 Fix flakey test 2017-04-24 00:06:30 +02:00
Matthew Honnibal
3973af2d15 Make training test less flakey 2017-04-23 22:59:34 +02:00
Matthew Honnibal
4f9657b42b Fix reporting if no dev data with train 2017-04-23 22:27:10 +02:00
Matthew Honnibal
df2ac8b843 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-23 21:25:07 +02:00
Matthew Honnibal
d0e19267e8 Create directory if missing in save_to_directory 2017-04-23 21:24:43 +02:00
ines
42305bc519 Remove unnecessary test 2017-04-23 21:21:41 +02:00
ines
012ea594d1 Add file for misc tests 2017-04-23 21:06:51 +02:00
ines
83f66947dc Rename test_download to test_cli 2017-04-23 21:06:50 +02:00
ines
401045433c Simplify compat.fix_text 2017-04-23 21:06:50 +02:00
Matthew Honnibal
e033c86a64 Increment version 2017-04-23 21:03:43 +02:00
Matthew Honnibal
d2436dc17b Update fix for Issue #999 2017-04-23 18:14:37 +02:00
Matthew Honnibal
874a3cbb07 Add test for Issue #955 2017-04-23 17:57:01 +02:00
Matthew Honnibal
60703cede5 Ensure noun chunks can't be nested. Closes #955 2017-04-23 17:56:39 +02:00
Matthew Honnibal
c9ec24b257 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-23 17:07:46 +02:00
Matthew Honnibal
5d8af40445 Add test for Issue #999 2017-04-23 17:06:30 +02:00
Matthew Honnibal
4d2a659c52 Fix json dump for Python3 2017-04-23 17:05:53 +02:00
Matthew Honnibal
040751ad17 Remove xfail on Test #910 2017-04-23 16:28:55 +02:00
ines
3a9710f356 Pass dev_scores to print_progress correctly (resolves #1008)
Only read scores attribute if command is used with dev_data, otherwise
default dev_scores to empty dict.
2017-04-23 15:58:40 +02:00
Matthew Honnibal
1b12f342e4 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-20 17:03:11 +02:00
Matthew Honnibal
4eef200bab Persist the actions within spacy.parser.cfg 2017-04-20 17:02:44 +02:00
ines
25c70b4cc5 Move fix_text to spacy.compat (see #1002) 2017-04-20 15:47:17 +02:00
Ines Montani
60b5243bee Merge pull request #1002 from oroszgy/model_cli_fix
Fixes for the `model` CLI
2017-04-20 15:41:03 +02:00
Gyorgy Orosz
4a06a2572c Using ftfy for handling broken encoded strings. 2017-04-20 13:34:51 +02:00
Ines Montani
3800b29046 Merge pull request #1001 from recognai/master
Add SPACE to es tag map
2017-04-20 12:16:34 +02:00
oeg
f0bcd0babb fix(model): Add SPACE to es tag_map. Fixing error in morphology.pyx when SP tag is missing 2017-04-20 11:36:24 +02:00
Ben Eyal
e90e8a3f10 Enable test 2017-04-20 02:25:24 +03:00
Ben Eyal
33af52599e Redefine alphabetic characters
For caseless languages (Hebrew, Bengali) all characters are both lowercase and uppercase.
2017-04-20 02:25:02 +03:00
Ben Eyal
d8098a8be2 Use regex instead of re 2017-04-20 02:22:52 +03:00
oeg
daaa42dd25 Merge remote-tracking branch 'upstream/master' 2017-04-19 23:30:36 +02:00
oeg
936a297241 fix(model): Fix tag map for fixing issues with tag SPACE 2017-04-19 23:30:21 +02:00
luvogels
c7cec7e5e2 Update __init__.py 2017-04-19 21:06:30 +02:00
luvogels
55e8cade36 Update __init__.py 2017-04-19 21:06:30 +02:00
luvogels
03abd0c8e6 Update __init__.py 2017-04-19 21:06:30 +02:00
Leif Uwe Vogelsang
538a8d6b12 Resolved merge conflict by incorporating both suggestions. 2017-04-19 21:06:07 +02:00
Leif Uwe Vogelsang
e821c48489 Norwegian language basics 2017-04-19 21:04:01 +02:00
Leif Uwe Vogelsang
3796c668d9 more norwegian 2017-04-19 21:01:32 +02:00
Leif Uwe Vogelsang
bc9557b21f Norwegian language basics 2017-04-19 21:00:01 +02:00
ines
2bd89e7ade Tidy up Hebrew tests and test for punctuation (see #995) 2017-04-19 19:28:03 +02:00
ines
48da244058 Use spacy.compat.json_dumps for Python 2/3 compatibility (resolves #991) 2017-04-19 11:50:36 +02:00
ines
ddd5194088 Update Language docs and docstrings 2017-04-17 01:52:13 +02:00
ines
f62b740961 Use compat.json_dumps 2017-04-17 01:46:14 +02:00
ines
8e83f8e2fa Update docstrings 2017-04-17 01:40:26 +02:00
ines
e2299dc389 Ensure path in save_to_directory 2017-04-17 01:40:14 +02:00
ines
82f5f1f98f Replace str with compat.unicode_ 2017-04-17 01:29:54 +02:00
ines
16a8521efa Increment version 2017-04-16 22:38:38 +02:00
Matthew Honnibal
4efd6fb9d6 Fix training 2017-04-16 15:28:27 -05:00
Matthew Honnibal
17c9fffb9e Fix naked except 2017-04-16 15:28:16 -05:00
ines
5610fdcc06 Get language name first if no model path exists
Makes sure spaCy fails early if no tokenizer exists, and allows
printing better error message.
2017-04-16 22:16:47 +02:00
ines
ad168ba88c Set model name to empty string if path override exists
Required for parse_package_meta, which composes path of data_path and
model_name (needs to be fixed in the future)
2017-04-16 22:15:51 +02:00
ines
97647c46cd Add docstring and todo note 2017-04-16 22:14:45 +02:00
ines
5c5f8c0a72 Check if full string is found in lang classes first
This allows users to set arbitrary strings. (Otherwise, custom lang
class "my_custom_class" would always load Burmese "my" tokenizer if one
was available.)
2017-04-16 22:14:38 +02:00
ines
13d30b6c01 xfail lemmatizer test that's causing problems (see #546) 2017-04-16 21:18:39 +02:00
Matthew Honnibal
4931c56afc Increment version 2017-04-16 13:59:38 -05:00
ines
6145b7c153 Remove redundant Path 2017-04-16 20:53:25 +02:00
Matthew Honnibal
fa89613444 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-16 13:42:56 -05:00
ines
1f9f867c70 Remove unused util function 2017-04-16 20:37:45 +02:00
ines
7670c745b6 Update spacy.load() and fix path checks 2017-04-16 20:37:45 +02:00
ines
d3759dfb32 Fix docstring 2017-04-16 20:37:45 +02:00
ines
ed7e19ad68 Remove unused import 2017-04-16 20:37:45 +02:00
ines
0084466a66 Remove unused utf8open util and replace os.path with ensure_path 2017-04-16 20:37:45 +02:00
Matthew Honnibal
89a4f262fc Fix training methods 2017-04-16 13:00:37 -05:00
Matthew Honnibal
6a4221a6de Allow lemma to be set from Python. Re #973 2017-04-16 18:07:53 +02:00
Matthew Honnibal
137b210bcf Restore use of FTRL training 2017-04-16 18:02:42 +02:00
ines
d10bd0eaf9 Fix formatting 2017-04-16 13:42:34 +02:00
ines
8191e33cf1 Update link error message with info on permissions 2017-04-16 13:32:31 +02:00
ines
a3ddbc0444 Add note about --force flag to error message 2017-04-16 13:14:36 +02:00
ines
e3de035814 Add meta validation to check for required settings
Complain if no "lang", "name" or "version" is found (those settings are
used in directory / package names). Package will still build without,
but it'll inevitably fail somewhere down the line.
2017-04-16 13:13:17 +02:00
ines
a7574b7572 Add more options to read in meta data in package command
Add meta option to supply path to meta.json. If no meta path is set,
check if meta.json exists in input directory and use it. Otherwise,
prompt for details on the command line.
2017-04-16 13:06:02 +02:00
ines
13c8a42d2b Fix typos 2017-04-16 13:03:58 +02:00
ines
31fa73293a Move read_json out to own util function 2017-04-16 13:03:28 +02:00
Matthew Honnibal
45464d065e Remove print statement 2017-04-15 16:11:43 +02:00
Matthew Honnibal
c76cb8af35 Fix training for new labels 2017-04-15 16:11:26 +02:00
Matthew Honnibal
4884b2c113 Refix StepwiseState 2017-04-15 16:00:28 +02:00
Matthew Honnibal
e6ee7e130f Fix parse package meta 2017-04-15 13:38:53 +02:00
Matthew Honnibal
1a98e48b8e Fix StepwiseState 2017-04-15 13:35:01 +02:00
ines
0739ae7b76 Tidy up and fix formatting and imports 2017-04-15 13:05:15 +02:00
ines
fefe6684cd Fix symlink function to check for Windows 2017-04-15 12:17:27 +02:00
ines
35fb4febe2 Fix whitespace 2017-04-15 12:13:45 +02:00
ines
e1efd589c3 Fix json imports and use ujson 2017-04-15 12:13:34 +02:00
ines
958b12dec8 Use pathlib instead of os.path 2017-04-15 12:13:00 +02:00
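Commit 2da16adcc2 above adds dropout for parser and NER features. Below is a minimal, generic sketch of inverted dropout in plain Python. It is not spaCy's implementation, and the function name `inverted_dropout` and its list-of-floats input are assumptions for illustration. In the conventional formulation each feature is zeroed with probability `drop`, and the survivors are scaled by 1 / (1 - drop) so that every feature keeps its expected value. For `drop=0.4` that factor is 1 / 0.6, whereas the commit message gives 1. / 0.4.

```python
import random
from typing import List, Optional


def inverted_dropout(features: List[float], drop: float,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Zero each feature with probability `drop` and rescale the survivors.

    Survivors are multiplied by 1 / (1 - drop), so each output feature has
    the same expected value as the corresponding input feature.
    """
    if not 0.0 <= drop < 1.0:
        raise ValueError("drop must be in the range [0, 1)")
    rng = rng if rng is not None else random.Random()
    scale = 1.0 / (1.0 - drop)
    return [0.0 if rng.random() < drop else x * scale for x in features]


if __name__ == "__main__":
    # Roughly 40% of these ten features become 0.0; the rest become 1 / 0.6.
    print(inverted_dropout([1.0] * 10, drop=0.4, rng=random.Random(0)))
```

Because the rescaling happens at training time, nothing needs to change at prediction time: dropout is simply switched off (`drop=0.0`), which returns the features unchanged.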
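Commit 60703cede5 above ensures noun chunks cannot be nested (#955). One generic way to get that guarantee is a greedy left-to-right filter that drops any candidate span overlapping one already emitted. The sketch below works on hypothetical `(start, end)` token offsets and is not spaCy's noun-chunk iterator. Preferring the longer span when two candidates start at the same token is an assumption.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical span representation: (start, end) token offsets, end exclusive.
Span = Tuple[int, int]


def non_nested(spans: Iterable[Span]) -> Iterator[Span]:
    """Yield spans left to right, skipping any that overlap a span already yielded."""
    prev_end = 0
    # Sort by start; at equal starts, try the longer span first (an assumption).
    for start, end in sorted(spans, key=lambda span: (span[0], -span[1])):
        if start < prev_end:
            continue  # nested inside, or overlapping, an emitted chunk
        yield (start, end)
        prev_end = end


if __name__ == "__main__":
    candidates: List[Span] = [(0, 3), (1, 3), (4, 6), (5, 6), (2, 5)]
    print(list(non_nested(candidates)))  # [(0, 3), (4, 6)]
```

Since emitted spans never overlap and arrive in start order, comparing each candidate against the end of the last emitted span is enough to rule out overlap with all of them.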
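Commit 33af52599e above redefines alphabetic characters so that in caseless scripts such as Hebrew and Bengali every letter counts as both lowercase and uppercase. A stdlib-only sketch of that rule follows. It is not spaCy's code (the log shows spaCy switching to the third-party `regex` module in d8098a8be2), and the helper names are hypothetical. Here a letter is treated as caseless when lowercasing and uppercasing it give the same string.

```python
def is_caseless(text: str) -> bool:
    """True if `text` is alphabetic and has no case distinction (e.g. Hebrew)."""
    return text.isalpha() and text.lower() == text.upper()


def is_lower(text: str) -> bool:
    # Cased lowercase letters, plus caseless letters, count as lowercase.
    return text.islower() or is_caseless(text)


def is_upper(text: str) -> bool:
    # Cased uppercase letters, plus caseless letters, count as uppercase.
    return text.isupper() or is_caseless(text)


if __name__ == "__main__":
    # Latin a, Latin A, Hebrew alef, Bengali letter a, a digit.
    for ch in ("a", "A", "\u05d0", "\u0985", "1"):
        print(repr(ch), is_lower(ch), is_upper(ch))
```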
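Commit 5c5f8c0a72 above checks the full language string first, so that a custom class named "my_custom_class" is not resolved to the Burmese "my" class. The sketch below illustrates that lookup order with a plain dict. `lookup_lang_class`, `REGISTRY` and the placeholder classes are hypothetical, and the fallback to the leading code before "_" or "-" is an assumption about how a shorter code might be matched, not a description of spaCy's code.

```python
from typing import Dict


class Burmese:  # placeholder for a built-in "my" language class
    pass


class MyCustomLanguage:  # placeholder for a user-registered class
    pass


# Hypothetical registry mapping language strings to classes.
REGISTRY: Dict[str, type] = {"my": Burmese, "my_custom_class": MyCustomLanguage}


def lookup_lang_class(name: str, registry: Dict[str, type]) -> type:
    """Return the class for `name`, preferring an exact match on the full string."""
    # 1. Exact match first, so "my_custom_class" is never resolved to "my".
    if name in registry:
        return registry[name]
    # 2. Assumed fallback: the leading code before the first "_" or "-".
    code = name.replace("-", "_").split("_", 1)[0]
    if code in registry:
        return registry[code]
    raise KeyError(f"no language class registered for {name!r}")


if __name__ == "__main__":
    print(lookup_lang_class("my_custom_class", REGISTRY).__name__)
```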
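Commit e3de035814 above adds meta validation that complains when "lang", "name" or "version" is missing, while still letting the package build. A minimal sketch of such a check follows. The helper `validate_meta` is hypothetical; it returns the missing keys and prints a warning for each instead of raising.

```python
import sys
from typing import Dict, List

# Settings named in the commit message; they feed directory / package names.
REQUIRED_META_KEYS = ("lang", "name", "version")


def validate_meta(meta: Dict[str, object]) -> List[str]:
    """Return the required settings that are missing or empty in `meta`."""
    missing = [key for key in REQUIRED_META_KEYS if not meta.get(key)]
    for key in missing:
        # Warn instead of aborting: the package can still be built without them.
        print(f"Warning: no '{key}' setting found in meta data", file=sys.stderr)
    return missing


if __name__ == "__main__":
    print(validate_meta({"lang": "en", "name": "example"}))
```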
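Commit a7574b7572 above gives the package command three sources of meta data, tried in order: an explicit path to meta.json, a meta.json in the input directory, and finally a prompt on the command line. The function below is a hypothetical sketch of that precedence using `pathlib` and `json`. `read_meta` and `prompt_for_meta` are illustrative names, not spaCy's CLI API.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Optional, Union

PathLike = Union[str, Path]


def prompt_for_meta() -> Dict[str, str]:
    """Hypothetical interactive fallback: ask for the core settings."""
    return {key: input(f"Package {key}: ") for key in ("lang", "name", "version")}


def read_meta(input_dir: PathLike, meta_path: Optional[PathLike] = None,
              prompt: Callable[[], Dict[str, str]] = prompt_for_meta) -> Dict[str, str]:
    """Resolve meta data: explicit path, then input_dir/meta.json, then prompt."""
    # 1. An explicitly supplied meta path always wins.
    if meta_path is not None:
        return json.loads(Path(meta_path).read_text(encoding="utf8"))
    # 2. Otherwise, use meta.json from the input directory if it exists.
    candidate = Path(input_dir) / "meta.json"
    if candidate.exists():
        return json.loads(candidate.read_text(encoding="utf8"))
    # 3. Otherwise, ask on the command line.
    return prompt()
```

Passing the prompt as a parameter keeps the interactive step replaceable, for example with a stub in tests.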