Matthew Honnibal
5d8af40445
Add test for Issue #999
2017-04-23 17:06:30 +02:00
Matthew Honnibal
4d2a659c52
Fix json dump for Python3
2017-04-23 17:05:53 +02:00
Matthew Honnibal
040751ad17
Remove xfail on Test #910
2017-04-23 16:28:55 +02:00
ines
3a9710f356
Pass dev_scores to print_progress correctly (resolves #1008)
...
Only read scores attribute if command is used with dev_data, otherwise
default dev_scores to empty dict.
2017-04-23 15:58:40 +02:00
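A minimal sketch of the behaviour described in 3a9710f356. The helper name `collect_dev_scores` and the `FakeScorer` class are hypothetical and only illustrate the idea; this is not the actual train command code.

```python
def collect_dev_scores(scorer, dev_data):
    """Return the scorer's scores only when dev data was supplied.

    Both arguments are hypothetical stand-ins for the objects the
    train command works with.
    """
    if dev_data:
        # Dev data was passed, so the scorer has meaningful scores.
        return scorer.scores
    # No dev data: default to an empty dict instead of touching
    # the scores attribute at all.
    return {}


class FakeScorer(object):
    """Illustrative scorer with a fixed scores attribute."""
    scores = {"uas": 91.2}
```

Called without dev data, the helper never reads `scores`, so a missing or `None` scorer does not raise.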
Matthew Honnibal
1b12f342e4
Merge branch 'master' of https://github.com/explosion/spaCy
2017-04-20 17:03:11 +02:00
Matthew Honnibal
4eef200bab
Persist the actions within spacy.parser.cfg
2017-04-20 17:02:44 +02:00
ines
25c70b4cc5
Move fix_text to spacy.compat (see #1002)
2017-04-20 15:47:17 +02:00
Ines Montani
60b5243bee
Merge pull request #1002 from oroszgy/model_cli_fix
...
Fixes for the `model` CLI
2017-04-20 15:41:03 +02:00
Gyorgy Orosz
4a06a2572c
Using ftfy for handling broken encoded strings.
2017-04-20 13:34:51 +02:00
Ines Montani
3800b29046
Merge pull request #1001 from recognai/master
...
Add SPACE to es tag map
2017-04-20 12:16:34 +02:00
oeg
f0bcd0babb
fix(model): Add SPACE to es tag_map. Fixing error in morphology.pyx when SP tag is missing
2017-04-20 11:36:24 +02:00
Ben Eyal
e90e8a3f10
Enable test
2017-04-20 02:25:24 +03:00
Ben Eyal
33af52599e
Redefine alphabetic characters
...
For caseless languages (Hebrew, Bengali) all characters are both lowercase and uppercase.
2017-04-20 02:25:02 +03:00
Ben Eyal
d8098a8be2
Use regex instead of re
2017-04-20 02:22:52 +03:00
oeg
daaa42dd25
Merge remote-tracking branch 'upstream/master'
2017-04-19 23:30:36 +02:00
oeg
936a297241
fix(model): Fix tag map for fixing issues with tag SPACE
2017-04-19 23:30:21 +02:00
luvogels
c7cec7e5e2
Update __init__.py
2017-04-19 21:06:30 +02:00
luvogels
55e8cade36
Update __init__.py
2017-04-19 21:06:30 +02:00
luvogels
03abd0c8e6
Update __init__.py
2017-04-19 21:06:30 +02:00
Leif Uwe Vogelsang
538a8d6b12
Resolved merge conflict by incorporating both suggestions.
2017-04-19 21:06:07 +02:00
Leif Uwe Vogelsang
e821c48489
Norwegian language basics
2017-04-19 21:04:01 +02:00
Leif Uwe Vogelsang
3796c668d9
more norwegian
2017-04-19 21:01:32 +02:00
Leif Uwe Vogelsang
bc9557b21f
Norwegian language basics
2017-04-19 21:00:01 +02:00
ines
2bd89e7ade
Tidy up Hebrew tests and test for punctuation (see #995)
2017-04-19 19:28:03 +02:00
ines
48da244058
Use spacy.compat.json_dumps for Python 2/3 compatibility (resolves #991)
2017-04-19 11:50:36 +02:00
ines
ddd5194088
Update Language docs and docstrings
2017-04-17 01:52:13 +02:00
ines
f62b740961
Use compat.json_dumps
2017-04-17 01:46:14 +02:00
ines
8e83f8e2fa
Update docstrings
2017-04-17 01:40:26 +02:00
ines
e2299dc389
Ensure path in save_to_directory
2017-04-17 01:40:14 +02:00
ines
82f5f1f98f
Replace str with compat.unicode_
2017-04-17 01:29:54 +02:00
ines
16a8521efa
Increment version
2017-04-16 22:38:38 +02:00
Matthew Honnibal
4efd6fb9d6
Fix training
2017-04-16 15:28:27 -05:00
Matthew Honnibal
17c9fffb9e
Fix naked except
2017-04-16 15:28:16 -05:00
ines
5610fdcc06
Get language name first if no model path exists
...
Makes sure spaCy fails early if no tokenizer exists, and allows
printing better error message.
2017-04-16 22:16:47 +02:00
ines
ad168ba88c
Set model name to empty string if path override exists
...
Required for parse_package_meta, which composes path of data_path and
model_name (needs to be fixed in the future)
2017-04-16 22:15:51 +02:00
ines
97647c46cd
Add docstring and todo note
2017-04-16 22:14:45 +02:00
ines
5c5f8c0a72
Check if full string is found in lang classes first
...
This allows users to set arbitrary strings. (Otherwise, custom lang
class "my_custom_class" would always load Burmese "my" tokenizer if one
was available.)
2017-04-16 22:14:38 +02:00
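The lookup order described in 5c5f8c0a72 could be sketched as follows. The registry contents, the function name `resolve_lang` and the underscore-prefix fallback are assumptions made for illustration, not the actual spaCy implementation.

```python
# Hypothetical registry mapping language IDs to (stand-in) classes.
LANGUAGES = {"my": "Burmese", "en": "English"}


def resolve_lang(name, registry=LANGUAGES):
    """Look up a language class, preferring an exact match."""
    # 1. The full string wins, so custom names can be registered freely.
    if name in registry:
        return registry[name]
    # 2. Otherwise fall back to the part before the first underscore
    #    (assumed convention, e.g. "en_something" -> "en").
    prefix = name.split("_")[0]
    return registry.get(prefix)
```

With only the prefix fallback, a class registered as "my_custom_class" would resolve to the "my" entry; checking the full string first avoids that.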
ines
13d30b6c01
xfail lemmatizer test that's causing problems (see #546)
2017-04-16 21:18:39 +02:00
Matthew Honnibal
4931c56afc
Increment version
2017-04-16 13:59:38 -05:00
ines
6145b7c153
Remove redundant Path
2017-04-16 20:53:25 +02:00
Matthew Honnibal
fa89613444
Merge branch 'master' of https://github.com/explosion/spaCy
2017-04-16 13:42:56 -05:00
ines
1f9f867c70
Remove unused util function
2017-04-16 20:37:45 +02:00
ines
7670c745b6
Update spacy.load() and fix path checks
2017-04-16 20:37:45 +02:00
ines
d3759dfb32
Fix docstring
2017-04-16 20:37:45 +02:00
ines
ed7e19ad68
Remove unused import
2017-04-16 20:37:45 +02:00
ines
0084466a66
Remove unused utf8open util and replace os.path with ensure_path
2017-04-16 20:37:45 +02:00
Matthew Honnibal
89a4f262fc
Fix training methods
2017-04-16 13:00:37 -05:00
Matthew Honnibal
6a4221a6de
Allow lemma to be set from Python. Re #973
2017-04-16 18:07:53 +02:00
Matthew Honnibal
137b210bcf
Restore use of FTRL training
2017-04-16 18:02:42 +02:00
ines
d10bd0eaf9
Fix formatting
2017-04-16 13:42:34 +02:00
ines
8191e33cf1
Update link error message with info on permissions
2017-04-16 13:32:31 +02:00
ines
a3ddbc0444
Add note about --force flag to error message
2017-04-16 13:14:36 +02:00
ines
e3de035814
Add meta validation to check for required settings
...
Complain if no "lang", "name" or "version" is found (those settings are
used in directory / package names). Package will still build without,
but it'll inevitably fail somewhere down the line.
2017-04-16 13:13:17 +02:00
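A rough sketch of the check described in e3de035814. The function name `missing_meta_keys` is hypothetical; the three required settings come from the commit message.

```python
# Settings the commit message names as required.
REQUIRED_KEYS = ("lang", "name", "version")


def missing_meta_keys(meta):
    """Return the required settings that are absent or empty in meta."""
    return [key for key in REQUIRED_KEYS if not meta.get(key)]
```

A caller could print a warning for each returned key and still continue building, which matches the "complain, but build anyway" behaviour the message describes.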
ines
a7574b7572
Add more options to read in meta data in package command
...
Add meta option to supply path to meta.json. If no meta path is set,
check if meta.json exists in input directory and use it. Otherwise,
prompt for details on the command line.
2017-04-16 13:06:02 +02:00
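The three-step resolution in a7574b7572 might look roughly like this. `load_meta`, its parameters and the local `read_json` helper are illustrative assumptions, and the interactive prompt is replaced by a callable so the sketch stays self-contained.

```python
import json
from pathlib import Path


def read_json(location):
    """Read a JSON file from a pathlib.Path."""
    with location.open("r", encoding="utf8") as f:
        return json.load(f)


def load_meta(input_dir, meta_path=None, prompt=lambda: {}):
    """Resolve meta data: explicit path, then input dir, then prompt."""
    # 1. An explicitly supplied meta path takes priority.
    if meta_path is not None:
        return read_json(Path(meta_path))
    # 2. Otherwise use meta.json from the input directory if present.
    default = Path(input_dir) / "meta.json"
    if default.is_file():
        return read_json(default)
    # 3. Fall back to asking the user (hypothetical callable here).
    return prompt()
```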
ines
13c8a42d2b
Fix typos
2017-04-16 13:03:58 +02:00
ines
31fa73293a
Move read_json out to own util function
2017-04-16 13:03:28 +02:00
Matthew Honnibal
45464d065e
Remove print statement
2017-04-15 16:11:43 +02:00
Matthew Honnibal
c76cb8af35
Fix training for new labels
2017-04-15 16:11:26 +02:00
Matthew Honnibal
4884b2c113
Refix StepwiseState
2017-04-15 16:00:28 +02:00
Matthew Honnibal
e6ee7e130f
Fix parse package meta
2017-04-15 13:38:53 +02:00
Matthew Honnibal
1a98e48b8e
Fix StepwiseState
2017-04-15 13:35:01 +02:00
ines
0739ae7b76
Tidy up and fix formatting and imports
2017-04-15 13:05:15 +02:00
ines
fefe6684cd
Fix symlink function to check for Windows
2017-04-15 12:17:27 +02:00
ines
35fb4febe2
Fix whitespace
2017-04-15 12:13:45 +02:00
ines
e1efd589c3
Fix json imports and use ujson
2017-04-15 12:13:34 +02:00
ines
958b12dec8
Use pathlib instead of os.path
2017-04-15 12:13:00 +02:00
ines
956dc36785
Move functions to deprecated
2017-04-15 12:12:31 +02:00
ines
c05ec4b89a
Add compat functions and remove old workarounds
...
Add ensure_path util function to handle checking instance of path
2017-04-15 12:11:16 +02:00
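A minimal sketch of what an `ensure_path` helper like the one in c05ec4b89a could do, assuming Python 3 only. The real utility also has to handle Python 2 string types through the compat module, which this version leaves out.

```python
from pathlib import Path


def ensure_path(path):
    """Return a Path for string input; pass anything else through."""
    if isinstance(path, str):
        return Path(path)
    return path
```

Callers can then accept either strings or `Path` objects without repeating the instance check.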
ines
26445ee304
Add compat module for Python2/3 and platform compatibility
2017-04-15 12:07:02 +02:00
ines
d24589aa72
Clean up imports, unused code, whitespace, docstrings
2017-04-15 12:05:47 +02:00
ines
561f2a3eb4
Use consistent formatting for docstrings
2017-04-15 11:59:21 +02:00
Matthew Honnibal
d13f0a7017
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-04-14 23:54:57 +02:00
Matthew Honnibal
354458484c
WIP on add_label bug during NER training
...
Currently when a new label is introduced to NER during training,
it causes the labels to be read in an unexpected order. This
invalidates the model.
2017-04-14 23:52:17 +02:00
Matthew Honnibal
33ba5066eb
Refactor Language.end_training, making new save_to_directory method
2017-04-14 23:51:24 +02:00
ines
84341c2975
Only compile list of models if data_path exists
2017-04-14 16:48:02 +02:00
Gyorgy Orosz
dd3244c08a
Make json dump produce unicode strings in py2
2017-04-13 23:30:47 +02:00
Gyorgy Orosz
a9469c8173
Fixed typo
2017-04-13 15:24:14 +02:00
ines
41037f0f07
Remove unused imports
2017-04-13 13:52:11 +02:00
ines
1b92c8d5d5
Use unicode paths on Windows/Python 2 and catch other errors (resolves #970)
...
try/except here is quite dirty, but it'll at least make sure users see
an error message that explains what's going on
2017-04-10 17:49:51 +02:00
Matthew Honnibal
49e2de900e
Add costs property to StepwiseState, to show which moves are gold.
2017-04-10 11:37:04 +02:00
Matthew Honnibal
e26577b202
Increment version
2017-04-07 18:45:06 +02:00
Matthew Honnibal
40bf7ecf27
Increment version
2017-04-07 18:44:20 +02:00
Matthew Honnibal
1dca7eeb03
Add unicode declaration on new regression test
2017-04-07 18:09:23 +02:00
ines
887827fc6a
Merge branch 'develop'
2017-04-07 17:36:23 +02:00
ines
444dd511c5
Fix xpassing URL test case
2017-04-07 17:36:05 +02:00
ines
bf0f15e762
Add / to tokenizer infixes (resolves #891)
2017-04-07 17:30:44 +02:00
ines
00b9011a49
Fix whitespace
2017-04-07 17:29:59 +02:00
ines
f9869e4dc5
Merge branch 'master' into develop
2017-04-07 17:23:40 +02:00
Matthew Honnibal
4a6204dbad
Merge remote-tracking branch 'origin/develop'
2017-04-07 17:20:09 +02:00
Matthew Honnibal
0513c43bf0
Merge branch 'master' of https://github.com/explosion/spaCy
2017-04-07 17:07:10 +02:00
Matthew Honnibal
cc36c308f4
Fix noun_chunk rules around coordination
...
Closes #693.
2017-04-07 17:06:40 +02:00
Matthew Honnibal
ab846256cf
Merge pull request #966 from recognai/master
...
Prepare Spanish language for training models, including configuration, rich-UD tag map and tests
2017-04-07 16:12:29 +02:00
Matthew Honnibal
83dca920d4
Rename test #913 -> #957, comment
...
Make test for #957 reference correct bug. Add comment.
Previous commit closes #957.
2017-04-07 15:54:25 +02:00
Matthew Honnibal
be204ed714
Merge branch 'master' of https://github.com/explosion/spaCy
2017-04-07 15:50:14 +02:00
Matthew Honnibal
e7b1ee9efd
Switch to regex module for URL identification
...
The URL detection regex was failing on input such as 0.1.2.3, as this
input triggered excessive back-tracking in the builtin re module.
The solution was to switch to the regex module, which behaves better.
Closes #913.
2017-04-07 15:47:36 +02:00
Matthew Honnibal
5887383fc0
Add test for Issue #913: Hang from bad regex
2017-04-07 15:47:27 +02:00
ines
7ea1673072
Fix whitespace
2017-04-07 13:28:48 +02:00
ines
255650dbc2
Add conllu2json converter from explosion/spacy-dev-resources/#11
2017-04-07 13:05:12 +02:00
ines
789ce8a45e
Add convert command
2017-04-07 13:04:17 +02:00
ines
9952d3b08a
Fix whitespace
2017-04-07 13:02:05 +02:00