Raphaël Bournhonesque
f37d078d6a
Fix issue #1069 with custom hook Doc.sents
definition
2017-05-18 09:59:38 +02:00
ines
9003fd25e5
Fix error messages if model is required ( resolves #1051 )
...
Rename about.__docs__ to about.__docs_models__.
2017-05-13 13:14:02 +02:00
ines
24e973b17f
Rename about.__docs__ to about.__docs_models__
2017-05-13 13:09:00 +02:00
ines
6e1dbc608e
Fix parse_tree test
2017-05-13 12:34:20 +02:00
ines
573f0ba867
Replace deepcopy
2017-05-13 12:34:14 +02:00
ines
bd428c0a70
Set defaults for light and flat kwargs
2017-05-13 12:34:05 +02:00
ines
c5669450a0
Fix formatting
2017-05-13 12:33:57 +02:00
Matthew Honnibal
ad590feaa8
Fix test, which imported English incorrectly
2017-05-13 11:36:19 +02:00
Ines Montani
8d742ac8ff
Merge pull request #1055 from recognai/master
...
Enable pruning out rare words from clusters data
2017-05-13 03:22:56 +02:00
Matthew Honnibal
b2540d2379
Merge Kengz's tree_print patch
2017-05-13 03:18:49 +02:00
oeg
cdaefae60a
feature(populate_vocab): Enable pruning out rare words from clusters data
2017-05-12 16:15:19 +02:00
ines
b1f22c5a10
Fix formatting
2017-05-03 20:11:02 +02:00
ines
a04b5be1b2
Add glossary for annotation scheme ( closes #1034 )
...
Can be imported as explain from spacy.glossary, or called as
spacy.explain(term)
2017-05-03 17:02:17 +02:00
Ines Montani
3ea23a3f4d
Fix formatting
2017-05-03 09:44:38 +02:00
Ines Montani
d730eb0c0d
Raise custom ImportError if importing janome fails
2017-05-03 09:43:29 +02:00
Ines Montani
949ad6594b
Add newline
2017-05-03 09:38:43 +02:00
Ines Montani
d12ca587ea
Add newline
2017-05-03 09:38:29 +02:00
Ines Montani
8676cd0135
Add newline
2017-05-03 09:38:07 +02:00
Yasuaki Uechi
c8f83aeb87
Add basic japanese support
2017-05-03 13:56:21 +09:00
Matthew Honnibal
31ec9e1371
Merge branch 'master' of https://github.com/explosion/spaCy
2017-04-27 13:21:39 +02:00
Matthew Honnibal
2da16adcc2
Add dropout optin for parser and NER
...
Dropout can now be specified in the `Parser.update()` method via
the `drop` keyword argument, e.g.
nlp.entity.update(doc, gold, drop=0.4)
This will randomly drop 40% of features, and multiply the value of the
others by 1. / 0.4. This may be useful for generalising from small data
sets.
This commit also patches the examples/training/train_new_entity_type.py
example, to use dropout and fix the output (previously it did not output
the learned entity).
2017-04-27 13:18:39 +02:00
Ines Montani
7da9cefd25
Merge pull request #1022 from luvogels/master
...
Initial support for Norwegian Bokmål
2017-04-27 11:16:06 +02:00
Ines Montani
c9e592ae6c
Add newline
2017-04-27 11:15:41 +02:00
Ines Montani
5942adccc2
Add newline
2017-04-27 11:15:19 +02:00
Ines Montani
4cd9269aef
Add newline
2017-04-27 11:15:04 +02:00
Ines Montani
ccf13ecc21
Add newline
2017-04-27 11:14:42 +02:00
Ines Montani
03d2b0cc05
Add newline
2017-04-27 11:14:26 +02:00
luvogels
d12a0b6431
Hooked up tokenizer tests
2017-04-26 23:21:41 +02:00
Matthew Honnibal
f0e1606d27
Increment version
2017-04-26 20:25:41 +02:00
luvogels
b331929a7e
Merge branch 'master' of https://github.com/luvogels/spaCy
2017-04-26 19:15:48 +02:00
luvogels
8de59ce3b9
Added tokenizer tests
2017-04-26 19:10:18 +02:00
Matthew Honnibal
4d98511db7
Make Span hashable. Closes #1019
2017-04-26 19:01:05 +02:00
Matthew Honnibal
24c4c51f13
Try to make test999 less flakey
2017-04-26 18:42:06 +02:00
Leif Uwe Vogelsang
460094bf09
Update __init__.py
2017-04-26 18:27:55 +02:00
ines
527d51ac9a
Fetch shortcuts from GitHub and improve error handling
2017-04-26 18:00:28 +02:00
Matthew Honnibal
c4be9c36fe
Fix unicode header in tests
2017-04-24 10:09:01 +02:00
Matthew Honnibal
65f10b53e5
Fix test
2017-04-24 00:25:55 +02:00
Matthew Honnibal
70a43858e1
Fix flakey test
2017-04-24 00:06:30 +02:00
Matthew Honnibal
3973af2d15
Make training test less flakey
2017-04-23 22:59:34 +02:00
Matthew Honnibal
4f9657b42b
Fix reporting if no dev data with train
2017-04-23 22:27:10 +02:00
Matthew Honnibal
df2ac8b843
Merge branch 'master' of https://github.com/explosion/spaCy
2017-04-23 21:25:07 +02:00
Matthew Honnibal
d0e19267e8
Create directory if missing in save_to_directory
2017-04-23 21:24:43 +02:00
ines
42305bc519
Remove unnecessary test
2017-04-23 21:21:41 +02:00
ines
012ea594d1
Add file for misc tests
2017-04-23 21:06:51 +02:00
ines
83f66947dc
Rename test_download to test_cli
2017-04-23 21:06:50 +02:00
ines
401045433c
Simplify compat.fix_text
2017-04-23 21:06:50 +02:00
Matthew Honnibal
e033c86a64
Increment version
2017-04-23 21:03:43 +02:00
Matthew Honnibal
d2436dc17b
Update fix for Issue #999
2017-04-23 18:14:37 +02:00
Matthew Honnibal
874a3cbb07
Add test for Issue #955
2017-04-23 17:57:01 +02:00
Matthew Honnibal
60703cede5
Ensure noun chunks can't be nested. Closes #955
2017-04-23 17:56:39 +02:00
Matthew Honnibal
c9ec24b257
Merge branch 'master' of https://github.com/explosion/spaCy
2017-04-23 17:07:46 +02:00
Matthew Honnibal
5d8af40445
Add test for Issue #999
2017-04-23 17:06:30 +02:00
Matthew Honnibal
4d2a659c52
Fix json dump for Python3
2017-04-23 17:05:53 +02:00
Matthew Honnibal
040751ad17
Remove xfail on Test #910
2017-04-23 16:28:55 +02:00
ines
3a9710f356
Pass dev_scores to print_progress correctly ( resolves #1008 )
...
Only read scores attribute if command is used with dev_data, otherwise
default dev_scores to empty dict.
2017-04-23 15:58:40 +02:00
Matthew Honnibal
1b12f342e4
Merge branch 'master' of https://github.com/explosion/spaCy
2017-04-20 17:03:11 +02:00
Matthew Honnibal
4eef200bab
Persist the actions within spacy.parser.cfg
2017-04-20 17:02:44 +02:00
ines
25c70b4cc5
Move fix_text to spacy.compat (see #1002 )
2017-04-20 15:47:17 +02:00
Ines Montani
60b5243bee
Merge pull request #1002 from oroszgy/model_cli_fix
...
Fixes for the `model` CLI
2017-04-20 15:41:03 +02:00
Gyorgy Orosz
4a06a2572c
Using ftfy for handling broken encoded strings.
2017-04-20 13:34:51 +02:00
Ines Montani
3800b29046
Merge pull request #1001 from recognai/master
...
Add SPACE to es tag map
2017-04-20 12:16:34 +02:00
oeg
f0bcd0babb
fix(model): Add SPACE to es tag_map. Fixing error in morphology.pyx when SP tag is missing
2017-04-20 11:36:24 +02:00
Ben Eyal
e90e8a3f10
Enable test
2017-04-20 02:25:24 +03:00
Ben Eyal
33af52599e
Redefine alphabetic characters
...
For caseless languages (Hebrew, Bengali) all characters are both lowercase and uppercase.
2017-04-20 02:25:02 +03:00
Ben Eyal
d8098a8be2
Use regex
instead of re
2017-04-20 02:22:52 +03:00
oeg
daaa42dd25
Merge remote-tracking branch 'upstream/master'
2017-04-19 23:30:36 +02:00
oeg
936a297241
fix(model): Fix tag map for fixing issues with tag SPACE
2017-04-19 23:30:21 +02:00
luvogels
c7cec7e5e2
Update __init__.py
2017-04-19 21:06:30 +02:00
luvogels
55e8cade36
Update __init__.py
2017-04-19 21:06:30 +02:00
luvogels
03abd0c8e6
Update __init__.py
2017-04-19 21:06:30 +02:00
Leif Uwe Vogelsang
538a8d6b12
Resolved merge conflict by incorporating both suggestions.
2017-04-19 21:06:07 +02:00
Leif Uwe Vogelsang
e821c48489
Norwegian language basics
2017-04-19 21:04:01 +02:00
Leif Uwe Vogelsang
3796c668d9
more norwegian
2017-04-19 21:01:32 +02:00
Leif Uwe Vogelsang
bc9557b21f
Norwegian language basics
2017-04-19 21:00:01 +02:00
ines
2bd89e7ade
Tidy up Hebrew tests and test for punctuation (see #995 )
2017-04-19 19:28:03 +02:00
ines
48da244058
Use spacy.compat.json_dumps for Python 2/3 compatibility ( resolves #991 )
2017-04-19 11:50:36 +02:00
ines
ddd5194088
Update Language docs and docstrings
2017-04-17 01:52:13 +02:00
ines
f62b740961
Use compat.json_dumps
2017-04-17 01:46:14 +02:00
ines
8e83f8e2fa
Update docstrings
2017-04-17 01:40:26 +02:00
ines
e2299dc389
Ensure path in save_to_directory
2017-04-17 01:40:14 +02:00
ines
82f5f1f98f
Replace str with compat.unicode_
2017-04-17 01:29:54 +02:00
ines
16a8521efa
Increment version
2017-04-16 22:38:38 +02:00
Matthew Honnibal
4efd6fb9d6
Fix training
2017-04-16 15:28:27 -05:00
Matthew Honnibal
17c9fffb9e
Fix naked except
2017-04-16 15:28:16 -05:00
ines
5610fdcc06
Get language name first if no model path exists
...
Makes sure spaCy fails early if no tokenizer exists, and allows
printing better error message.
2017-04-16 22:16:47 +02:00
ines
ad168ba88c
Set model name to empty string if path override exists
...
Required for parse_package_meta, which composes path of data_path and
model_name (needs to be fixed in the future)
2017-04-16 22:15:51 +02:00
ines
97647c46cd
Add docstring and todo note
2017-04-16 22:14:45 +02:00
ines
5c5f8c0a72
Check if full string is found in lang classes first
...
This allows users to set arbitrary strings. (Otherwise, custom lang
class "my_custom_class" would always load Burmese "my" tokenizer if one
was available.)
2017-04-16 22:14:38 +02:00
ines
13d30b6c01
xfail lemmatizer test that's causing problems (see #546 )
2017-04-16 21:18:39 +02:00
Matthew Honnibal
4931c56afc
Increment version
2017-04-16 13:59:38 -05:00
ines
6145b7c153
Remove redundant Path
2017-04-16 20:53:25 +02:00
Matthew Honnibal
fa89613444
Merge branch 'master' of https://github.com/explosion/spaCy
2017-04-16 13:42:56 -05:00
ines
1f9f867c70
Remove unused util function
2017-04-16 20:37:45 +02:00
ines
7670c745b6
Update spacy.load() and fix path checks
2017-04-16 20:37:45 +02:00
ines
d3759dfb32
Fix docstring
2017-04-16 20:37:45 +02:00
ines
ed7e19ad68
Remove unused import
2017-04-16 20:37:45 +02:00
ines
0084466a66
Remove unused utf8open util and replace os.path with ensure_path
2017-04-16 20:37:45 +02:00
Matthew Honnibal
89a4f262fc
Fix training methods
2017-04-16 13:00:37 -05:00
Matthew Honnibal
6a4221a6de
Allow lemma to be set from Python. Re #973
2017-04-16 18:07:53 +02:00
Matthew Honnibal
137b210bcf
Restore use of FTRL training
2017-04-16 18:02:42 +02:00
ines
d10bd0eaf9
Fix formatting
2017-04-16 13:42:34 +02:00
ines
8191e33cf1
Update link error message with info on permissions
2017-04-16 13:32:31 +02:00
ines
a3ddbc0444
Add note about --force flag to error message
2017-04-16 13:14:36 +02:00
ines
e3de035814
Add meta validation to check for required settings
...
Complain if no "lang", "name" or "version" is found (those settings are
used in directory / package names). Package will still build without,
but it'll inevitably fail somewhere down the line.
2017-04-16 13:13:17 +02:00
ines
a7574b7572
Add more options to read in meta data in package command
...
Add meta option to supply path to meta.json. If no meta path is set,
check if meta.json exists in input directory and use it. Otherwise,
prompt for details on the command line.
2017-04-16 13:06:02 +02:00
ines
13c8a42d2b
Fix typos
2017-04-16 13:03:58 +02:00
ines
31fa73293a
Move read_json out to own util function
2017-04-16 13:03:28 +02:00
Matthew Honnibal
45464d065e
Remove print statement
2017-04-15 16:11:43 +02:00
Matthew Honnibal
c76cb8af35
Fix training for new labels
2017-04-15 16:11:26 +02:00
Matthew Honnibal
4884b2c113
Refix StepwiseState
2017-04-15 16:00:28 +02:00
Matthew Honnibal
e6ee7e130f
Fix parse package meta
2017-04-15 13:38:53 +02:00
Matthew Honnibal
1a98e48b8e
Fix Stepwisestate'
2017-04-15 13:35:01 +02:00
ines
0739ae7b76
Tidy up and fix formatting and imports
2017-04-15 13:05:15 +02:00
ines
fefe6684cd
Fix symlink function to check for Windows
2017-04-15 12:17:27 +02:00
ines
35fb4febe2
Fix whitespace
2017-04-15 12:13:45 +02:00
ines
e1efd589c3
Fix json imports and use ujson
2017-04-15 12:13:34 +02:00
ines
958b12dec8
Use pathlib instead of os.path
2017-04-15 12:13:00 +02:00
ines
956dc36785
Move functions to deprecated
2017-04-15 12:12:31 +02:00
ines
c05ec4b89a
Add compat functions and remove old workarounds
...
Add ensure_path util function to handle checking instance of path
2017-04-15 12:11:16 +02:00
ines
26445ee304
Add compat module for Python2/3 and platform compatibility
2017-04-15 12:07:02 +02:00
ines
d24589aa72
Clean up imports, unused code, whitespace, docstrings
2017-04-15 12:05:47 +02:00
ines
561f2a3eb4
Use consistent formatting for docstrings
2017-04-15 11:59:21 +02:00
Matthew Honnibal
d13f0a7017
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-04-14 23:54:57 +02:00
Matthew Honnibal
354458484c
WIP on add_label bug during NER training
...
Currently when a new label is introduced to NER during training,
it causes the labels to be read in in an unexpected order. This
invalidates the model.
2017-04-14 23:52:17 +02:00
Matthew Honnibal
33ba5066eb
Refactor Language.end_training, making new save_to_directory method
2017-04-14 23:51:24 +02:00
ines
84341c2975
Only compile list of models if data_path exists
2017-04-14 16:48:02 +02:00
Gyorgy Orosz
dd3244c08a
Made json dump to produce unicode strings in py2
2017-04-13 23:30:47 +02:00
Gyorgy Orosz
a9469c8173
Fixed typo
2017-04-13 15:24:14 +02:00
ines
41037f0f07
Remove unused imports
2017-04-13 13:52:11 +02:00
ines
1b92c8d5d5
Use unicode paths on Windows/Python 2 and catch other errors ( resolves #970 )
...
try/except here is quite dirty, but it'll at least make sure users see
an error message that explains what's going on
2017-04-10 17:49:51 +02:00
Matthew Honnibal
49e2de900e
Add costs property to StepwiseState, to show which moves are gold.
2017-04-10 11:37:04 +02:00
Matthew Honnibal
e26577b202
Increment version
2017-04-07 18:45:06 +02:00
Matthew Honnibal
40bf7ecf27
Increment version
2017-04-07 18:44:20 +02:00
Matthew Honnibal
1dca7eeb03
Add unicode declaration on new regression test
2017-04-07 18:09:23 +02:00
ines
887827fc6a
Merge branch 'develop'
2017-04-07 17:36:23 +02:00
ines
444dd511c5
Fix xpassing URL test case
2017-04-07 17:36:05 +02:00
ines
bf0f15e762
Add / to tokenizer infixes ( resolves #891 )
2017-04-07 17:30:44 +02:00
ines
00b9011a49
Fix whitespace
2017-04-07 17:29:59 +02:00
ines
f9869e4dc5
Merge branch 'master' into develop
2017-04-07 17:23:40 +02:00
Matthew Honnibal
4a6204dbad
Merge remote-tracking branch 'origin/develop'
2017-04-07 17:20:09 +02:00
Matthew Honnibal
0513c43bf0
Merge branch 'master' of https://github.com/explosion/spaCy
2017-04-07 17:07:10 +02:00
Matthew Honnibal
cc36c308f4
Fix noun_chunk rules around coordination
...
Closes #693 .
2017-04-07 17:06:40 +02:00
Matthew Honnibal
ab846256cf
Merge pull request #966 from recognai/master
...
Prepare Spanish language for training models, including configuration, rich-UD tag map and tests
2017-04-07 16:12:29 +02:00
Matthew Honnibal
83dca920d4
Rename test #913 -> #957 , comment
...
Make test for #957 reference correct bug. Add comment.
Previous commit closes #957 .
2017-04-07 15:54:25 +02:00
Matthew Honnibal
be204ed714
Merge branch 'master' of https://github.com/explosion/spaCy
2017-04-07 15:50:14 +02:00
Matthew Honnibal
e7b1ee9efd
Switch to regex module for URL identification
...
The URL detection regex was failing on input such as 0.1.2.3, as this
input triggered excessive back-tracking in the builtin re module.
The solution was to switch to the regex module, which behaves better.
Closes #913 .
2017-04-07 15:47:36 +02:00
Matthew Honnibal
5887383fc0
Add test for Issue #913 : Hang from bad regex
2017-04-07 15:47:27 +02:00
ines
7ea1673072
Fix whitespace
2017-04-07 13:28:48 +02:00
ines
255650dbc2
Add connlu2json converter from explosion/spacy-dev-resources/#11
2017-04-07 13:05:12 +02:00
ines
789ce8a45e
Add convert command
2017-04-07 13:04:17 +02:00