436 Commits

Author SHA1 Message Date
Matthew Honnibal
4c9202249d Refactor training, to fix memory leak 2017-05-21 09:07:06 -05:00
ines
0c6c65aa3c Improve messaging if model linking fails after download 2017-05-21 00:28:37 +02:00
ines
e39ad78267 Resolve model name properly in cli.info
Use util.resolve_model_path() to also allow package names and paths.
2017-05-20 12:24:40 +02:00
Matthew Honnibal
3376d4d6e8 Update the train script, fixing GPU memory leak 2017-05-19 18:15:50 -05:00
Matthew Honnibal
08766240c3 Add incomplete iob converter 2017-05-19 13:27:51 -05:00
Matthew Honnibal
09a877886b WIP on iob converter 2017-05-19 13:24:39 -05:00
Matthew Honnibal
ca70b08661 Fix GPU training and evaluation 2017-05-18 08:30:33 -05:00
Matthew Honnibal
fc8d3a112c Add util.env_opt support: Can set hyper params through environment variables. 2017-05-18 04:36:53 -05:00
Matthew Honnibal
55dab77de8 Add conversion rule for .conll 2017-05-17 13:13:48 +02:00
Matthew Honnibal
793430aa7a Get spaCy train command working with neural network
* Integrate models into pipeline
* Add basic serialization (maybe incorrect)
* Fix pickle on vocab
2017-05-17 12:04:50 +02:00
Matthew Honnibal
3bf4a28d8d Use tag in CoNLL converter, not POS 2017-05-17 12:04:33 +02:00
Matthew Honnibal
8cf097ca88 Redesign training to integrate NN components
* Obsolete .parser, .entity etc names in favour of .pipeline
* Components no longer create models on initialization
* Models created by loading method (from_disk(), from_bytes() etc), or
    .begin_training()
* Add .predict(), .set_annotations() methods in components
* Pass state through pipeline, to allow components to share information
    more flexibly.
2017-05-16 16:17:30 +02:00
Matthew Honnibal
5211645af3 Get data flowing through pipeline. Needs redesign 2017-05-16 11:21:59 +02:00
Matthew Honnibal
a9edb3aa1d Improve integration of NN parser, to support unified training API 2017-05-15 21:53:27 +02:00
ines
9d85cda8e4 Fix models error message and use about.__docs_models__ (see #1051) 2017-05-13 13:05:47 +02:00
ines
4eefb288e3 Port over PR #1055 2017-05-13 03:25:32 +02:00
ines
95edd9e896 Let parse_package_meta take full path 2017-05-08 15:30:48 +02:00
ines
59c3b9d4dd Tidy up CLI and fix print functions 2017-05-07 23:25:29 +02:00
ines
527d51ac9a Fetch shortcuts from GitHub and improve error handling 2017-04-26 18:00:28 +02:00
Matthew Honnibal
4f9657b42b Fix reporting if no dev data with train 2017-04-23 22:27:10 +02:00
ines
3a9710f356 Pass dev_scores to print_progress correctly (resolves #1008)
Only read scores attribute if command is used with dev_data, otherwise
default dev_scores to empty dict.
2017-04-23 15:58:40 +02:00
ines
25c70b4cc5 Move fix_text to spacy.compat (see #1002) 2017-04-20 15:47:17 +02:00
Gyorgy Orosz
4a06a2572c Using ftfy for handling broken encoded strings. 2017-04-20 13:34:51 +02:00
ines
48da244058 Use spacy.compat.json_dumps for Python 2/3 compatibility (resolves #991) 2017-04-19 11:50:36 +02:00
ines
82f5f1f98f Replace str with compat.unicode_ 2017-04-17 01:29:54 +02:00
Matthew Honnibal
17c9fffb9e Fix naked except 2017-04-16 15:28:16 -05:00
ines
6145b7c153 Remove redundant Path 2017-04-16 20:53:25 +02:00
Matthew Honnibal
89a4f262fc Fix training methods 2017-04-16 13:00:37 -05:00
ines
8191e33cf1 Update link error message with info on permissions 2017-04-16 13:32:31 +02:00
ines
a3ddbc0444 Add note about --force flag to error message 2017-04-16 13:14:36 +02:00
ines
e3de035814 Add meta validation to check for required settings
Complain if no "lang", "name" or "version" is found (those settings are
used in directory / package names). Package will still build without,
but it'll inevitably fail somewhere down the line.
2017-04-16 13:13:17 +02:00
ines
a7574b7572 Add more options to read in meta data in package command
Add meta option to supply path to meta.json. If no meta path is set,
check if meta.json exists in input directory and use it. Otherwise,
prompt for details on the command line.
2017-04-16 13:06:02 +02:00
ines
13c8a42d2b Fix typos 2017-04-16 13:03:58 +02:00
ines
35fb4febe2 Fix whitespace 2017-04-15 12:13:45 +02:00
ines
c05ec4b89a Add compat functions and remove old workarounds
Add ensure_path util function to handle checking instance of path
2017-04-15 12:11:16 +02:00
ines
d24589aa72 Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
ines
561f2a3eb4 Use consistent formatting for docstrings 2017-04-15 11:59:21 +02:00
ines
84341c2975 Only compile list of models if data_path exists 2017-04-14 16:48:02 +02:00
Gyorgy Orosz
dd3244c08a Made json dump to produce unicode strings in py2 2017-04-13 23:30:47 +02:00
Gyorgy Orosz
a9469c8173 Fixed typo 2017-04-13 15:24:14 +02:00
ines
41037f0f07 Remove unused imports 2017-04-13 13:52:11 +02:00
ines
1b92c8d5d5 Use unicode paths on Windows/Python 2 and catch other errors (resolves #970)
try/except here is quite dirty, but it'll at least make sure users see
an error message that explains what's going on
2017-04-10 17:49:51 +02:00
ines
7ea1673072 Fix whitespace 2017-04-07 13:28:48 +02:00
ines
255650dbc2 Add conllu2json converter from explosion/spacy-dev-resources/#11 2017-04-07 13:05:12 +02:00
ines
789ce8a45e Add convert command 2017-04-07 13:04:17 +02:00
ines
9952d3b08a Fix whitespace 2017-04-07 13:02:05 +02:00
ines
dcf8ab0c47 Merge branch 'develop' 2017-04-07 12:00:09 +02:00
Joshua Reeter
564daf6dec Issue #934 symlink should not convert paths as_posix under windows. 2017-03-30 23:47:45 -05:00
ines
4759fd437d Merge branch 'master' into develop 2017-03-29 10:37:13 +02:00
Grégory Howard
9c2996b27f correction of package.py (encoding on open instead of write) 2017-03-29 09:11:02 +02:00
ines
7198cf1c8a Remove unused import 2017-03-26 20:56:05 +02:00
ines
7ceaa1614b Add experimental model init command 2017-03-26 20:51:40 +02:00
Matthew Honnibal
2efdbc08ff Make training work with directories 2017-03-26 08:46:44 -05:00
Matthew Honnibal
9dcb58aaaf Merge CLI changes 2017-03-26 07:30:45 -05:00
Matthew Honnibal
6b7f7a2060 Connect parser L1 option to train CLI 2017-03-26 07:24:07 -05:00
Matthew Honnibal
dec5571bf3 Update train CLI 2017-03-26 07:16:52 -05:00
ines
53cf2f1c0e Make dev data optional 2017-03-26 11:48:17 +02:00
Matthew Honnibal
5eac089fbe Merge branch 'master' into develop 2017-03-26 04:45:43 -05:00
ines
97814f8da6 Update Windows Python 2 link workaround to use helper functions 2017-03-25 14:04:27 +01:00
Greg Baker
b7f714b498 Possible solution to #909 2017-03-25 21:36:38 +11:00
Matthew Honnibal
9c9cd99144 Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-23 11:11:24 +01:00
ines
0035fd9efe Add spacy train work in progress 2017-03-23 11:08:41 +01:00
ines
c3a9f73896 Fix writing to file 2017-03-21 12:35:22 +01:00
ines
d74aa428ad Fix path 2017-03-21 12:26:00 +01:00
ines
83a999ea83 Change default license from MIT to CC 2017-03-21 12:24:43 +01:00
ines
ae46647560 Fix brackets 2017-03-21 12:21:42 +01:00
ines
3e134b5b2b Make sure paths in copytree and rmtree are strings 2017-03-21 12:15:33 +01:00
ines
cf0094187e Fetch MANIFEST.in from GitHub as well 2017-03-21 11:32:38 +01:00
ines
3f4e3fda1d Update command and fetch file templates from GitHub
While feature is still experimental, this allows files to be modified
without having to ship a new version of spaCy.
2017-03-21 11:17:36 +01:00
ines
5230ed5b98 Move directory check and overwriting/creating dirs to own function 2017-03-21 02:06:53 +01:00
ines
46bc3c36b0 Fix typo 2017-03-21 02:06:37 +01:00
ines
64e38f304e Only import shutil 2017-03-21 02:06:29 +01:00
ines
448a916d0d Add --force option to override directory 2017-03-21 02:05:34 +01:00
ines
bf240132d7 Add cli.package command to build model packages 2017-03-20 22:50:13 +01:00
Matthew Honnibal
692eb0603d Fix high memory usage in download command
Due to pip issue #2984, installing large packages via pip causes
a large spike in memory usage. The recommended fix is to disable
caching.
2017-03-20 18:24:44 +01:00
ines
b8f8d5d8bf Make sure model_path is a Posix path
Otherwise, formatting the success message with model_path.as_posix()
fails when using a local path for linking (linking still works, but the
error message is confusing)
2017-03-19 11:57:13 +01:00
ines
8de5108af6 Exclude common cache directories from model list in cli.info
This means models called "cache" etc. won't show up in the list, but it
seems worth it.
2017-03-19 01:44:43 +01:00
Matthew Honnibal
797f286c38 Use import to find data package 2017-03-19 01:39:36 +01:00
Matthew Honnibal
bc10d06bc2 Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-18 19:32:54 +01:00
Matthew Honnibal
1754e0db9b Call pip via subprocess, to make it use virtualenv 2017-03-18 19:29:36 +01:00
ines
1277abcde2 Remove print statement 2017-03-18 19:14:58 +01:00
Matthew Honnibal
dcec104643 Remove unused import 2017-03-18 18:57:45 +01:00
Matthew Honnibal
703eb7bdbd Fix link module 2017-03-18 18:57:31 +01:00
ines
7d33104180 Use distutils.sysconfig.get_python_lib
site.getsitepackages seems to not work as expected in Python 2
2017-03-18 18:20:40 +01:00
ines
0dd7710556 Make sure paths are paths 2017-03-18 16:48:52 +01:00
ines
ec3e810662 Add directory cli and set up command line interface 2017-03-18 15:14:48 +01:00
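
Note on e3de035814 (meta validation): the check described in that commit message, complaining when "lang", "name" or "version" is missing from the meta data, could look roughly like the sketch below. This is an illustration only, not the code from the commit: the helper name find_missing_meta and the constant REQUIRED_META_KEYS are hypothetical, and the sketch assumes the meta data has already been loaded into a plain dict.

```python
# Illustrative sketch only; names here are hypothetical, not spaCy's API.
# The three keys come from the commit message of e3de035814.
REQUIRED_META_KEYS = ("lang", "name", "version")


def find_missing_meta(meta):
    """Return the required settings that are absent or empty in `meta`."""
    return [key for key in REQUIRED_META_KEYS if not meta.get(key)]


# Example: a meta dict without a "version" entry.
meta = {"lang": "en", "name": "example_model"}
missing = find_missing_meta(meta)
if missing:
    # Warn rather than abort: per the commit message, the package
    # still builds without these settings but fails later on.
    print("Missing required meta settings: " + ", ".join(missing))
```

Returning the list of missing keys, rather than raising on the first one, lets the caller report every problem in a single message.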