ines
a7574b7572
Add more options to read in meta data in package command
...
Add meta option to supply path to meta.json. If no meta path is set,
check if meta.json exists in input directory and use it. Otherwise,
prompt for details on the command line.
2017-04-16 13:06:02 +02:00
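The lookup order described in a7574b7572 can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name `load_meta`, its parameters, and the two prompted fields are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_meta(input_dir, meta_path=None, prompt=input):
    """Return package meta data using the lookup order from the commit message.

    Hypothetical helper: the name, signature and prompted fields are assumptions.
    """
    # 1. An explicitly supplied path to meta.json takes priority.
    if meta_path is not None:
        return json.loads(Path(meta_path).read_text(encoding="utf8"))
    # 2. Otherwise, use meta.json from the input directory if it exists.
    default_path = Path(input_dir) / "meta.json"
    if default_path.is_file():
        return json.loads(default_path.read_text(encoding="utf8"))
    # 3. Otherwise, ask for the details on the command line.
    return {"name": prompt("Model name: "), "version": prompt("Model version: ")}


with tempfile.TemporaryDirectory() as tmp:
    # No meta.json yet, so the stubbed prompt supplies the values.
    prompted = load_meta(tmp, prompt=lambda question: "demo")
    # Once meta.json exists in the input directory, it is picked up.
    (Path(tmp) / "meta.json").write_text(
        '{"name": "model", "version": "1.0"}', encoding="utf8"
    )
    found = load_meta(tmp)
```

Passing the prompt function as a parameter keeps the interactive fallback testable without real user input.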
ines
13c8a42d2b
Fix typos
2017-04-16 13:03:58 +02:00
ines
31fa73293a
Move read_json out to own util function
2017-04-16 13:03:28 +02:00
Matthew Honnibal
45464d065e
Remove print statement
2017-04-15 16:11:43 +02:00
Matthew Honnibal
c76cb8af35
Fix training for new labels
2017-04-15 16:11:26 +02:00
Matthew Honnibal
4884b2c113
Refix StepwiseState
2017-04-15 16:00:28 +02:00
Matthew Honnibal
e6ee7e130f
Fix parse package meta
2017-04-15 13:38:53 +02:00
Matthew Honnibal
1a98e48b8e
Fix StepwiseState
2017-04-15 13:35:01 +02:00
ines
0739ae7b76
Tidy up and fix formatting and imports
2017-04-15 13:05:15 +02:00
ines
fefe6684cd
Fix symlink function to check for Windows
2017-04-15 12:17:27 +02:00
ines
35fb4febe2
Fix whitespace
2017-04-15 12:13:45 +02:00
ines
e1efd589c3
Fix json imports and use ujson
2017-04-15 12:13:34 +02:00
ines
958b12dec8
Use pathlib instead of os.path
2017-04-15 12:13:00 +02:00
ines
956dc36785
Move functions to deprecated
2017-04-15 12:12:31 +02:00
ines
c05ec4b89a
Add compat functions and remove old workarounds
...
Add ensure_path util function to handle checking instance of path
2017-04-15 12:11:16 +02:00
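A helper like the `ensure_path` mentioned in c05ec4b89a might look something like the sketch below. It assumes Python 3 only; the real function presumably also handles Python 2 string types, and its exact behavior is not shown in this log.

```python
from pathlib import Path


def ensure_path(path):
    """Return a Path for string input; pass any other object through unchanged.

    Sketch under assumptions: Python 3 only, behavior guessed from the commit title.
    """
    if isinstance(path, str):
        return Path(path)
    return path


as_path = ensure_path("models/en")
unchanged = ensure_path(Path("models/en"))
```

Callers can then accept either strings or `Path` objects and use `pathlib` methods on the result without checking the type themselves.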
ines
26445ee304
Add compat module for Python2/3 and platform compatibility
2017-04-15 12:07:02 +02:00
ines
d24589aa72
Clean up imports, unused code, whitespace, docstrings
2017-04-15 12:05:47 +02:00
ines
561f2a3eb4
Use consistent formatting for docstrings
2017-04-15 11:59:21 +02:00
Matthew Honnibal
d13f0a7017
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-04-14 23:54:57 +02:00
Matthew Honnibal
354458484c
WIP on add_label bug during NER training
...
Currently when a new label is introduced to NER during training,
it causes the labels to be read in in an unexpected order. This
invalidates the model.
2017-04-14 23:52:17 +02:00
Matthew Honnibal
33ba5066eb
Refactor Language.end_training, making new save_to_directory method
2017-04-14 23:51:24 +02:00
ines
84341c2975
Only compile list of models if data_path exists
2017-04-14 16:48:02 +02:00
Gyorgy Orosz
dd3244c08a
Made json dump to produce unicode strings in py2
2017-04-13 23:30:47 +02:00
Gyorgy Orosz
a9469c8173
Fixed typo
2017-04-13 15:24:14 +02:00
ines
41037f0f07
Remove unused imports
2017-04-13 13:52:11 +02:00
ines
1b92c8d5d5
Use unicode paths on Windows/Python 2 and catch other errors (resolves #970)
...
try/except here is quite dirty, but it'll at least make sure users see
an error message that explains what's going on
2017-04-10 17:49:51 +02:00
Matthew Honnibal
49e2de900e
Add costs property to StepwiseState, to show which moves are gold.
2017-04-10 11:37:04 +02:00
Matthew Honnibal
e26577b202
Increment version
2017-04-07 18:45:06 +02:00
Matthew Honnibal
40bf7ecf27
Increment version
2017-04-07 18:44:20 +02:00
Matthew Honnibal
1dca7eeb03
Add unicode declaration on new regression test
2017-04-07 18:09:23 +02:00
ines
887827fc6a
Merge branch 'develop'
2017-04-07 17:36:23 +02:00
ines
444dd511c5
Fix xpassing URL test case
2017-04-07 17:36:05 +02:00
ines
bf0f15e762
Add / to tokenizer infixes (resolves #891)
2017-04-07 17:30:44 +02:00
ines
00b9011a49
Fix whitespace
2017-04-07 17:29:59 +02:00
ines
f9869e4dc5
Merge branch 'master' into develop
2017-04-07 17:23:40 +02:00
Matthew Honnibal
4a6204dbad
Merge remote-tracking branch 'origin/develop'
2017-04-07 17:20:09 +02:00
Matthew Honnibal
0513c43bf0
Merge branch 'master' of https://github.com/explosion/spaCy
2017-04-07 17:07:10 +02:00
Matthew Honnibal
cc36c308f4
Fix noun_chunk rules around coordination
...
Closes #693.
2017-04-07 17:06:40 +02:00
Matthew Honnibal
ab846256cf
Merge pull request #966 from recognai/master
...
Prepare Spanish language for training models, including configuration, rich-UD tag map and tests
2017-04-07 16:12:29 +02:00
Matthew Honnibal
83dca920d4
Rename test #913 -> #957, comment
...
Make test for #957 reference correct bug. Add comment.
Previous commit closes #957.
2017-04-07 15:54:25 +02:00
Matthew Honnibal
be204ed714
Merge branch 'master' of https://github.com/explosion/spaCy
2017-04-07 15:50:14 +02:00
Matthew Honnibal
e7b1ee9efd
Switch to regex module for URL identification
...
The URL detection regex was failing on input such as 0.1.2.3, as this
input triggered excessive back-tracking in the builtin re module.
The solution was to switch to the regex module, which behaves better.
Closes #913.
2017-04-07 15:47:36 +02:00
Matthew Honnibal
5887383fc0
Add test for Issue #913: Hang from bad regex
2017-04-07 15:47:27 +02:00
ines
7ea1673072
Fix whitespace
2017-04-07 13:28:48 +02:00
ines
255650dbc2
Add connlu2json converter from explosion/spacy-dev-resources/#11
2017-04-07 13:05:12 +02:00
ines
789ce8a45e
Add convert command
2017-04-07 13:04:17 +02:00
ines
9952d3b08a
Fix whitespace
2017-04-07 13:02:05 +02:00
ines
47ddce6eb7
Remove unused variable
2017-04-07 13:01:48 +02:00
ines
dcf8ab0c47
Merge branch 'develop'
2017-04-07 12:00:09 +02:00
ines
75f9b4c6e2
Fix whitespace
2017-04-07 10:22:18 +02:00
oeg
c693d40791
feature(model): Add support for creating the Spanish model, including rich tagset, configuration, and basic tests
2017-04-06 18:48:45 +02:00
oeg
010293fb2f
fix(typo): Fixes typo in method calling PseudoProjectivity.deprojectivize, failing with new train cli
2017-04-06 17:33:15 +02:00
ines
808cd6cf7f
Add missing tags to verbs (resolves #948)
2017-04-03 18:12:52 +02:00
ines
ad8bf1829f
Import and combine Portuguese tokenizer exceptions (see #943)
2017-04-01 10:37:42 +02:00
Ines Montani
f8b2d9c3b7
Merge pull request #943 from mamoit/master
...
Portuguese improvements
2017-04-01 10:32:00 +02:00
ines
3b667a24d4
Remove whitespace
2017-04-01 10:21:08 +02:00
ines
e71a1f4bd0
Fix download commands in error messages (see #946)
2017-04-01 10:20:57 +02:00
ines
42382d5692
Fix download commands in error messages (see #946)
2017-04-01 10:19:32 +02:00
ines
d4a59c254b
Remove whitespace
2017-04-01 10:19:01 +02:00
Matthew Honnibal
51882ee2b8
Fix check for setting ent_id in merge
2017-03-31 19:32:01 +02:00
Miguel Almeida
4fde64c4ea
Portuguese contractions and some abbreviations
2017-03-31 15:52:55 +01:00
Miguel Almeida
465b240bcb
Review Portuguese stop words
...
Mainly to review typos and add missing masculines/feminines
2017-03-31 13:00:47 +01:00
Matthew Honnibal
fc3900e5b2
Allow ent_id to be set in Token
2017-03-31 14:00:14 +02:00
Matthew Honnibal
9720103428
Improve attribute handling in doc.merge(). Still unsatisfying
2017-03-31 13:59:58 +02:00
Matthew Honnibal
cfff4e0f61
Improve test
2017-03-31 13:59:32 +02:00
Matthew Honnibal
1bb7b4ca71
Add comment
2017-03-31 13:59:19 +02:00
Matthew Honnibal
725249c59a
Add merge_phrase callback in matcher.pyx
2017-03-31 13:58:59 +02:00
Matthew Honnibal
e854f28304
Add test for Issue #758
...
Issue #758 occurs when no actions are available for a single token
doc after merging.
2017-03-31 13:26:25 +02:00
Miguel Almeida
c1d020b0a6
Remove "ista" from Portuguese stop words
2017-03-31 12:26:13 +01:00
Miguel Almeida
17a1e7a119
Add Portuguese numbers and ordinals
2017-03-31 12:21:01 +01:00
Matthew Honnibal
47a3ef06a6
Unhack deprojectivization, moving it into pipeline
...
Previously the deprojectivize() call was attached to the transition
system, and only called for German. Instead it should be a separate
process, called after the parser. This makes it available for any
language. Closes #898.
2017-03-31 12:31:50 +02:00
Joshua Reeter
564daf6dec
Issue #934: symlink should not convert paths with as_posix under Windows.
2017-03-30 23:47:45 -05:00
Bruno P. Kinoshita
c2d48974bc
Fix typos in Portuguese stop words
2017-03-30 21:59:18 +13:00
Matthew Honnibal
0fefdfcbda
Merge pull request #935 from ericzhao28/master
...
Add option to use label=ent_type in doc.merge arguments (Bug fix for issue #862)
2017-03-30 02:51:24 +02:00
ines
4759fd437d
Merge branch 'master' into develop
2017-03-29 10:37:13 +02:00
ines
7e4befec88
Add Hebrew to init and setup.py
2017-03-29 10:34:57 +02:00
Grégory Howard
9c2996b27f
correction of package.py (encoding on open instead of write)
2017-03-29 09:11:02 +02:00
Eric Zhao
aafdf6ffb8
Add option to use label kwarg to determine ent_type in doc.merge
2017-03-28 23:35:03 -07:00
ines
7198cf1c8a
Remove unused import
2017-03-26 20:56:05 +02:00
ines
7ceaa1614b
Add experimental model init command
2017-03-26 20:51:40 +02:00
Matthew Honnibal
83ba6c247c
Fix init of Language without model
2017-03-26 16:46:00 +02:00
Matthew Honnibal
fa107f95f6
Remove unused train_config command
2017-03-26 09:28:59 -05:00
Matthew Honnibal
df83921f0a
Increment version
2017-03-26 09:27:32 -05:00
Matthew Honnibal
92ac3af21d
Merge branch 'master' of https://github.com/explosion/spaCy
2017-03-26 09:26:59 -05:00
Matthew Honnibal
a9b1f23c7d
Enable regression loss for parser
2017-03-26 09:26:30 -05:00
ines
c00d997924
Merge branch 'develop'
2017-03-26 15:57:00 +02:00
Matthew Honnibal
2efdbc08ff
Make training work with directories
2017-03-26 08:46:44 -05:00
ines
007a2492bd
Remove train_config command for now
2017-03-26 15:40:50 +02:00
ines
b297fab062
Update error message for missing commands
2017-03-26 15:40:02 +02:00
ines
7f95023fc0
Fix formatting
2017-03-26 15:37:37 +02:00
ines
5901c8f7f0
Update spacy train CLI documentation
2017-03-26 15:33:48 +02:00
Matthew Honnibal
9dcb58aaaf
Merge CLI changes
2017-03-26 07:30:45 -05:00
Matthew Honnibal
6b7f7a2060
Connect parser L1 option to train CLI
2017-03-26 07:24:07 -05:00
Matthew Honnibal
ed2b106f4d
Fix circular import in lemmatizer
2017-03-26 07:17:07 -05:00
Matthew Honnibal
dec5571bf3
Update train CLI
2017-03-26 07:16:52 -05:00
ines
53cf2f1c0e
Make dev data optional
2017-03-26 11:48:17 +02:00
Matthew Honnibal
5eac089fbe
Merge branch 'master' into develop
2017-03-26 04:45:43 -05:00
ines
0fc56e2544
Update flag and defaults
2017-03-26 11:42:11 +02:00
Matthew Honnibal
2f63806ddb
Update config when adding label. Re #910
2017-03-25 22:35:44 +01:00
Matthew Honnibal
b94286de30
Fix regression test
2017-03-25 22:35:07 +01:00
Matthew Honnibal
c748907a66
Fix errors in previous commit
2017-03-25 22:25:01 +01:00
Matthew Honnibal
4f400fa486
Prevent lemmatization of base nouns
...
Update lemmatizer's base-form check, for change in morphology class.
Closes #903.
2017-03-25 21:51:12 +01:00
Matthew Honnibal
850d35dcb3
Make morphology use int attributes internally
...
The morphology class was calling the lemmatizer inconsistently,
with some string-valued attributes. This caused Issue #903.
2017-03-25 21:49:10 +01:00
Matthew Honnibal
4454c1b23f
Block lemmatization of base-form adjectives
...
Fixes check that an adjective is a base form (as opposed to a
comparative or superlative), so that it's not lemmatized.
e.g. inner -!> inn. Closes #912.
2017-03-25 21:29:57 +01:00
ines
97814f8da6
Update Windows Python 2 link workaround to use helper functions
2017-03-25 14:04:27 +01:00
ines
fdec758113
Add is_windows and is_python2 utility functions
2017-03-25 14:04:02 +01:00
Ines Montani
09837158e4
Merge pull request #921 from solresol/master
...
Possible solution to #909
2017-03-25 13:51:55 +01:00
Greg Baker
b7f714b498
Possible solution to #909
2017-03-25 21:36:38 +11:00
Ines Montani
97cb4d5e3c
Merge branch 'master' into master
2017-03-25 10:03:47 +01:00
Iddo Berger
da135bd823
add hebrew tokenizer
2017-03-24 18:27:44 +03:00
Matthew Honnibal
f40fbc3710
Add test for Issue #910: Resuming entity training
2017-03-23 23:38:57 +01:00
Matthew Honnibal
9c9cd99144
Merge branch 'master' of https://github.com/explosion/spaCy
2017-03-23 11:11:24 +01:00
ines
0035fd9efe
Add spacy train work in progress
2017-03-23 11:08:41 +01:00
ines
d5ebf583a4
Fix formatting
2017-03-23 11:08:30 +01:00
ines
3f20efe165
Merge branch 'develop'
...
# Conflicts:
# spacy/util.py
2017-03-22 17:14:15 +01:00
Ines Montani
f86a3a92d5
Merge pull request #899 from raphael0202/duplicate_keys
...
Remove duplicate keys in [en|fi] language data dicts
2017-03-22 10:20:11 +01:00
Ines Montani
87a2c85e1b
Merge pull request #900 from raphael0202/unused_imports
...
Remove unused import statements
2017-03-22 10:10:43 +01:00
ines
ce065e5d65
Fix imports
2017-03-22 10:02:14 +01:00
Andrew Poliakov
07199c3e8b
Fix infinite recursion in spacy.info
2017-03-22 11:43:22 +03:00
Raphaël Bournhonesque
f332bf05be
Remove unused import statements
2017-03-21 21:08:54 +01:00
ines
c3a9f73896
Fix writing to file
2017-03-21 12:35:22 +01:00
ines
d74aa428ad
Fix path
2017-03-21 12:26:00 +01:00
ines
83a999ea83
Change default license from MIT to CC
2017-03-21 12:24:43 +01:00
ines
ae46647560
Fix brackets
2017-03-21 12:21:42 +01:00
ines
3e134b5b2b
Make sure paths in copytree and rmtree are strings
2017-03-21 12:15:33 +01:00
ines
cf0094187e
Fetch MANIFEST.in from GitHub as well
2017-03-21 11:32:38 +01:00
ines
09b24bc5a9
Add docs for package command
2017-03-21 11:19:21 +01:00
ines
3f4e3fda1d
Update command and fetch file templates from GitHub
...
While feature is still experimental, this allows files to be modified
without having to ship a new version of spaCy.
2017-03-21 11:17:36 +01:00
ines
5230ed5b98
Move directory check and overwriting/creating dirs to own function
2017-03-21 02:06:53 +01:00
ines
46bc3c36b0
Fix typo
2017-03-21 02:06:37 +01:00
ines
64e38f304e
Only import shutil
2017-03-21 02:06:29 +01:00
ines
448a916d0d
Add --force option to override directory
2017-03-21 02:05:34 +01:00
ines
8eb9a2b355
Fix formatting
2017-03-21 02:05:14 +01:00
ines
b2bcdec0f6
Update docstring
2017-03-20 22:50:55 +01:00
ines
bf240132d7
Add cli.package command to build model packages
2017-03-20 22:50:13 +01:00
ines
a54e3c2efe
Remove empty line
2017-03-20 22:49:36 +01:00
ines
5aea327a5b
Add util function to get raw user input
2017-03-20 22:48:56 +01:00
ines
a6c0361803
Handle raw_input vs input in Python 2 and 3
2017-03-20 22:48:32 +01:00
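The usual pattern behind commits such as a6c0361803 and 5aea327a5b is sketched below. The names `input_` and `get_raw_input`, and the way the default is displayed, are assumptions for illustration and are not taken from the commits themselves.

```python
# Python 2 provides raw_input(); Python 3 renamed it to input().
try:
    input_ = raw_input  # noqa: F821 (only defined on Python 2)
except NameError:
    input_ = input


def get_raw_input(description, default=None, reader=None):
    """Ask the user for a value, falling back to `default` on empty input.

    Hypothetical helper: `reader` exists only so the prompt can be stubbed.
    """
    reader = reader or input_
    suffix = " (default: {})".format(default) if default else ""
    answer = reader("{}{}: ".format(description, suffix))
    return answer or default


# Stubbed readers stand in for a user typing at the prompt.
typed = get_raw_input("Model name", reader=lambda text: "my_model")
fallback = get_raw_input("License", default="CC", reader=lambda text: "")
```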
ines
adbcac6591
Fix spacing
2017-03-20 22:48:21 +01:00
Matthew Honnibal
692eb0603d
Fix high memory usage in download command
...
Due to PyPi issue #2984, installing large packages via pip causes
a large spike in memory usage. The recommended fix is to disable
caching.
2017-03-20 18:24:44 +01:00
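The fix described in 692eb0603d amounts to passing pip's `--no-cache-dir` option when installing. A minimal sketch of building such a command is shown below; the helper name `pip_install_command` is hypothetical, and how the download command actually invokes pip is not shown in this log.

```python
import sys


def pip_install_command(package, no_cache=True):
    """Build a pip install command, optionally with caching disabled.

    Hypothetical helper; only the --no-cache-dir flag itself is pip's own.
    """
    cmd = [sys.executable, "-m", "pip", "install"]
    if no_cache:
        # Disabling the cache avoids the memory spike on large packages.
        cmd.append("--no-cache-dir")
    cmd.append(package)
    return cmd


command = pip_install_command("some-model-package")
# The list can be passed to subprocess.check_call(command) to run it.
```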
ines
f830213c4c
Remove compatibility check test
...
Will only cause problems when incrementing version and not updating
table. Also depends on external URL, which is bad.
2017-03-20 13:20:26 +01:00
Matthew Honnibal
f314d3d044
Increment version
2017-03-20 12:58:24 +01:00
Matthew Honnibal
b487b8735a
Decrease beam density, and fix Python 3 problem in beam
2017-03-20 12:56:05 +01:00
Ines Montani
b6ee241e26
Fix print statements
2017-03-20 11:46:37 +01:00
ines
b8f8d5d8bf
Make sure model_path is a Posix path
...
Otherwise, formatting the success message with model_path.as_posix()
fails when using a local path for linking (linking still works, but the
error message is confusing)
2017-03-19 11:57:13 +01:00
ines
fe0ff00fe1
Fix spacing
2017-03-19 11:55:37 +01:00
ines
5712da6095
Add regression test for #891
2017-03-19 11:48:01 +01:00
Raphaël Bournhonesque
7f579ae834
Remove duplicate keys in [en|fi] data dicts
2017-03-19 11:40:29 +01:00
ines
8de5108af6
Exclude common cache directories from model list in cli.info
...
This means models called "cache" etc. won't show up in the list, but it
seems worth it.
2017-03-19 01:44:43 +01:00
Matthew Honnibal
6ee2ea1128
Increment version
2017-03-19 01:40:52 +01:00