Commit Graph

2817 Commits

Author SHA1 Message Date
ines
255650dbc2 Add connlu2json converter from explosion/spacy-dev-resources/#11 2017-04-07 13:05:12 +02:00
ines
789ce8a45e Add convert command 2017-04-07 13:04:17 +02:00
ines
9952d3b08a Fix whitespace 2017-04-07 13:02:05 +02:00
ines
47ddce6eb7 Remove unused variable 2017-04-07 13:01:48 +02:00
ines
dcf8ab0c47 Merge branch 'develop' 2017-04-07 12:00:09 +02:00
ines
75f9b4c6e2 Fix whitespace 2017-04-07 10:22:18 +02:00
oeg
c693d40791 feature(model): Add support for creating the Spanish model, including rich tagset, configuration, and basic tests 2017-04-06 18:48:45 +02:00
oeg
010293fb2f fix(typo): Fixes typo in method calling PseudoProjectivity.deprojectivize, failing with new train cli 2017-04-06 17:33:15 +02:00
ines
808cd6cf7f Add missing tags to verbs (resolves #948) 2017-04-03 18:12:52 +02:00
ines
ad8bf1829f Import and combine Portuguese tokenizer exceptions (see #943) 2017-04-01 10:37:42 +02:00
Ines Montani
f8b2d9c3b7 Merge pull request #943 from mamoit/master
Portuguese improvements
2017-04-01 10:32:00 +02:00
ines
3b667a24d4 Remove whitespace 2017-04-01 10:21:08 +02:00
ines
e71a1f4bd0 Fix download commands in error messages (see #946) 2017-04-01 10:20:57 +02:00
ines
42382d5692 Fix download commands in error messages (see #946) 2017-04-01 10:19:32 +02:00
ines
d4a59c254b Remove whitespace 2017-04-01 10:19:01 +02:00
Matthew Honnibal
51882ee2b8 Fix check for setting ent_id in merge 2017-03-31 19:32:01 +02:00
Miguel Almeida
4fde64c4ea Portuguese contractions and some abbreviations 2017-03-31 15:52:55 +01:00
Miguel Almeida
465b240bcb Review Portuguese stop words
Mainly to review typos and add missing masculines/feminines
2017-03-31 13:00:47 +01:00
Matthew Honnibal
fc3900e5b2 Allow ent_id to be set in Token 2017-03-31 14:00:14 +02:00
Matthew Honnibal
9720103428 Improve attribute handling in doc.merge(). Still unsatisfying 2017-03-31 13:59:58 +02:00
Matthew Honnibal
cfff4e0f61 Improve test 2017-03-31 13:59:32 +02:00
Matthew Honnibal
1bb7b4ca71 Add comment 2017-03-31 13:59:19 +02:00
Matthew Honnibal
725249c59a Add merge_phrase callback in matcher.pyx 2017-03-31 13:58:59 +02:00
Matthew Honnibal
e854f28304 Add test for Issue #758
Issue #758 occurs when no actions are available for a single token
doc after merging.
2017-03-31 13:26:25 +02:00
Miguel Almeida
c1d020b0a6 Remove "ista" from portuguese stop words 2017-03-31 12:26:13 +01:00
Miguel Almeida
17a1e7a119 Add Portuguese numbers and ordinals 2017-03-31 12:21:01 +01:00
Matthew Honnibal
47a3ef06a6 Unhack deprojectivization, moving it into pipeline
Previously the deprojectivize() call was attached to the transition
system, and only called for German. Instead it should be a separate
process, called after the parser. This makes it available for any
language. Closes #898.
2017-03-31 12:31:50 +02:00
Joshua Reeter
564daf6dec Issue #934 symlink should not convert paths as_posix under windows. 2017-03-30 23:47:45 -05:00
Bruno P. Kinoshita
c2d48974bc Fix typos in Portuguese stop words 2017-03-30 21:59:18 +13:00
Matthew Honnibal
0fefdfcbda Merge pull request #935 from ericzhao28/master
Add option to use label=ent_type in doc.merge arguments (Bug fix for issue #862)
2017-03-30 02:51:24 +02:00
ines
4759fd437d Merge branch 'master' into develop 2017-03-29 10:37:13 +02:00
ines
7e4befec88 Add Hebrew to init and setup.py 2017-03-29 10:34:57 +02:00
Grégory Howard
9c2996b27f correction of package.py (encoding on open instead of write) 2017-03-29 09:11:02 +02:00
Eric Zhao
aafdf6ffb8 Add option to use label karg to determine ent_type in doc.merge 2017-03-28 23:35:03 -07:00
ines
7198cf1c8a Remove unused import 2017-03-26 20:56:05 +02:00
ines
7ceaa1614b Add experimental model init command 2017-03-26 20:51:40 +02:00
Matthew Honnibal
83ba6c247c Fix init of Language without model 2017-03-26 16:46:00 +02:00
Matthew Honnibal
fa107f95f6 Remove unused train_config command 2017-03-26 09:28:59 -05:00
Matthew Honnibal
df83921f0a Increment version 2017-03-26 09:27:32 -05:00
Matthew Honnibal
92ac3af21d Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-26 09:26:59 -05:00
Matthew Honnibal
a9b1f23c7d Enable regression loss for parser 2017-03-26 09:26:30 -05:00
ines
c00d997924 Merge branch 'develop' 2017-03-26 15:57:00 +02:00
Matthew Honnibal
2efdbc08ff Make training work with directories 2017-03-26 08:46:44 -05:00
ines
007a2492bd Remove train_config command for now 2017-03-26 15:40:50 +02:00
ines
b297fab062 Update error message for missing commands 2017-03-26 15:40:02 +02:00
ines
7f95023fc0 Fix formatting 2017-03-26 15:37:37 +02:00
ines
5901c8f7f0 Update spacy train CLI documentation 2017-03-26 15:33:48 +02:00
Matthew Honnibal
9dcb58aaaf Merge CLI changes 2017-03-26 07:30:45 -05:00
Matthew Honnibal
6b7f7a2060 Connect parser L1 option to train CLI 2017-03-26 07:24:07 -05:00
Matthew Honnibal
ed2b106f4d Fix circular import in lemmatizer 2017-03-26 07:17:07 -05:00
Matthew Honnibal
dec5571bf3 Update train CLI 2017-03-26 07:16:52 -05:00
ines
53cf2f1c0e Make dev data optional 2017-03-26 11:48:17 +02:00
Matthew Honnibal
5eac089fbe Merge branch 'master' into develop 2017-03-26 04:45:43 -05:00
ines
0fc56e2544 Update flag and defaults 2017-03-26 11:42:11 +02:00
Matthew Honnibal
2f63806ddb Update config when adding label. Re #910 2017-03-25 22:35:44 +01:00
Matthew Honnibal
b94286de30 Fix regression test 2017-03-25 22:35:07 +01:00
Matthew Honnibal
c748907a66 Fix errors in previous commit 2017-03-25 22:25:01 +01:00
Matthew Honnibal
4f400fa486 Prevent lemmatization of base nouns
Update lemmatizer's base-form check, for change in morphology class.
Closes #903.
2017-03-25 21:51:12 +01:00
Matthew Honnibal
850d35dcb3 Make morphology use int attributes internally
The morphology class was calling the lemmatizer inconsistently,
with some string-valued attributes. This caused Issue #903.
2017-03-25 21:49:10 +01:00
Matthew Honnibal
4454c1b23f Block lemmatization of base-form adjectives
Fixes check that an adjective is a base form (as opposed to a
comparative or superlative), so that it's not lemmatized.
e.g. inner -!> inn. Closes #912.
2017-03-25 21:29:57 +01:00
ines
97814f8da6 Update Windows Python 2 link workaround to use helper functions 2017-03-25 14:04:27 +01:00
ines
fdec758113 Add is_windows and is_python2 utility functions 2017-03-25 14:04:02 +01:00
Ines Montani
09837158e4 Merge pull request #921 from solresol/master
Possible solution to #909
2017-03-25 13:51:55 +01:00
Greg Baker
b7f714b498 Possible solution to #909 2017-03-25 21:36:38 +11:00
Ines Montani
97cb4d5e3c Merge branch 'master' into master 2017-03-25 10:03:47 +01:00
Iddo Berger
da135bd823 add hebrew tokenizer 2017-03-24 18:27:44 +03:00
Matthew Honnibal
f40fbc3710 Add test for Issue #910: Resuming entity training 2017-03-23 23:38:57 +01:00
Matthew Honnibal
9c9cd99144 Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-23 11:11:24 +01:00
ines
0035fd9efe Add spacy train work in progress 2017-03-23 11:08:41 +01:00
ines
d5ebf583a4 Fix formatting 2017-03-23 11:08:30 +01:00
ines
3f20efe165 Merge branch 'develop'
# Conflicts:
#	spacy/util.py
2017-03-22 17:14:15 +01:00
Ines Montani
f86a3a92d5 Merge pull request #899 from raphael0202/duplicate_keys
Remove duplicate keys in [en|fi] language data dicts
2017-03-22 10:20:11 +01:00
Ines Montani
87a2c85e1b Merge pull request #900 from raphael0202/unused_imports
Remove unused import statements
2017-03-22 10:10:43 +01:00
ines
ce065e5d65 Fix imports 2017-03-22 10:02:14 +01:00
Andrew Poliakov
07199c3e8b Fix infinite recursion in spacy.info 2017-03-22 11:43:22 +03:00
Raphaël Bournhonesque
f332bf05be Remove unused import statements 2017-03-21 21:08:54 +01:00
ines
c3a9f73896 Fix writing to file 2017-03-21 12:35:22 +01:00
ines
d74aa428ad Fix path 2017-03-21 12:26:00 +01:00
ines
83a999ea83 Change default license from MIT to CC 2017-03-21 12:24:43 +01:00
ines
ae46647560 Fix brackets 2017-03-21 12:21:42 +01:00
ines
3e134b5b2b Make sure paths in copytree and rmtree are strings 2017-03-21 12:15:33 +01:00
ines
cf0094187e Fetch MANIFEST.in from GitHub as well 2017-03-21 11:32:38 +01:00
ines
09b24bc5a9 Add docs for package command 2017-03-21 11:19:21 +01:00
ines
3f4e3fda1d Update command and fetch file templates from GitHub
While feature is still experimental, this allows files to be modified
without having to ship a new version of spaCy.
2017-03-21 11:17:36 +01:00
ines
5230ed5b98 Move directory check and overwriting/creating dirs to own function 2017-03-21 02:06:53 +01:00
ines
46bc3c36b0 Fix typo 2017-03-21 02:06:37 +01:00
ines
64e38f304e Only import shutil 2017-03-21 02:06:29 +01:00
ines
448a916d0d Add --force option to override directory 2017-03-21 02:05:34 +01:00
ines
8eb9a2b355 Fix formatting 2017-03-21 02:05:14 +01:00
ines
b2bcdec0f6 Update docstring 2017-03-20 22:50:55 +01:00
ines
bf240132d7 Add cli.package command to build model packages 2017-03-20 22:50:13 +01:00
ines
a54e3c2efe Remove empty line 2017-03-20 22:49:36 +01:00
ines
5aea327a5b Add util function to get raw user input 2017-03-20 22:48:56 +01:00
ines
a6c0361803 Handle raw_input vs input in Python 2 and 3 2017-03-20 22:48:32 +01:00
ines
adbcac6591 Fix spacing 2017-03-20 22:48:21 +01:00
Matthew Honnibal
692eb0603d Fix high memory usage in download command
Due to PyPi issue #2984, installing large packages via pip causes
a large spike in memory usage. The recommended fix is to disable
caching.
2017-03-20 18:24:44 +01:00
ines
f830213c4c Remove compatibility check test
Will only cause problems when incrementing version and not updating
table. Also depends on external URL, which is bad.
2017-03-20 13:20:26 +01:00
Matthew Honnibal
f314d3d044 Increment version 2017-03-20 12:58:24 +01:00
Matthew Honnibal
b487b8735a Decrease beam density, and fix Python 3 problem in beam 2017-03-20 12:56:05 +01:00
Ines Montani
b6ee241e26 Fix print statements 2017-03-20 11:46:37 +01:00
ines
b8f8d5d8bf Make sure model_path is a Posix path
Otherwise, formatting the success message with model_path.as_posix()
fails when using a local path for linking (linking still works, but the
error message is confusing)
2017-03-19 11:57:13 +01:00
ines
fe0ff00fe1 Fix spacing 2017-03-19 11:55:37 +01:00
ines
5712da6095 Add regression test for #891 2017-03-19 11:48:01 +01:00
Raphaël Bournhonesque
7f579ae834 Remove duplicate keys in [en|fi] data dicts 2017-03-19 11:40:29 +01:00
ines
8de5108af6 Exclude common cache directories from model list in cli.info
This means models called "cache" etc. won't show up in the list, but it
seems worth it.
2017-03-19 01:44:43 +01:00
Matthew Honnibal
6ee2ea1128 Increment version 2017-03-19 01:40:52 +01:00
Matthew Honnibal
797f286c38 Use import to find data package 2017-03-19 01:39:36 +01:00
Matthew Honnibal
5941fb9e92 Make spacy/data a package 2017-03-18 20:04:22 +01:00
Matthew Honnibal
bc10d06bc2 Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-18 19:32:54 +01:00
Matthew Honnibal
583628c350 Import metadata into __init__ 2017-03-18 19:30:03 +01:00
Matthew Honnibal
1754e0db9b Call pip via subprocess, to make it use virtualenv 2017-03-18 19:29:36 +01:00
ines
1277abcde2 Remove print statement 2017-03-18 19:14:58 +01:00
Matthew Honnibal
dcec104643 Remove unused import 2017-03-18 18:57:45 +01:00
Matthew Honnibal
703eb7bdbd Fix link module 2017-03-18 18:57:31 +01:00
Matthew Honnibal
f6c6c89546 Add empty data directory 2017-03-18 18:32:29 +01:00
ines
7d33104180 Use distutils.sysconfig.get_python_lib
site.getsitepackages seems to not work as expected in Python 2
2017-03-18 18:20:40 +01:00
Matthew Honnibal
1a53fcc685 Fix CLI for Python 2 2017-03-18 18:14:03 +01:00
ines
aefb898e37 Add title-case version of morph rules (resolves #686) 2017-03-18 17:27:11 +01:00
ines
64ec17abc1 Pass xpassing tests and add xfails for failures 2017-03-18 17:20:46 +01:00
ines
d0b85faf69 Pass regression test for #401 (resolves #401)
Fixed in new English models.
2017-03-18 17:06:49 +01:00
ines
be9daefbdd Remove actual model downloading from tests 2017-03-18 17:01:10 +01:00
ines
850650221a Use correct command in deprecated download command message 2017-03-18 17:01:01 +01:00
ines
0dd7710556 Make sure paths are paths 2017-03-18 16:48:52 +01:00
Matthew Honnibal
de0e6385b4 Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-18 16:17:28 +01:00
Matthew Honnibal
fe442cac53 Fix #717: Set correct lemma for contracted verbs 2017-03-18 16:16:10 +01:00
ines
ad934a9abd Add regression test for #693 2017-03-18 16:12:30 +01:00
ines
f57c616830 Add regression test for #704 and test new model (resolves #704)
(using new English model)
2017-03-18 16:04:14 +01:00
Matthew Honnibal
413138de79 Fix #719: Lemmatizer can no longer output empty string 2017-03-18 16:02:06 +01:00
ines
ab1451f997 Don't mark compatibility test as slow 2017-03-18 15:17:39 +01:00
ines
ec3e810662 Add directory cli and set up command line interface 2017-03-18 15:14:48 +01:00
ines
cd94ea1095 Use info module for spacy.info() 2017-03-18 13:01:26 +01:00
ines
e3e25c0a33 Add spacy.info module
Print info about spaCy installation, local setup and models. Allow
export in Markdown format to copy-paste into GitHub issues.
2017-03-18 13:01:16 +01:00
ines
0eafc0f2c6 Add util functions to print data as table or markdown list 2017-03-18 13:00:14 +01:00
ines
6b9b444065 Fix imports 2017-03-18 12:59:41 +01:00
ines
a035ebd32a Use pathlib.Path instead of os.path 2017-03-18 12:59:21 +01:00
ines
9605cf39cc Handle default path in Language classes 2017-03-18 12:58:45 +01:00
Matthew Honnibal
ac4b88cce9 Fix auto-linking in download command 2017-03-17 21:36:13 +01:00
ines
8a34c3e666 Fix shortcut name 2017-03-17 20:07:34 +01:00
Matthew Honnibal
6420f86f02 Merge changes to __init__.py 2017-03-17 19:51:45 +01:00
ines
e01fbacf81 Update resolve_model_name 2017-03-17 19:26:28 +01:00
ines
aedefef49d Add function to resolve model names and link them 2017-03-17 18:47:05 +01:00
Matthew Honnibal
d013aba7b5 Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-17 18:30:53 +01:00
Matthew Honnibal
854cfce7cf Make vocabs more compatible across versions
Previously, symbols were inserted into the string-store
before strings were loaded. This meant that adding a symbol
would invalidate saved models. We now make sure that strings
are loaded faithfully, so that compatibility is maintained.
2017-03-17 18:29:04 +01:00
Matthew Honnibal
1cc841e600 Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-17 08:18:11 -05:00
Matthew Honnibal
4bfc55b532 Auto-add words to vocab when loading vectors
When calling vocab.load_vectors_from_bin_loc, ensure that missing
entries are added to the vocab. Otherwise, loading vectors into an
empty vocab object resulted in no vectors being added.
2017-03-17 08:15:59 -05:00
ines
0e533ad0cc Mark compatibility table test as slow (temporary)
Prevent Travis from running the test until models repo is published
2017-03-17 13:11:36 +01:00
ines
279b1d1965 Update version 2017-03-17 12:43:08 +01:00
ines
8af4b9e4df Fix compatibility.json link 2017-03-17 12:43:03 +01:00
Matthew Honnibal
a630726b13 Fix typo in tests 2017-03-16 20:50:36 -05:00
Matthew Honnibal
f98b30583f Fix tests 2017-03-16 19:48:00 -05:00
Matthew Honnibal
db51abf685 Fix tests 2017-03-16 18:53:47 -05:00
Matthew Honnibal
adb0b7e43b Fix loading when no package found 2017-03-16 18:30:23 -05:00
Matthew Honnibal
5c66cffafd Add tag map for Spanish 2017-03-16 18:05:15 -05:00
Matthew Honnibal
c4351e1165 Update base-form check in lemmatizer, for UD 2.0 morphology 2017-03-16 17:59:31 -05:00
Matthew Honnibal
1e10383e1b Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-16 17:41:13 -05:00
Matthew Honnibal
859315863a Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-16 17:40:07 -05:00
Matthew Honnibal
fea9fe08af Merge pull request #866 from juanmirocks/master
Fix lemmatization of OOV words
2017-03-16 23:37:36 +01:00
Matthew Honnibal
ffd4a19383 Increment version 2017-03-16 17:35:57 -05:00
Matthew Honnibal
28bb546939 Merge pull request #883 from ericzhao28/master
Add `lower_` and `upper_` properties to `Span` class
2017-03-16 23:35:47 +01:00
ines
fd60961825 Fix spacing 2017-03-16 23:23:26 +01:00
Matthew Honnibal
890747d8ff Fix trailing whitespace on morphology features 2017-03-16 17:07:37 -05:00
Matthew Honnibal
af41a9790c Merge remote-tracking branch 'origin/develop-downloads' 2017-03-16 20:41:37 +01:00
Matthew Honnibal
303a56f173 Get absolute path for linking 2017-03-16 20:41:23 +01:00
ines
3d484c3faf Don't print in parse_package_meta and accept on_erro callback instead
TODO: log warning for missing meta data in spacy.link, as this affects
the Language class returned by spacy.load()
2017-03-16 20:34:50 +01:00
ines
d8c984b65e Don't exit if no model meta data is present 2017-03-16 20:33:33 +01:00
Matthew Honnibal
2524efc0ac Merge remote-tracking branch 'origin/develop-downloads' 2017-03-16 20:20:41 +01:00
ines
8253581057 Link model automatically if not direct download 2017-03-16 19:54:51 +01:00
Matthew Honnibal
8843b84bd1 Merge remote-tracking branch 'origin/develop-downloads' 2017-03-16 12:00:42 -05:00
Matthew Honnibal
55f813bfbb Don't reapply the model during training 2017-03-16 11:59:43 -05:00
Matthew Honnibal
c90dc7ac29 Clean up state initialisation in transition system 2017-03-16 11:59:11 -05:00
Matthew Honnibal
a46933a8fe Clean up FTRL parsing stuff. 2017-03-16 11:58:20 -05:00
ines
618ce3b425 Add .meta to Language object
Allows getting the current model's meta data, e.g.:
nlp = spacy.load('my-model')
print(nlp.meta)
2017-03-16 17:14:56 +01:00
ines
e348d4434c Add spacy.info(model_name) to show model meta
Allows "previewing" model before loading and making sure it's linked
correctly.
2017-03-16 17:13:40 +01:00
ines
eea3b35e3f Update model loading to support links
Remove match_best_version check, fetch model language from meta instead
of directory name, and don't make too many assumptions – if model is
downloaded via downloader, version should match anyway. (Otherwise,
users should be free to add and load whichever models they want.)
2017-03-16 17:13:08 +01:00
ines
5f3f04bd0a Add util function to load and parse package meta.json 2017-03-16 17:10:05 +01:00
ines
7f920c2f75 Don't break text in when rendering print_msg 2017-03-16 17:09:50 +01:00
ines
16a63d9676 Add docstring 2017-03-16 17:09:11 +01:00
ines
68c04fa897 Move sys_exit() function to util 2017-03-16 17:08:58 +01:00
ines
ccd1a79988 Add spacy.link module to link model directories to shortcuts 2017-03-16 17:01:51 +01:00
Matthew Honnibal
2611ac2a89 Fix scorer bug for NER, related to ambiguity between missing annotations and misaligned tokens 2017-03-16 09:38:28 -05:00
ines
595d89698a Add basestring 2017-03-16 10:01:14 +01:00
ines
7b2eca36e4 Revert "Fix formatting and remove unused code"
This reverts commit d7898d586f.
2017-03-16 09:58:41 +01:00
ines
2f0db1dd36 Use small English model as default 2017-03-16 09:54:40 +01:00
Matthew Honnibal
3d0833c3df Fix off-by-1 in parse features fill_context 2017-03-15 19:55:35 -05:00
Matthew Honnibal
4ef68c413f Approximate cost in Break transition, to speed things up a bit. 2017-03-15 16:40:27 -05:00
Matthew Honnibal
8543db8a5b Use ftrl optimizer in parser 2017-03-15 11:56:37 -05:00
ines
4cfc8ffbd2 Reformat pickle tests 2017-03-15 17:39:54 +01:00
ines
2a0fcf1354 Add tests for new download module 2017-03-15 17:39:43 +01:00
ines
71956c94db Handle deprecated language-specific model downloading 2017-03-15 17:37:55 +01:00
ines
58b884b6d4 Refactor download script and about.py to use new download method 2017-03-15 17:37:18 +01:00
ines
f5d1a39a5b Add util functions for printing and wrapping messages 2017-03-15 17:35:57 +01:00
ines
d7898d586f Fix formatting and remove unused code 2017-03-15 17:35:41 +01:00
ines
b672e95045 Fix formatting 2017-03-15 17:35:04 +01:00
ines
0474e706a0 Remove unused deprecated functions for sputnik 2017-03-15 17:34:54 +01:00
ines
b13e7f79b4 Fix formatting and remove unused imports 2017-03-15 17:33:57 +01:00
ines
1101fd3855 Fix formatting and remove unused imports 2017-03-15 17:33:39 +01:00
ines
842782c128 Move fix_deprecated_glove_vectors_loading to deprecated.py 2017-03-15 17:33:29 +01:00
Matthew Honnibal
4cab8ac136 Update morph exceptions test 2017-03-15 09:31:34 -05:00
Matthew Honnibal
d719f8e77e Use nogil in parser, and set L1 to 0.0 by default 2017-03-15 09:31:01 -05:00
Matthew Honnibal
c61c501406 Update beam-parser to allow parser to maintain nogil 2017-03-15 09:30:22 -05:00