Commit Graph

1344 Commits

Author SHA1 Message Date
Matthew Honnibal
995b2d18fd * Route token.string via token.txt_with_ws, to deprecate token.string in future 2016-01-16 17:14:34 +01:00
Matthew Honnibal
54a98eaf19 * Fix typo text_wth_ws --> text_with_ws. Reroute .string attribute to text_with_ws, to deprecate .string in future 2016-01-16 17:13:50 +01:00
Matthew Honnibal
3e9961d2c4 * If final token is whitespace, don't mark it as owning a trailing space. Fixes Issue #154 2016-01-16 17:08:59 +01:00
Matthew Honnibal
223d2b3484 * Add test for Issue #154: Additional whitespace introduced when string ends with a whitespace token. 2016-01-16 17:08:07 +01:00
Matthew Honnibal
3dc398b727 * Fix merge conflict in requirements.txt 2016-01-16 16:20:49 +01:00
Matthew Honnibal
fc5962a77d * Improve test for root token in Span 2016-01-16 16:19:09 +01:00
Matthew Honnibal
c025a0c64b * Check for KeyboardInerrupt in parser.__call__ 2016-01-16 16:18:44 +01:00
Matthew Honnibal
03e8a4293d * Add loop guard to Token.lefts and Token.rights properties 2016-01-16 16:18:17 +01:00
Matthew Honnibal
304339985e * Add a linear scan to Span.root method, to help with long sentences 2016-01-16 16:17:28 +01:00
Matthew Honnibal
aa0dd79f52 * Delete test_token_references, which checked a flakey strategy for preventing orphan tokens from a while ago. Now orphan tokens simply hold a reference to Pool, preventing the memory from being freed underneath them. This means that we don't need to run this slow test. 2016-01-16 16:03:35 +01:00
Matthew Honnibal
8cbcc3a799 * Fix calculation of root token in Span. Now take root to be word with shortest tree path. Avoids parse trees ending up in inconsistent state, as had occurred in Issue #214. 2016-01-16 15:38:50 +01:00
Matthew Honnibal
c1039fa4b4 * Add test for Issue #214. Resolved in change to Span.root 2016-01-16 15:37:47 +01:00
Henning Peters
41ea14a56f fix pickling 2016-01-16 13:23:11 +01:00
Henning Peters
5551052840 fix py2/3 issue 2016-01-16 12:44:53 +01:00
Henning Peters
235f094534 untangle data_path/via 2016-01-16 12:23:45 +01:00
Matthew Honnibal
42a9f29b40 * Add loop guard in Span.root, to raise errors if there is a cycle in the dependency parse, instead of entering an infinite loop. Re Issue #214 2016-01-16 11:53:37 +01:00
Henning Peters
6d1a3af343 cleanup unused 2016-01-16 10:05:04 +01:00
Henning Peters
846fa49b2a distinct load() and from_package() methods 2016-01-16 10:00:57 +01:00
Henning Peters
211913d689 add about.py, adapt setup.py 2016-01-15 18:57:01 +01:00
Henning Peters
f8a8f97d25 cleanup 2016-01-15 18:13:37 +01:00
Henning Peters
780cb847c9 add default_model to about 2016-01-15 18:07:15 +01:00
Henning Peters
788f734513 refactored data_dir->via, add zip_safe, add spacy.load() 2016-01-15 18:01:02 +01:00
Matthew Honnibal
478a79a3d5 * Add test for Issue #220: Whitespace being tagged as noun 2016-01-15 16:17:07 +01:00
Henning Peters
d9471f684f fix typo 2016-01-14 12:14:12 +01:00
Henning Peters
9b75d872b0 fix model download 2016-01-14 12:02:56 +01:00
Henning Peters
bc229790ac integrate with sputnik 2016-01-13 19:46:17 +01:00
Matthew Honnibal
3fbfba575a * xfail the contractions test 2015-12-31 13:16:28 +01:00
Matthew Honnibal
3bd910ccad * Merge therell test 2015-12-31 11:55:18 +01:00
Matthew Honnibal
eaf2ad59f1 * Fix use of mock Package object 2015-12-31 04:13:15 +01:00
Matthew Honnibal
029136a007 * Fix resource loading for Matcher 2015-12-31 02:45:12 +01:00
Matthew Honnibal
55bcdf8bdd * Fix errors 2015-12-29 22:32:03 +01:00
Matthew Honnibal
a6ba43ecaf * Fix errors in packaging revision 2015-12-29 18:37:26 +01:00
Matthew Honnibal
4b4eec8b47 * Fix Issue #201: Tokenization of there'll 2015-12-29 18:09:09 +01:00
Matthew Honnibal
86ee9d046d * Remove test that belongs to a change for master 2015-12-29 18:07:23 +01:00
Matthew Honnibal
a2dfdec85d * Clean up spacy.util 2015-12-29 18:06:09 +01:00
Matthew Honnibal
aec130af56 Use util.Package class for io
Previous Sputnik integration caused API change: Vocab, Tagger, etc
were loaded via a from_package classmethod, that required a
sputnik.Package instance. This forced users to first create a
sputnik.Sputnik() instance, in order to acquire a Package via
sp.pool().

Instead I've created a small file-system shim, util.Package, which
allows classes to have a .load() classmethod, that accepts either
util.Package objects, or strings. We can later gut the internals
of this and make it a proxy for Sputnik if we need more functionality
that should live in the Sputnik library.

Sputnik is now only used to download and install the data, in
spacy.en.download
2015-12-29 18:00:48 +01:00
Matthew Honnibal
0e2498da00 * Replace from_package with load() classmethod in Vocab 2015-12-29 16:56:51 +01:00
Matthew Honnibal
c5902f2b4b * Upd Lemmatizer to use MockPackage. Replace from_package with load() classmethod 2015-12-29 16:56:02 +01:00
Matthew Honnibal
4131e45543 * Add MockPackage class, to see whether we can proxy for Sputnik in a lightweight way 2015-12-29 16:55:03 +01:00
Matthew Honnibal
f5dea1406d * Fix silly mistake in Language.__init__ 2015-12-28 18:48:57 +01:00
Matthew Honnibal
187960606f * Fix pickle problems 2015-12-28 16:54:03 +01:00
Matthew Honnibal
8c7e149ec9 * Replace kwargs argument of Language.__init__ with explicit arguments, to fix pickle bug 2015-12-28 15:56:27 +01:00
Henning Peters
32d655b6e1 bump version 2015-12-28 09:34:39 +01:00
Matthew Honnibal
8b61d45ed0 * Fix merge conflicts for headers branch 2015-12-27 17:46:25 +01:00
Matthew Honnibal
6bb9c7f311 Merge pull request #202 from henningpeters/sputnik
access model via sputnik
2015-12-28 03:29:53 +11:00
Henning Peters
0e321a7105 get mingw32 to work 2015-12-22 23:25:38 +01:00
Henning Peters
d8d348bb55 allow to specify version constraint within model name 2015-12-18 19:12:08 +01:00
Henning Peters
7f7299cafb Merge branch 'tmpdir' into headers 2015-12-18 12:25:25 +01:00
Henning Peters
cfa187aaf0 fix tests 2015-12-18 10:58:02 +01:00
Henning Peters
8359bd4d93 strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible 2015-12-18 09:52:55 +01:00