388 Commits

Author SHA1 Message Date
Paul O'Leary McCann
7dd21b66d5 Extras require mecab (#3024)
* Add note that Unidic is required for Japanese

This addresses #3001. -POLM

* Add extras_require for mecab with old version

Related to issue #3018.

* mecab → ja

Co-Authored-By: polm <polm@dampfkraft.com>
2018-12-08 06:34:49 +01:00
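As an illustration of the extras mechanism this commit refers to, the mapping below sketches what an `extras_require` entry keyed on `ja` might look like. The package name and version pin are assumptions made for the example, not values taken from the commit, and `requirements_for` is a hypothetical helper.

```python
# Sketch of a setup.py-style extras mapping. The "ja" key follows the commit
# message ("mecab → ja"); the package name and pin are assumed placeholders.
EXTRAS_REQUIRE = {
    "ja": ["mecab-python3==0.7"],
}


def requirements_for(extra):
    """Return the extra requirements for a named extra (hypothetical helper)."""
    return list(EXTRAS_REQUIRE.get(extra, []))
```

With a mapping like this passed to `setup(extras_require=...)`, the optional dependency would be requested with `pip install spacy[ja]`.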
Justin DuJardin
33fca8672f fix issue compiling the latest spacy on MacOS 10.3.6 (#2998) 2018-12-02 05:51:11 +01:00
Matthew Honnibal
05b2336ffa Try again to fix OSX build 2018-12-01 03:12:21 +01:00
Matthew Honnibal
4895b2e830 Merge branch 'master' of https://github.com/explosion/spaCy 2018-12-01 02:37:21 +01:00
Matthew Honnibal
3f16af123e Try to fix OSX build error 2018-12-01 02:36:56 +01:00
Matthew Honnibal
61abb1ef70 Remove msgpack dependency, to try to fix #2995 2018-12-01 02:36:41 +01:00
Matthew Honnibal
9e2ff2f583 Fix regex pin to harmonize with conda (#2964) 2018-11-26 19:28:54 +01:00
Matthew Honnibal
e2ae25d6f5 Try setting older regex version, to align with conda 2018-10-29 13:39:00 +01:00
Matthew Honnibal
a2745d310e Revert "Update regex version"
This reverts commit 62358dd867.
2018-10-28 16:38:56 +01:00
Matthew Honnibal
62358dd867 Update regex version 2018-10-28 16:27:50 +01:00
Ines Montani
fd750ec3bf Fix msgpack-numpy version pin 2018-10-15 14:18:38 +02:00
Ines Montani
051a6b73eb Update Thinc version pin 2018-10-15 01:40:28 +02:00
Matthew Honnibal
7202abdfa9 Fix specifiers for GPU 2018-10-15 00:08:44 +02:00
Matthew Honnibal
b305b24c24 Require thinc 6.10.6 2018-10-14 23:28:41 +02:00
Matthew Honnibal
6e6f6be3f5 Update requirements and setup.py 2018-10-14 23:06:46 +02:00
Ines Montani
9ebe607f82 Add wheel to setup_requires 2018-10-14 16:38:48 +02:00
Ines Montani
2e675d9523 Update murmurhash pin 2018-10-14 16:37:38 +02:00
Matthew Honnibal
f784e42ffe Try older version of regex 2018-10-03 00:23:40 +02:00
Matthew Honnibal
e4fd2ccd07 Try previous version of regex 2018-10-02 23:37:17 +02:00
Matthew Honnibal
9937ff93e5 Update regex version dependency 2018-10-02 19:43:59 +02:00
Matthew Honnibal
05b6103a0c Try to fix version pin for msgpack-numpy 2018-09-28 14:07:00 +02:00
Matthew Honnibal
276aa83d1a Require older msgpack-numpy 2018-09-27 15:34:24 +02:00
Matthew Honnibal
7be9118be3 Require numpy>=1.15.0 to avoid the RuntimeWarning 2018-08-10 00:14:13 +02:00
Matthew Honnibal
cabce07ba6 Fix thinc version requirement 2018-07-21 15:56:33 +02:00
Matthew Honnibal
a723fafea3 Require thinc 6.10.3.dev1 2018-07-21 12:49:09 +02:00
ines
95641f4026 Only install pathlib backport on Python < 3.4 2018-07-20 21:08:29 +02:00
Matthew Honnibal
adde3826e2 Build against thinc 6.10.3.dev0 2018-07-20 13:34:54 +02:00
Ines Montani
d4cc736b7c 💫 Improve model downloads: check for existing install, customise pip and use requests library again (#2346)
* Go back to using requests instead of urllib (closes #2320)

Fewer dependencies are good, but urllib was simply causing too many other problems around SSL verification and Python 2/3 compatibility. requests is a popular enough package that it's okay for spaCy to depend on it – and this will hopefully make model downloads less flaky.

* Only download model if not installed (see #1456)

Use #egg=model==version to allow pip to check for existing installations. The download is only started if no installation matching the package/version is found. Fixes a long-standing inconvenience.

* Pass additional options to pip when installing model (resolves #1456)

Treat all additional arguments passed to the download command as pip options, to allow the user to customise the command. For example:

python -m spacy download en --user

* Add CLI option to enable installing model package dependencies

* Revert "Add CLI option to enable installing model package dependencies"

This reverts commit 9336ffe695.

* Update documentation
2018-05-20 20:26:56 +02:00
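The download behaviour described in this commit (skip the download when a matching version is already installed, and forward extra arguments to pip) can be sketched roughly as follows. This is a minimal illustration, not spaCy's actual implementation: `is_installed` and `build_pip_command` are hypothetical names, the URL in the usage note is a placeholder, and the sketch assumes Python 3.8+ for `importlib.metadata`. Newer pip versions may treat the `#egg=name==version` fragment differently.

```python
import sys
from importlib import metadata  # available from Python 3.8


def is_installed(package, version):
    """Return True if `package` is installed at exactly `version`."""
    try:
        return metadata.version(package) == version
    except metadata.PackageNotFoundError:
        return False


def build_pip_command(package, version, url, extra_args=()):
    """Build a pip install command; extra_args are forwarded to pip unchanged."""
    # The "#egg=name==version" fragment mirrors the approach named in the commit.
    target = "{}#egg={}=={}".format(url, package, version)
    return [sys.executable, "-m", "pip", "install", target, *extra_args]
```

A caller would check `is_installed(name, version)` first and only run the command (for example via `subprocess.call`) when it returns False; `build_pip_command("en_model", "2.0.0", "https://example.com/en_model.tar.gz", ["--user"])` ends with the forwarded `--user` option.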
Matthew Honnibal
abf8b16d71 Add doc.retokenize() context manager (#2172)
This patch takes a step towards #1487 by introducing the
doc.retokenize() context manager, to handle merging spans, and soon
splitting tokens.

The idea is to do merging and splitting like this:

with doc.retokenize() as retokenizer:
    for start, end, label in matches:
        retokenizer.merge(doc[start : end], attrs={'ent_type': label})

The retokenizer accumulates the merge requests, and applies them
together at the end of the block. This will allow retokenization to be
more efficient, and much less error prone.

A retokenizer.split() function will then be added, to handle splitting a
single token into multiple tokens. These methods take `Span` and `Token`
objects; if the user wants to go directly from offsets, they can append
to the .merges and .splits lists on the retokenizer.

The doc.merge() method's behaviour remains unchanged, so this patch
should be 100% backwards compatible (modulo bugs). Internally,
doc.merge() fixes up the arguments (to handle the various deprecated styles),
opens the retokenizer, and makes the single merge.

We can later start emitting deprecation warnings on direct calls to doc.merge(),
to migrate people to the retokenize context manager.
2018-04-03 14:10:35 +02:00
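The accumulate-then-apply idea in this commit message can be illustrated with a toy context manager. This is a sketch under simplifying assumptions, not spaCy's implementation: `ToyDoc` is a hypothetical class holding plain strings, the block receives a bare list of `(start, end)` pairs rather than a retokenizer object, and the queued spans are assumed not to overlap.

```python
from contextlib import contextmanager


class ToyDoc:
    """A toy stand-in for a document: just a list of token strings."""

    def __init__(self, tokens):
        self.tokens = list(tokens)

    @contextmanager
    def retokenize(self):
        merges = []  # (start, end) pairs queued inside the with-block
        yield merges
        # Apply right-to-left so earlier indices stay valid after each merge.
        for start, end in sorted(merges, reverse=True):
            self.tokens[start:end] = [" ".join(self.tokens[start:end])]


doc = ToyDoc(["New", "York", "is", "in", "New", "York", "State"])
with doc.retokenize() as merges:
    merges.append((0, 2))
    merges.append((4, 7))
    # Nothing is merged yet; indices still refer to the original tokens.
    assert len(doc.tokens) == 7
print(doc.tokens)  # ['New York', 'is', 'in', 'New York State']
```

Deferring the merges until the block exits means every queued span can be expressed against the original token indices, which is the error-proneness the message alludes to. If the block raises, no merges are applied.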
Matthew Honnibal
8308bbc617 Get msgpack and msgpack_numpy via Thinc, to avoid potential version conflicts 2018-03-29 00:14:55 +02:00
ines
366c98a94b Remove requests dependency 2018-03-28 12:46:18 +02:00
ines
ce6071ca89 Remove ftfy dependency and update docs 2018-03-28 12:09:42 +02:00
ines
6d2c85f428 Drop six and related hacks as a dependency 2018-03-28 10:45:25 +02:00
ines
f5f4de98d1 Version-lock msgpack-python (see #2015) 2018-02-22 16:02:32 +01:00
ines
002ee80ddf Add html5lib to setup.py to fix six error (see #1924) 2018-02-02 20:32:08 +01:00
Matthew Honnibal
2e449c1fbf Fix compiler flags, addressing #1591 2018-01-14 14:34:36 +01:00
Matthew Honnibal
04a92bd75e Pin msgpack-numpy requirement 2017-12-06 03:24:24 +01:00
Hugo
aa898ab4e4 Drop support for EOL Python 2.6 and 3.3 2017-11-26 19:46:24 +02:00
Matthew Honnibal
716ccbb71e Require thinc 6.10.1 2017-11-15 14:59:34 +01:00
Matthew Honnibal
314f5b9cdb Require thinc 6.10.0 2017-10-28 18:20:10 +00:00
Matthew Honnibal
64e4ff7c4b Merge 'tidy-up' changes into branch. Resolve conflicts 2017-10-28 13:16:06 +02:00
ines
7946464742 Remove spacy.tagger (now in pipeline) 2017-10-27 19:45:04 +02:00
Matthew Honnibal
531142a933 Merge remote-tracking branch 'origin/develop' into feature/better-parser 2017-10-27 12:34:48 +00:00
Matthew Honnibal
642eb28c16 Don't compile with OpenMP by default 2017-10-27 10:16:58 +00:00
Matthew Honnibal
90d1d9b230 Remove obsolete parser code 2017-10-26 13:22:45 +02:00
Matthew Honnibal
79fcf8576a Compile with march=native 2017-10-18 21:46:34 +02:00
Matthew Honnibal
2eb0fe4957 Fix setup.py 2017-10-03 21:40:04 +02:00
Matthew Honnibal
b49cc8153a Require correct thinc 2017-09-26 10:00:18 -05:00
ines
68f66aebf8 Use pkg_resources instead of pip for is_package (resolves #1293) 2017-09-16 20:27:59 +02:00
Matthew Honnibal
07cdbd1219 Require thinc 6.8.1, for Windows 2017-09-15 22:47:53 +02:00