Commit Graph

120 Commits

Author SHA1 Message Date
Sofie Van Landeghem
2c27093c5f
require_cpu functionality (#6336)
* add require_cpu from Thinc 8.0.0rc2

* add docs

* fix test if cupy is not installed
2020-12-08 14:42:40 +08:00
Ines Montani
d7950c5ada
Merge pull request #6297 from adrianeboyd/docs/nightly-conda-install [ci skip] 2020-11-10 02:45:52 +01:00
Adriane Boyd
1c4df8fd09
Replace pytokenizations with internal alignment (#6293)
* Replace pytokenizations with internal alignment

Replace pytokenizations with internal alignment algorithm that is
restricted to only allow differences in whitespace and capitalization.

* Rename `spacy.training.align` to `spacy.training.alignment` to contain
the `Alignment` dataclass
* Implement `get_alignments` in `spacy.training.align`

* Refactor trailing whitespace handling

* Remove unnecessary exception for empty docs

Allow a non-empty whitespace-only doc to be aligned with an empty doc

* Remove empty docs exceptions completely
2020-11-03 16:24:38 +01:00
Sofie Van Landeghem
ace6ae435b
set pydantic upper pin to 1.7 for now (#6308) 2020-10-26 23:31:08 +01:00
Adriane Boyd
4299a7f654
Setup / install / quickstart updates
* Add `cuda110` to setup.cfg and quickstart dropdown
* Switch to `pip` for pip-only packages in conda quickstart instructions
* Update zh pkuseg install message with version range and conda
* Remove `zh` from `extras_require` because the default doesn't require
additional packages
2020-10-23 11:27:54 +02:00
Adriane Boyd
3629296757 Fix requirements, remove version pins 2020-10-19 19:04:42 +02:00
Adriane Boyd
56077e7e64 Add dependency for jinja2 2020-10-19 18:58:15 +02:00
Ines Montani
2e8dcba379 Update version pins 2020-10-14 14:59:09 +02:00
Ines Montani
74972744e5 Update Thinc 2020-10-10 19:08:57 +02:00
Ines Montani
59558b1b80 Update pin [ci skip] 2020-10-08 23:09:14 +02:00
Ines Montani
1e7560f327 Update pin [ci skip] 2020-10-08 11:10:48 +02:00
Ines Montani
43e59bb22a Update docs and install extras [ci skip] 2020-10-08 10:58:50 +02:00
Ines Montani
b79a420c20 Adjust version pin [ci skip] 2020-10-07 13:16:56 +02:00
Sofie Van Landeghem
fff3f8ccfa
Fix packaging pin (#6212)
* pin packaging to >=20.0

* ignore spacy-pkuseg in requirements unit test
2020-10-06 14:16:05 +02:00
Ines Montani
4cf73d85bc Add [zh] to extras [ci skip] 2020-10-05 21:37:09 +02:00
Sofie Van Landeghem
f4f49f5877
update blis (#6198)
* allow higher blis version

* fix typo

* bump to 3.0.0a34

* fix pins in other files
2020-10-05 14:58:56 +02:00
Ines Montani
52e4586ec1 Add transformers to extras_require [ci skip] 2020-10-03 11:13:00 +02:00
Ines Montani
6d8df081bd
Merge pull request #6180 from adrianeboyd/docs/minor-v3-2 [ci skip] 2020-10-02 11:37:25 +02:00
Adriane Boyd
351f352cdc Update Japanese docs and pin for sudachipy 2020-10-02 10:12:44 +02:00
Ines Montani
01c1538c72 Integrate file readers 2020-10-02 01:36:06 +02:00
Ines Montani
95b2a448cf Update lookups data pin [ci skip] 2020-09-30 00:24:42 +02:00
Ines Montani
7d04ba20c0 Update Thinc 2020-09-30 00:05:17 +02:00
Ines Montani
d3c63b7965 Merge branch 'develop' into feature/prepare 2020-09-29 20:53:05 +02:00
svlandeg
cd21eb2485 upgrade pydantic pin for thinc's field.default_factory 2020-09-28 16:45:48 +02:00
Ines Montani
e44a7519cd Update CLI and add [initialize] block 2020-09-28 11:56:14 +02:00
Ines Montani
c0c842ae5b Update Thinc version 2020-09-27 23:24:40 +02:00
Ines Montani
7e938ed63e Update config resolution to use new Thinc 2020-09-27 22:21:31 +02:00
Ines Montani
ca3c997062 Improve CLI config validation with latest Thinc 2020-09-26 13:13:57 +02:00
Sofie Van Landeghem
009ba14aaf
Fix pretraining in train script (#6143)
* update pretraining API in train CLI

* bump thinc to 8.0.0a35

* bump to 3.0.0a26

* doc fixes

* small doc fix
2020-09-25 15:47:10 +02:00
Ines Montani
76bbed3466 Use Literal type for nr_feature_tokens 2020-09-23 16:00:03 +02:00
Sofie Van Landeghem
39872de1f6
Introducing the gpu_allocator (#6091)
* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'

* --code instead of --code-path

* update documentation

* avoid querying the "system" section directly

* add explanation of gpu_allocator to TF/PyTorch section in docs

* fix typo

* fix typo 2

* use set_gpu_allocator from thinc 8.0.0a34

* default null instead of empty string
2020-09-19 01:17:02 +02:00
svlandeg
0dc914b667 bump thinc to 8.0.0a33 2020-09-16 16:42:58 +02:00
Ines Montani
a25bb50e36
Merge pull request #6036 from explosion/chore/update-lookups-data
Update to latest spacy-lookups-data
2020-09-09 21:47:17 +02:00
Sofie Van Landeghem
60f22e1800
Pipe API (#6034)
* ensure Language passes on valid examples for initialization

* fix tagger model initialization

* check for valid get_examples across components

* assume labels were added before begin_training

* fix senter initialization

* fix morphologizer initialization

* use methods to check arguments

* test textcat init, requires thinc>=8.0.0a31

* fix tok2vec init

* fix entity linker init

* use islice

* fix simple NER

* cleanup debug model

* fix assert statements

* fix tests

* throw error when adding a label if the output layer can't be resized anymore

* fix test

* add failing test for simple_ner

* UX improvements

* morphologizer UX

* assume begin_training gets a representative set and processes the labels

* remove assumptions for output of untrained NER model

* restore test for original purpose
2020-09-08 22:44:25 +02:00
Ines Montani
40058ee626 Update to latest spacy-lookups-data 2020-09-08 12:23:06 +02:00
Ines Montani
ff4175e839 Add more info to debug config 2020-08-27 18:17:58 +02:00
Ines Montani
3aec98ca38 Update wasabi: new diff_strings and MarkdownRenderer 2020-08-26 15:33:11 +02:00
Ines Montani
e12b03358b
Support removing extra values in fill-config (#5966)
* Support removing extra values in fill-config

* Fix test
2020-08-24 22:53:47 +02:00
Matthew Honnibal
463f1c8623 Avoid requiring smart-open directly 2020-08-24 14:49:17 +02:00
Matthew Honnibal
e559867605
Allow spacy project to push and pull to/from remote storage (#5949)
* Add utils for working with remote storage

* WIP add remote_cache for project

* WIP add push and pull commands

* Use pathy in remote_cache

* Update util

* Update remote_cache

* Update util

* Update project assets

* Update pull script

* Update push script

* Fix type annotation in util

* Work on remote storage

* Remove site and env hash

* Fix imports

* Fix type annotation

* Require pathy

* Require pathy

* Fix import

* Add a util to handle project variable substitution

* Import push and pull commands

* Fix pull command

* Fix push command

* Fix tarfile in remote_storage

* Improve printing

* Fiddle with status messages

* Set version to v3.0.0a9

* Draft docs for spacy project remote storages

* Update docs [ci skip]

* Use Thinc config to simplify and unify template variables

* Auto-format

* Don't import Pathy globally for now

Causes slow and annoying Google Cloud warning

* Tidy up test

* Tidy up and update tests

* Update to latest Thinc

* Update docs

* variables -> vars

* Update docs [ci skip]

* Update docs [ci skip]

Co-authored-by: Ines Montani <ines@ines.io>
2020-08-23 18:32:09 +02:00
Ines Montani
6ad59d59fe Merge branch 'develop' of https://github.com/explosion/spaCy into develop [ci skip] 2020-08-20 11:20:58 +02:00
Ines Montani
daba316930 Update Thinc version 2020-08-14 18:39:51 +02:00
Ines Montani
67cc39af7f Update Thinc and include section order 2020-08-14 14:06:22 +02:00
Ines Montani
88b0a96801 Update for new Thinc and adjust config 2020-08-13 17:38:30 +02:00
Ines Montani
955d7b1b6b Update to latest Thinc 2020-08-07 14:41:35 +02:00
Ines Montani
ab5ef37abb Update to latest Thinc 2020-08-05 15:00:49 +02:00
svlandeg
5fa3235d06 set DATA_VALIDATION to False for debug_model (upgrade thinc) 2020-07-31 15:21:01 +02:00
Matthew Honnibal
520d25cb50
Add smart_open dependency to fetch project assets (#5812)
* Use smart_open for project assets

* Fix assets.py

* Update pyproject.toml
2020-07-26 12:15:00 +02:00
Ines Montani
e92df281ce Tidy up, autoformat, add types 2020-07-25 15:01:15 +02:00
Ines Montani
a063a82c40 Tidy up __init__.py 2020-07-25 12:14:37 +02:00