Sofie Van Landeghem
d307e9ca58
take care of global vectors in multiprocessing (#5081)
* restore load_nlp.VECTORS in the child process
* add unit test
* fix test
* remove unnecessary import
* add utf8 encoding
* import unicode_literals
2020-03-03 13:58:22 +01:00
adrianeboyd
d078b47c81
Break out of infinite loop as intended (#5077)
2020-03-03 12:29:05 +01:00
adrianeboyd
697bec764d
Normalize IS_SENT_START to SENT_START for Matcher (#5080)
2020-03-03 12:22:39 +01:00
adrianeboyd
2281c4708c
Restore empty tokenizer properties (#5026)
* Restore empty tokenizer properties
* Check for types in tokenizer.from_bytes()
* Add test for setting empty tokenizer rules
2020-03-02 11:55:02 +01:00
Sofie Van Landeghem
c6b12ab02a
Bugfix/get doc (#5049)
* new (broken) unit test
* fixing get_doc method
2020-03-02 11:49:28 +01:00
Ines Montani
648f61d077
Tidy up compiler flags and imports (#5071)
2020-03-02 11:48:10 +01:00
Ines Montani
7efaa76168
Update errors.py
2020-02-28 12:23:31 +01:00
Ines Montani
37691e6d5d
Simplify warnings
2020-02-28 12:20:23 +01:00
Ines Montani
5da3ad682a
Tidy up and auto-format
2020-02-28 11:57:41 +01:00
adrianeboyd
65d7bab10f
Initialize all values in a2b/b2a in new align (#5063)
2020-02-27 18:43:00 +01:00
Sofie Van Landeghem
06f0a8daa0
Default settings to configurations (#4995)
* fix grad_clip naming
* cleaning up pretrained_vectors out of cfg
* further refactoring Model init's
* move Model building out of pipes
* further refactor to require a model config when creating a pipe
* small fixes
* making cfg in nn_parser more consistent
* fixing nr_class for parser
* fixing nn_parser's nO
* fix printing of loss
* architectures in own file per type, consistent naming
* convenience methods default_tagger_config and default_tok2vec_config
* let create_pipe access default config if available for that component
* default_parser_config
* move defaults to separate folder
* allow reading nlp from package or dir with argument 'name'
* architecture spacy.VocabVectors.v1 to read static vectors from file
* cleanup
* default configs for nel, textcat, morphologizer, tensorizer
* fix imports
* fixing unit tests
* fixes and clean up
* fixing defaults, nO, fix unit tests
* restore parser IO
* fix IO
* 'fix' serialization test
* add *.cfg to manifest
* fix example configs with additional arguments
* replace Morphologizer with Tagger
* add IO bit when testing overfitting of tagger (currently failing)
* fix IO - don't initialize when reading from disk
* expand overfitting tests to also check IO goes OK
* remove dropout from HashEmbed to fix Tagger performance
* add defaults for sentrec
* update thinc
* always pass a Model instance to a Pipe
* fix piped_added statement
* remove obsolete W029
* remove obsolete errors
* restore byte checking tests (work again)
* clean up test
* further test cleanup
* convert from config to Model in create_pipe
* bring back error when component is not initialized
* cleanup
* remove calls for nlp2.begin_training
* use thinc.api in imports
* allow setting charembed's nM and nC
* fix for hardcoded nM/nC + unit test
* formatting fixes
* trigger build
2020-02-27 18:42:27 +01:00
Matthew Honnibal
b4e0d2bf50
Improve Makefile (#5067)
* Improve pex making
* Update gitignore
2020-02-26 20:59:10 +01:00
Adriane Boyd
9f740a9891
Add a few more Danish tokenizer exceptions
2020-02-26 14:59:03 +01:00
Ines Montani
1c212215cd
Merge pull request #5064 from adrianeboyd/feature/german-tokenization
Improve German tokenization
2020-02-26 13:41:44 +01:00
Ines Montani
f39ddda193
Merge pull request #5062 from svlandeg/bugfix/merge-conflicts
Fix sync between master and develop
2020-02-26 13:41:16 +01:00
Ines Montani
56978f5cd8
Merge pull request #5060 from svlandeg/feature/update-thinc
update thinc
2020-02-26 13:40:23 +01:00
Adriane Boyd
d1f703d78d
Improve German tokenization
Improve German tokenization with respect to Tiger.
2020-02-26 13:06:52 +01:00
Ines Montani
54da6a2a07
Update pyproject.toml
2020-02-26 12:51:53 +01:00
Ines Montani
ed9358420e
Merge branch 'master' into pr/5060
2020-02-26 12:51:29 +01:00
adrianeboyd
ff184b7a9c
Add tag_map argument to CLI debug-data and train (#4750) (#5038)
Add an argument for a path to a JSON-formatted tag map, which is used to
update and extend the default language tag map.
2020-02-26 12:10:38 +01:00
svlandeg
18ff97589d
update spacy to 2.2.4.dev0
2020-02-26 10:50:05 +01:00
svlandeg
62406a9513
update from thinc 7.4.0.dev2 to 7.4.0
2020-02-26 10:30:35 +01:00
svlandeg
fc6e34c3a1
fix bugs from porting master to develop
2020-02-26 08:44:22 +01:00
Ines Montani
c7e3c034d2
Merge pull request #5061 from explosion/fix/pyproject-toml-master
Update pyproject.toml
2020-02-25 20:22:26 +01:00
Ines Montani
192b8d45a1
Merge pull request #5008 from svlandeg/fix/build_dependencies
Re-add pyproject.toml and add tests for dependency version consistency
2020-02-25 16:52:18 +01:00
Ines Montani
dc36ec98a4
Update pyproject.toml
2020-02-25 16:46:14 +01:00
Ines Montani
b6a6cff708
Add blis to pyproject.toml
2020-02-25 16:17:23 +01:00
Ines Montani
912572e04a
Only copy if file exists (not if installed from sdist etc.)
2020-02-25 16:01:58 +01:00
Ines Montani
436b26fe0f
Revert other changes
2020-02-25 15:48:29 +01:00
Ines Montani
c1a5ece65f
Tidy up setup and update requirements tests
2020-02-25 15:46:39 +01:00
Ines Montani
5d21d3e8b9
Merge branch 'develop' into pr/5008
2020-02-25 15:24:47 +01:00
Ines Montani
acb4e3c7ba
Merge pull request #5039 from adrianeboyd/typo/website-token-api-shape
Fix formatting in Token API
2020-02-25 14:57:25 +01:00
Ines Montani
d50152b917
Merge pull request #5019 from questoph/master
Optimizing tokenization for Luxembourgish (dealing with apostrophe infixes)
2020-02-25 14:48:50 +01:00
Ines Montani
4440a072d2
Merge pull request #5006 from svlandeg/bugfix/multiproc-underscore
load Underscore state when multiprocessing
2020-02-25 14:46:02 +01:00
Ines Montani
38fc05986c
Merge pull request #5058 from bryant1410/patch-1
Add missing comma in a dependency specification
2020-02-25 14:44:29 +01:00
svlandeg
d848a68340
thinc 7.4.0.dev2
2020-02-25 12:07:42 +01:00
Santiago Castro
54d8665ff7
Add missing comma in a dependency specification
Conda is complaining that it can't parse that line otherwise.
2020-02-24 16:15:28 -05:00
svlandeg
d5bfebe1c5
it's moving day
2020-02-24 10:04:24 +01:00
svlandeg
217c16c7a9
running tests BEFORE deleting them?
2020-02-24 09:38:43 +01:00
svlandeg
6f846c2cbf
removing --pyargs for testing purposes
2020-02-24 09:19:08 +01:00
svlandeg
d821c95eb0
debugging prints
2020-02-23 17:38:33 +01:00
svlandeg
58568bd0cd
fix
2020-02-23 16:45:37 +01:00
svlandeg
0f55e51704
assert we found the root_dir
2020-02-23 16:33:58 +01:00
svlandeg
783da088ea
avoid try except
2020-02-23 16:21:21 +01:00
svlandeg
b49a3afd0c
use clean_underscore fixture
2020-02-23 15:49:20 +01:00
Ines Montani
d6c0746347
Merge branch 'master' into spacy.io
2020-02-23 13:57:01 +01:00
Ines Montani
4890db6339
Auto-format and fix image [ci skip]
2020-02-23 13:56:50 +01:00
Ines Montani
89967f3701
Merge branch 'master' into spacy.io
2020-02-23 12:04:20 +01:00
Tom Keefe
ddf63b97a8
make idx available via to_array (#5030)
2020-02-22 14:13:06 +01:00
Sofie Van Landeghem
44f4142ce4
add two abbreviations and some additional unit tests (#5040)
2020-02-22 14:12:32 +01:00