Commit Graph

8489 Commits

Author SHA1 Message Date
Adriane Boyd
6108dabdc8 Rephrase error related to sample data initialization
Now that the initialize step is fully implemented, the source of E923 is
typically missing or improperly converted/formatted data rather than a
bug in spaCy, so rephrase the error and message and remove the prompt to
open an issue.
2021-02-08 09:21:36 +01:00
Sofie Van Landeghem
6ed423c16c
reduce memory load when reading all vectors from file (#6945)
* reduce memory load when reading all vectors from file

* one more small typo fix
2021-02-07 08:05:43 +08:00
Sofie Van Landeghem
a323ef90df
ensure the loss value is cast as float (#6928) 2021-02-07 07:51:56 +08:00
melonwater211
a7977b5143
The test spacy/tests/vocab_vectors/test_lexeme.py::test_vocab_lexeme_add_flag_auto_id seems to fail occasionally when the test suite is run in a random order. (#6956)
```python
    def test_vocab_lexeme_add_flag_auto_id(en_vocab):
        is_len4 = en_vocab.add_flag(lambda string: len(string) == 4)
        assert en_vocab["1999"].check_flag(is_len4) is True
        assert en_vocab["1999"].check_flag(IS_DIGIT) is True
        assert en_vocab["199"].check_flag(is_len4) is False
>       assert en_vocab["199"].check_flag(IS_DIGIT) is True
E       assert False is True
E        +  where False = <built-in method check_flag of spacy.lexeme.Lexeme object at 0x7fa155c36840>(3)
E        +    where <built-in method check_flag of spacy.lexeme.Lexeme object at 0x7fa155c36840> = <spacy.lexeme.Lexeme object at 0x7fa155c36840>.check_flag

spacy/tests/vocab_vectors/test_lexeme.py:49: AssertionError
```

>  `pytest==6.1.1`
>
>  `numpy==1.19.2`
>
> `Python version: 3.8.3`

To reproduce the error, run `pytest --random-order-bucket=global --random-order-seed=170158 -v spacy/tests`

If `test_vocab_lexeme_add_flag_auto_id` is run after `test_vocab_lexeme_add_flag_provided_id`, it fails.
It seems like `test_vocab_lexeme_add_flag_provided_id` uses the `IS_DIGIT` bit for testing purposes but does not reset the bit.

This solution seems to work but, if anyone has a better fix, please let me know and I will integrate it.
2021-02-07 07:51:34 +08:00
René Octavio Queiroz Dias
59271e887a
fix: TransformerListener with TextCatEnsemble (#6951)
* bug: Regression test
Issue #6946

* fix: Fix issue #6946

* chore: Remove regression test
2021-02-06 13:44:51 +01:00
René Octavio Queiroz Dias
999ff03b19
fix: Fix textcat labels to expect a Optional[Iterable[str]] instead of Optional[Dict] (#6911)
* docs: Add agreement

* bug: Regression test

Issue #6908

* fix: Changed from Dict to Iterable[str]

Fix #6908

* Update test to use make_tempdir

* fix: Fix WindowsPath error

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-02-04 23:37:13 +01:00
Adriane Boyd
b903de3fcb
Pass on vocab arg in spacy.blank() (#6924) 2021-02-04 15:09:01 +01:00
svlandeg
f852af2acf add capture arg 2021-02-02 19:47:12 +01:00
Matthew Honnibal
b6a198481b Set version to v3.0.0 2021-02-02 20:26:17 +11:00
Sofie Van Landeghem
f319d2765f
Add capture argument to project_run (#6878)
* add capture argument to project_run and run_commands

* git bump to 3.0.1

* Set version to 3.0.1.dev0

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2021-02-02 10:11:15 +08:00
Sofie Van Landeghem
f638306598
remove link_components flag again (#6883) 2021-02-02 10:08:40 +08:00
Ines Montani
a59f3fcf5d Make wheel the default format and update docs [ci skip] 2021-02-01 23:18:43 +11:00
Ines Montani
b9573e9e22 Fix pip args 2021-02-01 23:15:00 +11:00
Ines Montani
b46073234a Fix default clone branch and error handling [ci skip] 2021-02-01 22:29:04 +11:00
Sofie Van Landeghem
acabb284dd
Fix linking resumed components (#6859)
* link components across enabled, resumed and frozen

* revert renaming

* revert renaming, the sequel
2021-02-01 22:19:58 +11:00
Adriane Boyd
35a863cd27 Remove nlp.tokenizer from quickstart template
Remove `nlp.tokenizer` from quickstart template so that the default
language-specific tokenizer settings are filled instead.
2021-02-01 11:20:12 +01:00
svlandeg
91e72c031e reformatting 2021-01-30 17:29:33 +01:00
svlandeg
a8d84188f0 add stop words
Co-authored-by: tewodrosm <tedmaam2006@gmail.com>
2021-01-30 17:26:49 +01:00
Ines Montani
f058cbd751 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2021-01-30 21:03:25 +11:00
Ines Montani
14f631f52c Update parent package and version [ci skip] 2021-01-30 20:12:42 +11:00
Ines Montani
3435b894df Remove nightly reference from auto docs [ci skip] 2021-01-30 20:12:08 +11:00
Ines Montani
d0c3775712 Replace links to nightly docs [ci skip] 2021-01-30 20:09:38 +11:00
Ines Montani
b26a3daa9a
Merge pull request #6860 from explosion/feature/package-wheel 2021-01-30 14:17:01 +11:00
Ines Montani
2332c4280b Update and use unified --build option 2021-01-30 13:11:36 +11:00
Ines Montani
e6accb3a9e Tidy up and auto-format 2021-01-30 12:52:33 +11:00
Ines Montani
817b0db521 Fix escape sequence 2021-01-30 12:39:58 +11:00
Ines Montani
526b416118 Tidy up comments 2021-01-30 12:34:09 +11:00
Ines Montani
30765674d0 Merge branch 'master' into develop 2021-01-30 12:20:28 +11:00
Ines Montani
2609ba4e89 Support building wheel in spacy package 2021-01-30 11:54:02 +11:00
Pamphile ROY
41ee75ac6d
Remove --no-cache-dir when downloading models
When `--no-cache-dir` is present, it prevents caching to properly function.

If the user still wants to do this, there is the possibility to pass options with `user_pip_args`.
But you should not enforce options like these. In my case this is preventing some docker build (using buildkit caching) to have proper caching of models.
2021-01-29 15:37:44 +01:00
Ines Montani
bbf080dfe5
Merge pull request #6645 from bittlingmayer/patch-3 2021-01-30 01:26:28 +11:00
Adriane Boyd
bced6309e5
Add full exceptions with spaces 2021-01-29 14:27:22 +01:00
Ines Montani
7886d59c56 Add check for remove_listener method 2021-01-29 23:47:30 +11:00
Ines Montani
7694f76dd1 Update warning and mention replace_listeners 2021-01-29 23:46:01 +11:00
Ines Montani
94232aea08 Improve E889 2021-01-29 23:39:23 +11:00
Ines Montani
924396c20c Merge branch 'feature/replace-listeners' of https://github.com/explosion/spaCy into feature/replace-listeners 2021-01-29 21:43:10 +11:00
Ines Montani
2102082478 Make Tok2Vec.remove_listener return bool
Whether listener was removed
2021-01-29 21:41:38 +11:00
Ines Montani
e766e8c56d
Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-01-29 21:41:17 +11:00
Ines Montani
bc089b693c Update tests 2021-01-29 19:38:09 +11:00
Ines Montani
325f47500d Move replacement logic to Language.from_config 2021-01-29 19:37:04 +11:00
Ines Montani
0f3e3eedc2 Add Tok2vec.remove_listener 2021-01-29 19:36:38 +11:00
Ines Montani
99842387cb Remove default value 2021-01-29 18:45:37 +11:00
Ines Montani
44b5542d14 Change method order 2021-01-29 18:42:41 +11:00
Ines Montani
8c15d1daec Update and validate config first and exit early if paths don't exist 2021-01-29 18:24:47 +11:00
Ines Montani
bbb94b37c6 Update error handling and docstring 2021-01-29 16:27:49 +11:00
Ines Montani
01ecfbcc45 Merge branch 'develop' into feature/replace-listeners 2021-01-29 15:57:32 +11:00
Ines Montani
911dfcccfc Add option to replace listeners for sourced components 2021-01-29 15:57:04 +11:00
Adriane Boyd
fcce3600ed
Forbid OP matching 2+ tokens in DependencyMatcher (#6824)
Instead of silently using only the first token in each matched span:

* Forbid `OP: ?/*/+` through `DependencyMatcher` validation
* As a fail-safe, add warning if a token match that's not exactly one
token long is found by a token pattern.
2021-01-29 08:52:01 +08:00
Sofie Van Landeghem
24a697abb8
avoid empty aliases and improve UX and docs (#6840) 2021-01-29 08:51:40 +08:00
Sofie Van Landeghem
837a4f53c2
Error handling in nlp.pipe (#6817)
* add error handler for pipe methods

* add unit tests

* remove pipe method that are the same as their base class

* have Language keep track of a default error handler

* cleanup

* formatting

* small refactor

* add documentation
2021-01-29 08:51:21 +08:00