Commit Graph

1207 Commits

Author SHA1 Message Date
Peter Baumgartner
51b68aacad
Merge branch 'explosion:master' into data-debug-lemmatizer 2023-01-19 13:21:25 -05:00
Peter Baumgartner
a944e55291 partial annotation support 2023-01-12 09:10:39 -05:00
Daniël de Kok
319eb508b5
Add a spacy benchmark speed subcommand (#11902)
* Add a `spacy evaluate speed` subcommand

This subcommand reports the mean batch performance of a model on a data set with
a 95% confidence interval. For reliability, it first performs some warmup
rounds. Then it will measure performance on batches with randomly shuffled
documents.

To avoid having too many spaCy commands, `speed` is a subcommand of `evaluate`
and accuracy evaluation is moved to its own `evaluate accuracy` subcommand.

* Fix import cycle

* Restore `spacy evaluate`, make `spacy benchmark speed` an alias

* Add documentation for `spacy benchmark`

* CREATES -> PRINTS

* WPS -> words/s

* Disable formatting of benchmark speed arguments

* Fail with an error message when trying to speed bench empty corpus

* Make it clearer that `benchmark accuracy` is a replacement for `evaluate`

* Fix docstring webpage reference

* tests: check `evaluate` output against `benchmark accuracy`
2023-01-12 11:55:21 +01:00
Peter Baumgartner
28c31048c9 use 0 instead of none for no annotation 2023-01-11 14:47:10 -05:00
Peter Baumgartner
bfb9fa44a0 additional fixes
- set approach to identifying unique trees
- adjust line length on messages
- add logic for detecting docs without annotations
2023-01-11 14:36:28 -05:00
Sofie Van Landeghem
7f6c638c3a
fix processing of "auto" in convert (#12050)
* fix processing of "auto" in walk_directory

* add check for None

* move AUTO check to convert and fix verification of args

* add specific CLI test with CliRunner

* cleanup

* more cleanup

* update docstring
2023-01-05 10:21:00 +01:00
github-actions[bot]
90896504a5
Auto-format code with black (#12019)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-12-23 12:44:07 +01:00
kadarakos
c223cd7a86
Add apply CLI (#11376)
* annotate cli first try

* add batch-size and n_process

* rename to apply

* typing fix

* handle file suffixes

* walk directories

* support jsonl

* typing fix

* remove debug

* make suffix optional for walk

* revert unrelated

* don't warn but raise

* better error message

* minor touch up

* Update spacy/tests/test_cli.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* update tests and bugfix

* add force_overwrite

* typo

* fix adding .spacy suffix

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* store user data and rename cmd arg

* include test for user attr

* rename cmd arg

* better help message

* documentation

* prettier

* black

* link fix

* Update spacy/cli/apply.py

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update website/docs/api/cli.md

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update website/docs/api/cli.md

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update website/docs/api/cli.md

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* addressing reviews

* dont quit but warn

* prettier

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-12-20 17:11:33 +01:00
Zhangrp
23085ffef4
Fix interpolation in directory names, see #11235. (#11914) 2022-12-06 17:42:12 +09:00
Adriane Boyd
8afa8b5a7b
Refactor kwargs in CLI msg for future wasabi compatibility (#11918)
Necessary for mypy with wasabi v1+.
2022-12-05 10:00:00 +01:00
Paul O'Leary McCann
f9d17a644b
Config generation fails for GPU without transformers (#11899)
If you don't have spacy-transformers installed, but try to use `init
config` with the GPU flag, you'll get an error. The issue is that the
`use_transformers` flag in the config is conflated with the GPU flag,
and then there's an attempt to access transformers config info that may
not exist.

There may be a better way to do this, but this stops the error.
2022-12-02 10:17:11 +01:00
Peter Baumgartner
c53d57a54e fix reference dataset for dev data 2022-12-01 14:01:10 -05:00
Adriane Boyd
6f9d630f7e
Replace Pipe type with Callable in Language (#11803)
* Replace Pipe type with Callable in Language

* Use Callable[[Doc], Doc] in the docstrings
2022-11-29 13:20:08 +01:00
Adriane Boyd
1ebe7db07c
Support local filesystem remotes for projects (#11762)
* Support local filesystem remotes for projects

* Fix support for local filesystem remotes for projects
  * Use `FluidPath` instead of `Pathy` to support both filesystem and
    remote paths
  * Create missing parent directories if required for local filesystem
  * Add a more general `_file_exists` method to support both `Pathy`,
    `Path`, and `smart_open`-compatible URLs
* Add explicit `smart_open` dependency starting with support for
  `compression` flag
* Update `pathy` dependency to exclude older versions that aren't
  compatible with required `smart_open` version
* Update docs to refer to `Pathy` instead of `smart_open` for project
  remotes (technically you can still push to any `smart_open`-compatible
  path but you can't pull from them)
* Add tests for local filesystem remotes

* Update pathy for general BlobStat sorting

* Add import

* Remove _file_exists since only Pathy remotes are supported

* Format CLI docs

* Clean up merge
2022-11-29 11:40:58 +01:00
Adriane Boyd
681ec20914
Add smart_open requirement, update deprecated options (#11864)
* Switch from deprecated `ignore_ext` to `compression`
* Add upload/download test for local files
2022-11-25 13:00:57 +01:00
Raphael Mitsch
c0fd8a2e71
find-threshold: CLI command for multi-label classifier threshold tuning (#11280)
* Add foundation for find-threshold CLI functionality.

* Finish first draft for find-threshold.

* Add tests.

* Revert adjusted import statements.

* Fix mypy errors.

* Fix imports.

* Harmonize arguments with spacy evaluate command.

* Generalize component and threshold handling. Harmonize arguments with 'spacy evaluate' CLI.

* Fix Spancat test.

* Add beta parameter to Scorer and PRFScore.

* Make beta a component scorer setting.

* Remove beta.

* Update nlp.config (workaround).

* Reload pipeline on threshold change. Adjust tests. Remove confection reference.

* Remove assumption of component being a Pipe object or having a .cfg attribute.

* Adjust test output and reference values.

* Remove beta references. Delete universe.json.

* Reverting unnecessary changes. Removing unused default values. Renaming variables in find-cli tests.

* Update spacy/cli/find_threshold.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Remove adding labels in tests.

* Remove unused error

* Undo changes to PRFScorer

* Change default value for n_trials. Log table iteratively.

* Add warnings for pointless applications of find_threshold().

* Fix imports.

* Adjust type check of TextCategorizer to exclude subclasses.

* Change check of if there's only one unique value in scores.

* Update spacy/cli/find_threshold.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Incorporate feedback.

* Fix test issue. Update docstring.

* Update docs & docstring.

* Update spacy/tests/test_cli.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add examples to docs. Rename _nlp to nlp in tests.

* Update spacy/cli/find_threshold.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/find_threshold.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-11-25 11:44:55 +01:00
Adriane Boyd
a83463c5e0
Add transformer recommendation for ca (#11819)
Model recommendation from @cayorodriguez.
2022-11-18 08:15:27 +01:00
Sofie Van Landeghem
caa9efad59
prevent rewriting an already raw URL (#11810) 2022-11-15 14:15:00 +01:00
github-actions[bot]
188a7d00eb
Auto-format code with black (#11792)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-11-11 09:58:31 +01:00
Adriane Boyd
03eebe9d1c
Update warning, add tests for project requirements check (#11777)
* Update warning, add tests for project requirements check

* Make warning more general for differences between PEP 508 and pip
* Add tests for _check_requirements

* Parameterize test
2022-11-09 10:59:28 +01:00
Adriane Boyd
e116395f89
Add fallback in requirements check, only check once (#11735)
* Add fallback in requirements check, only check once

* Rename to skip_requirements_check

* Update spacy/cli/project/run.py

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-11-07 14:46:08 +01:00
Adriane Boyd
e91b47a226
Check for unsafe paths in tarfile.extractall (CVE-2007-4559) (#11746)
* Adding tarfile member sanitization to extractall()

* Format

* Simplify and add error message

* Fix import

* Add comment about CVE

Co-authored-by: TrellixVulnTeam <charles.mcfarland@trellix.com>
2022-11-07 10:43:34 +01:00
Paul O'Leary McCann
858565a567
Fix issues with DVC commands (#11592)
* Fix flag handling in dvc

Prior to this commit, if a flag (--verbose or --quiet) was passed to
DVC, it would be added to the end of the generated dvc command line.
This would result in the command being interpreted as part of the actual
command to run, rather than an argument to dvc. This would result in
command lines like:

    spacy project run preprocess --verbose

That would fail with an error that there's no such directory as
`--verbose`.

This change puts the flags at the front of the dvc command so that they
are interpreted correctly. It removes the `run_dvc_commands` function,
which had been reduced to just a for loop and wasn't used elsewhere.

A separate problem is that there's no way to specify the quiet behaviour
to dvc from the command line, though it's unclear if that's a bug.

* Add dvc quiet flag to docs

* Handle case in DVC where no commands are appropriate

If only have commands with no deps or outputs (admittedly unlikely), you
get a weird error about the dvc file not existing. This gives explicit
output instead.

* Add support for quiet flag

* Fix command execution

Commands are strings now because they're joined further up.
2022-10-18 15:11:39 +09:00
Adriane Boyd
6d7630c5d3
Allow overriding spacy_version in spacy package meta (#11552) 2022-09-29 10:44:06 +02:00
Peter Baumgartner
e794d4ae39
debug data Spancat Table Improvements (#11504)
* update

* fix format function

* pull out _format_number

* format with black
2022-09-28 17:16:05 +02:00
Raphael Mitsch
af9b01ef97
Add dependency check to project step runs (#11226)
* Add dependency check to project step running.

* Fix dependency mismatch warning.

* Remove newline.

* Add types-setuptools to setup.cfg.

* Move types-setuptools to test requirements. Move warnings into _validate_requirements(). Handle file reading in project_run().

* Remove newline formatting for output of package conflicts.

* Show full version conflict message instead of just package name.

* Update spacy/cli/project/run.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Fix typo.

* Re-add rephrasing of message for conflicting packages. Remove requirements path redundancy.

* Update spacy/cli/project/run.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/cli/project/run.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Print unified message for requirement conflicts and missing requirements.

* Update spacy/cli/project/run.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Fix warning message.

* Print conflict/missing messages individually.

* Print conflict/missing messages individually.

* Add check_requirements setting in project.yml to disable requirements check.

* Update website/docs/usage/projects.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/usage/projects.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update description of project.yml structure in projects.md.

* Update website/docs/usage/projects.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Prettify projects docs.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-09-16 16:54:31 +02:00
kadarakos
6b83fee58d
Assets message (#11458)
* new error message when 'project run assets'

* new error message when 'project run assets'

* Update spacy/cli/project/run.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-09-09 17:17:10 +02:00
Adriane Boyd
8a86a35eab
Remove has_letters in config template (#11465)
Due to problems with the javascript conversion in the website
quickstart, remove the `has_letters` setting to simplify generating
`attrs` for the default `tok2vec`.

Additionally reduce `PREFIX` as in the trained pipelines.
2022-09-09 15:10:04 +02:00
github-actions[bot]
0c72c6bb2c
Auto-format code with black (#11468)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-09-09 11:21:17 +02:00
Peter Baumgartner
73bedaa9ec rm component function 2022-09-08 13:53:16 -04:00
Peter Baumgartner
5c3337b81b cleanup + reword 2022-09-08 11:57:15 -04:00
Peter Baumgartner
c09d99a069 cleanup 2022-09-07 10:30:58 -04:00
Paul O'Leary McCann
977dc33312
Add a way to get the URL to download a pipeline to the CLI (#11175)
* Add a dry run flag to download

* Remove --dry-run, add --url option to `spacy info` instead

* Make mypy happy

* Print only the URL, so it's easier to use in scripts

* Don't add the egg hash unless downloading an sdist

* Update spacy/cli/info.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Add two implementations of requirements

* Clean up requirements sample slightly

This should make mypy happy

* Update URL help string

* Remove requirements option

* Add url option to docs

* Add URL to spacy info model output, when available

* Add types-setuptools to testing reqs

* Add types-setuptools to requirements

* Add "compatible", expand docstring

* Update spacy/cli/info.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Run prettier on CLI docs

* Update docs

Add a sidebar about finding download URLs, with some examples of the new
command.

* Add download URLs to table on model page

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Updates from review

* download url -> download link

* Update docs

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-09-02 11:58:21 +02:00
Peter Baumgartner
64b57204e2 WIP 2022-09-01 13:00:18 -04:00
Peter Baumgartner
19145c6c92 rm total 2022-08-31 14:22:17 -04:00
Peter Baumgartner
0bd7c095e3 rm ipython embeds 2022-08-31 14:06:13 -04:00
Peter Baumgartner
6007404682 WIP 2022-08-31 14:00:43 -04:00
Adriane Boyd
b07708d5d0
Support full prerelease versions in the compat table (#11228)
* Support full prerelease versions in the compat table

* Fix types
2022-08-04 15:14:19 +02:00
Edward
360a702ecd
Add parent argument (#11210) 2022-07-26 14:35:18 +02:00
Zackere
8ffff18ac4
Try cloning repo from main & master (#10843)
* Try cloning repo from main & master

* fixup! Try cloning repo from main & master

* fixup! fixup! Try cloning repo from main & master

* refactor clone and check for repo:branch existence

* spacing fix

* make mypy happy

* type util function

* Update spacy/cli/project/clone.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Peter Baumgartner <5107405+pmbaumgartner@users.noreply.github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-06-28 09:11:15 -04:00
Sofie Van Landeghem
f00254ae27
add counts to verbose list of NER labels (#10957) 2022-06-20 09:48:40 +02:00
Sofie Van Landeghem
eaeca5eb6a
account for NER labels with a hyphen in the name (#10960)
* account for NER labels with a hyphen in the name

* cleanup

* fix docstring

* add return type to helper method

* shorter method and few more occurrences

* user helper method across repo

* fix circular import

* partial revert to avoid circular import
2022-06-17 20:02:37 +01:00
Raphael Mitsch
a7f6bc5dfb
Workaround for Typer optional default values with Python calls (#10788)
* Workaround for Typer optional default values with Python calls: added test and workaround.

* @rmitsch Workaround for Typer optional default values with Python calls: reverting some black formatting changes.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* @rmitsch Workaround for Typer optional default values with Python calls: removing return type hint.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Workaround for Typer optional default values with Python calls: fixed imports, added GitHub issue marker.

* Workaround for Typer optional default values with Python calls: removed forcing of default values for optional arguments in init_config_cli(). Added default values for init_config(). Synchronized default values for init_config_cli() and init_config().

* Workaround for Typer optional default values with Python calls: removed unused import.

* Workaround for Typer optional default values with Python calls: fixed usage of optimize in init_config_cli().

* Workaround for Typer optional default values with Pythhon calls: remove output_file from InitDefaultValues.

* Workaround for Typer optional default values with Python calls: rename class for default init values.

* Workaround for Typer optional default values with Python calls: remove newline.

* remove introduced newlines

* Remove test_init_config_from_python_without_optional_args().

* remove leftover import

* reformat import

* remove duplicate

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-06-17 12:15:36 +02:00
Daniël de Kok
7c6a97559d Simplify GPU check
This change removes `thinc.util.has_cupy` from the GPU presence check.
Currently `gpu_is_available` already implies `has_cupy`.  We also want
to show this warning in the future when a machine has a non-CuPy GPU.
2022-05-25 14:06:45 +02:00
Lj Miranda
1d34aa2b3d
Add spacy-span-analyzer to debug data (#10668)
* Rename to spans_key for consistency

* Implement spans length in debug data

* Implement how span bounds and spans are obtained

In this commit, I implemented how span boundaries (the tokens) around a
given span and spans are obtained. I've put them in the compile_gold()
function so that it's accessible later on. I will do the actual
computation of the span and boundary distinctiveness in the main
function above.

* Compute for p_spans and p_bounds

* Add computation for SD and BD

* Fix mypy issues

* Add weighted average computation

* Fix compile_gold conditional logic

* Add test for frequency distribution computation

* Add tests for kl-divergence computation

* Fix weighted average computation

* Make tables more compact by rounding them

* Add more descriptive checks for spans

* Modularize span computation methods

In this commit, I added the _get_span_characteristics and
_print_span_characteristics functions so that they can be reusable
anywhere.

* Remove unnecessary arguments and make fxs more compact

* Update a few parameter arguments

* Add tests for print_span and get_span methods

* Update API to talk about span characteristics in brief

* Add better reporting of spans_length

* Add test for span length reporting

* Update formatting of span length report

Removed '' to indicate that it's not a string, then
sort the n-grams by their length, not by their frequency.

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Show all frequency distribution when -V

In this commit, I displayed the full frequency distribution of the
span lengths when --verbose is passed. To make things simpler, I
rewrote some of the formatter functions so that I can call them
whenever.

Another notable change is that instead of showing percentages as
Integers, I showed them as floats (max 2-decimal places). I did this
because it looks weird when it displays (0%).

* Update logic on how total is computed

The way the 90% thresholding is computed now is that we keep
adding the percentages until we reach >= 90%. I also updated the wording
and used the term "At least" to denote that >= 90% of your spans have
these distributions.

* Fix display when showing the threshold percentage

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add better phrasing for span information

* Update spacy/cli/debug_data.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add minor edits for whitespaces etc.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-05-23 19:06:38 +02:00
Raphael Mitsch
2904359685
Allow assets to be optional in spacy project (#10714)
* Allow assets to be optional in spacy project: draft for optional flag/download_all options.

* Allow assets to be optional in spacy project: added OPTIONAL_DEFAULT reflecting default asset optionality.

* Allow assets to be optional in spacy project: renamed --all to --extra.

* Allow assets to be optional in spacy project: included optional flag in project config test.

* Allow assets to be optional in spacy project: added documentation.

* Allow assets to be optional in spacy project: fixing deprecated --all reference.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Allow assets to be optional in spacy project: fixed project_assets() docstring.

* Allow assets to be optional in spacy project: adjusted wording in justification of optional assets.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: switched to  as keyword in project.yml. Updated docs.

* Allow assets to be optional in spacy project: updated comment.

* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in output.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in docstring..

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in test..

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in test.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: renamed OPTIONAL_DEFAULT to EXTRA_DEFAULT.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-05-10 10:40:11 +02:00
Lj Miranda
02dafa3a84
Add debug diff command in spaCy CLI (#10502)
* Add initial design for diff command

For now, the diffing process looks like this:
- The default config is created based from some values in the user
config (e.g. which pipeline components were used, the lang, etc.)
- The user must supply manually if it was optimized for acc/efficiency
and if pretraining was involved.

* Make diff command structure similar to siblings

* Include gpu as a user option for CLI

* Make variables more explicit

* Fix type declaration for optimize enum

* Improve docstrings for diff CLI

* Add debug-diff to website API docs

* Switch position of configs so that user config is modded

* Add markdown flag for debug diff

This commit adds a --markdown (--md) flag that allows easier
copy-pasting to Github issues. Please note that this commit is dependent
on an unreleased version of wasabi (for the time being).

For posterity, the related PR is found here: https://github.com/ines/wasabi/pull/20

* Bump version of wasabi to 0.9.1

So that we can use the add_symbols parameter.

* Apply suggestions from code review

Co-authored-by: Ines Montani <ines@ines.io>

* Update docs based on code review suggestions

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Change command name from diff -> diff-config

* Clarify when options are relevant or not

* Rerun prettier on cli.md

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-04-07 10:48:45 +02:00
Adriane Boyd
03762b4b92
Add spancat, trainable_lemmatizer to quickstart (#10524)
* Add `SPACY` and `IS_SPACE` as default `tok2vec` features
2022-04-01 09:01:04 +02:00
Adriane Boyd
e3ccc1973b
Provide debug data info for floret vectors (#10592) 2022-03-31 15:11:32 +02:00
Adriane Boyd
d85117f88c
Stream large assets on download (#10521)
Stream large assets on download rather than reading the whole file at
once and potentially running into `urllib3` limits on single read sizes.
2022-03-24 11:47:05 +01:00