spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-01-08 09:41:11 +03:00

Author	SHA1	Message	Date
Ines Montani	6cfa66ed1c	Make training.loop return nlp object and path (#6520 )	2020-12-08 14:55:55 +08:00
Sofie Van Landeghem	2c27093c5f	require_cpu functionality (#6336 ) * add require_cpu from Thinc 8.0.0rc2 * add docs * fix test if cupy is not installed	2020-12-08 14:42:40 +08:00
Ines Montani	d8e01ca931	Merge pull request #6391 from adrianeboyd/docs/install-guide	2020-12-08 07:42:16 +01:00
Sofie Van Landeghem	f98a04434a	pretrain architectures (#6451 ) * define new architectures for the pretraining objective * add loss function as attr of the model * cleanup * cleanup * shorten name * fix typo * remove unused error	2020-12-08 14:41:03 +08:00
Adriane Boyd	29b058ebdc	Fix spacy when retokenizing cases with affixes (#6475 ) Preserve `token.spacy` corresponding to the span end token in the original doc rather than adjusting for the current offset. * If not modifying in place, this checks in the original document (`doc.c` rather than `tokens`). * If modifying in place, the document has not been modified past the current span start position so the value at the current span end position is valid.	2020-12-08 14:25:56 +08:00
Adriane Boyd	4448680750	Fix alignment for 1-to-1 tokens and lowercasing (#6476 ) * When checking for token alignments, check not only that the tokens are identical but that the character positions are both at the start of a token. It's possible for the tokens to be identical even though the two tokens aren't aligned one-to-one in a case like `["a'", "''"]` vs. `["a", "''", "'"]`, where the middle tokens are identical but should not be aligned on the token level at character position 2 since it's the start of one token but the middle of another. * Use the lowercased version of the token texts to create the character-to-token alignment because lowercasing can change the string length (e.g., for `İ`, see the not-a-bug bug report: https://bugs.python.org/issue34723)	2020-12-08 14:25:16 +08:00
Ines Montani	ee2ec52f48	Merge pull request #6409 from svlandeg/feature/trf-docs	2020-12-08 06:32:10 +01:00
Ines Montani	c2b196c2c1	Merge pull request #6419 from svlandeg/feature/rel-docs	2020-12-08 06:30:41 +01:00
Ines Montani	82e88f0e3b	Merge pull request #6379 from svlandeg/fix/labels-constructor	2020-12-08 06:29:56 +01:00
Adriane Boyd	78085fab1f	Check for spacy-nightly package in download (#6502 ) Also check for spacy-nightly in download so that `--no-deps` isn't set for normal nightly installs.	2020-12-04 09:40:03 +01:00
Ines Montani	63f83e7034	Merge pull request #6470 from adrianeboyd/feature/license-in-package	2020-12-04 03:55:54 +01:00
Sofie Van Landeghem	d6c616a125	Fixes in test suite (#6457 ) * fix slow test for textcat readers * cleanup test_issue5551 * add explicit score weight * cleanup	2020-12-02 12:57:08 +01:00
Adriane Boyd	31ec9a906e	Clean up 3rd party license info (#6478 ) Move scikit-learn license from `Scorer` to `licenses/3rd_party_licenses.txt`.	2020-12-02 10:15:23 +01:00
Adriane Boyd	591cd48aa8	Remove config.cfg from MANIFEST	2020-12-01 12:58:02 +01:00
Adriane Boyd	b0dd13e0ba	Support LICENSE in spacy package If present, include the file `input_dir/LICENSE` at the top level of the packaged model.	2020-11-30 13:43:58 +01:00
Adriane Boyd	1442d2f213	Improve simple training example in v3 migration (#6438 ) * Create the examples once * Use the examples in the initialization * Provide the batch size * Fix `begin_training` migration example	2020-11-30 09:39:45 +08:00
Sofie Van Landeghem	079f6ea474	avoid resolving the full config (#6465 )	2020-11-30 09:34:29 +08:00
Ines Montani	9beba7164f	Make jinja2 top-level import No problem anymore since it's now an official dependency	2020-11-27 15:17:14 +08:00
Ines Montani	d21d2c2e59	Don't multiply accuracy by 100	2020-11-27 15:15:51 +08:00
Adriane Boyd	26296ab223	Add error message if DocBin zlib decompress fails (#6394 ) Add a better error message if DocBin zlib decompress fails, indicating that the data is not in `DocBin` format.	2020-11-27 14:39:49 +08:00
Adriane Boyd	6f133877aa	Update source install instructions * Don't recommend an editable install in the default source instructions. * Use `pip install --no-build-isolation` for editable installs. * Remove reference to `virtualenv`.	2020-11-24 14:44:13 +01:00
svlandeg	218abaa69a	typo	2020-11-20 22:36:49 +01:00
svlandeg	e861e928df	more small corrections	2020-11-20 22:29:58 +01:00
svlandeg	5ac0867427	final fixes	2020-11-20 22:18:53 +01:00
svlandeg	331ec83493	edits and updates to implementing REL component docs	2020-11-20 21:41:52 +01:00
svlandeg	4a3e611abc	small fixes and formatting	2020-11-20 15:55:05 +01:00
svlandeg	124f49feb6	update REL model code	2020-11-20 15:25:20 +01:00
svlandeg	636be3c791	Merge remote-tracking branch 'upstream/develop' into feature/trf-docs	2020-11-19 14:15:35 +01:00
Sofie Van Landeghem	165993d8e5	fix typo in transformer docs (#6404 )	2020-11-19 14:11:38 +01:00
Adriane Boyd	96726ec1f6	Fix DocBin init in training example (#6396 )	2020-11-17 14:36:44 +01:00
Adriane Boyd	ed32fa80cd	Update source install instructions * Use `pip install` instead of `python setup.py install` * For developers recommend: * `python setup.py build_ext --inplace -j N` * `python setup.py develop`	2020-11-16 10:13:51 +01:00
svlandeg	99d0412b6e	add link to REL project	2020-11-15 18:35:56 +01:00
svlandeg	73fc1ed963	remove labels from morphologizer constructor	2020-11-11 21:48:50 +01:00
svlandeg	d5a920325f	remove labels from constructor	2020-11-11 21:34:12 +01:00
svlandeg	fcd79e0655	remove set_morphology from docs	2020-11-11 21:32:34 +01:00
Adriane Boyd	a7e7d6c6c9	Ignore misaligned in Morphologizer.get_loss (#6363 ) Fix bug where `Morphologizer.get_loss` treated misaligned annotation as `EMPTY_MORPH` rather than ignoring it. Remove unneeded default `EMPTY_MORPH` mappings.	2020-11-10 20:15:09 +08:00
Sofie Van Landeghem	a0c899a0ff	Fix textcat + transformer architecture (#6371 ) * add pooling to textcat TransformerListener * maybe_get_dim in case it's null	2020-11-10 20:14:47 +08:00
Ines Montani	3ca5c7082d	Use pip install . in quickstart [ci skip]	2020-11-10 17:27:49 +08:00
Ines Montani	de6453940e	Merge pull request #6305 from svlandeg/feature/score-docs [ci skip]	2020-11-10 02:52:11 +01:00
Ines Montani	d7950c5ada	Merge pull request #6297 from adrianeboyd/docs/nightly-conda-install [ci skip]	2020-11-10 02:45:52 +01:00
Ines Montani	448bfbdc30	Remove conda from nightly install widget [ci skip]	2020-11-10 09:44:52 +08:00
svlandeg	789fb3d124	add docs for upstream argument of TransformerListener	2020-11-09 21:42:58 +01:00
Ines Montani	363ac73c72	Update docs [ci skip]	2020-11-09 12:43:26 +08:00
Sofie Van Landeghem	8ef056cf98	fix embed_size in Entity Linker architecture (#6343 )	2020-11-04 22:20:13 +01:00
Ines Montani	019a1dd5e8	Fix v3 overview [ci skip]	2020-11-03 18:10:06 +01:00
Adriane Boyd	1c4df8fd09	Replace pytokenizations with internal alignment (#6293 ) * Replace pytokenizations with internal alignment Replace pytokenizations with internal alignment algorithm that is restricted to only allow differences in whitespace and capitalization. * Rename `spacy.training.align` to `spacy.training.alignment` to contain the `Alignment` dataclass * Implement `get_alignments` in `spacy.training.align` * Refactor trailing whitespace handling * Remove unnecessary exception for empty docs Allow a non-empty whitespace-only doc to be aligned with an empty doc * Remove empty docs exceptions completely	2020-11-03 16:24:38 +01:00
Adriane Boyd	a4b32b9552	Handle missing reference values in scorer (#6286 ) * Handle missing reference values in scorer Handle missing values in reference doc during scoring where it is possible to detect an unset state for the attribute. If no reference docs contain annotation, `None` is returned instead of a score. `spacy evaluate` displays `-` for missing scores and the missing scores are saved as `None`/`null` in the metrics. Attributes without unset states: * `token.head`: relies on `token.dep` to recognize unset values * `doc.cats`: unable to handle missing annotation Additional changes: * add optional `has_annotation` check to `score_spans` to replace `doc.sents` hack * update `score_token_attr_per_feat` to handle missing and empty morph representations * fix bug in `Doc.has_annotation` for normalization of `IS_SENT_START` vs. `SENT_START` * Fix import * Update return types	2020-11-03 15:47:18 +01:00
Adriane Boyd	5d2cb86c34	Fix on_match callback for DependencyMatcher (#6313 ) Fix `DependencyMatcher` so that the callback is called only once per match.	2020-10-31 12:20:27 +01:00
Sofie Van Landeghem	2918923541	fix resolving of dot notation (#6326 )	2020-10-31 12:17:06 +01:00
Adriane Boyd	dc816bba9d	Fix node name typo in dependency matcher example (#6311 )	2020-10-28 16:32:46 +01:00


13813 Commits