spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-04-17 15:41:59 +03:00

Author	SHA1	Message	Date
Sofie Van Landeghem	c786e98e56	assemble CLI command (#7783 ) * assemble CLI command * ensure assemble runs even without training section * cleanup	2021-04-19 18:39:11 +10:00
Adriane Boyd	15bd230413	Set catalogue lower pin to v2.0.3 (#7762 ) * Set catalogue lower pin to v2.0.2 * Update importlib-metadata pins to match * Require catalogue v2.0.3 Switch to vendored `importlib-metadata` v3.2.0 provided by `catalogue`.	2021-04-19 18:37:17 +10:00
Adriane Boyd	1ad646cbcf	Improve checks for sourced components (#7490 ) * Improve checks for sourced components * Remove language class checks * Convert python warning to logger warning * Remove unused warning * Fix formatting	2021-04-19 18:36:32 +10:00
Sofie Van Landeghem	05bdbe28bb	Fix vectors data on GPU (#7626 ) * ensure vectors data is stored on right device * ensure the added vector is on the right device * move vector to numpy before iterating * move best_rows to numpy before iterating	2021-04-19 18:30:03 +10:00
Bram Vanroy	ed561cf428	Terminology: deprecated vs obsolete (#7621 ) * Terminology: deprecated vs obsolete Typically, deprecated is used for functionality that is bound to become unavailable but that can still be used. Obsolete is used for features that have been removed. In E941, I think what is meant is "obsolete" since loading a model by a shortcut simply does not work anymore (and throws an error). This is different from downloading a model with a shortcut, which is deprecated but still works. In light of this, perhaps all other error codes should be checked as well. * clarify that the link command is removed and not just deprecated Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>	2021-04-12 14:37:00 +02:00
Sofie Van Landeghem	8d7af5b2b1	Ensure hyphen in config file works as string value (#7642 ) * add test for serializing '-' in a config file * bump srsly to 2.4.1	2021-04-12 14:35:57 +02:00
Sofie Van Landeghem	27dbbb9903	Bugfix/nel crossing sentence (#7630 ) * ensure each entity gets a KB ID, even when it's not within a sentence * cleanup	2021-04-12 18:08:01 +10:00
Adriane Boyd	73a8c0f992	Update debug data further for v3 (#7602 ) * Update debug data further for v3 * Remove new/existing label distinction (new labels are not immediately distinguishable because the pipeline is already initialized) * Warn on missing labels in training data for all components except parser * Separate textcat and textcat_multilabel sections * Add section for morphologizer * Reword missing label warnings	2021-04-09 11:53:42 +02:00
Stanislav Schmidt	2516896849	Make vocab update in get_docs deterministic (#7603 ) * Make vocab update in get_docs deterministic The attribute `DocBin.strings` is a set. In `DocBin.get_docs` a given vocab is updated by iterating over this set. Iteration over a python set produces an arbitrary ordering, therefore vocab is updated non-deterministically. When training (fine-tuning) a spacy model, the base model's vocabulary will be updated with the new vocabulary in the training data in exactly the way described above. After serialization, the file `model/vocab/strings.json` will be sorted in an arbitrary way. This prevents reproducible model training. * Revert "Make vocab update in get_docs deterministic" This reverts commit `d6b87a2f55`. * Sort strings in StringStore serialization Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-04-09 11:53:13 +02:00
Adriane Boyd	8008e2f75b	Use morph hash in lemmatizer cache key (#7690 ) Use the morph hash rather than the `MorphAnalysis` object in the cache key so that the `Lemmatizer` can be pickled.	2021-04-08 13:22:38 +02:00
Adriane Boyd	e6b7600adf	Fix parser sourcing in NER converter (#7631 )	2021-04-08 12:25:03 +02:00
Sofie Van Landeghem	204c2f116b	Extend score_spans for overlapping & non-labeled spans (#7209 ) * extend span scorer with consider_label and allow_overlap * unit test for spans y2x overlap * add score_spans unit test * docs for new fields in scorer.score_spans * rename to include_label * spell out if-else for clarity * rename to 'labeled' Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-04-08 12:19:17 +02:00
Paul O'Leary McCann	c362006cb9	Fix is_sent_start when converting from JSON (fix #7635 ) (#7655 ) Data in the JSON format is split into sentences, and each sentence is saved with is_sent_start flags. Currently the flags are 1 for the first token and 0 for the others. When deserialized this results in a pattern of True, None, None, None... which makes single-sentence documents look as though they haven't had sentence boundaries set. Since items saved in JSON format have been split into sentences already, the is_sent_start values should all be True or False.	2021-04-08 18:24:52 +10:00
Adriane Boyd	82d3caf861	Implement replace_listeners for source in config (#7620 ) Implement replace_listeners for sourced components loaded from a config.	2021-04-08 18:21:22 +10:00
broaddeep	ee159b8543	Support match alignments (#7321 ) * Support match alignments * change naming from match_alignments to with_alignments, add conditional flow if with_alignments is given, validate with_alignments, add related test case * remove added errors, utilize bint type, cleanup whitespace * fix no new line in end of file * Minor formatting * Skip alignments processing if as_spans is set * Add with_alignments to Matcher API docs * Update website/docs/api/matcher.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-04-08 18:10:14 +10:00
Adriane Boyd	ff84075839	Support large/infinite training corpora (#7208 ) * Support infinite generators for training corpora Support a training corpus with an infinite generator in the `spacy train` training loop: * Revert `create_train_batches` to the state where an infinite generator can be used as the in the first epoch of exactly one epoch without resulting in a memory leak (`max_epochs != 1` will still result in a memory leak) * Move the shuffling for the first epoch into the corpus reader, renaming it to `spacy.Corpus.v2`. * Switch to training option for shuffling in memory Training loop: * Add option `training.shuffle_train_corpus_in_memory` that controls whether the corpus is loaded in memory once and shuffled in the training loop * Revert changes to `create_train_batches` and rename to `create_train_batches_with_shuffling` for use with `spacy.Corpus.v1` and a corpus that should be loaded in memory * Add `create_train_batches_without_shuffling` for a corpus that should not be shuffled in the training loop: the corpus is merely batched during training Corpus readers: * Restore `spacy.Corpus.v1` * Add `spacy.ShuffledCorpus.v1` for a corpus shuffled in memory in the reader instead of the training loop * In combination with `shuffle_train_corpus_in_memory = False`, each epoch could result in a different augmentation * Refactor create_train_batches, validation * Rename config setting to `training.shuffle_train_corpus` * Refactor to use a single `create_train_batches` method with a `shuffle` option * Only validate `get_examples` in initialize step if: * labels are required * labels are not provided * Switch back to max_epochs=-1 for streaming train corpus * Use first 100 examples for stream train corpus init * Always check validate_get_examples in initialize	2021-04-08 18:08:04 +10:00
graue70	81fd595223	Fix __add__ method of PRFScore (#7557 ) * Add failing test for PRFScore * Fix erroneous implementation of __add__ * Simplify constructor Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-04-08 17:34:14 +10:00
Paul O'Leary McCann	7944761ba7	Add warning if initial vectors are empty (#7641 ) See #7637, where this came up.	2021-04-04 20:20:24 +02:00
Ayush Chaurasia	3c2ce41dd8	W&B integration: Optional support for dataset and model checkpoint logging and versioning (#7429 ) * Add optional artifacts logging * Update docs * Update spacy/training/loggers.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/training/loggers.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/training/loggers.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Bump WandbLogger Version * Add documentation of v1 to legacy docs * bump spacy-legacy to 3.0.2 (to be released) Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>	2021-04-01 19:36:23 +02:00
Adriane Boyd	348d1829c7	Preserve user data for DependencyMatcher on spans (#7528 ) * Preserve user data for DependencyMatcher on spans * Clean underscore in test * Modify test to use extensions stored in user data	2021-03-30 12:26:22 +02:00
m0canu1	921feee092	Added more exception to the italian language from https://forum.wordr … (#7246 ) * Added more exception to the italian language from https://forum.wordreference.com/threads/le-abbreviazioni-nella-lingua-italiana-abbreviations-in-italian.2464189/ * Remove unnecessary exception Co-authored-by: Alexandru Mocanu <alexandru.mocanu@augeos.it> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-03-30 10:23:32 +02:00
Adriane Boyd	27a48f2802	Fix/update extension copying in Span.as_doc and Doc.from_docs (#7574 ) * Adjust custom extension data when copying user data in `Span.as_doc()` * Restrict `Doc.from_docs()` to adjusting offsets for custom extension data * Update test to use extension * (Duplicate bug fix for character offset from #7497)	2021-03-30 09:49:12 +02:00
Adriane Boyd	3ae8661085	Fix tensor retokenization for non-numpy ops (#7527 ) Implement manual `append` and `delete` for non-numpy ops.	2021-03-29 22:34:48 +11:00
Adriane Boyd	139f655f34	Merge doc.spans in Doc.from_docs() (#7497 ) Merge data from `doc.spans` in `Doc.from_docs()`. * Fix internal character offset set when merging empty docs (only affects tokens and spans in `user_data` if an empty doc is in the list of docs)	2021-03-29 22:34:01 +11:00
Adriane Boyd	d59f968d08	Keep sent starts without parse in retokenization (#7424 ) In the retokenizer, only reset sent starts (with `set_children_from_head`) if the doc is parsed. If there is no parse, merged tokens have the unset `token.is_sent_start == None` by default after retokenization.	2021-03-29 22:32:00 +11:00
Paul O'Leary McCann	cdab341a75	Remove mention of -1 for early stopping (fix #7535 ) Maybe this used to work differently, but currently a negative patience just causes immediate termination.	2021-03-23 11:50:35 +09:00
Ines Montani	4bd3d01aaf	Merge pull request #7471 from polm/fix/listener-warnings	2021-03-22 12:45:02 +01:00
Ines Montani	d545ab4ca4	Merge pull request #7495 from adrianeboyd/bugfix/norm-ux Update lexeme_norm checks	2021-03-22 12:44:52 +01:00
Ines Montani	66ebd5c69e	Merge pull request #7491 from adrianeboyd/bugfix/corpus-depr-props Update deprecated doc.is_sentenced in Corpus	2021-03-21 02:17:24 +01:00
Adriane Boyd	39153ef90f	Update lexeme_norm checks * Add util method for check * Add new languages to list with lexeme norm tables * Add check to all relevant components * Add config details to warning message Note that we're not actually inspecting the model config to see if `NORM` is used as an attribute, so it may warn in cases where it's not relevant.	2021-03-19 10:59:27 +01:00
Adriane Boyd	c771ec22f0	Update matcher errors and docs * Mention `tagger+attribute_ruler` in `POS`/`MORPH` error messages for `Matcher` and `PhraseMatcher` * Document `Matcher.__call__(allow_missing=)`	2021-03-19 10:11:18 +01:00
Adriane Boyd	48b90c8e1c	Update deprecated doc.is_sentenced in Corpus	2021-03-19 09:43:52 +01:00
Ines Montani	4f9aaa2366	Merge pull request #7451 from adrianeboyd/chore/add-py.typed Add py.typed	2021-03-19 02:08:16 +01:00
Ines Montani	66b900a76d	Merge pull request #7440 from adrianeboyd/bugfix/ru-pymorph2-lookup-lemmatize Rename and update Russian pymorphy2 lookup lemmatize	2021-03-19 01:54:08 +01:00
Ines Montani	2c6fa8c890	Merge pull request #7489 from adrianeboyd/bugfix/callbacks-entry-points Check for callbacks entry points	2021-03-19 01:53:53 +01:00
Adriane Boyd	0ad9e16ec3	Check for callbacks entry points	2021-03-18 21:18:25 +01:00
Lukas Winkler	3c362ac520	replace "is not" with !=	2021-03-18 21:09:11 +01:00
Paul O'Leary McCann	40bc01e668	Proactively remove unused listeners With this the changes in initialize.py might be unnecessary. Requires testing.	2021-03-17 22:41:41 +09:00
Paul O'Leary McCann	ef77c88638	Don't warn about components not in the pipeline See here: https://github.com/explosion/spaCy/discussions/7463 Still need to check if there are any side effects of listeners being present but not in the pipeline, but this commit will silence the warnings.	2021-03-17 14:56:04 +09:00
Adriane Boyd	02b5c8a1a2	Add py.typed	2021-03-16 09:48:31 +01:00
Adriane Boyd	3bcf74aca7	Rename and update ru pymorphy2 lookup lemmatize * To allow default lookup lemmatization with a blank Russian model, rename pymorphy2 lookup mode to `pymorphy2_lookup` * Bug fix: update pymorphy2 lookup lemmatize to return list rather than string	2021-03-15 11:11:06 +01:00
Ines Montani	068b97a617	Merge pull request #7408 from adrianeboyd/bugfix/load-keyword-only	2021-03-13 04:25:50 +01:00
Adriane Boyd	03e9e7b567	Add --code option to init fill-config	2021-03-12 10:03:57 +01:00
Adriane Boyd	ce6317231f	Add --code to spacy debug CLI	2021-03-12 09:51:26 +01:00
Adriane Boyd	508cb3bef7	Also exclude user hooks in displacy conversion (#7419 )	2021-03-12 09:41:59 +01:00
Adriane Boyd	deffc3a532	Update package requirements tests (#7409 ) * Add hypothesis to packages skipped in version check * Add numpy back to tests following `2df1ab8a`	2021-03-11 16:24:31 +01:00
Adriane Boyd	124304b146	Add vocab kwarg back to spacy.load * Additional minor formatting and docs cleanup	2021-03-11 10:58:59 +01:00
Adriane Boyd	fbf3a755d7	Make spacy.load kwargs keyword-only	2021-03-11 09:36:58 +01:00
Adriane Boyd	53a3b967ac	Update thinc pin and set version to v3.0.5 (#7389 )	2021-03-10 11:10:53 +01:00
Adriane Boyd	3b911ee5ef	Set version to v3.0.4 (#7376 )	2021-03-09 16:49:41 +01:00
Adriane Boyd	d746ea6278	Add warning about GPU selection in Jupyter notebooks (#7075 ) * Initial warning * Update check * Redo edit * Move jupyter warning to helper method * Add link with details to warnings	2021-03-09 15:35:21 +01:00
Ines Montani	37fc495f5d	Merge pull request #7353 from jankrepl/fix_entity_rules_labels	2021-03-09 15:09:24 +01:00
Sofie Van Landeghem	932887b950	textcat scoring fix and multi_label docs (#6974 ) * add multi-label textcat to menu * add infobox on textcat API * add info to v3 migration guide * small edits * further fixes in doc strings * add infobox to textcat architectures * add textcat_multilabel to overview of built-in components * spelling * fix unrelated warn msg * Add textcat_multilabel to quickstart [ci skip] * remove separate documentation page for multilabel_textcategorizer * small edits * positive label clarification * avoid duplicating information in self.cfg and fix textcat.score * fix multilabel textcat too * revert threshold to storage in cfg * revert threshold stuff for multi-textcat Co-authored-by: Ines Montani <ines@ines.io>	2021-03-09 23:04:22 +11:00
Sofie Van Landeghem	39de3602e0	return custom error in nlp.initialize (#7104 ) * return custom error in nlp.initialize * Rename error Co-authored-by: Ines Montani <ines@ines.io>	2021-03-09 23:01:31 +11:00
Jan Krepl	f26b61e001	Make sure sorted	2021-03-09 10:49:53 +01:00
Adriane Boyd	3f3e8110dc	Fix lowercase augmentation (#7336 ) * Fix aborted/skipped augmentation for `spacy.orth_variants.v1` if lowercasing was enabled for an example * Simplify `spacy.orth_variants.v1` for `Example` vs. `GoldParse` * Preserve reference tokenization in `spacy.lower_case.v1`	2021-03-09 14:02:32 +11:00
Sofie Van Landeghem	cd70c3cb79	Fixing pretrain (#7342 ) * initialize NLP with train corpus * add more pretraining tests * more tests * function to fetch tok2vec layer for pretraining * clarify parameter name * test different objectives * formatting * fix check for static vectors when using vectors objective * clarify docs * logger statement * fix init_tok2vec and proc.initialize order * test training after pretraining * add init_config tests for pretraining * pop pretraining block to avoid config validation errors * custom errors	2021-03-09 14:01:13 +11:00
Adriane Boyd	97bcf2ae3a	Fix patience for identical scores (#7250 ) * Fix patience for identical scores Fix training patience so that the earliest best step is chosen for identical max scores. * Restore break, remove print * Explicitly define best_step for clarity	2021-03-06 18:42:14 +11:00
Ines Montani	ea555b03e0	Merge pull request #7255 from adrianeboyd/bugfix/extraneous-tok2vec Omit unused tok2vec/transformer components	2021-03-03 23:15:06 +11:00
svlandeg	d900c55061	consistently use registry as callable	2021-03-02 17:56:28 +01:00
Adriane Boyd	8a4200d4e9	Omit unused tok2vec/transformer components Omit unused tok2vec/transformer components in quickstart template.	2021-03-02 15:53:30 +01:00
Sofie Van Landeghem	212f0e779e	Support doc.spans in Example.from_dict (#7197 ) * add support for spans in Example.from_dict * add unit tests * update error to E879	2021-03-03 01:12:54 +11:00
Adriane Boyd	fb98862337	Add hint for --gpu-id to CLI device info (#7234 ) * Add hint for --gpu-id to CLI device info If the user has `cupy` and an available GPU, add a hint about using `--gpu-id 0` to the CLI output. * Undo change to original CPU message	2021-03-03 01:11:18 +11:00
Ines Montani	635ae55b74	Merge pull request #7237 from adrianeboyd/bugfix/is-cython-func-7224	2021-03-03 00:05:16 +11:00
Adriane Boyd	0efb7413f9	Use make_tempdir instead	2021-03-01 17:54:14 +01:00
Adriane Boyd	e9f7f9a4bc	Fix is_cython_func for additional imported code * Fix `is_cython_func` for imported code loaded under `python_code` module name * Add `make_named_tempfile` context manager to test utils to test loading of imported code * Add test for validation of `initialize` params in custom module	2021-03-01 16:37:39 +01:00
Sofie Van Landeghem	dd99872bb0	Fix spans weak ref in doc copy (#7225 ) * failing unit test * ensure that doc.spans refers to the copied doc, not the old * add type info	2021-02-28 12:32:48 +11:00
Ines Montani	408b94887a	Merge pull request #7207 from adrianeboyd/docs/get-noun-chunks [ci skip] Extend docs related to Vocab.get_noun_chunks	2021-02-27 11:51:08 +11:00
Ines Montani	dc46fa078f	Merge pull request #7220 from svlandeg/docs/has_annotation [ci skip] has_annotation docs fix	2021-02-27 11:50:34 +11:00
Ines Montani	0dbc2a1b16	Merge pull request #7222 from adrianeboyd/bugfix/quickstart-recs-bg-bn Fix formatting in bg/bn quickstart recs	2021-02-27 11:50:02 +11:00
svlandeg	2010219a7f	import wandb failure - UX	2021-02-26 18:00:39 +01:00
Adriane Boyd	ee7bb0b393	Fix formatting in bg/bn quickstart recs	2021-02-26 17:08:37 +01:00
svlandeg	248339039e	fix type in docs	2021-02-26 14:27:10 +01:00
Adriane Boyd	e43d43db32	Allow sourcing disabled components (#7215 ) Check `component_names` instead of `pipe_names` to allow sourcing disabled components.	2021-02-26 13:50:56 +01:00
Adriane Boyd	10c930cc96	Re-refactor Sentencizer with Pipe API (#7176 ) Reapply the refactoring (#4721) so that `Sentencizer` uses the faster `predict` and `set_annotations` for both `__call__` and `pipe`.	2021-02-26 09:48:14 +01:00
Adriane Boyd	6a37f343d5	Extend docs related to Vocab.get_noun_chunks	2021-02-25 16:38:21 +01:00
Ines Montani	592678fb7d	Merge pull request #7073 from adrianeboyd/feature/logger-level-in-formatter Add time and level to default logging formatter	2021-02-24 22:40:46 +11:00
Sofie Van Landeghem	0563cd73d6	Fix SpanGroup import (#7182 ) * import SpanGroup from tokens module * revert edits from different PR * add to __all__	2021-02-24 21:06:16 +11:00
Sofie Van Landeghem	b92f81d5da	fix NEL config and IO, and n_sents functionality (#7100 ) * fix NEL config and IO, and n_sents functionality * add docs * fix test	2021-02-22 14:49:52 +11:00
Sofie Van Landeghem	113e8d082b	only evaluate named entities for NEL if there is a corresponding gold span (#7074 )	2021-02-22 11:06:50 +11:00
Adriane Boyd	264862c67a	Fix Ukrainian lemmatizer init (#7127 ) Fix class variable and init for `UkrainianLemmatizer` so that it loads the `uk` dictionaries rather than having the parent `RussianLemmatizer` override with the `ru` settings.	2021-02-22 11:05:08 +11:00
Sofie Van Landeghem	ba5a50f62b	NEL docs & UX (#7129 ) * EL set_kb docs fix * custom warning for set_kb mistake	2021-02-22 11:04:22 +11:00
Boian Tzonev	cca8651fc8	Bulgarian tokenizer exceptions (#7114 ) * [Bulgarian] Add tokenizer exceptions and like_num for Bulgarian * [Bulgarian] Add tokenizer exceptions and like_num for Bulgarian	2021-02-19 19:19:19 +01:00
Sofie Van Landeghem	709c9e75af	span.ent only returns first sentence (#7084 ) * return first sentence when span contains sentence boundary * docs fix * small fixes * cleanup	2021-02-19 23:02:38 +11:00
Adriane Boyd	30e1a89aeb	Fix displacy output in evaluate CLI (#7122 ) Now that `nlp.evaluate()` does not modify the examples, rerun the pipeline on the (limited) texts in order to provide the predicted annotation in the displacy output option.	2021-02-19 23:01:20 +11:00
Adriane Boyd	4188beda87	Fix conll converter option (#7071 ) Map `conll` to the NER converter, not the `CoNLL-U` converter.	2021-02-18 10:22:41 +01:00
Adriane Boyd	a3293efc48	Add time and level to default logging formatter	2021-02-15 14:19:20 +01:00
Ines Montani	1e3a326e53	Change Dutch transformer recommendation [ci skip] https://github.com/explosion/spaCy/discussions/6529#discussioncomment-366620	2021-02-14 15:30:16 +11:00
Ines Montani	f4f46b617f	Preserve sourced components in fill-config (fixes #7055 ) (#7058 )	2021-02-14 14:02:14 +11:00
Matthew Honnibal	0fb8d437c0	Fix sentence fragments bug (#7056 , #7035 ) (#7057 ) * Add test for #7035 * Update test for issue 7056 * Fix test * Fix transitions method used in testing * Fix state eol detection when rebuffer * Clean up redundant fix	2021-02-14 13:38:13 +11:00
Ines Montani	660642902a	Increment version [ci skip]	2021-02-14 13:36:13 +11:00
Matthew Honnibal	b31471b5b8	Set version to v3.0.2	2021-02-13 23:50:00 +11:00
Ines Montani	9ba715ed16	Tidy up and auto-format	2021-02-13 12:55:56 +11:00
Ines Montani	34ee0fbd70	Merge pull request #7011 from Shumie82/master	2021-02-13 12:30:42 +11:00
Ines Montani	e583050547	Merge pull request #7039 from svlandeg/debug	2021-02-13 11:53:41 +11:00
Ines Montani	6c450decfc	Fix punctuation settings and add to initialize tests	2021-02-13 11:51:21 +11:00
Ines Montani	f4712a634e	Merge pull request #7046 from adrianeboyd/bugfix/vocab-pickle-noun-chunks-6891 Include noun chunks method when pickling Vocab	2021-02-13 11:43:03 +11:00
Adriane Boyd	0ee2ae86bf	Update trf quickstart recommendations Add/update trf recommendations for Bengali, Hindi, Sinhala, and Tamil based on #7044.	2021-02-12 15:55:17 +01:00
svlandeg	03b4ec7d7f	fix typo	2021-02-12 14:30:16 +01:00
Adriane Boyd	5e47a54d29	Include noun chunks method when pickling Vocab	2021-02-12 13:27:46 +01:00
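
Commit 2516896849 above observes that iterating over a Python set yields an arbitrary order, so a vocab updated from `DocBin.strings` ends up serialized in a non-reproducible order; the fix that landed was to sort strings in `StringStore` serialization. The sketch below illustrates only the general idea with plain Python containers. It does not use spaCy's actual `StringStore` code, and the helper name `serialize_strings` is hypothetical.

```python
import json


def serialize_strings(strings):
    """Hypothetical helper: dump a collection of strings as a JSON list.

    Sorting before dumping makes the output independent of the
    iteration order of the input collection (e.g. a set).
    """
    return json.dumps(sorted(strings))


# Two sets with the same members, built in a different order.
first = serialize_strings({"apple", "pear", "fig"})
second = serialize_strings({"fig", "apple", "pear"})

# Both serializations are identical because the strings were sorted.
print(first == second)
```

Sorting at serialization time, rather than at every vocab update, keeps the in-memory structure unchanged and only constrains the on-disk order, which is what reproducible training output depends on.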


8658 Commits