spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-16 14:47:16 +03:00

Author	SHA1	Message	Date
Ines Montani	de11ea753a	Merge branch 'master' into develop	2020-02-18 14:47:23 +01:00
Kabir Khan	f6ed07b85c	Use nlp.pipe in EntityRuler for phrase patterns in add_patterns (#4931) * Fix ent_ids and labels properties when id attribute used in patterns * use set for labels * sort end_ids for comparison in entity_ruler tests * fixing entity_ruler ent_ids test * add to set * Run make_doc optimistically if using phrase matcher patterns. * remove unused coveragerc I was testing with * format * Refactor EntityRuler.add_patterns to use nlp.pipe for phrase patterns. Improves speed substantially. * Removing old add_patterns function * Fixing spacing * Make sure token_patterns loaded as well, before generator was being emptied in from_disk	2020-02-16 18:17:47 +01:00
Julin S	479e81bafc	fix link (#4977)	2020-02-10 20:31:26 -05:00
Ines Montani	9c08d9baa3	Remove old sections [ci skip] (closes #4961)	2020-02-03 13:10:46 +01:00
Ines Montani	abd5c06374	Adjust formatting [ci skip]	2020-02-03 13:00:02 +01:00
Martin A. Kayser	02a44c5be2	Adding a note on retrieving the string rep of the match_id (#4904) Stolen from here: https://stackoverflow.com/questions/47638877/using-phrasematcher-in-spacy-to-find-multiple-match-types	2020-02-03 12:58:58 +01:00
adrianeboyd	7ad000fce7	Update docs for train CLI --use_gpu option (#4927)	2020-01-20 17:02:47 +01:00
Preston Badeer	b216ff43c9	Update vectors-similarity.md (#4889) These links are broken on the website, due to quotes around the URLs.	2020-01-08 16:49:40 +01:00
Geoffrey Gordon Ashbrook	53929138d7	remove extra word typo (#4875) "let you find you"	2020-01-06 12:37:42 +01:00
Ines Montani	400257a802	Update index.md [ci skip]	2020-01-04 01:52:18 +01:00
Ivan Echevarria	ef13e0c038	Add n_process to Language.pipe documentation (#4842) [ci skip] * Add n_process to documentation * Auto-format and add default [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2019-12-29 14:23:33 +01:00
Ines Montani	db55577c45	Drop Python 2.7 and 3.5 (#4828) * Remove unicode declarations * Remove Python 3.5 and 2.7 from CI * Don't require pathlib * Replace compat helpers * Remove OrderedDict * Use f-strings * Set Cython compiler language level * Fix typo * Re-add OrderedDict for Table * Update setup.cfg * Revert CONTRIBUTING.md * Revert lookups.md * Revert top-level.md * Small adjustments and docs [ci skip]	2019-12-22 01:53:56 +01:00
Ines Montani	158b98a3ef	Merge branch 'master' into develop	2019-12-21 18:55:03 +01:00
Ines Montani	1b838d1313	Divide models into core and starters [ci skip]	2019-12-21 14:10:22 +01:00
Sofie Van Landeghem	8ebbb85117	Documentation for PhraseMatcher constructor (#4826) * add max_length as argument for init PhraseMatcher * improve error message too	2019-12-20 23:00:04 +01:00
Thiago Lages de Alencar	a067ded495	Update doc.md (#4796)	2019-12-11 18:21:40 +01:00
Tclack88	ab8dc2732c	Update token.md (#4767) * Update token.md documentation is confusing: A '?' is a right punct, but '¿' is a left punct * Update token.md add quotations around parentheses in `is_left_punct` and `is_right_punct` for clarification, ensuring the question mark that follows is not perceived as an example of left and right punctuation * Move quotes into code block [ci skip]	2019-12-06 19:22:02 +01:00
Ines Montani	bf611ebca7	Document jsonl option on converter [ci skip]	2019-12-06 19:17:45 +01:00
Nicolai Bjerre Pedersen	de5453cdcb	Fix link to user hooks in docs (#4778) * Fix link to user hooks in docs * Update mr_bjerre.md Mistake in contributor agreement * Apparently hard to get it right (wrong name of sca)	2019-12-06 19:17:12 +01:00
Ines Montani	cbacb0f1a4	Update shape docs and examples (resolves #4615) [ci skip]	2019-11-23 17:16:55 +01:00
Ines Montani	a6200bc424	Update scorer.md [ci skip]	2019-11-21 17:02:43 +01:00
Ines Montani	235fe6fe3b	Auto-format [ci skip]	2019-11-20 13:14:58 +01:00
adrianeboyd	2c876eb672	Add tokenizer explain() debugging method (#4596 ) * Expose tokenizer rules as a property Expose the tokenizer rules property in the same way as the other core properties. (The cache resetting is overkill, but consistent with `from_bytes` for now.) Add tests and update Tokenizer API docs. * Update Hungarian punctuation to remove empty string Update Hungarian punctuation definitions so that `_units` does not match an empty string. * Use _load_special_tokenization consistently Use `_load_special_tokenization()` and have it to handle `None` checks. * Fix precedence of `token_match` vs. special cases Remove `token_match` check from `_split_affixes()` so that special cases have precedence over `token_match`. `token_match` is checked only before infixes are split. * Add `make_debug_doc()` to the Tokenizer Add `make_debug_doc()` to the Tokenizer as a working implementation of the pseudo-code in the docs. Add a test (marked as slow) that checks that `nlp.tokenizer()` and `nlp.tokenizer.make_debug_doc()` return the same non-whitespace tokens for all languages that have `examples.sentences` that can be imported. * Update tokenization usage docs Update pseudo-code and algorithm description to correspond to `nlp.tokenizer.make_debug_doc()` with example debugging usage. Add more examples for customizing tokenizers while preserving the existing defaults. Minor edits / clarifications. * Revert "Update Hungarian punctuation to remove empty string" This reverts commit `f0a577f7a5`. * Rework `make_debug_doc()` as `explain()` Rework `make_debug_doc()` as `explain()`, which returns a list of `(pattern_string, token_string)` tuples rather than a non-standard `Doc`. Update docs and tests accordingly, leaving the visualization for future work. * Handle cases with bad tokenizer patterns Detect when tokenizer patterns match empty prefixes and suffixes so that `explain()` does not hang on bad patterns. 
* Remove unused displacy image * Add tokenizer.explain() to usage docs	2019-11-20 13:07:25 +01:00
Ines Montani	e8b9cee6fd	Make example consistent with model (closes #4587) [ci skip]	2019-11-18 12:41:48 +01:00
Ines Montani	e01a1a237f	Auto-format [ci skip]	2019-11-18 12:41:31 +01:00
adrianeboyd	62e00fd9da	Update tokenization usage docs (#4666) Update pseudo-code and algorithm description to correspond to current tokenizer behavior. Add more examples for customizing tokenizers while preserving the existing defaults. Minor edits / clarifications.	2019-11-18 12:35:13 +01:00
Ines Montani	5adcb352e9	Adjust order of docs sections [ci skip]	2019-11-17 16:08:56 +01:00
Ines Montani	e30d08410a	Add CI for Python 3.8 (#4479) * Add 3.8 classifier * Update azure-pipelines.yml * Remove 3.8 warning from docs [ci skip]	2019-11-15 01:13:48 +01:00
adrianeboyd	faaa832518	Generalize handling of tokenizer special cases (#4259 ) * Generalize handling of tokenizer special cases Handle tokenizer special cases more generally by using the Matcher internally to match special cases after the affix/token_match tokenization is complete. Instead of only matching special cases while processing balanced or nearly balanced prefixes and suffixes, this recognizes special cases in a wider range of contexts: * Allows arbitrary numbers of prefixes/affixes around special cases * Allows special cases separated by infixes Existing tests/settings that couldn't be preserved as before: * The emoticon '")' is no longer a supported special case * The emoticon ':)' in "example:)" is a false positive again When merged with #4258 (or the relevant cache bugfix), the affix and token_match properties should be modified to flush and reload all special cases to use the updated internal tokenization with the Matcher. * Remove accidentally added test case * Really remove accidentally added test * Reload special cases when necessary Reload special cases when affixes or token_match are modified. Skip reloading during initialization. 
* Update error code number * Fix offset and whitespace in Matcher special cases * Fix offset bugs when merging and splitting tokens * Set final whitespace on final token in inserted special case * Improve cache flushing in tokenizer * Separate cache and specials memory (temporarily) * Flush cache when adding special cases * Repeated `self._cache = PreshMap()` and `self._specials = PreshMap()` are necessary due to this bug: https://github.com/explosion/preshed/issues/21 * Remove reinitialized PreshMaps on cache flush * Update UD bin scripts * Update imports for `bin/` * Add all currently supported languages * Update subtok merger for new Matcher validation * Modify blinded check to look at tokens instead of lemmas (for corpora with tokens but not lemmas like Telugu) * Use special Matcher only for cases with affixes * Reinsert specials cache checks during normal tokenization for special cases as much as possible * Additionally include specials cache checks while splitting on infixes * Since the special Matcher needs consistent affix-only tokenization for the special cases themselves, introduce the argument `with_special_cases` in order to do tokenization with or without specials cache checks * After normal tokenization, postprocess with special cases Matcher for special cases containing affixes * Replace PhraseMatcher with Aho-Corasick Replace PhraseMatcher with the Aho-Corasick algorithm over numpy arrays of the hash values for the relevant attribute. The implementation is based on FlashText. The speed should be similar to the previous PhraseMatcher. It is now possible to easily remove match IDs and matches don't go missing with large keyword lists / vocabularies. Fixes #4308. 
* Restore support for pickling * Fix internal keyword add/remove for numpy arrays * Add test for #4248, clean up test * Improve efficiency of special cases handling * Use PhraseMatcher instead of Matcher * Improve efficiency of merging/splitting special cases in document * Process merge/splits in one pass without repeated token shifting * Merge in place if no splits * Update error message number * Remove UD script modifications Only used for timing/testing, should be a separate PR * Remove final traces of UD script modifications * Update UD bin scripts * Update imports for `bin/` * Add all currently supported languages * Update subtok merger for new Matcher validation * Modify blinded check to look at tokens instead of lemmas (for corpora with tokens but not lemmas like Telugu) * Add missing loop for match ID set in search loop * Remove cruft in matching loop for partial matches There was a bit of unnecessary code left over from FlashText in the matching loop to handle partial token matches, which we don't have with PhraseMatcher. * Replace dict trie with MapStruct trie * Fix how match ID hash is stored/added * Update fix for match ID vocab * Switch from map_get_unless_missing to map_get * Switch from numpy array to Token.get_struct_attr Access token attributes directly in Doc instead of making a copy of the relevant values in a numpy array. Add unsatisfactory warning for hash collision with reserved terminal hash key. (Ideally it would change the reserved terminal hash and redo the whole trie, but for now, I'm hoping there won't be collisions.) * Restructure imports to export find_matches * Implement full remove() Remove unnecessary trie paths and free unused maps. Parallel to Matcher, raise KeyError when attempting to remove a match ID that has not been added. 
* Switch to PhraseMatcher.find_matches * Switch to local cdef functions for span filtering * Switch special case reload threshold to variable Refer to variable instead of hard-coded threshold * Move more of special case retokenize to cdef nogil Move as much of the special case retokenization to nogil as possible. * Rewrap sort as stdsort for OS X * Rewrap stdsort with specific types * Switch to qsort * Fix merge * Improve cmp functions * Fix realloc * Fix realloc again * Initialize span struct while retokenizing * Temporarily skip retokenizing * Revert "Move more of special case retokenize to cdef nogil" This reverts commit `0b7e52c797`. * Revert "Switch to qsort" This reverts commit `a98d71a942`. * Fix specials check while caching * Modify URL test with emoticons The multiple suffix tests result in the emoticon `:>`, which is now retokenized into one token as a special case after the suffixes are split off. * Refactor _apply_special_cases() * Use cdef ints for span info used in multiple spots * Modify _filter_special_spans() to prefer earlier Parallel to #4414, modify _filter_special_spans() so that the earlier span is preferred for overlapping spans of the same length. * Replace MatchStruct with Entity Replace MatchStruct with Entity since the existing Entity struct is nearly identical. * Replace Entity with more general SpanC * Replace MatchStruct with SpanC * Add error in debug-data if no dev docs are available (see #4575) * Update azure-pipelines.yml * Revert "Update azure-pipelines.yml" This reverts commit `ed1060cf59`. 
* Use latest wasabi * Reorganise install_requires * add dframcy to universe.json (#4580) * Update universe.json [ci skip] * Fix multiprocessing for as_tuples=True (#4582) * Fix conllu script (#4579) * force extensions to avoid clash between example scripts * fix arg order and default file encoding * add example config for conllu script * newline * move extension definitions to main function * few more encodings fixes * Add load_from_docbin example [ci skip] TODO: upload the file somewhere * Update README.md * Add warnings about 3.8 (resolves #4593) [ci skip] * Fixed typo: Added space between "recognize" and "various" (#4600) * Fix DocBin.merge() example (#4599) * Replace function registries with catalogue (#4584) * Replace functions registries with catalogue * Update __init__.py * Fix test * Revert unrelated flag [ci skip] * Bugfix/dep matcher issue 4590 (#4601) * add contributor agreement for prilopes * add test for issue #4590 * fix on_match params for DependencyMacther (#4590) * Minor updates to language example sentences (#4608) * Add punctuation to Spanish example sentences * Combine multilanguage examples for lang xx * Add punctuation to nb examples * Always realloc to a larger size Avoid potential (unlikely) edge case and cymem error seen in #4604. * Add error in debug-data if no dev docs are available (see #4575) * Update debug-data for GoldCorpus / Example * Ignore None label in misaligned NER data	2019-11-13 21:24:35 +01:00
f11r	877971860e	Fix assert in sentencizer documentation. (#4639)	2019-11-13 15:24:14 +01:00
Ines Montani	9d5ff177c4	Work around Markdown rendering issue surfaced in #4600 [ci skip]	2019-11-11 17:12:08 +01:00
adrianeboyd	0f8678c0b1	Fix DocBin.merge() example (#4599)	2019-11-07 11:26:48 +01:00
walterhenry	5563c42ef5	Fixed typo: Added space between "recognize" and "various" (#4600)	2019-11-06 23:06:36 +01:00
Ines Montani	828ef27a32	Add warnings about 3.8 (resolves #4593) [ci skip]	2019-11-05 18:30:11 +01:00
Ines Montani	59358d9b71	Remove box-decoration-break from entities in displacy (#4564)	2019-10-31 15:09:43 +01:00
Ines Montani	4e1de85e43	Update syntax iterators [ci skip]	2019-10-30 14:31:40 +01:00
Matthew Honnibal	d5509e0989	Support Mish activation (requires Thinc 7.3) (#4536) * Add arch for MishWindowEncoder * Support mish in tok2vec and conv window >=2 * Pass new tok2vec settings from parser * Syntax error * Fix tok2vec setting * Fix registration of MishWindowEncoder * Fix receptive field setting * Fix mish arch * Pass more options from parser * Support more tok2vec options in pretrain * Require thinc 7.3 * Add docs [ci skip] * Require thinc 7.3.0.dev0 to run CI * Run black * Fix typo * Update Thinc version Co-authored-by: Ines Montani <ines@ines.io>	2019-10-28 15:16:33 +01:00
Ines Montani	cfffdba7b1	Implement new API for {Phrase}Matcher.add (backwards-compatible) (#4522) * Implement new API for {Phrase}Matcher.add (backwards-compatible) * Update docs * Also update DependencyMatcher.add * Update internals * Rewrite tests to use new API * Add basic check for common mistake Raise error with suggestion if user likely passed in a pattern instead of a list of patterns * Fix typo [ci skip]	2019-10-25 22:21:08 +02:00
Ines Montani	d2da117114	Also support passing list to Language.disable_pipes (#4521) * Also support passing list to Language.disable_pipes * Adjust internals	2019-10-25 16:19:08 +02:00
Ines Montani	493be8e9db	Update new version identifier [ci skip]	2019-10-25 11:42:49 +02:00
Ines Montani	2abf1028cb	Update docs [ci skip]	2019-10-25 11:27:00 +02:00
Ines Montani	f31876154d	Adjust formatting [ci skip]	2019-10-25 11:19:46 +02:00
Kabir Khan	93640373c7	Make entity_ruler ent_id resolution 2x faster and add docs for… (#4513) * Update entityruler.py * Making ent_id resolution 2x faster and adding docs * Fixing newlines in docstrings * Fixing newlines in docstrings	2019-10-25 11:16:42 +02:00
adrianeboyd	1b0bbe4b76	Update tag maps and docs for English and German (#4501 ) * Update English tag_map Update English tag_map based on this conversion table: https://universaldependencies.org/tagset-conversion/en-penn-uposf.html * Update German tag_map Update German tag_map based on this conversion table: https://universaldependencies.org/tagset-conversion/de-stts-uposf.html * Add missing Tiger dependencies to glossary * Add quotes to definition of TO * Update POS/TAG tables in docs Update POS/TAG tables for English and German docs using current information generated from the tag_maps and GLOSSARY. * Update warning that -PRON- is specific to English * Revert docs to default JSON output with convert * Revert "Revert docs to default JSON output with convert" This reverts commit `6b78c048f1`.	2019-10-24 12:56:05 +02:00
adrianeboyd	8516e9d53b	Support train dict format as JSONL (#4471) * Support train dict format as JSONL * Add (overly simple) check for dict vs. tuple to read JSONL lines as either train dicts or train tuples * Extend JSON/JSONL roundtrip conversion tests using `docs_to_json()` and `GoldCorpus.train_tuples` * Revert docs to default JSON output with convert	2019-10-23 16:01:44 +02:00
adrianeboyd	7fc39f124c	Fix logic in rules+model entity example [ci skip] (#4510)	2019-10-23 14:41:21 +02:00
Ines Montani	4659435573	Fix argument type in PhraseMatcher.add docs (closes #4496) [ci skip]	2019-10-22 14:37:30 +02:00
Ines Montani	b2f88e2060	Fix formatting [ci skip]	2019-10-21 12:26:07 +02:00
adrianeboyd	3195a8f170	Add Entity Linking to menu (#4489)	2019-10-21 12:17:30 +02:00
Pepe Berba	7772d5d3c5	Update `vocab.get_vector` docs to include features on Fasttext ngram (#4464) * Update `vocab.get_vector` * Added contrib agreement	2019-10-20 01:28:18 +02:00
Ghola	258eb9e064	Misspelling on Lemmatizer Example #4406 (#4449) Removing extra o in the lookups = Loookups()	2019-10-16 23:23:15 +02:00
Anastassia	4a77d03ff7	Fix documentation for the docs_to_json function (#4456)	2019-10-16 23:17:58 +02:00
Ines Montani	573e543e4a	Alphanumeric -> alphabetic [ci skip] see ines/spacy-course#38	2019-10-06 13:30:01 +02:00
Ines Montani	e65dffd80b	Clarify serialization of extension attributes (closes #4377) [ci skip]	2019-10-05 11:58:00 +02:00
Sofie Van Landeghem	4e7259c6cf	Bugfix initializing DocBin with attributes (#4368) * docbin init fix + documentation fix + unit tests * newline * try with zlib instead of gzip (python 2 incompatibilities)	2019-10-03 14:48:45 +02:00
Ines Montani	ce1d441de5	Add docs for Vectors.most_similar [ci skip]	2019-10-03 14:29:47 +02:00
Ines Montani	80cf385f65	Update v2-2.md [ci skip]	2019-10-02 16:58:21 +02:00
Ines Montani	b6670bf0c2	Use consistent spelling	2019-10-02 10:37:39 +02:00
Ines Montani	475e3188ce	Add docs on filtering overlapping spans for merging (resolves #4352) [ci skip]	2019-10-01 21:59:50 +02:00
Ines Montani	0dd127bb00	Update v2-2.md [ci skip]	2019-10-01 21:37:06 +02:00
Ines Montani	cf65a80f36	Refactor lemmatizer and data table integration (#4353) * Move test * Allow default in Lookups.get_table * Start with blank tables in Lookups.from_bytes * Refactor lemmatizer to hold instance of Lookups * Get lookups table within the lemmatization methods to make sure it references the correct table (even if the table was replaced or modified, e.g. when loading a model from disk) * Deprecate other arguments on Lemmatizer.__init__ and expect Lookups for consistency * Remove old and unsupported Lemmatizer.load classmethod * Refactor language-specific lemmatizers to inherit as much as possible from base class and override only what they need * Update tests and docs * Fix more tests * Fix lemmatizer * Upgrade pytest to try and fix weird CI errors * Try pytest 4.6.5	2019-10-01 21:36:03 +02:00
Ines Montani	bc7e7db208	Fix wording [ci skip]	2019-10-01 14:20:44 +02:00
Ines Montani	2a3a4565cd	Update infobox [ci skip]	2019-10-01 14:19:34 +02:00
Ines Montani	66aa0d479f	Update v2.2 page [ci skip]	2019-10-01 14:11:05 +02:00
Ines Montani	a8a1800f2a	Update lemma data documentation [ci skip]	2019-10-01 13:22:13 +02:00
Ines Montani	932ad9cb91	Fix typos and formatting [ci skip]	2019-10-01 12:30:04 +02:00
Ines Montani	3d8fd4b461	Revert #4334	2019-09-29 17:32:12 +02:00
Ines Montani	3bd4da068e	Fix link [ci skip]	2019-09-29 17:30:38 +02:00
Ines Montani	089f44cc56	Update serialization docs [ci skip]	2019-09-29 17:11:13 +02:00
Ines Montani	c9cd516d96	Move tests out of package (#4334) * Move tests out of package * Fix typo	2019-09-28 18:05:00 +02:00
Ines Montani	10742d3219	Update v2 docs [ci skip]	2019-09-28 15:57:22 +02:00
Ines Montani	f8d1e2f214	Update CLI docs [ci skip]	2019-09-28 13:12:30 +02:00
Ines Montani	59beab8405	Update v2-2.md [ci skip]	2019-09-27 18:10:43 +02:00
Ines Montani	685e4b2554	Update v2-2.md [ci skip]	2019-09-27 16:35:01 +02:00
Ines Montani	aad66d9bb9	Document PhraseMatcher.remove [ci skip]	2019-09-27 16:34:53 +02:00
Ines Montani	eb0649e38e	Fix tag [ci skip]	2019-09-26 16:22:33 +02:00
Ines Montani	da9a869d3f	Update vectors name docs [ci skip]	2019-09-26 16:21:32 +02:00
Em Zhan	aafa091541	Fix typo in documentation (#4322) * Fix typo 'probj' instead of 'pobj' * Add spaCy contributor agreement for zqianem	2019-09-25 19:42:18 +02:00
Matthew Honnibal	92ed4dc5e0	Allow vectors name to be set in init-model (#4321) * Allow vectors name to be specified in init-model * Document --vectors-name argument to init-model * Update website/docs/api/cli.md Co-Authored-By: Ines Montani <ines@ines.io>	2019-09-25 13:11:00 +02:00
Ines Montani	197406de1d	Update v2-2.md [ci skip]	2019-09-19 14:33:58 +02:00
Ines Montani	ddc09b08ed	Update v2-2.md [ci skip]	2019-09-19 00:58:30 +02:00
Matthew Honnibal	e2047576c4	Fix merge conflict	2019-09-18 21:42:11 +02:00
Matthew Honnibal	46c02d25b1	Merge changes to test_ner	2019-09-18 21:41:24 +02:00
Ines Montani	9c940eab94	Update version in examples [ci skip]	2019-09-18 21:23:26 +02:00
Ines Montani	f873548f6c	Add backwards incompatibility [ci skip]	2019-09-18 21:21:48 +02:00
Ines Montani	6ebdc5f7d2	Update download docs [ci skip]	2019-09-18 21:21:39 +02:00
Ines Montani	dd1810f05a	Update DocBin and add docs	2019-09-18 20:23:21 +02:00
Ines Montani	d62690b3ba	Update examples	2019-09-18 19:57:36 +02:00
Ines Montani	bd435faddd	Add note about usage docs [ci skip]	2019-09-18 19:56:43 +02:00
Matthew Honnibal	931e96b6c7	DocPallet->DocBin in docs	2019-09-18 15:17:26 +02:00
Matthew Honnibal	f537cbeacc	Update v2-2 docs	2019-09-18 14:07:55 +02:00
Ines Montani	ee15fdfe88	Fix wording [ci skip]	2019-09-17 14:59:42 +02:00
Ines Montani	f566e69f38	Fix --vectors-loc docs (closes #4270)	2019-09-17 14:59:12 +02:00
Ines Montani	25c2b4b9a5	Improve init-model docs (see #4137)	2019-09-17 14:51:44 +02:00
Ines Montani	198b7e9789	Auto-format [ci skip]	2019-09-17 14:48:35 +02:00
adrianeboyd	b5d999e510	Add textcat to train CLI (#4226 ) * Add doc.cats to spacy.gold at the paragraph level Support `doc.cats` as `"cats": [{"label": string, "value": number}]` in the spacy JSON training format at the paragraph level. * `spacy.gold.docs_to_json()` writes `docs.cats` * `GoldCorpus` reads in cats in each `GoldParse` * Update instances of gold_tuples to handle cats Update iteration over gold_tuples / gold_parses to handle addition of cats at the paragraph level. * Add textcat to train CLI * Add textcat options to train CLI * Add textcat labels in `TextCategorizer.begin_training()` * Add textcat evaluation to `Scorer`: * For binary exclusive classes with provided label: F1 for label * For 2+ exclusive classes: F1 macro average * For multilabel (not exclusive): ROC AUC macro average (currently relying on sklearn) * Provide user info on textcat evaluation settings, potential incompatibilities * Provide pipeline to Scorer in `Language.evaluate` for textcat config * Customize train CLI output to include only metrics relevant to current pipeline * Add textcat evaluation to evaluate CLI * Fix handling of unset arguments and config params Fix handling of unset arguments and model confiug parameters in Scorer initialization. * Temporarily add sklearn requirement * Remove sklearn version number * Improve Scorer handling of models without textcats * Fixing Scorer handling of models without textcats * Update Scorer output for python 2.7 * Modify inf in Scorer for python 2.7 * Auto-format Also make small adjustments to make auto-formatting with black easier and produce nicer results * Move error message to Errors * Update documentation * Add cats to annotation JSON format [ci skip] * Fix tpl flag and docs [ci skip] * Switch to internal roc_auc_score Switch to internal `roc_auc_score()` adapted from scikit-learn. 
* Add AUCROCScore tests and improve errors/warnings * Add tests for AUCROCScore and roc_auc_score * Add missing error for only positive/negative values * Remove unnecessary warnings and errors * Make reduced roc_auc_score functions private Because most of the checks and warnings have been stripped for the internal functions and access is only intended through `ROCAUCScore`, make the functions for roc_auc_score adapted from scikit-learn private. * Check that data corresponds with multilabel flag Check that the training instances correspond with the multilabel flag, adding the multilabel flag if required. * Add textcat score to early stopping check * Add more checks to debug-data for textcat * Add example training data for textcat * Add more checks to textcat train CLI * Check configuration when extending base model * Fix typos * Update textcat example data * Provide licensing details and licenses for data * Remove two labels with no positive instances from jigsaw-toxic-comment data. Co-authored-by: Ines Montani <ines@ines.io>	2019-09-15 22:31:31 +02:00
Ines Montani	bab9976d9a	💫 Adjust Table API and add docs (#4289) * Adjust Table API and add docs * Add attributes and update description [ci skip] * Use strings.get_string_id instead of hash_string * Fix table method calls * Make orth arg in Lemmatizer.lookup optional Fall back to string, which is now handled by Table.__contains__ out-of-the-box * Fix method name * Auto-format	2019-09-15 22:08:13 +02:00
Ines Montani	16c2522791	Merge branch 'master' into develop	2019-09-14 16:42:01 +02:00
Ines Montani	86befc80bf	WIP: Add v2.2 page [ci skip]	2019-09-14 16:41:48 +02:00
Ines Montani	04d36d2471	Remove unused link [ci skip]	2019-09-14 16:41:19 +02:00
Ines Montani	5c8b5e68ec	Fix docs consistency [ci skip]	2019-09-14 16:23:37 +02:00
Ines Montani	bbf7337eaf	Update adding languages docs [ci skip]	2019-09-14 15:32:15 +02:00
Ines Montani	3126dd0904	Tidy up and auto-format [ci skip]	2019-09-14 12:58:06 +02:00
Ines Montani	3c3658ef9f	Merge branch 'master' into develop	2019-09-12 18:03:01 +02:00
Sofie Van Landeghem	9be4d1c105	Allow copying of user_data in as_doc (#4282) * Allow copying the user_data with as_doc + unit test * add option to docs * add typing * import fix * workaround to avoid bool clashing ... * bint instead of bool	2019-09-12 17:08:14 +02:00
Ines Montani	ff51fba96a	Update lemmatizer docs [ci skip]	2019-09-12 16:26:33 +02:00
Ines Montani	25b2b3ff45	Remove LEMMA from exception examples [ci skip]	2019-09-12 16:26:27 +02:00
Ines Montani	82c16b7943	Remove u-strings and fix formatting [ci skip]	2019-09-12 16:11:15 +02:00
Ines Montani	a31e9e1cd5	Update training docs [ci skip]	2019-09-12 15:32:39 +02:00
Ines Montani	b544dcb3c5	Document debug-data [ci skip]	2019-09-12 15:26:20 +02:00
Ines Montani	c0a4cab178	Update "Adding languages" docs [ci skip]	2019-09-12 14:53:06 +02:00
Ines Montani	10257f3131	Document Lookups [ci skip]	2019-09-12 14:00:14 +02:00
Ines Montani	aa4ff0baa1	Auto-format [ci skip]	2019-09-12 13:05:53 +02:00
Ines Montani	625ce2db8e	Update Language docs [ci skip]	2019-09-12 13:03:38 +02:00
Ines Montani	cb41a33d14	Update displaCy API docs [ci skip]	2019-09-12 12:59:20 +02:00
Ines Montani	e7c20ad1d2	Update colors entry points docs [ci skip]	2019-09-12 12:59:10 +02:00
Ines Montani	7b59a919e6	Update entry points docs [ci skip]	2019-09-12 12:52:06 +02:00
Sofie Van Landeghem	0b4b4f1819	Documentation for Entity Linking (#4065 ) * document token ent_kb_id * document span kb_id * update pipeline documentation * prior and context weights as bool's instead * entitylinker api documentation * drop for both models * finish entitylinker documentation * small fixes * documentation for KB * candidate documentation * links to api pages in code * small fix * frequency examples as counts for consistency * consistent documentation about tensors returned by predict * add entity linking to usage 101 * add entity linking infobox and KB section to 101 * entity-linking in linguistic features * small typo corrections * training example and docs for entity_linker * predefined nlp and kb * revert back to similarity encodings for simplicity (for now) * set prior probabilities to 0 when excluded * code clean up * bugfix: deleting kb ID from tokens when entities were removed * refactor train el example to use either model or vocab * pretrain_kb example for example kb generation * add to training docs for KB + EL example scripts * small fixes * error numbering * ensure the language of vocab and nlp stay consistent across serialization * equality with = * avoid conflict in errors file * add error 151 * final adjustements to the train scripts - consistency * update of goldparse documentation * small corrections * push commit * typo fix * add candidate API to kb documentation * update API sidebar with EntityLinker and KnowledgeBase * remove EL from 101 docs * remove entity linker from 101 pipelines / rephrase * custom el model instead of existing model * set version to 2.2 for EL functionality * update documentation for 2 CLI scripts	2019-09-12 11:38:34 +02:00
Sofie Van Landeghem	53a9ca45c9	Docs: bufsize instead of buffsize (#4247)	2019-09-06 11:11:54 +02:00
Sofie Van Landeghem	6b012cebff	Make pos/tag distinction more clear in docs (#4246) * make distinction between tag and pos more prominent in docs * out of the 101	2019-09-06 10:31:21 +02:00
adrianeboyd	82159b5c19	Updates/bugfixes for NER/IOB converters (#4186) * Updates/bugfixes for NER/IOB converters * Converter formats `ner` and `iob` use autodetect to choose a converter if possible * `iob2json` is reverted to handle sentence-per-line data like `word1|pos1|ent1 word2|pos2|ent2` * Fix bug in `merge_sentences()` so the second sentence in each batch isn't skipped * `conll_ner2json` is made more general so it can handle more formats with whitespace-separated columns * Supports all formats where the first column is the token and the final column is the IOB tag; if present, the second column is the POS tag * As in CoNLL 2003 NER, blank lines separate sentences, `-DOCSTART- -X- O O` separates documents * Add option for segmenting sentences (new flag `-s`) * Parser-based sentence segmentation with a provided model, otherwise with sentencizer (new option `-b` to specify model) * Can group sentences into documents with `n_sents` as long as sentence segmentation is available * Only applies automatic segmentation when there are no existing delimiters in the data * Provide info about settings applied during conversion with warnings and suggestions if settings conflict or might not be optimal. * Add tests for common formats * Add '(default)' back to docs for -c auto * Add document count back to output * Revert changes to converter output message * Use explicit tabs in convert CLI test data * Adjust/add messages for n_sents=1 default * Add sample NER data to training examples * Update README * Add links in docs to example NER data * Define msg within converters	2019-08-29 12:04:01 +02:00
Björn Böing	bae0455f91	Fix visualizer options linking for displaCy. (#4202 )	2019-08-27 14:04:28 +02:00
Christos Aridas	61f5c007a0	DOC Fix pipeline functions examples (#4189 )	2019-08-23 19:15:32 +02:00
adrianeboyd	8fe7bdd0fa	Improve token pattern checking without validation (#4105 ) * Fix typo in rule-based matching docs * Improve token pattern checking without validation Add more detailed token pattern checks without full JSON pattern validation and provide more detailed error messages. Addresses #4070 (also related: #4063, #4100). * Check whether top-level attributes in patterns and attr for PhraseMatcher are in token pattern schema * Check whether attribute value types are supported in general (as opposed to per attribute with full validation) * Report various internal error types (OverflowError, AttributeError, KeyError) as ValueError with standard error messages * Check for tagger/parser in PhraseMatcher pipeline for attributes TAG, POS, LEMMA, and DEP * Add error messages with relevant details on how to use validate=True or nlp() instead of nlp.make_doc() * Support attr=TEXT for PhraseMatcher * Add NORM to schema * Expand tests for pattern validation, Matcher, PhraseMatcher, and EntityRuler * Remove unnecessary .keys() * Rephrase error messages * Add another type check to Matcher Add another type check to Matcher for more understandable error messages in some rare cases. * Support phrase_matcher_attr=TEXT for EntityRuler * Don't use spacy.errors in examples and bin scripts * Fix error code * Auto-format Also try get Azure pipelines to finally start a build :( * Update errors.py Co-authored-by: Ines Montani <ines@ines.io> Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2019-08-21 14:00:37 +02:00
Ines Montani	3134a9b6e0	Add section on expanding regex match to token boundaries (see #4158 ) [ci skip]	2019-08-21 12:53:31 +02:00
Ines Montani	fe230c8776	Fix typo [ci skip]	2019-08-20 13:02:05 +02:00
Daniel Bourke	b0a28fd0de	fix PhraseMatcher link typo (#4150 ) /api/phtasematcher -> /api/phrasematcher	2019-08-20 13:01:43 +02:00
Ines Montani	ce4c3e5204	Document force flag on set_extension (closes #4148 )	2019-08-19 19:22:07 +02:00
Ines Montani	66aba2d676	Improve regex matching docs [ci skip]	2019-08-19 13:59:41 +02:00
Sofie Van Landeghem	cc66f47893	Make enabling/disabling jupyter mode more explicit (#4144 ) * make enabling/disabling jupyter mode more explicit * markup fix	2019-08-19 11:53:34 +02:00
Ines Montani	e520eb3f6c	Make visualized NER examples more clear (closes #4104 ) [ci skip]	2019-08-18 16:29:29 +02:00
Ines Montani	1362f793cf	Improve docs on phrase pattern attributes (closes #4100 ) [ci skip]	2019-08-11 11:13:49 +02:00
Ines Montani	8b4a0fabbb	Adjust docs example [ci skip]	2019-08-07 00:46:47 +02:00
adrianeboyd	69aca7d839	Add validate option to EntityRuler (#4089 ) * Add validate option to EntityRuler * Add validate to EntityRuler, passed to Matcher and PhraseMatcher * Add validate to usage and API docs * Update website/docs/usage/rule-based-matching.md Co-Authored-By: Ines Montani <ines@ines.io> * Update website/docs/usage/rule-based-matching.md Co-Authored-By: Ines Montani <ines@ines.io>	2019-08-07 00:40:53 +02:00
Ines Montani	4ae320e5c2	Use consistent casing for entity ruler patterns (see #4063 ) [ci skip]	2019-08-06 12:20:22 +02:00
Ines Montani	223bde5cf6	Improve docs on matcher attributes [ci skip] (closes #4063 )	2019-08-06 12:13:42 +02:00
Ines Montani	2bfae0b167	Auto-format	2019-08-06 12:13:31 +02:00
Ines Montani	0f76e0022d	Update .tensor docs [ci skip]	2019-08-01 18:37:09 +02:00
Björn Böing	a83c0add2e	Add links to tokenizer API docs to refer relevant information. (#4064 ) * Add links to tokenizer API docs to refer relevant information. * Add suggested changes Co-Authored-By: Ines Montani <ines@ines.io>	2019-08-01 14:28:38 +02:00
Ejar	2cdf7d39e7	Corrected imported function (#4062 ) The example showed an incorrect import	2019-08-01 12:43:36 +02:00
Ines Montani	fcd2f7f656	Fix version introducing Span.ents (closes #4045 ) [ci skip]	2019-07-30 10:32:33 +02:00
Ines Montani	fc69da0acb	💫 Support simple training format in nlp.evaluate and add tests (#4033 ) * Support simple training format in nlp.evaluate and add tests * Update docs [ci skip]	2019-07-27 17:30:18 +02:00
Ines Montani	bd39e5e630	Add "Processing text" section [ci skip]	2019-07-25 17:38:03 +02:00
Ines Montani	a5e3d2f318	Improve section on disabling pipes [ci skip]	2019-07-25 14:25:34 +02:00
Ines Montani	02e444ec7c	Add section on special tokenizer component [ci skip]	2019-07-25 14:25:03 +02:00
Ines Montani	1fa6d6ba55	Improve consistency of docs examples [ci skip]	2019-07-25 14:24:56 +02:00
adrianeboyd	784a5f4284	Update GoldParse attributes in API docs (#4023 ) * add `words` * update name of entity list to `ner` I think it might be a bit more consistent to have `ner` named `entities` or `ents` (and `ents` is actually set somewhere to `None`, which is a bit confusing), but it looks like renaming it would be a non-trivial decision.	2019-07-25 12:14:02 +02:00
Adriane Boyd	6c5044ed2a	Update annotation docs for German - minor formatting fixes - remove STTS tags not used in Tiger - update list of dependency relations to match tiger2dep	2019-07-22 11:59:03 +02:00
adrianeboyd	d2c474cbb7	Fix initial example in EntityRuler API docs (#3999 )	2019-07-22 11:18:55 +02:00
Ines Montani	1167c303a0	Fix typos [ci skip]	2019-07-19 13:08:18 +02:00
BreakBB	6d9a7c0749	Add '--silent' argument to bash example of CLI Info	2019-07-19 10:00:45 +02:00
BreakBB	c8ba0f690d	Fix --force parameter of CLI package	2019-07-19 10:00:45 +02:00
Ines Montani	a0acb1b3cd	Also add infobox to API docs [ci skip]	2019-07-17 16:26:41 +02:00
Ines Montani	c3ead02ea5	Adjust wording [ci skip]	2019-07-17 16:06:25 +02:00
Ines Montani	1d5ff3e455	Add infobox	2019-07-17 15:29:36 +02:00
Ines Montani	114cb18892	Improve wording	2019-07-17 15:27:53 +02:00
Ines Montani	7522beef9e	Add "Things to try" prompts	2019-07-17 15:25:02 +02:00
Ines Montani	9f02e3c027	Adjust example Not actually supported in this alignment interpretation	2019-07-17 15:13:50 +02:00
Ines Montani	1ea472468a	Add usage docs for aligning tokenization	2019-07-17 15:08:33 +02:00
Ines Montani	f97a555445	Add API documentation	2019-07-17 14:30:04 +02:00
pmbaumgartner	9a86d95ea2	fix custom attribute links	2019-07-14 20:23:54 -04:00
Ines Montani	40cd03fc35	Improve EntityRuler serialization	2019-07-10 12:25:45 +02:00
Ines Montani	8721849423	Update Scorer.ents_per_type	2019-07-10 11:19:28 +02:00
Ines Montani	ebe58e7fa1	Document gold.docs_to_json [ci skip]	2019-07-10 10:27:33 +02:00
Ines Montani	881f5bc401	Auto-format	2019-07-10 10:27:29 +02:00
Björn Böing	205c73a589	Update tokenizer and doc init example (#3939 ) * Fix Doc.to_json hyperlink * Update tokenizer and doc init examples * Change "matchin rules" to "punctuation rules" * Auto-format	2019-07-10 10:16:48 +02:00
Björn Böing	04982ccc40	Update pretrain to prevent unintended overwriting of weight fil… (#3902 ) * Update pretrain to prevent unintended overwriting of weight files for #3859 * Add '--epoch-start' to pretrain docs * Add missing pretrain arguments to bash example * Update doc tag for v2.1.5	2019-07-09 21:48:30 +02:00
Joshua Smith	2eb925bd05	Added an argument to `EntityRuler` constructor to pass attrs to… (#3919 ) * Preserve flags in EntityRuler The EntityRuler (explosion/spaCy#3526) does not preserve overwrite flags (or `ent_id_sep`) when serialized. This commit adds support for serialization/deserialization preserving overwrite and ent_id_sep flags. * add signed contributor agreement * flake8 cleanup mostly blank line issues. * mark test from the issue as needing a model The test from the issue needs some language model for serialization but the test wasn't originally marked correctly. * Adds `phrase_matcher_attr` to allow args to PhraseMatcher This is an added arg to pass to the `PhraseMatcher`. For example, this allows creation of a case insensitive phrase matcher when the `EntityRuler` is created. References explosion/spaCy#3822 * remove unneeded model loading The model didn't need to be loaded, and I replaced it with a change that doesn't require it (using existing fixtures) * updated docstring for new argument * updated docs to reflect new argument to the EntityRuler constructor * change tempdir handling to be compatible with python 2.7 * return conflicted code to entityruler Some stuff got cut out because of merge conflicts, this returns that code for the phrase_matcher_attr. * fixed typo in the code added back after conflicts * flake8 compliance When I deconflicted the branch there were some flake8 issues introduced. This resolves the spacing problems. * test changes: attempts to fix flaky test in python3.5 These tests seem to be a little flaky in 3.5 so I changed the check to avoid the comparisons that seem to fail sometimes.	2019-07-09 20:09:17 +02:00
Ines Montani	d361e380b8	Fix matcher callback example (closes #3862 )	2019-06-26 14:47:26 +02:00
Guillaume Claret	d7a519a922	Typo (#3865 ) * Typo * Add contributor agreement	2019-06-20 10:31:19 +02:00
Björn Böing	ebf5a04d6c	Update pretrain docs and add unsupported loss_func error (#3860 ) * Add error to `get_vectors_loss` for unsupported loss function of `pretrain` * Add missing "--loss-func" argument to pretrain docs. Update pretrain plac annotations to match docs. * Add missing quotation marks	2019-06-20 10:30:44 +02:00
Alejandro Alcalde	4866a7ee9e	Changed learning rate by its param name. (#3855 ) * Changed learning rate by its param name. I've been searching for a while how the parameter learning rate was named, with `beta1` and `beta2` it's easy as they are marked as code, but learning rate wasn't. I think writing the actual parameter name would be helpful. * Signing SCA	2019-06-20 10:29:20 +02:00
Ines Montani	81c12640ab	Auto-format [ci skip]	2019-06-16 14:33:20 +02:00
Greg Werner	9041a72d7f	Update tokenizer.md for construction example (#3790 ) * Update tokenizer.md for construction example Self contained example. You should really say what nlp is so that the example will work as is * Update CONTRIBUTOR_AGREEMENT.md * Restore contributor agreement * Adjust construction examples	2019-06-16 14:32:56 +02:00
BreakBB	d8573ee715	Update error raising for CLI pretrain to fix #3840 (#3843 ) * Add check for empty input file to CLI pretrain * Raise error if JSONL is not a dict or contains neither `tokens` nor `text` key * Skip empty values for correct pretrain keys and log a counter as warning * Add tests for CLI pretrain core function make_docs. * Add a short hint for the `tokens` key to the CLI pretrain docs * Add success message to CLI pretrain * Update model loading to fix the tests * Skip empty values and do not create docs out of it	2019-06-16 13:22:57 +02:00
Motoki Wu	9c064e6ad9	Add resume logic to spacy pretrain (#3652 ) * Added ability to resume training * Add to readme * Remove duplicate entry	2019-06-12 13:29:23 +02:00
Ramanan Balakrishnan	eb12703d10	minor fix to broken link in documentation (#3819 ) [ci skip]	2019-06-04 11:15:35 +02:00
Ines Montani	0c74506c9c	Fix typos in docs (closes #3802 ) [ci skip]	2019-06-01 11:35:01 +02:00
Nipun Sadvilkar	1f13005751	Incorrect Token attribute ent_iob_ description (#3800 ) * Incorrect Token attribute ent_iob_ description * Add spaCy contributor agreement	2019-05-31 16:50:45 +02:00
Ramanan Balakrishnan	26c37c5a4d	fix all references to BILUO annotation format (#3797 )	2019-05-31 12:19:19 +02:00
mak	89379a7fa4	Corrected example model URL in requirements.txt (#3786 ) The URL used to show how to add a model to the requirements.txt had the old release path (excl. explosion).	2019-05-29 10:51:55 +02:00
Ines Montani	7634812172	Document Language.evaluate	2019-05-24 14:06:36 +02:00
Ines Montani	45e6855550	Update Language.update docs	2019-05-24 14:06:26 +02:00
Ines Montani	b78a8dc1d2	Update Scorer and add API docs	2019-05-24 14:06:04 +02:00
Ines Montani	321c9f5acc	Fix lex_id docs (closes #3743 )	2019-05-16 23:15:58 +02:00
Ines Montani	f96af8526a	Merge branch 'spacy.io' [ci skip]	2019-05-11 23:03:56 +02:00
Ines Montani	7534f7cb44	Fix return value of Language.update (closes #3692 )	2019-05-11 18:40:19 +02:00
devforfu	21af12eb53	Make "text" key in JSONL format optional when "tokens" key is provided (#3721 ) * Fix issue with forcing text key when it is not required * Extending the docs to reflect the new behavior	2019-05-11 15:41:29 +02:00
Ines Montani	6cfa1e1f47	Fix DependencyParser.predict docs (resolves #3561 )	2019-05-11 15:37:54 +02:00
Ines Montani	25f5592d57	Improve Token.prob and Lexeme.prob docs (resolves #3701 )	2019-05-11 15:23:41 +02:00
Aaron Kub	719a15f23d	fixing regex matcher examples (#3708 ) (#3719 )	2019-05-10 14:23:52 +02:00
Ines Montani	65b55f1aaa	Add version tag to `--base-model` argument (closes #3720 )	2019-05-10 14:06:47 +02:00
Ines Montani	505c9e0e19	Add util.filter_spans helper (#3686 )	2019-05-08 02:33:40 +02:00
张晓飞	ba1ff00370	update response after calling add_pipe (#3661 ) * update response after calling add_pipe The component print_info is appended last, so it needs to be shown at the end of the pipeline * Create henry860916.md	2019-05-01 12:02:18 +02:00
Ramiro Gómez	8ee4100f8f	Remove dangling M (#3657 ) I assume this is a typo. Sorry if it has a meaning that I'm not aware of.	2019-04-29 19:44:43 +02:00
Amit Chaudhary	167d63af31	Fix broken link to Dive Into Python 3 website (#3656 ) * Fix broken link to Dive Into Python 3 website * Sign spaCy Contributor Agreement	2019-04-29 19:44:00 +02:00
Ivan Tham	fa94f83697	Improve redundant variable name (#3643 ) * Improve redundant variable name * Apply suggestions from code review Co-Authored-By: pickfire <pickfire@riseup.net>	2019-04-26 16:50:14 +02:00
Ines Montani	ec0d840ab5	Document early stopping	2019-04-22 14:31:32 +02:00
Ines Montani	1d567913f9	Update spacy evaluate example	2019-04-22 14:28:42 +02:00
Ines Montani	7917ce2f73	Make flag shortcut consistent and document	2019-04-22 14:23:44 +02:00
Ines Montani	52658c80d5	Allow jupyter=False to override Jupyter mode (closes #3598 )	2019-04-22 14:18:32 +02:00
Motoki Wu	8e2cef49f3	Add save after `--save-every` batches for `spacy pretrain` (#3510 ) When using `spacy pretrain`, the model is saved only after every epoch. But each epoch can be very big since `pretrain` is used for language modeling tasks. So I added a `--save-every` option in the CLI to save after every `--save-every` batches. ## Description To test... Save this file to `sample_sents.jsonl` ``` {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} ``` Then run `--save-every 2` when pretraining. ```bash spacy pretrain sample_sents.jsonl en_core_web_md here -nw 1 -bs 1 -i 10 --save-every 2 ``` And it should save the model to the `here/` folder after every 2 batches. The models that are saved during an epoch will have a `.temp` appended to the save name. At the end of training, you should see these files (`ls here/`): ```bash config.json model2.bin model5.bin model8.bin log.jsonl model2.temp.bin model5.temp.bin model8.temp.bin model0.bin model3.bin model6.bin model9.bin model0.temp.bin model3.temp.bin model6.temp.bin model9.temp.bin model1.bin model4.bin model7.bin model1.temp.bin model4.temp.bin model7.temp.bin ``` ### Types of change This is a new feature to `spacy pretrain`. 🌵 Unfortunately, I haven't been able to test this because compiling from source is not working (cythonize error). ``` Processing matcher.pyx [Errno 2] No such file or directory: '/Users/mwu/github/spaCy/spacy/matcher.pyx' Traceback (most recent call last): File "/Users/mwu/github/spaCy/bin/cythonize.py", line 169, in <module> run(args.root) File "/Users/mwu/github/spaCy/bin/cythonize.py", line 158, in run process(base, filename, db) File "/Users/mwu/github/spaCy/bin/cythonize.py", line 124, in process preserve_cwd(base, process_pyx, root + ".pyx", root + ".cpp") File "/Users/mwu/github/spaCy/bin/cythonize.py", line 87, in preserve_cwd func(args) File "/Users/mwu/github/spaCy/bin/cythonize.py", line 63, in process_pyx raise Exception("Cython failed") Exception: Cython failed Traceback (most recent call last): File "setup.py", line 276, in <module> setup_package() File "setup.py", line 209, in setup_package generate_cython(root, "spacy") File "setup.py", line 132, in generate_cython raise RuntimeError("Running cythonize failed") RuntimeError: Running cythonize failed ``` Edit: Fixed! after deleting all `.cpp` files: `find spacy -name ".cpp" | xargs rm` ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-04-22 14:10:16 +02:00
Ines Montani	0dce4585b1	Add course to 101	2019-04-19 15:59:51 +02:00
Ines Montani	2efc87c382	Remove unused image	2019-04-19 15:48:12 +02:00
Ines Montani	38395d9518	Merge branch 'spacy.io'	2019-04-19 15:26:20 +02:00
Ines Montani	7ac5bb0a7b	Update landing and feature overview	2019-04-19 15:23:08 +02:00
fizban99	f2f2df6e78	entity types for colors should be in uppercase (#3599 ) although the text indicates the entity types should be in lowercase, the sample code shows uppercase, which is the correct format.	2019-04-17 11:22:56 +02:00
Ines Montani	5289dd1356	Fix formatting	2019-04-13 17:58:26 +02:00
Ines Montani	9e7deeaf48	Remove Datacamp	2019-04-13 17:46:32 +02:00
Santiago Castro	86e4b68aa9	Fix website docs for Vectors.from_glove (#3565 ) * Fix website docs for Vectors.from_glove * Add myself as a contributor	2019-04-10 15:23:27 +02:00
Bharat Raghunathan	72820896d4	Fix typo in web docs cli.md (#3559 )	2019-04-09 11:40:03 +02:00
pierremonico	0d26bfe677	Removes duplicate in table (#3550 ) * Removes duplicate in table Just fixing typos. * Remove newline Co-authored-by: Ines Montani <ines@ines.io>	2019-04-08 10:30:42 +02:00
Ines Montani	2f0f439c54	Remove non-existent example (closes #3533 )	2019-04-03 09:59:17 +02:00
Samuel Kane	06a1846379	fix(util): fix decaying function output (#3495 ) * fix(util): fix decaying function output * fix(util): better test and adhere to code standards * fix(util): correct variable name, pytestify test, update website text	2019-03-28 13:24:47 +01:00
Bharat Raghunathan	1db3e47509	DOC: Update tokenizer docs to include default value for batch_size in pipe (#3492 )	2019-03-28 12:48:02 +01:00
Ines Montani	200d8bdb3c	Merge branch 'spacy.io' [ci skip]	2019-03-23 16:46:34 +01:00
Ines Montani	1e5b917d75	Fix formatting [ci skip]	2019-03-23 16:45:50 +01:00
Matthew Honnibal	6c783f8045	Bug fixes and options for TextCategorizer (#3472 ) * Fix code for bag-of-words feature extraction The _ml.py module had a redundant copy of a function to extract unigram bag-of-words features, except one had a bug that set values to 0. Another function allowed extraction of bigram features. Replace all three with a new function that supports arbitrary ngram sizes and also allows control of which attribute is used (e.g. ORTH, LOWER, etc). * Support 'bow' architecture for TextCategorizer This allows efficient ngram bag-of-words models, which are better when the classifier needs to run quickly, especially when the texts are long. Pass architecture="bow" to use it. The extra arguments ngram_size and attr are also available, e.g. ngram_size=2 means unigram and bigram features will be extracted. * Fix size limits in train_textcat example * Explain architectures better in docs	2019-03-23 16:44:44 +01:00
Ines Montani	06bf130890	💫 Add better and serializable sentencizer (#3471 ) * Add better serializable sentencizer component * Replace default factory * Add tests * Tidy up * Pass test * Update docs	2019-03-23 15:45:02 +01:00
Ines Montani	b532386a60	Fix typo [ci skip]	2019-03-22 18:36:17 +01:00
Ines Montani	5073ce63fd	Merge branch 'spacy.io' [ci skip]	2019-03-22 15:17:11 +01:00
Ines Montani	0712efc6b3	Update version requirements [ci skip]	2019-03-21 10:23:54 +01:00
Ines Montani	dac8f8ff99	Update Span.__init__ docs (see #3445 ) [ci skip]	2019-03-20 17:24:17 +01:00
Ines Montani	d4eed4a84f	Add note on unicode build to troubleshooting guide (see #3421 ) [ci skip]	2019-03-19 10:27:02 +01:00
Ines Montani	08284f3a11	💫 v2.1.0 launch updates (only merge on launch!) (#3414 ) * Update README.md * Use production docsearch [ci skip] * Add option to exclude pages from search	2019-03-18 16:07:26 +01:00
Ines Montani	a611b32fbf	Update model docs [ci skip]	2019-03-17 11:48:18 +01:00
Matthew Honnibal	62afa64a8d	Expose batch size and length caps on CLI for pretrain (#3417 ) Add and document CLI options for batch size, max doc length, min doc length for `spacy pretrain`. Also improve CLI output. Closes #3216 ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-03-16 21:38:45 +01:00
Ines Montani	2c5dd4d602	Update Vectors.find docs [ci skip]	2019-03-16 17:10:57 +01:00
Ines Montani	cbcba699dd	Fix missing ids	2019-03-14 17:56:53 +01:00
Ines Montani	4cfe4aa224	Fix small issues in the docs [ci skip]	2019-03-12 22:57:15 +01:00
Ines Montani	ba7eb2d131	Update section [ci skip]	2019-03-12 16:18:34 +01:00
Ines Montani	cecc31b765	Don't auto-slugify accordion links [ci skip]	2019-03-12 15:30:49 +01:00
Ines Montani	72fb324d95	Add vector training script to bin [ci skip]	2019-03-12 12:07:56 +01:00
Ines Montani	3abf0e6b9f	Replace dev-resources links with real examples	2019-03-12 12:07:40 +01:00
Ines Montani	59c0620487	Auto-format	2019-03-12 12:07:11 +01:00
Ines Montani	cdd418b93e	Auto-format [ci skip]	2019-03-11 17:10:50 +01:00
Matthew Honnibal	b0b990e405	Fix token.conjuncts (closes #795 ) (#3392 ) * Implement conjuncts method * Add span.conjuncts property * Un-xfail token.conjuncts tests * Update docs for token.conjuncts and span.conjuncts * Fix merge error in token.conjuncts	2019-03-11 17:05:45 +01:00
Ines Montani	25cb764e64	Document new API [ci skip]	2019-03-11 15:23:53 +01:00
Ines Montani	ebcf2bb1c3	Add Doc.lang and Doc.lang_	2019-03-11 14:21:40 +01:00
Ines Montani	7c05ca01e8	💫 Support mutable default values for extension attributes (#3389 ) * Support mutable default values in extensions * Update documentation	2019-03-11 12:50:44 +01:00
Matthew Honnibal	98acf5ffe4	💫 Allow passing of config parameters to specific pipeline components (#3386 ) * Add component_cfg kwarg to begin_training * Document component_cfg arg to begin_training * Update docs and auto-format * Support component_cfg across Language * Format * Update docs and docstrings [ci skip] * Fix begin_training	2019-03-10 23:36:47 +01:00
Ines Montani	8dbf1e9037	Also fix #3387 on develop	2019-03-10 23:36:28 +01:00
Ines Montani	7ba3a5d95c	💫 Make serialization methods consistent (#3385 ) * Make serialization methods consistent exclude keyword argument instead of random named keyword arguments and deprecation handling * Update docs and add section on serialization fields	2019-03-10 19:16:45 +01:00
Ines Montani	9a8f169e5c	Update v2-1.md	2019-03-10 18:58:51 +01:00
Ines Montani	0426689db8	💫 Improve Doc.to_json and add Doc.is_nered (#3381 ) * Use default return instead of else * Add Doc.is_nered to indicate if entities have been set * Add properties in Doc.to_json if they were set, not if they're available This way, if a processed Doc exports "pos": None, it means that the tag was explicitly unset. If it exports "ents": [], it means that entity annotations are available but that this document doesn't contain any entities. Before, this would have been unclear and problematic for training.	2019-03-10 15:24:34 +01:00
Ines Montani	76764fcf59	💫 Improve converters and training data file formats (#3374 ) * Populate converter argument info automatically * Add conversion option for msgpack * Update docs * Allow reading training data from JSONL	2019-03-08 23:15:23 +01:00
Ines Montani	296446a1c8	Tidy up and improve docs and docstrings (#3370 ) ## Description * tidy up and adjust Cython code to code style * improve docstrings and make calling `help()` nicer * add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects * fix various typos and inconsistencies in docs ### Types of change enhancement, docs ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-03-08 11:42:26 +01:00
Ines Montani	fa7314b221	Clarify train_path and dev_path format (see #3366 ) [ci skip]	2019-03-07 12:23:27 +01:00
Ines Montani	e9babd9973	Update hyperparameters section (see #3352 )	2019-03-06 14:40:30 +01:00
Ines Montani	48a206a95f	Fix displaCy visualizations in docs (closes #3357 ) [ci skip]	2019-03-06 13:20:44 +01:00
Ines Montani	5eadf61327	Update pretraining docs on file format (closes #3354 )	2019-03-04 16:30:13 +00:00
Ines Montani	1d4ba7678f	Auto-format [ci skip]	2019-02-27 12:07:35 +01:00
Matthew Honnibal	f1d77eb140	💫 Improve handling of missing NER tags (closes #2603 ) (#3341 ) * Improve handling of missing NER tags GoldParse can accept missing NER tags, if entities is provided in BILUO format (rather than as spans). Missing tags can be provided as None values. Fix bug that occurred when first tag was a None value. Closes #2603. * Document specification of missing NER tags.	2019-02-27 12:06:32 +01:00
Ines Montani	c478a2ccb6	Update backwards incompat [ci skip]	2019-02-27 11:56:56 +01:00
Matthew Honnibal	4a3371acd5	Make doc[0].is_sent_start == True (closes #2869 ) (#3340 ) * Make doc[0] have sent_start True. Closes #2869 * Document that doc[0].is_sent_start defaults True.	2019-02-27 11:17:17 +01:00
Ines Montani	1b6238101a	Add table explaining training metrics [closes #2644 ]	2019-02-25 10:03:43 +01:00
Ines Montani	d0b3af9222	Fix remaining inaccuracies in API docs (closes #2329 )	2019-02-24 22:21:25 +01:00
Ines Montani	62b558ab72	💫 Support lexical attributes in retokenizer attrs (closes #2390 ) (#3325 ) * Fix formatting and whitespace * Add support for lexical attributes (closes #2390) * Document lexical attribute setting during retokenization * Assign variable outside of nested loop	2019-02-24 21:13:51 +01:00
Ines Montani	aa52305461	Improve pipeline model and meta example [ci skip]	2019-02-24 18:45:39 +01:00
Ines Montani	df19e2bff6	💫 Allow setting of custom attributes during retokenization (closes #3314 ) (#3324 ) ## Description This PR adds the ability to override custom extension attributes during merging. This will only work for attributes that are writable, i.e. attributes registered with a default value like `default=False` or attributes that have both a getter and a setter implemented. ```python Token.set_extension('is_musician', default=False) doc = nlp("I like David Bowie.") with doc.retokenize() as retokenizer: attrs = {"LEMMA": "David Bowie", "_": {"is_musician": True}} retokenizer.merge(doc[2:4], attrs=attrs) assert doc[2].text == "David Bowie" assert doc[2].lemma_ == "David Bowie" assert doc[2]._.is_musician ``` ### Types of change enhancement ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-02-24 18:38:47 +01:00
Ines Montani	403b9cd58b	Add docs on adding to existing tokenizer rules [ci skip]	2019-02-24 18:35:19 +01:00
Ines Montani	1ea1bc98e7	Document regex utilities [ci skip]	2019-02-24 18:34:10 +01:00
Ines Montani	46ec5cdccc	Update TextCategorizer docs	2019-02-24 13:11:57 +01:00
Ines Montani	c03cb1cc63	Improve built-in component API docs	2019-02-24 13:11:49 +01:00
Ines Montani	383e2e1f12	Update Python versions [ci skip]	2019-02-24 11:49:45 +01:00
Ines Montani	b624cb4b89	Update v2-1.md	2019-02-24 11:49:27 +01:00
Ines Montani	250e88ef55	Fix docs example (see #2728 )	2019-02-21 14:22:06 +01:00
Ines Montani	0fc908d7a5	Add note on merging speed in v2.1 (see #3300 ) [ci skip]	2019-02-21 12:34:18 +01:00
Ines Montani	236aa94ded	Update v2-1.md	2019-02-21 12:33:56 +01:00
Sofie	9a478b6db8	Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293 ) * splitting up latin unicode interval * removing hyphen as infix for French * adding failing test for issue 1235 * test for issue #3002 which now works * partial fix for issue #2070 * keep the hyphen as infix for French (as it was) * restore french expressions with hyphen as infix (as it was) * added succeeding unit test for Issue #2656 * Fix issue #2822 with custom Italian exception * Fix issue #2926 by allowing numbers right before infix / * splitting up latin unicode interval * removing hyphen as infix for French * adding failing test for issue 1235 * test for issue #3002 which now works * partial fix for issue #2070 * keep the hyphen as infix for French (as it was) * restore french expressions with hyphen as infix (as it was) * added succeeding unit test for Issue #2656 * Fix issue #2822 with custom Italian exception * Fix issue #2926 by allowing numbers right before infix / * remove duplicate * remove xfail for Issue #2179 fixed by Matt * adjust documentation and remove reference to regex lib	2019-02-20 22:10:13 +01:00
Ines Montani	57ae71ea95	Add docs on serializing the pipeline (see #3289 ) [ci skip]	2019-02-18 14:13:29 +01:00
Ines Montani	38e4422c0d	Improve matcher example (resolves #3287 )	2019-02-18 13:26:37 +01:00
Ines Montani	660cfe44c5	Fix formatting	2019-02-18 13:26:22 +01:00
Ines Montani	212ff359ef	Fix links [ci skip]	2019-02-17 22:25:50 +01:00
Ines Montani	04b4df0ec9	Remove n_threads	2019-02-17 22:25:42 +01:00
Ines Montani	e597110d31	💫 Update website (#3285 ) ## Description The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in straightforward Markdown without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on. This PR also includes various new docs pages and content. Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837. ### Types of change enhancement ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-02-17 19:31:19 +01:00
ines	808f7ee417	Update API documentation	2017-10-03 14:27:22 +02:00
ines	3f4fd2c5d5	Update usage documentation	2017-10-03 14:26:20 +02:00
Reza Gharibi	0461b82158	Fix typos	2017-09-27 03:56:20 +03:30
Reza Gharibi	fa1844b132	Fix typo	2017-09-27 03:55:54 +03:30
Reza Gharibi	b5dd7e7cc4	Fix typo	2017-09-27 03:55:28 +03:30
Ines Montani	b8e81daccf	Fix typo (closes #1312 )	2017-09-14 12:49:59 +02:00
ines	d15775c3ad	Fix typos and commands in alpha docs	2017-08-21 13:40:11 +02:00
ines	3c33003078	Port over typo corrections from #1245	2017-08-20 12:00:17 +02:00
ines	1261b01e46	Update Doc.char_span docs	2017-08-19 16:34:32 +02:00
ines	5cb0200e63	Document new Span.to_array() method	2017-08-19 12:45:28 +02:00
ines	471eed4126	Add example to Span.merge()	2017-08-19 12:45:16 +02:00
ines	404d3067b8	Document new Doc.char_span() method	2017-08-19 12:45:00 +02:00
ines	d53cbf369f	Document as_tuples kwarg on Language.pipe()	2017-08-19 12:44:50 +02:00
ines	6a37c93311	Update argument type	2017-08-19 12:44:33 +02:00
ines	4731d50220	Add break utility for long nowrap items (e.g. code)	2017-08-19 12:44:23 +02:00
ines	0aba11b64b	Update package command docs	2017-08-14 16:45:44 +02:00
ines	a29f132ffd	Change python -m spacy to spacy Reflects latest change to entry point or auto-alias	2017-08-14 13:04:48 +02:00
Nikolai Kruglikov	08e443e083	Fix small typo in documentation	2017-08-14 12:19:04 +02:00
ines	ab8ffbaab7	Add text classification to v2 overview	2017-07-22 17:56:51 +02:00
ines	f085b88f9d	Add TextCategorizer API docs stub	2017-07-22 17:56:33 +02:00
ines	ab1a4e8b3c	Add Tensorizer API docs stub	2017-07-22 17:56:25 +02:00
ines	0fb89dd204	Add text classification usage guide template	2017-07-22 17:56:07 +02:00
ines	d05ab1b3a0	Add text classification to 101 overview and change order	2017-07-22 17:55:53 +02:00
ines	d2a7e5b8e5	Add GoldParse.cats attribute	2017-07-22 17:55:35 +02:00
ines	23d976ed00	Add Doc.cats attribute and missing v2 tag	2017-07-22 17:55:14 +02:00
Ines Montani	1ddbeddca2	Fix typo	2017-07-22 15:00:58 +02:00
Jarle Mathiesen	f20533ec0c	fix small typo	2017-06-24 12:31:33 +02:00
Savva Kolbachev	800a8faff4	Changed the capital of Lithuania to Vilnius Hi, There is a typo about the capital of Lithuania. Vilnius is the capital of Lithuania https://en.wikipedia.org/wiki/Vilnius Ljubljana is the capital of Slovenia https://en.wikipedia.org/wiki/Ljubljana	2017-06-12 23:27:00 +03:00
Ines Montani	57f64b9e1c	Merge pull request #1124 from v3t3a/patch-3 docs - Fix url error for Displacy Ent visualizer	2017-06-12 21:20:32 +02:00
Ines Montani	b2a28028cf	Merge pull request #1115 from v3t3a/patch-2 docs - Add read() method when opening file (Lightning tour)	2017-06-12 21:19:25 +02:00
Ines Montani	fe8d136ae0	Merge pull request #1114 from v3t3a/patch-1 docs - Update doc.jade (Just remove a duplicate 'doc =')	2017-06-12 21:19:02 +02:00
Vetea	eae1f7b19c	Fix url error for Displacy Ent visualizer	2017-06-12 14:30:02 +02:00
ines	49026a1346	Fix typos in example (see #1105 )	2017-06-08 19:15:50 +02:00
Vetea	cc3aee1189	Add read() method when opening file Add read() method for to avoid : ```TypeError: Argument 'string' has incorrect type (expected str, got _io.TextIOWrapper)``` Test with: spaCy : v2.0.0 Alpha python : 3.5.2+ (default, Sep 22 2016, 12:18:14)	2017-06-08 11:27:09 +02:00
Vetea	8e20cf6368	Update doc.jade Just remove a duplicate 'doc ='	2017-06-08 10:35:58 +02:00
ines	6b799bac54	Fix formatting and details	2017-06-06 14:37:49 +02:00
ines	fd9ae0f0e0	Update v2 comparison table	2017-06-05 16:39:11 +02:00
ines	a3f9745a14	Update similarity usage guide and examples	2017-06-05 15:37:33 +02:00
ines	fd35d910b8	Update v2 docs and benchmarks	2017-06-05 14:13:38 +02:00
ines	9f55c0d4f6	Add Vectors class	2017-06-05 13:33:11 +02:00
ines	040553ca59	Update architecture and features table	2017-06-05 13:33:01 +02:00
ines	e204788c30	Add docs for util.load_model_from_path	2017-06-05 13:18:22 +02:00
ines	efc37ea3de	Update train CLI	2017-06-04 23:45:14 +02:00
ines	505d43b832	Update norms example	2017-06-04 23:33:26 +02:00
ines	f8e93b6d0a	Update norms example	2017-06-04 23:24:29 +02:00
ines	a857b2b511	Update norms example	2017-06-04 23:21:37 +02:00
ines	47d066b293	Add under construction	2017-06-04 23:17:54 +02:00
ines	e9816daa6a	Add details on syntax iterators	2017-06-04 23:16:33 +02:00
ines	990cb81556	Add info on syntax iterators	2017-06-04 21:47:22 +02:00
ines	e4eb33daf7	Add links to production use guide	2017-06-04 20:56:58 +02:00
ines	63cd539d04	Add more details on model packages and requirements.txt (see #1099 )	2017-06-04 20:52:10 +02:00
ines	97ff83d163	Fix docs on model loading	2017-06-04 20:44:59 +02:00
ines	b6002db797	Add v2 label	2017-06-04 18:53:03 +02:00
ines	468ff1a7dd	Update v2 docs and add benchmarks stub	2017-06-04 15:34:28 +02:00
Matthew Honnibal	23fd6b1782	Add intro narrative for v2	2017-06-04 15:10:37 +02:00
ines	3419ecbfdd	Update docs on model shortcut links	2017-06-04 13:55:00 +02:00
ines	586e901143	Add v2 intro stub	2017-06-04 13:42:37 +02:00
ines	4f8f62d9b3	Merge branch 'v2-docs-edits' into develop	2017-06-04 13:40:58 +02:00
ines	809903dcad	Fix link and update wording	2017-06-04 13:29:20 +02:00
ines	22dd18c364	Remove redundant CPU commands	2017-06-04 13:29:13 +02:00
ines	1d6377218a	Update architecture blurb and move other info	2017-06-04 13:28:58 +02:00
ines	7a66c9f039	Fix formatting	2017-06-04 13:14:00 +02:00
Matthew Honnibal	f2c4a9f690	Edits to spacy-101 page	2017-06-04 13:10:27 +02:00
Matthew Honnibal	aca53b95e1	Link architecture blurb	2017-06-04 13:10:06 +02:00
Matthew Honnibal	64ca5123bb	Add Architecture 101 blurb	2017-06-04 13:09:19 +02:00
Matthew Honnibal	e77ed953f4	Update GPU instructions	2017-06-04 12:03:22 +02:00
ines	1d3b012e56	Update adding languages docs and add 101	2017-06-03 23:54:23 +02:00
ines	a3715a81d5	Update adding languages guide	2017-06-03 22:16:38 +02:00
ines	ec6d2bc81d	Add table of contents mixin	2017-06-03 22:16:26 +02:00
ines	9acf8686f7	Update note on compact mode issues	2017-06-03 13:31:16 +02:00
ines	b0225183c2	Update displaCy defaults	2017-06-03 13:27:06 +02:00
ines	c60431357d	Port over docs typo corrections	2017-06-03 11:31:30 +02:00
ines	c6dc2fafc0	Add Spanish and move example sentences to meta	2017-06-01 17:49:56 +02:00
ines	1bebc6392c	Add source files to pipeline components	2017-06-01 17:38:06 +02:00
ines	b577ed79ee	Move social image logic out to function and move files	2017-06-01 14:27:44 +02:00
ines	5e60b09dcd	Fix custom tokenizer example	2017-06-01 13:02:50 +02:00
ines	706cec6d58	Move annotation specs up	2017-06-01 13:02:43 +02:00
ines	8274dffad6	Update NER training draft	2017-06-01 12:51:36 +02:00
ines	04fac3f52a	Add NER training example code	2017-06-01 12:47:47 +02:00
ines	7f5e7e7320	Fix typo	2017-06-01 12:47:36 +02:00
ines	4a927154d8	Update v2 docs	2017-06-01 11:56:32 +02:00
ines	03bbb96db8	Remove outdated examples	2017-06-01 11:56:02 +02:00
ines	789e69b73f	Update training guide	2017-06-01 11:53:23 +02:00
ines	2f40d6e7e7	Add training 101	2017-06-01 11:53:16 +02:00
ines	abed463bbb	Update serialization 101	2017-06-01 11:52:58 +02:00
ines	72380c952a	Update training section in NER guide and add links	2017-06-01 11:52:49 +02:00
ines	77dca25c7f	Update Language API docs	2017-06-01 11:51:31 +02:00
ines	22b1f72870	Add spaCy 101 intro	2017-05-31 12:44:09 +02:00
ines	a18b95ca12	Update docs on testing	2017-05-31 12:43:40 +02:00
ines	981196c181	Fix typo	2017-05-31 11:34:31 +02:00
ines	f86289566a	Update new in v2 section and add note on Matcher acceptors	2017-05-30 13:53:06 +02:00
ines	ce4e45d0bb	Update 101 intro	2017-05-29 22:15:06 +02:00
ines	b5bfab8699	Add description	2017-05-29 15:27:16 +02:00
ines	687ed28340	Update processing pipelines guide	2017-05-29 14:21:00 +02:00
ines	d5992f408f	Update note on vocab consistency	2017-05-29 14:14:26 +02:00
ines	567485a818	Fix and document model loading with pipeline and overrides	2017-05-29 14:10:10 +02:00
ines	a2134951f2	Update 101 and add note on pipeline order and tensors	2017-05-29 11:45:32 +02:00
ines	17b635eaab	Update alpha docs note and fix typo	2017-05-29 11:09:24 +02:00
ines	fbe105f1eb	Add note on L in long integers in Python 2	2017-05-29 11:05:05 +02:00
ines	9d74810f6f	Update examples	2017-05-29 01:09:52 +02:00
ines	42cf414138	Update Matcher example	2017-05-29 01:09:52 +02:00
ines	00b2094dc3	Fix typos, long integers and tests	2017-05-29 01:09:52 +02:00
ines	d71c6db76e	Add missing Chainer install for GPU if building spaCy from source	2017-05-28 23:34:59 +02:00
ines	e0f9ccdaa3	Update texts and rename vectorizer to tensorizer	2017-05-28 23:26:13 +02:00
ines	606879b217	Update hash strings examples	2017-05-28 19:42:44 +02:00
ines	c7b57ea314	Update docs and change integer IDs to hash values	2017-05-28 19:25:34 +02:00
ines	738b4f7187	Add quickstart options and docs for GPU	2017-05-28 19:20:11 +02:00
ines	4c00cb8c8b	Update 101 and add community/FAQ and table of contents	2017-05-28 18:45:49 +02:00
ines	0ea31d1e31	Add under construction note to pipeline components	2017-05-28 18:44:07 +02:00
ines	8a148b6563	Fix code, links and formatting	2017-05-28 18:29:16 +02:00
ines	414193e9ba	Update docs to reflect StringStore changes	2017-05-28 18:19:11 +02:00
ines	69bda9aed7	Update text, examples, typos, wording and formatting	2017-05-28 16:41:01 +02:00
ines	f8185b8e11	Rename vocab-stringsotre to vocab	2017-05-28 16:37:14 +02:00
ines	10d05c2b92	Fix typos, wording and formatting	2017-05-28 01:30:12 +02:00
ines	eb5a8be9ad	Update language overview and add section on 'xx' lang class	2017-05-28 01:15:44 +02:00
ines	eb703f7656	Update API docs	2017-05-28 00:32:43 +02:00
ines	c1983621fb	Update util functions for model loading	2017-05-28 00:22:40 +02:00
ines	db116cbeda	Update tokenization 101 and add illustration	2017-05-28 00:22:40 +02:00
ines	b03fb2d7b0	Update 101 and usage docs	2017-05-28 00:22:40 +02:00
ines	ae11c8d60f	Add emoji sentiment to lightning tour matcher example	2017-05-27 20:02:20 +02:00
ines	22bf5f63bf	Update Matcher docs and add social media analysis example	2017-05-27 17:58:18 +02:00
ines	0d33ead507	Fix initialisation of Doc in lightning tour example	2017-05-27 17:58:06 +02:00
ines	e05bcd6aa8	Update docs to reflect flattened model meta.json Don't use "setup" key and instead, keep "lang" on root level and add "pipeline".	2017-05-27 17:57:46 +02:00
ines	70afcfec3e	Update defaults and example	2017-05-26 14:04:31 +02:00
ines	1b982f0838	Update train command and add docs on hyperparameters	2017-05-26 14:02:38 +02:00
ines	1b9c6ded71	Update API docs and add "source" button to GH source	2017-05-26 13:40:32 +02:00
ines	93ee5c4a52	Update serialization info	2017-05-26 13:22:45 +02:00
ines	f122d82f29	Update usage docs and ddd "under construction"	2017-05-26 13:17:48 +02:00
ines	286c3d0719	Update usage and 101 docs	2017-05-26 12:46:29 +02:00
ines	6d76c1ea16	Add 101 for Vocab, Lexeme and StringStore	2017-05-26 12:45:01 +02:00
ines	d48530835a	Update API docs and fix typos	2017-05-26 12:43:16 +02:00
ines	ea9474f71c	Add version tag mixin to label new features	2017-05-26 12:42:36 +02:00
ines	353f0ef8d7	Use disable argument (list) for serialization	2017-05-26 12:33:54 +02:00
ines	9063654a1a	Add Training 101 stub	2017-05-25 11:18:02 +02:00
ines	b2324be3e9	Fix typos, text, examples and formatting	2017-05-25 11:17:21 +02:00
ines	dcb10da615	Update and fix lightning tour examples	2017-05-25 11:15:56 +02:00
ines	4b5540cc63	Rewrite examples in lightning tour	2017-05-25 01:58:33 +02:00
ines	87c976e04c	Update model tag	2017-05-25 01:58:22 +02:00
ines	fe2b0b8b8d	Update migrating docs	2017-05-25 00:56:35 +02:00
ines	709ea58990	Tidy up workflows	2017-05-25 00:56:16 +02:00
ines	d122bbc908	Rewrite custom tokenizer docs	2017-05-25 00:30:21 +02:00
ines	0f48fb1f97	Rename processing text to production use and remove linear feature scheme	2017-05-25 00:10:33 +02:00
ines	419d265ff0	Add section on disabling pipeline components	2017-05-25 00:10:06 +02:00
ines	9efa662345	Update dependency parse docs and add note on disabling parser	2017-05-25 00:09:51 +02:00
ines	9337866dae	Add aside to pipeline 101 table	2017-05-24 22:46:18 +02:00
ines	c25f3133ca	Update section on new v2.0 features	2017-05-24 20:54:37 +02:00
ines	f4658ff053	Rewrite usage workflow on saving and loading	2017-05-24 20:54:02 +02:00
ines	764bfa3239	Add section on using displaCy in a web app	2017-05-24 20:53:43 +02:00
ines	4f396236f6	Update saving and loading docs	2017-05-24 19:25:49 +02:00
ines	8aaed8bea7	Add pipelines 101 and rewrite pipelines workflow	2017-05-24 19:25:13 +02:00
ines	54885b5e88	Add serialization 101	2017-05-24 19:24:40 +02:00
ines	8b86b08bed	Update usage workflows	2017-05-24 11:59:08 +02:00
ines	66088851dc	Add Doc.to_disk() and Doc.from_disk() methods	2017-05-24 11:58:17 +02:00
ines	10afb3c796	Tidy up and merge usage pages	2017-05-24 00:37:47 +02:00
ines	990a70732a	Move installation troubleshooting to installation docs	2017-05-24 00:37:21 +02:00
ines	697d3d7cb3	Fix links to CLI docs	2017-05-24 00:36:38 +02:00
ines	4fb5fb7218	Update v2 docs	2017-05-23 23:40:04 +02:00
ines	e6d88dfe08	Add features table to 101	2017-05-23 23:38:33 +02:00
ines	7ef7f0b42c	Add linguistic annotations 101 content	2017-05-23 23:37:51 +02:00
ines	9ed6b48a49	Update dependency parse workflow	2017-05-23 23:34:39 +02:00
ines	fe24267948	Update usage docs meta and navigation	2017-05-23 23:19:20 +02:00
ines	af348025ec	Update word vectors & similarity workflow	2017-05-23 23:19:09 +02:00
ines	b6c62baab3	Update What's new in v2 docs	2017-05-23 23:18:53 +02:00
ines	b6209e2427	Update POS tagging workflow	2017-05-23 23:18:08 +02:00
ines	43258d6b0a	Update NER workflow	2017-05-23 23:17:57 +02:00
ines	61cf2bba55	Fix code example	2017-05-23 23:17:37 +02:00
ines	1c06ef3542	Update spaCy architecture	2017-05-23 23:17:25 +02:00
ines	a433e5012a	Update adding languages docs	2017-05-23 23:16:44 +02:00
ines	3523715d52	Add spaCy 101 components	2017-05-23 23:16:31 +02:00
ines	a38393e2f6	Update annotation docs	2017-05-23 23:16:17 +02:00
ines	786af87ffb	Update IOB docs	2017-05-23 23:15:50 +02:00
ines	3aff883434	Add displaCy examples to lightning tour	2017-05-23 23:15:39 +02:00
ines	6ef09d7ed8	Change save_to_directory to to_disk	2017-05-23 23:15:31 +02:00
ines	c8bde2161c	Add kwargs to spacy.load	2017-05-23 23:14:02 +02:00
ines	0a8a2d2f6d	Remove tip infoboxes from annotation docs	2017-05-23 23:13:51 +02:00
ines	e6acd3bbf2	Fix matcher tests and matcher docs	2017-05-23 11:36:02 +02:00
ines	f497cf60b2	Update formatting	2017-05-23 11:32:25 +02:00
ines	4cd26bcb83	Update docs on rule-based matching and add examples	2017-05-22 19:04:02 +02:00
ines	701cba1524	Update models documentation with notes	2017-05-22 18:53:14 +02:00
ines	a23f487b06	Tidy up displaCy and add "manual" option Also don't require title in EntityRenderer	2017-05-22 18:48:20 +02:00
ines	aa9c3bd464	Fix formatting	2017-05-22 13:55:01 +02:00
ines	dddad5bf26	Update util.prints docs	2017-05-22 13:54:52 +02:00
ines	d5a6a9a6a9	Use string values for attrs in Matcher docs	2017-05-22 13:54:45 +02:00
ines	54f04a9fe0	Update API docs with changes in spacy.gold and spacy.language	2017-05-22 12:29:30 +02:00
ines	fc3ec733ea	Reduce complexity in CLI Remove now redundant model command and move plac annotations to cli files	2017-05-22 12:28:58 +02:00
ines	cc569a348d	Add quickstart widget to models and update docs Add global variable for models and generate all model listings programmatically	2017-05-21 20:55:52 +02:00
ines	2c5cfe8bbf	Update docstrings and API docs for StringStore	2017-05-21 14:18:58 +02:00
ines	251346b59f	Fix typos and formatting	2017-05-21 14:18:46 +02:00
ines	075f5ff87a	Update docstrings and API docs for GoldParse	2017-05-21 13:53:46 +02:00
ines	465a1dd710	Add BILUO scheme to annotation docs	2017-05-21 13:53:34 +02:00
ines	c9f04f3cd0	Add note on automated processes to download command	2017-05-21 13:23:39 +02:00
ines	8ab59515b2	Fix typo and use consistent description for from_bytes	2017-05-21 13:18:39 +02:00
ines	c5a653fa48	Update docstrings and API docs for Tokenizer	2017-05-21 13:18:14 +02:00
ines	d82ae9a585	Change "function" to "callable" in docs	2017-05-21 13:17:40 +02:00
ines	ee3fdffffb	Move attributes and remove deprecated methods	2017-05-21 01:18:31 +02:00
ines	1cb2c86f9a	Update CLI docs	2017-05-21 01:13:05 +02:00
ines	272a8981c3	Add model tag to spacy.load API docs	2017-05-21 01:12:43 +02:00
ines	3871157d84	Update spacy.util documentation	2017-05-21 01:12:09 +02:00
ines	da12aee0c1	Update spacy.load with note on get_lang_class	2017-05-21 00:19:26 +02:00
ines	924e8506de	Move Defaults subclass to module scope (necessary for pickling)	2017-05-20 19:02:27 +02:00
ines	27de0834b2	Update docstrings and API docs for Lexeme	2017-05-20 15:13:42 +02:00
ines	7ed8a92ed1	Update docstrings and API docs for Token	2017-05-20 15:13:33 +02:00
ines	4ed6a36622	Update docstrings and API docs for Matcher	2017-05-20 14:43:10 +02:00
ines	39f36539f6	Update docstrings and API docs for Matcher	2017-05-20 14:32:34 +02:00
ines	c00ff257be	Update docstrings and API docs for Matcher	2017-05-20 14:26:10 +02:00
ines	463e3cc80f	Remove resize_vectors and vectors_length	2017-05-20 14:02:14 +02:00
ines	b218c1964a	Update "What's new in v2.0" docs	2017-05-20 14:00:41 +02:00
ines	f0cc642bb9	Update docstrings and API docs for Vocab	2017-05-20 14:00:41 +02:00
Matthew Honnibal	a93276bb78	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-05-20 13:55:12 +02:00
Matthew Honnibal	ce9234f593	Update Matcher API	2017-05-20 13:54:53 +02:00
ines	8b14476253	Fix typo	2017-05-20 13:00:13 +02:00
ines	6557ff9e85	Update example	2017-05-20 13:00:07 +02:00
ines	fea4925f41	Reorganise API docs navigation	2017-05-20 12:59:57 +02:00
ines	b2678372c7	Add API docs for top-level spaCy functions i.e. spacy.load(), spacy.info(), spacy.explain()	2017-05-20 12:59:44 +02:00
ines	797f10ab16	Update formatting	2017-05-20 12:59:16 +02:00
ines	e10c48210d	Update Matcher API and workflow to reflect new API on_match is now the second positional argument, to easily allow a variable number of patterns while keeping the method clean and readable.	2017-05-20 12:59:03 +02:00
ines	eb521af267	Fix formatting	2017-05-20 12:58:15 +02:00
ines	7973912114	Update CLI docs	2017-05-20 12:58:05 +02:00
ines	9edc7fb0ba	Update Matcher API docs	2017-05-20 12:27:22 +02:00
ines	5163a4513e	Update API docs	2017-05-20 01:43:48 +02:00
ines	784347160d	Rewrite rule-based matching workflow	2017-05-20 01:38:55 +02:00
ines	7f9539da27	Fix old download command and formatting	2017-05-20 01:38:43 +02:00
ines	e3256e7406	Update Matcher API docs	2017-05-20 01:38:34 +02:00
ines	0cabf9e13f	Fix model tag	2017-05-20 01:38:14 +02:00
ines	fe5d8819ea	Update Matcher docstrings and API docs	2017-05-19 21:47:06 +02:00
ines	c8580da686	Update "requires model" tags	2017-05-19 20:24:46 +02:00
ines	c3e903e4c2	Update examples and API docs	2017-05-19 19:59:02 +02:00
ines	e9e62b01b0	Update docstrings and API docs for Token	2017-05-19 18:47:56 +02:00
ines	62ceec4fc6	Update docstrings and API docs for Span	2017-05-19 18:47:46 +02:00
ines	23f9a3ccc8	Update docstrings and API docs for Doc	2017-05-19 18:47:39 +02:00
ines	2c8c9dc0c9	Update docstrings and API docs for Language	2017-05-19 18:47:24 +02:00
ines	0791f0aae6	Update docstrings and API docs for Span class	2017-05-19 00:31:31 +02:00
ines	5b68579eb8	Use returns/yields instead of return/yield	2017-05-19 00:02:34 +02:00
ines	b687ad109d	Update docstrings and API docs for Doc class	2017-05-18 23:59:44 +02:00
ines	d42bc16868	Update docstrings and API docs for Language class	2017-05-18 23:57:38 +02:00
ines	b87066ff10	Update docstrings and API docs for Doc class	2017-05-18 22:17:41 +02:00
ines	476b8209fe	Update docs with new Jupyter auto-detection	2017-05-18 14:58:17 +02:00
ines	11f52b8b83	Add headline to installation details and move aside	2017-05-17 12:04:03 +02:00
ines	533bb63816	Implement quickstart widget	2017-05-17 12:04:03 +02:00
ines	9df9a87d03	Add visualizer usage example	2017-05-17 12:04:03 +02:00
ines	6364a9be9d	Add What's new and spaCy 101 stubs	2017-05-17 12:04:03 +02:00
ines	f4ae1e8750	Add section on adding titles to documents	2017-05-17 12:04:03 +02:00
ines	02a4841e7b	Move CLI docs to API reference	2017-05-17 12:04:03 +02:00
ines	accf05b0a9	Update visualizers docs	2017-05-15 14:37:01 +02:00
ines	d7244ae72d	Add docs on collapse_punct option	2017-05-15 13:51:33 +02:00
ines	6d7986b7bc	Update docs	2017-05-15 01:46:33 +02:00
ines	c6e8d55dcb	Update NER workflow with new displaCy	2017-05-15 01:42:11 +02:00
ines	860a60e251	Fix explanation	2017-05-15 01:31:11 +02:00
ines	5c044cb670	Add visualizers usage docs	2017-05-15 01:25:18 +02:00
ines	c33bdeb564	Use uppercase for entity types	2017-05-15 01:24:57 +02:00
ines	3d37564a09	Remove resources from navigation for now Not sure what to do with this page... maybe merge it with something else?	2017-05-14 23:29:58 +02:00
ines	cf7e5ed534	Use American spelling for "visualizers" Kinda sucks because we normally use British spelling, but it just looks weird and confusing otherwise... same with tokenizer and all other library internals. So this is sort of the "official policy" for now.	2017-05-14 23:29:36 +02:00
ines	fe5a5086e1	Fix typo	2017-05-14 23:27:56 +02:00
ines	1ae07da18f	Add API docs for spacy.displacy (see #1058 )	2017-05-14 19:31:23 +02:00
ines	b462076d80	Merge load_lang_class and get_lang_class	2017-05-14 01:31:10 +02:00
ines	1465c6c221	Add API docs for util functions	2017-05-13 21:23:12 +02:00
ines	144161c58c	Update links to dev resources	2017-05-13 21:23:02 +02:00
ines	0095d5322b	Update adding languages docs	2017-05-13 18:54:10 +02:00
ines	1d94c0e98a	Update table of contents	2017-05-13 15:42:51 +02:00
ines	a48e21755e	Add section on testing language tokenizers	2017-05-13 15:39:27 +02:00
ines	2f54fefb5d	Update adding languages docs	2017-05-13 14:54:58 +02:00
ines	3665acc0de	Update adding languages docs	2017-05-13 12:39:36 +02:00
ines	3454f2aca8	Update showcase	2017-05-13 03:32:03 +02:00
ines	67726d1837	Update data model docs	2017-05-13 03:10:56 +02:00
ines	915b50c736	Update adding languages docs	2017-05-13 03:10:50 +02:00
ines	19879cb693	Update alpha support docs	2017-05-12 15:57:49 +02:00
ines	63d79947c8	Update title in navigation	2017-05-12 15:40:43 +02:00
ines	531ee1373b	Rename "Language models" to "Languages" in API	2017-05-12 15:38:56 +02:00
ines	c4d2c3cac7	Update adding languages docs	2017-05-12 15:38:17 +02:00
ines	fac3566aac	Add descriptions to POS tagging scheme	2017-05-03 20:11:02 +02:00
ines	1570b83ee5	Add spacy.explain() note to NER annotation scheme	2017-05-03 20:11:02 +02:00
ines	219369bb7d	Add detailed docs for dependency label annotations	2017-05-03 20:11:02 +02:00
ines	f9384b0fbd	Update alpha languages and add aside for tokenizer dependencies	2017-05-03 09:58:31 +02:00
Yasuaki Uechi	0e7a9b9fac	Add Japanese to 'Alpha support’ section	2017-05-03 13:56:45 +09:00
Ines Montani	fb96f88b59	Update info on CoNLL format and include link	2017-04-27 14:36:08 +02:00
M. Z. Ferdous (Imran)	c9f9203d5f	fix typo, CONLL format tried to google about connlu format. Saw there is conll format, not connlu.	2017-04-27 16:48:54 +06:00
ines	5aa49971f9	Add French example to models docs	2017-04-27 12:08:47 +02:00
ines	034ec5710b	Fix typo and add Norwegian to alpha languages	2017-04-27 11:24:21 +02:00
ines	100846bed3	Fix typo in model list	2017-04-26 21:40:17 +02:00
ines	375edf0bb5	Add list of models and include French	2017-04-26 20:50:27 +02:00
ines	4eacd72bc3	Move list of models to own file	2017-04-26 20:50:27 +02:00
ines	c2006166d3	Update list of available models and info	2017-04-26 16:03:41 +02:00
ines	e6bdf5bc5c	Update adding language / training docs (see #966 ) Add data examples and more info on training and CLI commands	2017-04-26 14:01:19 +02:00
ines	ae2b77db1b	Fix info on naming conventions	2017-04-26 14:01:19 +02:00
Julien Chaumond	f997bceb07	Make object of the deep learning tutorial clearer This is a great tutorial, but I think it is weirdly explained in the current form. The largest part of the code is about implementing the actual sentiment analysis model, not about counting entities. (which is not even present in the `deep_learning_keras.py` script in `examples`)	2017-04-24 11:55:41 +02:00
ines	2bfec1a4f8	Add note on languages with non-latin characters (see #996 )	2017-04-23 15:58:40 +02:00
ines	ddd5194088	Update Language docs and docstrings	2017-04-17 01:52:13 +02:00
ines	2ab394d655	Fix whitespace	2017-04-17 01:45:00 +02:00
ines	7f776258f0	Add link to API docs	2017-04-17 01:41:46 +02:00
ines	aad80a291f	Add save_to_directory method to API docs	2017-04-17 01:40:34 +02:00
ines	c6c3162c50	Fix lightning tour example (closes #889 )	2017-04-17 00:00:30 +02:00
ines	de5062711b	Update adding languages workflow to reflect changes in __init__.py	2017-04-16 22:26:46 +02:00
ines	e4dd645c37	Update link	2017-04-16 20:37:46 +02:00
ines	dea79224ed	Remove saving & loading docs and link to new workflow	2017-04-16 20:37:45 +02:00
ines	c365795bf6	Update navigation	2017-04-16 20:37:45 +02:00
ines	5bbbb7674b	Add training examples to tutorials	2017-04-16 20:37:45 +02:00
ines	17e9743388	Add saving & loading models docs	2017-04-16 20:37:45 +02:00
ines	b15bdb5279	Update training docs	2017-04-16 20:37:45 +02:00
ines	5cb17b9f33	Add NER training docs	2017-04-16 20:37:45 +02:00
ines	d29c825ca4	Update docs for package command	2017-04-16 13:37:24 +02:00
ines	cf558e37c3	Update adding languages docs with new commands	2017-04-13 13:52:11 +02:00
Sohil	328678c7e9	Extra brace ")" creating error There is an extra closing brace `)` which is creating error while running example.	2017-04-13 17:12:28 +05:30
ines	1f501af602	Add file name shadowing module issue to troubleshooting guide (see #953 )	2017-04-07 16:21:32 +02:00
ines	2f38c1d77f	Add documentation for new convert and model commands	2017-04-07 13:27:55 +02:00
ines	f33c4cbae1	Add --no-cache-dir error to troubleshooting docs (see #958 )	2017-04-07 10:22:18 +02:00
ines	d6bbc3ffcd	Fix formatting	2017-04-07 10:22:18 +02:00
ines	2c36a61ec5	Add spacyr to libraries	2017-04-03 18:12:38 +02:00
ines	e210496f78	Update Windows compiler docs	2017-03-29 10:35:20 +02:00
ines	13df2d6a60	Add documentation for spaCy's JSON format	2017-03-26 15:56:15 +02:00
ines	5901c8f7f0	Update spacy train CLI documentation	2017-03-26 15:33:48 +02:00
ines	afd839f64b	Add pip and conda badges to installation docs	2017-03-26 14:11:31 +02:00
ines	9a481c9f42	Add "Troubleshooting" section	2017-03-26 13:42:36 +02:00
ines	d4a86b6394	Update formatting	2017-03-26 13:42:19 +02:00
ines	1dae97b2f6	Fix typos	2017-03-26 11:14:44 +02:00
ines	a5fc5fb0db	Add Hebrew to list of alpha languages	2017-03-25 10:22:46 +01:00
ines	9600cd1b9e	Fix download commands	2017-03-25 10:22:05 +01:00
ines	fa6e3cefbb	Simplify package command docs	2017-03-21 11:35:29 +01:00
ines	49bbfdaac1	Add info on CLI to docs on own models	2017-03-21 11:25:01 +01:00
ines	09b24bc5a9	Add docs for package command	2017-03-21 11:19:21 +01:00
ines	81b28ca606	Update models docs with info on retraining own models	2017-03-20 18:01:55 +01:00
ines	ef5e261387	Add spacy_api project by @kootenpv to showcase	2017-03-19 12:49:40 +01:00
ines	fa1f2040a5	Use correct code block language	2017-03-18 18:19:50 +01:00
ines	ff277140f9	Add CLI docs	2017-03-18 15:24:50 +01:00
ines	e635e1f6f4	Update docs to reflect new commands	2017-03-18 15:24:42 +01:00
ines	e9d8d756fc	Fix typo in pytest flags	2017-03-18 15:24:20 +01:00
ines	3926ffdb70	Update models docs	2017-03-17 19:26:37 +01:00
ines	76c0ea6cc6	Update models docs	2017-03-17 17:01:16 +01:00
ines	b322f31521	Update models docs	2017-03-17 16:09:56 +01:00
ines	7f25f64acc	Update lightning tour	2017-03-17 13:11:00 +01:00
ines	e461fafd14	Update example	2017-03-16 23:23:35 +01:00
ines	f4df9463f2	Fix wording	2017-03-16 22:21:46 +01:00
ines	08b0fb62cc	Update models docs	2017-03-16 22:09:43 +01:00
ines	0b5c664b04	Update resources	2017-03-16 21:59:26 +01:00
ines	807139ae61	Update installation docs and add models quickstart aside	2017-03-16 21:53:44 +01:00
ines	ec75c781b9	Add docs page for models	2017-03-16 21:53:31 +01:00
ines	4c53eed35a	Remove sputnik from dependencies and docs	2017-03-15 17:39:25 +01:00
ines	758335452d	Update installation instructions and fix formatting	2017-03-08 11:36:00 +01:00
ines	004c4c9566	Update installation docs Include conda and virtualenv info for pip, add instructions for downloading models manually and add details and fab commands to "Compile from source" section.	2017-03-07 18:52:22 +01:00
yalei	27c0e6226b	Edit example code The original code forget to import the `random` module and the `EntityRecognizer` module.	2017-03-07 18:07:40 +08:00
ines	d25f17f139	Add Bengali to list of languages (see #865 )	2017-03-01 15:59:21 +01:00
ines	2b07ab7db4	Add feature scheme to API docs (see #857 , #739 )	2017-02-24 18:26:32 +01:00
ines	8ddad178f6	Add book and tutorial	2017-02-24 18:26:32 +01:00
Ines Montani	49a102aff3	Merge pull request #841 from jondoughty/patch-1 Updated Token class documentation	2017-02-16 23:47:51 +01:00
Jon Doughty	12a8757343	Update token.jade	2017-02-16 10:55:33 -08:00
nycmonkey	8946a2a496	Fix typo in IOB integer to letter map ent_iob value for an ent.iob_ value of 'B' should be 3, not B	2017-02-16 13:49:57 -05:00
John Gamboa	e31894b800	Fixes example 3 of entity recognition (see issue #832 )	2017-02-16 11:19:53 +01:00
Stefan Bunk	2bf19d4735	Fix error in pipeline loading documentation The cell for the `vocab` parameter is not displayed, making it seem as if the explanation belongs to the previous param.	2017-02-10 12:06:55 +01:00
Stefan Bunk	e972b2fa87	Fix error in matching documentation LOWER and IS_PUNCT are members of `spacy` and not of the `Matcher` class.	2017-02-07 16:52:01 +01:00
Matthew Honnibal	9aaa2c5633	Fix entity recognition example (closes #803 )	2017-02-05 11:23:12 +01:00
ines	a44da8fb34	Update language models and alpha support overview	2017-02-04 13:49:05 +01:00
Ines Montani	651bf411e0	Add tutorial	2017-01-26 13:48:38 +01:00
Ines Montani	da3aca4020	Fix formatting	2017-01-26 13:48:29 +01:00
Hidekazu Oiwa	7806ebafd2	Fix the span doc typo Fix the typo in the span API doc. It explains the `end` of the span as the `start_char` description.	2017-01-17 20:37:14 -08:00
Kevin Gao	7ec710af0e	Fix Custom Tokenizer docs - Fix mismatched quotations - Make it more clear where ORTH, LEMMA, and POS symbols come from - Make strings consistent - Fix lemma_ assertion s/-PRON-/me/	2017-01-17 10:38:14 -08:00
Jason Kessler	9fa6f9fb40	Origin of spacy.matcher attributes Make it clear that Matcher attributes live in spacy.matcher.attrs.	2017-01-16 13:31:35 -06:00
jktong	df0aeff379	Correct typo "chldren" in doc.jade	2017-01-16 09:34:59 -05:00
Ines Montani	57919566b8	Add Jupyter notebooks repo to resources list	2017-01-05 20:50:08 +01:00
Ines Montani	d677db6277	Change "Multi-language support" to amber for spaCy	2017-01-03 21:24:35 +01:00
Ines Montani	1b82756cc7	Tidy up and fix formatting and consistency	2017-01-02 00:29:24 +01:00
Ines Montani	e3d84572f2	Fix ents input format example	2017-01-01 12:28:37 +01:00
Guy Rosin	acdd2fc9a6	Tiny code typo	2016-12-31 14:53:05 +02:00
Ines Montani	d1585959d9	Add Hungarian to alpha support overview	2016-12-27 22:31:41 +01:00
Ines Montani	b7becaec85	Fix typo	2016-12-25 15:23:32 +01:00
Ines Montani	207555fae7	Fix spelling	2016-12-23 21:36:01 +01:00
Ines Montani	48b03b4001	Fix formatting and wording	2016-12-23 14:36:03 +01:00
Ines Montani	cc051ddc15	Add resources page to usage docs	2016-12-23 14:36:03 +01:00
Ines Montani	d1a2846750	Document DET_LEMMA	2016-12-21 18:18:35 +01:00
Ines Montani	71c00db8a5	Update language models page	2016-12-21 00:54:54 +01:00
aikramer2	349143faa2	update to training doc	2016-12-20 12:01:16 -08:00
Ines Montani	a2525c76ee	Reformat word frequencies section in "adding languages" workflow	2016-12-19 17:18:38 +01:00
Ines Montani	ddf5c5bb61	Generalise dependency parsing annotation specs beyond English (closes #657 )	2016-12-19 13:42:44 +01:00
Ines Montani	6a793251c8	Add aside on spaCy's custom pronoun lemma	2016-12-19 13:41:47 +01:00
Ines Montani	d0c15730c4	Fix link	2016-12-19 13:09:45 +01:00
Ines Montani	a9c0e77b80	Fix typo	2016-12-19 13:09:45 +01:00
Ines Montani	fa65c6b54c	Add "Adding languages" workflow (closes #562)	2016-12-18 23:54:19 +01:00
Ines Montani	1cddb7da36	Add "Part-of-speech tagging" workflow (closes #581)	2016-12-18 23:54:19 +01:00
Ines Montani	ac597b58f6	Update showcase	2016-12-18 23:54:18 +01:00
Ines Montani	614ca6fb41	Split annotation specs into files so they can be included in different places	2016-12-18 17:42:10 +01:00
Ines Montani	ce8bf08223	Fix formatting	2016-12-18 17:40:20 +01:00
David Edwards	278199dd2c	Update index.jade	2016-12-15 13:40:53 -08:00
jaspb	3d7f81ddf5	added 'en' to spacy.load(..)	2016-12-10 19:18:13 +00:00
Tobias Macey	1d768d6510	Fixed minor typo The word `motto` was missing the second `t`.	2016-12-01 06:08:33 -05:00
Jimi Smoot	8373115cbd	Minor typos	2016-11-25 18:22:52 -08:00
Ines Montani	ada007cb73	Fix formatting for consistency	2016-11-25 15:53:40 +01:00
Ines Montani	19f27cc6ef	Use consistent entity tables across docs	2016-11-25 15:48:50 +01:00
Ines Montani	e0c7a22f09	Add usage workflow for entity recognizer	2016-11-25 02:30:31 +01:00
Ines Montani	c8e69b98cc	Update tutorial tags	2016-11-25 02:30:31 +01:00
Ines Montani	6f7835bb70	Add tutorial	2016-11-24 19:25:21 +01:00
Ines Montani	a7b5fba132	Merge pull request #642 from ExplodingCabbage/specify-data-path Let --data-path be specified when running download.py scripts	2016-11-23 13:05:03 +01:00
Will Thompson	e896466dcf	docs: processing-text: fix missing line wrap	2016-11-21 10:43:16 +00:00
Will Thompson	1adc96f0a6	docs: fix "installaton" typo	2016-11-21 10:37:57 +00:00
Mark Amery	2dc305f46b	Merge remote-tracking branch 'origin/master' into specify-data-path	2016-11-20 18:29:06 +00:00
Ines Montani	20c8fc5255	Merge pull request #645 from ExplodingCabbage/formatting-mistake Fix another typo on the website	2016-11-20 19:13:53 +01:00
Mark Amery	270d42e73a	Fix another typo on the website	2016-11-20 17:08:04 +00:00
Mark Amery	b4e1dc0e3f	Fix a bunch of missing spaces on the website	2016-11-20 17:02:45 +00:00
Mark Amery	a0c4b29dcb	Document new --data-path argument	2016-11-20 16:52:56 +00:00
Paul Dechov	537f9eaaf8	[DOCS] Typo	2016-11-17 16:29:39 -05:00
tjrileywisc	464a4f3f6f	Fixed a minor typo in deep learning tutorial docs.	2016-11-17 13:38:10 -05:00
Matthew Honnibal	af953cf2e6	Merge pull request #620 from savkov/patch-1 Missing import statement for spacy.matcher.Matcher	2016-11-16 06:08:44 +11:00
Sasho Savkov	a8831a85e4	Added missing brackets & suggested import statmnt There are two missing brackets on the `add_pattern` lines. I also suggest you include the `from spacy.tokens.doc import Doc` statement to make it easy for people to copy paste a working example.	2016-11-11 17:12:56 +00:00
Sasho Savkov	250879bb96	Missing import statement It is useful to know where the Matcher class is if you haven't used it before. Or you are simply too lazy to remember, like me :) FYI: some packages don't appear in the PyCharm autocompletion lists. `spacy.matcher` is one of them.	2016-11-11 12:04:08 +00:00
Ines Montani	0a90b141f4	Trust link	2016-11-07 21:50:40 +01:00
Ines Montani	bf3c1c7a48	Add link to dependency parse workflow	2016-11-07 21:32:03 +01:00
Ines Montani	d5668cf0d2	Add spacy-api-docker to showcase	2016-11-06 13:46:20 +01:00
Ines Montani	98c8e70dc2	Update installation docs	2016-11-06 13:46:11 +01:00
Ines Montani	c20abc8a6d	Add customizing tokenizer and training workflow	2016-11-05 20:40:11 +01:00
Ines Montani	5e4e5b600f	Update language models docs	2016-11-05 02:50:55 +01:00
SultanMirza	daedf2c153	Fixing typos and errors!! Fixed some typos and errors on the page.	2016-11-04 20:54:28 +05:30
SultanMirza	d824f8c322	removed typo	2016-11-03 21:53:58 +05:30
Ines Montani	b5abdcb390	Fix formatting	2016-11-03 13:06:05 +01:00
Ines Montani	42bf4ff9fe	Add TruthBot to showcase	2016-11-03 12:43:55 +01:00
Ines Montani	c748474a9e	Fix formatting	2016-11-03 01:52:31 +01:00
Ines Montani	2515b32a74	Add documentation for Tokenizer API (see #600)	2016-11-02 23:18:02 +01:00
Ines Montani	adf04a6ad3	Adjust tutorial category name	2016-11-02 12:11:17 +01:00
Ines Montani	2c65c15d7a	Fix typo	2016-11-02 11:25:09 +01:00
Ines Montani	823e47d946	Add language models to API docs (fixes #598)	2016-11-02 11:24:13 +01:00
Ines Montani	d3b6a594f8	Add Natural Language Inference tutorial	2016-11-01 03:27:20 +01:00
Ines Montani	4b84b4522b	Update link in examples	2016-11-01 03:06:33 +01:00
Ines Montani	201445b3b8	Fix benchmarks intro	2016-10-31 20:55:59 +01:00
Ines Montani	06f2374f98	Remove old files	2016-10-31 19:18:12 +01:00
Ines Montani	7615b41bff	Update to new website	2016-10-31 19:04:15 +01:00
Pokey Rule	603a3f40c5	Fix small bug in code of mark-adverbs tutorial	2016-10-26 15:23:36 +01:00
Mahmoud Lababidi	f8ce28058c	fix typo in url	2016-10-24 15:21:18 -04:00
Ines Montani	efaa8eaf1f	Add matcher to navigation	2016-10-24 00:52:17 +02:00
Ines Montani	26dc3f3ebf	Fix indentation errors	2016-10-24 00:52:17 +02:00
Ines Montani	405347b46f	Fix inline code in docs	2016-10-24 00:52:17 +02:00
Ines Montani	b6fce4d82a	Fix broken links in docs	2016-10-24 00:52:17 +02:00
chssch	cf7b6f7a9d	Add merge phrases from https://github.com/explosion/spaCy/issues/523#issuecomment-255172782	2016-10-22 15:15:53 +02:00
chssch	6b30cbaf0b	Strings has to be on vocab object	2016-10-22 15:15:53 +02:00
Ines Montani	593f7eb413	Update installation docs	2016-10-21 01:00:34 +02:00
Ines Montani	0373d23727	Update code in rule-based matcher tutorial	2016-10-21 01:00:34 +02:00
Ines Montani	2251abb236	Update training tutorial	2016-10-21 01:00:34 +02:00
Ines Montani	f8322a69e7	Rename "English" section to "Language"	2016-10-21 01:00:34 +02:00
Ines Montani	da0985114d	Update tutorials	2016-10-19 01:24:22 +02:00
Ines Montani	e56dc9a075	Update website	2016-10-19 00:19:42 +02:00
Matthew Honnibal	ae29b9bdfd	Fix travis and README conflicts	2016-10-19 00:16:11 +02:00
Ines Montani	9e8f333763	Update website	2016-10-03 20:19:13 +02:00
Ines Montani	970ec145d9	Update website source	2016-09-30 20:29:03 +02:00
Johnny Lim	4c53a8ecd7	Fix doc This PR changes the `str`s to `unicode`s because `str`s throw the following error: ``` TypeError: Argument 'x' has incorrect type (expected unicode, got str) ```	2016-07-30 16:10:21 +09:00
Ines Montani	1f8309a862	Replace website with new version	2016-04-01 01:24:48 +11:00

1816 Commits