spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-31 18:39:49 +03:00

Author	SHA1	Message	Date
Adriane Boyd	28fd589b85	Move all website gitignore settings to website/.gitignore (#12120)	2023-01-18 21:46:19 +01:00
Adriane Boyd	7c98245c0c	Add levenshtein from polyleven (#11418) Add a simple levenshtein distance function using the implementation from the polyleven library as `spacy.matcher.levenshtein`.	2022-09-14 17:05:22 +02:00
Adriane Boyd	b16da378bb	Re-remove universe tests from test suite (#10357)	2022-02-23 21:08:56 +01:00
Jette16	5eced281d8	Add universe test (#9278) * Added test for universe.json * Added contributor agreement * Ran black on test_universe_json.py	2021-09-23 14:31:42 +02:00
Ines Montani	991669c934	Tidy up and auto-format	2021-01-05 13:41:53 +11:00
Ines Montani	04e4d59235	Update docs [ci skip]	2020-08-20 16:17:25 +02:00
Ines Montani	e2f2ef3a5a	Update init config and recommendations - As much as I dislike YAML, it seemed like a better format here because it allows us to add comments if we want to explain the different recommendations - Don't include the generated JS in the repo by default and build it on the fly when running or deploying the site. This ensures it's always up to date. - Simplify jinja_to_js script and use fewer dependencies	2020-08-19 13:33:15 +02:00
Ines Montani	e92df281ce	Tidy up, autoformat, add types	2020-07-25 15:01:15 +02:00
Ines Montani	644074b954	Merge branch 'develop' into master-tmp	2020-07-20 14:58:04 +02:00
Adriane Boyd	971826a96d	Include git commit in package and model meta (#5694) * Include git commit in package and model meta * Rewrite to read file in setup * Fix file handle	2020-07-02 17:10:27 +02:00
Ines Montani	46568f40a7	Merge branch 'master' into tmp/sync	2020-03-26 13:38:14 +01:00
Ines Montani	5f68004264	Port over gitignore changes from develop Prevents stale files when switching branches	2020-03-09 11:05:00 +01:00
Matthew Honnibal	b4e0d2bf50	Improve Makefile (#5067) * Improve pex making * Update gitignore	2020-02-26 20:59:10 +01:00
Ines Montani	c1a5ece65f	Tidy up setup and update requirements tests	2020-02-25 15:46:39 +01:00
Sofie Van Landeghem	569cc98982	Update spaCy for thinc 8.0.0 (#4920 ) * Add load_from_config function * Add train_from_config script * Merge configs and expose via spacy.config * Fix script * Suggest create_evaluation_callback * Hard-code for NER * Fix errors * Register command * Add TODO * Update train-from-config todos * Fix imports * Allow delayed setting of parser model nr_class * Get train-from-config working * Tidy up and fix scores and printing * Hide traceback if cancelled * Fix weighted score formatting * Fix score formatting * Make output_path optional * Add Tok2Vec component * Tidy up and add tok2vec_tensors * Add option to copy docs in nlp.update * Copy docs in nlp.update * Adjust nlp.update() for set_annotations * Don't shuffle pipes in nlp.update, decruft * Support set_annotations arg in component update * Support set_annotations in parser update * Add get_gradients method * Add get_gradients to parser * Update errors.py * Fix problems caused by merge * Add _link_components method in nlp * Add concept of 'listeners' and ControlledModel * Support optional attributes arg in ControlledModel * Try having tok2vec component in pipeline * Fix tok2vec component * Fix config * Fix tok2vec * Update for Example * Update for Example * Update config * Add eg2doc util * Update and add schemas/types * Update schemas * Fix nlp.update * Fix tagger * Remove hacks from train-from-config * Remove hard-coded config str * Calculate loss in tok2vec component * Tidy up and use function signatures instead of models * Support union types for registry models * Minor cleaning in Language.update * Make ControlledModel specifically Tok2VecListener * Fix train_from_config * Fix tok2vec * Tidy up * Add function for bilstm tok2vec * Fix type * Fix syntax * Fix pytorch optimizer * Add example configs * Update for thinc describe changes * Update for Thinc changes * Update for dropout/sgd changes * Update for dropout/sgd changes * Unhack gradient update * Work on refactoring _ml * Remove 
_ml.py module * WIP upgrade cli scripts for thinc * Move some _ml stuff to util * Import link_vectors from util * Update train_from_config * Import from util * Import from util * Temporarily add ml.component_models module * Move ml methods * Move typedefs * Update load vectors * Update gitignore * Move imports * Add PrecomputableAffine * Fix imports * Fix imports * Fix imports * Fix missing imports * Update CLI scripts * Update spacy.language * Add stubs for building the models * Update model definition * Update create_default_optimizer * Fix import * Fix comment * Update imports in tests * Update imports in spacy.cli * Fix import * fix obsolete thinc imports * update srsly pin * from thinc to ml_datasets for example data such as imdb * update ml_datasets pin * using STATE.vectors * small fix * fix Sentencizer.pipe * black formatting * rename Affine to Linear as in thinc * set validate explicitely to True * rename with_square_sequences to with_list2padded * rename with_flatten to with_list2array * chaining layernorm * small fixes * revert Optimizer import * build_nel_encoder with new thinc style * fixes using model's get and set methods * Tok2Vec in component models, various fixes * fix up legacy tok2vec code * add model initialize calls * add in build_tagger_model * small fixes * setting model dims * fixes for ParserModel * various small fixes * initialize thinc Models * fixes * consistent naming of window_size * fixes, removing set_dropout * work around Iterable issue * remove legacy tok2vec * util fix * fix forward function of tok2vec listener * more fixes * trying to fix PrecomputableAffine (not succesful yet) * alloc instead of allocate * add morphologizer * rename residual * rename fixes * Fix predict function * Update parser and parser model * fixing few more tests * Fix precomputable affine * Update component model * Update parser model * Move backprop padding to own function, for test * Update test * Fix p. 
affine * Update NEL * build_bow_text_classifier and extract_ngrams * Fix parser init * Fix test add label * add build_simple_cnn_text_classifier * Fix parser init * Set gpu off by default in example * Fix tok2vec listener * Fix parser model * Small fixes * small fix for PyTorchLSTM parameters * revert my_compounding hack (iterable fixed now) * fix biLSTM * Fix uniqued * PyTorchRNNWrapper fix * small fixes * use helper function to calculate cosine loss * small fixes for build_simple_cnn_text_classifier * putting dropout default at 0.0 to ensure the layer gets built * using thinc util's set_dropout_rate * moving layer normalization inside of maxout definition to optimize dropout * temp debugging in NEL * fixed NEL model by using init defaults ! * fixing after set_dropout_rate refactor * proper fix * fix test_update_doc after refactoring optimizers in thinc * Add CharacterEmbed layer * Construct tagger Model * Add missing import * Remove unused stuff * Work on textcat * fix test (again :)) after optimizer refactor * fixes to allow reading Tagger from_disk without overwriting dimensions * don't build the tok2vec prematuraly * fix CharachterEmbed init * CharacterEmbed fixes * Fix CharacterEmbed architecture * fix imports * renames from latest thinc update * one more rename * add initialize calls where appropriate * fix parser initialization * Update Thinc version * Fix errors, auto-format and tidy up imports * Fix validation * fix if bias is cupy array * revert for now * ensure it's a numpy array before running bp in ParserStepModel * no reason to call require_gpu twice * use CupyOps.to_numpy instead of cupy directly * fix initialize of ParserModel * remove unnecessary import * fixes for CosineDistance * fix device renaming * use refactored loss functions (Thinc PR 251) * overfitting test for tagger * experimental settings for the tagger: avoid zero-init and subword normalization * clean up tagger overfitting test * use previous default value for nP * remove toy config 
* bringing layernorm back (had a bug - fixed in thinc) * revert setting nP explicitly * remove setting default in constructor * restore values as they used to be * add overfitting test for NER * add overfitting test for dep parser * add overfitting test for textcat * fixing init for linear (previously affine) * larger eps window for textcat * ensure doc is not None * Require newer thinc * Make float check vaguer * Slop the textcat overfit test more * Fix textcat test * Fix exclusive classes for textcat * fix after renaming of alloc methods * fixing renames and mandatory arguments (staticvectors WIP) * upgrade to thinc==8.0.0.dev3 * refer to vocab.vectors directly instead of its name * rename alpha to learn_rate * adding hashembed and staticvectors dropout * upgrade to thinc 8.0.0.dev4 * add name back to avoid warning W020 * thinc dev4 * update srsly * using thinc 8.0.0a0 ! Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com> Co-authored-by: Ines Montani <ines@ines.io>	2020-01-29 17:06:46 +01:00
Ines Montani	8b738a9f35	Update .gitignore [ci skip]	2019-08-19 11:54:42 +02:00
cedar101	58f06e6180	Korean support (#3901) * start lang/ko * add test codes * using natto-py * add test_ko_tokenizer_full_tags() * spaCy contributor agreement * external dependency for ko * collections.namedtuple for python version < 3.5 * case fix * tuple unpacking * add jongseong (final consonant) * apply mecab option * Remove Pipfile for now Co-authored-by: Ines Montani <ines@ines.io>	2019-07-09 22:23:16 +02:00
Ines Montani	e597110d31	💫 Update website (#3285) ## Description The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in straightforward Markdown without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on. This PR also includes various new docs pages and content. Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837. ### Types of change enhancement ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-02-17 19:31:19 +01:00
Ines Montani	75f3234404	💫 Refactor test suite (#2568) ## Description Related issues: #2379 (should be fixed by separating model tests) * total execution time down from > 300 seconds to under 60 seconds 🎉 * removed all model-specific tests that could only really be run manually anyway – those will now live in a separate test suite in the [`spacy-models`](https://github.com/explosion/spacy-models) repository and are already integrated into our new model training infrastructure * changed all relative imports to absolute imports to prepare for moving the test suite from `/spacy/tests` to `/tests` (it'll now always test against the installed version) * merged old regression tests into collections, e.g. `test_issue1001-1500.py` (about 90% of the regression tests are very short anyways) * tidied up and rewrote existing tests wherever possible ### Todo - [ ] move tests to `/tests` and adjust CI commands accordingly - [x] move model test suite from internal repo to `spacy-models` - [x] ~~investigate why `pipeline/test_textcat.py` is flakey~~ - [x] review old regression tests (leftover files) and see if they can be merged, simplified or deleted - [ ] update documentation on how to run tests ### Types of change enhancement, tests ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-07-24 23:38:44 +02:00
ines	c0b62ce13c	Ignore pytest cache	2018-07-19 12:30:09 +02:00
Mathias Deschamps	d82f868e1c	Ignore pycharm project files	2017-11-13 17:46:05 +01:00
ines	bfb512f45a	Add website package.json and fix gitignore	2017-10-04 00:18:41 +02:00
ines	22dd929b65	Add models documentation	2017-10-03 14:28:03 +02:00
ines	371b21f82d	Don't ignore /bin directory	2017-08-14 12:18:30 +02:00
ines	c862527474	Add more variations of .env to gitignore	2017-06-02 21:08:39 +02:00
ines	57beef5d36	Tidy up .gitignore	2017-05-18 13:51:31 +02:00
Em	1bb364a3b5	Adding venv to .gitignore	2017-03-10 16:52:04 -08:00
Em	426d17167f	Added string manipulation for spans	2017-03-10 16:50:02 -08:00
ines	00728a23f0	Fix path in gitignore	2017-02-24 18:26:32 +01:00
Ines Montani	427e942e84	Ignore temporary files	2016-11-24 19:21:27 +01:00
Mark Amery	bc368e4237	Ignore entire data folder Previously only some of its content was ignored, so running python -m spacy.en.download all after installing from a local repo would create unstaged changes.	2016-11-20 20:33:23 +00:00
Mark Amery	094c51f496	Add cythonize.json to .gitignore This gets generated for me when installing from the local repo with pip using `sudo pip3 install -e .` from within the spaCy folder. I figure it should be ignored.	2016-11-20 13:55:52 +00:00
Ines Montani	f0868dfc6b	Update .gitignore	2016-11-01 01:13:56 +01:00
Ines Montani	8cef8ebac5	Update .gitignore	2016-10-31 19:20:03 +01:00
Ines Montani	7615b41bff	Update to new website	2016-10-31 19:04:15 +01:00
Matthew Honnibal	ae29b9bdfd	Fix travis and README conflicts	2016-10-19 00:16:11 +02:00
Ines Montani	504b80b6da	Update gitignore	2016-10-03 20:19:05 +02:00
Matthew Honnibal	89174cda74	Ignore pyenv .python-version file	2016-09-30 20:44:52 +02:00
Matthew Honnibal	ea6fda0e05	Add tmp/ folder to gitignore	2016-09-30 20:40:52 +02:00
Ines Montani	f321272bee	Update gitignore for website	2016-04-01 00:36:56 +11:00
Oleg Zdornyy	a774131671	Added reloadable English() example for inv. count	2016-03-09 19:35:55 -08:00
maxirmx	59d85adff5	Added Windows file to .gitignore	2015-10-13 10:58:30 +03:00
maxirmx	8e03239ac5	Merge remote-tracking branch 'refs/remotes/honnibal/master' Conflicts: setup.py	2015-10-10 17:38:06 +03:00
Matthew Honnibal	7820c504d7	* Add sass-cache to gitignore	2015-09-24 18:14:21 +10:00
Matthew Honnibal	f9a6bea746	* Ignore keys and other things	2015-08-22 22:12:07 +02:00
Matthew Honnibal	221f7e51c7	* Ignore spacy/serialize/*.cpp	2015-07-17 01:36:49 +02:00
Matthew Honnibal	ba9a22ae0b	* Ignore cpp files in spacy/tokens	2015-07-13 22:30:15 +02:00
Jordan Suchow	3005c86682	Don't track generated data files	2015-04-19 13:25:42 -07:00
Matthew Honnibal	c0a3e25b43	* Upd gitignore	2015-04-08 07:48:04 +02:00
Matthew Honnibal	49df1b7002	* Ignore .tgz files	2015-03-26 16:44:42 +01:00


59 Commits