spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-02-09 08:49:42 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	f6e9394aa5	Fix push-tag script	2019-05-11 19:04:35 +02:00
Matthew Honnibal	a5159ddcf5	Set version to v2.1.4.dev1	2019-05-11 19:03:51 +02:00
Ines Montani	7534f7cb44	Fix return value of Language.update (closes #3692)	2019-05-11 18:40:19 +02:00
Ines Montani	503b8c85f1	Add TWiML podcast to universe [ci skip]	2019-05-11 17:48:22 +02:00
Ines Montani	0daf2422a3	Auto-format	2019-05-11 17:48:07 +02:00
Ines Montani	6b3a79ac96	Call rmtree and copytree with strings (closes #3713)	2019-05-11 15:48:35 +02:00
devforfu	21af12eb53	Make "text" key in JSONL format optional when "tokens" key is provided (#3721) * Fix issue with forcing text key when it is not required * Extending the docs to reflect the new behavior	2019-05-11 15:41:29 +02:00
Ines Montani	6cfa1e1f47	Fix DependencyParser.predict docs (resolves #3561)	2019-05-11 15:37:54 +02:00
Ines Montani	25f5592d57	Improve Token.prob and Lexeme.prob docs (resolves #3701)	2019-05-11 15:23:41 +02:00
Aaron Kub	719a15f23d	fixing regex matcher examples (#3708) (#3719)	2019-05-10 14:23:52 +02:00
Luca Dorigo	82d034f976	Update glossary.py to match information found in documentation (#3704 ) (closes ##3679) * Update glossary.py to match information found in documentation I used regexes to add any dependency tag that was in the documentation but not in the glossary. Solves #3679 👍 * Adds forgotten colon	2019-05-10 14:23:20 +02:00
Wannaphong Phatthiyaphaibun	5a14a13f64	fix thai bug (#3693) fix tokenize for pythainlp	2019-05-10 14:21:34 +02:00
Luca Dorigo	2663f4133c	Submit contributor agreement (#3705)	2019-05-10 14:19:18 +02:00
Ines Montani	65b55f1aaa	Add version tag to `--base-model` argument (closes #3720)	2019-05-10 14:06:47 +02:00
richardpaulhudson	a1e07f0d14	Request to include Holmes in spaCy Universe (#3685 ) * Request to add Holmes to spaCy Universe Dear spaCy team, I would be grateful if you would consider my Python library Holmes for inclusion in the spaCy Universe. Holmes transforms the syntactic structures delivered by spaCy into semantic structures that, together with various other techniques including ontological matching and word embeddings, serve as the basis for information extraction. Holmes supports several use cases including chatbot, structured search, topic matching and supervised document classification. I had the basic idea for Holmes around 15 years ago and now spaCy has made it possible to build an implementation that is stable and fast enough to actually be of use - thank you! At present Holmes supports English and German (I am based in Munich) but could easily be extended to support any other language with a spaCy model. * Added	2019-05-08 02:42:03 +02:00
Ines Montani	505c9e0e19	Add util.filter_spans helper (#3686)	2019-05-08 02:33:40 +02:00
F0rge1cE	dd1e6b0bc6	Fix offset bug in loading pre-trained word2vec. (#3689) * Fix offset bug in loading pre-trained word2vec. * add contributor agreement	2019-05-06 23:00:38 +02:00
Bram Vanroy	8e6f8deaf6	Re-added Universe readme (#3688) (closes #3680)	2019-05-06 21:08:01 +02:00
Ines Montani	78cb807a9a	Auto-format [ci skip]	2019-05-06 16:58:29 +02:00
Ines Montani	dd153b2b33	Simplify helper (see #3681) [ci skip]	2019-05-06 15:13:10 +02:00
Ines Montani	f8fce6c03c	Fix typo (see #3681)	2019-05-06 15:02:11 +02:00
Ines Montani	f2a56c1b56	Rewrite example to use Retokenizer (resolves #3681) Also add helper to filter spans	2019-05-06 14:51:18 +02:00
Brad Jascob	955b95cb8b	Fix inconsistant lemmatizer issue #3484 (#3646 ) * Fix inconsistant lemmatizer issue #3484 * Remove test case	2019-05-04 18:16:03 +02:00
Ines Montani	b4d142e3c4	Adjust wording and formatting [ci skip]	2019-05-03 12:00:31 +02:00
Ines Montani	04658ebbb2	Relax jsonschema pin (closes #3628)	2019-05-03 11:58:58 +02:00
d5555	ba4bcbf285	Update universe.json (#3653) [ci skip] * Update universe.json * Update universe.json	2019-05-03 11:50:12 +02:00
Dobita21	f95ecedd83	Add Thai lex_attrs (#3655 ) * test sPacy commit to git fri 04052019 10:54 * change Data format from my format to master format * ทัทั้งนี้ ---> ทั้งนี้ * delete stop_word translate from Eng * Adjust formatting and readability * add Thai norm_exception * Add Dobita21 SCA * editรึ : หรือ, * Update Dobita21.md * Auto-format * Integrate norms into language defaults * add acronym and some norm exception words * add lex_attrs * Add lexical attribute getters into the language defaults * fix LEX_ATTRS Co-authored-by: Donut <dobita21@gmail.com> Co-authored-by: Ines Montani <ines@ines.io>	2019-05-01 12:03:14 +02:00
张晓飞	ba1ff00370	update response after calling add_pipe (#3661 ) * update response after calling add_pipe component:print_info is appened in the last, so need show it at the end of pipeline * Create henry860916.md	2019-05-01 12:02:18 +02:00
BreakBB	8952004dfc	Update French example sents and add two German stop words (#3662 ) * Update french example sentences * Add 'anderem' and 'ihren' to German stop words	2019-05-01 12:01:35 +02:00
Ramiro Gómez	8ee4100f8f	Remove dangling M (#3657) I assume this is a typo. Sorry if it has a meaning that I'm not aware of.	2019-04-29 19:44:43 +02:00
Amit Chaudhary	167d63af31	Fix broken link to Dive Into Python 3 website (#3656) * Fix broken link to Dive Into Python 3 website * Sign spaCy Contributor Agreement	2019-04-29 19:44:00 +02:00
Ramiro Gómez	e7e5999ddc	Create yaph.md so I can contribute (#3658)	2019-04-29 19:43:06 +02:00
Brad Jascob	6fcafcc564	Doc changes for local website setup (#3651)	2019-04-27 13:28:23 +02:00
Ivan Tham	fa94f83697	Improve redundant variable name (#3643) * Improve redundant variable name * Apply suggestions from code review Co-Authored-By: pickfire <pickfire@riseup.net>	2019-04-26 16:50:14 +02:00
Ines Montani	dc87fb805d	Merge branch 'master' of https://github.com/explosion/spaCy	2019-04-26 13:17:57 +02:00
Ines Montani	62060ae9c6	Merge branch 'spacy.io'	2019-04-26 13:17:52 +02:00
Brad Jascob	9afa0d6723	Update Universe Website for pyInflect (#3641)	2019-04-26 13:17:36 +02:00
Ines Montani	db7c0dbfd6	Update seo.js	2019-04-23 18:39:30 +02:00
Dobita21	721e1fc86c	update norm_exceptions (#3627 ) * test sPacy commit to git fri 04052019 10:54 * change Data format from my format to master format * ทัทั้งนี้ ---> ทั้งนี้ * delete stop_word translate from Eng * Adjust formatting and readability * add Thai norm_exception * Add Dobita21 SCA * editรึ : หรือ, * Update Dobita21.md * Auto-format * Integrate norms into language defaults * add acronym and some norm exception words	2019-04-23 12:48:03 +02:00
Ines Montani	ec0d840ab5	Document early stopping	2019-04-22 14:31:32 +02:00
Ines Montani	e0f487f904	Rename early_stopping_iter to n_early_stopping	2019-04-22 14:31:25 +02:00
Ines Montani	9767427669	Auto-format	2019-04-22 14:31:11 +02:00
Ines Montani	1d567913f9	Update spacy evaluate example	2019-04-22 14:28:42 +02:00
Ines Montani	7917ce2f73	Make flag shortcut consistent and document	2019-04-22 14:23:44 +02:00
Ines Montani	52658c80d5	Allow jupyter=False to override Jupyter mode (closes #3598)	2019-04-22 14:18:32 +02:00
Motoki Wu	8e2cef49f3	Add save after `--save-every` batches for `spacy pretrain` (#3510) When using `spacy pretrain`, the model is saved only after every epoch. But each epoch can be very big since `pretrain` is used for language modeling tasks. So I added a `--save-every` option in the CLI to save after every `--save-every` batches. ## Description To test... Save this file to `sample_sents.jsonl` ``` {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} ``` Then run `--save-every 2` when pretraining. ```bash spacy pretrain sample_sents.jsonl en_core_web_md here -nw 1 -bs 1 -i 10 --save-every 2 ``` And it should save the model to the `here/` folder after every 2 batches. The models that are saved during an epoch will have a `.temp` appended to the save name. At the end the training, you should see these files (`ls here/`): ```bash config.json model2.bin model5.bin model8.bin log.jsonl model2.temp.bin model5.temp.bin model8.temp.bin model0.bin model3.bin model6.bin model9.bin model0.temp.bin model3.temp.bin model6.temp.bin model9.temp.bin model1.bin model4.bin model7.bin model1.temp.bin model4.temp.bin model7.temp.bin ``` ### Types of change This is a new feature to `spacy pretrain`. 🌵 Unfortunately, I haven't been able to test this because compiling from source is not working (cythonize error). ``` Processing matcher.pyx [Errno 2] No such file or directory: '/Users/mwu/github/spaCy/spacy/matcher.pyx' Traceback (most recent call last): File "/Users/mwu/github/spaCy/bin/cythonize.py", line 169, in <module> run(args.root) File "/Users/mwu/github/spaCy/bin/cythonize.py", line 158, in run process(base, filename, db) File "/Users/mwu/github/spaCy/bin/cythonize.py", line 124, in process preserve_cwd(base, process_pyx, root + ".pyx", root + ".cpp") File "/Users/mwu/github/spaCy/bin/cythonize.py", line 87, in preserve_cwd func(args) File "/Users/mwu/github/spaCy/bin/cythonize.py", line 63, in process_pyx raise Exception("Cython failed") Exception: Cython failed Traceback (most recent call last): File "setup.py", line 276, in <module> setup_package() File "setup.py", line 209, in setup_package generate_cython(root, "spacy") File "setup.py", line 132, in generate_cython raise RuntimeError("Running cythonize failed") RuntimeError: Running cythonize failed ``` Edit: Fixed! after deleting all `.cpp` files: `find spacy -name ".cpp" | xargs rm` ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-04-22 14:10:16 +02:00
Dobita21	189c90743c	Add Thai norm_exceptions (#3612 ) * test sPacy commit to git fri 04052019 10:54 * change Data format from my format to master format * ทัทั้งนี้ ---> ทั้งนี้ * delete stop_word translate from Eng * Adjust formatting and readability * add Thai norm_exception * Add Dobita21 SCA * editรึ : หรือ, * Update Dobita21.md * Auto-format * Integrate norms into language defaults	2019-04-20 12:16:03 +02:00
Ines Montani	7937109ed9	Update link [ci skip]	2019-04-19 16:01:41 +02:00
Ines Montani	0dce4585b1	Add course to 101	2019-04-19 15:59:51 +02:00
Ines Montani	2efc87c382	Remove unused image	2019-04-19 15:48:12 +02:00

10081 Commits