spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-02-05 14:59:59 +03:00

Author	SHA1	Message	Date
Kamolsit Mongkolsrisawat	dcc67f3f51	Update Thai tokenizer_exception list (#3529) * add tokenizer_exceptions word (ก-น) from https://goo.gl/JpJ2qq * update tokenizer_exceptions word list * add contributor file	2019-04-03 09:13:36 +02:00
ivigamberdiev	5e5641616d	Update links and http -> https (#3532) * update links and http -> https * SCA	2019-04-02 17:36:22 +02:00
Ines Montani	24cecdb44f	Update compatibility [ci skip]	2019-04-01 16:25:16 +02:00
jeannefukumaru	6cdb7b2e04	added tag_map for indonesian (#3515) * added tag_map for indonesian * changed tag map from .py to .txt to see if tests pass * added symbols import * added utf8 encoding flag * added missing SCONJ symbol * Auto-format * Remove unused imports * Make tag map available in Indonesian defaults	2019-04-01 12:27:48 +02:00
Ines Montani	c23e234d65	Auto-format	2019-04-01 12:11:27 +02:00
Ines Montani	5821b020d5	Merge branch 'spacy.io'	2019-04-01 11:47:59 +02:00
Matthew Honnibal	e64b241f9c	Merge branch 'master' of https://github.com/explosion/spaCy	2019-03-31 13:58:38 +02:00
Ines Montani	b070e0caf7	Update landing.js	2019-03-30 22:26:46 +01:00
Ines Montani	9d1221943b	Merge branch 'master' into spacy.io	2019-03-30 20:32:14 +01:00
Ines Montani	037ffdfd3f	Add spaCy IRL to landing [ci skip]	2019-03-30 20:32:03 +01:00
Ines Montani	68900066e0	Merge pull request #3459 from svlandeg/feature/el-framework Basic framework and APIs for entity linker	2019-03-29 14:02:22 +01:00
Hiromu Hota	914b9ff3d2	Tags are joined with a comma and padded with asterisks (#3491) ## Description Fix a bug in the test of JapaneseTokenizer. This PR may require @polm's review. ### Types of change Bug fix ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-03-28 16:17:31 +01:00
Ines Montani	730f759b4f	Merge branch 'master' into spacy.io	2019-03-28 15:26:17 +01:00
Ines Montani	7d033a7b89	Fix meta description in universe projects [ci skip]	2019-03-28 15:26:01 +01:00
Ines Montani	fe2cb642ac	Merge branch 'master' into spacy.io	2019-03-28 15:13:39 +01:00
David	74e738dd4d	adds textpipe to universe (#3500) [ci skip] * Adds textpipe to universe * signed contributor agreement * Adjust formatting, code style and use "standalone" category	2019-03-28 15:13:19 +01:00
Ines Montani	04a9fb1a02	Merge branch 'master' into spacy.io	2019-03-28 13:34:46 +01:00
Samuel Kane	06a1846379	fix(util): fix decaying function output (#3495) * fix(util): fix decaying function output * fix(util): better test and adhere to code standards * fix(util): correct variable name, pytestify test, update website text	2019-03-28 13:24:47 +01:00
Duygu Altinok	5a7bc6b39d	Fix/irreg adverbs extension (#3499) * extended list of irreg adverbs * added test to exceptions * fixed typo	2019-03-28 13:23:33 +01:00
Bharat Raghunathan	1db3e47509	DOC: Update tokenizer docs to include default value for batch_size in pipe (#3492)	2019-03-28 12:48:02 +01:00
Ines Montani	2ed16d82bf	Fix social image	2019-03-26 18:27:40 +01:00
Matthew Honnibal	f77bf2bdb1	Fix GPU training for textcat. Closes #3473	2019-03-26 13:36:11 +01:00
Sofie	a4a6bfa4e1	Merge branch 'master' into feature/el-framework	2019-03-26 11:00:02 +01:00
svlandeg	8814b9010d	entity as one field instead of both ID and name	2019-03-25 18:10:41 +01:00
Ines Montani	9e14b2b69f	Add Estonian to docs [ci skip] (closes #3482)	2019-03-25 18:01:54 +01:00
Wannaphong Phatthiyaphaibun	297a051992	Update Thai tag map (#3480) * Update Thai tag map Update Thai tag map * Create wannaphongcom.md	2019-03-25 16:53:26 +01:00
Ines Montani	21ade53ef7	Merge branch 'master' into spacy.io	2019-03-25 13:05:00 +01:00
Ines Montani	db938ab0e3	Update favicon (closes #3475) [ci skip]	2019-03-25 13:04:47 +01:00
Ines Montani	c8c1baaea8	Update binderVersion	2019-03-25 12:17:03 +01:00
Matthew Honnibal	85dcd9477e	Set version to v2.1.3	2019-03-23 16:47:57 +01:00
Matthew Honnibal	f436efd8a4	Small tweak to ensemble textcat model	2019-03-23 16:47:26 +01:00
Ines Montani	200d8bdb3c	Merge branch 'spacy.io' [ci skip]	2019-03-23 16:46:34 +01:00
Ines Montani	1e5b917d75	Fix formatting [ci skip]	2019-03-23 16:45:50 +01:00
Matthew Honnibal	6c783f8045	Bug fixes and options for TextCategorizer (#3472) * Fix code for bag-of-words feature extraction The _ml.py module had a redundant copy of a function to extract unigram bag-of-words features, except one had a bug that set values to 0. Another function allowed extraction of bigram features. Replace all three with a new function that supports arbitrary ngram sizes and also allows control of which attribute is used (e.g. ORTH, LOWER, etc). * Support 'bow' architecture for TextCategorizer This allows efficient ngram bag-of-words models, which are better when the classifier needs to run quickly, especially when the texts are long. Pass architecture="bow" to use it. The extra arguments ngram_size and attr are also available, e.g. ngram_size=2 means unigram and bigram features will be extracted. * Fix size limits in train_textcat example * Explain architectures better in docs	2019-03-23 16:44:44 +01:00
Ines Montani	5944cf10c7	Add blog post to v2.1 page	2019-03-23 16:34:23 +01:00
Ines Montani	ffebdad08d	Add cheat sheet to spaCy 101	2019-03-23 16:32:55 +01:00
Ines Montani	06bf130890	💫 Add better and serializable sentencizer (#3471) * Add better serializable sentencizer component * Replace default factory * Add tests * Tidy up * Pass test * Update docs	2019-03-23 15:45:02 +01:00
Matthew Honnibal	d9a07a7f6e	💫 Fix class mismap on parser deserializing (closes #3433) (#3470) v2.1 introduced a regression when deserializing the parser after parser.add_label() had been called. The code around the class mapping is pretty confusing currently, as it was written to accommodate backwards model compatibility. It needs to be revised when the models are next retrained. Closes #3433	2019-03-23 13:46:25 +01:00
Matthew Honnibal	444a3abfe5	Add xfail test for #3433. Improve test for add label.	2019-03-23 12:36:00 +01:00
Ines Montani	6b6e9b638e	Fix test for #3468	2019-03-23 11:24:29 +01:00
Ines Montani	fbec72b4c3	Slightly modify test for #3468 Check for Token.is_sent_start first (which is serialized/deserialized correctly)	2019-03-23 11:22:44 +01:00
Ines Montani	02d9378d8c	Add xfailing test for #3468	2019-03-23 11:19:11 +01:00
Ines Montani	ed91592726	Merge branch 'master' into spacy.io	2019-03-22 19:02:26 +01:00
Ines Montani	dcd6e06c47	Improve landing example [ci skip]	2019-03-22 19:02:15 +01:00
Ines Montani	c2bb39dcb4	Merge branch 'master' into spacy.io	2019-03-22 18:50:16 +01:00
Ines Montani	a841324034	Update landing example [ci skip]	2019-03-22 18:50:00 +01:00
Ines Montani	a9ad735241	Merge branch 'master' into spacy.io	2019-03-22 18:36:28 +01:00
Ines Montani	b532386a60	Fix typo [ci skip]	2019-03-22 18:36:17 +01:00
Ines Montani	7b5496027b	Merge branch 'master' into spacy.io	2019-03-22 18:21:16 +01:00
Ines Montani	d8533f0149	Update Binder [ci skip]	2019-03-22 18:16:46 +01:00


9978 Commits