spaCy

Mirror of https://github.com/explosion/spaCy.git (synced 2025-09-23 04:26:46 +03:00)

Author	SHA1	Message	Date
Ines Montani	ffc2fef13c	Merge pull request #1411 from raphael0202/issue_1078: Resolve issue #1078 by simplifying URL pattern	2017-10-11 11:54:57 +02:00
Raphaël Bournhonesque	3452d6ce52	Resolve issue #1078 by simplifying URL pattern: avoid catastrophic backtracking; reduce character range of host name, domain name and TLD identifier	2017-10-11 11:24:00 +02:00
Matthew Honnibal	331d338b8b	Merge pull request #1246 from polm/ja-pos-tagger: [wip] Sample implementation of Japanese Tagger (ref #1214)	2017-10-09 04:00:53 +02:00
Ines Montani	d33899b60b	Merge pull request #1393 from yuukos/patch-1: Update adding-languages.jade	2017-10-06 18:03:31 +02:00
Ines Montani	e89689a31d	Update CONTRIBUTORS.md	2017-10-06 18:02:40 +02:00
Alex	763b54cbc3	Update adding-languages.jade: Fixed misspellings	2017-10-06 16:30:44 +07:00
Matthew Honnibal	0e1adacaff	Merge pull request #1390 from mdcclv/contributor-mdcclv: Contributor agreement for Orion Montoya @mdcclv	2017-10-06 02:39:08 +02:00
Orion Montoya	e04e11070f	Contributor agreement for Orion Montoya @mdcclv	2017-10-05 17:45:45 -04:00
Ines Montani	e77d8886f7	Update CONTRIBUTORS.md	2017-10-05 22:22:04 +02:00
Matthew Honnibal	dea81f113d	Merge pull request #1389 from mdcclv/lemmatizer_obey_exceptions: Lemmatizer obey exceptions	2017-10-05 22:11:21 +02:00
Orion Montoya	b0d271809d	Unit test for lemmatizer exceptions -- copied from regression test for #1387	2017-10-05 10:49:28 -04:00
Orion Montoya	ffb50d21a0	Lemmatizer honors exceptions: Fix #1387	2017-10-05 10:49:02 -04:00
Orion Montoya	e81a608173	Regression test for lemmatizer exceptions -- demonstrate issue #1387	2017-10-05 10:47:48 -04:00
Ines Montani	678651ca98	Merge pull request #1386 from kokes/patch-1: Fixing links to SyntaxNet	2017-10-04 13:35:01 +02:00
Ondrej Kokes	a9362f1c73	Fixing links to SyntaxNet	2017-10-04 12:55:07 +02:00
Matthew Honnibal	eb72eae258	Merge pull request #1364 from Destygo/master: Fixed NER model loading bug	2017-09-29 12:29:43 +02:00
Ines Montani	58bfe30a12	Merge pull request #1362 from IamJeffG/docs/custom-tokenizer: Document Tokenizer(token_match) and clarify tokenizer_pseudo_code	2017-09-26 15:51:15 +02:00
Vincent Genty	259ed027af	Fixed NER model loading bug	2017-09-26 15:46:04 +02:00
Ines Montani	361211fe26	Merge pull request #1342 from wannaphongcom/master: Add Thai language	2017-09-26 15:40:55 +02:00
Jeffrey Gerard	b6ebedd09c	Document Tokenizer(token_match) and clarify tokenizer_pseudo_code. Closes #835. In the `tokenizer_pseudo_code` I put the `special_cases` kwarg before `find_prefix` because this now matches the order the args are used in the pseudocode, and it also matches spacy's actual code.	2017-09-25 13:13:25 -07:00
Matthew Honnibal	2f8d535f65	Merge pull request #1351 from hscspring/patch-4: Update punctuation.py	2017-09-24 12:16:39 +02:00
Matthew Honnibal	9177313063	Merge pull request #1352 from hscspring/patch-5: Update customizing-tokenizer.jade	2017-09-22 16:11:49 +02:00
Matthew Honnibal	1dbc2285b8	Merge pull request #1350 from hscspring/patch-3: Update word-vectors-similarities.jade	2017-09-22 16:11:05 +02:00
Yam	54855f0eee	Update customizing-tokenizer.jade	2017-09-22 12:15:48 +08:00
Yam	6f450306c3	Update customizing-tokenizer.jade: update some codes: `me` -> `-PRON`; `TAG` -> `POS`; `create_tokenizer` function	2017-09-22 10:53:22 +08:00
Yam	923c4c2fb2	Update punctuation.py: add `……`	2017-09-22 09:50:46 +08:00
Yam	425c09488d	Update word-vectors-similarities.jade: add `import spacy` and `nlp = spacy.load('en')`	2017-09-22 08:56:34 +08:00
Wannaphong Phatthiyaphaibun	1abf472068	add th test	2017-09-21 12:56:58 +07:00
Matthew Honnibal	ea2732469b	Merge pull request #1340 from hscspring/patch-1: Update punctuation.py	2017-09-20 23:57:00 +02:00
Wannaphong Phatthiyaphaibun	39bb5690f0	update th	2017-09-21 00:36:02 +07:00
Wannaphong Phatthiyaphaibun	44291f6697	add thai	2017-09-20 23:26:34 +07:00
Yam	978b24ccd4	Update punctuation.py: In Chinese, `~` and `——` is hyphens, `·` is intermittent symbol	2017-09-20 23:02:22 +08:00
Matthew Honnibal	aa728b33ca	Merge pull request #1333 from galaxyh/master: Add Chinese punctuation	2017-09-19 15:09:30 +02:00
Yu-chun Huang	188b439b25	Add Chinese punctuation Add Chinese punctuation.	2017-09-19 16:58:42 +08:00
Yu-chun Huang	1f1f35dcd0	Add Chinese punctuation Add Chinese punctuation.	2017-09-19 16:57:24 +08:00
Ines Montani	4bee26188d	Merge pull request #1323 from galaxyh/master: Set the "cut_all" parameter in jieba.cut() to False, or jieba will return ALL POSSIBLE word segmentations.	2017-09-14 15:23:41 +02:00
Yu-chun Huang	7692b8c071	Update __init__.py: Set the "cut_all" parameter to False, or jieba will return ALL POSSIBLE word segmentations.	2017-09-12 16:23:47 +08:00
Matthew Honnibal	ddaff6ca56	Merge pull request #1287 from IamJeffG/feature/1226-more-complete-noun-chunks: Capture more noun chunks	2017-09-08 07:59:10 +02:00
Matthew Honnibal	45029a550e	Fix customized-tokenizer tests	2017-09-04 20:13:13 +02:00
Matthew Honnibal	34c585396a	Merge pull request #1294 from Vimos/master: Fix issue #1292 and add test case for the Assertion Error	2017-09-04 19:20:40 +02:00
Matthew Honnibal	c68f188eb0	Fix error on test	2017-09-04 18:59:36 +02:00
Matthew Honnibal	33313c01ad	Merge pull request #1298 from ericzhao28/master: Lowest common ancestor matrix for spans and docs	2017-09-04 18:57:54 +02:00
Matthew Honnibal	e8a26ebfab	Add efficiency note to new get_lca_matrix() method	2017-09-04 15:43:52 +02:00
Eric Zhao	d61c117081	Lowest common ancestor matrix for spans and docs. Added functionality for spans and docs to get lowest common ancestor matrix by simply calling: `doc.get_lca_matrix()` or `doc[:3].get_lca_matrix()`. Corresponding unit tests were also added under spacy/tests/doc and spacy/tests/spans. Designed to address: https://github.com/explosion/spaCy/issues/969.	2017-09-03 12:22:19 -07:00
Matthew Honnibal	9bffcaa73d	Update test to make it slightly more direct. The `nlp` container should be unnecessary here. If so, we can test the tokenizer class just a little more directly.	2017-09-01 21:16:56 +02:00
Vimos Tan	a6d9fb5bb6	fix issue #1292	2017-08-30 14:49:14 +08:00
Paul O'Leary McCann	8b3e1f7b5b	Handle out-of-vocab words. Wasn't handling words out of the tokenizer dictionary vocabulary properly. This adds a fix and test for that. -POLM	2017-08-29 23:58:42 +09:00
Jeffrey Gerard	884ba168a8	Capture more noun chunks	2017-08-23 21:18:53 -07:00
Paul O'Leary McCann	95050201ce	Add importorskip for Japanese fixture	2017-08-22 21:30:59 +09:00
Paul O'Leary McCann	bcf2b9b4f5	Update tagger & tokenizer tests. Tagger is now parametrized and has two sentences with more tag coverage. The tokenizer tests are updated to reflect differences in tokenization between IPAdic and Unidic. -POLM	2017-08-22 00:03:11 +09:00


5228 Commits