spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-10-02 09:56:39 +03:00

Author	SHA1	Message	Date
Ines Montani	4bee26188d	Merge pull request #1323 from galaxyh/master Set the "cut_all" parameter in jieba.cut() to False, or jieba will return ALL POSSIBLE word segmentations.	2017-09-14 15:23:41 +02:00
Yu-chun Huang	7692b8c071	Update __init__.py Set the "cut_all" parameter to False, or jieba will return ALL POSSIBLE word segmentations.	2017-09-12 16:23:47 +08:00
Matthew Honnibal	ddaff6ca56	Merge pull request #1287 from IamJeffG/feature/1226-more-complete-noun-chunks Capture more noun chunks	2017-09-08 07:59:10 +02:00
Matthew Honnibal	45029a550e	Fix customized-tokenizer tests	2017-09-04 20:13:13 +02:00
Matthew Honnibal	34c585396a	Merge pull request #1294 from Vimos/master Fix issue #1292 and add test case for the Assertion Error	2017-09-04 19:20:40 +02:00
Matthew Honnibal	c68f188eb0	Fix error on test	2017-09-04 18:59:36 +02:00
Matthew Honnibal	33313c01ad	Merge pull request #1298 from ericzhao28/master Lowest common ancestor matrix for spans and docs	2017-09-04 18:57:54 +02:00
Matthew Honnibal	e8a26ebfab	Add efficiency note to new get_lca_matrix() method	2017-09-04 15:43:52 +02:00
Eric Zhao	d61c117081	Lowest common ancestor matrix for spans and docs Added functionality for spans and docs to get lowest common ancestor matrix by simply calling: doc.get_lca_matrix() or doc[:3].get_lca_matrix(). Corresponding unit tests were also added under spacy/tests/doc and spacy/tests/spans. Designed to address: https://github.com/explosion/spaCy/issues/969.	2017-09-03 12:22:19 -07:00
Matthew Honnibal	9bffcaa73d	Update test to make it slightly more direct The `nlp` container should be unnecessary here. If so, we can test the tokenizer class just a little more directly.	2017-09-01 21:16:56 +02:00
Vimos Tan	a6d9fb5bb6	fix issue #1292	2017-08-30 14:49:14 +08:00
Paul O'Leary McCann	8b3e1f7b5b	Handle out-of-vocab words Wasn't handling words out of the tokenizer dictionary vocabulary properly. This adds a fix and test for that. -POLM	2017-08-29 23:58:42 +09:00
Jeffrey Gerard	884ba168a8	Capture more noun chunks	2017-08-23 21:18:53 -07:00
Paul O'Leary McCann	95050201ce	Add importorskip for Japanese fixture	2017-08-22 21:30:59 +09:00
Paul O'Leary McCann	bcf2b9b4f5	Update tagger & tokenizer tests Tagger is now parametrized and has two sentences with more tag coverage. The tokenizer tests are updated to reflect differences in tokenization between IPAdic and Unidic. -POLM	2017-08-22 00:03:11 +09:00
Paul O'Leary McCann	adfd987316	Update the TAG_MAP	2017-08-22 00:02:55 +09:00
Paul O'Leary McCann	53e17296e9	Fix pronoun handling Missed this case earlier. 連体詞 have three classes for UD purposes: - その -> DET - それ -> PRON - 同じ -> ADJ -POLM	2017-08-22 00:01:49 +09:00
Paul O'Leary McCann	c435f748d7	Put Mecab import in utility function	2017-08-22 00:01:28 +09:00
ines	dcff10abe9	Add regression test for #1281	2017-08-21 16:11:47 +02:00
ines	edc596d9a7	Add missing tokenizer exceptions (resolves #1281)	2017-08-21 16:11:36 +02:00
ines	c5c3f4c7d9	Use more generous .env ignore rule	2017-08-21 16:08:40 +02:00
Paul O'Leary McCann	234a8a7591	Change default tag for 動詞,非自立可能 Example of this is いる in these sentences: 彼はそこにいる。# should be VERB 彼は底に立っている。# should be AUX Unclear which case is more numerous - need to check a large corpus - but in keeping with the other ambiguous tags, this is mapped to the "dominant" or first part of the tag. -POLM	2017-08-21 00:21:45 +09:00
Ines Montani	dca026124f	Merge pull request #1262 from kevinmarsh/patch-1 Fix broken tutorial link on website	2017-08-16 09:58:07 +02:00
Kevin Marsh	e3738aba0d	Fix broken tutorial link on website	2017-08-15 21:50:09 +01:00
Ines Montani	a9465271a7	Merge pull request #1245 from delirious-lettuce/fix_typos Fix typos	2017-08-07 23:11:20 +02:00
Paul O'Leary McCann	6e9e686568	Sample implementation of Japanese Tagger (ref #1214) This is far from complete but it should be enough to check some things. 1. Mecab transition. Janome doesn't support Unidic, only IPAdic, but UD tag mappings are based on Unidic. This swaps Janome out for Mecab to get around that. 2. Raw tag extension. A simple tag map can't meet the specifications for UD tag mappings, so this adds an extra field to ambiguous cases. For this demo it just deals with the simplest case, which only needs to look at the literal token. (In reality it may be necessary to look at the whole sentence, but that's another issue.) 3. General code structure. It seems nobody else has implemented a custom Tagger yet, so it is still unclear whether this is the correct way to pass the vocabulary around, for example. Any feedback would be greatly appreciated. -POLM	2017-08-08 01:27:15 +09:00
Delirious Lettuce	d3b03f0544	Fix typos: * `auxillary` -> `auxiliary` * `consistute` -> `constitute` * `earlist` -> `earliest` * `prefered` -> `preferred` * `direcory` -> `directory` * `reuseable` -> `reusable` * `idiosyncracies` -> `idiosyncrasies` * `enviroment` -> `environment` * `unecessary` -> `unnecessary` * `yesteday` -> `yesterday` * `resouces` -> `resources`	2017-08-06 21:31:39 -06:00
Matthew Honnibal	b7b121103f	Merge pull request #1244 from gideonite/patch-1 improve pipe, tee, izip explanation	2017-08-06 14:34:07 +02:00
Gideon Dresdner	7e98a3613c	improve pipe, tee, izip explanation Use an example from an old issue https://github.com/explosion/spaCy/issues/172#issuecomment-183963403.	2017-08-06 13:21:45 +02:00
ines	864cefd3b2	Update README.rst	2017-07-22 18:29:55 +02:00
ines	e349271506	Increment version	2017-07-22 18:29:30 +02:00
Ines Montani	570964e67f	Update README.rst	2017-07-22 16:20:19 +02:00
Matthew Honnibal	5494605689	Fiddle with regex pin	2017-07-22 16:09:50 +02:00
Matthew Honnibal	78fcf56dd5	Update version pin for regex library	2017-07-22 15:57:58 +02:00
Matthew Honnibal	d51d55bba6	Increment version	2017-07-22 15:43:16 +02:00
Matthew Honnibal	8ccf154413	Merge branch 'master' of https://github.com/explosion/spaCy	2017-07-22 15:42:44 +02:00
Matthew Honnibal	796b2f4c1b	Remove print statements in tests	2017-07-22 15:42:38 +02:00
ines	7c4bf9994d	Add note on requirements and preventing model re-downloads (closes #1143)	2017-07-22 15:40:12 +02:00
ines	de25bad036	Use lower min version for requests dependency (fixes #1137) Ensure compatibility with docker-compose and other packages	2017-07-22 15:29:10 +02:00
ines	d7560047c5	Fix version	2017-07-22 15:24:33 +02:00
Matthew Honnibal	af945ea8e2	Merge branch 'master' of https://github.com/explosion/spaCy	2017-07-22 15:09:59 +02:00
Matthew Honnibal	4b2e5e59ed	Add flush_cache method to tokenizer, to fix #1061 The tokenizer caches output for common chunks, for efficiency. This cache must be invalidated when the tokenizer rules change, e.g. when a new special-case rule is introduced. That's what was causing #1061. When the cache is flushed, we free the intermediate token chunks. I think this is safe, but if we start getting segfaults, this patch is to blame. The resolution would be to simply not free those bits of memory. They'll be freed when the tokenizer exits anyway.	2017-07-22 15:06:50 +02:00
Ines Montani	96df9c7154	Update CONTRIBUTORS.md	2017-07-22 15:05:46 +02:00
ines	b22b18a019	Add notes on spacy.explain() to annotation docs	2017-07-22 15:02:15 +02:00
ines	e3f23f9d91	Use latest available version in examples	2017-07-22 14:57:51 +02:00
Matthew Honnibal	23a55b40ca	Default to English noun chunks iterator if no lang set	2017-07-22 14:15:25 +02:00
Matthew Honnibal	9750a0128c	Fix Span.noun_chunks. Closes #1207	2017-07-22 14:14:57 +02:00
Matthew Honnibal	d9b85675d7	Rename regression test	2017-07-22 14:14:35 +02:00
Matthew Honnibal	dfbc7e49de	Add test for Issue #1207	2017-07-22 14:14:01 +02:00
Matthew Honnibal	0ae3807d7d	Fix gaps in Lexeme API. Closes #1031	2017-07-22 13:53:48 +02:00
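
The commits d61c117081 and e8a26ebfab above add a lowest common ancestor matrix for docs and spans, reachable as doc.get_lca_matrix(). The following is a minimal pure-Python sketch of the idea only, not spaCy's implementation. The function name lca_matrix, the heads-list input (each token stores the index of its head, a root is its own head), and the use of -1 for token pairs with no common ancestor are all assumptions made for illustration.

```python
def lca_matrix(heads):
    """Return an n x n matrix of lowest-common-ancestor token indices.

    heads[i] is the index of token i's syntactic head; a root token is
    its own head (an assumed encoding for this sketch). Entry [i][j] is
    the index of the lowest common ancestor of tokens i and j, or -1
    when the two tokens sit in different trees.
    """
    n = len(heads)

    def path_to_root(i):
        # Walk upwards from token i, collecting i and every ancestor.
        path = [i]
        while heads[path[-1]] != path[-1]:
            path.append(heads[path[-1]])
        return path

    paths = [path_to_root(i) for i in range(n)]
    matrix = [[-1] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            on_j_path = set(paths[j])
            # The first node on i's upward path that also lies on j's
            # upward path is the lowest common ancestor.
            for node in paths[i]:
                if node in on_j_path:
                    matrix[i][j] = node
                    break
    return matrix


# "She likes green apples": "likes" (index 1) is the root,
# "green" (index 2) attaches to "apples" (index 3).
example = lca_matrix([1, 1, 3, 1])
print(example[0][3])  # LCA of "She" and "apples" is "likes": prints 1
```

This sketch recomputes ancestor paths for every pair, so it is quadratic in the number of tokens at best, which is in line with the efficiency note that e8a26ebfab adds to the method.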

5304 Commits