Commit Graph

5198 Commits

Author SHA1 Message Date
Jeffrey Gerard
b6ebedd09c Document Tokenizer(token_match) and clarify tokenizer_pseudo_code
Closes #835

In the `tokenizer_pseudo_code` I put the `special_cases` kwarg
before `find_prefix` because this now matches the order the args
are used in the pseudocode, and it also matches spaCy's actual code.
2017-09-25 13:13:25 -07:00
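For readers without the docs at hand, the pseudocode this commit documents can be sketched in pure Python. This is a simplified illustration of the documented algorithm (special-case lookup before prefix/suffix stripping, matching the argument order the commit describes), not spaCy's actual Cython tokenizer, and the toy prefix/suffix rules below are made up:

```python
def tokenize(text, special_cases, find_prefix, find_suffix):
    """Simplified tokenizer pseudocode: special-case lookup runs before
    prefix/suffix stripping, matching the argument order in the docs."""
    tokens = []
    for chunk in text.split():
        suffixes = []
        while chunk:
            if chunk in special_cases:         # special cases beat affix rules
                tokens.extend(special_cases[chunk])
                chunk = ""
            elif (pre := find_prefix(chunk)):
                tokens.append(pre)
                chunk = chunk[len(pre):]
            elif (suf := find_suffix(chunk)):
                suffixes.append(suf)           # emitted after the core token
                chunk = chunk[:-len(suf)]
            else:
                tokens.append(chunk)
                chunk = ""
        tokens.extend(reversed(suffixes))
    return tokens

# Toy rules for illustration only.
special_cases = {"don't": ["do", "n't"]}
find_prefix = lambda s: s[0] if s[0] in '"(' else ""
find_suffix = lambda s: s[-1] if s[-1] in '"),.!' else ""
tokens = tokenize('"I don\'t!"', special_cases, find_prefix, find_suffix)
# tokens == ['"', 'I', 'do', "n't", '!', '"']
```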
Matthew Honnibal
2f8d535f65 Merge pull request #1351 from hscspring/patch-4
Update punctuation.py
2017-09-24 12:16:39 +02:00
Matthew Honnibal
9177313063 Merge pull request #1352 from hscspring/patch-5
Update customizing-tokenizer.jade
2017-09-22 16:11:49 +02:00
Matthew Honnibal
1dbc2285b8 Merge pull request #1350 from hscspring/patch-3
Update word-vectors-similarities.jade
2017-09-22 16:11:05 +02:00
Yam
54855f0eee Update customizing-tokenizer.jade 2017-09-22 12:15:48 +08:00
Yam
6f450306c3 Update customizing-tokenizer.jade
Update some code:
- `me` -> `-PRON`
- `TAG` -> `POS`
- `create_tokenizer` function
2017-09-22 10:53:22 +08:00
Yam
923c4c2fb2 Update punctuation.py
add `……`
2017-09-22 09:50:46 +08:00
Yam
425c09488d Update word-vectors-similarities.jade
add
```
import spacy
nlp = spacy.load('en')
```
2017-09-22 08:56:34 +08:00
Matthew Honnibal
ea2732469b Merge pull request #1340 from hscspring/patch-1
Update punctuation.py
2017-09-20 23:57:00 +02:00
Yam
978b24ccd4 Update punctuation.py
In Chinese, `~` and `——` are hyphens,
and `·` is an interpunct (separator) symbol
2017-09-20 23:02:22 +08:00
Matthew Honnibal
aa728b33ca Merge pull request #1333 from galaxyh/master
Add Chinese punctuation
2017-09-19 15:09:30 +02:00
Yu-chun Huang
188b439b25 Add Chinese punctuation
Add Chinese punctuation.
2017-09-19 16:58:42 +08:00
Yu-chun Huang
1f1f35dcd0 Add Chinese punctuation
Add Chinese punctuation.
2017-09-19 16:57:24 +08:00
Ines Montani
4bee26188d Merge pull request #1323 from galaxyh/master
Set the "cut_all" parameter in jieba.cut() to False, or jieba will return ALL POSSIBLE word segmentations.
2017-09-14 15:23:41 +02:00
Yu-chun Huang
7692b8c071 Update __init__.py
Set the "cut_all" parameter to False, or jieba will return ALL POSSIBLE word segmentations.
2017-09-12 16:23:47 +08:00
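The `cut_all` flag controls jieba's "full mode", which emits every dictionary word found anywhere in the sentence, versus its default accurate mode, which picks a single segmentation. A toy pure-Python illustration of the difference (this is not jieba itself; the tiny dictionary is made up, and the "accurate" mode here is just greedy longest-match):

```python
DICT = {"南京", "市", "南京市", "长江", "大桥", "长江大桥", "市长", "江大桥"}

def cut(text, cut_all=False):
    if cut_all:
        # Full mode: every dictionary word occurring anywhere in the text.
        return [text[i:j] for i in range(len(text))
                for j in range(i + 1, len(text) + 1) if text[i:j] in DICT]
    # Stand-in for accurate mode: greedy longest-match from the left.
    out, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in DICT:
                out.append(text[i:j])
                i = j
                break
        else:
            out.append(text[i])    # unknown character: emit as-is
            i += 1
    return out

best = cut("南京市长江大桥")
full = cut("南京市长江大桥", cut_all=True)
# best == ["南京市", "长江大桥"], while full also contains overlapping
# words like "市长" and "江大桥" -- all possible segmentations at once.
```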
Matthew Honnibal
ddaff6ca56 Merge pull request #1287 from IamJeffG/feature/1226-more-complete-noun-chunks
Capture more noun chunks
2017-09-08 07:59:10 +02:00
Matthew Honnibal
45029a550e Fix customized-tokenizer tests 2017-09-04 20:13:13 +02:00
Matthew Honnibal
34c585396a Merge pull request #1294 from Vimos/master
Fix issue #1292 and add test case for the Assertion Error
2017-09-04 19:20:40 +02:00
Matthew Honnibal
c68f188eb0 Fix error on test 2017-09-04 18:59:36 +02:00
Matthew Honnibal
33313c01ad Merge pull request #1298 from ericzhao28/master
Lowest common ancestor matrix for spans and docs
2017-09-04 18:57:54 +02:00
Matthew Honnibal
e8a26ebfab Add efficiency note to new get_lca_matrix() method 2017-09-04 15:43:52 +02:00
Eric Zhao
d61c117081 Lowest common ancestor matrix for spans and docs
Added functionality for spans and docs to get lowest common ancestor
matrix by simply calling: doc.get_lca_matrix() or
doc[:3].get_lca_matrix().
Corresponding unit tests were also added under spacy/tests/doc and
spacy/tests/spans.
Designed to address: https://github.com/explosion/spaCy/issues/969.
2017-09-03 12:22:19 -07:00
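One way to picture what an LCA matrix contains is a hedged pure-Python sketch over a head-index array, where `heads[i]` is the index of token `i`'s parent and the root points to itself (spaCy's real method operates on `Doc`/`Span` objects and is more efficient; this is only an illustration):

```python
def get_lca_matrix(heads):
    """lca[i][j] = index of the lowest common ancestor of tokens i and j,
    or -1 if they share no ancestor."""
    n = len(heads)

    def ancestors(i):
        chain = [i]                 # a token counts as its own ancestor
        while heads[i] != i:
            i = heads[i]
            chain.append(i)
        return chain

    lca = [[-1] * n for _ in range(n)]
    for i in range(n):
        anc_i = ancestors(i)
        for j in range(n):
            anc_j = set(ancestors(j))
            # Walk up from i; the first shared ancestor is the lowest.
            lca[i][j] = next((a for a in anc_i if a in anc_j), -1)
    return lca

# "I like cats": token 1 ("like") is the root and heads both others.
m = get_lca_matrix([1, 1, 1])
# m[0][2] == 1 (the LCA of "I" and "cats" is "like"); m[0][0] == 0
```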
Matthew Honnibal
9bffcaa73d Update test to make it slightly more direct
The `nlp` container should be unnecessary here. If so, we can test the tokenizer class just a little more directly.
2017-09-01 21:16:56 +02:00
Vimos Tan
a6d9fb5bb6 fix issue #1292 2017-08-30 14:49:14 +08:00
Jeffrey Gerard
884ba168a8 Capture more noun chunks 2017-08-23 21:18:53 -07:00
ines
dcff10abe9 Add regression test for #1281 2017-08-21 16:11:47 +02:00
ines
edc596d9a7 Add missing tokenizer exceptions (resolves #1281) 2017-08-21 16:11:36 +02:00
ines
c5c3f4c7d9 Use more generous .env ignore rule 2017-08-21 16:08:40 +02:00
Ines Montani
dca026124f Merge pull request #1262 from kevinmarsh/patch-1
Fix broken tutorial link on website
2017-08-16 09:58:07 +02:00
Kevin Marsh
e3738aba0d Fix broken tutorial link on website 2017-08-15 21:50:09 +01:00
Ines Montani
a9465271a7 Merge pull request #1245 from delirious-lettuce/fix_typos
Fix typos
2017-08-07 23:11:20 +02:00
Delirious Lettuce
d3b03f0544 Fix typos:
* `auxillary` -> `auxiliary`
  * `consistute` -> `constitute`
  * `earlist` -> `earliest`
  * `prefered` -> `preferred`
  * `direcory` -> `directory`
  * `reuseable` -> `reusable`
  * `idiosyncracies` -> `idiosyncrasies`
  * `enviroment` -> `environment`
  * `unecessary` -> `unnecessary`
  * `yesteday` -> `yesterday`
  * `resouces` -> `resources`
2017-08-06 21:31:39 -06:00
Matthew Honnibal
b7b121103f Merge pull request #1244 from gideonite/patch-1
improve pipe, tee, izip explanation
2017-08-06 14:34:07 +02:00
Gideon Dresdner
7e98a3613c improve pipe, tee, izip explanation
Use an example from an old issue https://github.com/explosion/spaCy/issues/172#issuecomment-183963403.
2017-08-06 13:21:45 +02:00
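The pattern that commit documents (keeping metadata aligned with `nlp.pipe` output via `tee` and `izip`/`zip`) can be sketched without spaCy. Here `.upper()` stands in for the pipeline and the record list is invented; on Python 2 one would use `itertools.izip` instead of the built-in `zip`:

```python
from itertools import tee

# Records carry an ID plus the text; we need the IDs kept aligned with
# the processed texts (a stand-in for streaming texts through nlp.pipe).
records = [("doc1", "hello world"), ("doc2", "good night")]

with_ids, with_texts = tee(records)            # two independent iterators
ids = (doc_id for doc_id, _ in with_ids)
processed = (text.upper() for _, text in with_texts)   # pretend nlp.pipe

pairs = list(zip(ids, processed))
# pairs == [("doc1", "HELLO WORLD"), ("doc2", "GOOD NIGHT")]
```

`tee` matters because a generator can only be consumed once: without it, pulling the texts out of `records` would also exhaust the IDs.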
ines
864cefd3b2 Update README.rst 2017-07-22 18:29:55 +02:00
ines
e349271506 Increment version 2017-07-22 18:29:30 +02:00
Ines Montani
570964e67f Update README.rst 2017-07-22 16:20:19 +02:00
Matthew Honnibal
5494605689 Fiddle with regex pin 2017-07-22 16:09:50 +02:00
Matthew Honnibal
78fcf56dd5 Update version pin for regex library 2017-07-22 15:57:58 +02:00
Matthew Honnibal
d51d55bba6 Increment version 2017-07-22 15:43:16 +02:00
Matthew Honnibal
8ccf154413 Merge branch 'master' of https://github.com/explosion/spaCy 2017-07-22 15:42:44 +02:00
Matthew Honnibal
796b2f4c1b Remove print statements in tests 2017-07-22 15:42:38 +02:00
ines
7c4bf9994d Add note on requirements and preventing model re-downloads (closes #1143) 2017-07-22 15:40:12 +02:00
ines
de25bad036 Use lower min version for requests dependency (fixes #1137)
Ensure compatibility with docker-compose and other packages
2017-07-22 15:29:10 +02:00
ines
d7560047c5 Fix version 2017-07-22 15:24:33 +02:00
Matthew Honnibal
af945ea8e2 Merge branch 'master' of https://github.com/explosion/spaCy 2017-07-22 15:09:59 +02:00
Matthew Honnibal
4b2e5e59ed Add flush_cache method to tokenizer, to fix #1061
The tokenizer caches output for common chunks, for efficiency. This
cache must be invalidated when the tokenizer rules change, e.g. when a new
special-case rule is introduced. That's what was causing #1061.

When the cache is flushed, we free the intermediate token chunks.
I *think* this is safe --- but if we start getting segfaults, this patch
is to blame. The resolution would be to simply not free those bits of
memory. They'll be freed when the tokenizer exits anyway.
2017-07-22 15:06:50 +02:00
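The stale-cache interaction behind #1061 can be sketched in a few lines of pure Python (a toy model, not spaCy's Cython tokenizer; since Python is garbage-collected, the memory-freeing concern from the commit message doesn't arise here):

```python
class CachingTokenizer:
    """Toy tokenizer that caches per-chunk output, like spaCy does."""

    def __init__(self, special_cases=None):
        self.special_cases = dict(special_cases or {})
        self._cache = {}

    def __call__(self, text):
        out = []
        for chunk in text.split():
            if chunk not in self._cache:        # cache common chunks
                self._cache[chunk] = self.special_cases.get(chunk, [chunk])
            out.extend(self._cache[chunk])
        return out

    def add_special_case(self, chunk, tokens):
        self.special_cases[chunk] = list(tokens)
        self.flush_cache()    # rules changed, so cached output is stale

    def flush_cache(self):
        self._cache.clear()

t = CachingTokenizer()
first = t("lemme go")                  # caches "lemme" -> ["lemme"]
t.add_special_case("lemme", ["lem", "me"])
second = t("lemme go")                 # correct only if the cache was flushed
```

Without the `flush_cache()` call, `second` would still return the stale cached `["lemme", "go"]`, which is exactly the bug described.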
Ines Montani
96df9c7154 Update CONTRIBUTORS.md 2017-07-22 15:05:46 +02:00
ines
b22b18a019 Add notes on spacy.explain() to annotation docs 2017-07-22 15:02:15 +02:00
ines
e3f23f9d91 Use latest available version in examples 2017-07-22 14:57:51 +02:00