spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-03-06 12:51:26 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	c0799430a7	Make small changes to Doc.to_array * Change type-check logic to 'hasattr' (Python type-checking is brittle) * Small 'house style' edits, mostly making code more terse.	2017-10-20 11:17:00 +02:00
Ramanan Balakrishnan	fbccc8c87d	Update documentation on doc.to_array	2017-10-20 14:23:48 +05:30
Ramanan Balakrishnan	5941aa96a1	Support strings for attribute list in doc.to_array	2017-10-20 11:59:34 +05:30
Ramanan Balakrishnan	b47b4e2654	Support single value for attribute list in doc.to_scalar conversion	2017-10-18 14:43:47 +05:30
Matthew Honnibal	cd9378c8f1	Merge pull request #1423 from yuukos/master Fixed Russian tokenizer	2017-10-16 11:45:53 +02:00
Matthew Honnibal	6b0121091c	Merge pull request #1420 from polm/master [ja] Stash tokenizer output for speed	2017-10-16 10:28:22 +02:00
yuukos	34e9c6ddc0	Merge remote-tracking branch 'origin/master'	2017-10-16 13:48:10 +07:00
yuukos	92931a2efd	Merge branch 'russian_language'	2017-10-16 13:46:28 +07:00
yuukos	241d19a3e6	fixed Russian Tokenizer - added trailing space flags for tokens	2017-10-16 13:37:05 +07:00
Paul O'Leary McCann	71ae8013ec	[ja] Use user_details instead of a wrapper class Instead of using a JapaneseDoc wrapper class to store Mecab output, stash it in `user_data`. -POLM	2017-10-16 00:24:34 +09:00
Paul O'Leary McCann	43eedf73f2	[ja] Stash tokenizer output for speed Before this commit, the Mecab tokenizer had to be called twice when creating a Doc- once during tokenization and once during tagging. This creates a JapaneseDoc wrapper class for Doc that stashes the parsed tokenizer output to remove redundant processing. -POLM	2017-10-15 23:33:25 +09:00
Ines Montani	e00a6c08cf	Merge pull request #1418 from polm/master Contributor agreement	2017-10-14 15:10:58 +02:00
Paul O'Leary McCann	a31d33be06	Contributor agreement	2017-10-14 19:28:04 +09:00
Ines Montani	4b5af8bd17	Merge pull request #1414 from yuukos/master Adding Russian language support	2017-10-13 17:03:52 +02:00
Alex	95836abee1	Update CONTRIBUTORS.md	2017-10-13 21:02:19 +07:00
Alex	ce00405afc	Create yuukos.md	2017-10-13 21:00:15 +07:00
yuukos	6fb9d75bd2	fixed test with creating tokenizer	2017-10-13 15:51:03 +07:00
yuukos	a229b6e0de	added tests for Russian language added tests of creating Russian Language instance and Russian tokenizer	2017-10-13 14:04:37 +07:00
yuukos	622b6d6270	updated Russian tokenizer moved the trying to import pymorph into __init__	2017-10-13 13:57:29 +07:00
yuukos	f81dd284eb	updated spacy/__init__.py registered russian language via set_lang_class	2017-10-12 22:28:34 +07:00
yuukos	7b9491679f	added russian language support	2017-10-12 22:24:20 +07:00
yuukos	2a78f4d634	updated .gitignore file added excluding PyCharm's idea directory	2017-10-12 22:23:19 +07:00
Ines Montani	a06b84e7cc	Merge pull request #1407 from hscspring/patch-6 Update training.jade	2017-10-11 14:25:38 +02:00
Ines Montani	ffc2fef13c	Merge pull request #1411 from raphael0202/issue_1078 Resolve issue #1078 by simplifying URL pattern	2017-10-11 11:54:57 +02:00
Raphaël Bournhonesque	3452d6ce52	Resolve issue #1078 by simplifying URL pattern - avoid catastrophic backtracking - reduce character range of host name, domain name and TLD identifier	2017-10-11 11:24:00 +02:00
Yam	efe0800f91	Update training.jade fix several changes	2017-10-09 21:39:15 -05:00
Matthew Honnibal	331d338b8b	Merge pull request #1246 from polm/ja-pos-tagger [wip] Sample implementation of Japanese Tagger (ref #1214)	2017-10-09 04:00:53 +02:00
Ines Montani	d33899b60b	Merge pull request #1393 from yuukos/patch-1 Update adding-languages.jade	2017-10-06 18:03:31 +02:00
Ines Montani	e89689a31d	Update CONTRIBUTORS.md	2017-10-06 18:02:40 +02:00
Alex	763b54cbc3	Update adding-languages.jade Fixed misspellings	2017-10-06 16:30:44 +07:00
Matthew Honnibal	0e1adacaff	Merge pull request #1390 from mdcclv/contributor-mdcclv Contributor agreement for Orion Montoya @mdcclv	2017-10-06 02:39:08 +02:00
Orion Montoya	e04e11070f	Contributor agreement for Orion Montoya @mdcclv	2017-10-05 17:45:45 -04:00
Ines Montani	e77d8886f7	Update CONTRIBUTORS.md	2017-10-05 22:22:04 +02:00
Matthew Honnibal	dea81f113d	Merge pull request #1389 from mdcclv/lemmatizer_obey_exceptions Lemmatizer obey exceptions	2017-10-05 22:11:21 +02:00
Orion Montoya	b0d271809d	Unit test for lemmatizer exceptions -- copied from regression test for #1387	2017-10-05 10:49:28 -04:00
Orion Montoya	ffb50d21a0	Lemmatizer honors exceptions: Fix #1387	2017-10-05 10:49:02 -04:00
Orion Montoya	e81a608173	Regression test for lemmatizer exceptions -- demonstrate issue #1387	2017-10-05 10:47:48 -04:00
Ines Montani	678651ca98	Merge pull request #1386 from kokes/patch-1 Fixing links to SyntaxNet	2017-10-04 13:35:01 +02:00
Ondrej Kokes	a9362f1c73	Fixing links to SyntaxNet	2017-10-04 12:55:07 +02:00
Matthew Honnibal	eb72eae258	Merge pull request #1364 from Destygo/master Fixed NER model loading bug	2017-09-29 12:29:43 +02:00
Ines Montani	58bfe30a12	Merge pull request #1362 from IamJeffG/docs/custom-tokenizer Document Tokenizer(token_match) and clarify tokenizer_pseudo_code	2017-09-26 15:51:15 +02:00
Vincent Genty	259ed027af	Fixed NER model loading bug	2017-09-26 15:46:04 +02:00
Ines Montani	361211fe26	Merge pull request #1342 from wannaphongcom/master Add Thai language	2017-09-26 15:40:55 +02:00
Jeffrey Gerard	b6ebedd09c	Document Tokenizer(token_match) and clarify tokenizer_pseudo_code Closes #835 In the `tokenizer_pseudo_code` I put the `special_cases` kwarg before `find_prefix` because this now matches the order the args are used in the pseudocode, and it also matches spacy's actual code.	2017-09-25 13:13:25 -07:00
Matthew Honnibal	2f8d535f65	Merge pull request #1351 from hscspring/patch-4 Update punctuation.py	2017-09-24 12:16:39 +02:00
Matthew Honnibal	9177313063	Merge pull request #1352 from hscspring/patch-5 Update customizing-tokenizer.jade	2017-09-22 16:11:49 +02:00
Matthew Honnibal	1dbc2285b8	Merge pull request #1350 from hscspring/patch-3 Update word-vectors-similarities.jade	2017-09-22 16:11:05 +02:00
Yam	54855f0eee	Update customizing-tokenizer.jade	2017-09-22 12:15:48 +08:00
Yam	6f450306c3	Update customizing-tokenizer.jade update some codes: - `me` -> `-PRON` - `TAG` -> `POS` - `create_tokenizer` function	2017-09-22 10:53:22 +08:00
Yam	923c4c2fb2	Update punctuation.py add `……`	2017-09-22 09:50:46 +08:00
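
The Doc.to_array commits above (b47b4e2654, 5941aa96a1, c0799430a7) describe accepting a single attribute as well as a list, accepting attribute names as strings, and switching the type check to 'hasattr'. The sketch below illustrates that idea only; it is not spaCy's code. The function name `normalize_attrs`, the `ATTR_IDS` table, and its integer values are hypothetical and chosen purely for illustration.

```python
# Hypothetical name-to-ID table. The names resemble spaCy attributes,
# but the integer values here are made up for this example.
ATTR_IDS = {"ORTH": 1, "LEMMA": 2, "POS": 3}


def normalize_attrs(attrs):
    """Return a list of integer attribute IDs.

    Accepts a single int, a single string, or any iterable mixing both.
    Uses duck typing (hasattr) rather than isinstance checks.
    """
    # A lone string is iterable in Python 3, so detect it via `upper`
    # before falling back to the generic "is it iterable?" check.
    if hasattr(attrs, "upper") or not hasattr(attrs, "__iter__"):
        attrs = [attrs]
    # Anything with `upper` is treated as a name and looked up
    # case-insensitively; everything else is assumed to be an ID already.
    return [ATTR_IDS[a.upper()] if hasattr(a, "upper") else a for a in attrs]


print(normalize_attrs("orth"))
print(normalize_attrs(["LEMMA", 3]))
print(normalize_attrs(2))
```

Checking for the `upper` method instead of testing against `str` means any string-like object is handled the same way, which is one reading of the "Python type-checking is brittle" remark in c0799430a7.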

5252 Commits