spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-02-09 16:10:33 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	87613edf8f	Add set_struct_attr staticmethod to token	2016-11-25 12:41:47 +01:00
Matthew Honnibal	fb69aa648f	Merge branch 'master' of ssh://github.com/explosion/spaCy	2016-11-25 11:35:44 +01:00
Matthew Honnibal	9a03a3f85e	Add get_struct_attr staticmethod to Token, to match Lexeme.get_struct_attr.	2016-11-25 11:35:17 +01:00
Matthew Honnibal	53d8ca8f51	Add spacy.attrs.intify_attrs function, to normalize strings in token attribute dictionaries.	2016-11-25 11:34:30 +01:00
Ines Montani	d21ad01840	Add emoticons	2016-11-24 19:13:00 +01:00
Ines Montani	4dcfafde02	Add line breaks	2016-11-24 14:57:37 +01:00
Ines Montani	6247c005a2	Add test for tokenizer regular expressions	2016-11-24 13:51:59 +01:00
Ines Montani	de747e39e7	Reformat language data	2016-11-24 13:51:32 +01:00
Matthew Honnibal	b8c4f5ea76	Allow German noun chunks to work on Span. Update the German noun chunks iterator, so that it also works on Span objects.	2016-11-24 23:30:15 +11:00
Pokey Rule	3e3bda142d	Add noun_chunks to Span	2016-11-24 10:47:20 +00:00
Matthew Honnibal	09f68bc641	Fix Issue #639: stop words in language class not used. This patch is messy, but it's better not to change too much until the language data loading can be properly refactored.	2016-11-24 00:13:55 +01:00
Matthew Honnibal	48e1dc29d4	Fix default path loading.	2016-11-23 23:48:55 +01:00
Matthew Honnibal	e01c1875ee	Work on test for #615	2016-11-23 23:48:41 +01:00
ExplodingCabbage	6c4f488e89	Fix syntax mistake	2016-11-23 15:12:45 +00:00
Matthew Honnibal	60eb2343ce	Only try to load vectors if they exist.	2016-11-23 13:50:24 +01:00
Matthew Honnibal	618ac36093	Fix use of path argument in Language.__init__. Needs to be keyword arg, not positional.	2016-11-23 13:26:34 +01:00
Mark Amery	fbe19680a6	Fix another bug related to Language.__init__'s path parameter	2016-11-20 20:31:34 +00:00
Mark Amery	b0a07c21a0	Fix `path` param of `Language.__init__` always being ignored. There was an explicitly-declared `path` keyword argument, so 'path' would never be present in `**overrides`. This line just overwrote any manually-specified value the user might've passed to the `path` parameter.	2016-11-20 16:29:57 +00:00
Mark Amery	1988fce389	Merge remote-tracking branch 'origin/master' into specify-data-path	2016-11-20 16:07:14 +00:00
Mark Amery	3871007c72	Let --data-path be specified when running download.py scripts. Resolves https://github.com/explosion/spaCy/issues/637	2016-11-20 15:48:04 +00:00
Ines Montani	dad2c6cae9	Strip trailing whitespace	2016-11-20 16:45:51 +01:00
Ines Montani	3082e49326	Update and reformat German stopwords	2016-11-20 16:45:26 +01:00
Sourav Singh	6745eac309	Update language_data.py	2016-11-20 19:52:02 +05:30
Sourav Singh	4d9aae7d6a	Add German Stopwords	2016-11-19 22:47:53 +05:30
Matthew Honnibal	7afb2544a7	Merge pull request #627 from sadovnychyi/patch-1. Remove duplicated line of vocab declaration	2016-11-16 06:09:18 +11:00
Yanhao	762169da29	Fixed bug: eg.guess is a tag id, rather than tag	2016-11-15 14:11:22 +08:00
Dmytro Sadovnychyi	e70a7050e1	Remove duplicated line of vocab declaration. As already declared on line 211.	2016-11-13 18:52:49 +08:00
Matthew Honnibal	f123f92e0c	Fix #617: Vocab.load() required Path. Should work with string as well.	2016-11-10 22:48:48 +01:00
Matthew Honnibal	e86f440ca6	Fix test for issue 617	2016-11-10 22:48:10 +01:00
Matthew Honnibal	faa7610c56	Merge branch 'master' of ssh://github.com/explosion/spaCy	2016-11-10 22:46:38 +01:00
Matthew Honnibal	a2c7de8329	spacy/tests/regression/test_issue617.py Test Issue #617	2016-11-10 22:46:23 +01:00
tiago	2a3e342c1f	Added a test case to cover the span.merge returning values	2016-11-09 18:57:50 +00:00
tiago	b38cfd0ef9	now span.merge returns token like it says on documentation	2016-11-09 14:58:19 +00:00
Dmitry Sadovnychyi	9488222e79	Fix PhraseMatcher to work with updated Matcher #613	2016-11-09 00:14:26 +08:00
Dmitry Sadovnychyi	86c056ba64	Add basic test for PhraseMatcher #613	2016-11-09 00:10:32 +08:00
Matthew Honnibal	3ea15b257f	Fix test for 605	2016-11-06 11:59:26 +01:00
Matthew Honnibal	efe7790439	Test #590: Order dependence in Matcher rules.	2016-11-06 11:21:36 +01:00
Matthew Honnibal	5cd3acb265	Fix #605: Acceptor now rejects matches as expected.	2016-11-06 10:50:42 +01:00
Matthew Honnibal	75805397dd	Test Issue #605	2016-11-06 10:42:32 +01:00
Matthew Honnibal	014b6936ac	Fix #608 -- __version__ should be available at the base of the package.	2016-11-04 21:21:02 +01:00
Matthew Honnibal	42b0736db7	Increment version	2016-11-04 20:04:21 +01:00
Matthew Honnibal	9f93386994	Update version	2016-11-04 19:28:16 +01:00
Matthew Honnibal	1fb09c3dc1	Fix morphology tagger	2016-11-04 19:19:09 +01:00
Matthew Honnibal	a36353df47	Temporarily put back the tokenize_from_strings method, while tests aren't updated yet.	2016-11-04 19:18:07 +01:00
Matthew Honnibal	f0917b6808	Fix Issue #376: and/or was tagged as a noun.	2016-11-04 15:21:28 +01:00
Matthew Honnibal	737816e86e	Fix #368: Tokenizer handled pattern 'unicode close quote, period' incorrectly.	2016-11-04 15:16:20 +01:00
Matthew Honnibal	ab952b4756	Fix #578 -- Sputnik had been purging all files on --force, not just the relevant one.	2016-11-04 10:44:11 +01:00
Matthew Honnibal	6e37ba1d82	Fix #602, #603 --- Broken build	2016-11-04 09:54:24 +01:00
Matthew Honnibal	293c79c09a	Fix #595: Lemmatization was incorrect for base forms, because morphological analyser wasn't adding morphology properly.	2016-11-04 00:29:07 +01:00
Matthew Honnibal	e30348b331	Prefer to import from symbols instead of parts_of_speech	2016-11-04 00:27:55 +01:00
Matthew Honnibal	4a8a2b6001	Test #595 -- Bug in lemmatization of base forms.	2016-11-04 00:27:32 +01:00
Matthew Honnibal	f1605df2ec	Fix #588: Matcher should reject empty pattern.	2016-11-03 00:16:44 +01:00
Matthew Honnibal	72b9bd57ec	Test Issue #588: Matcher accepts invalid, empty patterns.	2016-11-03 00:09:35 +01:00
Matthew Honnibal	41a90a7fbb	Add tokenizer exception for 'Ph.D.', to fix 592.	2016-11-03 00:03:34 +01:00
Matthew Honnibal	532318e80b	Import Jieba inside zh.make_doc	2016-11-02 23:49:19 +01:00
Matthew Honnibal	f292f7f0e6	Fix Issue #599, by considering empty documents to be parsed and tagged. Implementation is a bit dodgy.	2016-11-02 23:48:43 +01:00
Matthew Honnibal	b6b01d4680	Remove deprecated tokens_from_list test.	2016-11-02 23:47:21 +01:00
Matthew Honnibal	3d6c79e595	Test Issue #599: .is_tagged and .is_parsed attributes not reflected after deserialization for empty documents.	2016-11-02 23:40:11 +01:00
Matthew Honnibal	05a8b752a2	Fix Issue #600: Missing setters for Token attribute.	2016-11-02 23:28:59 +01:00
Matthew Honnibal	125c910a8d	Test Issue #600	2016-11-02 23:24:13 +01:00
Matthew Honnibal	e0c9695615	Fix doc strings for tokenizer	2016-11-02 23:15:39 +01:00
Matthew Honnibal	80824f6d29	Fix test	2016-11-02 20:48:40 +01:00
Matthew Honnibal	dbe47902bc	Add import fr	2016-11-02 20:48:29 +01:00
Matthew Honnibal	8f24dc1982	Fix infixes in Italian	2016-11-02 20:43:52 +01:00
Matthew Honnibal	41a4766c1c	Fix infixes in spanish and portuguese	2016-11-02 20:43:12 +01:00
Matthew Honnibal	3d4bd96e8a	Fix infixes in french	2016-11-02 20:41:43 +01:00
Matthew Honnibal	c09a8ce5bb	Add test for french tokenizer	2016-11-02 20:40:31 +01:00
Matthew Honnibal	b012ae3044	Add test for loading languages	2016-11-02 20:38:48 +01:00
Matthew Honnibal	ad1c747c6b	Fix stray POS in language stubs	2016-11-02 20:37:55 +01:00
Matthew Honnibal	e9e6fce576	Handle null prefix/suffix/infix search in tokenizer	2016-11-02 20:35:48 +01:00
Matthew Honnibal	22647c2423	Check that patterns aren't null before compiling regex for tokenizer	2016-11-02 20:35:29 +01:00
Matthew Honnibal	5ac735df33	Link languages in __init__.py	2016-11-02 20:05:14 +01:00
Matthew Honnibal	c68dfe2965	Stub out support for Italian	2016-11-02 20:03:24 +01:00
Matthew Honnibal	6dbf4f7ad7	Stub out support for French, Spanish, Italian and Portuguese	2016-11-02 20:02:41 +01:00
Matthew Honnibal	6b8b05ef83	Specify that spacy.util is encoded in utf8	2016-11-02 19:58:00 +01:00
Matthew Honnibal	5363224395	Add draft Jieba tokenizer for Chinese	2016-11-02 19:57:38 +01:00
Matthew Honnibal	f7fee6c24b	Check for class-defined make_docs method before assigning one provided as an argument	2016-11-02 19:57:13 +01:00
Matthew Honnibal	19c1e83d3d	Work on draft Italian tokenizer	2016-11-02 19:56:32 +01:00
Matthew Honnibal	9efe568177	Add missing unicode_literals to spacy.util. I think this was messing up the tokenizer regex for non-ascii characters in Python 2. Re Issue #596	2016-11-02 12:31:34 +01:00
Matthew Honnibal	d8db648ebf	Add __init__.py file for regression tests	2016-11-01 13:45:06 +01:00
Matthew Honnibal	11664b9f20	Fix variable error in token	2016-11-01 13:28:00 +01:00
Matthew Honnibal	8c4d1b46ce	Fix variable error in Span	2016-11-01 13:27:44 +01:00
Matthew Honnibal	e7af6b937f	Fix syntax error while fixing doc strings	2016-11-01 13:27:32 +01:00
Matthew Honnibal	62fc6b1afa	Use 32 bit hashes for OOV, re Issue #589, Issue #285	2016-11-01 13:27:13 +01:00
Matthew Honnibal	6977a2b8cd	Add test for Issue #589	2016-11-01 12:33:36 +01:00
Matthew Honnibal	b86f8af0c1	Fix doc strings	2016-11-01 12:25:36 +01:00
Matthew Honnibal	d563f1eadb	Fix Issue #587: Segfault in Matcher, due to simple error in the state machine.	2016-10-28 17:42:00 +02:00
Matthew Honnibal	7e5f63a595	Improve test slightly	2016-10-28 17:41:16 +02:00
Matthew Honnibal	782e4814f4	Test Issue #587: Matcher segfaults on particular input	2016-10-28 16:38:32 +02:00
Matthew Honnibal	708ea22208	Infer types in transition_system.pyx	2016-10-27 18:08:13 +02:00
Matthew Honnibal	18590eba94	Fix training evaluate method	2016-10-27 18:02:19 +02:00
Matthew Honnibal	301f3cc898	Fix Issue #429. Add an initialize_state method to the named entity recogniser that adds missing entity types. This is a messy place to add this, because it's strange to have the method mutate state. A better home for this logic could be found.	2016-10-27 18:01:55 +02:00
Matthew Honnibal	afea6505f3	Test Issue 429: No valid actions for NER after matcher adds a new entity label.	2016-10-27 18:01:34 +02:00
Matthew Honnibal	03a520ec4f	Change signature of Parser.parseC, so that nr_class is read from the transition system. This allows the transition system to modify the number of actions in initialize_state.	2016-10-27 17:58:56 +02:00
Matthew Honnibal	6c47048912	Fix test, after IOB tweak.	2016-10-26 17:22:03 +02:00
Matthew Honnibal	4ca31b4d87	Fix clobbering of 'missing' named ent values after assigning ents.	2016-10-26 13:13:56 +02:00
Matthew Honnibal	cb49189477	Remove dead code	2016-10-26 13:11:07 +02:00
Matthew Honnibal	a209b10579	Improve error message when oracle fails for non-projective trees, re Issue #571.	2016-10-24 20:31:30 +02:00
Matthew Honnibal	b2d43b93d2	Fix Python 3 basestring error	2016-10-24 14:22:51 +02:00
Matthew Honnibal	276478fe0f	Update strings.pxd	2016-10-24 14:00:35 +02:00

1972 Commits