spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-04-17 15:41:59 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	60eb2343ce	Only try to load vectors if they exist.	2016-11-23 13:50:24 +01:00
Matthew Honnibal	618ac36093	Fix use of path argument in Language.__init__. Needs to be keyword arg, not positional.	2016-11-23 13:26:34 +01:00
Mark Amery	fbe19680a6	Fix another bug related to Language.__init__'s path parameter	2016-11-20 20:31:34 +00:00
Mark Amery	b0a07c21a0	Fix `path` param of `Language.__init__` always being ignored There was an explicitly-declared `path` keyword argument, so 'path' would never be present in `**overrides`. This line just overwrote any manually-specified value the user might've passed to the `path` parameter.	2016-11-20 16:29:57 +00:00
Mark Amery	1988fce389	Merge remote-tracking branch 'origin/master' into specify-data-path	2016-11-20 16:07:14 +00:00
Mark Amery	3871007c72	Let --data-path be specified when running download.py scripts Resolves https://github.com/explosion/spaCy/issues/637	2016-11-20 15:48:04 +00:00
Ines Montani	dad2c6cae9	Strip trailing whitespace	2016-11-20 16:45:51 +01:00
Ines Montani	3082e49326	Update and reformat German stopwords	2016-11-20 16:45:26 +01:00
Sourav Singh	6745eac309	Update language_data.py	2016-11-20 19:52:02 +05:30
Sourav Singh	4d9aae7d6a	Add German Stopwords	2016-11-19 22:47:53 +05:30
Matthew Honnibal	7afb2544a7	Merge pull request #627 from sadovnychyi/patch-1 Remove duplicated line of vocab declaration	2016-11-16 06:09:18 +11:00
Yanhao	762169da29	Fixed bug: eg.guess is a tag id, rather than tag	2016-11-15 14:11:22 +08:00
Dmytro Sadovnychyi	e70a7050e1	Remove duplicated line of vocab declaration As already declared on line 211.	2016-11-13 18:52:49 +08:00
Matthew Honnibal	f123f92e0c	Fix #617 : Vocab.load() required Path. Should work with string as well.	2016-11-10 22:48:48 +01:00
Matthew Honnibal	e86f440ca6	Fix test for issue 617	2016-11-10 22:48:10 +01:00
Matthew Honnibal	faa7610c56	Merge branch 'master' of ssh://github.com/explosion/spaCy	2016-11-10 22:46:38 +01:00
Matthew Honnibal	a2c7de8329	spacy/tests/regression/test_issue617.py Test Issue #617	2016-11-10 22:46:23 +01:00
tiago	2a3e342c1f	Added a test case to cover the span.merge returning values	2016-11-09 18:57:50 +00:00
tiago	b38cfd0ef9	now span.merge returns token like it says on documentation	2016-11-09 14:58:19 +00:00
Dmitry Sadovnychyi	9488222e79	Fix PhraseMatcher to work with updated Matcher #613	2016-11-09 00:14:26 +08:00
Dmitry Sadovnychyi	86c056ba64	Add basic test for PhraseMatcher #613	2016-11-09 00:10:32 +08:00
Matthew Honnibal	3ea15b257f	Fix test for 605	2016-11-06 11:59:26 +01:00
Matthew Honnibal	efe7790439	Test #590 : Order dependence in Matcher rules.	2016-11-06 11:21:36 +01:00
Matthew Honnibal	5cd3acb265	Fix #605 : Acceptor now rejects matches as expected.	2016-11-06 10:50:42 +01:00
Matthew Honnibal	75805397dd	Test Issue #605	2016-11-06 10:42:32 +01:00
Matthew Honnibal	014b6936ac	Fix #608 -- __version__ should be available at the base of the package.	2016-11-04 21:21:02 +01:00
Matthew Honnibal	42b0736db7	Increment version	2016-11-04 20:04:21 +01:00
Matthew Honnibal	9f93386994	Update version	2016-11-04 19:28:16 +01:00
Matthew Honnibal	1fb09c3dc1	Fix morphology tagger	2016-11-04 19:19:09 +01:00
Matthew Honnibal	a36353df47	Temporarily put back the tokenize_from_strings method, while tests aren't updated yet.	2016-11-04 19:18:07 +01:00
Matthew Honnibal	f0917b6808	Fix Issue #376 : and/or was tagged as a noun.	2016-11-04 15:21:28 +01:00
Matthew Honnibal	737816e86e	Fix #368 : Tokenizer handled pattern 'unicode close quote, period' incorrectly.	2016-11-04 15:16:20 +01:00
Matthew Honnibal	ab952b4756	Fix #578 -- Sputnik had been purging all files on --force, not just the relevant one.	2016-11-04 10:44:11 +01:00
Matthew Honnibal	6e37ba1d82	Fix #602 , #603 --- Broken build	2016-11-04 09:54:24 +01:00
Matthew Honnibal	293c79c09a	Fix #595 : Lemmatization was incorrect for base forms, because morphological analyser wasn't adding morphology properly.	2016-11-04 00:29:07 +01:00
Matthew Honnibal	e30348b331	Prefer to import from symbols instead of parts_of_speech	2016-11-04 00:27:55 +01:00
Matthew Honnibal	4a8a2b6001	Test #595 -- Bug in lemmatization of base forms.	2016-11-04 00:27:32 +01:00
Matthew Honnibal	f1605df2ec	Fix #588 : Matcher should reject empty pattern.	2016-11-03 00:16:44 +01:00
Matthew Honnibal	72b9bd57ec	Test Issue #588 : Matcher accepts invalid, empty patterns.	2016-11-03 00:09:35 +01:00
Matthew Honnibal	41a90a7fbb	Add tokenizer exception for 'Ph.D.', to fix 592.	2016-11-03 00:03:34 +01:00
Matthew Honnibal	532318e80b	Import Jieba inside zh.make_doc	2016-11-02 23:49:19 +01:00
Matthew Honnibal	f292f7f0e6	Fix Issue #599 , by considering empty documents to be parsed and tagged. Implementation is a bit dodgy.	2016-11-02 23:48:43 +01:00
Matthew Honnibal	b6b01d4680	Remove deprecated tokens_from_list test.	2016-11-02 23:47:21 +01:00
Matthew Honnibal	3d6c79e595	Test Issue #599 : .is_tagged and .is_parsed attributes not reflected after deserialization for empty documents.	2016-11-02 23:40:11 +01:00
Matthew Honnibal	05a8b752a2	Fix Issue #600 : Missing setters for Token attribute.	2016-11-02 23:28:59 +01:00
Matthew Honnibal	125c910a8d	Test Issue #600	2016-11-02 23:24:13 +01:00
Matthew Honnibal	e0c9695615	Fix doc strings for tokenizer	2016-11-02 23:15:39 +01:00
Matthew Honnibal	80824f6d29	Fix test	2016-11-02 20:48:40 +01:00
Matthew Honnibal	dbe47902bc	Add import fr	2016-11-02 20:48:29 +01:00
Matthew Honnibal	8f24dc1982	Fix infixes in Italian	2016-11-02 20:43:52 +01:00
Matthew Honnibal	41a4766c1c	Fix infixes in spanish and portuguese	2016-11-02 20:43:12 +01:00
Matthew Honnibal	3d4bd96e8a	Fix infixes in french	2016-11-02 20:41:43 +01:00
Matthew Honnibal	c09a8ce5bb	Add test for french tokenizer	2016-11-02 20:40:31 +01:00
Matthew Honnibal	b012ae3044	Add test for loading languages	2016-11-02 20:38:48 +01:00
Matthew Honnibal	ad1c747c6b	Fix stray POS in language stubs	2016-11-02 20:37:55 +01:00
Matthew Honnibal	e9e6fce576	Handle null prefix/suffix/infix search in tokenizer	2016-11-02 20:35:48 +01:00
Matthew Honnibal	22647c2423	Check that patterns aren't null before compiling regex for tokenizer	2016-11-02 20:35:29 +01:00
Matthew Honnibal	5ac735df33	Link languages in __init__.py	2016-11-02 20:05:14 +01:00
Matthew Honnibal	c68dfe2965	Stub out support for Italian	2016-11-02 20:03:24 +01:00
Matthew Honnibal	6dbf4f7ad7	Stub out support for French, Spanish, Italian and Portuguese	2016-11-02 20:02:41 +01:00
Matthew Honnibal	6b8b05ef83	Specify that spacy.util is encoded in utf8	2016-11-02 19:58:00 +01:00
Matthew Honnibal	5363224395	Add draft Jieba tokenizer for Chinese	2016-11-02 19:57:38 +01:00
Matthew Honnibal	f7fee6c24b	Check for class-defined make_docs method before assigning one provided as an argument	2016-11-02 19:57:13 +01:00
Matthew Honnibal	19c1e83d3d	Work on draft Italian tokenizer	2016-11-02 19:56:32 +01:00
Matthew Honnibal	9efe568177	Add missing unicode_literals to spacy.util. I think this was messing up the tokenizer regex for non-ascii characters in Python 2. Re Issue #596	2016-11-02 12:31:34 +01:00
Matthew Honnibal	d8db648ebf	Add __init__.py file for regression tests	2016-11-01 13:45:06 +01:00
Matthew Honnibal	11664b9f20	Fix variable error in token	2016-11-01 13:28:00 +01:00
Matthew Honnibal	8c4d1b46ce	Fix variable error in Span	2016-11-01 13:27:44 +01:00
Matthew Honnibal	e7af6b937f	Fix syntax error while fixing doc strings	2016-11-01 13:27:32 +01:00
Matthew Honnibal	62fc6b1afa	Use 32 bit hashes for OOV, re Issue #589 , Issue #285	2016-11-01 13:27:13 +01:00
Matthew Honnibal	6977a2b8cd	Add test for Issue #589	2016-11-01 12:33:36 +01:00
Matthew Honnibal	b86f8af0c1	Fix doc strings	2016-11-01 12:25:36 +01:00
Matthew Honnibal	d563f1eadb	Fix Issue #587 : Segfault in Matcher, due to simple error in the state machine.	2016-10-28 17:42:00 +02:00
Matthew Honnibal	7e5f63a595	Improve test slightly	2016-10-28 17:41:16 +02:00
Matthew Honnibal	782e4814f4	Test Issue #587 : Matcher segfaults on particular input	2016-10-28 16:38:32 +02:00
Matthew Honnibal	708ea22208	Infer types in transition_system.pyx	2016-10-27 18:08:13 +02:00
Matthew Honnibal	18590eba94	Fix training evaluate method	2016-10-27 18:02:19 +02:00
Matthew Honnibal	301f3cc898	Fix Issue #429 . Add an initialize_state method to the named entity recogniser that adds missing entity types. This is a messy place to add this, because it's strange to have the method mutate state. A better home for this logic could be found.	2016-10-27 18:01:55 +02:00
Matthew Honnibal	afea6505f3	Test Issue 429: No valid actions for NER after matcher adds a new entity label.	2016-10-27 18:01:34 +02:00
Matthew Honnibal	03a520ec4f	Change signature of Parser.parseC, so that nr_class is read from the transition system. This allows the transition system to modify the number of actions in initialize_state.	2016-10-27 17:58:56 +02:00
Matthew Honnibal	6c47048912	Fix test, after IOB tweak.	2016-10-26 17:22:03 +02:00
Matthew Honnibal	4ca31b4d87	Fix clobbering of 'missing' named ent values after assigning ents.	2016-10-26 13:13:56 +02:00
Matthew Honnibal	cb49189477	Remove dead code	2016-10-26 13:11:07 +02:00
Matthew Honnibal	a209b10579	Improve error message when oracle fails for non-projective trees, re Issue #571 .	2016-10-24 20:31:30 +02:00
Matthew Honnibal	b2d43b93d2	Fix Python 3 basestring error	2016-10-24 14:22:51 +02:00
Matthew Honnibal	276478fe0f	Update strings.pxd	2016-10-24 14:00:35 +02:00
Matthew Honnibal	d8134817ff	Workaround Issue #285 : Allow the StringStore to be 'frozen', in which case strings will be pushed into an OOV map. We can then flush this OOV map, freeing all of the OOV strings.	2016-10-24 13:49:03 +02:00
Matthew Honnibal	d3a617aa99	Test workaround for Issue #285 : Streaming data memory growth	2016-10-24 13:48:06 +02:00
Matthew Honnibal	64e5f02cf7	Update test	2016-10-23 21:08:07 +02:00
Matthew Honnibal	66d7a6eca2	Update test	2016-10-23 21:02:05 +02:00
Matthew Honnibal	90bf797125	Update test	2016-10-23 20:54:17 +02:00
Matthew Honnibal	5e76320ffe	Update test	2016-10-23 20:44:54 +02:00
Matthew Honnibal	aa105927f3	Update test	2016-10-23 20:31:25 +02:00
Matthew Honnibal	6b9237aa83	Increment version	2016-10-23 20:22:53 +02:00
Matthew Honnibal	150e02d72e	Fix Issue #566	2016-10-23 20:19:01 +02:00
Matthew Honnibal	e120561294	Fix vector_norm test.	2016-10-23 19:56:16 +02:00
Matthew Honnibal	fefde8aef8	Make installation print data path.	2016-10-23 19:46:44 +02:00
Matthew Honnibal	e7414cd064	Try to fix weird install glitch.	2016-10-23 19:46:28 +02:00
Matthew Honnibal	90f7544edd	Increment version	2016-10-23 19:43:06 +02:00
Matthew Honnibal	6036ec7c77	Fix vector norm when loading lexemes.	2016-10-23 19:40:18 +02:00
Matthew Honnibal	c05cd2356e	Fix similarity test for Python 3	2016-10-23 18:16:56 +02:00
Matthew Honnibal	3e688e6d4b	Fix issue #514 -- serializer fails when new entity type has been added. The fix here is quite ugly. It's best to add the entities ASAP after loading the NLP pipeline, to mitigate the brittleness.	2016-10-23 17:45:44 +02:00
Matthew Honnibal	79aa03fe98	Test Issue #514 : Serializer fails when new entity type has been added.	2016-10-23 17:41:44 +02:00
Matthew Honnibal	f97548c6f1	Fix broken test, re Issue #461	2016-10-23 17:02:23 +02:00
Matthew Honnibal	4de30a8e38	Test Issue #514 : Serialization fails after adding a new entity label.	2016-10-23 16:40:27 +02:00
Matthew Honnibal	936e6246aa	Fix Issue #459 -- failed to deserialize empty doc.	2016-10-23 16:31:05 +02:00
Matthew Honnibal	e99b3f5322	Test Issue #459 : Fail to deserialize empty doc	2016-10-23 16:30:22 +02:00
Matthew Honnibal	49c117960c	Fix bug where huffman codec died if given empty freqs dict.	2016-10-23 16:28:05 +02:00
Matthew Honnibal	99ff8b902f	Test that huffman codec works with empty freqs dict	2016-10-23 16:27:45 +02:00
Matthew Honnibal	15c9b59f0e	Fix Issue #461 : O tag was being clobbered by doc.ents.__set__	2016-10-23 15:50:26 +02:00
Matthew Honnibal	e5627134d9	Test Issue #461 : ent_iob tag incorrect after setting entities.	2016-10-23 15:50:04 +02:00
Matthew Honnibal	f62088d646	Fix compile error	2016-10-23 14:50:50 +02:00
Matthew Honnibal	2c3a67b693	Fix calculation of vector norm, re Issue #522 . Need to consolidate the calculations into a helper function.	2016-10-23 14:49:31 +02:00
Matthew Honnibal	a0a4ada42a	Fix calculation of L2-norm for Lexeme	2016-10-23 14:44:45 +02:00
Matthew Honnibal	2989072aac	Add tests to verify that Issue #442 is fixed in 1.1	2016-10-23 14:33:13 +02:00
Matthew Honnibal	739213a8af	Fix create_pipeline keyword argument.	2016-10-23 14:24:16 +02:00
Matthew Honnibal	bea44bd3c4	Fix vector_norm when vector is assigned to Lexeme.	2016-10-23 14:23:56 +02:00
Matthew Honnibal	e838b6d53f	Add tests for using the new Entity ID tracking in the rule matcher	2016-10-23 14:04:01 +02:00
Matthew Honnibal	e7af75e0a9	Add test for vector resizing, re Issue #544	2016-10-21 17:07:21 +02:00
Matthew Honnibal	ca8ea33abc	Bump version to 1.1.0	2016-10-21 16:30:57 +02:00
Matthew Honnibal	7ab03050d4	Add resize_vectors method to Vocab	2016-10-21 01:44:50 +02:00
Matthew Honnibal	8ce8803824	Fix JSON in tokenizer	2016-10-21 01:44:20 +02:00
Matthew Honnibal	6eb73a095f	Fix JSON in tagger	2016-10-21 01:44:10 +02:00
Matthew Honnibal	e16e78a737	Merge branch 'master' of ssh://github.com/explosion/spaCy	2016-10-21 00:00:15 +02:00
Matthew Honnibal	147373c807	Increment version	2016-10-21 00:00:03 +02:00
Matthew Honnibal	e80944276f	Fix Span.vector_norm	2016-10-20 21:58:56 +02:00
Matthew Honnibal	f5fe4f595b	Fix json loading, for Python 3.	2016-10-20 21:23:26 +02:00
Matthew Honnibal	2e92c6fb3a	Fix JSON encoding issue on load	2016-10-20 21:06:48 +02:00
Matthew Honnibal	4ad7bb96c9	Increment version.	2016-10-20 20:48:30 +02:00
Matthew Honnibal	5ec32f5d97	Fix loading of GloVe vectors, to address Issue #541	2016-10-20 18:27:48 +02:00
Matthew Honnibal	ddeabd76c4	Fix mistake loading GloVe vectors. GloVe vectors now loaded by default if present, as promised.	2016-10-20 16:57:53 +02:00
Matthew Honnibal	bfe5cb1244	Increment version.	2016-10-20 14:52:00 +02:00
Matthew Honnibal	f189a3cb00	Fix encoding when opening files in Python 2.7, re Issue #539	2016-10-20 14:42:56 +02:00
Matthew Honnibal	c353a5214d	Increment version	2016-10-19 23:51:01 +02:00
Matthew Honnibal	d10c17f2a4	Fix Issue #536 : oov_prob was 0 for OOV words.	2016-10-19 23:38:47 +02:00
Matthew Honnibal	dfa752d064	Increment version	2016-10-19 23:19:13 +02:00
Matthew Honnibal	3588a18fb8	Fix hook names in doc	2016-10-19 21:15:16 +02:00
Matthew Honnibal	5d5742b773	Add sentiment field to doc, rename getters_for_tokens and getters_for_spans, add user_hooks field to Doc.	2016-10-19 20:54:22 +02:00
Matthew Honnibal	ed5e178817	Add sentiment property on lexeme object	2016-10-19 20:52:52 +02:00
Matthew Honnibal	d4aaf2752c	Fix issue #535 : Pipeline elements added even when data not installed.	2016-10-19 19:55:19 +02:00
Matthew Honnibal	04d1c959da	Fix version	2016-10-19 03:45:37 +02:00
Matthew Honnibal	d35aa7344e	Change version ID to make PyPi happy	2016-10-19 03:24:39 +02:00
Matthew Honnibal	89d2a5c8b3	Increment build version.	2016-10-19 03:05:17 +02:00
Matthew Honnibal	622b0a9674	Tweak download script	2016-10-19 00:52:16 +02:00
Matthew Honnibal	5a5c7192a5	Fix download.py for GloVe vectors.	2016-10-19 00:47:44 +02:00
Matthew Honnibal	edc45c19d6	Update download script	2016-10-19 00:41:14 +02:00
Matthew Honnibal	2bbb050500	Fix default of serializer_freqs	2016-10-18 19:55:41 +02:00
Matthew Honnibal	1b651db9c5	Fix parser creation in Language class.	2016-10-18 19:36:44 +02:00
Matthew Honnibal	45a6f9b9c7	Fix loading of tagger.	2016-10-18 19:33:04 +02:00
Matthew Honnibal	76c815f40d	Fix spacy.load	2016-10-18 19:23:31 +02:00

2008 Commits