spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-10-07 12:26:45 +03:00

Author	SHA1	Message	Date
Ines Montani	d52b1ab245	Add unicode_literals (hopefully fixes test failure on Python 2)	2017-11-27 15:16:54 +01:00
Søren Lind Kristiansen	0ffd27b0f6	Add several Danish alternative spellings	2017-11-27 13:35:41 +01:00
Ines Montani	6362024cf8	Merge pull request #1645 from GreenRiverRUS/fix_default_meta Fixed spaCy version string in default meta	2017-11-27 11:58:02 +00:00
Vadim Mazaev	c332ffdde1	Added model command to create model from raw data: words counts, brown clusters and vectors	2017-11-27 01:21:47 +03:00
Vadim Mazaev	59f03ab1d7	Fixed spacy version string in default meta	2017-11-26 23:02:07 +03:00
Vadim Mazaev	53e7c38637	Fixed tests depends on pymorphy2	2017-11-26 21:04:44 +03:00
Vadim Mazaev	cacd859dcd	Added tag map, fixed tests fails, added more exceptions	2017-11-26 20:54:48 +03:00
Ines Montani	a7bb8f1b42	Merge pull request #1637 from sorenlind/da_tokenization Improve Danish tokenization	2017-11-26 15:41:38 +00:00
ines	c699aec089	Add offsets_from_biluo_tags helper and tests (see #1626 )	2017-11-26 16:38:01 +01:00
Søren Lind Kristiansen	ef03e9ea53	Remove unused import.	2017-11-25 13:04:02 +01:00
Søren Lind Kristiansen	6aa241bcec	Add day of month tokenizer exceptions for Danish.	2017-11-24 15:03:24 +01:00
Søren Lind Kristiansen	0c276ed020	Add weekday abbreviations and remove abiguous month abbreviations for Danish.	2017-11-24 14:43:29 +01:00
Søren Lind Kristiansen	056547e989	Add multiple tokenizer exceptions for Danish.	2017-11-24 11:51:26 +01:00
Søren Lind Kristiansen	8dc265ac0c	Add test for tokenization of 'i.' for Danish.	2017-11-24 11:29:37 +01:00
Søren Lind Kristiansen	ac8116510d	Fix tokenization of 'i.' for Danish.	2017-11-24 11:16:53 +01:00
Matthew Honnibal	79f11d4f85	Pickle vectors with vocab	2017-11-23 17:19:50 +01:00
Matthew Honnibal	f29c3925ee	Fix more efficient nonproj	2017-11-23 12:48:00 +00:00
Matthew Honnibal	e10e9ad2c5	Improve efficiency of Doc.to_array	2017-11-23 12:33:27 +00:00
Matthew Honnibal	2acc907d55	Improve profiling	2017-11-23 12:33:03 +00:00
Matthew Honnibal	fa62427300	Remove lookup-based lemmatization	2017-11-23 12:32:22 +00:00
Matthew Honnibal	fb26b2cb12	Use lookup lemmatizer if lemma unset	2017-11-23 12:31:58 +00:00
Matthew Honnibal	db5c714ad2	Improve efficiency of deprojectivization	2017-11-23 12:31:34 +00:00
Matthew Honnibal	8fec7268eb	Move string cleanup under a setting flag	2017-11-23 12:19:18 +00:00
Matthew Honnibal	5949777b12	Fix misleading multi-threading docstring	2017-11-23 12:18:59 +00:00
Matthew Honnibal	542e6fd4ea	Don't remove entries from specials	2017-11-23 12:17:42 +00:00
Matthew Honnibal	30ba81f881	Merge pull request #1576 from ligser/master Actually reset caches in pipe [wip]	2017-11-23 12:54:48 +01:00
ines	c90fe92e15	Fix displaCy test	2017-11-22 05:04:39 +01:00
ines	a6f33ac27d	Fix displaCy test	2017-11-22 04:19:28 +01:00
ines	93b0be611a	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-22 00:28:55 +01:00
ines	60b4915569	Use .pos_ instead of .tags_ in displaCy by default (see #1006 )	2017-11-22 00:28:52 +01:00
Vadim Mazaev	81314f8659	Fixed tokenizer: added char classes; added first lemmatizer and tokenizer tests	2017-11-21 22:23:59 +03:00
Vadim Mazaev	52ee1f9bf9	Updated Russian Language, added lemmatizer, norm exceptions and lex attrs	2017-11-21 11:44:46 +03:00
Burton DeWilde	a5c6869b2d	Fix bug where span.orth_ != span.text (see #1612 )	2017-11-20 12:05:43 -06:00
Burton DeWilde	635792997c	Add regression test for #1612	2017-11-20 12:05:35 -06:00
ines	9a63e32f21	Add noqa to Python 2 compat variables of built-ins (see #1617 )	2017-11-20 14:03:42 +01:00
ines	d70a64d78b	Fix syntax error and formatting in test (see #1617 )	2017-11-20 14:01:25 +01:00
ines	17849dee4b	Fix French test (see #1617 )	2017-11-20 13:59:59 +01:00
Felix Sonntag	33b0f86de3	Changed tokenizer to add infix when infix_start is offset	2017-11-19 16:32:10 +01:00
Felix Sonntag	8be3392302	Added regression text for 1494	2017-11-19 16:30:35 +01:00
Motoki Wu	a52e195a0a	Fixes Issue #1207 where `noun_chunks` of `Span` gives an error. Make sure to reference `self.doc` when getting the noun chunks. Same fix as `9750a0128c`	2017-11-17 17:16:20 -08:00
Motoki Wu	b818afaa0e	Added failing test for Issue #1207 . The noun chunk iterator should work for `Doc` but not for `Span`.	2017-11-17 17:04:27 -08:00
Vadim Mazaev	a0739a06d4	Returned russian support from v1.10 branch	2017-11-17 17:06:15 +03:00
yuukos	7401152289	updated Russian tokenizer moved the trying to import pymorph into __init__	2017-11-17 17:04:50 +03:00
yuukos	3aad66cf00	added russian language support	2017-11-17 17:04:22 +03:00
ines	a3d4dd1a5d	Test adding of lots of pipeline components (see #1585 ) Just to make sure that there's no error now or in the future with adding a large number of pipeline components.	2017-11-15 17:28:06 +01:00
Roman Domrachev	61d28d03e4	Try again to do selective remove cache	2017-11-15 19:11:12 +03:00
Roman Domrachev	b3311100c7	Merge branch 'master' of github.com:explosion/spaCy	2017-11-15 18:30:04 +03:00
Matthew Honnibal	b60d92aca8	Increment version	2017-11-15 16:14:46 +01:00
Roman Domrachev	505c6a2f2f	Completely cleanup tokenizer cache Tokenizer cache can have be different keys than string That modification can slow down tokenizer and need to be measured	2017-11-15 17:55:48 +03:00
Matthew Honnibal	cf0be62096	Increment version	2017-11-15 15:00:18 +01:00
ines	97a4f9362b	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-15 14:24:00 +01:00
ines	8e65247886	Fix lex.id if vectors is None	2017-11-15 14:23:58 +01:00
Matthew Honnibal	437ad1a852	Merge pull request #1570 from explosion/feature/fix-beam-leak Fix memory leak in beam parser	2017-11-15 14:15:05 +01:00
Matthew Honnibal	2f169fdb0a	Set lex ID correctly for new tokens in Vocab	2017-11-15 13:58:03 +01:00
Matthew Honnibal	fe3c42a06b	Fix caching in tokenizer	2017-11-15 13:55:46 +01:00
Matthew Honnibal	8d692771f6	Improve profiling	2017-11-15 13:51:25 +01:00
Matthew Honnibal	b797dca977	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-15 13:11:43 +01:00
ines	c9d72de0fb	Add dummy serialization methods for Japanese and missing lang getter (resolves #1557 )	2017-11-15 12:44:02 +01:00
Matthew Honnibal	d274d3a3b9	Let beam forward use minibatches	2017-11-15 00:51:42 +01:00
Matthew Honnibal	855872f872	Remove state hashing	2017-11-14 23:36:46 +01:00
Roman Domrachev	3e21680814	Use safer method to get string without hit	2017-11-14 22:58:46 +03:00
Roman Domrachev	a33d5a068d	Try to hold origin data instead of restore it	2017-11-14 22:40:03 +03:00
Roman Domrachev	91e2fa6561	Clean all caches	2017-11-14 21:15:04 +03:00
Roman Domrachev	4e378dc4a4	Remove all obsolete code and test only initial problem	2017-11-14 20:45:04 +03:00
Roman	47ce2347b0	Create test that fails when actual cleanup caused	2017-11-14 20:28:13 +03:00
Roman	caae77f72d	Update strings.pyx	2017-11-14 19:44:40 +03:00
Roman Domrachev	3d247d2bb8	Get back previous testcase	2017-11-14 18:01:37 +03:00
Roman Domrachev	870defa815	Swap keys in proper place Remove unnecessary clear of the hits	2017-11-14 17:56:30 +03:00
Roman Domrachev	86ca434c93	Merge github.com:explosion/spaCy	2017-11-14 17:46:22 +03:00
Roman Domrachev	a2745b0e84	StringStore now actually cleaned Do not lose docs in ref tracking	2017-11-14 17:45:50 +03:00
Matthew Honnibal	2512ea9eeb	Fix memory leak in beam parser	2017-11-14 02:11:40 +01:00
Matthew Honnibal	86ddf692a1	Fix bug in limit calculation on dev data	2017-11-14 01:37:10 +01:00
Ines Montani	ea6c85c67a	Merge pull request #1566 from MathiasDesch/master (resolves #1248 ) Add exceptions to tokenizer and norm	2017-11-13 19:05:22 +01:00
Matthew Honnibal	1b348389bb	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-13 18:18:48 +01:00
Matthew Honnibal	ca73d0d8fe	Cleanup states after beam parsing, explicitly	2017-11-13 18:18:26 +01:00
Matthew Honnibal	63ef9a2e73	Remove __dealloc__ from ParserBeam	2017-11-13 18:18:08 +01:00
Mathias Deschamps	c0691b2ab4	Add tokenizer exceptions for ing verbs Extend list of tokenizing exceptions introduced in `123810b`	2017-11-13 17:46:05 +01:00
Mathias Deschamps	288298ead9	Add norm exception for ing verbs Some ing verbs are sometimes written in or in'. Make the NORM form correct	2017-11-13 17:46:05 +01:00
Abhinav Sharma	59f5740ede	improved upon the list of included stop_words	2017-11-13 17:13:49 +05:30
Matthew Honnibal	6e641f46d4	Create a preprocess function that gets bigrams	2017-11-12 00:43:41 +01:00
Matthew Honnibal	c9251d79e3	Edit comment	2017-11-11 18:38:32 +01:00
Matthew Honnibal	dd1678eab3	Edit comment	2017-11-11 18:37:08 +01:00
Roman Domrachev	ee60a52ee7	Fix test imports and last batch cleanup	2017-11-11 11:32:16 +03:00
Roman Domrachev	4a6b094e09	Remove unused import	2017-11-11 03:13:05 +03:00
Roman Domrachev	3c600adf23	Try to fix StringStore clean up (see #1506 )	2017-11-11 03:11:27 +03:00
ines	ee97fd3cb4	Add regression test for #1547	2017-11-11 00:14:03 +01:00
ines	2df27db671	Add unicode declaration	2017-11-11 00:13:56 +01:00
ines	35653bef3a	Add missing import (fixes #1546 )	2017-11-10 19:05:18 +01:00
ines	4c5d2c80d5	Re-add python -m to commands, too brittle :( (see #1536 )	2017-11-10 02:30:55 +01:00
ines	123810b6de	Add "lovin'" to tokenizer exceptions (see #1248 )	2017-11-09 17:09:30 +01:00
ines	1c218397f6	Ensure path in Doc.to_disk/from_disk (resolves ##1521) Also add Doc serialization tests with both Path and string path options	2017-11-09 02:29:03 +01:00
Matthew Honnibal	49fd5a646f	Set version for 2.0.2 release	2017-11-08 22:39:39 +01:00
Matthew Honnibal	fba2dbddf7	Increment version	2017-11-08 22:19:08 +01:00
Matthew Honnibal	a5ea0fdf5a	Fix #1518 : vocab.vectors.resize() didn't work	2017-11-08 22:18:37 +01:00
Matthew Honnibal	de45702bbe	Strip dev suffixes from version for compatibility check	2017-11-08 18:40:21 +01:00
Matthew Honnibal	51639214a1	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-08 18:04:33 +01:00
Matthew Honnibal	a2f980de4e	Exclude .devN versioning from compatibility check	2017-11-08 18:03:52 +01:00
Daniel Hershcovich	d7ae54ff44	Fix typo in message	2017-11-08 16:06:28 +02:00
Matthew Honnibal	4194bc5744	Xfail flakey serialization test	2017-11-08 13:55:13 +01:00
Matthew Honnibal	d5537e5516	Work on Windows test failure	2017-11-08 13:25:18 +01:00
Matthew Honnibal	c27c82d5f9	Fix serialization	2017-11-08 13:08:48 +01:00
Matthew Honnibal	1d5599cd28	Fix dtype	2017-11-08 12:18:32 +01:00
Matthew Honnibal	fa7fdd0d9b	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-08 12:11:31 +01:00
Matthew Honnibal	072ff38a01	Try to fix python3.5 serialization	2017-11-08 12:10:49 +01:00
Ines Montani	3a0f34d567	Merge pull request #1509 from abhi18av/patch-1 Create examples.py for Hindi language	2017-11-08 11:37:19 +01:00
Ines Montani	42b241ccd0	Update language code in usage example in comment	2017-11-08 11:36:38 +01:00
Matthew Honnibal	e262e8d942	Increment version to v2.0.2.dev0	2017-11-08 11:25:47 +01:00
Matthew Honnibal	a8b592783b	Make a dtype more specific, to fix a windows build	2017-11-08 11:24:35 +01:00
Abhinav Sharma	84edade82d	Create examples.py Populated the file with the translations of English example sentences	2017-11-08 13:23:08 +05:30
Matthew Honnibal	d725aee4e2	Increment version to 2.0.1	2017-11-08 02:14:47 +01:00
Matthew Honnibal	8d6f68f1df	Increment version	2017-11-08 01:12:34 +01:00
ines	bcf42b8846	Fix typo	2017-11-08 01:06:37 +01:00
Matthew Honnibal	bbd2a3dee1	Fix title in about.py	2017-11-07 14:02:58 +01:00
Matthew Honnibal	4efaf9306c	Set version to spacy-nightly rc2	2017-11-07 13:27:26 +01:00
Matthew Honnibal	bf1ec2965f	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-07 13:20:29 +01:00
Matthew Honnibal	726f689da4	Fix missing import	2017-11-07 13:20:12 +01:00
ines	834f9c1aab	Update about.py	2017-11-07 13:11:33 +01:00
ines	a4662a31a9	Move model package templates to cli.package and update docs	2017-11-07 12:15:35 +01:00
ines	a09c096d3c	Get docs ready for v2.0.0	2017-11-07 12:00:43 +01:00
Matthew Honnibal	9a88e66103	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-07 02:00:06 +01:00
Matthew Honnibal	174abe4677	Increment to 2.0.0rc1	2017-11-07 01:59:46 +01:00
ines	42a0fbf291	Fix textcat simple train example	2017-11-07 01:25:54 +01:00
ines	8fb48b9b91	Update and document new util functions	2017-11-07 00:22:43 +01:00
Matthew Honnibal	1cab703bba	Move minibatch function to util	2017-11-06 23:45:36 +01:00
ines	5f43953536	Move test	2017-11-06 23:14:10 +01:00
Matthew Honnibal	dd90fe09f5	Remove extraneous label from textcat class	2017-11-06 22:09:02 +01:00
Matthew Honnibal	45e0617e61	Allow Language.update to take unicode text and dict objects	2017-11-06 22:07:38 +01:00
Matthew Honnibal	1831dbd065	Add test of simple textcat workflow	2017-11-06 22:04:29 +01:00
Matthew Honnibal	ffb9101f3f	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-06 19:20:41 +01:00
Matthew Honnibal	8fea512ac8	Don't set tensor in textcat	2017-11-06 19:20:14 +01:00
ines	acb9bdb852	Fix PRON_LEMMA imports	2017-11-06 17:41:53 +01:00
Matthew Honnibal	7d46793dd7	Add PRON_LEMMA to spacy.symbols	2017-11-06 17:38:25 +01:00
Matthew Honnibal	2f7e9f390d	Make test less flakey	2017-11-06 17:34:50 +01:00
Matthew Honnibal	407b08017e	Make test less flakey	2017-11-06 17:31:40 +01:00
Matthew Honnibal	102f797933	Fix lemma ordering in test	2017-11-06 17:02:17 +01:00
Matthew Honnibal	75e1618ec3	Fix lemma clobbering	2017-11-06 16:56:19 +01:00
Matthew Honnibal	6fdffd7246	Merge pull request #1497 from explosion/feature/improve-optimizer-handling 💫 Improve optimizer handling	2017-11-06 16:41:15 +01:00
Matthew Honnibal	8e6795437b	Set release=True	2017-11-06 16:39:32 +01:00
Matthew Honnibal	5c85bf3791	Fix missing import	2017-11-06 15:06:27 +01:00
Matthew Honnibal	25859dbb48	Return optimizer from begin_training, creating if necessary	2017-11-06 14:26:49 +01:00
Matthew Honnibal	465adfee94	Remove unused resume_training method, and pass optimizer through	2017-11-06 14:26:00 +01:00
Matthew Honnibal	13336a6197	Fix Adam import	2017-11-06 14:25:37 +01:00
Matthew Honnibal	2eb11d60f2	Add function create_default_optimizer to spacy._ml	2017-11-06 14:11:59 +01:00
Matthew Honnibal	31babe3c3f	Fix non-clobbering lemmatization	2017-11-06 12:36:05 +01:00
Matthew Honnibal	63c6ae4191	Fix lemmatizer test	2017-11-06 11:57:06 +01:00
Matthew Honnibal	a86a0181b5	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-05 22:19:10 +01:00
Matthew Honnibal	134d3b8143	Fix morphology	2017-11-05 22:18:22 +01:00
ines	08d1cf850a	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-05 21:41:58 +01:00
ines	baa231745c	Fix Dutch tag map	2017-11-05 21:41:50 +01:00
Matthew Honnibal	46e62ad747	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-05 19:40:00 +01:00

4650 Commits