spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-04-22 18:12:00 +03:00

Author	SHA1	Message	Date
Gavriel Loria	9c8c4287bf	Accept iob2 and allow generic whitespace (#2999 ) * accept non-pipe whitespace as delimiter; allow iob2 filename * added small documentation note for IOB2 allowance * added contributor agreement	2018-12-06 15:50:25 +01:00
Amandine Périnet	2457318b7a	Lemmatization of Verbs - French : adding rules and vocabulary (#3006 ) * updating rules and vocabulary for French lemmatization of verbs * updating the file with French auxiliary verb * updating rules and vocabulary for French lemmatization of verbs * adding contributor agreement for amperinet * adding rules for words with inclusive parentheses wrongly tokenized	2018-12-06 15:49:28 +01:00
Beate Sildnes	f0d7e206ec	Updated wordforms for Norwegian lemmatizer (#3007 ) * Updated wordforms for Norwegian lemmatizer Upload of updated lists of wordforms for the Norwegian lemmatizer (nouns, verbs, adverbs, adjectives and lookup). * Add spaCy contributor agreement for user beatesi * Updated wordforms for Norwegian lemmatizer	2018-12-06 15:46:18 +01:00
Matthew Honnibal	bbaca991ba	Set version to v2.0.18	2018-12-01 03:35:09 +01:00
Matthew Honnibal	e1a4b0d7f7	Set version to v2.0.18.dev1	2018-12-01 03:12:12 +01:00
Matthew Honnibal	413530b269	Set version to 2.0.18	2018-12-01 03:00:27 +01:00
Matthew Honnibal	24d52876e1	Set version to v2.0.18.dev0	2018-12-01 02:38:04 +01:00
Ines Montani	c9bdeafbc7	Don't run weird failing test for now	2018-11-30 16:13:40 +01:00
Sofie	585de273cd	Fix small typo bug in French regexp + relevant unit test (#2980 ) * additional unit test for new entr word not in other lists * bugfix - unit test works * use _latin_lower instead of alpha_lower for french * revert back to ALPHA_LOWER (following the code for languages) * contributor agreement	2018-11-29 20:16:13 +01:00
Adam Schwalm	00566949de	Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977 ) Fixes #2976	2018-11-28 19:49:33 +01:00
Ines Montani	968aff2f6a	Update tests for pytest 4.x (#2965 ) ## Description - [x] Replace marks in params for pytest 4.0 compat ([see here](https://docs.pytest.org/en/latest/deprecations.html#marks-in-pytest-mark-parametrize)) - [x] Un-xfail passing tests (some fixes in a recent update resolved a bunch of issues, but tests were apparently never updated here) ### Types of change ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-11-26 18:14:57 +01:00
Marc Puig	98fe1ab259	Catalan Language Support (#2940 ) * Catalan language support * Adding Catalan to documentation	2018-11-26 15:25:47 +01:00
Ines Montani	048416f265	Fix formatting	2018-11-26 13:27:41 +01:00
Shawn Cicoria	7601ae0cff	fixes symbolic link on py3 and windows (#2949 ) * fixes symbolic link on py3 and windows during setup of spacy using command python -m spacy link en_core_web_sm en closes #2948 * Update spacy/compat.py Co-Authored-By: cicorias <cicorias@users.noreply.github.com>	2018-11-24 15:34:23 +01:00
Ines Montani	02fc73ca53	💫 Create random IDs for SVGs to prevent ID clashes (#2927 ) Resolves #2924. ## Description Fixes problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. Generating a random ID prefix so even identical parses won't receive the same IDs for consistency (even if the effect of an ID clash isn't noticeable here). ### Types of change bug fix ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-11-15 11:40:10 +01:00
mauryaland	87ce435aff	Check if the word is in one of the regular lists specific to each POS (#2886 )	2018-11-14 15:58:43 +01:00
Daniel Hershcovich	d3d419ecc0	Allow input text of length up to max_length, inclusive (#2922 )	2018-11-13 16:46:29 +01:00
Matthew Honnibal	db08b168a3	Set version to 2.0.17	2018-10-29 23:22:18 +01:00
Matthew Honnibal	e2ae25d6f5	Try setting older regex version, to align with conda	2018-10-29 13:39:00 +01:00
Matthew Honnibal	d4fa9af56f	Set version to 2.0.17.dev0	2018-10-28 16:15:26 +01:00
Matthew Honnibal	b2e2bba8b0	Fix missing comma	2018-10-28 00:09:16 +02:00
Wannaphong Phatthiyaphaibun	2d2765fd8a	Change PyThaiNLP Url (#2876 )	2018-10-27 14:46:07 +02:00
Matthew Honnibal	9447739027	Merge branch 'master' of https://github.com/explosion/spaCy	2018-10-27 00:50:48 +02:00
Matthew Honnibal	ad068f51be	Fix out-of-bounds access in NER training. The helper method state.B(1) gets the index of the first token of the buffer, or -1 if no such token exists. Normally this is safe because we pass this to functions like state.safe_get(), which returns an empty token. Here we used it directly as an array index, which is not okay! This error may have been the cause of out-of-bounds access errors during training. Similar errors may still be around, so must be hunted down. Hunting this one down took a long time... I printed out values across training runs and diffed, looking for points of divergence between runs, when no randomness should be allowed.	2018-10-27 00:46:30 +02:00
Grivaz	57f274b693	raise error when setting overlapping entities as doc.ents (#2880 )	2018-10-26 23:29:16 +02:00
Ines Montani	48b1bc44d3	Update version to 2.0.16	2018-10-15 14:39:25 +02:00
Ines Montani	a0f6647160	Increment version	2018-10-15 14:20:55 +02:00
Ines Montani	7bc7fa8f1e	Increment version	2018-10-15 01:40:44 +02:00
Matthew Honnibal	8612b75890	Set version to 2.0.14	2018-10-15 00:10:04 +02:00
Matthew Honnibal	d6e9cf8b09	Set version to 2.0.14.dev1	2018-10-15 00:09:02 +02:00
Matthew Honnibal	8ccfa52d19	Unhack prefer_gpu	2018-10-14 23:27:09 +02:00
Matthew Honnibal	41adf3572b	Set version to v2.0.14	2018-10-14 23:15:34 +02:00
Matthew Honnibal	38aa835ada	Workaround bug in thinc require_gpu	2018-10-14 23:15:08 +02:00
Matthew Honnibal	91593b7378	Add tests for prefer_gpu() and require_gpu()	2018-10-14 23:05:22 +02:00
Matthew Honnibal	62c70b3163	Import prefer_gpu and require_gpu functions from Thinc	2018-10-14 23:03:06 +02:00
Ines Montani	295da0f11b	Increment version to 2.0.14.dev0	2018-10-14 16:37:46 +02:00
Matthew Honnibal	7de0dcb91f	Merge branch 'master' of https://github.com/explosion/spaCy	2018-10-14 16:12:23 +02:00
Keshan	cb075c8e72	Adding "This is a sentence" example to Sinhala (#2846 )	2018-10-14 00:06:40 +02:00
Matthew Honnibal	9cfab5933a	Set version to 2.0.13	2018-10-13 19:42:16 +02:00
Matthew Honnibal	6a6ae5b0af	Merge branch 'master' of https://github.com/explosion/spaCy	2018-10-13 19:41:00 +02:00
mauryaland	36514b5762	Rule-based French Lemmatizer (#2818 ) ## Description Add a rule-based French Lemmatizer following the English one and the excellent PR for [Greek language optimizations](https://github.com/explosion/spaCy/pull/2558) to adapt the Lemmatizer class. ### Types of change - Lemma dictionary used can be found [here](http://infolingu.univ-mlv.fr/DonneesLinguistiques/Dictionnaires/telechargement.html), I used the XML version. - Add several files containing exhaustive list of words for each part of speech - Add some lemma rules - Add POS that are not checked in the standard Lemmatizer, i.e. PRON, DET, ADV and AUX - Modify the Lemmatizer class to check in lookup table as a last resort if POS not mentioned - Modify the lemmatize function to check in lookup table as a last resort - Init files are updated so the model can support all the functionalities mentioned above - Add words to tokenizer_exceptions_list.py in respect to regex used in tokenizer_exceptions.py ## Checklist - [X] I have submitted the spaCy Contributor Agreement. - [X] I ran the tests, and all new and existing tests passed. - [X] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-10-13 16:38:21 +02:00
Matthew Honnibal	de46286107	Merge branch 'master' of https://github.com/explosion/spaCy	2018-10-13 16:11:16 +02:00
Ines Montani	cb57b35bb8	Also include lowercase norm exceptions	2018-10-13 15:37:30 +02:00
JKhakpour	74a30d883c	Add Persian(Farsi) language support (#2797 )	2018-10-13 15:31:49 +02:00
Matthew Honnibal	c3ddf98b1e	Set version to 2.0.13.dev4	2018-10-13 15:20:59 +02:00
Marina Lysyuk	b76fe08308	Correcting lang/ru/examples.py (#2845 ) * Correct some grammatical inaccuracies in lang\ru\examples.py; filled Contributor Agreement * Correct some grammatical inaccuracies in lang\ru\examples.py * Move contributor agreement to separate file	2018-10-13 15:19:43 +02:00
Matthew Honnibal	67ddce68d8	Unskip test	2018-10-02 23:47:55 +02:00
Matthew Honnibal	4cf5ce2cc2	Revert "Remove problematic test" This reverts commit `bdebbef455`.	2018-10-02 23:47:24 +02:00
Matthew Honnibal	bdebbef455	Remove problematic test	2018-10-02 23:16:29 +02:00
Matthew Honnibal	6afc6ffe56	Skip seemingly problematic test	2018-10-02 22:33:40 +02:00


5019 Commits