Commit Graph

8951 Commits

Author SHA1 Message Date
Ines Montani
6f1438b5d9 Auto-format example 2018-12-17 13:44:38 +01:00
Amandine Périnet
361554f629 Lemmatization of Adjectives - French : adding rules and vocabulary (#3045)
* modifying FR lemmatisation for Adjectives

* adding contributor agreement for amperinet

* correcting some errors in vocabulary files
2018-12-16 18:11:07 +01:00
Shooter23
6ae8e49bff Fix docstring for is_right_punct(). (#3044) 2018-12-14 10:11:11 +01:00
Matthew Honnibal
e5685d98a2 Fix averaging in textcat example (closes #2745) (#3032) [ci skip] 2018-12-08 13:27:05 +01:00
Ines Montani
8c0f0f50bc Use nlp.make_doc instead of nlp for patterns [ci skip] 2018-12-08 11:56:01 +01:00
Paul O'Leary McCann
7dd21b66d5 Extras require mecab (#3024)
* Add note that Unidic is required for Japanese

This addresses #3001. -POLM

* Add extras_require for mecab with old version

Related to issue #3018.

* mecab → ja

Co-Authored-By: polm <polm@dampfkraft.com>
2018-12-08 06:34:49 +01:00
Aki Ariga
7fcd6419ff Upadate the document for Unidic link with latest version URL (#3022)
* Upadate Unidic link for latest version in document

This patch improves #3017 . The link for Unidic was old version one, so will the lates version.

* Add contributor agreement

* Use more specific link for unidic-cwj
2018-12-07 17:24:48 +01:00
Amandine Périnet
0b44ea23bd Lemmatization of Nouns - French : adding rules and vocabulary (#2992)
* modifying FR lemmatization for nouns

* modifying FR lemmatization for nouns

* adding contributor agreement for amperinet

* adding rules for words with inclusive parentheses wrongly tokenized

* adding contributor agreement for amperinet

* adding a missing comma
2018-12-06 22:42:18 +01:00
Ines Montani
27905a7b14 Remove reference to cuda10 in docs (closes #2894) [ci skip] 2018-12-06 16:05:37 +01:00
Gavriel Loria
9c8c4287bf Accept iob2 and allow generic whitespace (#2999)
* accept non-pipe whitespace as delimiter; allow iob2 filename

* added small documentation note for IOB2 allowance

* added contributor agreement
2018-12-06 15:50:25 +01:00
Amandine Périnet
2457318b7a Lemmatization of Verbs - French : adding rules and vocabulary (#3006)
* updating rules and vocabulary for French lemmatization of verbs

* updating the file with French auxiliary verb

* updating rules and vocabulary for French lemmatization of verbs

* adding contributor agreement for amperinet

* adding rules for words with inclusive parentheses wrongly tokenized
2018-12-06 15:49:28 +01:00
Beate Sildnes
f0d7e206ec Updated wordforms for Norwegian lemmatizer (#3007)
* Updated wordforms for Norwegian lemmatizer

Upload of updated lists of wordforms for the Norwegian lemmatizer (nouns, verbs, adverbs, adjectives and lookup).

* Add spaCy contributor agreement for user beatesi

*  Updated wordforms for Norwegian lemmatizer
2018-12-06 15:46:18 +01:00
Paul O'Leary McCann
b36f6eabfb Add note that Unidic is required for Japanese (#3017)
This addresses #3001. -POLM
2018-12-06 15:14:10 +01:00
Gavriel Loria
ae5601beae Initialize trues to 0.0 in training example (#3004)
* added contributor agreement

* if there are no true positives, precision should be 0.0
2018-12-03 01:33:22 +01:00
Justin DuJardin
33fca8672f fix issue compiling the latest spacy on MacOS 10.3.6 (#2998) 2018-12-02 05:51:11 +01:00
Matthew Honnibal
bbaca991ba Set version to v2.0.18 2018-12-01 03:35:09 +01:00
Matthew Honnibal
05b2336ffa Try again to fix OSX build 2018-12-01 03:12:21 +01:00
Matthew Honnibal
e1a4b0d7f7 Set version to v2.0.18.dev1 2018-12-01 03:12:12 +01:00
Matthew Honnibal
413530b269 Set version to 2.0.18 2018-12-01 03:00:27 +01:00
Matthew Honnibal
24d52876e1 Set version to v2.0.18.dev0 2018-12-01 02:38:04 +01:00
Matthew Honnibal
4895b2e830 Merge branch 'master' of https://github.com/explosion/spaCy 2018-12-01 02:37:21 +01:00
Matthew Honnibal
3f16af123e Try to fix OSX build error 2018-12-01 02:36:56 +01:00
Matthew Honnibal
61abb1ef70 Remove msgpack dependency, to try to fix #2995 2018-12-01 02:36:41 +01:00
Ines Montani
add6469225 Add "new in v2.0.12" note to Span.ents (closes #2986) 2018-11-30 20:50:55 +01:00
Ines Montani
c9bdeafbc7 Don't run weird failing test for now 2018-11-30 16:13:40 +01:00
wxv
06820ef6e7 Fix is_ascii documentation and create contributor file (#2988)
Proposed in #2933
2018-11-30 15:57:58 +01:00
Sofie
585de273cd Fix small typo bug in French regexp + relevant unit test (#2980)
* additional unit test for new entr word not in other lists

* bugfix - unit test works

* use _latin_lower instead of alpha_lower for french

* revert back to ALPHA_LOWER (following the code for languages)

* contributor agreement
2018-11-29 20:16:13 +01:00
Ben Batorsky
658f7e0dc8 OntoNotes url fix (#2981)
The website for OntoNotes 5 is: https://catalog.ldc.upenn.edu/LDC2013T19, currently the named entity section has it as https://catalog.ldc.upenn.edu/ldc2013T19.
2018-11-29 19:34:30 +01:00
Adam Schwalm
00566949de Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977)
Fixes #2976
2018-11-28 19:49:33 +01:00
Ines Montani
58757c5684
Update README.rst 2018-11-26 20:56:17 +01:00
Matthew Honnibal
9e2ff2f583
Fix regex pin to harmonize with conda (#2964) 2018-11-26 19:28:54 +01:00
Ines Montani
968aff2f6a
Update tests for pytest 4.x (#2965)
<!--- Provide a general summary of your changes in the title. -->

## Description
- [x] Replace marks in params for pytest 4.0 compat ([see here](https://docs.pytest.org/en/latest/deprecations.html#marks-in-pytest-mark-parametrize))
- [x] Un-xfail passing tests (some fixes in a recent update resolved a bunch of issues, but tests were apparently never updated here)

### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-11-26 18:14:57 +01:00
Ines Montani
c80c20e1ec Sort languages alphabetically [ci skip] 2018-11-26 15:37:53 +01:00
Marc Puig
98fe1ab259 Catalan Language Support (#2940)
* Catalan language Support

* Ddding Catalan to documentation
2018-11-26 15:25:47 +01:00
Ines Montani
1844bc238a Update universe [ci skip] 2018-11-26 14:16:22 +01:00
Ines Montani
048416f265 Fix formatting 2018-11-26 13:27:41 +01:00
Shawn Cicoria
7601ae0cff fixes symbolic link on py3 and windows (#2949)
* fixes symbolic link on py3 and windows
during setup of spacy using command
python -m spacy link en_core_web_sm en
closes #2948

* Update spacy/compat.py

Co-Authored-By: cicorias <cicorias@users.noreply.github.com>
2018-11-24 15:34:23 +01:00
Ines Montani
696acb0f92 Fix typo [ci skip] 2018-11-24 15:20:57 +01:00
Ines Montani
02fc73ca53
💫 Create random IDs for SVGs to prevent ID clashes (#2927)
Resolves #2924.

## Description
Fixes problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. Generating a random ID prefix so even identical parses won't receive the same IDs for consistency (even if effect of ID clash isn't noticable here.)

### Types of change
bug fix

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-11-15 11:40:10 +01:00
mauryaland
87ce435aff Check if the word is in one of the regular lists specific to each POS (#2886) 2018-11-14 15:58:43 +01:00
Ines Montani
dfcc8f02af Fix image [ci skip]
Twitter URL doesn't work on live site
2018-11-14 01:01:33 +01:00
Ines Montani
1aa91e926f Minor formatting changes [ci skip] 2018-11-13 23:59:59 +01:00
Francisco Aranda
be99f1cac5 Include universe spec for spacy-wordnet component (#2919)
* feat: include universe spec for spacy-wordnet component

* chore: include spaCy contributor agreement
2018-11-13 23:54:46 +01:00
Daniel Hershcovich
d3d419ecc0 Allow input text of length up to max_length, inclusive (#2922) 2018-11-13 16:46:29 +01:00
mikelibg
75e7d503b7 Removed space in docs + added contributor indo (#2909)
* - removed unneeded space in documentation

* - added contributor info
2018-11-08 14:18:25 +01:00
Ines Montani
11db4d2f27 Add script to validate universe json [ci skip] 2018-11-06 12:50:41 +01:00
Ines Montani
a9fda638a9 Add spacy-raspberry to universe (closes #2889) 2018-11-06 12:45:50 +01:00
Ines Montani
c235ddf44f Add spacy-js to universe [ci-skip] 2018-11-06 12:45:03 +01:00
Matthew Honnibal
db08b168a3 Set version to 2.0.17 2018-10-29 23:22:18 +01:00
Matthew Honnibal
e2ae25d6f5 Try setting older regex version, to align with conda 2018-10-29 13:39:00 +01:00