Commit Graph

11774 Commits

Author SHA1 Message Date
Paul O'Leary McCann
61ef0739b8 Add Japanese stop words. (#2549)
List created by taking the 2000 top words from a Wikipedia dump and
removing everything that wasn't hiragana.

Tried going through kanji words and deciding what to keep but there were
too many obvious non-stopwords (東京 was in the top 500) and many other
words where it wasn't clear if they should be included or not.
2018-07-17 10:12:48 +02:00
Tero K
f35980f865 Enhancement/lang fi examples (#2547)
* Added a file with examples in finnish

* added contributor agreement
2018-07-15 09:50:27 +02:00
Paul O'Leary McCann
1987f3f784 Add Japanese lemmas (#2543)
This info was already available from Mecab, forgot to add it before.
2018-07-13 10:55:14 +02:00
ines
50c367ee96 Update meta [ci skip] 2018-07-10 13:51:45 +02:00
ines
3a321e79ac Merge branch 'master' into develop 2018-07-10 13:49:08 +02:00
Eleni170
6042723535 Add support for Greek language (#2535)
* Add contributor agreement

* Support for Greek language

* Fix missing el_tokenizer
2018-07-10 13:48:38 +02:00
ines
71bfc92913 Exclude models for non-stable versions [ci skip] 2018-07-10 13:44:55 +02:00
Stefan Schweter
3dfc7f86be lemmatizer: correct lemma for Rang (#2537)
<!--- Provide a general summary of your changes in the title. -->

## Description

This PR corrects the German lemma form for the word "Rang". Initially, the lemma form was "ringen", which is not correct, because it refers to the verb ("ringen") and not to the noun ("Rang").

### Types of change

The lemma form for "Rang" is corrected to "Rang", see also the [Duden](https://www.duden.de/rechtschreibung/Rang) entry.

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-07-10 13:11:19 +02:00
ines
b5200962c0 Adjust formatting [ci skip] 2018-07-09 18:35:46 +02:00
Alex Villarreal
bd35bf7f09 Guidance to handle binary files in git in Windows (#2526)
Adds guidance on what to do if users encounter the error described in [1634](https://github.com/explosion/spaCy/issues/1634), which probably only happens in Windows environments.
2018-07-09 18:31:37 +02:00
ines
fd6207426a Merge branch 'master' into develop 2018-07-09 18:05:10 +02:00
Duygu Altinok
00b9a58558 German lemmatizer additions (#2529)
* lemma of was-> was

* added new pairs issue @2486

* added article tests
2018-07-09 11:10:15 +02:00
Ole Henrik Skogstrøm
c21efea9bb Add sent property to token (#2521)
* Add sent property to token

* Refactored and cleaned up copy paste errors.
2018-07-06 15:54:15 +02:00
ines
38e07ade4c Add test for custom tokenizer serialization (resolves #2494) 2018-07-06 12:40:51 +02:00
ines
c2581f9172 Tidy up tokenizer test 2018-07-06 12:40:28 +02:00
Matthew Honnibal
43dcaa473e Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-07-06 12:36:42 +02:00
Matthew Honnibal
6c8d627733 Fix tokenizer deserialization 2018-07-06 12:36:33 +02:00
ines
c001d46153 Tidy up 2018-07-06 12:33:42 +02:00
Matthew Honnibal
63f5651f8d Fix tokenizer serialization 2018-07-06 12:32:11 +02:00
Matthew Honnibal
e1569fda4e Fix compile error in matcher 2018-07-06 12:29:23 +02:00
Matthew Honnibal
f5b2076700 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-07-06 12:23:14 +02:00
Matthew Honnibal
1a2f61725c Fix tokenizer serialization 2018-07-06 12:23:04 +02:00
ines
9e09477b2f Remove unused import 2018-07-06 12:18:17 +02:00
ines
26f04a6ac3 Fix Matcher tests and add test for any token with operator 2018-07-06 12:17:50 +02:00
Matthew Honnibal
f5703b7a91 Clean up unused stuff in matcher 2018-07-06 12:16:44 +02:00
Matthew Honnibal
08c362d541 Suppress compiler warning about unreachable code 2018-07-06 11:31:22 +02:00
Matthew Honnibal
8ae1bec8bf Fix init_model 2018-07-05 14:02:06 +02:00
Matthew Honnibal
7b09a4ca49 Fix lemmatization 2018-07-05 13:56:02 +02:00
Matthew Honnibal
ec41ceb383 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-07-05 13:49:42 +02:00
Matthew Honnibal
4eb3405df7 Fix lemmatizer ordering, re Issue #1387 2018-07-05 13:49:29 +02:00
ines
f575b01595 Update language and license meta [ci skip] 2018-07-04 15:09:36 +02:00
ines
83e88553d4 Add Python 3.7 to setup.py (resolves #2505) 2018-07-04 14:53:51 +02:00
ines
63666af328 Merge branch 'master' into develop 2018-07-04 14:52:25 +02:00
ines
8feb7cfe2d Remove model dependency from French lemmatizer tests 2018-07-04 14:46:45 +02:00
kleinay
a82c3153ad fix issue #2452 - displacy arrow direction is always forward (#2506) (closes #2452)
<!--- Provide a general summary of your changes in the title. -->
Referring #2452, fixing displacy arrow directions to match the input. 

## Description
The fix is simply replacing `direction is 'left'` with `direction == 'left'` to include the case `direction` is a `str` and not a `unicode`.

### Types of change
bug fix

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-07-04 14:12:08 +02:00
Matthew Honnibal
a0db2e0077 Build pytest into spaCy pex, for ease of testing 2018-07-04 13:22:58 +02:00
Bùi Trung Chí
9af46b4f1b Fix loading tokenizer with custom prefix search (#2495)
* Add contributor agreement

* Fix loading tokenizer with cutom prefix search
2018-07-04 12:56:07 +02:00
Matthew Honnibal
dee8bdb900 Fix init-model for npz vectors 2018-07-04 02:29:48 +02:00
Matthew Honnibal
59d655e8d0 Fix model init from jsonl 2018-07-04 01:30:40 +02:00
Matthew Honnibal
1e38bea6e9 Save vectors init 2018-07-03 23:55:04 +02:00
Matthew Honnibal
6692833887 Fix init_model 2018-07-03 23:24:11 +02:00
Matthew Honnibal
4a38a26cb5 Fix init_model 2018-07-03 22:57:11 +02:00
Matthew Honnibal
019d09e3c3 Fix init model 2018-07-03 22:16:44 +02:00
Matthew Honnibal
2543f8c93a Support .npz vectors in init-model command 2018-07-03 21:42:16 +02:00
Matthew Honnibal
86aad11939 Fix init_model arg 2018-07-03 17:00:42 +02:00
Matthew Honnibal
eff42d36e3 Fix init model command 2018-07-03 16:32:23 +02:00
Matthew Honnibal
97487122ea Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-07-03 15:44:37 +02:00
Matthew Honnibal
6a89faf12e Add support for jsonl-formatted lexical attributes to init-model command. 2018-07-03 12:22:56 +02:00
Matthew Honnibal
a85620a731 Note CoreNLP tokenizer correction on website 2018-07-02 11:35:31 +02:00
Matthew Honnibal
3c3020fccc Update Makefile 2018-06-29 21:21:30 +02:00