Matthew Honnibal
b69013e2d7
Fix passing of morphological features to lemmatizer
2019-03-07 13:11:38 +01:00
Matthew Honnibal
34651c8ddf
Fix lemmatizer
2019-03-07 12:13:47 +01:00
Matthew Honnibal
3993f41cc4
Update morphology branch from develop
2019-03-07 00:14:43 +01:00
Ines Montani
eddeb36c96
💫 Tidy up and auto-format .py files (#2983)
...
## Description
- [x] Use [`black`](https://github.com/ambv/black) to auto-format all `.py` files.
- [x] Update flake8 config to exclude very large files (lemmatization tables etc.)
- [x] Update code to be compatible with flake8 rules
- [x] Fix various small bugs, inconsistencies and messy stuff in the language data
- [x] Update docs to explain new code style (`black`, `flake8`, when to use `# fmt: off` and `# fmt: on` and what `# noqa` means)
Once #2932 is merged, which auto-formats and tidies up the CLI, we'll be able to run `flake8 spacy` and actually get meaningful results.
At the moment, code style and linting aren't applied automatically, but I'm hoping that the new [GitHub Actions](https://github.com/features/actions) will let us auto-format pull requests and post comments with relevant linting information.
### Types of change
enhancement, code style
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-11-30 17:03:03 +01:00
Matthew Honnibal
8308c1525e
Fix exception loading
2018-09-25 15:18:21 +02:00
Matthew Honnibal
7b09a4ca49
Fix lemmatization
2018-07-05 13:56:02 +02:00
Matthew Honnibal
4eb3405df7
Fix lemmatizer ordering, re Issue #1387
2018-07-05 13:49:29 +02:00
Matthew Honnibal
2ec2192000
Revert #1389: Don't overrule rules when lemma exception is present
2018-06-29 19:43:02 +02:00
Matthew Honnibal
1f7229f40f
Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
...
This reverts commit c9ba3d3c2d, reversing changes made to 92c26a35d4.
2018-03-27 19:23:02 +02:00
Matthew Honnibal
361944e512
If no rules are set, lemmatize by lookup
2017-12-06 12:12:11 +01:00
ines
91899d337b
Tidy up language, lemmatizer and scorer
2017-10-27 14:40:14 +02:00
ines
8492d5be6d
Always make lemmatizer return a list of lemmas, not a set
2017-10-24 16:00:56 +02:00
ines
95f866f99f
Add lookup argument to Lemmatizer.load
2017-10-24 16:00:56 +02:00
ines
3516aa0cea
Port over changes from #1389
2017-10-14 13:32:55 +02:00
Matthew Honnibal
9b90d235d1
Fix tag check in lemmatizer
2017-10-12 22:50:43 +02:00
ines
9fd471372a
Add lookup lemmatizer to lemmatizer as lookup() method
2017-10-11 13:25:51 +02:00
Matthew Honnibal
a6ac4699eb
Allow Morphology class to setup tokens
...
Add Morphology.assign_untagged() C-method, and call it from
Doc.push_back() when a token is created. This gives a place
to allow the Morphology class to initialize token data.
2017-10-11 03:24:14 +02:00
Matthew Honnibal
c15d8278cb
Avoid lemmatizing inappropriate tags in English lemmatizer
2017-10-11 03:23:23 +02:00
ines
820bf85075
Move LookupLemmatizer to spacy.lemmatizer
2017-10-11 02:25:13 +02:00
Matthew Honnibal
9cb2aef587
Remove print statement
2017-09-14 13:38:28 +02:00
Matthew Honnibal
5c3ff06924
Fix lemmatizer rules
2017-09-06 19:13:24 +02:00
Matthew Honnibal
bfddf50081
Fix #1296: Incorrect lemmatization of base form verbs
2017-09-04 15:18:41 +02:00
ines
d24589aa72
Clean up imports, unused code, whitespace, docstrings
2017-04-15 12:05:47 +02:00
ines
561f2a3eb4
Use consistent formatting for docstrings
2017-04-15 11:59:21 +02:00
Matthew Honnibal
ed2b106f4d
Fix circular import in lemmatizer
2017-03-26 07:17:07 -05:00
Matthew Honnibal
c748907a66
Fix errors in previous commit
2017-03-25 22:25:01 +01:00
Matthew Honnibal
4f400fa486
Prevent lemmatization of base nouns
...
Update lemmatizer's base-form check, for change in morphology class.
Closes #903.
2017-03-25 21:51:12 +01:00
Matthew Honnibal
4454c1b23f
Block lemmatization of base-form adjectives
...
Fixes check that an adjective is a base form (as opposed to a
comparative or superlative), so that it's not lemmatized.
e.g. inner -!> inn. Closes #912.
2017-03-25 21:29:57 +01:00
Matthew Honnibal
413138de79
Fix #719: Lemmatizer can no longer output empty string
2017-03-18 16:02:06 +01:00
Matthew Honnibal
c4351e1165
Update base-form check in lemmatizer, for UD 2.0 morphology
2017-03-16 17:59:31 -05:00
Matthew Honnibal
fea9fe08af
Merge pull request #866 from juanmirocks/master
...
Fix lemmatization of OOV words
2017-03-16 23:37:36 +01:00
ines
1da29a7146
Use new Lemmatizer data and remove file import
...
Since there's currently only an English lemmatizer, the global
Lemmatizer imports from spacy.en. This is not ideal and still needs to be
fixed.
2017-03-12 13:58:22 +01:00
Juan Miguel Cejuela
25c29f072d
apply patch
2017-03-01 21:44:17 +01:00
Matthew Honnibal
44f4f008bd
Wire up lemmatizer rules for English
2016-12-18 15:50:09 +01:00
Matthew Honnibal
a4eb5c2bff
Check POS key in lemmatizer, to update it for new data format
2016-12-18 13:28:20 +01:00
Ines Montani
8350d65695
Change morphology and lemmatizer API
...
Take morphology features as object instead of keyword arguments
2016-12-07 21:12:49 +01:00
Matthew Honnibal
e30348b331
Prefer to import from symbols instead of parts_of_speech
2016-11-04 00:27:55 +01:00
Matthew Honnibal
f5fe4f595b
Fix json loading, for Python 3.
2016-10-20 21:23:26 +02:00
Matthew Honnibal
2e92c6fb3a
Fix JSON encoding issue on load
2016-10-20 21:06:48 +02:00
Matthew Honnibal
f189a3cb00
Fix encoding when opening files in Python 2.7, re Issue #539
2016-10-20 14:42:56 +02:00
Matthew Honnibal
a2f3510d6d
Fix lemmatizer
2016-09-27 17:47:05 +02:00
Matthew Honnibal
35cd953f9e
Fix pos name conflict with morphology
2016-09-27 14:16:22 +02:00
Matthew Honnibal
40509e8bca
Tweak the new is_base_form logic, because we can expect the 'pos' key in the morphology we're passed.
2016-09-27 14:01:16 +02:00
Matthew Honnibal
3cb4d455d2
Pass lemmatizer morphological features, so that rules are sensitive to base/inflected distinction, which is how the WordNet data is designed. See Issue #435
2016-09-27 13:52:11 +02:00
Matthew Honnibal
fd65cf6cbb
Finish refactoring data loading
2016-09-24 20:26:17 +02:00
Matthew Honnibal
83e364188c
Mostly finished loading refactoring. Design is in place, but doesn't work yet.
2016-09-24 15:42:01 +02:00
Henning Peters
846fa49b2a
distinct load() and from_package() methods
2016-01-16 10:00:57 +01:00
Henning Peters
788f734513
refactored data_dir->via, add zip_safe, add spacy.load()
2016-01-15 18:01:02 +01:00
Henning Peters
bc229790ac
integrate with sputnik
2016-01-13 19:46:17 +01:00
Matthew Honnibal
eaf2ad59f1
* Fix use of mock Package object
2015-12-31 04:13:15 +01:00