Ines Montani
ad834be494
Tidy up and auto-format
2019-03-08 13:28:53 +01:00
Ines Montani
d260aa17fd
Merge branch 'develop' into feature/lemmatizer
2019-03-08 13:25:00 +01:00
Ines Montani
296446a1c8
Tidy up and improve docs and docstrings (#3370)
## Description
* tidy up and adjust Cython code to code style
* improve docstrings and make calling `help()` nicer
* add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects
* fix various typos and inconsistencies in docs
### Types of change
enhancement, docs
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-08 11:42:26 +01:00
Matthew Honnibal
19e6b39786
Test morphological features
2019-03-08 01:38:54 +01:00
Matthew Honnibal
9dceb97570
Extend morphanalysis API
2019-03-08 01:38:34 +01:00
Matthew Honnibal
322b64dca0
Allow lookup of morphology by attribute name
2019-03-08 01:38:15 +01:00
Matthew Honnibal
3c32590243
Add test for morph analysis
2019-03-08 00:10:07 +01:00
Matthew Honnibal
3300e3d7ab
Implement more MorphAnalysis API
2019-03-08 00:09:16 +01:00
Matthew Honnibal
9a2d1cc6e0
Add length attribute to MorphAnalysisC
2019-03-08 00:08:57 +01:00
Matthew Honnibal
b5f2b7b454
Add list_features() helper, clean up
2019-03-08 00:08:35 +01:00
Ines Montani
daaeeb7a2b
Merge branch 'master' into develop
2019-03-07 22:07:31 +01:00
Matthew Honnibal
a40d73cb2a
Build out morphological analysis API
2019-03-07 21:59:25 +01:00
Matthew Honnibal
dd9ea478c5
Fix intify_attrs function for obsolete data
2019-03-07 21:59:03 +01:00
Matthew Honnibal
987ee6e884
Fix data reading in morphology
2019-03-07 21:58:43 +01:00
Matthew Honnibal
00cfadbf63
Fix obsolete data in English tokenizer exceptions
2019-03-07 21:58:16 +01:00
Matthew Honnibal
7afe56a360
Fix morphological features in en tag_map
2019-03-07 21:57:56 +01:00
Matthew Honnibal
3a667833d1
Fix morphological features in de tag_map
2019-03-07 21:57:43 +01:00
Adrien Ball
88909a9adb
Fix egg fragments in direct download (#3369)
## Description
The egg fragment in the URL must be of the form `#egg=package_name==version` instead of `#egg=package_name-version`.
One consequence of a wrong egg fragment is that `pip` does not recognize the package and its version properly, so it re-downloads the package on every run.
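As a rough illustration of the fragment format, here is a minimal sketch that builds a direct-download URL. The function name `build_direct_url` and its parameters are hypothetical and are not taken from spaCy's code; only the URL layout and the `#egg=package_name==version` form come from the output shown in this description.

```python
def build_direct_url(base_url, model_name, version):
    """Build a direct-download URL with an egg fragment pip can parse.

    Hypothetical helper for illustration only, not spaCy's implementation.
    """
    # Release archives are named "<model_name>-<version>".
    filename = "{m}-{v}".format(m=model_name, v=version)
    # The egg fragment uses "==" between name and version, not "-".
    return "{base}/{f}/{f}.tar.gz#egg={m}=={v}".format(
        base=base_url, f=filename, m=model_name, v=version
    )


url = build_direct_url(
    "https://github.com/explosion/spacy-models/releases/download",
    "en_core_web_sm",
    "2.0.0",
)
print(url)
```

With the `==` separator, the name and version in the fragment can be read separately, which is what lets the second run report the requirement as already satisfied.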
I'm not sure how this should be tested properly.
Here is what I had before the fix when running the same direct download twice:
```
$ python -m spacy download en_core_web_sm-2.0.0 --direct
Looking in indexes: https://pypi.python.org/simple/
Collecting en_core_web_sm-2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm-2.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
100% |████████████████████████████████| 37.4MB 1.6MB/s
Generating metadata for package en-core-web-sm-2.0.0 produced metadata for project name en-core-web-sm. Fix your #egg=en-core-web-sm-2.0.0 fragments.
Installing collected packages: en-core-web-sm
Running setup.py install for en-core-web-sm ... done
Successfully installed en-core-web-sm-2.0.0
$ python -m spacy download en_core_web_sm-2.0.0 --direct
Looking in indexes: https://pypi.python.org/simple/
Collecting en_core_web_sm-2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm-2.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
100% |████████████████████████████████| 37.4MB 919kB/s
Generating metadata for package en-core-web-sm-2.0.0 produced metadata for project name en-core-web-sm. Fix your #egg=en-core-web-sm-2.0.0 fragments.
Requirement already satisfied (use --upgrade to upgrade): en-core-web-sm from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm-2.0.0 in ./venv3/lib/python3.6/site-packages
```
And after the fix:
```
$ python -m spacy download en_core_web_sm-2.0.0 --direct
Looking in indexes: https://pypi.python.org/simple/
Collecting en_core_web_sm==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm==2.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
100% |████████████████████████████████| 37.4MB 1.1MB/s
Installing collected packages: en-core-web-sm
Running setup.py install for en-core-web-sm ... done
Successfully installed en-core-web-sm-2.0.0
$ python -m spacy download en_core_web_sm-2.0.0 --direct
Looking in indexes: https://pypi.python.org/simple/
Requirement already satisfied: en_core_web_sm==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm==2.0.0 in ./venv3/lib/python3.6/site-packages (2.0.0)
```
### Types of change
This is an enhancement: it avoids unnecessary downloads of (potentially large) spaCy models that have already been downloaded.
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-07 21:07:19 +01:00
Matthew Honnibal
1a10bf29bc
Remove morph_key from token api
2019-03-07 18:33:17 +01:00
Matthew Honnibal
c1888b05d2
Export helper functions for morphology
2019-03-07 18:33:06 +01:00
Matthew Honnibal
357066ee2f
Work on morphanalysis class
2019-03-07 18:32:51 +01:00
Matthew Honnibal
2669190b85
Normalize props for morph exceptions
2019-03-07 18:32:36 +01:00
Matthew Honnibal
e585b50458
Fix features in English tag map
2019-03-07 18:32:09 +01:00
Matthew Honnibal
0ad09b16ad
Add header for morphanalysis
2019-03-07 17:24:57 +01:00
Matthew Honnibal
fed0371db7
Remove enums from morphology
2019-03-07 17:14:57 +01:00
Matthew Honnibal
932d7dde1c
Fix compile error
2019-03-07 14:34:54 +01:00
Matthew Honnibal
b9ade7d4e0
Add MorphAnalysisC struct
2019-03-07 14:03:07 +01:00
Matthew Honnibal
b69013e2d7
Fix passing of morphological features to lemmatizer
2019-03-07 13:11:38 +01:00
Matthew Honnibal
74db1d9602
Revert "Space out symbols enum, to make maintaining easier"
This reverts commit be5235369c.
2019-03-07 12:52:30 +01:00
Matthew Honnibal
c773b5011c
Revert "Fix StringStore after symbols changes"
This reverts commit bcfe3bd312.
2019-03-07 12:52:15 +01:00
Matthew Honnibal
bcfe3bd312
Fix StringStore after symbols changes
2019-03-07 12:51:11 +01:00
Ines Montani
96b91a8898
Fix noqa [ci skip]
2019-03-07 12:25:00 +01:00
Matthew Honnibal
d0ca64bb07
Fix imports in morphanalysis
2019-03-07 12:14:53 +01:00
Matthew Honnibal
6734cfec88
Add comment
2019-03-07 12:14:37 +01:00
Matthew Honnibal
be5235369c
Space out symbols enum, to make maintaining easier
2019-03-07 12:14:23 +01:00
Matthew Honnibal
34651c8ddf
Fix lemmatizer
2019-03-07 12:13:47 +01:00
Matthew Honnibal
8805966460
Fix moved Morphologizer class
2019-03-07 10:46:27 +01:00
Matthew Honnibal
21008ad2d8
Draft API for morphological analysis class
2019-03-07 10:45:24 +01:00
Matthew Honnibal
fc1cc4c529
Move morphologizer under spacy/pipes
2019-03-07 01:36:26 +01:00
Matthew Honnibal
bfa52d9d8a
Move morphologizer within spacy/pipes
2019-03-07 01:34:32 +01:00
Matthew Honnibal
98dfe5e433
Fix ud_train.py
2019-03-07 01:31:23 +01:00
Matthew Honnibal
ae7c728c5f
Fix json dependency
2019-03-07 01:17:19 +01:00
Ines Montani
9d6ca18a10
Tidy up and only use self.vector once
2019-03-07 01:06:12 +01:00
Ines Montani
a8f1efd2f5
Merge branch 'master' into develop
2019-03-07 00:56:31 +01:00
Matthew Honnibal
010f846d5f
Fix dependencies in morphologizer
2019-03-07 00:16:51 +01:00
Matthew Honnibal
3993f41cc4
Update morphology branch from develop
2019-03-07 00:14:43 +01:00
Daniel King
5f40229397
Don't use numpy directly for similarity (#3362)
* Don't use numpy directly for similarity
* Contributor agreement
2019-03-06 22:58:38 +00:00
Ines Montani
6bd34e9d54
Expose Japanese stop words (closes #3346)
2019-03-06 14:21:15 +01:00
Ines Montani
85deb96278
Fix whitespace
2019-03-06 14:20:34 +01:00
Ines Montani
23f6ebf0f3
Add missing " (closes #3343)
2019-02-27 16:37:03 +01:00