Commit Graph

124 Commits

Author SHA1 Message Date
Matthew Honnibal
6ae645c4ef WIP on supporting morphology features 2018-09-24 23:57:41 +02:00
Matthew Honnibal
7b09a4ca49 Fix lemmatization 2018-07-05 13:56:02 +02:00
Matthew Honnibal
2c4a6d66fa Merge master into develop. Big merge, many conflicts -- need to review 2018-04-29 14:49:26 +02:00
Ines Montani
3141e04822
💫 New system for error messages and warnings (#2163)
* Add spacy.errors module

* Update deprecation and user warnings

* Replace errors and asserts with new error message system

* Remove redundant asserts

* Fix whitespace

* Add messages for print/util.prints statements

* Fix typo

* Fix typos

* Move CLI messages to spacy.cli._messages

* Add decorator to display error code with message

An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc.

* Remove unused link in spacy.about

* Update errors for invalid pipeline components

* Improve error for unknown factories

* Add displaCy warnings

* Update formatting consistency

* Move error message to spacy.errors

* Update errors and check if doc returned by component is None
2018-04-03 15:50:31 +02:00
Matthew Honnibal
1f7229f40f Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit c9ba3d3c2d, reversing
changes made to 92c26a35d4.
2018-03-27 19:23:02 +02:00
Matthew Honnibal
31babe3c3f Fix non-clobbering lemmatization 2017-11-06 12:36:05 +01:00
Matthew Honnibal
134d3b8143 Fix morphology 2017-11-05 22:18:22 +01:00
Matthew Honnibal
bb25cb0f76 Avoid clobbering preset lemmas 2017-11-05 19:39:38 +01:00
Matthew Honnibal
bd2cbdfa85 Make Morphology not fail on unknown tags 2017-11-03 13:29:09 +01:00
ines
d96e72f656 Tidy up rest 2017-10-27 21:07:59 +02:00
Matthew Honnibal
66766c1454 Restore SP tag to English tag_map, until models migrate 2017-10-24 17:05:00 +02:00
ines
8492d5be6d Always make lemmatizer return a list of lemmas, not a set 2017-10-24 16:00:56 +02:00
Matthew Honnibal
49895fbef6 Rename 'SP' special tag to '_SP'
Renaming the tag with an underscore lets us add it to the tag map
without worrying that we'll change the sequence of tags, which throws
off the tag-to-ID mapping. For instance, if we inserted a 'SP' tag,
the "VERB" tag is pushed to a different class ID, and the model is all
messed up.
2017-10-20 14:01:12 +02:00
Matthew Honnibal
506cf2eb13 Remove cpdef enum, to avoid too much code generation 2017-10-20 14:00:23 +02:00
ines
6dd14dc342 Add lookup lemmas to tokens without POS tags 2017-10-11 13:27:10 +02:00
Matthew Honnibal
17c467e0ab Avoid clobbering existing lemmas 2017-10-11 03:33:06 -05:00
Matthew Honnibal
d528b6e36d Add assign_untagged method in Morphology 2017-10-11 03:22:49 +02:00
Matthew Honnibal
72bbcc0871 Handle lemmatization for unknown string IDs 2017-09-24 05:01:31 -05:00
Matthew Honnibal
b78cc318c3 Fix loading of morphology exceptions 2017-06-04 16:34:32 -05:00
Matthew Honnibal
805495af27 Fix off-by-one in number of tags 2017-06-03 13:29:23 -05:00
Matthew Honnibal
11840ff5dd Store tag map before normalizing props 2017-05-29 17:53:48 -05:00
Matthew Honnibal
fe11564b8e Finish stringstore change. Also xfail vectors tests 2017-05-28 15:10:22 +02:00
Matthew Honnibal
84e66ca6d4 WIP on stringstore change. 27 failures 2017-05-28 14:06:40 +02:00
ines
d24589aa72 Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
ines
561f2a3eb4 Use consistent formatting for docstrings 2017-04-15 11:59:21 +02:00
Matthew Honnibal
c748907a66 Fix errors in previous commit 2017-03-25 22:25:01 +01:00
Matthew Honnibal
850d35dcb3 Make morphology use int attributes internally
The morphology class was calling the lemmatizer inconsistently,
which some string-valued attributes. This caused Issue #903.
2017-03-25 21:49:10 +01:00
Raphaël Bournhonesque
f332bf05be Remove unused import statements 2017-03-21 21:08:54 +01:00
Roman Inflianskas
66e1109b53 Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
Matthew Honnibal
95a52005df Revert "Fix Issue #683: Add 'SP' to tag_map, if it's not there already, within the Morphology class."
This reverts commit 40e71586d6.
2017-01-09 09:55:55 -06:00
Matthew Honnibal
40e71586d6 Fix Issue #683: Add 'SP' to tag_map, if it's not there already, within the Morphology class. 2016-12-18 23:44:05 +01:00
Matthew Honnibal
813249f826 Work on morphology class. Still not fully consistent with rest of library. 2016-12-18 17:35:22 +01:00
Matthew Honnibal
837a5d4100 Update morphology class so that exceptions can be added one-by-one, and so that arbitrary attributes can be referenced. 2016-12-18 16:49:46 +01:00
Matthew Honnibal
e6fc4afb04 Whitespace 2016-12-18 15:48:00 +01:00
Matthew Honnibal
57c4341453 Refactor loading of morphology exceptions, adding a method add_special_case. 2016-12-18 14:59:44 +01:00
Ines Montani
8350d65695 Change morphology and lemmatizer API
Take morphology features as object instead of keyword arguments
2016-12-07 21:12:49 +01:00
Matthew Honnibal
1fb09c3dc1 Fix morphology tagger 2016-11-04 19:19:09 +01:00
Matthew Honnibal
6e37ba1d82 Fix #602, #603 --- Broken build 2016-11-04 09:54:24 +01:00
Matthew Honnibal
293c79c09a Fix #595: Lemmatization was incorrect for base forms, because morphological analyser wasn't adding morphology properly. 2016-11-04 00:29:07 +01:00
Matthew Honnibal
07776d8096 Fix pos name conflict in lemmatize 2016-09-27 17:35:58 +02:00
Matthew Honnibal
bb4f201ad2 Pass morphological features from tag map into the lemmatizer. 2016-09-27 14:01:43 +02:00
Matthew Honnibal
7abe653223 * Fix imports 2016-01-19 03:36:51 +01:00
Matthew Honnibal
590f38bdb2 * Add hacky solution to Issue #220. Currently specials.json only supports literal patterns, which doesn't allow us to pre-tag whitespace with the correct token, SP, as a rule. The data-driven approach should be easy but for some reason fails here. Adding a hard code in Morphology isn't a good solution, but we do want to fix the behaviour right away, and don't want to wait for an architecturally better solution. 2016-01-19 03:35:20 +01:00
Matthew Honnibal
9d1b2a103a * Fix capitalization in lemmatizer 2015-11-06 05:44:35 +11:00
Matthew Honnibal
5b2af4864f * When lemmatizing non-noun, non-verb, non-adj words, output lower-case 2015-11-06 00:45:09 +11:00
Matthew Honnibal
dde9e1357c * Add todo to morphology.lemmatize 2015-11-03 18:54:35 +11:00
Matthew Honnibal
833eb35c57 * Fix tag assignment in doc.from_array 2015-11-03 18:45:54 +11:00
Matthew Honnibal
5ca57bd859 * Ensure Morphology can be pickled, to address Issue #125. 2015-10-13 13:44:41 +11:00
Matthew Honnibal
278e12f7e8 * Addmorphology symbols to morphology. May need to remove these as an enum. 2015-10-13 13:44:40 +11:00
Matthew Honnibal
74c0853471 * Rename ATTR_IDS to attrs.IDS. Rename ATTR_NAMES to attrs.NAMES. Rename UNIV_POS_IDS to parts_of_speech.IDS 2015-10-13 13:44:39 +11:00
Matthew Honnibal
2d9e5bf566 * Allow punctuation to be lemmatized 2015-10-09 19:02:42 +11:00
Matthew Honnibal
b3a70e6375 * Clean up unnecessary try/except block 2015-10-08 14:34:11 +11:00
Matthew Honnibal
85c3fec1d1 * Fix morphology loading 2015-09-10 14:52:23 +02:00
Matthew Honnibal
31ccf494e6 Merge branch 'develop' of https://github.com/honnibal/spaCy into develop 2015-09-09 14:33:38 +02:00
Matthew Honnibal
0b527fbdc8 * Set POS tag in morphology 2015-09-09 14:30:24 +02:00
Matthew Honnibal
2be3620333 * Save morphological analyses in a cache 2015-09-08 15:39:24 +02:00
Matthew Honnibal
9eae9837c4 * Fix morphology look up 2015-09-06 17:53:39 +02:00
Matthew Honnibal
534e3dda3c * More work on language independent parsing 2015-08-28 03:44:54 +02:00
Matthew Honnibal
c2307fa9ee * More work on language-generic parsing 2015-08-28 02:02:33 +02:00
Matthew Honnibal
86c4a8e3e2 * Work on new morphology organization 2015-08-27 23:11:51 +02:00
Matthew Honnibal
0af139e183 * Tagger training now working. Still need to test load/save of model. Morphology still broken. 2015-08-27 09:16:11 +02:00
Matthew Honnibal
378729f81a * Hack Morphology class towards usability 2015-08-26 19:17:21 +02:00
Matthew Honnibal
3f1944d688 * Make PyPy work 2015-01-05 17:54:38 +11:00
Matthew Honnibal
b00bc01d8c * All tests now passing for reorg 2014-12-23 13:18:59 +11:00
Matthew Honnibal
73f200436f * Tests passing except for morphology/lemmatization stuff 2014-12-23 11:40:32 +11:00
Matthew Honnibal
cf8d26c3d2 * POS tagger training working after reorg 2014-12-22 08:54:47 +11:00
Matthew Honnibal
4c4aa2c5c9 * Work on train 2014-12-22 07:25:43 +11:00
Matthew Honnibal
2a89d70429 * Add vocab.pyx to setup, and ensure we can import spacy.en.lang 2014-12-21 06:03:53 +11:00
Matthew Honnibal
e1c1a4b868 * Tmp 2014-12-21 05:36:29 +11:00
Matthew Honnibal
4e30195c6d * Refactor morphology.pyx 2014-12-20 07:27:28 +11:00
Matthew Honnibal
95ccea03b2 * Work on greedy parser 2014-12-16 22:46:55 +11:00
Matthew Honnibal
9959a64f7b * Working morphology and lemmatisation. POS tagging quite fast. 2014-12-10 08:09:32 +11:00
Matthew Honnibal
42973c4b37 * Improve efficiency of tagger, and improve morphological processing 2014-12-10 01:02:04 +11:00
Matthew Honnibal
6b34a2f34b * Move morphological analysis into its own module, morphology.pyx 2014-12-09 21:16:17 +11:00