Commit Graph

143 Commits

Author SHA1 Message Date
Matthew Honnibal
813249f826 Work on morphology class. Still not fully consistent with rest of library. 2016-12-18 17:35:22 +01:00
Matthew Honnibal
837a5d4100 Update morphology class so that exceptions can be added one-by-one, and so that arbitrary attributes can be referenced. 2016-12-18 16:49:46 +01:00
Matthew Honnibal
e6fc4afb04 Whitespace 2016-12-18 15:48:00 +01:00
Matthew Honnibal
57c4341453 Refactor loading of morphology exceptions, adding a method add_special_case. 2016-12-18 14:59:44 +01:00
Ines Montani
8350d65695 Change morphology and lemmatizer API
Take morphology features as object instead of keyword arguments
2016-12-07 21:12:49 +01:00
Matthew Honnibal
1fb09c3dc1 Fix morphology tagger 2016-11-04 19:19:09 +01:00
Matthew Honnibal
6e37ba1d82 Fix #602, #603 --- Broken build 2016-11-04 09:54:24 +01:00
Matthew Honnibal
293c79c09a Fix #595: Lemmatization was incorrect for base forms, because morphological analyser wasn't adding morphology properly. 2016-11-04 00:29:07 +01:00
Matthew Honnibal
07776d8096 Fix pos name conflict in lemmatize 2016-09-27 17:35:58 +02:00
Matthew Honnibal
bb4f201ad2 Pass morphological features from tag map into the lemmatizer. 2016-09-27 14:01:43 +02:00
Matthew Honnibal
7abe653223 * Fix imports 2016-01-19 03:36:51 +01:00
Matthew Honnibal
590f38bdb2 * Add hacky solution to Issue #220. Currently specials.json only supports literal patterns, which doesn't allow us to pre-tag whitespace with the correct token, SP, as a rule. The data-driven approach should be easy but for some reason fails here. Adding a hard code in Morphology isn't a good solution, but we do want to fix the behaviour right away, and don't want to wait for an architecturally better solution. 2016-01-19 03:35:20 +01:00
Matthew Honnibal
9d1b2a103a * Fix capitalization in lemmatizer 2015-11-06 05:44:35 +11:00
Matthew Honnibal
5b2af4864f * When lemmatizing non-noun, non-verb, non-adj words, output lower-case 2015-11-06 00:45:09 +11:00
Matthew Honnibal
dde9e1357c * Add todo to morphology.lemmatize 2015-11-03 18:54:35 +11:00
Matthew Honnibal
833eb35c57 * Fix tag assignment in doc.from_array 2015-11-03 18:45:54 +11:00
Matthew Honnibal
5ca57bd859 * Ensure Morphology can be pickled, to address Issue #125. 2015-10-13 13:44:41 +11:00
Matthew Honnibal
278e12f7e8 * Addmorphology symbols to morphology. May need to remove these as an enum. 2015-10-13 13:44:40 +11:00
Matthew Honnibal
74c0853471 * Rename ATTR_IDS to attrs.IDS. Rename ATTR_NAMES to attrs.NAMES. Rename UNIV_POS_IDS to parts_of_speech.IDS 2015-10-13 13:44:39 +11:00
Matthew Honnibal
2d9e5bf566 * Allow punctuation to be lemmatized 2015-10-09 19:02:42 +11:00
Matthew Honnibal
b3a70e6375 * Clean up unnecessary try/except block 2015-10-08 14:34:11 +11:00
Matthew Honnibal
85c3fec1d1 * Fix morphology loading 2015-09-10 14:52:23 +02:00
Matthew Honnibal
31ccf494e6 Merge branch 'develop' of https://github.com/honnibal/spaCy into develop 2015-09-09 14:33:38 +02:00
Matthew Honnibal
0b527fbdc8 * Set POS tag in morphology 2015-09-09 14:30:24 +02:00
Matthew Honnibal
2be3620333 * Save morphological analyses in a cache 2015-09-08 15:39:24 +02:00
Matthew Honnibal
9eae9837c4 * Fix morphology look up 2015-09-06 17:53:39 +02:00
Matthew Honnibal
534e3dda3c * More work on language independent parsing 2015-08-28 03:44:54 +02:00
Matthew Honnibal
c2307fa9ee * More work on language-generic parsing 2015-08-28 02:02:33 +02:00
Matthew Honnibal
86c4a8e3e2 * Work on new morphology organization 2015-08-27 23:11:51 +02:00
Matthew Honnibal
0af139e183 * Tagger training now working. Still need to test load/save of model. Morphology still broken. 2015-08-27 09:16:11 +02:00
Matthew Honnibal
378729f81a * Hack Morphology class towards usability 2015-08-26 19:17:21 +02:00
Matthew Honnibal
3f1944d688 * Make PyPy work 2015-01-05 17:54:38 +11:00
Matthew Honnibal
b00bc01d8c * All tests now passing for reorg 2014-12-23 13:18:59 +11:00
Matthew Honnibal
73f200436f * Tests passing except for morphology/lemmatization stuff 2014-12-23 11:40:32 +11:00
Matthew Honnibal
cf8d26c3d2 * POS tagger training working after reorg 2014-12-22 08:54:47 +11:00
Matthew Honnibal
4c4aa2c5c9 * Work on train 2014-12-22 07:25:43 +11:00
Matthew Honnibal
2a89d70429 * Add vocab.pyx to setup, and ensure we can import spacy.en.lang 2014-12-21 06:03:53 +11:00
Matthew Honnibal
e1c1a4b868 * Tmp 2014-12-21 05:36:29 +11:00
Matthew Honnibal
4e30195c6d * Refactor morphology.pyx 2014-12-20 07:27:28 +11:00
Matthew Honnibal
95ccea03b2 * Work on greedy parser 2014-12-16 22:46:55 +11:00
Matthew Honnibal
9959a64f7b * Working morphology and lemmatisation. POS tagging quite fast. 2014-12-10 08:09:32 +11:00
Matthew Honnibal
42973c4b37 * Improve efficiency of tagger, and improve morphological processing 2014-12-10 01:02:04 +11:00
Matthew Honnibal
6b34a2f34b * Move morphological analysis into its own module, morphology.pyx 2014-12-09 21:16:17 +11:00