Matthew Honnibal
49895fbef6
Rename 'SP' special tag to '_SP'
Renaming the tag with an underscore lets us add it to the tag map
without worrying that we'll change the sequence of tags, which throws
off the tag-to-ID mapping. For instance, if we inserted a 'SP' tag,
the "VERB" tag is pushed to a different class ID, and the model is all
messed up.
2017-10-20 14:01:12 +02:00
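A minimal sketch of the problem described in the commit message above. It assumes that class IDs are assigned by position in the sorted list of tag names; that rule and the helper `tag_ids` are illustrative assumptions, not the library's actual code. Under that assumption, 'SP' sorts before "VERB" and shifts its ID, while '_SP' sorts after every upper-case tag and leaves the existing IDs alone.

```python
def tag_ids(tag_names):
    """Hypothetical helper: assign each tag an integer ID by sorted position."""
    return {tag: i for i, tag in enumerate(sorted(tag_names))}


# A toy tag set; a real tag map is much larger.
base = ["ADJ", "NOUN", "VERB"]

ids_base = tag_ids(base)
ids_sp = tag_ids(base + ["SP"])    # 'SP' sorts between "NOUN" and "VERB"
ids_usp = tag_ids(base + ["_SP"])  # '_' sorts after 'A'-'Z' in ASCII

print(ids_base["VERB"], ids_sp["VERB"], ids_usp["VERB"])
```

With the toy tag set, "VERB" keeps its ID when '_SP' is added but not when 'SP' is added. The guarantee only holds while all other tag names start with characters that sort before the underscore.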
Matthew Honnibal
506cf2eb13
Remove cpdef enum, to avoid too much code generation
2017-10-20 14:00:23 +02:00
ines
6dd14dc342
Add lookup lemmas to tokens without POS tags
2017-10-11 13:27:10 +02:00
Matthew Honnibal
17c467e0ab
Avoid clobbering existing lemmas
2017-10-11 03:33:06 -05:00
Matthew Honnibal
d528b6e36d
Add assign_untagged method in Morphology
2017-10-11 03:22:49 +02:00
Matthew Honnibal
72bbcc0871
Handle lemmatization for unknown string IDs
2017-09-24 05:01:31 -05:00
Matthew Honnibal
b78cc318c3
Fix loading of morphology exceptions
2017-06-04 16:34:32 -05:00
Matthew Honnibal
805495af27
Fix off-by-one in number of tags
2017-06-03 13:29:23 -05:00
Matthew Honnibal
11840ff5dd
Store tag map before normalizing props
2017-05-29 17:53:48 -05:00
Matthew Honnibal
fe11564b8e
Finish stringstore change. Also xfail vectors tests
2017-05-28 15:10:22 +02:00
Matthew Honnibal
84e66ca6d4
WIP on stringstore change. 27 failures
2017-05-28 14:06:40 +02:00
ines
d24589aa72
Clean up imports, unused code, whitespace, docstrings
2017-04-15 12:05:47 +02:00
ines
561f2a3eb4
Use consistent formatting for docstrings
2017-04-15 11:59:21 +02:00
Matthew Honnibal
c748907a66
Fix errors in previous commit
2017-03-25 22:25:01 +01:00
Matthew Honnibal
850d35dcb3
Make morphology use int attributes internally
The morphology class was calling the lemmatizer inconsistently,
with some string-valued attributes. This caused Issue #903.
2017-03-25 21:49:10 +01:00
Raphaël Bournhonesque
f332bf05be
Remove unused import statements
2017-03-21 21:08:54 +01:00
Roman Inflianskas
66e1109b53
Add support for Universal Dependencies v2.0
2017-03-03 13:17:34 +01:00
Matthew Honnibal
95a52005df
Revert "Fix Issue #683: Add 'SP' to tag_map, if it's not there already, within the Morphology class."
This reverts commit 40e71586d6.
2017-01-09 09:55:55 -06:00
Matthew Honnibal
40e71586d6
Fix Issue #683: Add 'SP' to tag_map, if it's not there already, within the Morphology class.
2016-12-18 23:44:05 +01:00
Matthew Honnibal
813249f826
Work on morphology class. Still not fully consistent with rest of library.
2016-12-18 17:35:22 +01:00
Matthew Honnibal
837a5d4100
Update morphology class so that exceptions can be added one-by-one, and so that arbitrary attributes can be referenced.
2016-12-18 16:49:46 +01:00
Matthew Honnibal
e6fc4afb04
Whitespace
2016-12-18 15:48:00 +01:00
Matthew Honnibal
57c4341453
Refactor loading of morphology exceptions, adding a method add_special_case.
2016-12-18 14:59:44 +01:00
Ines Montani
8350d65695
Change morphology and lemmatizer API
Take morphology features as object instead of keyword arguments
2016-12-07 21:12:49 +01:00
Matthew Honnibal
1fb09c3dc1
Fix morphology tagger
2016-11-04 19:19:09 +01:00
Matthew Honnibal
6e37ba1d82
Fix #602, #603 --- Broken build
2016-11-04 09:54:24 +01:00
Matthew Honnibal
293c79c09a
Fix #595: Lemmatization was incorrect for base forms, because morphological analyser wasn't adding morphology properly.
2016-11-04 00:29:07 +01:00
Matthew Honnibal
07776d8096
Fix pos name conflict in lemmatize
2016-09-27 17:35:58 +02:00
Matthew Honnibal
bb4f201ad2
Pass morphological features from tag map into the lemmatizer.
2016-09-27 14:01:43 +02:00
Matthew Honnibal
7abe653223
* Fix imports
2016-01-19 03:36:51 +01:00
Matthew Honnibal
590f38bdb2
* Add hacky solution to Issue #220. Currently specials.json only supports literal patterns, which doesn't allow us to pre-tag whitespace with the correct token, SP, as a rule. The data-driven approach should be easy but for some reason fails here. Adding a hard code in Morphology isn't a good solution, but we do want to fix the behaviour right away, and don't want to wait for an architecturally better solution.
2016-01-19 03:35:20 +01:00
Matthew Honnibal
9d1b2a103a
* Fix capitalization in lemmatizer
2015-11-06 05:44:35 +11:00
Matthew Honnibal
5b2af4864f
* When lemmatizing non-noun, non-verb, non-adj words, output lower-case
2015-11-06 00:45:09 +11:00
Matthew Honnibal
dde9e1357c
* Add todo to morphology.lemmatize
2015-11-03 18:54:35 +11:00
Matthew Honnibal
833eb35c57
* Fix tag assignment in doc.from_array
2015-11-03 18:45:54 +11:00
Matthew Honnibal
5ca57bd859
* Ensure Morphology can be pickled, to address Issue #125.
2015-10-13 13:44:41 +11:00
Matthew Honnibal
278e12f7e8
* Add morphology symbols to morphology. May need to remove these as an enum.
2015-10-13 13:44:40 +11:00
Matthew Honnibal
74c0853471
* Rename ATTR_IDS to attrs.IDS. Rename ATTR_NAMES to attrs.NAMES. Rename UNIV_POS_IDS to parts_of_speech.IDS
2015-10-13 13:44:39 +11:00
Matthew Honnibal
2d9e5bf566
* Allow punctuation to be lemmatized
2015-10-09 19:02:42 +11:00
Matthew Honnibal
b3a70e6375
* Clean up unnecessary try/except block
2015-10-08 14:34:11 +11:00
Matthew Honnibal
85c3fec1d1
* Fix morphology loading
2015-09-10 14:52:23 +02:00
Matthew Honnibal
31ccf494e6
Merge branch 'develop' of https://github.com/honnibal/spaCy into develop
2015-09-09 14:33:38 +02:00
Matthew Honnibal
0b527fbdc8
* Set POS tag in morphology
2015-09-09 14:30:24 +02:00
Matthew Honnibal
2be3620333
* Save morphological analyses in a cache
2015-09-08 15:39:24 +02:00
Matthew Honnibal
9eae9837c4
* Fix morphology look up
2015-09-06 17:53:39 +02:00
Matthew Honnibal
534e3dda3c
* More work on language independent parsing
2015-08-28 03:44:54 +02:00
Matthew Honnibal
c2307fa9ee
* More work on language-generic parsing
2015-08-28 02:02:33 +02:00
Matthew Honnibal
86c4a8e3e2
* Work on new morphology organization
2015-08-27 23:11:51 +02:00
Matthew Honnibal
0af139e183
* Tagger training now working. Still need to test load/save of model. Morphology still broken.
2015-08-27 09:16:11 +02:00
Matthew Honnibal
378729f81a
* Hack Morphology class towards usability
2015-08-26 19:17:21 +02:00
Matthew Honnibal
3f1944d688
* Make PyPy work
2015-01-05 17:54:38 +11:00
Matthew Honnibal
b00bc01d8c
* All tests now passing for reorg
2014-12-23 13:18:59 +11:00
Matthew Honnibal
73f200436f
* Tests passing except for morphology/lemmatization stuff
2014-12-23 11:40:32 +11:00
Matthew Honnibal
cf8d26c3d2
* POS tagger training working after reorg
2014-12-22 08:54:47 +11:00
Matthew Honnibal
4c4aa2c5c9
* Work on train
2014-12-22 07:25:43 +11:00
Matthew Honnibal
2a89d70429
* Add vocab.pyx to setup, and ensure we can import spacy.en.lang
2014-12-21 06:03:53 +11:00
Matthew Honnibal
e1c1a4b868
* Tmp
2014-12-21 05:36:29 +11:00
Matthew Honnibal
4e30195c6d
* Refactor morphology.pyx
2014-12-20 07:27:28 +11:00
Matthew Honnibal
95ccea03b2
* Work on greedy parser
2014-12-16 22:46:55 +11:00
Matthew Honnibal
9959a64f7b
* Working morphology and lemmatisation. POS tagging quite fast.
2014-12-10 08:09:32 +11:00
Matthew Honnibal
42973c4b37
* Improve efficiency of tagger, and improve morphological processing
2014-12-10 01:02:04 +11:00
Matthew Honnibal
6b34a2f34b
* Move morphological analysis into its own module, morphology.pyx
2014-12-09 21:16:17 +11:00