Commit Graph

1084 Commits

Author SHA1 Message Date
Matthew Honnibal
e35bb36be7 * Ensure Lexeme.check_flag returns a boolean value 2015-09-06 17:52:32 +02:00
Matthew Honnibal
7e4fea67d3 * Fix bug in token subtree, introduced by duplication of L/R code in Stateclass. Need to consolidate the two methods. 2015-09-06 10:48:36 +02:00
Matthew Honnibal
5edac11225 * Wrap self.parse in nogil, and break if an invalid move is predicted. The invalid break is a work-around that papers over likely bugs, but we can't easily break in the nogil block, and otherwise we'll get an infinite loop. Need to set this as an error flag. 2015-09-06 04:15:00 +02:00
Matthew Honnibal
fd1eeb3102 * Add POS attribute support in get_attr 2015-09-06 04:13:03 +02:00
Matthew Honnibal
534e3dda3c * More work on language independent parsing 2015-08-28 03:44:54 +02:00
Matthew Honnibal
c2307fa9ee * More work on language-generic parsing 2015-08-28 02:02:33 +02:00
Matthew Honnibal
86c4a8e3e2 * Work on new morphology organization 2015-08-27 23:11:51 +02:00
Matthew Honnibal
5b89e2454c * Improve error-reporting in tagger 2015-08-27 10:26:36 +02:00
Matthew Honnibal
f0a7c99554 * Relax rule-requirement in lemmatizer 2015-08-27 10:26:19 +02:00
Matthew Honnibal
0af139e183 * Tagger training now working. Still need to test load/save of model. Morphology still broken. 2015-08-27 09:16:11 +02:00
Matthew Honnibal
1302d35dff * Rework interfaces in vocab 2015-08-26 19:21:46 +02:00
Matthew Honnibal
2d521768a3 * Store Morphology class in Vocab 2015-08-26 19:21:03 +02:00
Matthew Honnibal
d30029979e * Avoid import of morphology in spans 2015-08-26 19:20:46 +02:00
Matthew Honnibal
119c0f8c3f * Hack out morphology stuff from tokenizer, while morphology being reimplemented. 2015-08-26 19:20:11 +02:00
Matthew Honnibal
b4faf551f5 * Refactor language-independent tagger class 2015-08-26 19:19:21 +02:00
Matthew Honnibal
a3d5e6c0dd * Reform constructor and save/load workflow in parser model 2015-08-26 19:19:01 +02:00
Matthew Honnibal
1d7f2d3abc * Hack on morphology structs 2015-08-26 19:18:36 +02:00
Matthew Honnibal
f8f2f4e545 * Temporarily add PUNC name to parts_of_specch dictionary, until better solution 2015-08-26 19:18:19 +02:00
Matthew Honnibal
008b02b035 * Comment out enums in Morpohlogy for now 2015-08-26 19:17:35 +02:00
Matthew Honnibal
378729f81a * Hack Morphology class towards usability 2015-08-26 19:17:21 +02:00
Matthew Honnibal
430affc347 * Fix missing n_patterns property in Matcher class. Fix from_dir method 2015-08-26 19:17:02 +02:00
Matthew Honnibal
3acf60df06 * Add missing properties in Lexeme class 2015-08-26 19:16:28 +02:00
Matthew Honnibal
76996f4145 * Hack on generic Language class. Still needs work for morphology, defaults, etc 2015-08-26 19:16:09 +02:00
Matthew Honnibal
e2ef78b29c * Gut pos.pyx module, since functionality moved to spacy/tagger.pyx 2015-08-26 19:15:42 +02:00
Matthew Honnibal
c4d8754385 * Specify LOCAL_DATA_DIR global in spacy.en.__init__.py 2015-08-26 19:15:07 +02:00
Matthew Honnibal
c2d8edd0bd * Add PROB attribute in attrs.pxd 2015-08-26 19:14:19 +02:00
Matthew Honnibal
c5a27d1821 * Move lemmatizer to spacy 2015-08-25 15:47:08 +02:00
Matthew Honnibal
82217c6ec6 * Generalize lemmatizer 2015-08-25 15:46:19 +02:00
Matthew Honnibal
8083a07c3e * Use language base class 2015-08-25 15:37:30 +02:00
Matthew Honnibal
f2f699ac18 * Add language base class 2015-08-25 15:37:17 +02:00
Matthew Honnibal
5dd76be446 * Split EnPosTagger up into base class and subclass 2015-08-24 05:25:55 +02:00
Matthew Honnibal
5d5922dbfa * Begin laying out morphological features 2015-08-24 01:04:30 +02:00
Matthew Honnibal
6f1743692a * Work on language-independent refactoring 2015-08-23 20:49:18 +02:00
Matthew Honnibal
3879d28457 * Fix https for url detection 2015-08-23 02:40:35 +02:00
Matthew Honnibal
cad0cca4e3 * Tmp 2015-08-22 22:04:34 +02:00
Matthew Honnibal
bf38b3b883 * Hack on l/r reversal bug 2015-08-10 05:58:43 +02:00
Matthew Honnibal
6116413b47 * Fix label prediction in StepwiseState 2015-08-10 05:05:31 +02:00
Matthew Honnibal
2c9753eff2 * Whitespace 2015-08-10 00:09:02 +02:00
Matthew Honnibal
9de98f5a6f * Add Parser.stepthrough method, with context manager 2015-08-10 00:08:46 +02:00
Matthew Honnibal
fe43f8cf39 * Whitespace 2015-08-09 02:31:53 +02:00
Matthew Honnibal
9c090945e0 * Add Parser.predict method, and clean up Parser.get_state 2015-08-09 02:29:58 +02:00
Matthew Honnibal
04fccfb984 * Fix get_state for parser prediction 2015-08-09 02:11:22 +02:00
Matthew Honnibal
55fde0e240 * Fix get_state 2015-08-09 01:45:30 +02:00
Matthew Honnibal
f0f4fa9838 * Fix Parser.get_state 2015-08-09 01:40:13 +02:00
Matthew Honnibal
18331dca89 * Add continue_for argument to parser 'partial' function, which is now renamed to get_state 2015-08-09 01:31:54 +02:00
Matthew Honnibal
0653288fa5 * Fix stateclass.queue 2015-08-09 00:39:02 +02:00
Matthew Honnibal
9de218b7ba * Fix Parser.partial function 2015-08-08 23:45:18 +02:00
Matthew Honnibal
01be34d55a * Whitespace 2015-08-08 23:37:44 +02:00
Matthew Honnibal
cc9deae960 * Add is_valid method to transition_system 2015-08-08 23:36:18 +02:00
Matthew Honnibal
2a46c77324 * Whitespace 2015-08-08 23:35:59 +02:00
Matthew Honnibal
7bafc789e7 * Add stack and queue properties to stateclass, for python access 2015-08-08 23:32:42 +02:00
Matthew Honnibal
3af938365f * Add function partial to Parser 2015-08-08 23:32:15 +02:00
Matthew Honnibal
76a1f0481a * Whitespace 2015-08-08 23:31:54 +02:00
Matthew Honnibal
b0f5c39084 * Fix handling of exclusion entities 2015-08-06 17:28:43 +02:00
Matthew Honnibal
9f65879991 * Fix shape attr bug, and fix handling of false positive matches 2015-08-06 17:28:14 +02:00
Matthew Honnibal
10d869d102 * Don't allow conjunction between NPs in base NP chunks 2015-08-06 16:31:53 +02:00
Matthew Honnibal
383dfabd67 * Fix matcher setting of entities 2015-08-06 16:27:01 +02:00
Matthew Honnibal
59c3bf60a6 * Ensure entity recognizer doesn't over-write preset types 2015-08-06 16:09:08 +02:00
Matthew Honnibal
cd7d1682cd * Fix loading of gazetteer.json file 2015-08-06 16:08:25 +02:00
Matthew Honnibal
9c667b7f15 * Set a value in attrs.pxd on the first flag, to reduce bugs 2015-08-06 16:08:04 +02:00
Matthew Honnibal
c263577424 * Fix lower attribute in lexeme.pxd 2015-08-06 16:07:41 +02:00
Matthew Honnibal
5737115e1e * Work on gazetteer matching 2015-08-06 14:33:21 +02:00
Matthew Honnibal
9c1724ecae * Gazetteer stuff working, now need to wire up to API 2015-08-06 00:35:40 +02:00
Matthew Honnibal
5bc0e83f9a * Reimplement matching in Cython, instead of Python. 2015-08-05 01:05:54 +02:00
Matthew Honnibal
4c87a696b3 * Add draft dfa matcher, in Python. Passing tests. 2015-08-04 15:55:28 +02:00
Matthew Honnibal
eb7138c761 * Add attr relation in base NP detection 2015-08-01 00:34:40 +02:00
Matthew Honnibal
4988356cf0 * Fix dependency type bug from merged tokens 2015-08-01 00:33:24 +02:00
Matthew Honnibal
78a9068319 * Fix spacy attr on merged tokens 2015-07-30 04:25:58 +02:00
Matthew Honnibal
430e2edb96 * Fix noun_chunks issue 2015-07-30 03:51:50 +02:00
Matthew Honnibal
9590968fc1 * Fix negative indices in Span 2015-07-30 02:30:24 +02:00
Matthew Honnibal
74d8cb3980 * Add noun_chunks iterator, and fix left/right child setting in Doc.merge 2015-07-30 02:29:49 +02:00
Matthew Honnibal
d153f18969 * Fix negative indices on spans 2015-07-29 22:36:03 +02:00
Matthew Honnibal
b5132bed7d * Set left and right children when loading parse from byte string 2015-07-28 21:03:18 +02:00
Matthew Honnibal
6609fcf4b2 * Make mem and vocab python-visible in Doc 2015-07-28 20:46:59 +02:00
Matthew Honnibal
d42fe2e694 * Add unicode_literals to strings.pyx 2015-07-28 16:15:53 +02:00
Matthew Honnibal
bb910cff92 * Fix Python3 problem in align_raw 2015-07-28 16:06:53 +02:00
Matthew Honnibal
dcafb181b9 * Fix Python3 problem in align_raw 2015-07-28 15:52:10 +02:00
Matthew Honnibal
c609ea18f0 * Increment version in download script 2015-07-28 15:22:17 +02:00
Matthew Honnibal
9c4d0aae62 * Switch to better Python2/3 compatible unicode handling 2015-07-28 14:45:37 +02:00
Matthew Honnibal
7606d9936f * Python3 correction for GoldParse 2015-07-28 14:44:53 +02:00
Matthew Honnibal
ddc1a5cfe5 * Fix training under python3 2015-07-28 14:09:30 +02:00
Matthew Honnibal
a8bbd7312c * Hackishly patch long dependencies problem 2015-07-28 00:14:29 +02:00
Matthew Honnibal
bb583f7f09 * Hackishly patch long dependencies problem 2015-07-27 23:14:33 +02:00
Matthew Honnibal
aa7a964a4f * Add a type declaration for doc.from_array 2015-07-27 22:57:22 +02:00
Matthew Honnibal
25a8774f42 * Fix regression in packer 2015-07-27 21:53:38 +02:00
Matthew Honnibal
1601e488ee * Fix bug in decoding non-ascii characters 2015-07-27 21:43:58 +02:00
Matthew Honnibal
6a95409cd2 * Fix type on bits 2015-07-27 21:16:49 +02:00
Matthew Honnibal
a296d72b54 * Fix en/attrs 2015-07-27 21:16:33 +02:00
Matthew Honnibal
45460f505c * Fix data type on read32 in BitArray 2015-07-27 21:12:13 +02:00
Matthew Honnibal
3d43f49f69 * Revert prev change 2015-07-27 10:58:15 +02:00
Matthew Honnibal
6b586cdad4 * Change lexemes.bin format. Add a header specifying size of LexemeC and number of lexemes, and don't have the redundant orth information. 2015-07-27 08:31:51 +02:00
Matthew Honnibal
af6ed18f2a * Ensure we don't use orth_encode on OOV words. 2015-07-27 02:12:01 +02:00
Matthew Honnibal
8535d872e8 * Set is_oov property in get_flags 2015-07-27 01:51:24 +02:00
Matthew Honnibal
8e4c69ee8c * Add is_oov property, and fix up handling of attributes 2015-07-27 01:50:06 +02:00
Matthew Honnibal
fc268f03eb * Assert against null pointer exceptions in vocab 2015-07-27 01:00:10 +02:00
Matthew Honnibal
0f093fdb30 * Fix get_by_orth for py3 2015-07-26 19:26:41 +02:00
Matthew Honnibal
ceeda5a739 * Fix get_by_orth for py3 2015-07-26 18:39:27 +02:00
Matthew Honnibal
6bb96c122d * Host IS_ flags in attrs.pxd, and add properties for them on Token and Lexeme objects 2015-07-26 16:37:16 +02:00
Matthew Honnibal
eeaea25f0c * Check oov_prob file is present 2015-07-26 16:36:38 +02:00
Matthew Honnibal
7eb2446082 * Return empty lexeme on empty string 2015-07-26 00:18:30 +02:00