Matthew Honnibal
|
3c270fc8ff
|
* Remove has_sense method from Lexeme
|
2015-07-08 19:28:29 +02:00 |
|
Matthew Honnibal
|
b64c843861
|
* Remove senses attr
|
2015-07-08 19:26:24 +02:00 |
|
Matthew Honnibal
|
1d3a592edf
|
* Remove the senses attr from LexemeC, to keep data compatibility
|
2015-07-08 19:24:44 +02:00 |
|
Matthew Honnibal
|
0ceb1f71c2
|
* Update parse features
|
2015-07-08 19:11:36 +02:00 |
|
Matthew Honnibal
|
2e51b5027a
|
* Alias Doc to Tokens, for backwards compatibility
|
2015-07-08 18:59:35 +02:00 |
|
Matthew Honnibal
|
e3c53f5ecd
|
* Fix mention of Tokens in docstring
|
2015-07-08 18:56:27 +02:00 |
|
Matthew Honnibal
|
bb522496dd
|
* Rename Tokens to Doc
|
2015-07-08 18:53:00 +02:00 |
|
Matthew Honnibal
|
b24e8be2b9
|
* Whitespace in docstring
|
2015-07-08 12:37:03 +02:00 |
|
Matthew Honnibal
|
abc43b852d
|
* Add pos_tags attr to Vocab.
|
2015-07-08 12:36:38 +02:00 |
|
Matthew Honnibal
|
935bcdf3e5
|
* Remove redundant tag_names argument to Tokenizer
|
2015-07-08 12:36:04 +02:00 |
|
Matthew Honnibal
|
ff885e8511
|
* Add ParserFactory convenience function
|
2015-07-08 12:35:46 +02:00 |
|
Matthew Honnibal
|
4e4fac452b
|
* Refactor __init__ for simplicity. Allow parse=True, tag=True etc flags to be passed at top-level. Do not lazy-load parser.
|
2015-07-08 12:35:29 +02:00 |
|
Matthew Honnibal
|
1d2deb4616
|
* Work on refactoring default arguments to English.__init__
|
2015-07-07 15:53:25 +02:00 |
|
Matthew Honnibal
|
2d0e99a096
|
* Pass pos_tags into Tokenizer.from_dir
|
2015-07-07 14:23:08 +02:00 |
|
Matthew Honnibal
|
6788c86b2f
|
* Begin refactor
|
2015-07-07 14:00:07 +02:00 |
|
Matthew Honnibal
|
52fd80c6c6
|
* Add experimental supersense features for parsing, based on lookup into wordnet.
|
2015-07-01 20:12:44 +02:00 |
|
Matthew Honnibal
|
e6d828a9af
|
* Set up an array POS_SENSES that denotes the set of valid senses for each POS tag. This way, we can do bitwise & between a lexeme's senses and the ones available for its POS tag, to get the allowable senses for the token.
|
2015-07-01 20:12:13 +02:00 |
|
Matthew Honnibal
|
2b8459d9a8
|
* Add senses flag to Lexeme
|
2015-07-01 20:10:41 +02:00 |
|
Matthew Honnibal
|
e23d1582a2
|
* Add supersense data to Lexeme objects. Add simple has_sense method to check the flag.
|
2015-07-01 18:50:37 +02:00 |
|
Matthew Honnibal
|
64fafa98be
|
* Add senses.pyx and senses.pxd
|
2015-07-01 18:49:44 +02:00 |
|
Matthew Honnibal
|
94dab94e5f
|
Merge branch 'master' of https://github.com/honnibal/spaCy
|
2015-06-30 18:16:26 +02:00 |
|
Matthew Honnibal
|
9af86b0b0b
|
* Fix attrs.pxd
|
2015-06-30 18:16:30 +02:00 |
|
Matthew Honnibal
|
af9c82f7a6
|
Merge branch 'master' of https://github.com/honnibal/spaCy
|
2015-06-30 18:11:37 +02:00 |
|
Matthew Honnibal
|
5d595b5a8c
|
* Inc versions
|
2015-06-30 18:11:06 +02:00 |
|
Matthew Honnibal
|
d2eeba6667
|
* Start wiring up color and emotion lexicons. Hopefully we get to use them.
|
2015-06-30 16:22:23 +02:00 |
|
Matthew Honnibal
|
e20106fdff
|
* Begin reorganizing neuralnet work
|
2015-06-30 14:26:32 +02:00 |
|
Matthew Honnibal
|
5cd3ed42d4
|
* Reenable averaging
|
2015-06-29 16:44:42 +02:00 |
|
Matthew Honnibal
|
894cbef8ba
|
* Wire eta and mu parameters up for neural net
|
2015-06-29 07:10:33 +02:00 |
|
Matthew Honnibal
|
3bb5876c5a
|
* Inline methods in StateClass
|
2015-06-29 01:10:14 +02:00 |
|
Matthew Honnibal
|
313a7f87b3
|
* Inline methods in StateClass
|
2015-06-29 01:06:28 +02:00 |
|
Matthew Honnibal
|
a02fd3af5d
|
* Check valency in L and R feature methods, to make feature calculation faster
|
2015-06-29 00:27:56 +02:00 |
|
Matthew Honnibal
|
5d870720bc
|
* Check valency in L and R feature methods, to make feature calculation faster
|
2015-06-29 00:17:29 +02:00 |
|
Matthew Honnibal
|
f4986d5d3c
|
* Use new Example class
|
2015-06-28 22:36:03 +02:00 |
|
Matthew Honnibal
|
735f1af91f
|
* Fix neural net stuff
|
2015-06-28 11:44:58 +02:00 |
|
Matthew Honnibal
|
e7003f1cf3
|
* Remove hard-coding of vector lengths
|
2015-06-28 11:37:17 +02:00 |
|
Matthew Honnibal
|
897dd0dd0b
|
* Merge changes, and adjust Example to use memoryview
|
2015-06-28 11:36:11 +02:00 |
|
Matthew Honnibal
|
9282a8e72c
|
* Prepare for new models to be plugged in by using Example class
|
2015-06-28 11:02:35 +02:00 |
|
Matthew Honnibal
|
75aeccc064
|
* Rejig parser interface to use new thinc.api.Example class, in prep of theano model. Comment out beam search
|
2015-06-28 11:02:34 +02:00 |
|
Matthew Honnibal
|
bf33598b34
|
* Work on a theano-driven model for the parser
|
2015-06-28 11:02:34 +02:00 |
|
Matthew Honnibal
|
bbef71f213
|
* Fix min function in fill_context
|
2015-06-28 10:46:39 +02:00 |
|
Matthew Honnibal
|
142b6f9510
|
* Revert last changes
|
2015-06-28 10:44:28 +02:00 |
|
Matthew Honnibal
|
b06962f18b
|
* Pad buffers in state
|
2015-06-28 10:36:14 +02:00 |
|
Matthew Honnibal
|
53be72387c
|
* Hack at fill_context to investigate performance loss
|
2015-06-28 10:34:28 +02:00 |
|
Matthew Honnibal
|
71a4e876a9
|
* Fix parse features
|
2015-06-28 09:27:33 +02:00 |
|
Matthew Honnibal
|
0c4b5a2bb0
|
* Start scoring tokens
|
2015-06-28 06:21:38 +02:00 |
|
Matthew Honnibal
|
5af500909c
|
* Remove unused directive from parser.pyx
|
2015-06-28 06:20:21 +02:00 |
|
Matthew Honnibal
|
d5b4090705
|
* Add profile directive
|
2015-06-28 06:19:33 +02:00 |
|
Matthew Honnibal
|
2b5421e60c
|
* Add profile directive
|
2015-06-28 06:07:04 +02:00 |
|
Matthew Honnibal
|
8b5de4a411
|
* Add word / tag / label sets, for use in neural net
|
2015-06-28 05:46:53 +02:00 |
|
Matthew Honnibal
|
cfcbd8d256
|
* Fix punctuation eval in scorer.py
|
2015-06-28 01:31:39 +02:00 |
|
Matthew Honnibal
|
ed40a8380e
|
* Remove hard-coding of vector lengths
|
2015-06-27 04:18:47 +02:00 |
|
Matthew Honnibal
|
ebe630cc8d
|
* Enable more features for NN
|
2015-06-27 04:17:29 +02:00 |
|
Matthew Honnibal
|
f8bb43475e
|
* Bridge to Theano working. Very disorganised. Using thinc adb60aba966ed2
|
2015-06-27 02:39:18 +02:00 |
|
Matthew Honnibal
|
2fe98b8a9a
|
* Prepare for new models to be plugged in by using Example class
|
2015-06-26 13:51:39 +02:00 |
|
Matthew Honnibal
|
6896455884
|
* Rejig parser interface to use new thinc.api.Example class, in prep of theano model. Comment out beam search
|
2015-06-26 06:25:36 +02:00 |
|
Matthew Honnibal
|
b266a63f2c
|
* Inc version of downloadable data
|
2015-06-24 04:53:08 +02:00 |
|
Matthew Honnibal
|
02b171ee67
|
* Bug fixes to edge calculation
|
2015-06-24 04:28:02 +02:00 |
|
Matthew Honnibal
|
a4e9bdf4c1
|
* Work on a theano-driven model for the parser
|
2015-06-24 01:02:40 +02:00 |
|
Matthew Honnibal
|
7f9384f53c
|
* Remove deprecated _state module
|
2015-06-23 17:28:24 +02:00 |
|
Matthew Honnibal
|
6dbe182491
|
* Fix merge conflicts
|
2015-06-23 17:28:00 +02:00 |
|
Matthew Honnibal
|
579735a095
|
* Remove import of _state module
|
2015-06-23 17:25:08 +02:00 |
|
Matthew Honnibal
|
88f55d136b
|
* Remove deprecated _state module
|
2015-06-23 17:19:51 +02:00 |
|
Matthew Honnibal
|
9ab9dd2bf7
|
* Clean up unused orig_arc_eager and tree_arc_eager modules, which were only added for EMNLP experiments
|
2015-06-23 17:17:33 +02:00 |
|
Matthew Honnibal
|
7ebfe4b983
|
* Fixes to edge features
|
2015-06-23 16:32:54 +02:00 |
|
Matthew Honnibal
|
7b125f5a86
|
* Fixes to edge features
|
2015-06-23 16:31:01 +02:00 |
|
Matthew Honnibal
|
8d4bbacfc5
|
* Fix edge navigation in Token objects
|
2015-06-23 16:07:34 +02:00 |
|
Matthew Honnibal
|
35c290bee4
|
* Fix edge features
|
2015-06-23 15:50:56 +02:00 |
|
Matthew Honnibal
|
221e2e485f
|
* Assign 'ROOT' as label, not 'root'
|
2015-06-23 15:09:54 +02:00 |
|
Matthew Honnibal
|
a7bf7b0626
|
* Rename sent_start to sent_end, to reflect its new usage in the Break transition
|
2015-06-23 05:39:43 +02:00 |
|
Matthew Honnibal
|
ee3e56f27b
|
* Fix bounds checking on entities
|
2015-06-23 04:35:08 +02:00 |
|
Matthew Honnibal
|
43ef5ddea5
|
* Ensure root label is spelled ROOT, for backwards compatibility
|
2015-06-23 04:14:03 +02:00 |
|
Matthew Honnibal
|
065c2e1d2d
|
* Add some bounds checking around state arrays
|
2015-06-23 04:13:09 +02:00 |
|
Matthew Honnibal
|
89ae218b75
|
* Add import to tokens.pyx from weird Cython compiler issue with casting from memory views
|
2015-06-23 03:04:34 +02:00 |
|
Matthew Honnibal
|
f01b3d043e
|
* Add padding to arrays in stateclass. May be papering over a deeper bug.
|
2015-06-23 03:03:41 +02:00 |
|
Matthew Honnibal
|
5e94b5d581
|
* Have Tokens return proper numpy arrays, not Cython views.
|
2015-06-23 00:07:34 +02:00 |
|
Matthew Honnibal
|
69507bc729
|
* Re-enable Break transition in arc_eager.pyx
|
2015-06-23 00:03:30 +02:00 |
|
Matthew Honnibal
|
cc579ed429
|
* Add __len__ function to StringStore
|
2015-06-23 00:02:50 +02:00 |
|
Matthew Honnibal
|
46fb24e9fd
|
* Add cycle-checking code in gold.pyx
|
2015-06-23 00:02:22 +02:00 |
|
Matthew Honnibal
|
60d26243e3
|
* Fix head alignment in read_conll.parse, which was causing corrupt parses when strip_bad_periods=True. A similar problem may apply to other data readers.
|
2015-06-18 16:35:27 +02:00 |
|
Matthew Honnibal
|
f868175e43
|
* Whitespace
|
2015-06-16 23:37:46 +02:00 |
|
Matthew Honnibal
|
ab110be125
|
* Remove debugging in parser.pyx
|
2015-06-16 23:37:25 +02:00 |
|
Matthew Honnibal
|
9b13d11ab3
|
* Fix handling of entities in StateClass
|
2015-06-16 23:35:21 +02:00 |
|
Matthew Honnibal
|
c40a2c661c
|
* Add tree_arc_eager
|
2015-06-15 08:23:24 +02:00 |
|
Matthew Honnibal
|
5da5cf7084
|
* Add some more features for S1/S0
|
2015-06-15 04:07:13 +02:00 |
|
Matthew Honnibal
|
8156a01bca
|
* Fix root label for orig_arc_eager
|
2015-06-15 02:54:55 +02:00 |
|
Matthew Honnibal
|
21930ede15
|
* Switch toggle on USE_ROOT_ARC_SEGMENT
|
2015-06-15 02:54:32 +02:00 |
|
Matthew Honnibal
|
38a6afa484
|
* Make possibly dubious correction to the unshift oracle
|
2015-06-15 02:50:00 +02:00 |
|
Matthew Honnibal
|
f66228f253
|
* Add some more features, esp for labels
|
2015-06-14 21:18:02 +02:00 |
|
Matthew Honnibal
|
3da8e0f317
|
* Add orig_arc_eager
|
2015-06-14 20:31:44 +02:00 |
|
Matthew Honnibal
|
ea8a103007
|
* Fix import of TransitionSystem in parser.pyx
|
2015-06-14 19:01:26 +02:00 |
|
Matthew Honnibal
|
e0984ca139
|
* Fix valency features in StateClass
|
2015-06-14 17:50:26 +02:00 |
|
Matthew Honnibal
|
e50ac1a47f
|
* Add verbose printing to scorer
|
2015-06-14 17:45:50 +02:00 |
|
Matthew Honnibal
|
763cbd23d5
|
* Upd stateclass.print_state
|
2015-06-14 17:44:29 +02:00 |
|
Matthew Honnibal
|
bdd07bf000
|
* Fix Break oracle, but disable the Break transition for now, while we finalize the gold-standard experiments
|
2015-06-14 17:44:03 +02:00 |
|
Matthew Honnibal
|
399f15fbdf
|
* Add flag to toggle handling of multi-root inputs without the Break transition. Clear up now unused best_valid stuff.
|
2015-06-14 00:28:37 +02:00 |
|
Matthew Honnibal
|
75289b4761
|
* Don't refuse to parse single token sentences, in case some transition system needs them, e.g. single word entity. Instead fix error in _init_state.
|
2015-06-13 22:55:55 +02:00 |
|
Matthew Honnibal
|
77d7e79c7e
|
* Fix r/l and distance features.
|
2015-06-12 13:06:15 +02:00 |
|
Matthew Honnibal
|
b643cb3d5c
|
* Allow training documents to be filtered in gold.pyx
|
2015-06-12 02:42:08 +02:00 |
|
Matthew Honnibal
|
15e177d7a1
|
* Fixes to unshift/fast-forward strategy. Getting 91.55 greedy on NW dev, gold preproc
|
2015-06-12 01:50:23 +02:00 |
|
Matthew Honnibal
|
afd77a529b
|
* Prepare for break transition, with fast-forwarding. 86.5 on 1k nw gold preproc
|
2015-06-10 14:08:30 +02:00 |
|