Matthew Honnibal
|
18063803de
|
Make TokenC.sent_start an int, to allow ternary value
|
2017-10-08 19:58:54 +02:00 |
|
Matthew Honnibal
|
84e66ca6d4
|
WIP on stringstore change. 27 failures
|
2017-05-28 14:06:40 +02:00 |
|
Matthew Honnibal
|
f51e6a6c16
|
Adjust lexeme sizing for attr_t being 64 bit
|
2017-05-28 12:51:09 +02:00 |
|
Matthew Honnibal
|
3ea98e2043
|
Remove vector member from lexeme
|
2017-05-28 11:46:24 +02:00 |
|
Matthew Honnibal
|
793430aa7a
|
Get spaCy train command working with neural network
* Integrate models into pipeline
* Add basic serialization (maybe incorrect)
* Fix pickle on vocab
|
2017-05-17 12:04:50 +02:00 |
|
Matthew Honnibal
|
58e83fe34b
|
Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match.
|
2016-09-21 14:54:55 +02:00 |
|
Wolfgang Seeker
|
03fb498dbe
|
introduce lang field for LexemeC to hold language id
put noun_chunk logic into iterators.py for each language separately
|
2016-03-10 13:01:34 +01:00 |
|
Matthew Honnibal
|
9ec7b9c454
|
* Clean up unused Constituent struct.
|
2015-11-03 23:48:21 +11:00 |
|
Matthew Honnibal
|
1e99fcd413
|
* Rename .repvec to .vector in C API
|
2015-11-03 23:47:59 +11:00 |
|
Matthew Honnibal
|
7ac6cacc26
|
* Remove const qualifier on LexemeC.repvec
|
2015-09-15 14:42:51 +10:00 |
|
Matthew Honnibal
|
c2307fa9ee
|
* More work on language-generic parsing
|
2015-08-28 02:02:33 +02:00 |
|
Matthew Honnibal
|
1d7f2d3abc
|
* Hack on morphology structs
|
2015-08-26 19:18:36 +02:00 |
|
Matthew Honnibal
|
815bda201d
|
* Remove UniStr struct
|
2015-07-22 13:39:17 +02:00 |
|
Matthew Honnibal
|
128b6d9714
|
* Move Utf8Str struct to strings module, as that's the only place it's relevant
|
2015-07-20 12:06:41 +02:00 |
|
Matthew Honnibal
|
4dddc8a69b
|
* Fix type declarations for attr_t. Remove unused id_t.
|
2015-07-18 22:39:57 +02:00 |
|
Matthew Honnibal
|
95e57c2780
|
* Remove unnecessary key and id properties from Utf8String.
|
2015-07-17 01:40:18 +02:00 |
|
Matthew Honnibal
|
aa82caf8f5
|
* Add TokenC.spacy attr
|
2015-07-13 19:48:07 +02:00 |
|
Matthew Honnibal
|
1d3a592edf
|
* Remove the senses attr from LexemeC, to keep data compatibility
|
2015-07-08 19:24:44 +02:00 |
|
Matthew Honnibal
|
e23d1582a2
|
* Add supersense data to Lexeme objects. Add simple has_sense method to check the flag.
|
2015-07-01 18:50:37 +02:00 |
|
Matthew Honnibal
|
a7bf7b0626
|
* Rename sent_start to sent_end, to reflect its new usage in the Break transition
|
2015-06-23 05:39:43 +02:00 |
|
Matthew Honnibal
|
8ee7c541f1
|
* Update Constituent definition
|
2015-05-20 16:03:26 +02:00 |
|
Matthew Honnibal
|
03a6626545
|
* Tmp commit
|
2015-05-12 20:27:56 +02:00 |
|
Matthew Honnibal
|
d2ac8d8007
|
* Add ctnt field to State, in preparation for constituency parsing
|
2015-05-12 20:27:56 +02:00 |
|
Matthew Honnibal
|
d634038eb6
|
* Add l_edge and r_edge props in TokenC for tracking the parse-yield of the token
|
2015-05-12 20:26:41 +02:00 |
|
Jordan Suchow
|
3a8d9b37a6
|
Remove trailing whitespace
|
2015-04-19 13:01:38 -07:00 |
|
Matthew Honnibal
|
8057a95f20
|
* NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring.
|
2015-03-26 16:44:44 +01:00 |
|
Matthew Honnibal
|
b3eda03c9c
|
* Tmp
|
2015-03-26 16:44:44 +01:00 |
|
Matthew Honnibal
|
135756ac3d
|
* Tmp commit of NER refactoring
|
2015-03-26 16:44:42 +01:00 |
|
Matthew Honnibal
|
b139aa92ba
|
* Start setting out how NER will be implemented in the data model
|
2015-03-26 16:44:41 +01:00 |
|
Matthew Honnibal
|
75f9b7d6bf
|
* Add L2 norm field to LexemeC struct
|
2015-02-07 08:43:17 -05:00 |
|
Matthew Honnibal
|
08ca5c8970
|
* Add sent_end flag to TokenC struct
|
2015-01-31 13:44:16 +11:00 |
|
Matthew Honnibal
|
12b034e3ef
|
* Move POS tag definitions to parts_of_speech.pxd
|
2015-01-25 16:31:07 +11:00 |
|
Matthew Honnibal
|
fda94271af
|
* Rename NORM1 and NORM2 attrs to lower and norm
|
2015-01-24 06:17:03 +11:00 |
|
Matthew Honnibal
|
5ed8b2b98f
|
* Rename sic to orth
|
2015-01-23 02:08:25 +11:00 |
|
Matthew Honnibal
|
45264e356b
|
* Rename vec to repvec
|
2015-01-22 02:04:24 +11:00 |
|
Matthew Honnibal
|
6c7e44140b
|
* Work on word vectors, and other stuff
|
2015-01-17 16:21:17 +11:00 |
|
Matthew Honnibal
|
46da3d74d2
|
* Tmp. Refactoring, introducing a Lexeme PyObject.
|
2015-01-12 11:23:44 +11:00 |
|
Matthew Honnibal
|
ce2edd6312
|
* Tmp commit. Refactoring to create a Python Lexeme class.
|
2015-01-12 10:26:22 +11:00 |
|
Matthew Honnibal
|
b8b65903fc
|
* Tmp
|
2014-12-24 17:42:00 +11:00 |
|
Matthew Honnibal
|
e1c1a4b868
|
* Tmp
|
2014-12-21 05:36:29 +11:00 |
|
Matthew Honnibal
|
780cbd68b1
|
* Move all struct definitions to structs.pxd, to avoid circular dependencies
|
2014-12-20 06:51:33 +11:00 |
|