1340 Commits

Author SHA1 Message Date
Henning Peters
ac318b568c new approach to dependency headers 2015-12-13 11:49:17 +01:00
Henning Peters
345dda6f53 small fixes, add package build step 2015-12-07 06:50:26 +01:00
Henning Peters
9027cef3bc access model via sputnik 2015-12-07 06:01:28 +01:00
Henning Peters
73e5650be5 change index server 2015-11-18 18:09:46 +01:00
Henning Peters
50d15ea5d2 fix 2015-11-18 17:35:21 +01:00
Henning Peters
02a1dcec76 add data dir 2015-11-18 11:48:55 +01:00
Henning Peters
919a4f0b04 change data path, add repository 2015-11-18 11:40:46 +01:00
Henning Peters
12de895e60 fix version 2015-11-15 16:38:16 +01:00
Henning Peters
03d2f98cd5 add sputnik 2015-11-15 15:58:21 +01:00
Matthew Honnibal
ec7d36c3a4 * Add test for matcher end-point problem 2015-11-12 05:00:40 +11:00
Matthew Honnibal
d309622a27 * Add test for matcher end-point problem 2015-11-12 04:59:11 +11:00
Matthew Honnibal
56ea20a886 * Add test for matcher end-point problem 2015-11-12 04:58:53 +11:00
Matthew Honnibal
cfa4062147 * Add test for matcher end-point problem 2015-11-12 04:56:07 +11:00
Matthew Honnibal
5623242b3e * Adjust NER rules, so that U entries in gazetteer don't become B moves to the model 2015-11-12 04:48:23 +11:00
Matthew Honnibal
d67d7d5a86 * Add test for NER inconsistency bug 2015-11-08 16:19:33 +01:00
Matthew Honnibal
44fbdc7260 * Fix bug in NER transition system, that sometimes left no valid moves 2015-11-08 16:19:12 +01:00
Matthew Honnibal
ab5aac5b2f * Add .rank property to Token and Lexeme, for frequency rank 2015-11-08 16:18:25 +01:00
Matthew Honnibal
fde9a22ec2 * Add new test for ner 2015-11-08 13:57:15 +01:00
Matthew Honnibal
e92371bb54 * Fix rule that made Last action invalid if there was a preset of O, since if the entity is already open, that ship has sailed. 2015-11-08 22:17:51 +11:00
Matthew Honnibal
3b74739c3e * Download updated data 2015-11-08 21:24:25 +11:00
Matthew Honnibal
31da42eb27 * Mark tests that require models 2015-11-07 19:27:38 +11:00
Matthew Honnibal
8e26a28616 * Mark tests that require models 2015-11-07 19:10:56 +11:00
Matthew Honnibal
15eab7354f * Remove extraneous test files 2015-11-07 18:45:13 +11:00
Matthew Honnibal
6f47074214 * Make constructor of ParserModel and TaggerModel the same as AveragedPerceptron, for each pickling. 2015-11-07 18:25:17 +11:00
Matthew Honnibal
1cfa20fb17 * Fix sentence-final whitespace issue 2015-11-07 17:34:46 +11:00
Matthew Honnibal
7663970d5f * Removed unused i variable from Span, and set attributes to read-only 2015-11-07 17:06:15 +11:00
Matthew Honnibal
4b3c96d76d * Fix zero-length spans 2015-11-07 17:05:16 +11:00
Matthew Honnibal
888c05a7fa * Fix variable naming in StepwiseState, for thinc 4.0 2015-11-07 11:02:44 +11:00
Matthew Honnibal
fc2185bfe3 * Fix variable naming in StepwiseState, for thinc 4.0 2015-11-07 10:48:31 +11:00
Matthew Honnibal
954442a807 * Fix variable naming in StepwiseState, for thinc 4.0 2015-11-07 10:30:45 +11:00
Matthew Honnibal
06f26d258e * Fix test_basic_create 2015-11-07 10:04:37 +11:00
Matthew Honnibal
1d3884c46d * Fix test_basic_create 2015-11-07 10:03:56 +11:00
Matthew Honnibal
cc8febcbe1 * Fix Span comparison 2015-11-07 09:54:14 +11:00
Matthew Honnibal
af70dc166a * Fix Last restriction, that was supposed to prevent conflicts with presets, but was incorrect. 2015-11-07 09:52:00 +11:00
Matthew Honnibal
a9b612abdf * Rework the Span-merge patch, to avoid extending the interface of Doc, and avoid virtualizing the Span.start and Span.end indices, to keep Span usage efficient 2015-11-07 09:01:12 +11:00
Matthew Honnibal
56499d89ef * Rework the Span-merge patch, to avoid extending the interface of Doc, and avoid virtualizing the Span.start and Span.end indices, to keep Span usage efficient 2015-11-07 08:55:34 +11:00
Andreas Grivas
83ca4e0b93 * use old merge tests - add more 2015-11-07 07:57:04 +11:00
Andreas Grivas
4be7fda453 * span start, end -> properties. autoupdate after merge 2015-11-07 07:57:04 +11:00
Andreas Grivas
562db6d2d0 * merge add lex last - add index finder funcs 2015-11-07 07:57:04 +11:00
Matthew Honnibal
a06e3c8963 * Fix bone-headed mistake in StateClass.E 2015-11-07 07:35:28 +11:00
Matthew Honnibal
d24b8509e4 * Correct screw ups from the previous commits 2015-11-07 06:51:41 +11:00
Matthew Honnibal
5efad178b5 * Set ent tag when close entity 2015-11-07 06:09:25 +11:00
Matthew Honnibal
9285f01d26 * Fix broken StateClass.E tracking 2015-11-07 06:06:39 +11:00
Matthew Honnibal
19136b0e7d * Add better debug message for illegal move 2015-11-07 05:34:37 +11:00
Matthew Honnibal
2733816b7b * Fix whitespace 2015-11-07 05:31:06 +11:00
Matthew Honnibal
01ab464383 * Prevent Begin and In moves from applying in NER if we're at the last token of a sentence, as this would mean the entity would span over a sentence boundary. Re Issue #169 2015-11-07 05:30:44 +11:00
Matthew Honnibal
b65633f270 * Fix function that returns nth entity in StateClass. Was only returning the first. 2015-11-07 05:29:11 +11:00
Matthew Honnibal
410b6f9ec1 * Remove deprecated _ml.pyx. We now use the nicer APIs provided by thinc 4.0, and subclass the AveragedPerceptron class. 2015-11-07 05:13:10 +11:00
Matthew Honnibal
3c162dcac3 * Refactor away from the _ml module, to use thinc 4.0. Still some work needs to be done, e.g. to add __reduce__ to the models, more testing, etc. 2015-11-07 03:24:30 +11:00
Matthew Honnibal
9d1b2a103a * Fix capitalization in lemmatizer 2015-11-06 05:44:35 +11:00
Matthew Honnibal
6ed3aedf79 * Merge vocab changes 2015-11-06 00:48:08 +11:00
Matthew Honnibal
72abbb43fb * Add type declarations in strings.pyx 2015-11-06 00:47:26 +11:00
Matthew Honnibal
5b2af4864f * When lemmatizing non-noun, non-verb, non-adj words, output lower-case 2015-11-06 00:45:09 +11:00
Matthew Honnibal
754bf04162 * Remove declaration of Model.update 2015-11-06 00:31:15 +11:00
Matthew Honnibal
e18bdff23a Merge branch 'master' of ssh://github.com/honnibal/spaCy 2015-11-06 00:26:15 +11:00
Matthew Honnibal
b9991fbd20 * Update to use thinc 3.0 2015-11-06 00:25:59 +11:00
Matthew Honnibal
864a8f45d8 * Use unicode in StringStore.intern, instead of unreliably casting to bytes. 2015-11-05 11:32:19 +00:00
Matthew Honnibal
b18204cd52 * Fix StringStore._realloc, re Issue #155 2015-11-05 11:28:26 +00:00
Matthew Honnibal
f8004c5f65 * Begin upgrading to improved thinc API 2015-11-05 03:53:03 +11:00
Matthew Honnibal
adc7bbd6cf * Fix name of like_num in default_lex_attrs 2015-11-04 22:02:47 +11:00
Matthew Honnibal
e96faf29e7 * Rename like_number to like_num, to fix inconsistency re Issue #166 2015-11-04 22:01:44 +11:00
Matthew Honnibal
65934b7cd4 * Enforce import of ujson in strings.pyx, because otherwise it's too slow 2015-11-04 00:32:02 +11:00
Matthew Honnibal
1ce5d5602d * Rename Doc.data to Doc.c 2015-11-04 00:17:13 +11:00
Matthew Honnibal
68f479e821 * Rename Doc.data to Doc.c 2015-11-04 00:15:14 +11:00
Matthew Honnibal
3ddea19b2b * Rename spans.pyx to span.pyx 2015-11-04 00:14:40 +11:00
Matthew Honnibal
9482d616bc * Rename spans.pyx to span.pyx 2015-11-03 23:51:05 +11:00
Matthew Honnibal
116da5990a * Clean up setting of tag in doc.from_bytes 2015-11-03 23:48:57 +11:00
Matthew Honnibal
9ec7b9c454 * Clean up unused Constituent struct. 2015-11-03 23:48:21 +11:00
Matthew Honnibal
1e99fcd413 * Rename .repvec to .vector in C API 2015-11-03 23:47:59 +11:00
Matthew Honnibal
ee3f9ba581 * Fix test of serializer 2015-11-03 19:45:16 +11:00
Matthew Honnibal
d06ba26371 * Fix test of serializer 2015-11-03 19:43:27 +11:00
Matthew Honnibal
4083059650 Merge branch 'master' of https://github.com/honnibal/spaCy 2015-11-03 09:07:19 +01:00
Matthew Honnibal
9e37437ba8 * Fix assign_tag in doc.merge 2015-11-03 19:07:02 +11:00
Matthew Honnibal
dde9e1357c * Add todo to morphology.lemmatize 2015-11-03 18:54:35 +11:00
Matthew Honnibal
ffedff9e6c * Remove the archive after download, to save disk space 2015-11-03 18:54:05 +11:00
Matthew Honnibal
85372468e3 * Fix serialize test 2015-11-03 08:51:33 +01:00
Matthew Honnibal
833eb35c57 * Fix tag assignment in doc.from_array 2015-11-03 18:45:54 +11:00
Matthew Honnibal
09664177d7 * Fix tag handling in doc.merge, and assign sent_start when setting heads. 2015-11-03 18:15:52 +11:00
Matthew Honnibal
389a373807 Merge branch 'master' of ssh://github.com/honnibal/spaCy 2015-11-03 18:07:25 +11:00
Matthew Honnibal
3f44b3e43f * Mark serializer test as requiring models 2015-11-03 18:07:08 +11:00
Matthew Honnibal
25ed7be8f8 Merge branch 'master' of https://github.com/honnibal/spaCy 2015-11-03 07:58:17 +01:00
Matthew Honnibal
604ceac4c6 * Fix morphological assignment in doc.merge() 2015-11-03 17:57:51 +11:00
Matthew Honnibal
5e040855a5 * Ensure morphological features and lemmas are loaded in from_array, re Issue #152 2015-11-03 17:56:50 +11:00
Matthew Honnibal
5668feb235 * Fix pickle test for python3 2015-11-03 04:57:02 +01:00
Matthew Honnibal
6161d2529a Merge branch 'master' of ssh://github.com/honnibal/spaCy 2015-11-03 13:36:30 +11:00
Matthew Honnibal
5887506f5d * Don't expect lexemes.bin in Vocab 2015-11-03 13:23:39 +11:00
Matthew Honnibal
f7dd377575 * Adjust conjuncts iterator in Token 2015-11-03 13:23:22 +11:00
Andreas Grivas
d418f00eb1 fixed error when printing unicode 2015-11-02 20:23:18 +02:00
Matthew Honnibal
52fc338001 * Set is_parsed and is_tagged attrs when loading annotations into Doc, re Issue #152 2015-10-28 10:43:22 +11:00
Matthew Honnibal
1c0356e4c2 * Set test file mode to w+t 2015-10-26 22:40:48 +11:00
Matthew Honnibal
0fe98f358b * Fix mode on text file for Python3 in strings test 2015-10-26 22:25:16 +11:00
Matthew Honnibal
8ba9cf905e * Fix mode on text file for Python3 in strings test 2015-10-26 21:44:34 +11:00
Matthew Honnibal
a0730699b1 * Fix mode on text file for Python3 in strings test 2015-10-26 21:25:56 +11:00
Matthew Honnibal
725344d349 * Fix tempfile in test 2015-10-26 21:08:18 +11:00
Matthew Honnibal
f11030aadc * Remove out-dated TODO comment 2015-10-26 12:33:38 +11:00
Matthew Honnibal
a371a1071d * Save and load word vectors during pickling, re Issue #125 2015-10-26 12:33:04 +11:00
Matthew Honnibal
a824a98312 * Add tests for pickling vectors, re: Issue #125 2015-10-26 12:31:05 +11:00
Matthew Honnibal
314090cc78 * Set vectors length when unpickling vocab, re Issue #125 2015-10-26 12:05:08 +11:00
Matthew Honnibal
4e16f9e435 * Move tests underneath spacy/ 2015-10-26 00:07:31 +11:00
Matthew Honnibal
3a6e48e814 Merge pull request #149 from chrisdubois/pickle-patch: Add __reduce__ to Tokenizer so that English pickles. 2015-10-25 15:30:31 +11:00