Matthew Honnibal
|
7663970d5f
|
* Removed unused i variable from Span, and set attributes to read-only
|
2015-11-07 17:06:15 +11:00 |
|
Matthew Honnibal
|
4b3c96d76d
|
* Fix zero-length spans
|
2015-11-07 17:05:16 +11:00 |
|
Matthew Honnibal
|
888c05a7fa
|
* Fix variable naming in StepwiseState, for thinc 4.0
|
2015-11-07 11:02:44 +11:00 |
|
Matthew Honnibal
|
fc2185bfe3
|
* Fix variable naming in StepwiseState, for thinc 4.0
|
2015-11-07 10:48:31 +11:00 |
|
Matthew Honnibal
|
954442a807
|
* Fix variable naming in StepwiseState, for thinc 4.0
|
2015-11-07 10:30:45 +11:00 |
|
Matthew Honnibal
|
06f26d258e
|
* Fix test_basic_create
|
2015-11-07 10:04:37 +11:00 |
|
Matthew Honnibal
|
1d3884c46d
|
* Fix test_basic_create
|
2015-11-07 10:03:56 +11:00 |
|
Matthew Honnibal
|
cc8febcbe1
|
* Fix Span comparison
|
2015-11-07 09:54:14 +11:00 |
|
Matthew Honnibal
|
af70dc166a
|
* Fix Last restriction, that was supposed to prevent conflicts with presets, but was incorrect.
|
2015-11-07 09:52:00 +11:00 |
|
Matthew Honnibal
|
a9b612abdf
|
* Rework the Span-merge patch, to avoid extending the interface of Doc, and avoid virtualizing the Span.start and Span.end indices, to keep Span usage efficient
|
2015-11-07 09:01:12 +11:00 |
|
Matthew Honnibal
|
56499d89ef
|
* Rework the Span-merge patch, to avoid extending the interface of Doc, and avoid virtualizing the Span.start and Span.end indices, to keep Span usage efficient
|
2015-11-07 08:55:34 +11:00 |
|
Andreas Grivas
|
83ca4e0b93
|
* use old merge tests - add more
|
2015-11-07 07:57:04 +11:00 |
|
Andreas Grivas
|
4be7fda453
|
* span start, end -> properties. autoupdate after merge
|
2015-11-07 07:57:04 +11:00 |
|
Andreas Grivas
|
562db6d2d0
|
* merge add lex last - add index finder funcs
|
2015-11-07 07:57:04 +11:00 |
|
Matthew Honnibal
|
a06e3c8963
|
* Fix bone-headed mistake in StateClass.E
|
2015-11-07 07:35:28 +11:00 |
|
Matthew Honnibal
|
d24b8509e4
|
* Correct screw ups from the previous commits
|
2015-11-07 06:51:41 +11:00 |
|
Matthew Honnibal
|
5efad178b5
|
* Set ent tag when close entity
|
2015-11-07 06:09:25 +11:00 |
|
Matthew Honnibal
|
9285f01d26
|
* Fix broken StateClass.E tracking
|
2015-11-07 06:06:39 +11:00 |
|
Matthew Honnibal
|
19136b0e7d
|
* Add better debug message for illegal move
|
2015-11-07 05:34:37 +11:00 |
|
Matthew Honnibal
|
2733816b7b
|
* Fix whitespace
|
2015-11-07 05:31:06 +11:00 |
|
Matthew Honnibal
|
01ab464383
|
* Prevent Begin and In moves from applying in NER if we're at the last token of a sentence, as this would mean the entity would span over a sentence boundary. Re Issue #169
|
2015-11-07 05:30:44 +11:00 |
|
Matthew Honnibal
|
b65633f270
|
* Fix function that returns nth entity in StateClass. Was only returning the first.
|
2015-11-07 05:29:11 +11:00 |
|
Matthew Honnibal
|
410b6f9ec1
|
* Remove deprecated _ml.pyx. We now use the nicer APIs provided by thinc 4.0, and subclass the AveragedPerceptron class.
|
2015-11-07 05:13:10 +11:00 |
|
Matthew Honnibal
|
3c162dcac3
|
* Refactor away from the _ml module, to use thinc 4.0. Still some work needs to be done, e.g. to add __reduce__ to the models, more testing, etc.
|
2015-11-07 03:24:30 +11:00 |
|
Matthew Honnibal
|
9d1b2a103a
|
* Fix capitalization in lemmatizer
|
2015-11-06 05:44:35 +11:00 |
|
Matthew Honnibal
|
6ed3aedf79
|
* Merge vocab changes
|
2015-11-06 00:48:08 +11:00 |
|
Matthew Honnibal
|
72abbb43fb
|
* Add type declarations in strings.pyx
|
2015-11-06 00:47:26 +11:00 |
|
Matthew Honnibal
|
5b2af4864f
|
* When lemmatizing non-noun, non-verb, non-adj words, output lower-case
|
2015-11-06 00:45:09 +11:00 |
|
Matthew Honnibal
|
754bf04162
|
* Remove declaration of Model.update
|
2015-11-06 00:31:15 +11:00 |
|
Matthew Honnibal
|
e18bdff23a
|
Merge branch 'master' of ssh://github.com/honnibal/spaCy
|
2015-11-06 00:26:15 +11:00 |
|
Matthew Honnibal
|
b9991fbd20
|
* Update to use thinc 3.0
|
2015-11-06 00:25:59 +11:00 |
|
Matthew Honnibal
|
864a8f45d8
|
* Use unicode in StringStore.intern, instead of unreliably casting to bytes.
|
2015-11-05 11:32:19 +00:00 |
|
Matthew Honnibal
|
b18204cd52
|
* Fix StringStore._realloc, re Issue #155
|
2015-11-05 11:28:26 +00:00 |
|
Matthew Honnibal
|
f8004c5f65
|
* Begin upgrading to improved thinc API
|
2015-11-05 03:53:03 +11:00 |
|
Matthew Honnibal
|
adc7bbd6cf
|
* Fix name of like_num in default_lex_attrs
|
2015-11-04 22:02:47 +11:00 |
|
Matthew Honnibal
|
e96faf29e7
|
* Rename like_number to like_num, to fix inconsistency re Issue #166
|
2015-11-04 22:01:44 +11:00 |
|
Matthew Honnibal
|
65934b7cd4
|
* Enforce import of ujson in strings.pyx, because otherwise it's too slow
|
2015-11-04 00:32:02 +11:00 |
|
Matthew Honnibal
|
1ce5d5602d
|
* Rename Doc.data to Doc.c
|
2015-11-04 00:17:13 +11:00 |
|
Matthew Honnibal
|
68f479e821
|
* Rename Doc.data to Doc.c
|
2015-11-04 00:15:14 +11:00 |
|
Matthew Honnibal
|
3ddea19b2b
|
* Rename spans.pyx to span.pyx
|
2015-11-04 00:14:40 +11:00 |
|
Matthew Honnibal
|
9482d616bc
|
* Rename spans.pyx to span.pyx
|
2015-11-03 23:51:05 +11:00 |
|
Matthew Honnibal
|
116da5990a
|
* Clean up setting of tag in doc.from_bytes
|
2015-11-03 23:48:57 +11:00 |
|
Matthew Honnibal
|
9ec7b9c454
|
* Clean up unused Constituent struct.
|
2015-11-03 23:48:21 +11:00 |
|
Matthew Honnibal
|
1e99fcd413
|
* Rename .repvec to .vector in C API
|
2015-11-03 23:47:59 +11:00 |
|
Matthew Honnibal
|
ee3f9ba581
|
* Fix test of serializer
|
2015-11-03 19:45:16 +11:00 |
|
Matthew Honnibal
|
d06ba26371
|
* Fix test of serializer
|
2015-11-03 19:43:27 +11:00 |
|
Matthew Honnibal
|
4083059650
|
Merge branch 'master' of https://github.com/honnibal/spaCy
|
2015-11-03 09:07:19 +01:00 |
|
Matthew Honnibal
|
9e37437ba8
|
* Fix assign_tag in doc.merge
|
2015-11-03 19:07:02 +11:00 |
|
Matthew Honnibal
|
dde9e1357c
|
* Add todo to morphology.lemmatize
|
2015-11-03 18:54:35 +11:00 |
|
Matthew Honnibal
|
ffedff9e6c
|
* Remove the archive after download, to save disk space
|
2015-11-03 18:54:05 +11:00 |
|
Matthew Honnibal
|
85372468e3
|
* Fix serialize test
|
2015-11-03 08:51:33 +01:00 |
|
Matthew Honnibal
|
833eb35c57
|
* Fix tag assignment in doc.from_array
|
2015-11-03 18:45:54 +11:00 |
|
Matthew Honnibal
|
09664177d7
|
* Fix tag handling in doc.merge, and assign sent_start when setting heads.
|
2015-11-03 18:15:52 +11:00 |
|
Matthew Honnibal
|
389a373807
|
Merge branch 'master' of ssh://github.com/honnibal/spaCy
|
2015-11-03 18:07:25 +11:00 |
|
Matthew Honnibal
|
3f44b3e43f
|
* Mark serializer test as requiring models
|
2015-11-03 18:07:08 +11:00 |
|
Matthew Honnibal
|
25ed7be8f8
|
Merge branch 'master' of https://github.com/honnibal/spaCy
|
2015-11-03 07:58:17 +01:00 |
|
Matthew Honnibal
|
604ceac4c6
|
* Fix morphological assignment in doc.merge()
|
2015-11-03 17:57:51 +11:00 |
|
Matthew Honnibal
|
5e040855a5
|
* Ensure morphological features and lemmas are loaded in from_array, re Issue #152
|
2015-11-03 17:56:50 +11:00 |
|
Matthew Honnibal
|
5668feb235
|
* Fix pickle test for python3
|
2015-11-03 04:57:02 +01:00 |
|
Matthew Honnibal
|
6161d2529a
|
Merge branch 'master' of ssh://github.com/honnibal/spaCy
|
2015-11-03 13:36:30 +11:00 |
|
Matthew Honnibal
|
5887506f5d
|
* Don't expect lexemes.bin in Vocab
|
2015-11-03 13:23:39 +11:00 |
|
Matthew Honnibal
|
f7dd377575
|
* Adjust conjuncts iterator in Token
|
2015-11-03 13:23:22 +11:00 |
|
Andreas Grivas
|
d418f00eb1
|
fixed error when printing unicode
|
2015-11-02 20:23:18 +02:00 |
|
Matthew Honnibal
|
52fc338001
|
* Set is_parsed and is_tagged attrs when loading annotations into Doc, re Issue #152
|
2015-10-28 10:43:22 +11:00 |
|
Matthew Honnibal
|
1c0356e4c2
|
* Set test file mode to w+t
|
2015-10-26 22:40:48 +11:00 |
|
Matthew Honnibal
|
0fe98f358b
|
* Fix mode on text file for Python3 in strings test
|
2015-10-26 22:25:16 +11:00 |
|
Matthew Honnibal
|
8ba9cf905e
|
* Fix mode on text file for Python3 in strings test
|
2015-10-26 21:44:34 +11:00 |
|
Matthew Honnibal
|
a0730699b1
|
* Fix mode on text file for Python3 in strings test
|
2015-10-26 21:25:56 +11:00 |
|
Matthew Honnibal
|
725344d349
|
* Fix tempfile in test
|
2015-10-26 21:08:18 +11:00 |
|
Matthew Honnibal
|
f11030aadc
|
* Remove out-dated TODO comment
|
2015-10-26 12:33:38 +11:00 |
|
Matthew Honnibal
|
a371a1071d
|
* Save and load word vectors during pickling, re Issue #125
|
2015-10-26 12:33:04 +11:00 |
|
Matthew Honnibal
|
a824a98312
|
* Add tests for pickling vectors, re: Issue #125
|
2015-10-26 12:31:05 +11:00 |
|
Matthew Honnibal
|
314090cc78
|
* Set vectors length when unpickling vocab, re Issue #125
|
2015-10-26 12:05:08 +11:00 |
|
Matthew Honnibal
|
4e16f9e435
|
* Move tests underneath spacy/
|
2015-10-26 00:07:31 +11:00 |
|
Matthew Honnibal
|
3a6e48e814
|
Merge pull request #149 from chrisdubois/pickle-patch
Add __reduce__ to Tokenizer so that English pickles.
|
2015-10-25 15:30:31 +11:00 |
|
Chris DuBois
|
dac8fe7bdb
|
Add __reduce__ to Tokenizer so that English pickles.
- Add tests to test_pickle and test_tokenizer that save to tempfiles.
|
2015-10-23 22:24:03 -07:00 |
|
Matthew Honnibal
|
ff4fe524ee
|
* Fix exception for python 2
|
2015-10-23 01:56:13 +02:00 |
|
Matthew Honnibal
|
341a3e85cd
|
* Upd downloaded data version
|
2015-10-23 00:56:57 +02:00 |
|
Matthew Honnibal
|
f18fd8c659
|
* Fix language.py for change in StringStore load API
|
2015-10-23 03:48:12 +11:00 |
|
Matthew Honnibal
|
23855db3ca
|
Merge branch 'master' of ssh://github.com/honnibal/spaCy into develop
|
2015-10-23 03:46:09 +11:00 |
|
Matthew Honnibal
|
4f13849065
|
Merge pull request #145 from henningpeters/master
better error reporting, cleanup
|
2015-10-23 03:45:47 +11:00 |
|
Matthew Honnibal
|
3be94be0c0
|
Merge pull request #148 from maxirmx/master
Utf8 encoding for lemma_rules.json
|
2015-10-22 21:46:28 +11:00 |
|
Matthew Honnibal
|
c86bda8d1a
|
* Fix import of uget
|
2015-10-22 21:13:56 +11:00 |
|
Matthew Honnibal
|
2348a08481
|
* Load/dump strings with a json file, instead of the hacky strings file we were using.
|
2015-10-22 21:13:03 +11:00 |
|
Matthew Honnibal
|
9baf0abd59
|
* Save vocab after training.
|
2015-10-22 21:09:14 +11:00 |
|
maxirmx
|
f07e4accd7
|
Fixing encoding issue #4
|
2015-10-21 20:45:56 +03:00 |
|
maxirmx
|
fcbfff043f
|
Fixing encoding issue #3
|
2015-10-21 15:52:34 +03:00 |
|
maxirmx
|
fe9d2e2c4e
|
Fixing encode issue #2
|
2015-10-21 15:36:21 +03:00 |
|
maxirmx
|
e4a1726f77
|
Fixing encoding issue
UTF-8
|
2015-10-21 14:16:37 +03:00 |
|
Andreas Grivas
|
93ada458e2
|
added __repr__ that prints text in ipython for doc, token, and span objects
|
2015-10-21 14:11:46 +03:00 |
|
Henning Peters
|
ccffd2ef53
|
fixed extract directory
|
2015-10-21 07:59:34 +02:00 |
|
Henning Peters
|
da4c9cee06
|
assert filename match
|
2015-10-20 19:33:59 +02:00 |
|
Henning Peters
|
4f703f0cb4
|
better error reporting, cleanup
|
2015-10-20 19:11:29 +02:00 |
|
Matthew Honnibal
|
9cdea6e450
|
* Import uget correctly
|
2015-10-19 08:32:41 +02:00 |
|
Matthew Honnibal
|
6727a46bb5
|
* Fix Issue #118: Matcher behaves unpredictably when matches overlap.
|
2015-10-19 16:45:32 +11:00 |
|
Matthew Honnibal
|
135062d23c
|
* Fix error with merged text when merged region did not have trailing whitespace
|
2015-10-19 15:47:04 +11:00 |
|
Henning Peters
|
bfde91fa49
|
add custom download tool (uget), replace wget with uget
|
2015-10-18 12:35:04 +02:00 |
|
Matthew Honnibal
|
9839cd2c0b
|
* Fix whitespace_ calculation in Token
|
2015-10-18 17:21:11 +11:00 |
|
Matthew Honnibal
|
c99285b8b9
|
* Clean up C++ usage in spacy/matcher.pyx
|
2015-10-18 17:20:50 +11:00 |
|
Matthew Honnibal
|
a7e6c5ac8f
|
* Fix Issue #122: Incorrect calculation of children after Doc.merge()
|
2015-10-18 17:17:27 +11:00 |
|