Matthew Honnibal
2169bbb7ea
* Shadow StateClass with StateC, to start proxying
2016-02-01 01:16:14 +01:00
Matthew Honnibal
2fa228458e
* Add _state file, which StateClass will proxy to
2016-02-01 01:09:21 +01:00
Matthew Honnibal
6bb007d16e
* Make set_parse nogil
2016-01-30 20:27:52 +01:00
Matthew Honnibal
9410e74c92
* Switch parser to use nogil functions
2016-01-30 20:27:07 +01:00
Matthew Honnibal
10877a7791
* Update for thinc 5.0, including changing cost from int to weight_t, and updating the tagger and parser
2016-01-30 14:31:36 +01:00
Matthew Honnibal
ea4ff94cde
* Whitespace
2016-01-29 03:59:22 +01:00
Matthew Honnibal
b0718b6ee1
* Move to thinc 5.0
2016-01-29 03:58:55 +01:00
Matthew Honnibal
9721502c81
* Update version
2016-01-25 15:52:59 +01:00
Matthew Honnibal
907e8cf07d
* Add u prefix to string in web example
2016-01-25 15:51:38 +01:00
Matthew Honnibal
eba03695ef
* Comment out pickle tests
2016-01-25 15:51:13 +01:00
Matthew Honnibal
de94e6c525
* Mark pickle tests as xfail, due to temp files problem
2016-01-25 15:24:17 +01:00
Matthew Honnibal
87172a15c6
* Fix runtime error bug that arose from updated Span.root function.
2016-01-25 15:22:42 +01:00
Matthew Honnibal
2c8dd91785
* Fix first code example on the website
2016-01-23 18:09:19 +01:00
Matthew Honnibal
3af84cfd6e
* Increment version
2016-01-21 17:49:27 +01:00
Henning Peters
65aeac24cb
remove package version constraint
2016-01-21 17:40:51 +01:00
Matthew Honnibal
792c98a438
* Increment version for OSX-fixed release of v0.100
2016-01-21 00:23:04 +01:00
Matthew Honnibal
82d011ac43
* Fix test for whitespace
2016-01-19 20:38:26 +01:00
Matthew Honnibal
e89069dcae
* Fix matcher test
2016-01-19 20:24:01 +01:00
Matthew Honnibal
63e3d4e27f
* Add comment on Vocab.__reduce__
2016-01-19 20:11:25 +01:00
Matthew Honnibal
e1282b7f2f
* Require user-custom NER classes to work without adding the label.
2016-01-19 20:11:03 +01:00
Matthew Honnibal
84c5dfbfc3
* Clean up debugging python list
2016-01-19 20:10:32 +01:00
Matthew Honnibal
04d0686b26
* Make TransitionSystem.add_action idempotent, i.e. ignore duplicate added actions.
2016-01-19 20:10:04 +01:00
Matthew Honnibal
c4a89d56bd
* Automatically register any entity types pre-set on the tokens, so that the NER works with user-given entity types.
2016-01-19 20:09:26 +01:00
Matthew Honnibal
f0f92793f6
* Add test for user NER classes in matcher blocking the NER model. Re Issue #178 and Issue #217
2016-01-19 19:23:16 +01:00
Matthew Honnibal
65c5bc4988
* Add add_label method, to allow users to register new entity types and dependency labels.
2016-01-19 19:11:02 +01:00
Matthew Honnibal
151aa0b0e2
* Allow users to add_label, in order to extend the entity recogniser to new classes. Does not by itself add a class to the model
2016-01-19 19:09:33 +01:00
Matthew Honnibal
c8e0011ebc
* Add iterators to the NER and parser transition systems, to get the action types
2016-01-19 19:07:43 +01:00
Matthew Honnibal
515493c675
* Add xfail test for Issue #225: tokenization with non-whitespace delimiters
2016-01-19 13:20:14 +01:00
Matthew Honnibal
7abe653223
* Fix imports
2016-01-19 03:36:51 +01:00
Matthew Honnibal
590f38bdb2
* Add hacky solution to Issue #220. Currently specials.json only supports literal patterns, which doesn't allow us to pre-tag whitespace with the correct tag, SP, as a rule. The data-driven approach should be easy but for some reason fails here. Hard-coding this in Morphology isn't a good solution, but we want to fix the behaviour right away rather than wait for an architecturally better solution.
2016-01-19 03:35:20 +01:00
Matthew Honnibal
445164d5b4
* Restore the LOCAL_DATA_DIR global in spacy/en/__init__.py, although this is now deprecated
2016-01-19 02:54:56 +01:00
Matthew Honnibal
04177debd0
* Unwind limit to sentence boundary detection that prevents it from inserting boundaries on whitespace. Replace it with a check for whitespace in StateClass.fast_forward, so that whitespace is LeftArced when it's on the stack. This should prevent the previous problem of whitespace-only sentences. Should fix Issue #184, but may cause further problems. Needs testing.
2016-01-19 02:54:15 +01:00
Matthew Honnibal
7893de3203
* Add test for Issue #184: Whitespace at sentence boundary causes sentence boundary error.
2016-01-18 23:04:38 +01:00
Matthew Honnibal
bba0a5e078
* Handle string paths in default_vocab, default_parser, default_entity in Language class
2016-01-18 22:37:24 +01:00
Matthew Honnibal
e825fd9554
* Make some of the website tests work without models
2016-01-18 18:14:44 +01:00
Matthew Honnibal
334c4b2b57
* Disprefer punctuation and spaces as heads of spans
2016-01-18 18:14:09 +01:00
Matthew Honnibal
bed36ab0ff
* Fix import of HEAD attribute
2016-01-18 17:34:43 +01:00
Matthew Honnibal
28c659c1fe
* Fix import for numpy
2016-01-18 17:25:04 +01:00
Matthew Honnibal
fc36bcf458
* Fix import for English
2016-01-18 17:14:40 +01:00
Matthew Honnibal
cc4c335e14
* Set heads for test_merge_tokens, to make the test run without models
2016-01-18 17:00:11 +01:00
Matthew Honnibal
c107da9738
* Bug fix to _count_words_to_root
2016-01-18 16:59:38 +01:00
Matthew Honnibal
f24833d607
* Fix merge for coordinations
2016-01-18 16:03:19 +01:00
Matthew Honnibal
14534958a9
* Fix bug in Span.root
2016-01-18 15:40:28 +01:00
Matthew Honnibal
714cbc03d5
* Add test for Issue #203: nested noun chunks.
2016-01-16 18:02:30 +01:00
Matthew Honnibal
4e2253170c
* Move test for doc.merge to tokens_api file, to avoid name conflicts which upset pytest
2016-01-16 18:01:36 +01:00
Matthew Honnibal
34a157511f
* Move test_merge_hang to test_tokens_api
2016-01-16 18:00:26 +01:00
Matthew Honnibal
fc8f26584a
* Don't consider NPs connected to parse via conj relation as noun chunks. Change motivated by the nested noun chunks identified in Issue #203, but might be problematic. Also allow root NPs to be considered noun chunks.
2016-01-16 17:52:40 +01:00
Matthew Honnibal
4a16dbfeca
* Add test for Issue #203: noun chunks should be flat, but sometimes are nested
2016-01-16 17:41:25 +01:00
Matthew Honnibal
995b2d18fd
* Route token.string via token.text_with_ws, to deprecate token.string in future
2016-01-16 17:14:34 +01:00
Matthew Honnibal
54a98eaf19
* Fix typo text_wth_ws --> text_with_ws. Reroute .string attribute to text_with_ws, to deprecate .string in future
2016-01-16 17:13:50 +01:00
Matthew Honnibal
3e9961d2c4
* If final token is whitespace, don't mark it as owning a trailing space. Fixes Issue #154
2016-01-16 17:08:59 +01:00
Matthew Honnibal
223d2b3484
* Add test for Issue #154: Additional whitespace introduced when string ends with a whitespace token.
2016-01-16 17:08:07 +01:00
Matthew Honnibal
3dc398b727
* Fix merge conflict in requirements.txt
2016-01-16 16:20:49 +01:00
Matthew Honnibal
fc5962a77d
* Improve test for root token in Span
2016-01-16 16:19:09 +01:00
Matthew Honnibal
c025a0c64b
* Check for KeyboardInterrupt in parser.__call__
2016-01-16 16:18:44 +01:00
Matthew Honnibal
03e8a4293d
* Add loop guard to Token.lefts and Token.rights properties
2016-01-16 16:18:17 +01:00
Matthew Honnibal
304339985e
* Add a linear scan to Span.root method, to help with long sentences
2016-01-16 16:17:28 +01:00
Matthew Honnibal
aa0dd79f52
* Delete test_token_references, which checked a flakey strategy for preventing orphan tokens from a while ago. Now orphan tokens simply hold a reference to Pool, preventing the memory from being freed underneath them. This means that we don't need to run this slow test.
2016-01-16 16:03:35 +01:00
Matthew Honnibal
8cbcc3a799
* Fix calculation of root token in Span. Now take root to be word with shortest tree path. Avoids parse trees ending up in inconsistent state, as had occurred in Issue #214.
2016-01-16 15:38:50 +01:00
Matthew Honnibal
c1039fa4b4
* Add test for Issue #214. Resolved in change to Span.root
2016-01-16 15:37:47 +01:00
Henning Peters
41ea14a56f
fix pickling
2016-01-16 13:23:11 +01:00
Henning Peters
5551052840
fix py2/3 issue
2016-01-16 12:44:53 +01:00
Henning Peters
235f094534
untangle data_path/via
2016-01-16 12:23:45 +01:00
Matthew Honnibal
42a9f29b40
* Add loop guard in Span.root, to raise errors if there is a cycle in the dependency parse, instead of entering an infinite loop. Re Issue #214
2016-01-16 11:53:37 +01:00
Henning Peters
6d1a3af343
cleanup unused
2016-01-16 10:05:04 +01:00
Henning Peters
846fa49b2a
distinct load() and from_package() methods
2016-01-16 10:00:57 +01:00
Henning Peters
211913d689
add about.py, adapt setup.py
2016-01-15 18:57:01 +01:00
Henning Peters
f8a8f97d25
cleanup
2016-01-15 18:13:37 +01:00
Henning Peters
780cb847c9
add default_model to about
2016-01-15 18:07:15 +01:00
Henning Peters
788f734513
refactored data_dir->via, add zip_safe, add spacy.load()
2016-01-15 18:01:02 +01:00
Matthew Honnibal
478a79a3d5
* Add test for Issue #220: Whitespace being tagged as noun
2016-01-15 16:17:07 +01:00
Henning Peters
d9471f684f
fix typo
2016-01-14 12:14:12 +01:00
Henning Peters
9b75d872b0
fix model download
2016-01-14 12:02:56 +01:00
Henning Peters
bc229790ac
integrate with sputnik
2016-01-13 19:46:17 +01:00
Matthew Honnibal
3fbfba575a
* xfail the contractions test
2015-12-31 13:16:28 +01:00
Matthew Honnibal
3bd910ccad
* Merge therell test
2015-12-31 11:55:18 +01:00
Matthew Honnibal
eaf2ad59f1
* Fix use of mock Package object
2015-12-31 04:13:15 +01:00
Matthew Honnibal
029136a007
* Fix resource loading for Matcher
2015-12-31 02:45:12 +01:00
Matthew Honnibal
55bcdf8bdd
* Fix errors
2015-12-29 22:32:03 +01:00
Matthew Honnibal
a6ba43ecaf
* Fix errors in packaging revision
2015-12-29 18:37:26 +01:00
Matthew Honnibal
4b4eec8b47
* Fix Issue #201: Tokenization of there'll
2015-12-29 18:09:09 +01:00
Matthew Honnibal
86ee9d046d
* Remove test that belongs to a change for master
2015-12-29 18:07:23 +01:00
Matthew Honnibal
a2dfdec85d
* Clean up spacy.util
2015-12-29 18:06:09 +01:00
Matthew Honnibal
aec130af56
Use util.Package class for io
The previous Sputnik integration caused an API change: Vocab, Tagger, etc.
were loaded via a from_package classmethod, that required a
sputnik.Package instance. This forced users to first create a
sputnik.Sputnik() instance, in order to acquire a Package via
sp.pool().
Instead I've created a small file-system shim, util.Package, which
allows classes to have a .load() classmethod, that accepts either
util.Package objects, or strings. We can later gut the internals
of this and make it a proxy for Sputnik if we need more functionality
that should live in the Sputnik library.
Sputnik is now only used to download and install the data, in
spacy.en.download
2015-12-29 18:00:48 +01:00
Matthew Honnibal
0e2498da00
* Replace from_package with load() classmethod in Vocab
2015-12-29 16:56:51 +01:00
Matthew Honnibal
c5902f2b4b
* Upd Lemmatizer to use MockPackage. Replace from_package with load() classmethod
2015-12-29 16:56:02 +01:00
Matthew Honnibal
4131e45543
* Add MockPackage class, to see whether we can proxy for Sputnik in a lightweight way
2015-12-29 16:55:03 +01:00
Matthew Honnibal
f5dea1406d
* Fix silly mistake in Language.__init__
2015-12-28 18:48:57 +01:00
Matthew Honnibal
187960606f
* Fix pickle problems
2015-12-28 16:54:03 +01:00
Matthew Honnibal
8c7e149ec9
* Replace kwargs argument of Language.__init__ with explicit arguments, to fix pickle bug
2015-12-28 15:56:27 +01:00
Henning Peters
32d655b6e1
bump version
2015-12-28 09:34:39 +01:00
Matthew Honnibal
8b61d45ed0
* Fix merge conflicts for headers branch
2015-12-27 17:46:25 +01:00
Matthew Honnibal
6bb9c7f311
Merge pull request #202 from henningpeters/sputnik
access model via sputnik
2015-12-28 03:29:53 +11:00
Henning Peters
0e321a7105
get mingw32 to work
2015-12-22 23:25:38 +01:00
Henning Peters
d8d348bb55
allow specifying a version constraint within model name
2015-12-18 19:12:08 +01:00
Henning Peters
7f7299cafb
Merge branch 'tmpdir' into headers
2015-12-18 12:25:25 +01:00
Henning Peters
cfa187aaf0
fix tests
2015-12-18 10:58:02 +01:00
Henning Peters
8359bd4d93
strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible
2015-12-18 09:52:55 +01:00
Henning Peters
970278a3d6
no need to link data dir anymore
2015-12-18 09:49:45 +01:00
Henning Peters
4f3efb8eaf
avoid writing to /tmp (not cross-platform compatible)
2015-12-16 19:56:40 +01:00
Henning Peters
4ada39f472
avoid writing to /tmp (not cross-platform compatible)
2015-12-16 19:53:06 +01:00
Henning Peters
2d4efe40f9
fix sputnik call
2015-12-13 14:46:08 +01:00
Henning Peters
ac318b568c
new approach to dependency headers
2015-12-13 11:49:17 +01:00
Henning Peters
345dda6f53
small fixes, add package build step
2015-12-07 06:50:26 +01:00
Henning Peters
9027cef3bc
access model via sputnik
2015-12-07 06:01:28 +01:00
Henning Peters
73e5650be5
change index server
2015-11-18 18:09:46 +01:00
Henning Peters
50d15ea5d2
fix
2015-11-18 17:35:21 +01:00
Henning Peters
02a1dcec76
add data dir
2015-11-18 11:48:55 +01:00
Henning Peters
919a4f0b04
change data path, add repository
2015-11-18 11:40:46 +01:00
Henning Peters
12de895e60
fix version
2015-11-15 16:38:16 +01:00
Henning Peters
03d2f98cd5
add sputnik
2015-11-15 15:58:21 +01:00
Matthew Honnibal
ec7d36c3a4
* Add test for matcher end-point problem
2015-11-12 05:00:40 +11:00
Matthew Honnibal
d309622a27
* Add test for matcher end-point problem
2015-11-12 04:59:11 +11:00
Matthew Honnibal
56ea20a886
* Add test for matcher end-point problem
2015-11-12 04:58:53 +11:00
Matthew Honnibal
cfa4062147
* Add test for matcher end-point problem
2015-11-12 04:56:07 +11:00
Matthew Honnibal
5623242b3e
* Adjust NER rules, so that U entries in gazetteer don't become B moves to the model
2015-11-12 04:48:23 +11:00
Matthew Honnibal
d67d7d5a86
* Add test for NER inconsistency bug
2015-11-08 16:19:33 +01:00
Matthew Honnibal
44fbdc7260
* Fix bug in NER transition system, that sometimes left no valid moves
2015-11-08 16:19:12 +01:00
Matthew Honnibal
ab5aac5b2f
* Add .rank property to Token and Lexeme, for frequency rank
2015-11-08 16:18:25 +01:00
Matthew Honnibal
fde9a22ec2
* Add new test for ner
2015-11-08 13:57:15 +01:00
Matthew Honnibal
e92371bb54
* Fix rule that made Last action invalid if there was a preset of O, since if the entity is already open, that ship has sailed.
2015-11-08 22:17:51 +11:00
Matthew Honnibal
3b74739c3e
* Download updated data
2015-11-08 21:24:25 +11:00
Matthew Honnibal
31da42eb27
* Mark tests that require models
2015-11-07 19:27:38 +11:00
Matthew Honnibal
8e26a28616
* Mark tests that require models
2015-11-07 19:10:56 +11:00
Matthew Honnibal
15eab7354f
* Remove extraneous test files
2015-11-07 18:45:13 +11:00
Matthew Honnibal
6f47074214
* Make constructor of ParserModel and TaggerModel the same as AveragedPerceptron, for easy pickling.
2015-11-07 18:25:17 +11:00
Matthew Honnibal
1cfa20fb17
* Fix sentence-final whitespace issue
2015-11-07 17:34:46 +11:00
Matthew Honnibal
7663970d5f
* Removed unused i variable from Span, and set attributes to read-only
2015-11-07 17:06:15 +11:00
Matthew Honnibal
4b3c96d76d
* Fix zero-length spans
2015-11-07 17:05:16 +11:00
Matthew Honnibal
888c05a7fa
* Fix variable naming in StepwiseState, for thinc 4.0
2015-11-07 11:02:44 +11:00
Matthew Honnibal
fc2185bfe3
* Fix variable naming in StepwiseState, for thinc 4.0
2015-11-07 10:48:31 +11:00
Matthew Honnibal
954442a807
* Fix variable naming in StepwiseState, for thinc 4.0
2015-11-07 10:30:45 +11:00
Matthew Honnibal
06f26d258e
* Fix test_basic_create
2015-11-07 10:04:37 +11:00
Matthew Honnibal
1d3884c46d
* Fix test_basic_create
2015-11-07 10:03:56 +11:00
Matthew Honnibal
cc8febcbe1
* Fix Span comparison
2015-11-07 09:54:14 +11:00
Matthew Honnibal
af70dc166a
* Fix Last restriction, that was supposed to prevent conflicts with presets, but was incorrect.
2015-11-07 09:52:00 +11:00
Matthew Honnibal
a9b612abdf
* Rework the Span-merge patch, to avoid extending the interface of Doc, and avoid virtualizing the Span.start and Span.end indices, to keep Span usage efficient
2015-11-07 09:01:12 +11:00
Matthew Honnibal
56499d89ef
* Rework the Span-merge patch, to avoid extending the interface of Doc, and avoid virtualizing the Span.start and Span.end indices, to keep Span usage efficient
2015-11-07 08:55:34 +11:00
Andreas Grivas
83ca4e0b93
* use old merge tests - add more
2015-11-07 07:57:04 +11:00
Andreas Grivas
4be7fda453
* span start, end -> properties. autoupdate after merge
2015-11-07 07:57:04 +11:00
Andreas Grivas
562db6d2d0
* merge add lex last - add index finder funcs
2015-11-07 07:57:04 +11:00
Matthew Honnibal
a06e3c8963
* Fix bone-headed mistake in StateClass.E
2015-11-07 07:35:28 +11:00
Matthew Honnibal
d24b8509e4
* Correct screw ups from the previous commits
2015-11-07 06:51:41 +11:00
Matthew Honnibal
5efad178b5
* Set ent tag when close entity
2015-11-07 06:09:25 +11:00
Matthew Honnibal
9285f01d26
* Fix broken StateClass.E tracking
2015-11-07 06:06:39 +11:00
Matthew Honnibal
19136b0e7d
* Add better debug message for illegal move
2015-11-07 05:34:37 +11:00
Matthew Honnibal
2733816b7b
* Fix whitespace
2015-11-07 05:31:06 +11:00
Matthew Honnibal
01ab464383
* Prevent Begin and In moves from applying in NER if we're at the last token of a sentence, as this would mean the entity would span over a sentence boundary. Re Issue #169
2015-11-07 05:30:44 +11:00
Matthew Honnibal
b65633f270
* Fix function that returns nth entity in StateClass. Was only returning the first.
2015-11-07 05:29:11 +11:00
Matthew Honnibal
410b6f9ec1
* Remove deprecated _ml.pyx. We now use the nicer APIs provided by thinc 4.0, and subclass the AveragedPerceptron class.
2015-11-07 05:13:10 +11:00
Matthew Honnibal
3c162dcac3
* Refactor away from the _ml module, to use thinc 4.0. Still some work needs to be done, e.g. to add __reduce__ to the models, more testing, etc.
2015-11-07 03:24:30 +11:00
Matthew Honnibal
9d1b2a103a
* Fix capitalization in lemmatizer
2015-11-06 05:44:35 +11:00
Matthew Honnibal
6ed3aedf79
* Merge vocab changes
2015-11-06 00:48:08 +11:00
Matthew Honnibal
72abbb43fb
* Add type declarations in strings.pyx
2015-11-06 00:47:26 +11:00
Matthew Honnibal
5b2af4864f
* When lemmatizing non-noun, non-verb, non-adj words, output lower-case
2015-11-06 00:45:09 +11:00
Matthew Honnibal
754bf04162
* Remove declaration of Model.update
2015-11-06 00:31:15 +11:00
Matthew Honnibal
e18bdff23a
Merge branch 'master' of ssh://github.com/honnibal/spaCy
2015-11-06 00:26:15 +11:00
Matthew Honnibal
b9991fbd20
* Update to use thinc 3.0
2015-11-06 00:25:59 +11:00
Matthew Honnibal
864a8f45d8
* Use unicode in StringStore.intern, instead of unreliably casting to bytes.
2015-11-05 11:32:19 +00:00
Matthew Honnibal
b18204cd52
* Fix StringStore._realloc, re Issue #155
2015-11-05 11:28:26 +00:00
Matthew Honnibal
f8004c5f65
* Begin upgrading to improved thinc API
2015-11-05 03:53:03 +11:00
Matthew Honnibal
adc7bbd6cf
* Fix name of like_num in default_lex_attrs
2015-11-04 22:02:47 +11:00
Matthew Honnibal
e96faf29e7
* Rename like_number to like_num, to fix inconsistency re Issue #166
2015-11-04 22:01:44 +11:00
Matthew Honnibal
65934b7cd4
* Enforce import of ujson in strings.pyx, because otherwise it's too slow
2015-11-04 00:32:02 +11:00
Matthew Honnibal
1ce5d5602d
* Rename Doc.data to Doc.c
2015-11-04 00:17:13 +11:00
Matthew Honnibal
68f479e821
* Rename Doc.data to Doc.c
2015-11-04 00:15:14 +11:00
Matthew Honnibal
3ddea19b2b
* Rename spans.pyx to span.pyx
2015-11-04 00:14:40 +11:00
Matthew Honnibal
9482d616bc
* Rename spans.pyx to span.pyx
2015-11-03 23:51:05 +11:00
Matthew Honnibal
116da5990a
* Clean up setting of tag in doc.from_bytes
2015-11-03 23:48:57 +11:00
Matthew Honnibal
9ec7b9c454
* Clean up unused Constituent struct.
2015-11-03 23:48:21 +11:00
Matthew Honnibal
1e99fcd413
* Rename .repvec to .vector in C API
2015-11-03 23:47:59 +11:00
Matthew Honnibal
ee3f9ba581
* Fix test of serializer
2015-11-03 19:45:16 +11:00
Matthew Honnibal
d06ba26371
* Fix test of serializer
2015-11-03 19:43:27 +11:00
Matthew Honnibal
4083059650
Merge branch 'master' of https://github.com/honnibal/spaCy
2015-11-03 09:07:19 +01:00
Matthew Honnibal
9e37437ba8
* Fix assign_tag in doc.merge
2015-11-03 19:07:02 +11:00
Matthew Honnibal
dde9e1357c
* Add todo to morphology.lemmatize
2015-11-03 18:54:35 +11:00
Matthew Honnibal
ffedff9e6c
* Remove the archive after download, to save disk space
2015-11-03 18:54:05 +11:00
Matthew Honnibal
85372468e3
* Fix serialize test
2015-11-03 08:51:33 +01:00
Matthew Honnibal
833eb35c57
* Fix tag assignment in doc.from_array
2015-11-03 18:45:54 +11:00
Matthew Honnibal
09664177d7
* Fix tag handling in doc.merge, and assign sent_start when setting heads.
2015-11-03 18:15:52 +11:00
Matthew Honnibal
389a373807
Merge branch 'master' of ssh://github.com/honnibal/spaCy
2015-11-03 18:07:25 +11:00
Matthew Honnibal
3f44b3e43f
* Mark serializer test as requiring models
2015-11-03 18:07:08 +11:00
Matthew Honnibal
25ed7be8f8
Merge branch 'master' of https://github.com/honnibal/spaCy
2015-11-03 07:58:17 +01:00
Matthew Honnibal
604ceac4c6
* Fix morphological assignment in doc.merge()
2015-11-03 17:57:51 +11:00
Matthew Honnibal
5e040855a5
* Ensure morphological features and lemmas are loaded in from_array, re Issue #152
2015-11-03 17:56:50 +11:00
Matthew Honnibal
5668feb235
* Fix pickle test for python3
2015-11-03 04:57:02 +01:00
Matthew Honnibal
6161d2529a
Merge branch 'master' of ssh://github.com/honnibal/spaCy
2015-11-03 13:36:30 +11:00
Matthew Honnibal
5887506f5d
* Don't expect lexemes.bin in Vocab
2015-11-03 13:23:39 +11:00
Matthew Honnibal
f7dd377575
* Adjust conjuncts iterator in Token
2015-11-03 13:23:22 +11:00
Andreas Grivas
d418f00eb1
fixed error when printing unicode
2015-11-02 20:23:18 +02:00
Matthew Honnibal
52fc338001
* Set is_parsed and is_tagged attrs when loading annotations into Doc, re Issue #152
2015-10-28 10:43:22 +11:00
Matthew Honnibal
1c0356e4c2
* Set test file mode to w+t
2015-10-26 22:40:48 +11:00
Matthew Honnibal
0fe98f358b
* Fix mode on text file for Python3 in strings test
2015-10-26 22:25:16 +11:00
Matthew Honnibal
8ba9cf905e
* Fix mode on text file for Python3 in strings test
2015-10-26 21:44:34 +11:00
Matthew Honnibal
a0730699b1
* Fix mode on text file for Python3 in strings test
2015-10-26 21:25:56 +11:00
Matthew Honnibal
725344d349
* Fix tempfile in test
2015-10-26 21:08:18 +11:00
Matthew Honnibal
f11030aadc
* Remove out-dated TODO comment
2015-10-26 12:33:38 +11:00
Matthew Honnibal
a371a1071d
* Save and load word vectors during pickling, re Issue #125
2015-10-26 12:33:04 +11:00
Matthew Honnibal
a824a98312
* Add tests for pickling vectors, re: Issue #125
2015-10-26 12:31:05 +11:00
Matthew Honnibal
314090cc78
* Set vectors length when unpickling vocab, re Issue #125
2015-10-26 12:05:08 +11:00
Matthew Honnibal
4e16f9e435
* Move tests underneath spacy/
2015-10-26 00:07:31 +11:00
Matthew Honnibal
3a6e48e814
Merge pull request #149 from chrisdubois/pickle-patch
Add __reduce__ to Tokenizer so that English pickles.
2015-10-25 15:30:31 +11:00
Chris DuBois
dac8fe7bdb
Add __reduce__ to Tokenizer so that English pickles.
- Add tests to test_pickle and test_tokenizer that save to tempfiles.
2015-10-23 22:24:03 -07:00
Matthew Honnibal
ff4fe524ee
* Fix exception for python 2
2015-10-23 01:56:13 +02:00
Matthew Honnibal
341a3e85cd
* Upd downloaded data version
2015-10-23 00:56:57 +02:00
Matthew Honnibal
f18fd8c659
* Fix language.py for change in StringStore load API
2015-10-23 03:48:12 +11:00
Matthew Honnibal
23855db3ca
Merge branch 'master' of ssh://github.com/honnibal/spaCy into develop
2015-10-23 03:46:09 +11:00
Matthew Honnibal
4f13849065
Merge pull request #145 from henningpeters/master
better error reporting, cleanup
2015-10-23 03:45:47 +11:00
Matthew Honnibal
3be94be0c0
Merge pull request #148 from maxirmx/master
Utf8 encoding for lemma_rules.json
2015-10-22 21:46:28 +11:00
Matthew Honnibal
c86bda8d1a
* Fix import of uget
2015-10-22 21:13:56 +11:00
Matthew Honnibal
2348a08481
* Load/dump strings with a json file, instead of the hacky strings file we were using.
2015-10-22 21:13:03 +11:00
Matthew Honnibal
9baf0abd59
* Save vocab after training.
2015-10-22 21:09:14 +11:00
maxirmx
f07e4accd7
Fixing encoding issue #4
2015-10-21 20:45:56 +03:00
maxirmx
fcbfff043f
Fixing encoding issue #3
2015-10-21 15:52:34 +03:00
maxirmx
fe9d2e2c4e
Fixing encoding issue #2
2015-10-21 15:36:21 +03:00
maxirmx
e4a1726f77
Fixing encoding issue
UTF-8
2015-10-21 14:16:37 +03:00
Andreas Grivas
93ada458e2
added __repr__ that prints text in ipython for doc, token, and span objects
2015-10-21 14:11:46 +03:00
Henning Peters
ccffd2ef53
fixed extract directory
2015-10-21 07:59:34 +02:00
Henning Peters
da4c9cee06
assert filename match
2015-10-20 19:33:59 +02:00
Henning Peters
4f703f0cb4
better error reporting, cleanup
2015-10-20 19:11:29 +02:00
Matthew Honnibal
9cdea6e450
* Import uget correctly
2015-10-19 08:32:41 +02:00
Matthew Honnibal
6727a46bb5
* Fix Issue #118: Matcher behaves unpredictably when matches overlap.
2015-10-19 16:45:32 +11:00
Matthew Honnibal
135062d23c
* Fix error with merged text when merged region did not have trailing whitespace
2015-10-19 15:47:04 +11:00
Henning Peters
bfde91fa49
add custom download tool (uget), replace wget with uget
2015-10-18 12:35:04 +02:00
Matthew Honnibal
9839cd2c0b
* Fix whitespace_ calculation in Token
2015-10-18 17:21:11 +11:00
Matthew Honnibal
c99285b8b9
* Clean up C++ usage in spacy/matcher.pyx
2015-10-18 17:20:50 +11:00
Matthew Honnibal
a7e6c5ac8f
* Fix Issue #122: Incorrect calculation of children after Doc.merge()
2015-10-18 17:17:27 +11:00
Matthew Honnibal
3ba66f2dc7
* Add string length cap in Tokenizer.__call__
2015-10-16 04:54:16 +11:00
Matthew Honnibal
6e0f985afc
* Fix token.conjuncts
2015-10-15 03:49:45 +11:00
Matthew Honnibal
2e0104ac81
* Fix token.conjuncts
2015-10-15 03:47:45 +11:00
Matthew Honnibal
b8f3345a82
* Fix token.conjuncts method
2015-10-15 03:36:01 +11:00
Matthew Honnibal
23818f89b8
* Fix token.conjuncts method
2015-10-15 03:34:57 +11:00
Matthew Honnibal
7a15d1b60c
* Add Python 2/3 compatibility fix for copy_reg
2015-10-13 20:04:40 +11:00
Matthew Honnibal
329ae57520
* Fix whitespace attachment thing
2015-10-13 09:46:38 +02:00
Matthew Honnibal
37919eac82
* Fix whitespace attachment in simpler way. Leaves problem with setting left/right children.
2015-10-13 18:23:24 +11:00
Matthew Honnibal
c70eb776ae
* Fix whitespace attachment, so that left/right children are consistent with head.
2015-10-13 15:58:22 +11:00
Matthew Honnibal
531182f937
* Fix Model.__reduce__
2015-10-13 15:14:38 +11:00
Matthew Honnibal
6c227a6c1f
* Fix Model.__reduce__
2015-10-13 15:10:04 +11:00
Matthew Honnibal
358c82595c
* Fix NAMES list in spacy/parts_of_speech.pyx
2015-10-13 14:18:45 +11:00
Matthew Honnibal
c1fdc487bc
Merge branch 'attrs'
2015-10-13 14:03:41 +11:00
Matthew Honnibal
e886e6a406
* Inc version
2015-10-13 13:46:17 +11:00
Matthew Honnibal
20fd36a0f7
* Very scrappy, likely buggy first-cut pickle implementation, to work on Issue #125: allow pickle for Apache Spark. The current implementation sends stuff to temp files, and does almost nothing to ensure all modifiable state is actually preserved. The Language() instance is a deep tree of extension objects, and if pickling during training, some of the C-data state is hard to preserve.
2015-10-13 13:44:41 +11:00
Matthew Honnibal
f8de403483
* Work on pickling Vocab instances. The current implementation is not correct, but it may serve to see whether this approach is workable. Pickling is necessary to address Issue #125
2015-10-13 13:44:41 +11:00
Matthew Honnibal
85e7944572
* Start trying to pickle Vocab
2015-10-13 13:44:41 +11:00
Matthew Honnibal
5ca57bd859
* Ensure Morphology can be pickled, to address Issue #125.
2015-10-13 13:44:41 +11:00
Matthew Honnibal
0cee928467
* Allow StringStore to be pickled, to start addressing Issue #125
2015-10-13 13:44:41 +11:00
Matthew Honnibal
41012907a8
* Fix variable name
2015-10-13 13:44:40 +11:00
Matthew Honnibal
e70368d157
* Use lower case strings for dependency label names in symbols enum
2015-10-13 13:44:40 +11:00
Matthew Honnibal
7b4af3d1e7
* Fix parts_of_speech now that symbols list has been reformed
2015-10-13 13:44:40 +11:00
Matthew Honnibal
37b909b6b6
* Use the symbols file in vocab instead of the symbols subfiles like attrs.pxd
2015-10-13 13:44:40 +11:00