Commit Graph

3976 Commits

Author SHA1 Message Date
Matthew Honnibal
6ceb0f0518 Allow Lexeme.rank to be set 2017-08-24 21:43:00 +02:00
Matthew Honnibal
44a1fa80d3 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-23 13:02:16 +02:00
ines
bb1abbeba5 Only link model if download was successfull 2017-08-23 12:36:31 +02:00
Matthew Honnibal
bb2541ffd3 Fix PROB attr for OOV words 2017-08-23 12:11:52 +02:00
Matthew Honnibal
1c5c256e58 Fix fine_tune when optimizer is None 2017-08-23 10:51:33 +02:00
Matthew Honnibal
9c580ad28a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-22 17:02:04 -05:00
Matthew Honnibal
a4633fff6f Restore use of batch norm in model 2017-08-22 17:01:58 -05:00
Matthew Honnibal
03b5b9727a Fix Doc.vector for empty doc objects 2017-08-22 19:52:19 +02:00
Matthew Honnibal
0551b7b03a Fix doc.vector 2017-08-22 19:46:52 +02:00
Matthew Honnibal
83f8e98450 Fix retrieval of OOV vectors 2017-08-22 19:46:35 +02:00
Matthew Honnibal
df2745eb08 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-22 19:00:43 +02:00
Matthew Honnibal
5b329acbf2 Fix vectors_length property in vocab 2017-08-22 19:00:27 +02:00
Matthew Honnibal
1fe605dfe5 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-21 19:18:31 -05:00
Matthew Honnibal
18b64e79ec Fix fine tuning 2017-08-21 19:18:26 -05:00
Matthew Honnibal
682346dd66 Restore optimized hidden_depth=0 for parser 2017-08-21 19:18:04 -05:00
Matthew Honnibal
a21d8f3f0b Add predict paths to _ml models 2017-08-21 23:23:45 +02:00
Matthew Honnibal
cec76801dc Add profile command to CLI 2017-08-21 23:23:05 +02:00
Matthew Honnibal
7be5f30f17 Add profile function 2017-08-21 23:22:49 +02:00
ines
a68dc891ea Port over changes from #1281 2017-08-21 23:19:18 +02:00
Matthew Honnibal
5e50a65252 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-21 14:15:46 -05:00
Matthew Honnibal
80acbc5f1f Fix fine-tune weight mixture 2017-08-21 14:15:29 -05:00
ines
d15775c3ad Fix typos and commands in alpha docs 2017-08-21 13:40:11 +02:00
Gyorgy Orosz
b3576bfc86 Added vector leading to model cli 2017-08-20 23:16:12 +02:00
Matthew Honnibal
c10f63bf10 Initialize fine tuning to 0.5 2017-08-20 15:59:48 -05:00
Matthew Honnibal
62878e50db Fix misalignment caued by filtering inputs at wrong point in parser 2017-08-20 15:59:28 -05:00
Matthew Honnibal
78a5f842e9 Fix update when update_shared=False 2017-08-20 15:58:34 -05:00
Matthew Honnibal
7a6edeea68 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-20 12:55:39 -05:00
Matthew Honnibal
f2f9229964 Fix name of update_shared flag 2017-08-20 18:19:06 +02:00
Matthew Honnibal
8a59718fd6 Fix fine-tuning 2017-08-20 18:17:35 +02:00
Matthew Honnibal
80a5146ec2 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-20 11:07:08 -05:00
Matthew Honnibal
84bb543e4d Add gold_preproc flag to cli/train 2017-08-20 11:07:00 -05:00
Matthew Honnibal
3fe0d76e6d Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-20 14:50:01 +02:00
Matthew Honnibal
c1d3ff517a Track loss in tagger 2017-08-20 14:42:23 +02:00
Matthew Honnibal
8875590081 Add optimizer in Language.update if sgd=None 2017-08-20 14:42:07 +02:00
Matthew Honnibal
84b7ed49e4 Ensure updates aren't made if no gold available 2017-08-20 14:41:38 +02:00
Ines Montani
c2bbd393af Merge pull request #1276 from oroszgy/model_cli_v2
Ported model cli from v1
2017-08-20 11:52:59 +02:00
Jim Geovedi
f77443ab68 reworked 2017-08-20 13:43:21 +07:00
Jim Geovedi
fbc62a09c7 added {pre,suf,in}fix tests 2017-08-20 13:43:00 +07:00
Jim Geovedi
713d7c0aa0 added indonesian lang test 2017-08-20 12:17:14 +07:00
Jim Geovedi
b7d83f37c8 indonesian abbr. 2017-08-20 12:16:50 +07:00
Jim Geovedi
7193c47f0b direct lookup 2017-08-20 11:57:52 +07:00
Jim Geovedi
fdf802d505 added examples 2017-08-20 11:57:10 +07:00
Jim Geovedi
fa544e6c9a Merge remote-tracking branch 'upstream/develop' into indonesian 2017-08-20 11:49:40 +07:00
Matthew Honnibal
42fa84075f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-19 22:42:50 +02:00
Matthew Honnibal
aefef6fd28 Prevent strings from being lost during from_disk and from_bytes 2017-08-19 22:42:17 +02:00
ines
281e7e58b3 Don't escape forward slashes on ujson.dumps 2017-08-19 22:32:16 +02:00
ines
2d126a00ae Fix typo 2017-08-19 22:32:07 +02:00
Matthew Honnibal
41c2218c53 Fix test for vectors 2017-08-19 22:09:12 +02:00
Matthew Honnibal
b8e1603cc4 Fix load fail for missing vectors 2017-08-19 22:07:00 +02:00
Matthew Honnibal
a3c51a0355 Fix creation of pipeline 2017-08-19 21:58:57 +02:00
Gyorgy Orosz
e5344b83a3 Ported model cli from v1 2017-08-19 21:45:23 +02:00
Matthew Honnibal
6a94648373 Fix serialization 2017-08-19 21:27:35 +02:00
Matthew Honnibal
1157294434 Improve vector handling 2017-08-19 20:35:33 +02:00
Matthew Honnibal
ef87562741 Restore vectors test utils 2017-08-19 20:35:16 +02:00
Matthew Honnibal
1391f9da37 Restore vectors tests 2017-08-19 20:34:58 +02:00
Matthew Honnibal
8cfeeb4884 Increment version 2017-08-19 19:52:58 +02:00
Matthew Honnibal
93fb8b64e9 Fix vector loading 2017-08-19 19:52:25 +02:00
Matthew Honnibal
49a615e7d9 Create Vectors object in Vocab 2017-08-19 18:50:16 +02:00
Matthew Honnibal
3d049af563 Improve vectors to/from disk 2017-08-19 18:42:11 +02:00
Matthew Honnibal
d55d6e1cfa Fix comparison of Token from different docs. Closes #1257 2017-08-19 16:39:32 +02:00
Matthew Honnibal
9b6a5df15e Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-19 16:24:57 +02:00
Matthew Honnibal
4fda02c7e6 Add test for new Span.to_array method 2017-08-19 16:24:38 +02:00
Matthew Honnibal
dea229c634 Fix Span.to_array method 2017-08-19 16:24:28 +02:00
Matthew Honnibal
c606b4a42c Add test for Doc.char_span 2017-08-19 16:18:23 +02:00
Matthew Honnibal
8b7ac77c23 Allow span label to be string in Doc.char_span 2017-08-19 16:18:09 +02:00
Matthew Honnibal
7c47e38c12 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-19 09:03:15 -05:00
Matthew Honnibal
ab28f911b4 Fix parser learning rates 2017-08-19 09:02:57 -05:00
ines
1fe5e1a4d1 Add language example sentences (see #1107)
da, de, en, es, fr, he, it, nb, pl, pt, sv
2017-08-19 12:22:29 +02:00
Matthew Honnibal
97aabafb5f Document as_tuples keyword arg of Language.pipe 2017-08-19 12:21:33 +02:00
Matthew Honnibal
80236116a6 Add Doc.char_span method, to get a span by character offset 2017-08-19 12:21:09 +02:00
Matthew Honnibal
482bba1722 Add Span.to_array method 2017-08-19 12:20:45 +02:00
Matthew Honnibal
19c495f451 Fix vectors deserialization 2017-08-19 04:33:03 +02:00
Matthew Honnibal
42d47c1e5c Fix tagger serialization 2017-08-19 04:16:32 +02:00
Matthew Honnibal
2da96a0ec7 Fix beam test 2017-08-19 04:15:46 +02:00
Matthew Honnibal
a7309a217d Update tagger serialization 2017-08-18 23:12:05 +02:00
Matthew Honnibal
bae59bf92f Remove BiLSTM import 2017-08-18 22:46:59 +02:00
Matthew Honnibal
c307a0ffb8 Restore patches from nn-beam-parser to spacy/syntax 2017-08-18 22:38:59 +02:00
Matthew Honnibal
fe90dfc390 Restore changes from nn-beam-parser to spacy/_ml 2017-08-18 22:38:28 +02:00
Matthew Honnibal
de7e8703e3 Restore tests for beam parser 2017-08-18 22:27:42 +02:00
Matthew Honnibal
11c31d285c Restore changes from nn-beam-parser 2017-08-18 22:26:12 +02:00
Matthew Honnibal
ce321b0322 Restore changes from nn-beam-parser to spacy/_ml 2017-08-18 22:24:46 +02:00
Matthew Honnibal
5f81d700ff Restore patches from nn-beam-parser to spacy/syntax 2017-08-18 22:23:03 +02:00
Matthew Honnibal
ec482580b5 Restore changes to pipeline.pyx from nn-beam-parser branch 2017-08-18 22:02:35 +02:00
Matthew Honnibal
931509d96a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-18 21:57:15 +02:00
Matthew Honnibal
ed95009b5c Fix data loading on Python 2 2017-08-18 21:57:06 +02:00
Matthew Honnibal
baf36d0588 Add compat function for importlib.util 2017-08-18 21:56:47 +02:00
Matthew Honnibal
263366729e Don't import BiLSTM 2017-08-18 21:56:31 +02:00
Matthew Honnibal
28162290b3 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-18 14:55:40 -05:00
Matthew Honnibal
85794c1167 Restore state of _ml.py 2017-08-18 14:55:23 -05:00
Matthew Honnibal
d456d2efe1 Fix conflicts in nn_parser 2017-08-18 20:55:58 +02:00
Matthew Honnibal
1cec1efca7 Fix merge conflicts in nn_parser from beam stuff 2017-08-18 20:50:49 +02:00
Matthew Honnibal
69bcacdc09 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-18 20:47:13 +02:00
Matthew Honnibal
2993b54fff Load vectors in vocab 2017-08-18 20:46:56 +02:00
Matthew Honnibal
a1ec41298c Restore CFile loader 2017-08-18 20:46:16 +02:00
Matthew Honnibal
ed4fb991dc Work on vectors loading 2017-08-18 20:45:48 +02:00
Matthew Honnibal
426f84937f Resolve conflicts when merging new beam parsing stuff 2017-08-18 13:38:32 -05:00
Matthew Honnibal
5181e8bedb Fix merge conflict in _ml 2017-08-18 13:35:51 -05:00
Matthew Honnibal
f75420ae79 Unhack beam parsing, moving it under options instead of global flags 2017-08-18 13:31:15 -05:00
Jim Geovedi
7ae45bffcf Merge remote-tracking branch 'upstream/develop' into indonesian 2017-08-18 10:14:46 +07:00
Dan O'Huiginn
ebf5a3ce59 Allow loading with python < 3.6
Don't rely on recent python features to load models

Fixes Issue #1271
2017-08-17 15:15:47 +00:00
Matthew Honnibal
0209a06b4e Update beam parser 2017-08-16 18:25:49 -05:00
Matthew Honnibal
4b1e7bd6d8 Improve tensorizer model 2017-08-16 18:25:20 -05:00
Matthew Honnibal
a6d8d7c82e Add is_gold_parse method to transition system 2017-08-16 18:24:09 -05:00
Matthew Honnibal
3533bb61cb Add option of 8 feature parse state 2017-08-16 18:23:27 -05:00
Matthew Honnibal
1cb2f15d65 Clean up unused predict_confidences function 2017-08-16 18:22:26 -05:00
Matthew Honnibal
210f6d5175 Fix efficiency error in batch parse 2017-08-15 03:19:03 -05:00
Matthew Honnibal
23537a011d Tweaks to beam parser 2017-08-15 03:15:28 -05:00
Matthew Honnibal
500e92553d Fix memory error when copying scores in beam 2017-08-15 03:15:04 -05:00
Matthew Honnibal
a8e4064dd8 Fix tensor gradient in parser 2017-08-15 03:14:36 -05:00
Matthew Honnibal
e420e0366c Remove use of hash function in beam parser 2017-08-15 03:13:57 -05:00
Matthew Honnibal
6259490347 Fix mixture weights in fine_tune 2017-08-14 17:55:18 -05:00
Matthew Honnibal
335fa8b05c Fix gradient in fine_tune 2017-08-14 14:55:47 -05:00
Matthew Honnibal
d9f82f6b50 Increment version 2017-08-14 14:55:26 +02:00
ines
a29f132ffd Change python -m spacy to spacy
Reflects latest change to entry point or auto-alias
2017-08-14 13:04:48 +02:00
ines
65bf80302c Increment version 2017-08-14 13:04:30 +02:00
Matthew Honnibal
52c180ecf5 Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit ea8de11ad5, reversing
changes made to 08e443e083.
2017-08-14 13:00:23 +02:00
Matthew Honnibal
dbbfe595a5 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-14 12:09:28 +02:00
Matthew Honnibal
ac6c25f762 Check SGD is not None in update 2017-08-14 12:09:18 +02:00
Matthew Honnibal
0ae045256d Fix beam training 2017-08-13 18:02:05 -05:00
Matthew Honnibal
6a42cc16ff Fix beam parser, improve efficiency of non-beam 2017-08-13 12:37:26 +02:00
Matthew Honnibal
4363b4aa4a Fix redundant tokvecs updates during update 2017-08-13 12:36:55 +02:00
Matthew Honnibal
12de263813 Bug fixes to beam parsing. Learns small sample 2017-08-13 09:33:39 +02:00
Matthew Honnibal
4ae0d5e1e6 Set defaults for convert command 2017-08-13 09:03:38 +02:00
Matthew Honnibal
92ebab6073 Update beam-update tests 2017-08-13 08:56:02 +02:00
Matthew Honnibal
17874fe491 Disable beam parsing 2017-08-12 19:35:40 -05:00
Matthew Honnibal
69f21867b5 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-12 19:25:56 -05:00
Matthew Honnibal
3e30712b62 Improve defaults 2017-08-12 19:24:17 -05:00
Matthew Honnibal
28e930aae0 Fixes for beam parsing. Not working 2017-08-12 19:22:52 -05:00
Matthew Honnibal
c96d769836 Fix beam parse. Not sure if working 2017-08-12 18:21:54 -05:00
Matthew Honnibal
24b45b45c6 Add test for beam update 2017-08-12 17:15:28 -05:00
Matthew Honnibal
4638f4b869 Fix beam update 2017-08-12 17:15:16 -05:00
Matthew Honnibal
d4308d2363 Initialize State offset to 0 2017-08-12 17:14:39 -05:00
Matthew Honnibal
b353e4d843 Work on parser beam training 2017-08-12 14:47:45 -05:00
ines
d4f2baf7dd Add create_meta option to package command
Re-create meta.json in model directory, even if it exists. Especially
useful when updating existing spaCy models or training with Prodigy.
Ensures user won't end up with multiple "en_core_web_sm" models, and
offers easy way to change the model's name and settings without having
to edit the meta.json file.
2017-08-12 21:44:18 +02:00
Matthew Honnibal
4ab0c8c8e9 Try different drop_layer structure in Tok2Vec 2017-08-12 08:56:57 -05:00
Matthew Honnibal
cd5ecedf6a Try drop_layer in parser 2017-08-12 08:56:33 -05:00
Matthew Honnibal
8870d491f1 Remove redundant pickling during training 2017-08-12 08:55:53 -05:00
Matthew Honnibal
680043ebca Improve efficiency of tagger.set_annotations for GPU 2017-08-12 08:54:21 -05:00
Matthew Honnibal
ebe0f7f641 Pass embed size correctly in tagger, and cache embeddings for efficiency 2017-08-12 05:45:20 -05:00
Matthew Honnibal
1a59db1c86 Fix dropout and learn rate in parser 2017-08-12 05:44:39 -05:00
Matthew Honnibal
d01dc3704a Adjust parser model 2017-08-09 20:06:33 -05:00
Matthew Honnibal
f37528ef58 Pass embed size for parser fine-tune. Use SELU 2017-08-09 17:52:53 -05:00
Matthew Honnibal
f93f2bed58 Revert use of layer normalization in Tok2Vec 2017-08-09 17:47:03 -05:00
Matthew Honnibal
20944dd8aa Fix conflict in parser fine-tuning 2017-08-09 16:43:05 -05:00
Matthew Honnibal
ac2de6dced Switch to ReLu layers in Tok2Vec 2017-08-09 16:41:25 -05:00
Matthew Honnibal
bbace204be Gate parser fine-tuning behind feature flag 2017-08-09 16:40:42 -05:00
Matthew Honnibal
a59a1deac4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-09 16:23:19 -05:00
Matthew Honnibal
bcce6f7de0 Fix parser fine tuning 2017-08-09 16:23:12 -05:00
ines
28e2fec23b Fix autolinking failure on fresh model install (resolves #1138)
On fresh install via subprocess, pip.get_installed_distributions()
won't show new model, so is_package check in link command fails.
Solution for now is to get model package path explicitly and pass it to
link command.
2017-08-09 11:52:38 +02:00
Jim Geovedi
c62b49b7cc Merge remote-tracking branch 'upstream/develop' into indonesian 2017-08-09 09:17:46 +07:00
Matthew Honnibal
dbdd8afc4b Fix parser fine-tune training 2017-08-08 15:46:07 -05:00
Matthew Honnibal
88bf1cf87c Update parser for fine tuning 2017-08-08 15:34:17 -05:00
Matthew Honnibal
5d837c3776 Add mix weights on fine_tune 2017-08-07 06:32:59 -05:00
Matthew Honnibal
42bd26f6f3 Give parser its own tok2vec weights 2017-08-06 18:33:46 +02:00
Matthew Honnibal
3ed203de25 Use LayerNorm and SELU in Tok2Vec 2017-08-06 18:33:18 +02:00
Matthew Honnibal
78498a072d Return Transition for missing actions in lookup_action 2017-08-06 14:16:36 +02:00
Matthew Honnibal
4a5cc89138 Fix tagger 'fine_tune', to keep private CNN weights 2017-08-06 14:15:48 +02:00
Matthew Honnibal
3cb8f06881 Fix NeuralLabeller 2017-08-06 14:15:14 +02:00
Matthew Honnibal
0acce0521b Fix Language.update for pipeline 2017-08-06 14:13:03 +02:00
Matthew Honnibal
bfffdeabb2 Fix parser batch-size bug introduced during cleanup 2017-08-06 14:10:48 +02:00
Matthew Honnibal
0eec7c9e9b Fix Language.evaluate 2017-08-06 02:18:31 +02:00
Matthew Honnibal
0a566dc320 Add update_tensors flag to Language.update. Experimental, re #1182 2017-08-06 02:18:12 +02:00
Matthew Honnibal
cc19ea0e7c Add update_tensors flag to Language.update. Experimental, re #1182 2017-08-06 02:17:10 +02:00
Matthew Honnibal
4cfb7a54e7 Fix tagger 2017-08-06 01:53:31 +02:00
Matthew Honnibal
e9ab800e15 Fix tagging model 2017-08-06 01:50:08 +02:00
Matthew Honnibal
468c138ab3 WIP: Add fine-tuning logic to tagger model, re #1182 2017-08-06 01:13:23 +02:00
Matthew Honnibal
7f876a7a82 Clean up some unused code in parser 2017-08-06 00:00:21 +02:00
Matthew Honnibal
ae1ad81069 Increment version 2017-08-05 18:09:32 +02:00
Jim Geovedi
cc4772cac2 reworks 2017-08-03 13:08:38 +07:00
Jim Geovedi
37f19f5ed2 added more currencies based on corpus data 2017-08-03 13:03:25 +07:00
Jim Geovedi
30fd068d42 hashtag prefix should be handled somewhere else 2017-08-03 13:03:02 +07:00
Jim Geovedi
4705ae19ba Merge remote-tracking branch 'upstream/develop' into indonesian 2017-08-03 12:40:19 +07:00
Jim Geovedi
ba07e23c87 added USD in currency rules 2017-08-02 22:42:47 +07:00
Matthew Honnibal
5c323daa1a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-01 22:10:37 +02:00
Matthew Honnibal
2e00361522 Fix update when 0 docs 2017-08-01 22:10:17 +02:00
Matthew Honnibal
8fce187de4 Fix ArcEager for missing values 2017-08-01 22:10:05 +02:00
ines
78e262140f Add workaround for displaCy server on Python 2/3 (resolves #1227)
Make sure status and headers are bytes on Python 2 and strings on
Python 3
2017-08-01 01:11:35 +02:00
Jim Geovedi
2572a9ddf0 Merge remote-tracking branch 'upstream/develop' into indonesian 2017-07-30 21:24:16 +07:00
Jim Geovedi
bb08d696f9 added hashtag rule and fixed currency rules 2017-07-30 21:23:28 +07:00
Jim Geovedi
e9af79a803 added u-\d+ rules (sports team) 2017-07-30 21:23:01 +07:00
Matthew Honnibal
27abc56e98 Add method to get beam entities 2017-07-29 21:59:02 +02:00
Matthew Honnibal
ec63f4fe7b Add option to control how missing entities are handled when getting NER tags 2017-07-29 21:58:37 +02:00
Jim Geovedi
e5adc26c72 simplified rules 2017-07-29 18:21:32 +07:00
Jim Geovedi
783f7d8b86 added test set for Indonesian language 2017-07-29 18:21:07 +07:00
Jim Geovedi
4d04898dea updated regexp 2017-07-29 17:44:57 +07:00
Jim Geovedi
7d96d477ea updated like_num 2017-07-29 17:44:46 +07:00
Jim Geovedi
3cca4ed798 added lex attrs rules 2017-07-29 17:22:21 +07:00
Jim Geovedi
8b814c63f1 more exceptions 2017-07-27 19:46:30 +07:00
Jim Geovedi
6c725e8dcf updated lemma 2017-07-27 19:46:21 +07:00
Jim Geovedi
c194f7ae26 Merge remote-tracking branch 'upstream/develop' into indonesian 2017-07-27 10:55:34 +07:00
Jim Geovedi
547973b92a wip syntax iterators 2017-07-27 10:51:34 +07:00
Jim Geovedi
bbc75da38d enable syntax iterator and lemma lookup 2017-07-27 10:51:15 +07:00
Jim Geovedi
24a8c8bf28 added wip lemma dict 2017-07-26 21:39:54 +07:00
Jim Geovedi
63f14ba46b added hyphen-suffix rules 2017-07-26 19:28:57 +07:00
Jim Geovedi
f288964441 removed -el from suffix rules 2017-07-26 19:28:38 +07:00
Jim Geovedi
6eee7a7411 updated tokenizer exceptions 2017-07-26 19:13:47 +07:00
Jim Geovedi
edec51b1b1 update punctuation rules 2017-07-26 19:13:36 +07:00
Jim Geovedi
62443d495a enable token match 2017-07-26 19:13:14 +07:00
Jim Geovedi
c97f5ae0bb updated tokenizer exceptions 2017-07-26 19:12:52 +07:00
Matthew Honnibal
aff325b7e0 Increment version 2017-07-25 19:41:20 +02:00
Matthew Honnibal
6780132821 Fix tagger loading 2017-07-25 19:41:11 +02:00
Matthew Honnibal
fd20a4af55 Increment version 2017-07-25 18:58:34 +02:00
Matthew Honnibal
523b0df2c9 Update text classification model 2017-07-25 18:57:59 +02:00
Matthew Honnibal
7c7fac9337 Add spacy.blank() loading function 2017-07-25 18:56:37 +02:00
Jim Geovedi
73f6ac9d9b added hyhen 2017-07-24 15:56:31 +07:00
Jim Geovedi
68454c40bf added missing import 2017-07-24 14:12:34 +07:00
Jim Geovedi
eaf9cbd708 cursed of copy & paste 2017-07-24 14:11:51 +07:00
Jim Geovedi
7aad6718bc enable tokenizer exceptions 2017-07-24 14:11:10 +07:00
Jim Geovedi
ad56c9179a added tokenizer exceptions list 2017-07-24 14:10:16 +07:00
Jim Geovedi
c1f3fe99fe updated punctuation rules 2017-07-24 13:57:21 +07:00
Jim Geovedi
37fa2c8c80 punctution rules 2017-07-24 06:17:18 +07:00
Jim Geovedi
082e94ac1c added inflix rules 2017-07-24 06:17:07 +07:00
Jim Geovedi
d0ec484725 reverted 2017-07-24 06:16:29 +07:00
Jim Geovedi
0e590c711f added prefix & suffix rules 2017-07-23 23:46:40 +07:00
Jim Geovedi
ba922e30e8 added ampere hour unit 2017-07-23 23:46:18 +07:00
Jim Geovedi
3b17eba27b added frequency units 2017-07-23 23:10:52 +07:00
Jim Geovedi
d5fd32a572 added known currencies 2017-07-23 22:56:48 +07:00
Jim Geovedi
f6f15678fb added lex_attrs 2017-07-23 22:55:22 +07:00
Jim Geovedi
bed8162d00 added tokenizer_exceptions 2017-07-23 22:55:05 +07:00
Jim Geovedi
b80c35bc9a added norm_exceptions 2017-07-23 22:54:49 +07:00
Jim Geovedi
b5de329ea3 added norm_exceptions 2017-07-23 22:54:19 +07:00
Jim Geovedi
082e9ade46 fixed typo 2017-07-23 21:30:34 +07:00
Jim Geovedi
e2efeb186e added stopwords 2017-07-23 20:52:37 +07:00
Jim Geovedi
da98676839 use template 2017-07-23 20:51:31 +07:00
Jim Geovedi
c2b4dd7809 start working on Indonesian language 2017-07-23 20:50:56 +07:00
Matthew Honnibal
5771bd1ff8 Increment version 2017-07-23 14:18:38 +02:00
Matthew Honnibal
c4a81a47a4 Fix deserialization 2017-07-23 14:11:07 +02:00
Matthew Honnibal
2df563ad24 Remove optimization for textcat that caused loading problem 2017-07-23 14:10:51 +02:00
Matthew Honnibal
4fe77bced2 Add cfg attr to pipeline components 2017-07-23 00:52:47 +02:00
Matthew Honnibal
d8aa721664 Compute Language.meta with a property 2017-07-23 00:50:18 +02:00
Matthew Honnibal
a88a7deffe Five save/load of textcat config 2017-07-23 00:33:43 +02:00
Matthew Honnibal
9bae0ddc50 Fix minibatching 2017-07-22 20:14:49 +02:00
Matthew Honnibal
ded0df5e2f Expose hyper-param as keyword arg 2017-07-22 20:14:37 +02:00
Matthew Honnibal
f5de8deeec Increment version 2017-07-22 20:04:53 +02:00
Matthew Honnibal
b55714d5d1 Make gold_tuples arg optional in begin_training 2017-07-22 20:04:43 +02:00
Matthew Honnibal
ed6c85fa3c Fix loading of text categories in GoldParse 2017-07-22 20:04:03 +02:00
Matthew Honnibal
6ffec9dfea Update _ml, for textcat model 2017-07-22 20:03:40 +02:00
Matthew Honnibal
d6a5c2c85a Add test for NER 2017-07-22 01:48:58 +02:00
Matthew Honnibal
28244df4da Add test for beam parsing 2017-07-22 01:48:35 +02:00
Matthew Honnibal
c86445bdfd Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-07-22 01:14:28 +02:00
Matthew Honnibal
b3a749610e Fix name of TextCategorizer 2017-07-22 01:14:07 +02:00
Matthew Honnibal
2424493970 Remove unnecessary import of Mock 2017-07-22 01:13:54 +02:00
Matthew Honnibal
baa3d81c35 Add text categorizer to Language 2017-07-22 01:13:36 +02:00
Matthew Honnibal
a6a2159969 Add slot for text categories to Doc 2017-07-22 00:34:15 +02:00
Matthew Honnibal
374ab3ecfb Increment alpha version 2017-07-22 00:32:49 +02:00
Matthew Honnibal
289f23df51 Test beam parsing 2017-07-20 15:03:10 +02:00
Matthew Honnibal
3da1063b36 Add beam decoding to parser, to allow NER uncertainties 2017-07-20 15:02:55 +02:00
Matthew Honnibal
0ca5832427 Improve negative example handling in NER oracle 2017-07-20 00:18:49 +02:00
Matthew Honnibal
a231b56d40 Add text-classification hook to pipeline 2017-07-20 00:18:15 +02:00
Matthew Honnibal
7ea50182a5 Add support for text-classification labels to GoldParse 2017-07-20 00:17:47 +02:00
Matthew Honnibal
727481377e Add text-classifer thinc models 2017-07-20 00:17:17 +02:00
Matthew Honnibal
f014138c11 Fix parser tests 2017-07-20 00:16:52 +02:00
Ines Montani
c91642efd5 Port over changes from #1168 2017-07-01 11:43:54 +02:00
Jim Regan
d81ceb0cd5 Merge branch 'develop' into polish 2017-06-26 22:42:27 +01:00
Jim O'Regan
2f84c73585 a start 2017-06-26 22:40:04 +01:00
Jim O'Regan
28d7f0a672 reference 2017-06-26 22:38:28 +01:00
Matthew Honnibal
91e52543ef Merge pull request #1118 from Gregory-Howard/patch-2
Update _tokenizer_exceptions_list (adding cities)
2017-06-20 11:16:07 +02:00
Matthew Honnibal
8ea785e01a Merge pull request #1119 from oroszgy/patch-3
Fixed conllu converter
2017-06-20 11:14:41 +02:00
Tpt
7745b3ae04 Adds noun chunks to French syntax iterators 2017-06-12 15:29:58 +02:00
Tpt
57e8254f63 Adds function to extract french noun chunks 2017-06-12 15:20:49 +02:00
György Orosz
62dbf9025c Fixed conllu converter 2017-06-09 22:53:56 +02:00
Grégory Howard
cd974b32b7 Update _tokenizer_exceptions_list (adding cities) 2017-06-09 17:58:18 +02:00
ines
34a2eecb17 Add simple "naughty strings" test (see #1107) 2017-06-06 17:43:51 +02:00
ines
045574a936 Update package name and increment version 2017-06-05 20:41:30 +02:00
Matthew Honnibal
1f5874a927 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-05 20:20:00 +02:00
ines
03db56f48c Detect spaCy version and add package title
Package title allows customised package names (like spacy-nightly)
2017-06-05 20:11:02 +02:00
Matthew Honnibal
c0d90f52f7 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-05 19:20:13 +02:00
ines
cc9c5dc7a3 Fix noun chunks test 2017-06-05 16:39:04 +02:00
Matthew Honnibal
836bfa2d0f Add factory for experimental SimilarityHook component 2017-06-05 15:40:22 +02:00
Matthew Honnibal
d59fa32df1 Add experimental SimilarityHook omponent 2017-06-05 15:40:03 +02:00
Matthew Honnibal
5489b49203 Remove print statement 2017-06-05 13:20:41 +02:00
Matthew Honnibal
fc4204a12a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-05 13:13:23 +02:00
Matthew Honnibal
2479cde446 Support disable keyword in Language.__init__ 2017-06-05 13:13:07 +02:00
ines
ea167e14db Fix model package loading from link 2017-06-05 13:10:49 +02:00
ines
dd6dc4c120 Update spacy.load() helper functions 2017-06-05 13:02:31 +02:00
Matthew Honnibal
b4cdd05466 Add vectors.pyx in setup 2017-06-05 12:45:29 +02:00
Matthew Honnibal
280d419529 Add pickle method for vectors 2017-06-05 12:36:04 +02:00
Matthew Honnibal
30369d580f Start testing Vectors class 2017-06-05 12:32:49 +02:00
Matthew Honnibal
eb7cbb62c2 Flesh out Vectors class 2017-06-05 12:32:08 +02:00
ines
51d7414e94 Make sure sents are a list 2017-06-05 12:30:13 +02:00
Matthew Honnibal
ebb6c49cd5 Make alignment case-insensitive for gold 2017-06-04 20:26:42 -05:00
Matthew Honnibal
fc4dd62e84 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-04 20:19:05 -05:00
Matthew Honnibal
8f8f90b46b Disable labeller if not parsing 2017-06-04 20:18:54 -05:00
Matthew Honnibal
c52fde40f4 Improve train CLI 2017-06-04 20:18:37 -05:00
Matthew Honnibal
a053b1218e Fix item counting during training 2017-06-04 20:18:20 -05:00
Matthew Honnibal
b3b5521625 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-04 20:17:18 -05:00
Matthew Honnibal
9bc4a26213 Add option of data augmentation noise 2017-06-04 20:16:57 -05:00
Matthew Honnibal
7b2ede783d Add SP tag to tag map if missing 2017-06-04 20:16:30 -05:00
ines
a0f4592f0a Update tests 2017-06-05 02:26:13 +02:00
ines
3e105bcd36 Update tests 2017-06-05 02:09:27 +02:00
Matthew Honnibal
516798e9fc Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-05 01:35:21 +02:00
Matthew Honnibal
193bf913c0 Set is_tagged=True after tagging 2017-06-05 01:35:07 +02:00
ines
078232932c Fix tokenizer fixture scope 2017-06-05 01:06:34 +02:00
Matthew Honnibal
58be0e1f6f Update tests 2017-06-04 16:35:06 -05:00
Matthew Honnibal
b78cc318c3 Fix loading of morphology exceptions 2017-06-04 16:34:32 -05:00
Matthew Honnibal
bb98d45a63 Fix tests 2017-06-04 16:00:44 -05:00
Matthew Honnibal
55d0621532 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-04 15:53:25 -05:00
Matthew Honnibal
5b9f116aca Update tests 2017-06-04 15:53:17 -05:00
Matthew Honnibal
2a3bd5ee90 Fix fetching of noun chunk iterator 2017-06-04 15:53:05 -05:00
Matthew Honnibal
3680c51b8f Avoid clobbering preset POS tags 2017-06-04 15:52:42 -05:00
Matthew Honnibal
939e8ed567 Add lookup properties for components in Language 2017-06-04 15:52:09 -05:00
Matthew Honnibal
e28f90b672 Fix syntax iterators 2017-06-04 15:51:50 -05:00
ines
8a29308d0b Remove unused imports 2017-06-04 22:39:29 +02:00
Ines Montani
112c5787eb Merge pull request #1101 from oroszgy/hu_tokenizer_fix
More robust Hungarian tokenizer.
2017-06-04 22:37:51 +02:00
ines
96867a24ae Fix typo 2017-06-04 22:36:40 +02:00
ines
f432bb4b48 Fix fixture scopes 2017-06-04 22:34:31 +02:00
Matthew Honnibal
6d0356e6cc Whitespace 2017-06-04 14:55:24 -05:00
Matthew Honnibal
8a683a4494 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-04 21:53:56 +02:00
Matthew Honnibal
92ae36f84e Improve way noun chunks iterator is looked up 2017-06-04 21:53:39 +02:00
ines
9254a3dd78 Import and add Spanish syntax iterators 2017-06-04 21:42:15 +02:00
ines
7db1a0e83e Make sure printed values are always strings 2017-06-04 21:27:20 +02:00
Matthew Honnibal
51e1541ddb Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-04 14:26:29 -05:00
Matthew Honnibal
add9a33782 Return False for vocab.has_vector 2017-06-04 14:26:14 -05:00
Matthew Honnibal
675f448313 Fix vector linkage on Doc 2017-06-04 14:25:30 -05:00
Matthew Honnibal
f4662e9218 Fix vector linkage for token 2017-06-04 14:19:58 -05:00
ines
070e026ed9 Ensure path on read_json 2017-06-04 20:44:37 +02:00
ines
e1e73936b1 Raise correct error 2017-06-04 20:44:27 +02:00
ines
848e47669e Fix typo 2017-06-04 20:44:15 +02:00
ines
c4614c02a2 Fix dev resources URL 2017-06-04 15:45:50 +02:00
ines
a66cf24ee8 xfail tokenizer serialization tests for now
Tests pass locally, but not on Travis – needs more investigation
2017-06-04 13:58:20 +02:00
ines
7b7d46b64e Fix typo and success message 2017-06-04 13:45:50 +02:00
ines
90d117f378 Update version 2017-06-04 13:41:16 +02:00
Matthew Honnibal
7ca215bc26 Resolve lex_attr_getters conflict 2017-06-03 16:12:01 -05:00
Matthew Honnibal
21eef90dbc Support specifying which GPU 2017-06-03 16:10:23 -05:00
Matthew Honnibal
d0e42f9275 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-03 15:30:32 -05:00
Matthew Honnibal
8a17b99b1c Use NORM attribute, not LOWER 2017-06-03 15:30:16 -05:00
ines
4c643d74c5 Add norm exceptions to other Language classes 2017-06-03 22:29:21 +02:00
ines
fa7e576c57 Change order of exception dicts 2017-06-03 21:52:06 +02:00
Matthew Honnibal
3f5c85d8de Reorder setting of lex attrs, to avoid clobbering 2017-06-03 14:47:55 -05:00
Matthew Honnibal
aeb7520133 Make norm use lower-case 2017-06-03 14:47:38 -05:00
Matthew Honnibal
de3954843e Populate norm exceptions with lower-case 2017-06-03 14:47:12 -05:00
Matthew Honnibal
f6955a459c Fix prev commit 2017-06-03 14:38:37 -05:00
Matthew Honnibal
468ca6c760 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-03 14:33:51 -05:00
Matthew Honnibal
c647a0d33e Fix training counter for gold preprocessing 2017-06-03 14:33:39 -05:00
ines
e47eef5e03 Update German tokenizer exceptions and tests 2017-06-03 21:07:44 +02:00
ines
d77c2cc8bb Add tests for English norm exceptions 2017-06-03 20:59:50 +02:00
ines
0d6fa8b241 Add German norm exceptions 2017-06-03 20:54:18 +02:00
ines
5bd311c77e Fix update of norm exceptions 2017-06-03 20:54:09 +02:00
Matthew Honnibal
94e063ae2a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-03 13:31:40 -05:00
Matthew Honnibal
fea1144e6d Set max batch size in evaluate 2017-06-03 13:31:33 -05:00
Matthew Honnibal
805495af27 Fix off-by-one in number of tags 2017-06-03 13:29:23 -05:00
Matthew Honnibal
e62f46d39f Clarify gold.pyx slightly 2017-06-03 13:28:52 -05:00
Matthew Honnibal
43353b5413 Improve train CLI script 2017-06-03 13:28:20 -05:00
ines
746653880c Add English norm exceptions to lex_attrs 2017-06-03 20:27:28 +02:00
ines
095eeeb12f Update English tokenizer exceptions and add norms 2017-06-03 20:27:16 +02:00
ines
e5d426406a Add base norm exceptions 2017-06-03 20:27:05 +02:00
ines
4c2bbc3ccc Add add_lookups util function 2017-06-03 19:44:47 +02:00
ines
05fe6758a7 Set lexeme attributes for tokenizer special cases 2017-06-03 19:44:39 +02:00
ines
3152ee5ca2 Update serialization tests for tokenizer 2017-06-03 17:05:28 +02:00
ines
7c919aeb09 Make sure serializers and deserializers are ordered 2017-06-03 17:05:09 +02:00