Matthew Honnibal
6ceb0f0518
Allow Lexeme.rank to be set
2017-08-24 21:43:00 +02:00
Matthew Honnibal
44a1fa80d3
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-23 13:02:16 +02:00
ines
bb1abbeba5
Only link model if download was successful
2017-08-23 12:36:31 +02:00
Matthew Honnibal
bb2541ffd3
Fix PROB attr for OOV words
2017-08-23 12:11:52 +02:00
Matthew Honnibal
1c5c256e58
Fix fine_tune when optimizer is None
2017-08-23 10:51:33 +02:00
Matthew Honnibal
9c580ad28a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-22 17:02:04 -05:00
Matthew Honnibal
a4633fff6f
Restore use of batch norm in model
2017-08-22 17:01:58 -05:00
Matthew Honnibal
03b5b9727a
Fix Doc.vector for empty doc objects
2017-08-22 19:52:19 +02:00
Matthew Honnibal
0551b7b03a
Fix doc.vector
2017-08-22 19:46:52 +02:00
Matthew Honnibal
83f8e98450
Fix retrieval of OOV vectors
2017-08-22 19:46:35 +02:00
Matthew Honnibal
df2745eb08
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-22 19:00:43 +02:00
Matthew Honnibal
5b329acbf2
Fix vectors_length property in vocab
2017-08-22 19:00:27 +02:00
Matthew Honnibal
1fe605dfe5
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-21 19:18:31 -05:00
Matthew Honnibal
18b64e79ec
Fix fine tuning
2017-08-21 19:18:26 -05:00
Matthew Honnibal
682346dd66
Restore optimized hidden_depth=0 for parser
2017-08-21 19:18:04 -05:00
Matthew Honnibal
a21d8f3f0b
Add predict paths to _ml models
2017-08-21 23:23:45 +02:00
Matthew Honnibal
cec76801dc
Add profile command to CLI
2017-08-21 23:23:05 +02:00
Matthew Honnibal
7be5f30f17
Add profile function
2017-08-21 23:22:49 +02:00
ines
a68dc891ea
Port over changes from #1281
2017-08-21 23:19:18 +02:00
Matthew Honnibal
5e50a65252
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-21 14:15:46 -05:00
Matthew Honnibal
80acbc5f1f
Fix fine-tune weight mixture
2017-08-21 14:15:29 -05:00
ines
d15775c3ad
Fix typos and commands in alpha docs
2017-08-21 13:40:11 +02:00
Gyorgy Orosz
b3576bfc86
Added vector loading to model cli
2017-08-20 23:16:12 +02:00
Matthew Honnibal
c10f63bf10
Initialize fine tuning to 0.5
2017-08-20 15:59:48 -05:00
Matthew Honnibal
62878e50db
Fix misalignment caused by filtering inputs at wrong point in parser
2017-08-20 15:59:28 -05:00
Matthew Honnibal
78a5f842e9
Fix update when update_shared=False
2017-08-20 15:58:34 -05:00
Matthew Honnibal
7a6edeea68
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-20 12:55:39 -05:00
Matthew Honnibal
f2f9229964
Fix name of update_shared flag
2017-08-20 18:19:06 +02:00
Matthew Honnibal
8a59718fd6
Fix fine-tuning
2017-08-20 18:17:35 +02:00
Matthew Honnibal
80a5146ec2
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-20 11:07:08 -05:00
Matthew Honnibal
84bb543e4d
Add gold_preproc flag to cli/train
2017-08-20 11:07:00 -05:00
Matthew Honnibal
3fe0d76e6d
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-20 14:50:01 +02:00
Matthew Honnibal
c1d3ff517a
Track loss in tagger
2017-08-20 14:42:23 +02:00
Matthew Honnibal
8875590081
Add optimizer in Language.update if sgd=None
2017-08-20 14:42:07 +02:00
Matthew Honnibal
84b7ed49e4
Ensure updates aren't made if no gold available
2017-08-20 14:41:38 +02:00
Ines Montani
c2bbd393af
Merge pull request #1276 from oroszgy/model_cli_v2
Ported model cli from v1
2017-08-20 11:52:59 +02:00
Jim Geovedi
f77443ab68
reworked
2017-08-20 13:43:21 +07:00
Jim Geovedi
fbc62a09c7
added {pre,suf,in}fix tests
2017-08-20 13:43:00 +07:00
Jim Geovedi
713d7c0aa0
added indonesian lang test
2017-08-20 12:17:14 +07:00
Jim Geovedi
b7d83f37c8
indonesian abbr.
2017-08-20 12:16:50 +07:00
Jim Geovedi
7193c47f0b
direct lookup
2017-08-20 11:57:52 +07:00
Jim Geovedi
fdf802d505
added examples
2017-08-20 11:57:10 +07:00
Jim Geovedi
fa544e6c9a
Merge remote-tracking branch 'upstream/develop' into indonesian
2017-08-20 11:49:40 +07:00
Matthew Honnibal
42fa84075f
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-19 22:42:50 +02:00
Matthew Honnibal
aefef6fd28
Prevent strings from being lost during from_disk and from_bytes
2017-08-19 22:42:17 +02:00
ines
281e7e58b3
Don't escape forward slashes on ujson.dumps
2017-08-19 22:32:16 +02:00
ines
2d126a00ae
Fix typo
2017-08-19 22:32:07 +02:00
Matthew Honnibal
41c2218c53
Fix test for vectors
2017-08-19 22:09:12 +02:00
Matthew Honnibal
b8e1603cc4
Fix load fail for missing vectors
2017-08-19 22:07:00 +02:00
Matthew Honnibal
a3c51a0355
Fix creation of pipeline
2017-08-19 21:58:57 +02:00
Gyorgy Orosz
e5344b83a3
Ported model cli from v1
2017-08-19 21:45:23 +02:00
Matthew Honnibal
6a94648373
Fix serialization
2017-08-19 21:27:35 +02:00
Matthew Honnibal
1157294434
Improve vector handling
2017-08-19 20:35:33 +02:00
Matthew Honnibal
ef87562741
Restore vectors test utils
2017-08-19 20:35:16 +02:00
Matthew Honnibal
1391f9da37
Restore vectors tests
2017-08-19 20:34:58 +02:00
Matthew Honnibal
8cfeeb4884
Increment version
2017-08-19 19:52:58 +02:00
Matthew Honnibal
93fb8b64e9
Fix vector loading
2017-08-19 19:52:25 +02:00
Matthew Honnibal
49a615e7d9
Create Vectors object in Vocab
2017-08-19 18:50:16 +02:00
Matthew Honnibal
3d049af563
Improve vectors to/from disk
2017-08-19 18:42:11 +02:00
Matthew Honnibal
d55d6e1cfa
Fix comparison of Token from different docs. Closes #1257
2017-08-19 16:39:32 +02:00
Matthew Honnibal
9b6a5df15e
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-19 16:24:57 +02:00
Matthew Honnibal
4fda02c7e6
Add test for new Span.to_array method
2017-08-19 16:24:38 +02:00
Matthew Honnibal
dea229c634
Fix Span.to_array method
2017-08-19 16:24:28 +02:00
Matthew Honnibal
c606b4a42c
Add test for Doc.char_span
2017-08-19 16:18:23 +02:00
Matthew Honnibal
8b7ac77c23
Allow span label to be string in Doc.char_span
2017-08-19 16:18:09 +02:00
Matthew Honnibal
7c47e38c12
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-19 09:03:15 -05:00
Matthew Honnibal
ab28f911b4
Fix parser learning rates
2017-08-19 09:02:57 -05:00
ines
1fe5e1a4d1
Add language example sentences (see #1107)
da, de, en, es, fr, he, it, nb, pl, pt, sv
2017-08-19 12:22:29 +02:00
Matthew Honnibal
97aabafb5f
Document as_tuples keyword arg of Language.pipe
2017-08-19 12:21:33 +02:00
Matthew Honnibal
80236116a6
Add Doc.char_span method, to get a span by character offset
2017-08-19 12:21:09 +02:00
Matthew Honnibal
482bba1722
Add Span.to_array method
2017-08-19 12:20:45 +02:00
Matthew Honnibal
19c495f451
Fix vectors deserialization
2017-08-19 04:33:03 +02:00
Matthew Honnibal
42d47c1e5c
Fix tagger serialization
2017-08-19 04:16:32 +02:00
Matthew Honnibal
2da96a0ec7
Fix beam test
2017-08-19 04:15:46 +02:00
Matthew Honnibal
a7309a217d
Update tagger serialization
2017-08-18 23:12:05 +02:00
Matthew Honnibal
bae59bf92f
Remove BiLSTM import
2017-08-18 22:46:59 +02:00
Matthew Honnibal
c307a0ffb8
Restore patches from nn-beam-parser to spacy/syntax
2017-08-18 22:38:59 +02:00
Matthew Honnibal
fe90dfc390
Restore changes from nn-beam-parser to spacy/_ml
2017-08-18 22:38:28 +02:00
Matthew Honnibal
de7e8703e3
Restore tests for beam parser
2017-08-18 22:27:42 +02:00
Matthew Honnibal
11c31d285c
Restore changes from nn-beam-parser
2017-08-18 22:26:12 +02:00
Matthew Honnibal
ce321b0322
Restore changes from nn-beam-parser to spacy/_ml
2017-08-18 22:24:46 +02:00
Matthew Honnibal
5f81d700ff
Restore patches from nn-beam-parser to spacy/syntax
2017-08-18 22:23:03 +02:00
Matthew Honnibal
ec482580b5
Restore changes to pipeline.pyx from nn-beam-parser branch
2017-08-18 22:02:35 +02:00
Matthew Honnibal
931509d96a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-18 21:57:15 +02:00
Matthew Honnibal
ed95009b5c
Fix data loading on Python 2
2017-08-18 21:57:06 +02:00
Matthew Honnibal
baf36d0588
Add compat function for importlib.util
2017-08-18 21:56:47 +02:00
Matthew Honnibal
263366729e
Don't import BiLSTM
2017-08-18 21:56:31 +02:00
Matthew Honnibal
28162290b3
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-18 14:55:40 -05:00
Matthew Honnibal
85794c1167
Restore state of _ml.py
2017-08-18 14:55:23 -05:00
Matthew Honnibal
d456d2efe1
Fix conflicts in nn_parser
2017-08-18 20:55:58 +02:00
Matthew Honnibal
1cec1efca7
Fix merge conflicts in nn_parser from beam stuff
2017-08-18 20:50:49 +02:00
Matthew Honnibal
69bcacdc09
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-18 20:47:13 +02:00
Matthew Honnibal
2993b54fff
Load vectors in vocab
2017-08-18 20:46:56 +02:00
Matthew Honnibal
a1ec41298c
Restore CFile loader
2017-08-18 20:46:16 +02:00
Matthew Honnibal
ed4fb991dc
Work on vectors loading
2017-08-18 20:45:48 +02:00
Matthew Honnibal
426f84937f
Resolve conflicts when merging new beam parsing stuff
2017-08-18 13:38:32 -05:00
Matthew Honnibal
5181e8bedb
Fix merge conflict in _ml
2017-08-18 13:35:51 -05:00
Matthew Honnibal
f75420ae79
Unhack beam parsing, moving it under options instead of global flags
2017-08-18 13:31:15 -05:00
Jim Geovedi
7ae45bffcf
Merge remote-tracking branch 'upstream/develop' into indonesian
2017-08-18 10:14:46 +07:00
Dan O'Huiginn
ebf5a3ce59
Allow loading with python < 3.6
Don't rely on recent python features to load models
Fixes Issue #1271
2017-08-17 15:15:47 +00:00
Matthew Honnibal
0209a06b4e
Update beam parser
2017-08-16 18:25:49 -05:00
Matthew Honnibal
4b1e7bd6d8
Improve tensorizer model
2017-08-16 18:25:20 -05:00
Matthew Honnibal
a6d8d7c82e
Add is_gold_parse method to transition system
2017-08-16 18:24:09 -05:00
Matthew Honnibal
3533bb61cb
Add option of 8 feature parse state
2017-08-16 18:23:27 -05:00
Matthew Honnibal
1cb2f15d65
Clean up unused predict_confidences function
2017-08-16 18:22:26 -05:00
Matthew Honnibal
210f6d5175
Fix efficiency error in batch parse
2017-08-15 03:19:03 -05:00
Matthew Honnibal
23537a011d
Tweaks to beam parser
2017-08-15 03:15:28 -05:00
Matthew Honnibal
500e92553d
Fix memory error when copying scores in beam
2017-08-15 03:15:04 -05:00
Matthew Honnibal
a8e4064dd8
Fix tensor gradient in parser
2017-08-15 03:14:36 -05:00
Matthew Honnibal
e420e0366c
Remove use of hash function in beam parser
2017-08-15 03:13:57 -05:00
Matthew Honnibal
6259490347
Fix mixture weights in fine_tune
2017-08-14 17:55:18 -05:00
Matthew Honnibal
335fa8b05c
Fix gradient in fine_tune
2017-08-14 14:55:47 -05:00
Matthew Honnibal
d9f82f6b50
Increment version
2017-08-14 14:55:26 +02:00
ines
a29f132ffd
Change python -m spacy to spacy
Reflects latest change to entry point or auto-alias
2017-08-14 13:04:48 +02:00
ines
65bf80302c
Increment version
2017-08-14 13:04:30 +02:00
Matthew Honnibal
52c180ecf5
Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit ea8de11ad5, reversing changes made to 08e443e083.
2017-08-14 13:00:23 +02:00
Matthew Honnibal
dbbfe595a5
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-14 12:09:28 +02:00
Matthew Honnibal
ac6c25f762
Check SGD is not None in update
2017-08-14 12:09:18 +02:00
Matthew Honnibal
0ae045256d
Fix beam training
2017-08-13 18:02:05 -05:00
Matthew Honnibal
6a42cc16ff
Fix beam parser, improve efficiency of non-beam
2017-08-13 12:37:26 +02:00
Matthew Honnibal
4363b4aa4a
Fix redundant tokvecs updates during update
2017-08-13 12:36:55 +02:00
Matthew Honnibal
12de263813
Bug fixes to beam parsing. Learns small sample
2017-08-13 09:33:39 +02:00
Matthew Honnibal
4ae0d5e1e6
Set defaults for convert command
2017-08-13 09:03:38 +02:00
Matthew Honnibal
92ebab6073
Update beam-update tests
2017-08-13 08:56:02 +02:00
Matthew Honnibal
17874fe491
Disable beam parsing
2017-08-12 19:35:40 -05:00
Matthew Honnibal
69f21867b5
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-12 19:25:56 -05:00
Matthew Honnibal
3e30712b62
Improve defaults
2017-08-12 19:24:17 -05:00
Matthew Honnibal
28e930aae0
Fixes for beam parsing. Not working
2017-08-12 19:22:52 -05:00
Matthew Honnibal
c96d769836
Fix beam parse. Not sure if working
2017-08-12 18:21:54 -05:00
Matthew Honnibal
24b45b45c6
Add test for beam update
2017-08-12 17:15:28 -05:00
Matthew Honnibal
4638f4b869
Fix beam update
2017-08-12 17:15:16 -05:00
Matthew Honnibal
d4308d2363
Initialize State offset to 0
2017-08-12 17:14:39 -05:00
Matthew Honnibal
b353e4d843
Work on parser beam training
2017-08-12 14:47:45 -05:00
ines
d4f2baf7dd
Add create_meta option to package command
Re-create meta.json in model directory, even if it exists. Especially
useful when updating existing spaCy models or training with Prodigy.
Ensures user won't end up with multiple "en_core_web_sm" models, and
offers easy way to change the model's name and settings without having
to edit the meta.json file.
2017-08-12 21:44:18 +02:00
Matthew Honnibal
4ab0c8c8e9
Try different drop_layer structure in Tok2Vec
2017-08-12 08:56:57 -05:00
Matthew Honnibal
cd5ecedf6a
Try drop_layer in parser
2017-08-12 08:56:33 -05:00
Matthew Honnibal
8870d491f1
Remove redundant pickling during training
2017-08-12 08:55:53 -05:00
Matthew Honnibal
680043ebca
Improve efficiency of tagger.set_annotations for GPU
2017-08-12 08:54:21 -05:00
Matthew Honnibal
ebe0f7f641
Pass embed size correctly in tagger, and cache embeddings for efficiency
2017-08-12 05:45:20 -05:00
Matthew Honnibal
1a59db1c86
Fix dropout and learn rate in parser
2017-08-12 05:44:39 -05:00
Matthew Honnibal
d01dc3704a
Adjust parser model
2017-08-09 20:06:33 -05:00
Matthew Honnibal
f37528ef58
Pass embed size for parser fine-tune. Use SELU
2017-08-09 17:52:53 -05:00
Matthew Honnibal
f93f2bed58
Revert use of layer normalization in Tok2Vec
2017-08-09 17:47:03 -05:00
Matthew Honnibal
20944dd8aa
Fix conflict in parser fine-tuning
2017-08-09 16:43:05 -05:00
Matthew Honnibal
ac2de6dced
Switch to ReLU layers in Tok2Vec
2017-08-09 16:41:25 -05:00
Matthew Honnibal
bbace204be
Gate parser fine-tuning behind feature flag
2017-08-09 16:40:42 -05:00
Matthew Honnibal
a59a1deac4
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-09 16:23:19 -05:00
Matthew Honnibal
bcce6f7de0
Fix parser fine tuning
2017-08-09 16:23:12 -05:00
ines
28e2fec23b
Fix autolinking failure on fresh model install (resolves #1138)
On fresh install via subprocess, pip.get_installed_distributions()
won't show new model, so is_package check in link command fails.
Solution for now is to get model package path explicitly and pass it to
link command.
2017-08-09 11:52:38 +02:00
Jim Geovedi
c62b49b7cc
Merge remote-tracking branch 'upstream/develop' into indonesian
2017-08-09 09:17:46 +07:00
Matthew Honnibal
dbdd8afc4b
Fix parser fine-tune training
2017-08-08 15:46:07 -05:00
Matthew Honnibal
88bf1cf87c
Update parser for fine tuning
2017-08-08 15:34:17 -05:00
Matthew Honnibal
5d837c3776
Add mix weights on fine_tune
2017-08-07 06:32:59 -05:00
Matthew Honnibal
42bd26f6f3
Give parser its own tok2vec weights
2017-08-06 18:33:46 +02:00
Matthew Honnibal
3ed203de25
Use LayerNorm and SELU in Tok2Vec
2017-08-06 18:33:18 +02:00
Matthew Honnibal
78498a072d
Return Transition for missing actions in lookup_action
2017-08-06 14:16:36 +02:00
Matthew Honnibal
4a5cc89138
Fix tagger 'fine_tune', to keep private CNN weights
2017-08-06 14:15:48 +02:00
Matthew Honnibal
3cb8f06881
Fix NeuralLabeller
2017-08-06 14:15:14 +02:00
Matthew Honnibal
0acce0521b
Fix Language.update for pipeline
2017-08-06 14:13:03 +02:00
Matthew Honnibal
bfffdeabb2
Fix parser batch-size bug introduced during cleanup
2017-08-06 14:10:48 +02:00
Matthew Honnibal
0eec7c9e9b
Fix Language.evaluate
2017-08-06 02:18:31 +02:00
Matthew Honnibal
0a566dc320
Add update_tensors flag to Language.update. Experimental, re #1182
2017-08-06 02:18:12 +02:00
Matthew Honnibal
cc19ea0e7c
Add update_tensors flag to Language.update. Experimental, re #1182
2017-08-06 02:17:10 +02:00
Matthew Honnibal
4cfb7a54e7
Fix tagger
2017-08-06 01:53:31 +02:00
Matthew Honnibal
e9ab800e15
Fix tagging model
2017-08-06 01:50:08 +02:00
Matthew Honnibal
468c138ab3
WIP: Add fine-tuning logic to tagger model, re #1182
2017-08-06 01:13:23 +02:00
Matthew Honnibal
7f876a7a82
Clean up some unused code in parser
2017-08-06 00:00:21 +02:00
Matthew Honnibal
ae1ad81069
Increment version
2017-08-05 18:09:32 +02:00
Jim Geovedi
cc4772cac2
reworks
2017-08-03 13:08:38 +07:00
Jim Geovedi
37f19f5ed2
added more currencies based on corpus data
2017-08-03 13:03:25 +07:00
Jim Geovedi
30fd068d42
hashtag prefix should be handled somewhere else
2017-08-03 13:03:02 +07:00
Jim Geovedi
4705ae19ba
Merge remote-tracking branch 'upstream/develop' into indonesian
2017-08-03 12:40:19 +07:00
Jim Geovedi
ba07e23c87
added USD in currency rules
2017-08-02 22:42:47 +07:00
Matthew Honnibal
5c323daa1a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-01 22:10:37 +02:00
Matthew Honnibal
2e00361522
Fix update when 0 docs
2017-08-01 22:10:17 +02:00
Matthew Honnibal
8fce187de4
Fix ArcEager for missing values
2017-08-01 22:10:05 +02:00
ines
78e262140f
Add workaround for displaCy server on Python 2/3 (resolves #1227)
Make sure status and headers are bytes on Python 2 and strings on
Python 3
2017-08-01 01:11:35 +02:00
Jim Geovedi
2572a9ddf0
Merge remote-tracking branch 'upstream/develop' into indonesian
2017-07-30 21:24:16 +07:00
Jim Geovedi
bb08d696f9
added hashtag rule and fixed currency rules
2017-07-30 21:23:28 +07:00
Jim Geovedi
e9af79a803
added u-\d+ rules (sports team)
2017-07-30 21:23:01 +07:00
Matthew Honnibal
27abc56e98
Add method to get beam entities
2017-07-29 21:59:02 +02:00
Matthew Honnibal
ec63f4fe7b
Add option to control how missing entities are handled when getting NER tags
2017-07-29 21:58:37 +02:00
Jim Geovedi
e5adc26c72
simplified rules
2017-07-29 18:21:32 +07:00
Jim Geovedi
783f7d8b86
added test set for Indonesian language
2017-07-29 18:21:07 +07:00
Jim Geovedi
4d04898dea
updated regexp
2017-07-29 17:44:57 +07:00
Jim Geovedi
7d96d477ea
updated like_num
2017-07-29 17:44:46 +07:00
Jim Geovedi
3cca4ed798
added lex attrs rules
2017-07-29 17:22:21 +07:00
Jim Geovedi
8b814c63f1
more exceptions
2017-07-27 19:46:30 +07:00
Jim Geovedi
6c725e8dcf
updated lemma
2017-07-27 19:46:21 +07:00
Jim Geovedi
c194f7ae26
Merge remote-tracking branch 'upstream/develop' into indonesian
2017-07-27 10:55:34 +07:00
Jim Geovedi
547973b92a
wip syntax iterators
2017-07-27 10:51:34 +07:00
Jim Geovedi
bbc75da38d
enable syntax iterator and lemma lookup
2017-07-27 10:51:15 +07:00
Jim Geovedi
24a8c8bf28
added wip lemma dict
2017-07-26 21:39:54 +07:00
Jim Geovedi
63f14ba46b
added hyphen-suffix rules
2017-07-26 19:28:57 +07:00
Jim Geovedi
f288964441
removed -el from suffix rules
2017-07-26 19:28:38 +07:00
Jim Geovedi
6eee7a7411
updated tokenizer exceptions
2017-07-26 19:13:47 +07:00
Jim Geovedi
edec51b1b1
update punctuation rules
2017-07-26 19:13:36 +07:00
Jim Geovedi
62443d495a
enable token match
2017-07-26 19:13:14 +07:00
Jim Geovedi
c97f5ae0bb
updated tokenizer exceptions
2017-07-26 19:12:52 +07:00
Matthew Honnibal
aff325b7e0
Increment version
2017-07-25 19:41:20 +02:00
Matthew Honnibal
6780132821
Fix tagger loading
2017-07-25 19:41:11 +02:00
Matthew Honnibal
fd20a4af55
Increment version
2017-07-25 18:58:34 +02:00
Matthew Honnibal
523b0df2c9
Update text classification model
2017-07-25 18:57:59 +02:00
Matthew Honnibal
7c7fac9337
Add spacy.blank() loading function
2017-07-25 18:56:37 +02:00
Jim Geovedi
73f6ac9d9b
added hyphen
2017-07-24 15:56:31 +07:00
Jim Geovedi
68454c40bf
added missing import
2017-07-24 14:12:34 +07:00
Jim Geovedi
eaf9cbd708
curse of copy & paste
2017-07-24 14:11:51 +07:00
Jim Geovedi
7aad6718bc
enable tokenizer exceptions
2017-07-24 14:11:10 +07:00
Jim Geovedi
ad56c9179a
added tokenizer exceptions list
2017-07-24 14:10:16 +07:00
Jim Geovedi
c1f3fe99fe
updated punctuation rules
2017-07-24 13:57:21 +07:00
Jim Geovedi
37fa2c8c80
punctuation rules
2017-07-24 06:17:18 +07:00
Jim Geovedi
082e94ac1c
added infix rules
2017-07-24 06:17:07 +07:00
Jim Geovedi
d0ec484725
reverted
2017-07-24 06:16:29 +07:00
Jim Geovedi
0e590c711f
added prefix & suffix rules
2017-07-23 23:46:40 +07:00
Jim Geovedi
ba922e30e8
added ampere hour unit
2017-07-23 23:46:18 +07:00
Jim Geovedi
3b17eba27b
added frequency units
2017-07-23 23:10:52 +07:00
Jim Geovedi
d5fd32a572
added known currencies
2017-07-23 22:56:48 +07:00
Jim Geovedi
f6f15678fb
added lex_attrs
2017-07-23 22:55:22 +07:00
Jim Geovedi
bed8162d00
added tokenizer_exceptions
2017-07-23 22:55:05 +07:00
Jim Geovedi
b80c35bc9a
added norm_exceptions
2017-07-23 22:54:49 +07:00
Jim Geovedi
b5de329ea3
added norm_exceptions
2017-07-23 22:54:19 +07:00
Jim Geovedi
082e9ade46
fixed typo
2017-07-23 21:30:34 +07:00
Jim Geovedi
e2efeb186e
added stopwords
2017-07-23 20:52:37 +07:00
Jim Geovedi
da98676839
use template
2017-07-23 20:51:31 +07:00
Jim Geovedi
c2b4dd7809
start working on Indonesian language
2017-07-23 20:50:56 +07:00
Matthew Honnibal
5771bd1ff8
Increment version
2017-07-23 14:18:38 +02:00
Matthew Honnibal
c4a81a47a4
Fix deserialization
2017-07-23 14:11:07 +02:00
Matthew Honnibal
2df563ad24
Remove optimization for textcat that caused loading problem
2017-07-23 14:10:51 +02:00
Matthew Honnibal
4fe77bced2
Add cfg attr to pipeline components
2017-07-23 00:52:47 +02:00
Matthew Honnibal
d8aa721664
Compute Language.meta with a property
2017-07-23 00:50:18 +02:00
Matthew Honnibal
a88a7deffe
Fix save/load of textcat config
2017-07-23 00:33:43 +02:00
Matthew Honnibal
9bae0ddc50
Fix minibatching
2017-07-22 20:14:49 +02:00
Matthew Honnibal
ded0df5e2f
Expose hyper-param as keyword arg
2017-07-22 20:14:37 +02:00
Matthew Honnibal
f5de8deeec
Increment version
2017-07-22 20:04:53 +02:00
Matthew Honnibal
b55714d5d1
Make gold_tuples arg optional in begin_training
2017-07-22 20:04:43 +02:00
Matthew Honnibal
ed6c85fa3c
Fix loading of text categories in GoldParse
2017-07-22 20:04:03 +02:00
Matthew Honnibal
6ffec9dfea
Update _ml, for textcat model
2017-07-22 20:03:40 +02:00
Matthew Honnibal
d6a5c2c85a
Add test for NER
2017-07-22 01:48:58 +02:00
Matthew Honnibal
28244df4da
Add test for beam parsing
2017-07-22 01:48:35 +02:00
Matthew Honnibal
c86445bdfd
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-07-22 01:14:28 +02:00
Matthew Honnibal
b3a749610e
Fix name of TextCategorizer
2017-07-22 01:14:07 +02:00
Matthew Honnibal
2424493970
Remove unnecessary import of Mock
2017-07-22 01:13:54 +02:00
Matthew Honnibal
baa3d81c35
Add text categorizer to Language
2017-07-22 01:13:36 +02:00
Matthew Honnibal
a6a2159969
Add slot for text categories to Doc
2017-07-22 00:34:15 +02:00
Matthew Honnibal
374ab3ecfb
Increment alpha version
2017-07-22 00:32:49 +02:00
Matthew Honnibal
289f23df51
Test beam parsing
2017-07-20 15:03:10 +02:00
Matthew Honnibal
3da1063b36
Add beam decoding to parser, to allow NER uncertainties
2017-07-20 15:02:55 +02:00
Matthew Honnibal
0ca5832427
Improve negative example handling in NER oracle
2017-07-20 00:18:49 +02:00
Matthew Honnibal
a231b56d40
Add text-classification hook to pipeline
2017-07-20 00:18:15 +02:00
Matthew Honnibal
7ea50182a5
Add support for text-classification labels to GoldParse
2017-07-20 00:17:47 +02:00
Matthew Honnibal
727481377e
Add text-classifier thinc models
2017-07-20 00:17:17 +02:00
Matthew Honnibal
f014138c11
Fix parser tests
2017-07-20 00:16:52 +02:00
Ines Montani
c91642efd5
Port over changes from #1168
2017-07-01 11:43:54 +02:00
Jim Regan
d81ceb0cd5
Merge branch 'develop' into polish
2017-06-26 22:42:27 +01:00
Jim O'Regan
2f84c73585
a start
2017-06-26 22:40:04 +01:00
Jim O'Regan
28d7f0a672
reference
2017-06-26 22:38:28 +01:00
Matthew Honnibal
91e52543ef
Merge pull request #1118 from Gregory-Howard/patch-2
Update _tokenizer_exceptions_list (adding cities)
2017-06-20 11:16:07 +02:00
Matthew Honnibal
8ea785e01a
Merge pull request #1119 from oroszgy/patch-3
Fixed conllu converter
2017-06-20 11:14:41 +02:00
Tpt
7745b3ae04
Adds noun chunks to French syntax iterators
2017-06-12 15:29:58 +02:00
Tpt
57e8254f63
Adds function to extract French noun chunks
2017-06-12 15:20:49 +02:00
György Orosz
62dbf9025c
Fixed conllu converter
2017-06-09 22:53:56 +02:00
Grégory Howard
cd974b32b7
Update _tokenizer_exceptions_list (adding cities)
2017-06-09 17:58:18 +02:00
ines
34a2eecb17
Add simple "naughty strings" test (see #1107)
2017-06-06 17:43:51 +02:00
ines
045574a936
Update package name and increment version
2017-06-05 20:41:30 +02:00
Matthew Honnibal
1f5874a927
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-05 20:20:00 +02:00
ines
03db56f48c
Detect spaCy version and add package title
Package title allows customised package names (like spacy-nightly)
2017-06-05 20:11:02 +02:00
Matthew Honnibal
c0d90f52f7
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-05 19:20:13 +02:00
ines
cc9c5dc7a3
Fix noun chunks test
2017-06-05 16:39:04 +02:00
Matthew Honnibal
836bfa2d0f
Add factory for experimental SimilarityHook component
2017-06-05 15:40:22 +02:00
Matthew Honnibal
d59fa32df1
Add experimental SimilarityHook component
2017-06-05 15:40:03 +02:00
Matthew Honnibal
5489b49203
Remove print statement
2017-06-05 13:20:41 +02:00
Matthew Honnibal
fc4204a12a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-05 13:13:23 +02:00
Matthew Honnibal
2479cde446
Support disable keyword in Language.__init__
2017-06-05 13:13:07 +02:00
ines
ea167e14db
Fix model package loading from link
2017-06-05 13:10:49 +02:00
ines
dd6dc4c120
Update spacy.load() helper functions
2017-06-05 13:02:31 +02:00
Matthew Honnibal
b4cdd05466
Add vectors.pyx in setup
2017-06-05 12:45:29 +02:00
Matthew Honnibal
280d419529
Add pickle method for vectors
2017-06-05 12:36:04 +02:00
Matthew Honnibal
30369d580f
Start testing Vectors class
2017-06-05 12:32:49 +02:00
Matthew Honnibal
eb7cbb62c2
Flesh out Vectors class
2017-06-05 12:32:08 +02:00
ines
51d7414e94
Make sure sents are a list
2017-06-05 12:30:13 +02:00
Matthew Honnibal
ebb6c49cd5
Make alignment case-insensitive for gold
2017-06-04 20:26:42 -05:00
Matthew Honnibal
fc4dd62e84
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-04 20:19:05 -05:00
Matthew Honnibal
8f8f90b46b
Disable labeller if not parsing
2017-06-04 20:18:54 -05:00
Matthew Honnibal
c52fde40f4
Improve train CLI
2017-06-04 20:18:37 -05:00
Matthew Honnibal
a053b1218e
Fix item counting during training
2017-06-04 20:18:20 -05:00
Matthew Honnibal
b3b5521625
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-04 20:17:18 -05:00
Matthew Honnibal
9bc4a26213
Add option of data augmentation noise
2017-06-04 20:16:57 -05:00
Matthew Honnibal
7b2ede783d
Add SP tag to tag map if missing
2017-06-04 20:16:30 -05:00
ines
a0f4592f0a
Update tests
2017-06-05 02:26:13 +02:00
ines
3e105bcd36
Update tests
2017-06-05 02:09:27 +02:00
Matthew Honnibal
516798e9fc
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-05 01:35:21 +02:00
Matthew Honnibal
193bf913c0
Set is_tagged=True after tagging
2017-06-05 01:35:07 +02:00
ines
078232932c
Fix tokenizer fixture scope
2017-06-05 01:06:34 +02:00
Matthew Honnibal
58be0e1f6f
Update tests
2017-06-04 16:35:06 -05:00
Matthew Honnibal
b78cc318c3
Fix loading of morphology exceptions
2017-06-04 16:34:32 -05:00
Matthew Honnibal
bb98d45a63
Fix tests
2017-06-04 16:00:44 -05:00
Matthew Honnibal
55d0621532
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-04 15:53:25 -05:00
Matthew Honnibal
5b9f116aca
Update tests
2017-06-04 15:53:17 -05:00
Matthew Honnibal
2a3bd5ee90
Fix fetching of noun chunk iterator
2017-06-04 15:53:05 -05:00
Matthew Honnibal
3680c51b8f
Avoid clobbering preset POS tags
2017-06-04 15:52:42 -05:00
Matthew Honnibal
939e8ed567
Add lookup properties for components in Language
2017-06-04 15:52:09 -05:00
Matthew Honnibal
e28f90b672
Fix syntax iterators
2017-06-04 15:51:50 -05:00
ines
8a29308d0b
Remove unused imports
2017-06-04 22:39:29 +02:00
Ines Montani
112c5787eb
Merge pull request #1101 from oroszgy/hu_tokenizer_fix
More robust Hungarian tokenizer.
2017-06-04 22:37:51 +02:00
ines
96867a24ae
Fix typo
2017-06-04 22:36:40 +02:00
ines
f432bb4b48
Fix fixture scopes
2017-06-04 22:34:31 +02:00
Matthew Honnibal
6d0356e6cc
Whitespace
2017-06-04 14:55:24 -05:00
Matthew Honnibal
8a683a4494
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-04 21:53:56 +02:00
Matthew Honnibal
92ae36f84e
Improve way noun chunks iterator is looked up
2017-06-04 21:53:39 +02:00
ines
9254a3dd78
Import and add Spanish syntax iterators
2017-06-04 21:42:15 +02:00
ines
7db1a0e83e
Make sure printed values are always strings
2017-06-04 21:27:20 +02:00
Matthew Honnibal
51e1541ddb
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-04 14:26:29 -05:00
Matthew Honnibal
add9a33782
Return False for vocab.has_vector
2017-06-04 14:26:14 -05:00
Matthew Honnibal
675f448313
Fix vector linkage on Doc
2017-06-04 14:25:30 -05:00
Matthew Honnibal
f4662e9218
Fix vector linkage for token
2017-06-04 14:19:58 -05:00
ines
070e026ed9
Ensure path on read_json
2017-06-04 20:44:37 +02:00
ines
e1e73936b1
Raise correct error
2017-06-04 20:44:27 +02:00
ines
848e47669e
Fix typo
2017-06-04 20:44:15 +02:00
ines
c4614c02a2
Fix dev resources URL
2017-06-04 15:45:50 +02:00
ines
a66cf24ee8
xfail tokenizer serialization tests for now
Tests pass locally, but not on Travis – needs more investigation
2017-06-04 13:58:20 +02:00
ines
7b7d46b64e
Fix typo and success message
2017-06-04 13:45:50 +02:00
ines
90d117f378
Update version
2017-06-04 13:41:16 +02:00
Matthew Honnibal
7ca215bc26
Resolve lex_attr_getters conflict
2017-06-03 16:12:01 -05:00
Matthew Honnibal
21eef90dbc
Support specifying which GPU
2017-06-03 16:10:23 -05:00
Matthew Honnibal
d0e42f9275
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-03 15:30:32 -05:00
Matthew Honnibal
8a17b99b1c
Use NORM attribute, not LOWER
2017-06-03 15:30:16 -05:00
ines
4c643d74c5
Add norm exceptions to other Language classes
2017-06-03 22:29:21 +02:00
ines
fa7e576c57
Change order of exception dicts
2017-06-03 21:52:06 +02:00
Matthew Honnibal
3f5c85d8de
Reorder setting of lex attrs, to avoid clobbering
2017-06-03 14:47:55 -05:00
Matthew Honnibal
aeb7520133
Make norm use lower-case
2017-06-03 14:47:38 -05:00
Matthew Honnibal
de3954843e
Populate norm exceptions with lower-case
2017-06-03 14:47:12 -05:00
Matthew Honnibal
f6955a459c
Fix prev commit
2017-06-03 14:38:37 -05:00
Matthew Honnibal
468ca6c760
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-03 14:33:51 -05:00
Matthew Honnibal
c647a0d33e
Fix training counter for gold preprocessing
2017-06-03 14:33:39 -05:00
ines
e47eef5e03
Update German tokenizer exceptions and tests
2017-06-03 21:07:44 +02:00
ines
d77c2cc8bb
Add tests for English norm exceptions
2017-06-03 20:59:50 +02:00
ines
0d6fa8b241
Add German norm exceptions
2017-06-03 20:54:18 +02:00
ines
5bd311c77e
Fix update of norm exceptions
2017-06-03 20:54:09 +02:00
Matthew Honnibal
94e063ae2a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-03 13:31:40 -05:00
Matthew Honnibal
fea1144e6d
Set max batch size in evaluate
2017-06-03 13:31:33 -05:00
Matthew Honnibal
805495af27
Fix off-by-one in number of tags
2017-06-03 13:29:23 -05:00
Matthew Honnibal
e62f46d39f
Clarify gold.pyx slightly
2017-06-03 13:28:52 -05:00
Matthew Honnibal
43353b5413
Improve train CLI script
2017-06-03 13:28:20 -05:00
ines
746653880c
Add English norm exceptions to lex_attrs
2017-06-03 20:27:28 +02:00
ines
095eeeb12f
Update English tokenizer exceptions and add norms
2017-06-03 20:27:16 +02:00
ines
e5d426406a
Add base norm exceptions
2017-06-03 20:27:05 +02:00
ines
4c2bbc3ccc
Add add_lookups util function
2017-06-03 19:44:47 +02:00
ines
05fe6758a7
Set lexeme attributes for tokenizer special cases
2017-06-03 19:44:39 +02:00
ines
3152ee5ca2
Update serialization tests for tokenizer
2017-06-03 17:05:28 +02:00
ines
7c919aeb09
Make sure serializers and deserializers are ordered
2017-06-03 17:05:09 +02:00