Matthew Honnibal
ead78c7b9b
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-02 12:55:25 +02:00
Matthew Honnibal
5e6a9e7dcc
Add rule-based SBD
2017-09-02 12:53:38 +02:00
Matthew Honnibal
a824cf8f9a
Adjust text classification model
2017-09-02 11:41:00 +02:00
Matthew Honnibal
9bffcaa73d
Update test to make it slightly more direct
...
The `nlp` container should be unnecessary here. If so, we can test the tokenizer class just a little more directly.
2017-09-01 21:16:56 +02:00
Matthew Honnibal
ac040b99bb
Add support for pre-trained vectors in text classifier
2017-09-01 16:39:55 +02:00
Matthew Honnibal
7742a6d559
Add GloVe vectors reader
2017-09-01 16:39:22 +02:00
Matthew Honnibal
789e1a3980
Use 13 parser features, not 8
2017-08-31 14:13:00 -05:00
Matthew Honnibal
30e35d9666
Fix syntax error
2017-08-30 17:35:39 -05:00
Matthew Honnibal
4ceebde523
Fix gradient bug in parser
2017-08-30 17:32:56 -05:00
Vimos Tan
a6d9fb5bb6
fix issue #1292
2017-08-30 14:49:14 +08:00
Paul O'Leary McCann
8b3e1f7b5b
Handle out-of-vocab words
...
Wasn't handling words out of the tokenizer dictionary vocabulary
properly. This adds a fix and test for that. -POLM
2017-08-29 23:58:42 +09:00
ines
173089a45a
Add more validation for model meta
2017-08-29 11:21:46 +02:00
Matthew Honnibal
2e28982e28
Merge pull request #1288 from geovedi/indonesian
...
Indonesian language support
2017-08-26 21:31:13 +02:00
ines
7e04b7f89c
Fix info text on pipeline in package cli
2017-08-26 18:30:59 +02:00
ines
40afa13a8a
Increment version
2017-08-26 18:30:49 +02:00
Matthew Honnibal
876f38c548
Merge pull request #1279 from oroszgy/model_cli_v2
...
Added vector loading to model cli
2017-08-26 15:57:50 +02:00
Matthew Honnibal
cfc055734e
Split % in units, for compatibility with corpus
2017-08-25 20:03:37 -05:00
Matthew Honnibal
4bb6bc3f9e
Add support for sent_start to GoldParse
2017-08-25 20:03:14 -05:00
Matthew Honnibal
44589fb38c
Fix Break oracle
2017-08-25 19:50:55 -05:00
Matthew Honnibal
6d4e8e14ca
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-25 12:37:16 -05:00
Matthew Honnibal
4ce5531389
Use layer norm instead of batch norm
2017-08-25 12:37:10 -05:00
Matthew Honnibal
20dd66ddc2
Constrain sentence boundaries to IS_PUNCT and IS_SPACE tokens
2017-08-25 19:35:47 +02:00
Jim Geovedi
58d8078971
Merge remote-tracking branch 'upstream/develop' into indonesian
2017-08-25 09:21:49 +08:00
Matthew Honnibal
6ceb0f0518
Allow Lexeme.rank to be set
2017-08-24 21:43:00 +02:00
Jeffrey Gerard
884ba168a8
Capture more noun chunks
2017-08-23 21:18:53 -07:00
Matthew Honnibal
44a1fa80d3
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-23 13:02:16 +02:00
ines
bb1abbeba5
Only link model if download was successfull
2017-08-23 12:36:31 +02:00
Matthew Honnibal
bb2541ffd3
Fix PROB attr for OOV words
2017-08-23 12:11:52 +02:00
Matthew Honnibal
1c5c256e58
Fix fine_tune when optimizer is None
2017-08-23 10:51:33 +02:00
Matthew Honnibal
9c580ad28a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-22 17:02:04 -05:00
Matthew Honnibal
a4633fff6f
Restore use of batch norm in model
2017-08-22 17:01:58 -05:00
Matthew Honnibal
03b5b9727a
Fix Doc.vector for empty doc objects
2017-08-22 19:52:19 +02:00
Matthew Honnibal
0551b7b03a
Fix doc.vector
2017-08-22 19:46:52 +02:00
Matthew Honnibal
83f8e98450
Fix retrieval of OOV vectors
2017-08-22 19:46:35 +02:00
Matthew Honnibal
df2745eb08
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-22 19:00:43 +02:00
Matthew Honnibal
5b329acbf2
Fix vectors_length property in vocab
2017-08-22 19:00:27 +02:00
Paul O'Leary McCann
95050201ce
Add importorskip for Japanese fixture
2017-08-22 21:30:59 +09:00
Matthew Honnibal
1fe605dfe5
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-21 19:18:31 -05:00
Matthew Honnibal
18b64e79ec
Fix fine tuning
2017-08-21 19:18:26 -05:00
Matthew Honnibal
682346dd66
Restore optimized hidden_depth=0 for parser
2017-08-21 19:18:04 -05:00
Matthew Honnibal
a21d8f3f0b
Add predict paths to _ml models
2017-08-21 23:23:45 +02:00
Matthew Honnibal
cec76801dc
Add profile command to CLI
2017-08-21 23:23:05 +02:00
Matthew Honnibal
7be5f30f17
Add profile function
2017-08-21 23:22:49 +02:00
ines
a68dc891ea
Port over changes from #1281
2017-08-21 23:19:18 +02:00
Matthew Honnibal
5e50a65252
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-21 14:15:46 -05:00
Matthew Honnibal
80acbc5f1f
Fix fine-tune weight mixture
2017-08-21 14:15:29 -05:00
Paul O'Leary McCann
bcf2b9b4f5
Update tagger & tokenizer tests
...
Tagger is now parametrized and has two sentences with more tag coverage.
The tokenizer tests are updated to reflect differences in tokenization
between IPAdic and Unidic. -POLM
2017-08-22 00:03:11 +09:00
Paul O'Leary McCann
adfd987316
Update the TAG_MAP
2017-08-22 00:02:55 +09:00
Paul O'Leary McCann
53e17296e9
Fix pronoun handling
...
Missed this case earlier.
連体詞 have three classes for UD purposes:
- その -> DET
- それ -> PRON
- 同じ -> ADJ
-POLM
2017-08-22 00:01:49 +09:00
Paul O'Leary McCann
c435f748d7
Put Mecab import in utility function
2017-08-22 00:01:28 +09:00