Commit Graph

8807 Commits

Author SHA1 Message Date
Matthew Honnibal
f37528ef58 Pass embed size for parser fine-tune. Use SELU 2017-08-09 17:52:53 -05:00
Matthew Honnibal
f93f2bed58 Revert use of layer normalization in Tok2Vec 2017-08-09 17:47:03 -05:00
Matthew Honnibal
20944dd8aa Fix conflict in parser fine-tuning 2017-08-09 16:43:05 -05:00
Matthew Honnibal
ac2de6dced Switch to ReLu layers in Tok2Vec 2017-08-09 16:41:25 -05:00
Matthew Honnibal
bbace204be Gate parser fine-tuning behind feature flag 2017-08-09 16:40:42 -05:00
Matthew Honnibal
a59a1deac4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-09 16:23:19 -05:00
Matthew Honnibal
bcce6f7de0 Fix parser fine tuning 2017-08-09 16:23:12 -05:00
ines
495e042429 Add entry point-style auto alias for "spacy"
Simplest way to run commands as spacy xxx instead of python -m spacy
xxx, while avoiding environment conflicts
2017-08-09 12:17:30 +02:00
ines
764540a6dd Don't ignore /bin directory 2017-08-09 12:16:30 +02:00
ines
28e2fec23b Fix autolinking failure on fresh model install (resolves #1138)
On fresh install via subprocess, pip.get_installed_distributions()
won't show new model, so is_package check in link command fails.
Solution for now is to get model package path explicitly and pass it to
link command.
2017-08-09 11:52:38 +02:00
Jim Geovedi
c62b49b7cc Merge remote-tracking branch 'upstream/develop' into indonesian 2017-08-09 09:17:46 +07:00
Matthew Honnibal
dbdd8afc4b Fix parser fine-tune training 2017-08-08 15:46:07 -05:00
Matthew Honnibal
88bf1cf87c Update parser for fine tuning 2017-08-08 15:34:17 -05:00
Jim O'Regan
c069b4acb5 fix in UD submitted; map either way 2017-08-08 19:22:14 +01:00
Jim O'Regan
76c22dec4d UD Irish tag mapping 2017-08-08 19:04:52 +01:00
Jim O'Regan
95921d7d4c Merge branch 'develop' into develop-irish 2017-08-08 17:21:27 +01:00
Ines Montani
a9465271a7 Merge pull request #1245 from delirious-lettuce/fix_typos
Fix typos
2017-08-07 23:11:20 +02:00
Paul O'Leary McCann
6e9e686568 Sample implementation of Japanese Tagger (ref #1214)
This is far from complete but it should be enough to check some things.

1. Mecab transition. Janome doesn't support Unidic, only IPAdic, but UD
tag mappings are based on Unidic. This switches out Mecab for Janome to
get around that.

2. Raw tag extension. A simple tag map can't meet the specifications for
UD tag mappings, so this adds an extra field to ambiguous cases. For
this demo it just deals with the simplest case, which only needs to look
at the literal token. (In reality it may be necessary to look at the
whole sentence, but that's another issue.)

3. General code structure. Seems nobody else has implemented a custom
Tagger yet, so still not sure this is the correct way to pass the
vocabulary around, for example.

Any feedback would be greatly appreciated. -POLM
2017-08-08 01:27:15 +09:00
Matthew Honnibal
5d837c3776 Add mix weights on fine_tune 2017-08-07 06:32:59 -05:00
Delirious Lettuce
d3b03f0544 Fix typos:
* `auxillary` -> `auxiliary`
  * `consistute` -> `constitute`
  * `earlist` -> `earliest`
  * `prefered` -> `preferred`
  * `direcory` -> `directory`
  * `reuseable` -> `reusable`
  * `idiosyncracies` -> `idiosyncrasies`
  * `enviroment` -> `environment`
  * `unecessary` -> `unnecessary`
  * `yesteday` -> `yesterday`
  * `resouces` -> `resources`
2017-08-06 21:31:39 -06:00
Matthew Honnibal
42bd26f6f3 Give parser its own tok2vec weights 2017-08-06 18:33:46 +02:00
Matthew Honnibal
3ed203de25 Use LayerNorm and SELU in Tok2Vec 2017-08-06 18:33:18 +02:00
Matthew Honnibal
b7b121103f Merge pull request #1244 from gideonite/patch-1
improve pipe, tee, izip explanation
2017-08-06 14:34:07 +02:00
Matthew Honnibal
78498a072d Return Transition for missing actions in lookup_action 2017-08-06 14:16:36 +02:00
Matthew Honnibal
4a5cc89138 Fix tagger 'fine_tune', to keep private CNN weights 2017-08-06 14:15:48 +02:00
Matthew Honnibal
3cb8f06881 Fix NeuralLabeller 2017-08-06 14:15:14 +02:00
Matthew Honnibal
0acce0521b Fix Language.update for pipeline 2017-08-06 14:13:03 +02:00
Matthew Honnibal
bfffdeabb2 Fix parser batch-size bug introduced during cleanup 2017-08-06 14:10:48 +02:00
Gideon Dresdner
7e98a3613c improve pipe, tee, izip explanation
Use an example from an old issue https://github.com/explosion/spaCy/issues/172#issuecomment-183963403.
2017-08-06 13:21:45 +02:00
Matthew Honnibal
0eec7c9e9b Fix Language.evaluate 2017-08-06 02:18:31 +02:00
Matthew Honnibal
0a566dc320 Add update_tensors flag to Language.update. Experimental, re #1182 2017-08-06 02:18:12 +02:00
Matthew Honnibal
cc19ea0e7c Add update_tensors flag to Language.update. Experimental, re #1182 2017-08-06 02:17:10 +02:00
Matthew Honnibal
4cfb7a54e7 Fix tagger 2017-08-06 01:53:31 +02:00
Matthew Honnibal
e9ab800e15 Fix tagging model 2017-08-06 01:50:08 +02:00
Matthew Honnibal
468c138ab3 WIP: Add fine-tuning logic to tagger model, re #1182 2017-08-06 01:13:23 +02:00
Matthew Honnibal
7f876a7a82 Clean up some unused code in parser 2017-08-06 00:00:21 +02:00
Matthew Honnibal
ae1ad81069 Increment version 2017-08-05 18:09:32 +02:00
Jim Geovedi
cc4772cac2 reworks 2017-08-03 13:08:38 +07:00
Jim Geovedi
37f19f5ed2 added more currencies based on corpus data 2017-08-03 13:03:25 +07:00
Jim Geovedi
30fd068d42 hashtag prefix should be handled somewhere else 2017-08-03 13:03:02 +07:00
Jim Geovedi
4705ae19ba Merge remote-tracking branch 'upstream/develop' into indonesian 2017-08-03 12:40:19 +07:00
Jim Geovedi
ba07e23c87 added USD in currency rules 2017-08-02 22:42:47 +07:00
Matthew Honnibal
5c323daa1a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-01 22:10:37 +02:00
Matthew Honnibal
2e00361522 Fix update when 0 docs 2017-08-01 22:10:17 +02:00
Matthew Honnibal
8fce187de4 Fix ArcEager for missing values 2017-08-01 22:10:05 +02:00
ines
78e262140f Add workaround for displaCy server on Python 2/3 (resolves #1227)
Make sure status and headers are bytes on Python 2 and strings on
Python 3
2017-08-01 01:11:35 +02:00
Jim Geovedi
2572a9ddf0 Merge remote-tracking branch 'upstream/develop' into indonesian 2017-07-30 21:24:16 +07:00
Jim Geovedi
bb08d696f9 added hashtag rule and fixed currency rules 2017-07-30 21:23:28 +07:00
Jim Geovedi
e9af79a803 added u-\d+ rules (sports team) 2017-07-30 21:23:01 +07:00
Matthew Honnibal
c16ef0a85c Clarify train textcat example 2017-07-29 21:59:27 +02:00