Ines Montani
436b26fe0f
Revert other changes
2020-02-25 15:48:29 +01:00
Ines Montani
c1a5ece65f
Tidy up setup and update requirements tests
2020-02-25 15:46:39 +01:00
Ines Montani
5d21d3e8b9
Merge branch 'develop' into pr/5008
2020-02-25 15:24:47 +01:00
Ines Montani
acb4e3c7ba
Merge pull request #5039 from adrianeboyd/typo/website-token-api-shape
Fix formatting in Token API
2020-02-25 14:57:25 +01:00
Ines Montani
d50152b917
Merge pull request #5019 from questoph/master
Optimizing tokenization for Luxembourgish (dealing with apostrophe infixes)
2020-02-25 14:48:50 +01:00
Ines Montani
4440a072d2
Merge pull request #5006 from svlandeg/bugfix/multiproc-underscore
load Underscore state when multiprocessing
2020-02-25 14:46:02 +01:00
Ines Montani
38fc05986c
Merge pull request #5058 from bryant1410/patch-1
Add missing comma in a dependency specification
2020-02-25 14:44:29 +01:00
svlandeg
d848a68340
thinc 7.4.0.dev2
2020-02-25 12:07:42 +01:00
Santiago Castro
54d8665ff7
Add missing comma in a dependency specification
Conda is complaining that it can't parse that line otherwise.
2020-02-24 16:15:28 -05:00
svlandeg
d5bfebe1c5
it's moving day
2020-02-24 10:04:24 +01:00
svlandeg
217c16c7a9
running tests BEFORE deleting them?
2020-02-24 09:38:43 +01:00
svlandeg
6f846c2cbf
removing --pyargs for testing purposes
2020-02-24 09:19:08 +01:00
svlandeg
d821c95eb0
debugging prints
2020-02-23 17:38:33 +01:00
svlandeg
58568bd0cd
fix
2020-02-23 16:45:37 +01:00
svlandeg
0f55e51704
assert we found the root_dir
2020-02-23 16:33:58 +01:00
svlandeg
783da088ea
avoid try except
2020-02-23 16:21:21 +01:00
svlandeg
b49a3afd0c
use clean_underscore fixture
2020-02-23 15:49:20 +01:00
Ines Montani
4890db6339
Auto-format and fix image [ci skip]
2020-02-23 13:56:50 +01:00
Tom Keefe
ddf63b97a8
make idx available via to_array (#5030)
2020-02-22 14:13:06 +01:00
Sofie Van Landeghem
44f4142ce4
add two abbreviations and some additional unit tests (#5040)
2020-02-22 14:12:32 +01:00
Sofie Van Landeghem
479bd8d09f
add lemma option to displacy 'dep' visualiser (#5041)
* add lemma option to displacy 'dep' visualiser
* more compact list comprehension
* add option to doc
* fix test and add lemmas to util.get_doc
* fix capital
* remove lemma from get_doc
* cleanup
2020-02-22 14:11:51 +01:00
Adriane Boyd
3853d385fa
Fix formatting in Token API
2020-02-20 13:41:24 +01:00
adrianeboyd
2164e71ea8
Improved Romanian tokenization for UD RRT (#5036)
Modifications to Romanian tokenization to improve tokenization for
UD_Romanian-RRT.
2020-02-19 16:15:59 +01:00
svlandeg
9f1447bf71
where art thou, file?
2020-02-19 17:09:29 +02:00
svlandeg
9834527f2c
hack to switch between CLI folder setup and local setup
2020-02-19 16:22:48 +02:00
svlandeg
5c2f645470
root dir one level up
2020-02-19 16:15:56 +02:00
svlandeg
303c4bcd4c
include requirements in manifest
2020-02-19 15:52:55 +02:00
svlandeg
b20351792a
assert prints for more clarity
2020-02-19 15:51:53 +02:00
Ines Montani
8137b24928
Merge pull request #5028 from explosion/refactor/remove-symlinks
Remove symlinks, data dir and related stuff
2020-02-19 00:20:23 +01:00
Ines Montani
a3335d36b8
Merge branch 'develop' into refactor/remove-symlinks
2020-02-18 17:22:20 +01:00
Ines Montani
a138acb220
Merge pull request #5027 from explosion/chore/sync-develop-master
Sync develop with master, tidy up, auto-format
2020-02-18 17:22:03 +01:00
Ines Montani
09cbeaef27
Remove symlinks, data dir and related stuff
2020-02-18 17:20:17 +01:00
Ines Montani
e3f40a6a0f
Tidy up and auto-format
2020-02-18 15:38:18 +01:00
Ines Montani
1278161f47
Tidy up and fix issues
2020-02-18 15:17:03 +01:00
Ines Montani
de11ea753a
Merge branch 'master' into develop
2020-02-18 14:47:23 +01:00
Ines Montani
80e95d02b1
Allow spacy attr in token pattern
2020-02-18 14:32:53 +01:00
Jan Jessewitsch
c7e4fe9c5c
Fix/Improve German stop words (#5024)
* Fix German stop words
Two stop words ("einige" and "einigen") were stuck together.
Remove three nouns that may serve as stop words in a specific context (e.g. religious or news texts) but are not applicable for general use.
* Create Jan-711.md
2020-02-17 18:59:22 +01:00
Kabir Khan
f6ed07b85c
Use nlp.pipe in EntityRuler for phrase patterns in add_patterns (#4931)
* Fix ent_ids and labels properties when id attribute used in patterns
* use set for labels
* sort ent_ids for comparison in entity_ruler tests
* fixing entity_ruler ent_ids test
* add to set
* Run make_doc optimistically if using phrase matcher patterns.
* remove unused coveragerc I was testing with
* format
* Refactor EntityRuler.add_patterns to use nlp.pipe for phrase patterns. Improves speed substantially.
* Removing old add_patterns function
* Fixing spacing
* Make sure token_patterns are loaded as well; previously the generator was being emptied in from_disk
2020-02-16 18:17:47 +01:00
Sofie Van Landeghem
72c964bcf4
define pretrained_dims which is used by build_text_classifier (#5004)
2020-02-16 17:21:17 +01:00
adrianeboyd
3b22eb651b
Sync Span __eq__ and __hash__ (#5005)
* Sync Span __eq__ and __hash__
Use the same tuple for `__eq__` and `__hash__`, including all attributes
except `vector` and `vector_norm`.
* Update entity comparison in tests
Update `assert_docs_equal()` test util to compare `Span` properties for
ents rather than `Span` objects.
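A minimal sketch of the idea follows, written against a hypothetical, simplified `Span` stand-in rather than spaCy's actual class. The `doc_id`, `kb_id` and `_key()` names are illustrative assumptions. Both `__eq__` and `__hash__` are derived from one shared tuple that leaves out the vector, so two spans that compare equal always hash equally.

```python
class Span:
    """Hypothetical, simplified stand-in for a span (not spaCy's real Span)."""

    def __init__(self, doc_id, start, end, label="", kb_id="", vector=None):
        self.doc_id = doc_id
        self.start = start
        self.end = end
        self.label = label
        self.kb_id = kb_id
        self.vector = vector  # deliberately excluded from equality and hashing

    def _key(self):
        # One tuple shared by __eq__ and __hash__, so the two can never disagree.
        return (self.doc_id, self.start, self.end, self.label, self.kb_id)

    def __eq__(self, other):
        if not isinstance(other, Span):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())
```

Because equality ignores the vector, two spans that differ only in their vectors collapse to a single entry when placed in a set.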
2020-02-16 17:20:36 +01:00
adrianeboyd
0c47a53b5e
Use int only in key2row for better performance (#4990)
Cast all keys and rows to `int` in `vectors.key2row` for more efficient
access and serialization.
2020-02-16 17:19:41 +01:00
adrianeboyd
5b102963bf
Require HEAD for is_parsed in Doc.from_array() (#5011)
Modify flag settings so that `DEP` is not sufficient to set `is_parsed`
and only run `set_children_from_heads()` if `HEAD` is provided.
Then the combination `[SENT_START, DEP]` will set deps and not clobber
sent starts with a lot of one-word sentences.
2020-02-16 17:17:09 +01:00
Sofie Van Landeghem
2572460175
add tok2vec parameters to train script to facilitate init_tok2vec (#5021)
2020-02-16 17:16:41 +01:00
Sofie Van Landeghem
a27c77ce62
add message when cli train script throws exception (#5009)
* add message when cli train script throws exception
* fix formatting
2020-02-15 15:50:17 +01:00
Christos Aridas
ff8e71f46d
Update streamlit app (#5017)
* Update streamlit app [ci skip]
* Add all labels by default
* Tidy up and auto-format
Co-authored-by: Ines Montani <ines@ines.io>
2020-02-15 15:49:09 +01:00
nlptechbook
979a3fd1f5
Update universe.json (#5022)
e-book is available from https://nostarch.com/NLPPython
2020-02-15 15:44:55 +01:00
questoph
5352fc8fc3
Update tokenizer_exceptions.py
2020-02-14 12:02:15 +01:00
questoph
d1f0b397b5
Update punctuation.py
2020-02-13 22:18:51 +01:00
svlandeg
2729d9164d
cleanup
2020-02-12 22:59:37 +01:00
svlandeg
6bbd816569
formatting
2020-02-12 22:50:27 +01:00