Ines Montani
ffebdad08d
Add cheat sheet to spaCy 101
2019-03-23 16:32:55 +01:00
Ines Montani
06bf130890
💫 Add better and serializable sentencizer ( #3471 )
* Add better serializable sentencizer component
* Replace default factory
* Add tests
* Tidy up
* Pass test
* Update docs
2019-03-23 15:45:02 +01:00
Matthew Honnibal
d9a07a7f6e
💫 Fix class mismap on parser deserializing ( closes #3433 ) ( #3470 )
v2.1 introduced a regression when deserializing the parser after
parser.add_label() had been called. The code around the class mapping is
pretty confusing currently, as it was written to accommodate backwards
model compatibility. It needs to be revised when the models are next
retrained.
Closes #3433
2019-03-23 13:46:25 +01:00
Matthew Honnibal
444a3abfe5
Add xfail test for #3433 . Improve test for add label.
2019-03-23 12:36:00 +01:00
Ines Montani
6b6e9b638e
Fix test for #3468
2019-03-23 11:24:29 +01:00
Ines Montani
fbec72b4c3
Slightly modify test for #3468
Check for Token.is_sent_start first (which is serialized/deserialized correctly)
2019-03-23 11:22:44 +01:00
Ines Montani
02d9378d8c
Add xfailing test for #3468
2019-03-23 11:19:11 +01:00
Ines Montani
ed91592726
Merge branch 'master' into spacy.io
2019-03-22 19:02:26 +01:00
Ines Montani
dcd6e06c47
Improve landing example [ci skip]
2019-03-22 19:02:15 +01:00
Ines Montani
c2bb39dcb4
Merge branch 'master' into spacy.io
2019-03-22 18:50:16 +01:00
Ines Montani
a841324034
Update landing example [ci skip]
2019-03-22 18:50:00 +01:00
Ines Montani
a9ad735241
Merge branch 'master' into spacy.io
2019-03-22 18:36:28 +01:00
Ines Montani
b532386a60
Fix typo [ci skip]
2019-03-22 18:36:17 +01:00
Ines Montani
7b5496027b
Merge branch 'master' into spacy.io
2019-03-22 18:21:16 +01:00
Ines Montani
d8533f0149
Update Binder [ci skip]
2019-03-22 18:16:46 +01:00
svlandeg
46f4eb5db3
error and warning messages
2019-03-22 16:55:05 +01:00
svlandeg
9de9900510
adding future import unicode literals to .py files
2019-03-22 16:18:04 +01:00
svlandeg
b4cd5d5ee9
property annotations for fields with only a getter
2019-03-22 16:10:49 +01:00
Matthew Honnibal
4c5f265884
Fix train loop for train_textcat example
2019-03-22 16:10:11 +01:00
Ines Montani
680eafab94
Merge branch 'master' into spacy.io
2019-03-22 15:17:51 +01:00
Christos Aridas
9cee3f702a
Add missing space in landing page ( #3462 ) [ci skip]
2019-03-22 15:17:35 +01:00
Ines Montani
5073ce63fd
Merge branch 'spacy.io' [ci skip]
2019-03-22 15:17:11 +01:00
svlandeg
9751312aff
specify unicode strings for python 2.7
2019-03-22 14:15:18 +01:00
svlandeg
5318ce88fa
'entity_linker' instead of 'el'
2019-03-22 13:55:10 +01:00
svlandeg
ec3e860b44
Merge remote-tracking branch 'upstream/master' into feature/el-framework
2019-03-22 13:47:08 +01:00
Ines Montani
c9bd0e5a96
Set version to 2.1.2
2019-03-22 13:44:47 +01:00
svlandeg
12d4caf341
Merge remote-tracking branch 'upstream/master' into feature/el-framework
2019-03-22 13:44:36 +01:00
Matthew Honnibal
e65b5bb9a0
Fix tokenizer on Python2.7 ( #3460 )
spaCy v2.1 switched to the built-in re module, where v2.0 had been using
the third-party regex library. When the tokenizer was deserialized on
Python2.7, the `re.compile()` function was called with expressions that
featured escaped unicode codepoints that were not in Python2.7's unicode
database.
Problems occurred when we had a range between two of these unknown
codepoints, like this:
```
'[\\uAA77-\\uAA79]'
```
On Python2.7, the unknown codepoints are not unescaped correctly,
resulting in arbitrary out-of-range characters being matched by the
expression.
This problem does not occur if we instead have a range between two
unicode literals, rather than the escape sequences. To fix the bug, we
therefore add a new compat function that unescapes unicode sequences
using the `ast.literal_eval()` function. Care is taken to ensure we
do not also unescape non-unicode sequences.
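The idea can be sketched roughly as follows. This is an illustration only, not the compat function added in this commit: the name `unescape_unicode` and the regular expression are assumptions, and the sketch targets Python 3 string literals (on Python 2.7 the evaluated literal would need a `u` prefix).

```python
import ast
import re

# Hypothetical helper, not spaCy's actual compat function.
# Matches \uXXXX only when it is preceded by an even number of backslashes,
# so an escaped backslash followed by "uXXXX" is left alone.
_UNICODE_ESCAPE = re.compile(r"(?<!\\)((?:\\\\)*)\\u([0-9a-fA-F]{4})")


def unescape_unicode(pattern):
    """Replace backslash-u escape sequences in `pattern` with literal characters."""

    def _replace(match):
        # Build a tiny string literal containing just the escape and let
        # ast.literal_eval() turn it into the real character.
        char = ast.literal_eval('"\\u{}"'.format(match.group(2)))
        # Keep any preceding pairs of backslashes untouched.
        return match.group(1) + char

    return _UNICODE_ESCAPE.sub(_replace, pattern)


escaped = "[\\uAA77-\\uAA79]"
unescaped = unescape_unicode(escaped)
```

Here `unescaped` is a character class whose range endpoints are literal characters, so `re.compile()` receives no escaped codepoints, while other escapes such as `\d` or `\.` pass through unchanged.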
Closes #3356 .
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-22 13:42:47 +01:00
Ines Montani
c81923ee30
Update wasabi pin
2019-03-22 13:31:58 +01:00
Ines Montani
188ccd5750
Fix xfail marker
2019-03-22 12:54:14 +01:00
Ines Montani
7dd5e2f564
Update v2-1.md
2019-03-22 12:43:23 +01:00
svlandeg
7cf0bc9a8c
delete sandbox folder
2019-03-22 12:25:11 +01:00
svlandeg
5b1cd49222
error msg and unit tests for setting kb_id on span
2019-03-22 12:05:35 +01:00
svlandeg
3c9ac59ea0
Merge branch 'backup_el' of https://github.com/svlandeg/spaCy into backup_el
2019-03-22 11:43:52 +01:00
svlandeg
a48241e9a2
use nlp's vocab for stringstore
2019-03-22 11:36:45 +01:00
svlandeg
1ee0e78fd7
select candidate with highest prior probability
2019-03-22 11:36:45 +01:00
svlandeg
7b708ab8a4
name per entity
2019-03-22 11:36:45 +01:00
svlandeg
c593607ce2
minimal EL pipe
2019-03-22 11:36:45 +01:00
svlandeg
c71123dd0c
ensure no candidates are returned for unknown aliases
2019-03-22 11:36:45 +01:00
svlandeg
b6c3255a9f
Entity class
2019-03-22 11:36:45 +01:00
svlandeg
1289cd6e8f
property getters and keep track of KB internally
2019-03-22 11:36:45 +01:00
svlandeg
98ae77a682
unit test on number of candidates generated
2019-03-22 11:36:45 +01:00
svlandeg
9a46c431c3
store entity hash instead of pointer
2019-03-22 11:36:45 +01:00
svlandeg
9819dca80e
create candidate object from entry pointer (not fully functional yet)
2019-03-22 11:36:45 +01:00
svlandeg
a9074e0886
check the length of entities and probabilities vector + unit test
2019-03-22 11:36:45 +01:00
svlandeg
d133ffaff9
correct size, not counting dummy elements in the vector
2019-03-22 11:36:45 +01:00
svlandeg
33f8a0fe2e
check and unit test in case prior probs exceed 1
2019-03-22 11:36:45 +01:00
svlandeg
b55baaa1dc
avoid value 0 in preshmap and helpful user warnings
2019-03-22 11:36:45 +01:00
svlandeg
20a7b7b1c0
raising error when adding alias for unknown entity + unit test
2019-03-22 11:36:45 +01:00
svlandeg
8843f9279c
use StringStore
2019-03-22 11:36:45 +01:00