spaCy/test_issue2901.py at 5d0b60999d7502e88e811f72a7201719b0ed1f2b

explosion/spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-02 10:53:05 +03:00

Sofie 46dfe773e1 Replacing regex library with re to increase tokenization speed (#3218)

* replace unicode categories with raw list of code points

* simplifying ranges

* fixing variable length quotes

* removing redundant regular expression

* small cleanup of regexp notations

* quotes and alpha as ranges instead of alternations

* removed most regexp dependencies and features

* exponential backtracking - unit tests

* rewrote expression with pathological backtracking

* disabling double hyphen tests for now

* test additional variants of repeating punctuation

* remove regex and redundant backslashes from load_reddit script

* small typo fixes

* disable double punctuation test for russian

* clean up old comments

* format block code

* final cleanup

* naming consistency

* french strings as unicode for python 2 support

* french regular expression case insensitive

2019-02-01 18:05:22 +11:00
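
The first items in the commit message describe replacing Unicode category classes with explicit code point ranges so that the standard `re` module can be used instead of the third-party `regex` library. A minimal sketch of that idea follows. The range strings, the `WORD` pattern, and the `find_words` helper are hypothetical and heavily abbreviated; they are not spaCy's actual character classes.

```python
import re

# Hypothetical, abbreviated character ranges (Basic Latin plus Latin-1 letters).
# The multiplication sign (U+00D7) and division sign (U+00F7) are left out
# because they are not letters.
LATIN_LOWER = "a-z\u00df-\u00f6\u00f8-\u00ff"
LATIN_UPPER = "A-Z\u00c0-\u00d6\u00d8-\u00de"
ALPHA = LATIN_LOWER + LATIN_UPPER

# The `regex` library accepts Unicode property classes such as \p{L}.
# The standard `re` module does not, so the letters are spelled out as ranges.
WORD = re.compile("[{a}]+".format(a=ALPHA))


def find_words(text):
    """Return all maximal runs of characters covered by ALPHA."""
    return WORD.findall(text)


print(find_words("Straße über 42 cafés"))
```

Writing the class as ranges keeps the pattern compatible with `re` at the cost of maintaining the code point list by hand, which is presumably why the commit also mentions simplifying the ranges.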

18 lines

305 B

Python


# coding: utf8
from __future__ import unicode_literals

import pytest

from ...lang.ja import Japanese


def test_issue2901():
    """Test that `nlp` doesn't fail."""
    try:
        nlp = Japanese()
    except ImportError:
        pytest.skip()

    doc = nlp("pythonが大好きです")
    assert doc