mirror of https://github.com/explosion/spaCy.git (synced 2024-11-14 21:57:15 +03:00)
eae35e9b27
- add files to specify rules for German tokenization
- change generate_specials.py to generate from an external file (abbrev.de.tab)
- copy gazetteer.json from lang_data/en/
- init_model.py
- change doc freq threshold to 0
- add train_german_tagger.py
- expects conll09-formatted input
28 lines
76 B
Plaintext
,
"
(
[
{
*
<
>
$
£
„
“
'
``
`
#
US$
C$
A$
a-
‘
....
...
‚
»
_
§
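A list like the one above is typically turned into a single anchored regular expression that strips leading punctuation from a whitespace-delimited chunk. The following is only a minimal sketch of that idea, not spaCy's actual loading code: the names `PREFIXES`, `build_prefix_pattern` and `split_prefixes` are hypothetical, and the entries are copied by hand from the listing rather than read from the file.

```python
import re

# Entries copied from the prefix listing above (assumption: one entry per line).
PREFIXES = [
    ",", '"', "(", "[", "{", "*", "<", ">", "$", "£", "„", "“", "'",
    "``", "`", "#", "US$", "C$", "A$", "a-", "‘", "....", "...",
    "‚", "»", "_", "§",
]


def build_prefix_pattern(entries):
    """Compile one regex that matches any entry at the start of a string.

    Longer entries are tried first so that "...." wins over "..." and
    "``" wins over "`". Each entry is escaped because most of them are
    regex metacharacters.
    """
    ordered = sorted(entries, key=len, reverse=True)
    return re.compile("^(?:" + "|".join(re.escape(e) for e in ordered) + ")")


def split_prefixes(chunk, pattern):
    """Peel matching prefixes off the front of `chunk`, one at a time."""
    pieces = []
    while chunk:
        match = pattern.match(chunk)
        if match is None:
            break
        pieces.append(match.group(0))
        chunk = chunk[match.end():]
    if chunk:
        pieces.append(chunk)  # whatever remains once no prefix matches
    return pieces


PREFIX_RE = build_prefix_pattern(PREFIXES)

if __name__ == "__main__":
    for text in ["„(Hallo", "US$100", "....und", "``Zitat"]:
        print(text, "->", split_prefixes(text, PREFIX_RE))
```

Sorting by length before joining matters because Python's regex alternation takes the first alternative that matches, not the longest one.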