mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-11 17:56:30 +03:00
avoid trailing slash
parent d093abe9a8
commit 39c1edd6dc
@@ -28,8 +28,8 @@ include ./meta.jade
 p For instance, consider #[code nltk.parse]. You might think that amongst all this code there was something that could actually predict the syntactic structure of a sentence for you, but you would be wrong. There are wrappers for the BLLIP and Stanford parsers, and since March there's been an implementation of Nivre's 2003 transition-based dependency parser. Unfortunately no model is provided for it, as they rely on an external wrapper of an external learner, which is unsuitable for the structure of their problem. So the implementation is too slow to be actually useable.
-p This problem is totally avoidable, if you just sit down and write good code, instead of stitching together external dependencies. I pointed NLTK to my tutorial describing #[a(href="https://spacy.io/blog/parsing-english-in-python/") how to implement a modern dependency parser], which includes a BSD-licensed implementation in 500 lines of Python. I was told "thanks but no thanks", and #[a(href="https://github.com/nltk/nltk/issues/694") the issue was abruptly closed]. Another researcher's offer from 2012 to implement this type of model also went #[a(href="http://arxiv.org/pdf/1409.7386v1.pdf") unanswered].
+p This problem is totally avoidable, if you just sit down and write good code, instead of stitching together external dependencies. I pointed NLTK to my tutorial describing #[a(href="https://spacy.io/blog/parsing-english-in-python") how to implement a modern dependency parser], which includes a BSD-licensed implementation in 500 lines of Python. I was told "thanks but no thanks", and #[a(href="https://github.com/nltk/nltk/issues/694") the issue was abruptly closed]. Another researcher's offer from 2012 to implement this type of model also went #[a(href="http://arxiv.org/pdf/1409.7386v1.pdf") unanswered].
-p The story in #[code nltk.tag] is similar. There are plenty of wrappers, for the external libraries that have actual taggers. The only actual tagger model they distribute is #[a(href="https://spacy.io/blog/part-of-speech-POS-tagger-in-python/") terrible]. Now it seems that #[a(href="https://github.com/nltk/nltk/issues/1063") NLTK does not even know how its POS tagger was trained]. The model is just this .pickle file that's been passed around for 5 years, its origins lost to time. It's not okay to offer this to people, to recommend they use it.
+p The story in #[code nltk.tag] is similar. There are plenty of wrappers, for the external libraries that have actual taggers. The only actual tagger model they distribute is #[a(href="https://spacy.io/blog/part-of-speech-POS-tagger-in-python") terrible]. Now it seems that #[a(href="https://github.com/nltk/nltk/issues/1063") NLTK does not even know how its POS tagger was trained]. The model is just this .pickle file that's been passed around for 5 years, its origins lost to time. It's not okay to offer this to people, to recommend they use it.
 p I think open source software should be very careful to make its limitations clear. It's a disservice to provide something that's much less useful than you imply. It's like offering your friend a lift and then not showing up. It's totally fine to not do something – so long as you never suggested you were going to do it. There are ways to do worse than nothing.
@@ -92,13 +92,13 @@ include ./meta.jade
 h3 Part-of-speech Tagger
-p In 2013, I wrote a blog post describing #[a(href="/blog/part-of-speech-POS-tagger-in-python/") how to write a good part of speech tagger]. My recommendation then was to use greedy decoding with the averaged perceptron. I think this is still the best approach, so it's what I implemented in spaCy.
+p In 2013, I wrote a blog post describing #[a(href="/blog/part-of-speech-POS-tagger-in-python") how to write a good part of speech tagger]. My recommendation then was to use greedy decoding with the averaged perceptron. I think this is still the best approach, so it's what I implemented in spaCy.
 p The tutorial also recommends the use of Brown cluster features, and case normalization features, as these make the model more robust and domain independent. spaCy's tagger makes heavy use of these features.
 h3 Dependency Parser
-p The parser uses the algorithm described in my #[a(href="/blog/parsing-english-in-python/") 2014 blog post]. This algorithm, shift-reduce dependency parsing, is becoming widely adopted due to its compelling speed/accuracy trade-off.
+p The parser uses the algorithm described in my #[a(href="/blog/parsing-english-in-python") 2014 blog post]. This algorithm, shift-reduce dependency parsing, is becoming widely adopted due to its compelling speed/accuracy trade-off.
 p Some quick details about spaCy's take on this, for those who happen to know these models well. I'll write up a better description shortly.
@@ -19,4 +19,4 @@ include ./meta.jade
 +TweetThis("Computers don't understand text. This is unfortunate, because that's what the web is mostly made of.", Meta.url)
 p If none of that made any sense to you, here's the gist of it. Computers don't understand text. This is unfortunate, because that's what the web almost entirely consists of. We want to recommend people text based on other text they liked. We want to shorten text to display it on a mobile screen. We want to aggregate it, link it, filter it, categorise it, generate it and correct it.
-p spaCy provides a library of utility functions that help programmers build such products. It's commercial open source software: you can either use it under the AGPL, or you can buy a commercial license under generous terms (Note: #[a(href="/blog/spacy-now-mit/") spaCy is now licensed under MIT]).
+p spaCy provides a library of utility functions that help programmers build such products. It's commercial open source software: you can either use it under the AGPL, or you can buy a commercial license under generous terms (Note: #[a(href="/blog/spacy-now-mit") spaCy is now licensed under MIT]).
@@ -28,11 +28,10 @@
 - Page.links = [];
 - if (type == "home") {
 - Page.url = "";
-- Page.canonical_url = Site.url + Page.url;
 - } else {
 - Page.url = "/" + type;
-- Page.canonical_url = Site.url + Page.url.replace(/\/?$/, '/');
 - }
+- Page.canonical_url = Site.url + Page.url;
 -
 - // Set defaults
 - Page.description = Site.description;
@@ -59,7 +58,7 @@
 - Page.description = Meta.description
 - Page.date = Meta.date
 - Page.url = Meta.url
-- Page.canonical_url = Site.url + Page.url.replace(/\/?$/, '/');
+- Page.canonical_url = Site.url + Page.url;
 - Page.active["blog"] = true
 - Page.links = Meta.links
 - if (Meta.image != null) {
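The two `Page.canonical_url` hunks above drop a regular-expression replace in favour of plain concatenation. As a rough sketch of what that means in practice, the snippet below compares the two expressions. The function names and the sample site URL are assumptions made for illustration; they do not appear in the site code.

```javascript
// Hypothetical helpers mirroring the two expressions in the diff.

// Removed expression: /\/?$/ matches an optional slash at the very end of
// the string, so the replace always leaves exactly one trailing slash.
function canonicalWithSlash(siteUrl, pageUrl) {
  return siteUrl + pageUrl.replace(/\/?$/, '/');
}

// Added expression: plain concatenation, the page URL is used as given.
function canonicalPlain(siteUrl, pageUrl) {
  return siteUrl + pageUrl;
}

// Sample value only; the real Site.url is defined elsewhere in the templates.
const siteUrl = 'https://spacy.io';

const before = canonicalWithSlash(siteUrl, '/blog');
const after = canonicalPlain(siteUrl, '/blog');
console.log(before);
console.log(after);
```

With the plain form, whether a canonical URL ends in a slash depends entirely on how `Page.url` or `Meta.url` was written, which would explain why the same commit also removes the trailing slashes from the `href` values.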
@@ -58,4 +58,4 @@ mixin example(name)
 ul
 li: a(href="/docs#api") API documentation
 li: a(href="/docs#tutorials") Tutorials
-li: a(href="/docs/#spec") Annotation specs
+li: a(href="/docs#spec") Annotation specs