Commit Graph

114 Commits

Author SHA1 Message Date
Miroslav Stampar
bb6e8fd4ce Minor bug fix (reported privately via email) 2016-12-15 16:09:09 +01:00
Miroslav Stampar
26d4dec5fb Minor refactoring 2016-05-31 13:02:26 +02:00
Miroslav Stampar
d0d676ccce Update of copyright string 2016-01-06 00:06:12 +01:00
Miroslav Stampar
193f8190c4 Adding new warning message 2015-11-07 23:30:24 +01:00
Miroslav Stampar
95ce5a4a09 Fixes #1444 2015-10-05 16:33:10 +02:00
Miroslav Stampar
16f8e4c8ba Removing unused imports 2015-07-12 12:25:02 +02:00
Miroslav Stampar
dbfa8f1cfc Fix for a bug reported by the user (conf.scheme/conf.hostname/conf.port were None in multiple targets mode) 2015-04-14 11:05:17 +02:00
Miroslav Stampar
0e4800f73c Changing default answer for sitemap checking to N 2015-04-14 09:30:01 +02:00
Miroslav Stampar
1e7f2d6da2 Implements #1215 2015-04-06 22:07:22 +02:00
Miroslav Stampar
25b23750e8 Bug fix for crawling over non-80 port 2015-03-12 11:49:52 +01:00
Miroslav Stampar
9f4a32ca2b Automatically checking for sitemap existence in case of --crawl 2015-01-20 10:03:35 +01:00
Miroslav Stampar
45bdefd29b Update of copyright 2015-01-06 15:02:16 +01:00
Miroslav Stampar
605b126758 Patch for an Issue #976 2014-11-26 13:38:21 +01:00
Miroslav Stampar
1a8b58fca6 Minor update 2014-11-20 16:42:06 +01:00
Miroslav Stampar
f8a8cbf9a6 Storing crawling results to a temporary file (for eventual further processing) 2014-11-20 16:29:17 +01:00
Miroslav Stampar
34aed7cde0 Bug fix (now it's possible to use multiple parsed requests without mixing associated headers) 2014-10-22 13:49:29 +02:00
Miroslav Stampar
4e3a4eb0ff Added a prompt for choosing a number of threads when in crawling mode 2014-10-10 12:09:08 +02:00
Bernardo Damele
43a4e85749 updated copyright 2014-01-13 17:24:49 +00:00
stamparm
2bfdac5ebc Minor update for crawler 2013-04-30 18:32:46 +02:00
stamparm
ebe8ee3500 Fix for crawler and redirection case 2013-04-30 18:08:26 +02:00
stamparm
3c110b3620 Minor bug fix 2013-04-30 16:40:16 +02:00
stamparm
8c9da95343 Style and consistency update (url -> URL) 2013-04-09 11:48:42 +02:00
stamparm
3948b527dd Update for an Issue #429 2013-04-09 11:36:33 +02:00
stamparm
91054099aa Minor style update 2013-04-09 10:42:58 +02:00
Bernardo Damele
4b9d8ed673 reverted a previous commit as not all distributions create a link file /usr/bin/python2 to the Python interpreter 2013-02-14 11:32:17 +00:00
Bernardo Damele
a67ef4117f make sure to use Python 2 interpreter when default system Python is version 3 2013-02-14 11:25:04 +00:00
Bernardo Damele
a43202f3c0 updated copyright 2013-01-18 14:07:51 +00:00
Miroslav Stampar
ca3d35a878 Some PEP8 related style cleaning 2013-01-10 13:18:44 +01:00
Miroslav Stampar
bf5544903b Minor style update 2013-01-09 16:10:26 +01:00
Miroslav Stampar
3d4f381ab5 Patch for an Issue #169 2013-01-09 15:22:21 +01:00
Miroslav Stampar
cb91729913 Fix for an Issue #324 (crawling when HTML is not well-formed) 2012-12-27 20:55:37 +01:00
Miroslav Stampar
712cf4e4db Fix for an Issue #316 2012-12-20 20:55:59 +01:00
Miroslav Stampar
c2c4601d6e Minor restyling 2012-12-20 11:06:52 +01:00
Miroslav Stampar
974407396e Doing some more style updating (capitalization of exception classes; using _ is enough for private members - __ is used in Python specific methods) 2012-12-06 14:14:19 +01:00
Miroslav Stampar
baccbd6f48 Implementation for an Issue #283 2012-12-06 11:57:57 +01:00
Miroslav Stampar
b6650add46 Introducing 'new style classes' (idea from Pull request #284) 2012-12-06 10:42:53 +01:00
Miroslav Stampar
2de52927f3 Code refactoring (epecially Google search code) 2012-10-30 18:38:10 +01:00
Miroslav Stampar
87ecf205cb More work for Issue #66 2012-07-14 17:01:04 +02:00
Bernardo Damele
162da75a04 modified homepage address 2012-07-12 18:38:03 +01:00
jekil
c39e5a85ba Removed $id$ tags 2012-06-27 20:56:43 +02:00
Miroslav Stampar
6c4bd84d18 minor fix (turning back the functionality of kb.suppressResumeInfo) 2012-06-25 16:19:51 +00:00
Miroslav Stampar
d2dd47fb23 some more refactoring 2012-06-14 13:52:56 +00:00
Miroslav Stampar
b3bd4144f5 removing of unused imports together with some general code refactoring 2012-02-22 10:40:11 +00:00
Miroslav Stampar
95f89ab63a updating copyright date 2012-01-11 14:59:46 +00:00
Miroslav Stampar
29f502fe29 some refactoring 2011-12-28 16:27:17 +00:00
Miroslav Stampar
bdc724cb46 minor bug fix 2011-12-20 10:34:28 +00:00
Miroslav Stampar
ef987c6954 adding compatibility support for using --crawl and --forms together 2011-10-29 09:32:20 +00:00
Miroslav Stampar
f5e45bf113 quick fix for a bug reported by jovon.itwaru@gmail.com 2011-07-11 08:54:39 +00:00
Bernardo Damele
aedcf8c8d7 Changed homepage address 2011-07-07 20:10:03 +00:00
Miroslav Stampar
e00cf81f7e minor update 2011-06-24 19:50:13 +00:00
Miroslav Stampar
e9286ddd5b fix for a bug reported by g@brindi.si (UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
47: ordinal not in range(128))
2011-06-24 19:24:11 +00:00
Miroslav Stampar
eaa2a4202f changing to: --crawl=CRAWLDEPTH 2011-06-24 05:40:03 +00:00
Miroslav Stampar
dfc02d8c3c sorry Bernardo, i hope your mobile is turned off :))) 2011-06-20 22:47:24 +00:00
Miroslav Stampar
2a4a284a29 crawler fix (skip binary files) 2011-06-20 22:41:38 +00:00
Miroslav Stampar
20bb1a685b really minor update 2011-06-20 21:57:53 +00:00
Miroslav Stampar
812cd2f19b minor update 2011-06-20 21:47:03 +00:00
Miroslav Stampar
e8ac7414f2 bug fix 2011-06-20 21:36:15 +00:00
Miroslav Stampar
d6062e8fc9 minor fix for crawler and far less message overlaps in future 2011-06-20 21:18:12 +00:00
Miroslav Stampar
8968c708a0 minor update 2011-06-20 14:27:24 +00:00
Miroslav Stampar
17fac6f67f minor update 2011-06-20 13:53:39 +00:00
Miroslav Stampar
4d1fa5596b added support for --scope in --crawl mode 2011-06-20 12:37:51 +00:00
Miroslav Stampar
42746cc706 bug fix 2011-06-20 12:18:46 +00:00
Miroslav Stampar
cda39ca350 minor update 2011-06-20 11:46:23 +00:00
Miroslav Stampar
07e2c72943 adding Beautifulsoup (BSD) into extras; adding --crawl to options 2011-06-20 11:32:30 +00:00