3775 Commits

Author SHA1 Message Date
Miroslav Stampar
52ba3c281e minor update 2011-06-22 14:59:49 +00:00
Miroslav Stampar
4ca37901da thread safe logging+stdout (no more overlapping of log messages and raw output) 2011-06-22 14:53:42 +00:00
Miroslav Stampar
84bc8c3a37 update 2011-06-22 14:39:31 +00:00
Miroslav Stampar
938db1b513 replacing xmlobject logic with our own 2011-06-22 14:33:52 +00:00
Miroslav Stampar
7c830c2b1a removing xmlobject 2011-06-22 14:33:03 +00:00
Bernardo Damele
1cb12ea659 replaced third-party library python-mysql with python pymysql, http://code.google.com/p/pymysql/ (MIT license) 2011-06-22 13:31:07 +00:00
Miroslav Stampar
e76cb19e35 minor patch 2011-06-22 09:11:12 +00:00
Miroslav Stampar
019f4d344a update of THANKS file 2011-06-21 21:03:50 +00:00
Miroslav Stampar
b16b92fe46 minor update 2011-06-21 20:59:34 +00:00
Miroslav Stampar
2220afbdf5 fix by request 2011-06-21 20:50:16 +00:00
Miroslav Stampar
9e232256f4 reverting that last commit because there is a mess with default dumping (startLimit is set to 0 which is not so friendly with --start and --stop logic) 2011-06-21 18:29:23 +00:00
Miroslav Stampar
3536320fc9 --stop is inclusive ("Last query output entry to retrieve") 2011-06-21 18:08:33 +00:00
Miroslav Stampar
dfc02d8c3c sorry Bernardo, i hope your mobile is turned off :))) 2011-06-20 22:47:24 +00:00
Miroslav Stampar
2a4a284a29 crawler fix (skip binary files) 2011-06-20 22:41:38 +00:00
Miroslav Stampar
20bb1a685b really minor update 2011-06-20 21:57:53 +00:00
Miroslav Stampar
812cd2f19b minor update 2011-06-20 21:47:03 +00:00
Miroslav Stampar
e8ac7414f2 bug fix 2011-06-20 21:36:15 +00:00
Miroslav Stampar
d6062e8fc9 minor fix for crawler and far less message overlaps in future 2011-06-20 21:18:12 +00:00
Miroslav Stampar
8968c708a0 minor update 2011-06-20 14:27:24 +00:00
Miroslav Stampar
17fac6f67f minor update 2011-06-20 13:53:39 +00:00
Miroslav Stampar
29314f425e minor fix 2011-06-20 13:42:31 +00:00
Miroslav Stampar
d9015ed800 fix for a bug reported by krasn@deventum.com 2011-06-20 13:25:19 +00:00
Miroslav Stampar
f09340fc89 minor update 2011-06-20 12:40:14 +00:00
Miroslav Stampar
4d1fa5596b added support for --scope in --crawl mode 2011-06-20 12:37:51 +00:00
Miroslav Stampar
42746cc706 bug fix 2011-06-20 12:18:46 +00:00
Miroslav Stampar
67fab9f2e2 putting this to info messages (user needs to know at this place why is it waiting) 2011-06-20 12:17:19 +00:00
Miroslav Stampar
b1426b5131 bug fix 2011-06-20 12:11:09 +00:00
Miroslav Stampar
cda39ca350 minor update 2011-06-20 11:46:23 +00:00
Miroslav Stampar
07e2c72943 adding Beautifulsoup (BSD) into extras; adding --crawl to options 2011-06-20 11:32:30 +00:00
Miroslav Stampar
8c04aa871a english typo 2011-06-20 11:00:23 +00:00
Bernardo Damele
d7da71ce8e politeness 2011-06-20 09:10:04 +00:00
Miroslav Stampar
bdb530da1f minor update 2011-06-19 10:11:27 +00:00
Miroslav Stampar
d5bc149636 made changes by buawig request (504 is treated as a classical timeout) 2011-06-19 09:57:41 +00:00
Miroslav Stampar
83af83da9e minor beautification (WordsSet is considered as a bad english) 2011-06-18 15:47:19 +00:00
Bernardo Damele
4b94ef2b7c A little bit more polite 2011-06-18 13:03:55 +00:00
Bernardo Damele
f8c32cf6b9 Moved folder 2011-06-18 12:34:41 +00:00
Bernardo Damele
28ef61b997 Use getPageTextWordsSet() also in --common-columns 2011-06-18 12:30:26 +00:00
Bernardo Damele
6b2f44de14 Minor layout adjustment 2011-06-18 12:27:12 +00:00
Miroslav Stampar
ca6f9acf30 minor fix for resuming in multi threading mode 2011-06-18 12:23:18 +00:00
Bernardo Damele
cd07139919 Layout adjustments 2011-06-18 11:58:14 +00:00
Miroslav Stampar
31ad0875b4 added by request 2011-06-18 11:34:51 +00:00
Miroslav Stampar
e4be141602 minor fix for --smoke-test 2011-06-18 11:26:17 +00:00
Bernardo Damele
c7e1aeeef2 layout 2011-06-18 11:02:48 +00:00
Miroslav Stampar
905fef0eae now user can explicitly state number of UNION affected columns via --union-cols (e.g. --union-cols=5) 2011-06-18 10:51:14 +00:00
Miroslav Stampar
7c537f6896 adding Chrome to the user-agents.txt 2011-06-18 10:12:06 +00:00
Miroslav Stampar
0c5d7d4535 removing crawling random agent strings as some sites appear different to them (minor possibility to screw blind engine) 2011-06-18 09:56:21 +00:00
Miroslav Stampar
fde3e4cece better 2011-06-18 09:52:07 +00:00
Miroslav Stampar
2f129b01c0 "Please consider to provide" is a bad English 2011-06-18 09:46:22 +00:00
Miroslav Stampar
1440c9f2d4 minor update 2011-06-17 22:28:07 +00:00
Miroslav Stampar
87e9842371 better language 2011-06-17 22:13:45 +00:00