Nutch 2.1 installation problems roundup
Published: 2019-06-29


I followed the documentation at http://nlp.solutions.asia/?p=180
For the problems I hit along the way, the fixes are based on
http://blog.javachen.com/2014/05/20/nutch-intro/
Problem 1:

compile-core:

    [javac] Compiling 180 source files to /root/nutch/build/classes

    [javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._zipfs.jar; error in opening zip file

    [javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._sunec.jar; error in opening zip file

    [javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._sunjce_provider.jar; error in opening zip file

    [javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._sunpkcs11.jar; error in opening zip file

    [javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._jfxrt.jar; error in opening zip file

    [javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._dnsns.jar; error in opening zip file

    [javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._nashorn.jar; error in opening zip file

    [javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._localedata.jar; error in opening zip file

    [javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._cldrdata.jar; error in opening zip file

    [javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6

    [javac] 9 errors

    [javac] 1 warning

BUILD FAILED

/root/nutch/build.xml:101: Compile failed; see the compiler error output for details.

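The original ext folder contains none of these `._`-prefixed jars, only the jars with the same names minus the prefix (zipfs.jar and so on). The `._` files look like macOS AppleDouble metadata files rather than real jars, which would explain why javac cannot open them as zip files. I copied the original jars over directly, and the build passed.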
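To see which stray files are involved before touching anything, a small listing helper can be useful. This is only a sketch: the class name `StrayJarFinder` and the method `findStrayJars` are made up for illustration, and the default directory is simply the ext path that appears in the log above. It lists files whose names match `._*.jar` and deliberately leaves deletion commented out.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class StrayJarFinder {

    /** Returns the regular files in dir whose names match "._*.jar", sorted by path. */
    static List<Path> findStrayJars(Path dir) throws IOException {
        List<Path> stray = new ArrayList<>();
        // The glob is matched against the file name only; the leading dot is a literal character.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, "._*.jar")) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    stray.add(entry);
                }
            }
        }
        Collections.sort(stray);
        return stray;
    }

    public static void main(String[] args) throws IOException {
        // Default taken from the build log above; pass another ext directory as the first argument.
        Path extDir = Paths.get(args.length > 0 ? args[0] : "/usr/lib/jvm/jdk1.8.0_20/jre/lib/ext");
        if (!Files.isDirectory(extDir)) {
            System.out.println("not a directory: " + extDir);
            return;
        }
        for (Path p : findStrayJars(extDir)) {
            System.out.println("stray file: " + p);
            // After reviewing the list, the file could be removed with:
            // Files.delete(p);
        }
    }
}
```

Listing first and deleting by hand keeps the real jars (zipfs.jar, sunec.jar, and so on) out of harm's way.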
Problem 2:

root@iZ280izbfjqZ:~/nutch/runtime/local# bin/nutch crawl urls -depth 3 -topN 5

Exception in thread "main" java.lang.ClassNotFoundException: org.apache.gora.sql.store.SqlStore

at java.net.URLClassLoader$1.run(URLClassLoader.java:372)

at java.net.URLClassLoader$1.run(URLClassLoader.java:361)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:360)

at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:259)

at org.apache.nutch.storage.StorageUtils.getDataStoreClass(StorageUtils.java:90)

at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:74)

at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)

at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)

at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)

at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)

at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)

See the following article for the fix:

http://blog.sina.com.cn/s/blog_3c9872d00101p4f0.html
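The trace above fails inside `Class.forName`, which means `org.apache.gora.sql.store.SqlStore` is not visible on the runtime classpath. The linked article covers the actual fix; the sketch below only helps confirm whether the class can be found before and after applying it. The class name `ClasspathCheck` and the method `isOnClasspath` are hypothetical helpers, not part of Nutch or Gora.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError for a missing dependency of the class.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace above.
        String name = args.length > 0 ? args[0] : "org.apache.gora.sql.store.SqlStore";
        System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found on classpath"));
    }
}
```

Assuming the jars live under runtime/local/lib, it could be run from runtime/local with something like `java -cp "lib/*:." ClasspathCheck`.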
Problem 3:

root@iZ280izbfjqZ:~/nutch/runtime/local# bin/nutch crawl urls -depth 3 -topN 5

InjectorJob: Using class org.apache.gora.sql.store.SqlStore as the Gora storage class.

InjectorJob: total number of urls rejected by filters: 0

InjectorJob: total number of urls injected after normalization and filtering: 1

Exception in thread "main" java.lang.RuntimeException: job failed: name=generate: *, jobid=job_local1888916405_0002

at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:55)

at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:199)

at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)

at org.apache.nutch.crawl.Crawler.run(Crawler.java:152)

at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)

at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)

Open nutch/src/java/org/apache/nutch/crawl/GeneratorReducer.java and look at around line 100:
batchId=new Utf8(conf.get(GeneratorJob.BATCH_ID));
Change it to:
int randomSeed = Math.abs(new Random().nextInt());
String batchIdStr = (System.currentTimeMillis()/1000)+"-"+randomSeed;
batchId = new Utf8( batchIdStr );
(Add import java.util.Random; at the top of the file if it is not already there.)
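The patch stops reading the batch ID from the job configuration and builds one on the spot, in the form "epoch seconds, dash, random integer". Here is a standalone sketch of that ID format, assuming nothing about Nutch beyond the three patched lines. `BatchIdDemo` and `newBatchId` are illustrative names, and the Avro `Utf8` wrapper is left out so the example has no dependencies. It uses `nextInt(Integer.MAX_VALUE)` instead of `Math.abs(nextInt())`, a small variation that avoids the rare case where `Math.abs(Integer.MIN_VALUE)` is still negative.

```java
import java.util.Random;

public class BatchIdDemo {

    /** Builds an ID of the form "<epoch seconds>-<non-negative random int>". */
    static String newBatchId(long nowMillis, Random random) {
        // nextInt(bound) returns a value in [0, bound), so the suffix is never negative.
        int randomSeed = random.nextInt(Integer.MAX_VALUE);
        return (nowMillis / 1000) + "-" + randomSeed;
    }

    public static void main(String[] args) {
        // Prints something shaped like "1561766400-123456789"; the suffix differs on every run.
        System.out.println(newBatchId(System.currentTimeMillis(), new Random()));
    }
}
```

Passing the clock value and the `Random` in as parameters keeps the function easy to test.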
Problem 4:
Fix: add the batchId column to the webpage table:

alter table webpage add batchId varchar(767) DEFAULT NULL;

After that, everything worked. Time to celebrate.

Reposted from: http://lwgdm.baihongyu.com/
