Author Archives: Rui

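Dear Boss: A Programmer's 10 Minutes Is 3 Hours

Introduction: Ed Weissman is a programmer with 32 years in the industry. One day his boss told him that the product had a problem that could be fixed in 10 minutes, and the job ended up taking 3 hours. This post is Ed's record of what happened.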

10:48

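Boss: Hi Ed, Sue in Detroit says the "Product History" screen often shows the wrong Invoice Part Number. Can you fix this for us?

Ed: I'm busy with something else right now. Please submit a ticket to my task queue.

Boss: This will only take 10 minutes.

Ed: Are you sure?

Boss: Yes, I'm sure. I'll set up a web conference shortly. Sue will show you the problem, and then you can take a closer look when you have time.

Ed: OK.

Boss: Good. Check your Outlook for the (meeting) invitation.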

11:05

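Received the Outlook invitation for the 11:30 web conference. Accepted.

11:25

Dialed the web conference's 800 number from my IP phone. Tried twice, busy both times. Called my cell phone from the IP phone, also busy. Sigh, the IP phone system is broken again. Dialed the conference number from my cell phone. I was the first one on the line, and then the call dropped. Clicked the link in my browser to join the web conference, and I was still the first one there.

(Ed starts reading Hacker News in another browser tab.)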

11:38

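The boss joins the meeting and asks: Where is Sue?

Ed: I don't know.

Boss: Can you see my screen?

Ed: No.

Boss: Oh, hold on. Let me be the host. Can you see it now?

Ed: Um, yes, now I can. But I thought Sue was going to demonstrate the problem.

Boss: Right. I'll make her the host in a moment.

(Ed starts reading Hacker News in another browser tab.)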

langdetect, a simple Java language identifier library

When I used Apache Nutch to crawl Chinese websites, I ran into an annoying problem: the language-identifier plugin that Nutch provides cannot detect Chinese characters, so I had to find another way to identify the language a website uses. I ended up solving the problem with 'langdetect', an open source Java language detection library. The project is hosted at http://code.google.com/p/language-detection/.

LangDetect supports 53 languages, which is awesome! The list of supported languages is here: http://code.google.com/p/language-detection/wiki/LanguageList.

Using it is as simple as the sample shown on its project homepage.

import java.util.ArrayList;
import com.cybozu.labs.langdetect.Detector;
import com.cybozu.labs.langdetect.DetectorFactory;
import com.cybozu.labs.langdetect.LangDetectException;
import com.cybozu.labs.langdetect.Language;

class LangDetectSample {
    // Load the language profiles once, before creating any detector.
    public void init(String profileDirectory) throws LangDetectException {
        DetectorFactory.loadProfile(profileDirectory);
    }

    // Return the code of the single most likely language for the text.
    public String detect(String text) throws LangDetectException {
        Detector detector = DetectorFactory.create();
        detector.append(text);
        return detector.detect();
    }

    // Return the candidate languages together with their probabilities.
    public ArrayList<Language> detectLangs(String text) throws LangDetectException {
        Detector detector = DetectorFactory.create();
        detector.append(text);
        return detector.getProbabilities();
    }
}

When I tested it, I found that I needed to add jsonic-1.2.x.jar to the project's build path. It was not included in the langdetect package I downloaded, so I had to download jsonic myself and add it to the build path. After that, everything worked. Enjoy it!

By the way, langdetect also provides a build of a Nutch plugin, so we can integrate it with our cluster conveniently.
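
As a rough sanity check that is independent of langdetect, the share of Chinese (Han) characters in a piece of text can be estimated with the Java standard library alone. This is only a sketch of a different, much cruder technique than langdetect's profile-based detection: the class name `HanCheck` and the method `hanRatio` are made up for illustration, and a high ratio only suggests that the text uses Han characters, not which language it is.

```java
public class HanCheck {
    /**
     * Returns the fraction of letters in the text that belong to the
     * Han script, or 0.0 if the text contains no letters at all.
     * (Hypothetical helper, not part of langdetect or Nutch.)
     */
    static double hanRatio(String text) {
        // Count all code points that Unicode classifies as letters.
        long letters = text.codePoints()
                .filter(Character::isLetter)
                .count();
        if (letters == 0) {
            return 0.0;
        }
        // Count the code points whose Unicode script is Han.
        long han = text.codePoints()
                .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)
                .count();
        return (double) han / letters;
    }

    public static void main(String[] args) {
        System.out.println(hanRatio("使用Ganglia监控Hadoop集群"));
        System.out.println(hanRatio("plain English text")); // prints 0.0
    }
}
```

A check like this could help spot pages where a detector's answer looks suspicious, but it cannot replace langdetect for telling languages apart.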

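Monitoring a Hadoop Cluster with Ganglia on Ubuntu

Ganglia is open source software for monitoring servers and clusters. It plots graphs of CPU load, memory, network, disk, and other metrics for a server or cluster over the last hour, day, week, month, or year.

Ganglia's strength is this: a Ganglia server can collect the data of all clients on the same network segment through a single client, and a Ganglia cluster server can collect the data of all the clients below it through a single server. This layered design means that one server can manage tens of thousands of machines across different tiers.

Operating system: Ubuntu 11.10 Server x64

Cluster environment:

namenode 192.168.1.1

datanode1 192.168.1.2

datanode2 192.168.1.3

The Ganglia server is installed on namenode: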

$ sudo apt-get install ganglia-monitor ganglia-webfrontend gmetad

The install script may fail. The fix is to run the following command, which adds a user named ganglia to the ganglia group:

$ sudo useradd ganglia -g ganglia

A configuration file named gmond.conf is created under /etc/ganglia/.

Build Nutch 1.4 cluster with Hadoop

The current release of Apache Nutch is 1.4. Since Nutch 1.3, the release package no longer bundles a Hadoop distribution, so I had to build a Hadoop cluster separately first and then configure Nutch 1.4 to work with it. My servers run Ubuntu 10.04 LTS, and I have two of them, named cluster1 and cluster2. I'll note the steps here.

WordPress with Nginx: Solve the 404 after changing permalinks structure

Today I tried to change my site's permalinks, but afterwards every old post returned a 404 error. I searched for a solution and found that I could add the following section to my vhost conf file in the nginx configuration.

location / {
        ……
        if (!-e $request_filename) {
                rewrite ^(.*)$ /index.php?q=$1 last;
                break;
        }
}
Then restart nginx and change the permalink structure in WordPress. Everything works as expected now.
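
To make the effect of the rewrite rule concrete, here is a small Java sketch that mimics what the `if` block does to a request URI. It is an illustration only, not how nginx is implemented: the class `RewriteSketch`, the method `rewrite`, and the `existsOnDisk` flag (standing in for nginx's `-e $request_filename` test) are assumptions made for this example.

```java
import java.util.regex.Pattern;

public class RewriteSketch {
    // Same pattern as in the nginx rule: capture the whole URI as group 1.
    private static final Pattern WHOLE_URI = Pattern.compile("^(.*)$");

    /**
     * Mimics the rule: if the requested path does not exist on disk,
     * hand the original URI to index.php as the q parameter.
     */
    static String rewrite(String uri, boolean existsOnDisk) {
        if (existsOnDisk) {
            return uri; // real files (images, CSS, ...) are served as-is
        }
        return WHOLE_URI.matcher(uri).replaceFirst("/index.php?q=$1");
    }

    public static void main(String[] args) {
        // A permalink has no matching file, so it is routed to index.php.
        System.out.println(rewrite("/2012/01/hello-world/", false));
        // prints /index.php?q=/2012/01/hello-world/

        // An existing static file is left untouched.
        System.out.println(rewrite("/wp-content/logo.png", true));
        // prints /wp-content/logo.png
    }
}
```

This is why the old posts stop returning 404: requests for permalinks that match no file reach WordPress through index.php, which resolves them itself.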