write down,forget

Gitorious

<Category: DevOPS>

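I tried out Gitorious. It is a very good alternative to GitHub Enterprise, and most operations work much as they do on GitHub. GitHub is organized around individual users' repos, whereas Gitorious puts more emphasis on projects and teams, which makes it a really good fit as an internal source-code management platform.

I recommend installing it with the BitNami installer: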

http://bitnami.com/stack//

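Installation is simple. The only thing to watch out for is that you must set a domain name; a bare IP address will not work.

If you want to change the domain later, just replace it in the config file “/opt/gitorious-2.4.12-1/apps/gitorious/htdocs/config/gitorious.yml”.

You can also add an entry for the domain to your local hosts file; this step should be optional.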
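The domain change and the hosts entry described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample config excerpt, the `host` key, the domain names, and the IP address are all hypothetical, and the real gitorious.yml may be laid out differently. The sketch does a plain text substitution rather than relying on any particular key.

```python
def replace_domain(config_text: str, old_domain: str, new_domain: str) -> str:
    """Return config_text with every occurrence of old_domain swapped for new_domain."""
    if not old_domain:
        raise ValueError("old_domain must not be empty")
    return config_text.replace(old_domain, new_domain)


def hosts_entry(ip: str, domain: str) -> str:
    """Build one hosts-file line that maps domain to ip."""
    return f"{ip}\t{domain}"


# Hypothetical excerpt; the real gitorious.yml may use different keys.
sample = "production:\n  host: git.old.example\n"
print(replace_domain(sample, "git.old.example", "git.new.example"))
print(hosts_entry("192.168.2.50", "git.new.example"))
```

In practice one would read the file, apply `replace_domain`, write it back, and restart the stack so the new domain takes effect.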

With that, gitolite can retire with honor.


Releasing a plugin: elasticsearch-river-email

<Category: Diving Into ElasticSearch>

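I recently noticed that the Python script I run on my VPS to fetch mail was using 30% of the CPU. I had long wanted to write an email river but never got around to it; today after work I found the time to finish this plugin. Protocols supported in theory: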
/**
now support:
imap
imaps
pop3s
pop3
*/
So far I have only had time to test the pop3 protocol, which fetches mail correctly.
Repository: https://github.com/medcl/elasticsearch-

To create the river:

$ curl -XPUT 'localhost:9200/_river/google/_meta' -d '{
  "type": "email",
  "email": {
    "config" : [ {
        "host": "pop.exmail.qq.com",
        "port": 110,
        "type":"pop3",
        "username":"river@infinitbyte.com",
        "password":"ail?sid=9UL",
        "check_interval": 5000,
        "skip_count": 1
        }
    ]
  },
  "index":{
    "index":"google",
    "type":"gmail"
  }
}'
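Hand-written JSON like the request body above is easy to get wrong (a stray trailing comma is enough to make it invalid), so it can help to generate it. The following is a minimal sketch, not part of the plugin: the helper name `build_email_river_meta` is made up for illustration, the credentials are placeholders, and the default ports are simply the conventional ones for each protocol, which the plugin itself may or may not assume.

```python
import json

# Conventional default ports for the four protocols the plugin lists.
DEFAULT_PORTS = {"imap": 143, "imaps": 993, "pop3": 110, "pop3s": 995}


def build_email_river_meta(host, protocol, username, password,
                           index, doc_type, port=None,
                           check_interval=5000, skip_count=1):
    """Return the _meta document for an email river as a plain dict."""
    if protocol not in DEFAULT_PORTS:
        raise ValueError(f"unsupported protocol: {protocol}")
    account = {
        "host": host,
        "port": DEFAULT_PORTS[protocol] if port is None else port,
        "type": protocol,
        "username": username,
        "password": password,
        "check_interval": check_interval,
        "skip_count": skip_count,
    }
    return {
        "type": "email",
        "email": {"config": [account]},
        "index": {"index": index, "type": doc_type},
    }


meta = build_email_river_meta(
    host="pop.exmail.qq.com", protocol="pop3",
    username="user@example.com", password="secret",  # placeholder credentials
    index="google", doc_type="gmail",
)
print(json.dumps(meta, indent=2))
```

The printed document can be passed to curl with `-d @file.json` or sent with any HTTP client as the body of the PUT request.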

RTF (elasticsearch-rtf) already bundles this plugin, and it has been tested there:

https://github.com/medcl/elasticsearch-rtf/tree/master/elasticsearch/plugins/river-email


Resyncing data with mongodb-river

<Category: Diving Into ElasticSearch>

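elasticsearch's mongodb-river offers no way to resync a database from scratch, yet there are many situations that call for it. For example, after changing the elasticsearch mapping the only option is to rebuild the data, which means pulling everything from mongodb again and reindexing. So what can be done?

In fact, all we need to do is clear the sync state recorded by mongodb-river; the river then re-initializes automatically, just as on a fresh install.

Step 1: find out which information needs to be deleted. All of it lives in the _river index.

curl -XGET 'http://192.168.2.99:9200/_river/_search?q=*'

The response looks something like this; it records the position the sync has reached: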

 {
                "_index": "_river",
                "_type": "mongodb",
                "_id": "testmongo.person",
                "_score": 1,
                "_source": {
                    "mongodb": {
                        "_last_ts": "{ \"$ts\" : 1363082244 , \"$inc\" : 1}"
                    }
                }
            },
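Note that `_last_ts` is itself a JSON document stored as a string. A small sketch of decoding it, assuming (as with MongoDB timestamps generally) that `$ts` counts seconds since the Unix epoch; the function name `parse_last_ts` is only illustrative:

```python
import json
from datetime import datetime, timezone


def parse_last_ts(source: dict) -> datetime:
    """Decode the _last_ts string of a mongodb-river status document."""
    raw = source["mongodb"]["_last_ts"]   # a JSON document stored as a string
    stamp = json.loads(raw)               # {"$ts": seconds, "$inc": counter}
    return datetime.fromtimestamp(stamp["$ts"], tz=timezone.utc)


# The _source object from the response above.
source = {"mongodb": {"_last_ts": '{ "$ts" : 1363082244 , "$inc" : 1}'}}
print(parse_last_ts(source).isoformat())
```

This makes it easy to see how far the river had got before the state is thrown away.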

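How to deal with it? Just delete it. This record is really just an ordinary elasticsearch document, so look up its index, type, and id and delete it.
I deleted everything here; do not blindly follow suit.

curl -XDELETE 'http://192.168.2.99:9200/_river/_query?q=*'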
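For a more selective cleanup than wiping the whole `_river` index, the per-document route (index, type, id) can be sketched as below. This only builds the URLs to send DELETE requests to; it does not contact the cluster. The function name, the second sample hit, and the idea of filtering on `_type` as the river name are assumptions for illustration.

```python
from urllib.parse import quote


def delete_urls(search_response: dict, base_url: str, river_type: str) -> list:
    """Return one DELETE URL per hit whose _type matches river_type."""
    urls = []
    for hit in search_response.get("hits", {}).get("hits", []):
        if hit["_type"] != river_type:
            continue  # leave documents of other rivers alone
        path = "/".join(quote(hit[key], safe="")
                        for key in ("_index", "_type", "_id"))
        urls.append(f"{base_url.rstrip('/')}/{path}")
    return urls


# Trimmed-down search response; the second hit is a made-up extra river.
response = {"hits": {"hits": [
    {"_index": "_river", "_type": "mongodb", "_id": "testmongo.person"},
    {"_index": "_river", "_type": "other", "_id": "_meta"},
]}}
for url in delete_urls(response, "http://192.168.2.99:9200", "mongodb"):
    print("curl -XDELETE", repr(url))
```

Each printed line is a command that removes exactly one document, so unrelated rivers keep their state.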

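Step 2: if the target index needs changes (a new mapping, deleting data, and so on), make them now.
Step 3: recreate the river configuration. What, no backup? Go have a good cry.

At this point the data should show up right away; the resync is very fast.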
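Step 3 is painless only if the river configuration was saved before the delete in step 1. A minimal sketch of extracting every `_meta` document from the step 1 search response so it can be written to disk first; the function name and the sample response contents are hypothetical:

```python
import json


def collect_river_configs(search_response: dict) -> dict:
    """Map each river name (_type) to its _meta configuration document."""
    configs = {}
    for hit in search_response.get("hits", {}).get("hits", []):
        if hit["_id"] == "_meta":
            configs[hit["_type"]] = hit["_source"]
    return configs


# Made-up response: one config document and one sync-state document.
response = {"hits": {"hits": [
    {"_index": "_river", "_type": "mongodb", "_id": "_meta",
     "_source": {"type": "mongodb", "index": {"name": "testmongo"}}},
    {"_index": "_river", "_type": "mongodb", "_id": "testmongo.person",
     "_source": {"mongodb": {"_last_ts": "..."}}},
]}}
backup = collect_river_configs(response)
print(json.dumps(backup, indent=2))
```

Each saved value can later be PUT back to `_river/<name>/_meta` to recreate the river.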


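Releasing a jubatus-classifier script

<Category: Data Mining>

Repository: https://github.com/medcl/-classifier
Adapted from the official example, with some of the parameters pulled out.

A quick introduction to how to use it.
Step 1: start the services; just follow the two previous posts:

Jubatus single-machine test

Jubatus cluster test

Config file: config.json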


Jubatus cluster test

<Category: Data Mining>

http://jubat.us/en/tutorial_distributed.html

# Install and run zookeeper
wget http://mirror.bjtu.edu.cn/apache/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
tar vxzf zookeeper-3.4.5.tar.gz
cd zookeeper-3.4.5
[root@ghost-rider zookeeper-3.4.5]# cp conf/zoo_sample.cfg  conf/zoo.cfg 
[root@ghost-rider zookeeper-3.4.5]# bin/zkServer.sh start
JMX enabled by default
Using config: /root/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
# Register the config file with zookeeper
jubaconfig --cmd write --zookeeper=localhost:2181 --file config.json --name tutorial --type classifier
# Start the Jubatus Keeper, passing it the zookeeper address
[root@ghost-rider jubatus-tutorial-python]# jubaclassifier_keeper --zookeeper=localhost:2181 --rpc-port=9198
I0320 18:29:25.938608 16930 server_util.cpp:333] starting jubaclassifier_keeper 0.4.2 RPC server at 192.168.2.100:9198
    pid            : 16930
    user           : root
    timeout        : 10
    thread         : 16
    logdir         : 
    loglevel       : INFO(0)
    zookeeper      : localhost:2181
2013-03-20 18:29:25,938:16930(0x7f3616cd7720):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2013-03-20 18:29:25,938:16930(0x7f3616cd7720):ZOO_INFO@log_env@716: Client environment:host.name=ghost-rider
2013-03-20 18:29:25,938:16930(0x7f3616cd7720):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2013-03-20 18:29:25,938:16930(0x7f3616cd7720):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.32-71.el6.x86_64
2013-03-20 18:29:25,938:16930(0x7f3616cd7720):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Fri May 20 03:51:51 BST 2011
2013-03-20 18:29:25,938:16930(0x7f3616cd7720):ZOO_INFO@log_env@733: Client environment:user.name=root
2013-03-20 18:29:25,938:16930(0x7f3616cd7720):ZOO_INFO@log_env@741: Client environment:user.home=/root
2013-03-20 18:29:25,939:16930(0x7f3616cd7720):ZOO_INFO@log_env@753: Client environment:user.dir=/root/jubatus/jubatus-tutorial-python
2013-03-20 18:29:25,939:16930(0x7f3616cd7720):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=(nil) sessionId=0 sessionPasswd=<null> context=(nil) flags=0
2013-03-20 18:29:25,939:16930(0x7f3616ac7700):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:2181]
2013-03-20 18:29:25,947:16930(0x7f3616ac7700):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:2181], sessionId=0x13d8757e0900002, negotiated timeout=10000
I0320 18:29:25.960101 16930 keeper.cpp:53] start listening at port 9198
I0320 18:29:25.981381 16930 membership.cpp:128] keeper created: /jubatus/jubakeepers/classifier/192.168.2.100_9198
I0320 18:29:25.981425 16930 keeper.cpp:58] registered group membership
I0320 18:29:25.981451 16930 keeper.cpp:60] jubaclassifier_keeper RPC server startup
# Several classifier instances can be started under the same custom name; to test speed, start just one at first
$ jubaclassifier --rpc-port=9180 --name=tutorial --zookeeper=localhost:2181 &
$ jubaclassifier --rpc-port=9181 --name=tutorial --zookeeper=localhost:2181 &
$ jubaclassifier --rpc-port=9182 --name=tutorial --zookeeper=localhost:2181 &
# Inspect the registered nodes in zookeeper
[root@ghost-rider zookeeper-3.4.5]# bin/zkCli.sh -server localhost:2181
Connecting to localhost:2181
2013-03-20 18:32:30,589 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2013-03-20 18:32:30,593 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=ghost-rider
2013-03-20 18:32:30,594 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.6.0_33
2013-03-20 18:32:30,594 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Sun Microsystems Inc.
2013-03-20 18:32:30,595 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/usr/local/jdk/jre
2013-03-20 18:32:30,595 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/root/zookeeper-3.4.5/bin/../build/classes:/root/zookeeper-3.4.5/bin/../build/lib/*.jar:/root/zookeeper-3.4.5/bin/../lib/slf4j-log4j12-1.6.1.jar:/root/zookeeper-3.4.5/bin/../lib/slf4j-api-1.6.1.jar:/root/zookeeper-3.4.5/bin/../lib/netty-3.2.2.Final.jar:/root/zookeeper-3.4.5/bin/../lib/log4j-1.2.15.jar:/root/zookeeper-3.4.5/bin/../lib/jline-0.9.94.jar:/root/zookeeper-3.4.5/bin/../zookeeper-3.4.5.jar:/root/zookeeper-3.4.5/bin/../src/java/lib/*.jar:/root/zookeeper-3.4.5/bin/../conf:.:/usr/local/jdk//jre/lib/rt.jar:/usr/local/jdk//lib/dt.jar:/usr/local/jdk//lib/tools.jar
2013-03-20 18:32:30,596 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/usr/local/jdk/jre/lib/amd64/server:/usr/local/jdk/jre/lib/amd64:/usr/local/jdk/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2013-03-20 18:32:30,596 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2013-03-20 18:32:30,596 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
2013-03-20 18:32:30,597 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
2013-03-20 18:32:30,597 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
2013-03-20 18:32:30,598 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=2.6.32-71.el6.x86_64
2013-03-20 18:32:30,598 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=root
2013-03-20 18:32:30,599 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/root
2013-03-20 18:32:30,600 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/root/zookeeper-3.4.5
2013-03-20 18:32:30,602 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@2e0ece65
Welcome to ZooKeeper!
JLine support is enabled
2013-03-20 18:32:30,646 [myid:] - INFO  [main-SendThread(ghost-rider:2181):ClientCnxn$SendThread@966] - Opening socket connection to server ghost-rider/127.0.0.1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-03-20 18:32:30,652 [myid:] - INFO  [main-SendThread(ghost-rider:2181):ClientCnxn$SendThread@849] - Socket connection established to ghost-rider/127.0.0.1:2181, initiating session
2013-03-20 18:32:30,662 [myid:] - INFO  [main-SendThread(ghost-rider:2181):ClientCnxn$SendThread@1207] - Session establishment complete on server ghost-rider/127.0.0.1:2181, sessionid = 0x13d8757e0900003, negotiated timeout = 30000
[zk: localhost:2181(CONNECTED) 0] 
WATCHER::
 
WatchedEvent state:SyncConnected type:None path:null
 
[zk: localhost:2181(CONNECTED) 0] ls /jubatus/actors/classifier/tutorial/nodes
Node does not exist: /jubatus/actors/classifier/tutorial/nodes
[zk: localhost:2181(CONNECTED) 1] ls /jubatus/actors/classifier/tutorial/nodes
[192.168.2.100_9180]
[zk: localhost:2181(CONNECTED) 2]
# Run the training and prediction client, pointing the port at the Jubatus Keeper and specifying the classifier name
$ python tutorial.py --server_port=9198 --name=tutorial
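The node names that zkCli lists (for example `192.168.2.100_9180`) appear to encode host and RPC port joined by an underscore. A small sketch for turning such an `ls` result line into (host, port) pairs, say for a monitoring script. The naming scheme is inferred from the output above rather than from Jubatus documentation, and both function names are illustrative.

```python
def parse_node(name: str) -> tuple:
    """Split a node name such as '192.168.2.100_9180' into (host, port)."""
    host, sep, port = name.rpartition("_")
    if not sep or not port.isdigit():
        raise ValueError(f"unexpected node name: {name!r}")
    return host, int(port)


def parse_ls_output(line: str) -> list:
    """Parse one zkCli 'ls' result line like '[a_1, b_2]' into (host, port) pairs."""
    inner = line.strip().strip("[]")
    return [parse_node(item.strip()) for item in inner.split(",") if item.strip()]


print(parse_ls_output("[192.168.2.100_9180]"))
```

With all three classifier instances registered, the same function would return one pair per instance.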

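As the amount of training data grows, accuracy climbs steadily. Impressive: prediction keeps working while training is still in progress, and the two do not interfere with each other.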


Jubatus single-machine test

<Category: Recommender Systems, Data Mining>

https://github.com//

http://jubat.us/en/tutorial.html I followed this tutorial for a quick trial on a single machine; more investigation to come.

Deploying mongodb & mongodb-river (elasticsearch)

<Category: Diving Into ElasticSearch>

# Download the prebuilt release

wget http://fastdl.mongodb.org/linux/mongodb-linux-x86_64-2.2.3.tgz
tar vxzf mongodb-linux-x86_64-2.2.3.tgz
cd mongodb-linux-x86_64-2.2.3


Hosts entries for blocking Taobao / Alimama ads

<Category: Uncategorized>


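[Repost] The t-test, the F-test, and statistical significance (P value or sig value)

<Category: Statistics>

1. Origins of the t-test and the F-test

In general, to determine the probability of being wrong when inferring from sample statistics to the population, we apply statistical methods developed by statisticians and carry out a statistical test.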


[Repost] A quick introduction to libSVM

<Category: Data Mining, Machine Learning>

Simple and easy to follow: the best starting point for libsvm.
