How to find the machine hosting the SM master node

Use sminfo to confirm the master's GUID, then run ibstat on every machine in the subnet and look for the matching Port GUID.

sminfo 

sminfo: sm lid 62 sm guid 0x98039b0300e24101, activity count 1932118 priority 14 state 3 SMINFO_MASTER

ibstat

CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 2
        Firmware version: 2.36.5150
        Hardware version: 1
        Node GUID: 0x0002c903001996a0
        System image GUID: 0x0002c903001996a3
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 67
                LMC: 0
                SM lid: 62
                Capability mask: 0x0251486a
                Port GUID: 0x0002c903001996a1
                Link layer: InfiniBand
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 64
                LMC: 0
                SM lid: 62
                Capability mask: 0x02514868
                Port GUID: 0x0002c903001996a2
                Link layer: InfiniBand
CA 'mlx4_1'
        CA type: MT4099
        Number of ports: 2
        Firmware version: 2.42.5000
        Hardware version: 1
        Node GUID: 0x0002c90300449310
        System image GUID: 0x0002c90300449313
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 5
                LMC: 0
                SM lid: 62
                Capability mask: 0x02514868
                Port GUID: 0x0002c90300449311
                Link layer: InfiniBand
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 9
                LMC: 0
                SM lid: 62
                Capability mask: 0x02514868
                Port GUID: 0x0002c90300449312
                Link layer: InfiniBand
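The comparison can be scripted. A minimal sketch using the sample output above (the strings stand in for actually running `sminfo` and `ibstat` via subprocess on each host):

```python
import re

# GUID comparison sketch: the sminfo/ibstat strings below are taken from the
# sample output above; on a real host they would come from running the tools.
sminfo_out = ("sminfo: sm lid 62 sm guid 0x98039b0300e24101, "
              "activity count 1932118 priority 14 state 3 SMINFO_MASTER")
ibstat_out = """\
        Port GUID: 0x0002c903001996a1
        Port GUID: 0x0002c903001996a2
"""

sm_guid = re.search(r"sm guid (0x[0-9a-f]+)", sminfo_out).group(1)
port_guids = re.findall(r"Port GUID: (0x[0-9a-f]+)", ibstat_out)

# The host whose ibstat lists the SM GUID among its port GUIDs is the master.
print(sm_guid in port_guids)  # False -- this sample host is not the master
```

Whichever host prints True owns the master SM.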

The correct way to update GRUB options on CentOS

My job at the new company is developing a plugin that manages VMs, similar to implementing the libvirt driver in nova.

When dealing with SR-IOV devices, the host's IOMMU has to be enabled.
The method you find online is to modify GRUB.

GRUB lives in two places on CentOS:
/etc/default/grub
/boot/grub2/grub.cfg
I used to bluntly edit /boot/grub2/grub.cfg directly; I now see that is not right, because that file is generated and any hand edit is lost on the next regeneration.

The correct approach is:
edit /etc/default/grub,
then run:

dracut --regenerate-all --force
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
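For the record, the actual edit is not shown above. For SR-IOV on Intel hardware the usual change (my assumption, not from the original text) is to append the IOMMU flags to GRUB_CMDLINE_LINUX in /etc/default/grub:

```
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet intel_iommu=on iommu=pt"
```

The options before intel_iommu=on stand in for whatever the file already contains; on AMD hardware amd_iommu=on would be used instead.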

Common pyppeteer problems and their fixes

1. Chromium fails to launch: Running as root without --no-sandbox

The error looks like:

[0505/060845.882080:ERROR:zygote_host_impl_linux.cc(89)] Running as root without --no-sandbox is not supported.

Fixes:
a) don't run Chromium as root;
b) pass the flag at launch time: launch(args=['--no-sandbox']);
c) or patch pyppeteer/launcher.py:

    self.chrome_args.extend([
        '--headless',
        '--disable-gpu',
        '--no-sandbox',  # add this line
        '--hide-scrollbars',
        '--mute-audio',
    ])

Fixing certificate verify failed

When I started exploring pyppeteer, the very first example failed with SSL: CERTIFICATE_VERIFY_FAILED. It is a common error; a look at the source showed it is raised while downloading the Chromium browser, because the certificate cannot be verified.

The fixes found online all patch the source to skip certificate verification, which is not a good idea. The real fix is:

yum install ca-certificates
pip3.6 install -U requests[security]

After these two steps the download succeeds.
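A quick sanity check I use (my own addition, not from the original post): after ca-certificates is in place, a default Python SSL context should have verification enabled and know where the system CA bundle lives:

```python
import ssl

# A default client context verifies peer certificates and checks hostnames;
# if ca-certificates is missing or broken, HTTPS downloads fail verification.
ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)    # True
print(ssl.get_default_verify_paths().cafile)   # where OpenSSL looks for CAs
```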

Python Twisted framework: collected issues

1. With Eclipse + PyDev, many methods are flagged as "Undefined variable" and underlined in red in the editor, even though the code runs fine. Some methods under twisted.internet.reactor are an example.

Cause:
This is a known issue related to how Eclipse/PyDev performs static analysis.
If you jump to the reactor source you can see that, at import time, no reactor object exists in the twisted.internet module; the module is empty.
When Eclipse/PyDev runs its static analysis it therefore sees no reactor object in twisted.internet and marks it as an undefined variable, even though it does exist at runtime (importing twisted.internet.reactor installs the platform's default reactor on first import).

Workaround:
Append #@UndefinedVariable to the end of each flagged line, which tells Eclipse to ignore the error, e.g.

reactor.run() #@UndefinedVariable
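For the curious, the runtime mechanism can be reproduced with the stdlib alone: a module object is built and registered in sys.modules on the fly, which is exactly what a static analyzer cannot see. A sketch with made-up module names:

```python
import sys
import types

# Build a package and a submodule entirely at runtime and register them in
# sys.modules -- roughly what twisted.internet.reactor does on first import.
# All names here (fakepkg, reactor) are invented for the demo.
pkg = types.ModuleType("fakepkg")
pkg.__path__ = []                       # mark it as a package
sys.modules["fakepkg"] = pkg

reactor_mod = types.ModuleType("fakepkg.reactor")
reactor_mod.run = lambda: "running"
sys.modules["fakepkg.reactor"] = reactor_mod
pkg.reactor = reactor_mod

from fakepkg import reactor             # resolves via sys.modules, no .py file
print(reactor.run())                    # -> running
```

No fakepkg/reactor.py ever exists on disk, yet the import succeeds, so PyDev's file-based analysis has nothing to look at.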

A better fix for kubeadm init timeout errors

For reasons everybody knows, pulling the k8s images from here almost never succeeds. I used to rely on a proxy, which is the direct route, but assorted network problems kept producing one pitfall after another, and filling those holes ate a lot of time. Recently I saw someone else's approach online, tried it, and it works; I like it better, and the idea applies anywhere a pull is blocked.

Concretely, init is bound to fail:

kubeadm init --apiserver-advertise-address 192.168.29.131 --pod-network-cidr=10.244.0.0/16

The errors:

[init] Using Kubernetes version: v1.13.4
[preflight] Running pre-flight checks
        [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.3. Latest validated version: 18.06
        [WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.13.4: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.13.4: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-scheduler:v1.13.4: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-proxy:v1.13.4: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/pause:3.1: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/etcd:3.2.24: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/coredns:1.2.6: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1

The thing to note down here is each component's version number (the same list can also be printed with kubeadm config images list). Then create a shell script that pulls the matching versions from mirrors on Docker Hub, tags them with the names kubeadm init checks for, and finally removes the original images.

cat << EOF > ./before_kubeadm_init.sh
#!/bin/sh
docker pull mirrorgooglecontainers/kube-apiserver:v1.13.4
docker pull mirrorgooglecontainers/kube-controller-manager:v1.13.4
docker pull mirrorgooglecontainers/kube-scheduler:v1.13.4
docker pull mirrorgooglecontainers/kube-proxy:v1.13.4
docker pull mirrorgooglecontainers/pause:3.1
docker pull mirrorgooglecontainers/etcd:3.2.24
docker pull coredns/coredns:1.2.6

docker tag mirrorgooglecontainers/kube-proxy:v1.13.4  k8s.gcr.io/kube-proxy:v1.13.4
docker tag mirrorgooglecontainers/kube-scheduler:v1.13.4 k8s.gcr.io/kube-scheduler:v1.13.4
docker tag mirrorgooglecontainers/kube-apiserver:v1.13.4 k8s.gcr.io/kube-apiserver:v1.13.4
docker tag mirrorgooglecontainers/kube-controller-manager:v1.13.4 k8s.gcr.io/kube-controller-manager:v1.13.4
docker tag mirrorgooglecontainers/pause:3.1  k8s.gcr.io/pause:3.1
docker tag mirrorgooglecontainers/etcd:3.2.24  k8s.gcr.io/etcd:3.2.24
docker tag coredns/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6

docker rmi mirrorgooglecontainers/kube-apiserver:v1.13.4
docker rmi mirrorgooglecontainers/kube-controller-manager:v1.13.4
docker rmi mirrorgooglecontainers/kube-scheduler:v1.13.4
docker rmi mirrorgooglecontainers/kube-proxy:v1.13.4
docker rmi mirrorgooglecontainers/pause:3.1
docker rmi mirrorgooglecontainers/etcd:3.2.24
docker rmi coredns/coredns:1.2.6
EOF
chmod +x ./before_kubeadm_init.sh
./before_kubeadm_init.sh

Run kubeadm init again and it goes through cleanly.
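For other Kubernetes releases the same script can be regenerated instead of hand-edited. A small sketch (my own addition) that builds the command list from the version table in the error output above:

```python
# Generate the pull/tag/rmi command list from the image versions in the
# kubeadm error output, so the script is easy to adapt to other releases.
images = {
    "kube-apiserver": "v1.13.4",
    "kube-controller-manager": "v1.13.4",
    "kube-scheduler": "v1.13.4",
    "kube-proxy": "v1.13.4",
    "pause": "3.1",
    "etcd": "3.2.24",
}
MIRROR = "mirrorgooglecontainers"       # coredns lives under coredns/ instead

cmds = []
for name, tag in images.items():
    src = f"{MIRROR}/{name}:{tag}"
    dst = f"k8s.gcr.io/{name}:{tag}"
    cmds += [f"docker pull {src}", f"docker tag {src} {dst}", f"docker rmi {src}"]
cmds += ["docker pull coredns/coredns:1.2.6",
         "docker tag coredns/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6",
         "docker rmi coredns/coredns:1.2.6"]
print("\n".join(cmds))
```

Redirect the output into before_kubeadm_init.sh and run it as above.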

Fixing the endpoint URLs after a botched keystone-manage bootstrap

While revisiting OpenStack I lazily copied the command straight from the docs and forgot to change the hostname in it:

keystone-manage bootstrap --bootstrap-password 000000 \
--bootstrap-admin-url http://controller:5000/v3/ \
--bootstrap-internal-url http://controller:5000/v3/ \
--bootstrap-public-url http://controller:5000/v3/ \
--bootstrap-region-id RegionOne

My hostname is cloud, so every command after that kept failing:

[root@cloud ~]# openstack user list

Unable to establish connection to http://controller:5000/v3/users?: HTTPConnectionPool(host='controller', port=5000): Max retries exceeded with url: /v3/users (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb3e5ae5050>: Failed to establish a new connection: [Errno -2] Name or service not known',))

Re-running the bootstrap with cloud substituted in did not help,
and a lot of Googling turned up no workable method.
Then it hit me: modify the keystone database directly.
Sure enough: connect to MySQL and use the keystone database;
the table in question is endpoint (a SELECT shows the rows, one per interface).
A single SQL UPDATE does it (no WHERE clause needed here, since all three interface rows held the same wrong URL):

update endpoint set url="http://cloud:5000/v3/";
[root@cloud etc]# openstack user list                    
+----------------------------------+-------+
| ID                               | Name  |
+----------------------------------+-------+
| 08cd9f1addd04fafb82550336e50dfba | admin |
+----------------------------------+-------+
[root@cloud etc]# 

PUTting a file to AWS S3 or Ceph RGW with Ajax

While working for 吉致汽车金融, one of their contractors hit a case where a file could not be PUT to Ceph RGW through Ajax. Analysis showed that his data option pointed at a FormData object, with the file content appended under the key file,
like this:
var formData = new FormData();
formData.append("file", document.getElementById("file1").files[0]);

RGW cannot parse that. The Ajax data option must be the raw file content itself, so it was changed as follows:

$.ajax({
      type: 'PUT',
      url: "<YOUR_PRE_SIGNED_UPLOAD_URL_HERE>",
      contentType: 'binary/octet-stream', // if unsure, use contentType: false; whatever is set must match what the server expects
      processData: false, // crucial: keeps jQuery from serializing the body
      data: $("#file1").get()[0].files[0],
    })
    .success(function() {
      alert('File uploaded');
    })
    .error(function() {
      alert('File NOT uploaded');
      console.log( arguments);
});

Debugging done; the upload works.
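The same mistake is easy to reproduce outside the browser. A sketch with Python's requests library (the URL is a placeholder, not a real presigned URL) shows the difference between a FormData-style multipart body and the raw body RGW expects:

```python
import requests

# A presigned S3/RGW PUT URL expects the raw object bytes as the body;
# wrapping them in a multipart form (what FormData produces) is what RGW
# cannot parse.
url = "http://example.com/bucket/key"   # placeholder URL
body = b"hello object"

# FormData equivalent: requests encodes `files` as multipart/form-data
bad = requests.Request("PUT", url, files={"file": body}).prepare()
# Correct: the raw bytes go straight into the request body
good = requests.Request("PUT", url, data=body,
                        headers={"Content-Type": "binary/octet-stream"}).prepare()

print(bad.headers["Content-Type"])   # multipart/form-data; boundary=...
print(good.body == body)             # True
```

The prepared requests are never sent; inspecting them is enough to see why one body parses on the server side and the other does not.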

Fixing cross-origin requests with Nginx + Ceph RGW

While serving as an on-site Ceph consultant for 吉致汽车金融, cross-origin PUTs of objects came back with CORS errors. Packet captures and log analysis showed the cause was their Nginx configuration; recording it here.
The original configuration:

upstream ceph_radosgw_zone {
        server 10.10.100.101:9999 weight=1 max_fails=2 fail_timeout=5;
        server 10.10.100.102:9999 weight=1 max_fails=2 fail_timeout=5;
}

server {
        listen  80;
        location / {
            proxy_pass http://ceph_radosgw_zone;
            proxy_buffering    off;
            client_max_body_size   0;
        }
}

With this configuration, a PUT issued through the web middleware produces:

Access to XMLHttpRequest at 'http://10.10.100.100/cbpdev/2019030210121212?x-amz-acl=public-read&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20190222T073759Z&X-Amz-SignedHeaders=host&X-Amz-Expires=29999&X-Amz-Credential=EDEOOIR47GEQ3W3HGCCN%2F20190222%2Fus-east-1%2Fs3%2Faw' from origin 'http://localhost:3000' has been blocked by CORS policy: Request header field lang is not allowed by Access-Control-Allow-Headers in preflight response.

The error is explicit: the request header has to be allowed via Access-Control-Allow-Headers, just like an ordinary CORS problem, so adding the corresponding headers in Nginx should fix it (for the exact error above, the custom lang header would also need to appear in the allow list).

server {
        listen  80;
        location / {
            add_header Access-Control-Allow-Origin *;
            add_header Access-Control-Allow-Methods 'GET, POST, OPTIONS, HEAD, PUT';
            add_header Access-Control-Allow-Headers 'DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization';

            proxy_pass http://ceph_radosgw_zone;
            proxy_buffering    off;
            client_max_body_size   0;
        }
}

Testing showed that plain GET requests now worked cross-origin, but PUTting a file still returned a CORS error. Further analysis: a file PUT consists of two request methods, the OPTIONS preflight and the PUT itself. It is enough to answer the CORS headers on OPTIONS and, for the PUT, pass the headers through to the proxy.
Revised once more:

server {
        listen  80;
        location / {
            if ($request_method = 'OPTIONS') {
                add_header Access-Control-Allow-Origin *;
                add_header Access-Control-Allow-Methods 'GET, POST, OPTIONS, HEAD, PUT';
                add_header Access-Control-Allow-Headers 'DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization';
            }
            proxy_pass http://ceph_radosgw_zone;
            proxy_pass_header Server;
            proxy_buffering    off;
            proxy_redirect     off;
            proxy_set_header   Host             $host:$server_port;
            proxy_set_header   X-Real-IP        $remote_addr;
            proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
            client_max_body_size   0;
        }
}

Solved for good.

Adding tab completion to a command-line program

I am currently building a complete storage system that includes a command-line program, and wanted tab completion to improve the user experience.
This uses the bash-completion package, which can be installed directly with yum.

Add a file with any name, e.g. flystor, under /etc/bash_completion.d/.
Say your command tree looks like this:

set 
   net 
      ip
      dns
      gw
   option
      hostname
      date

That is, after typing set, a double tab should list the two second-level subcommands net and option; with set net entered, a double tab should suggest ip dns gw; and everything should autocomplete.

The corresponding content of the flystor file:

function _sms()
{
        local cur prev

        COMPREPLY=()
        cur="${COMP_WORDS[COMP_CWORD]}"
        prev="${COMP_WORDS[COMP_CWORD-1]}"

        case "${prev}" in
           set)
                COMPREPLY=( $(compgen -W "net option" -- "${cur}") )
                return 0
                ;;
           net)
                COMPREPLY=( $(compgen -W "ip dns gw" -- "${cur}") )
                return 0
                ;;
           option)
                COMPREPLY=( $(compgen -W "hostname date" -- "${cur}") )
                return 0
                ;;
        esac
}
# register the function; the top-level command is assumed to be `sms`
complete -F _sms sms

The code is plain and easy to follow, and written this way it is genuinely practical.

Once done, log out and back in (or source the new file) for it to take effect.