Welcome to Kyle's Blog!

Striving to become a full-stack engineer

Python interview questions

2018-03-14

python

A collection of common Python interview questions

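I recently changed jobs and am looking for Python web backend positions. I built my last project by myself, learning as I went, but it was a small project and some common topics never came up. So it is still worth going through some interview questions to improve my chances of passing.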

1. Implement the singleton pattern in Python

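Method 1. Use a module. A module is executed only once, on its first import, so an object created at module level is effectively a singleton.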
```
#mysingleton.py
class MySingleton(object):
    def foo(self):
        print("call foo()")

my_singleton = MySingleton()
```
Save the code above as mysingleton.py, then use it like this:
```
from mysingleton import my_singleton
my_singleton.foo()
```
Method 2. Use `__new__`
```
class Singleton(object):
    _instance = None
    def __new__(cls, *args, **kw):
        if cls._instance is None:
            # object.__new__ accepts no extra arguments, so pass only cls
            cls._instance = super(Singleton, cls).__new__(cls)
        return cls._instance

class MyClass(Singleton):
    a = 1

one = MyClass()
two = MyClass()
if one is two:
    print("Good Singleton.")
else:
    print("Bad Singleton.")
print(id(one))
print(id(two))
```
Method 3. Use a decorator
```
def singleton(cls):
    instances = {}
    def getinstance(*args, **kw):
        if cls not in instances:
            instances[cls] = cls(*args, **kw)
        return instances[cls]
    return getinstance

@singleton
class MyClass(object):
    a = 1
```
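Method 4. Use a metaclass. A metaclass controls how a class is created. It mainly does three things: it intercepts the creation of the class, modifies the class definition, and returns the modified class. For a singleton, the relevant hook is the metaclass's `__call__`, which runs every time the class is called to create an instance: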
```
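class Singleton(type):
    _instances = {}
    def __call__(cls, *args, **kw):
        # cls is the class being instantiated (e.g. MyClass); keep one instance per class
        if cls not in cls._instances:
            cls._instances[cls] = super(Singleton, cls).__call__(*args, **kw)
        return cls._instances[cls]

# Python 2
class MyClass(object):
    __metaclass__ = Singleton

# Python 3
class MyClass(metaclass=Singleton):
    pass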
```
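None of the versions above stop two threads from passing the `is None` check at the same moment and each creating an instance. Below is a minimal sketch of a thread-safe variant of method 2. It assumes double-checked locking with a `threading.Lock` is acceptable. The class name `ThreadSafeSingleton` and the helper `create_many` are hypothetical:

```python
import threading


class ThreadSafeSingleton:
    _instance = None
    _lock = threading.Lock()

    def __new__(cls, *args, **kwargs):
        # First check without the lock keeps the common path cheap.
        if cls._instance is None:
            with cls._lock:
                # Second check: another thread may have created the
                # instance while this one was waiting for the lock.
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance


def create_many(n_threads=8):
    """Instantiate the singleton from several threads and collect the results."""
    results = []
    results_lock = threading.Lock()

    def worker():
        obj = ThreadSafeSingleton()
        with results_lock:
            results.append(obj)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Every object returned by `create_many()` should be the same instance. The lock is only taken while the instance does not exist yet.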
2. What is a lambda function?

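A lambda expression creates an anonymous function. It is typically used where a function is needed but you don't want to bother giving it a name. Its body is limited to a single expression.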

For example:
```
add = lambda x,y: x+y
add(1,2) # result = 3
```
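A more typical use is passing a lambda inline as a `key` or callback argument rather than binding it to a name. PEP 8 in fact recommends `def` over assigning a lambda to a variable. A small sketch; the `pairs` data is made up for illustration:

```python
pairs = [("banana", 3), ("apple", 5), ("cherry", 1)]

# Sort by the second element of each tuple.
by_count = sorted(pairs, key=lambda p: p[1])

# Keep only the names whose count is greater than 2.
names = [name for name, count in filter(lambda p: p[1] > 2, pairs)]

print(by_count)  # [('cherry', 1), ('banana', 3), ('apple', 5)]
print(names)     # ['banana', 'apple']
```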
3. How does Python manage memory?

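There are two main points:
1. Objects are reclaimed through reference counting. This is supplemented by a (generational, mark-and-sweep style) garbage collector that finds and frees reference cycles.
2. Memory released by small basic objects is not returned to the OS immediately; it is handed back at an appropriate time. Whether the same applies to user-defined objects is still to be confirmed. Instead, the memory goes into a private memory pool so it can be reused. Pay special attention to reference cycles, because reference counting alone can never free them.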
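You can observe reference counting and the cycle collector working together with the standard `sys`, `gc`, and `weakref` modules. A minimal sketch, assuming CPython. The exact `getrefcount` values are implementation details, so the code only compares them. The `Node` class and both demo functions are hypothetical:

```python
import gc
import sys
import weakref


class Node:
    def __init__(self):
        self.other = None


def refcount_demo():
    obj = Node()
    before = sys.getrefcount(obj)
    alias = obj                      # a second reference to the same object
    after = sys.getrefcount(obj)
    del alias
    return after - before            # one extra reference while `alias` existed


def cycle_demo():
    a, b = Node(), Node()
    a.other, b.other = b, a          # reference cycle: a -> b -> a
    probe = weakref.ref(a)           # a weak reference does not keep `a` alive
    del a, b
    # Reference counts never drop to zero inside a cycle; the cycle GC frees it.
    alive_before_gc = probe() is not None
    gc.collect()
    alive_after_gc = probe() is not None
    return alive_before_gc, alive_after_gc
```

After `gc.collect()` the weak reference is dead. This shows the cycle was freed by the collector, not by reference counting.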
4. Explain how decorators are used, describe where they are useful, and write one

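A decorator wraps a function to add extra behavior to it. The decorator is itself a function: it takes the function being wrapped as its argument and returns the wrapped function: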
```
def dec(fp):
    def _d(*args, **kw):
        print("before call func.")
        r = fp(*args, **kw)
        print("after call func.")
        return r
    return _d

@dec
def func():
    print("call func")

func()

# *** equivalent to ***
def dec(fp):
    def _d(*args, **kw):
        print("before call func.")
        r = fp(*args, **kw)
        print("after call func.")
        return r
    return _d

def func():
    print("call func")

f = dec(func)
f()
```
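Decorators are typically used for logging, timing, caching, access control, and retries. Interviews often expect two refinements. The first is `functools.wraps`, which keeps the wrapped function's name and docstring. The second is a decorator factory that takes its own arguments. A sketch; the `repeat` decorator and `greet` function are hypothetical examples:

```python
import functools


def repeat(times):
    """Decorator factory: call the decorated function `times` times."""
    def decorator(func):
        @functools.wraps(func)           # copy __name__, __doc__, etc. onto wrapper
        def wrapper(*args, **kwargs):
            results = []
            for _ in range(times):
                results.append(func(*args, **kwargs))
            return results
        return wrapper
    return decorator


@repeat(times=3)
def greet(name):
    """Return a greeting."""
    return "hello " + name


# @repeat(times=3) is equivalent to: greet = repeat(times=3)(greet)
print(greet("kyle"))    # ['hello kyle', 'hello kyle', 'hello kyle']
print(greet.__name__)   # greet
```

Without `functools.wraps`, `greet.__name__` would be `wrapper`. That confuses logging, debugging, and tools that introspect functions.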
5. What does the pass statement do in Python?

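The pass statement does nothing. It is generally used as a placeholder where the syntax requires a statement, such as an empty function or class body.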

6. Define CGI, FastCGI, and WSGI

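CGI stands for Common Gateway Interface. It is a mechanism that lets an HTTP server "talk" to programs on your machine or on other machines, and the programs must run on the web server. A CGI program can be written in any language that supports standard input, standard output, and environment variables, such as PHP, Perl, or Tcl.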

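FastCGI is like a long-lived CGI. Once started, it keeps running, so it does not spend time forking a new process for every request (this fork-and-execute model is CGI's most criticized weakness). It also supports distributed computing: a FastCGI program can run on a host other than the web server and accept requests from other web servers.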

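FastCGI is a language-independent, scalable, open extension of CGI. Its main idea is to keep the CGI interpreter process resident in memory, which yields much higher performance. Repeatedly loading the CGI interpreter is well known to be the main reason CGI performs poorly. If the interpreter stays in memory and is scheduled by the FastCGI process manager, you get good performance, scalability, failover, and so on.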

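WSGI stands for Python Web Server Gateway Interface v1.0 (PEP 333, updated for Python 3 as v1.0.1 in PEP 3333). It is an interface between Python applications and web servers, and plays a role similar to protocols such as CGI or FastCGI. The goal of WSGI is to provide a simple, universally applicable interface between web servers and web frameworks.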
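Concretely, a WSGI application is just a callable. It takes `environ` (a dict of CGI-style variables) and `start_response`, and returns an iterable of bytes. Below is a minimal sketch using only the standard library. The names `app` and `call_app` are arbitrary. `call_app` invokes the application directly, the way a server would, so no socket is opened:

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    # environ carries CGI-style keys such as PATH_INFO and REQUEST_METHOD.
    path = environ.get("PATH_INFO", "/")
    body = ("Hello from " + path).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path="/"):
    """Invoke the app directly, the way a WSGI server would."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)   # fill in the other required keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


# To actually serve it (blocks forever):
#   from wsgiref.simple_server import make_server
#   make_server("127.0.0.1", 8000, app).serve_forever()
```

`wsgiref.simple_server` is a reference server meant for development. In production the same `app` callable would be handed to a server such as uWSGI.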

7. Get the intersection and union of two lists in Python

```
a = [1, 2, 3]
b = [2, 3, 4]
```
Intersection
```
print([item for item in a if item in b])  # [2, 3]
```
Union
```
print(list(set(a).union(set(b))))  # set order is not guaranteed, e.g. [1, 2, 3, 4]
```
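If order does not matter and the elements are hashable, the set operators are the idiomatic way to do both. For large lists they are also much faster, because `item in b` on a list is a linear scan. A sketch; the order-preserving variants at the end are one possible approach, not the only one:

```python
a = [1, 2, 3]
b = [2, 3, 4]

# Set operators: & is intersection, | is union (result order not guaranteed).
inter = set(a) & set(b)
union = set(a) | set(b)

# Order-preserving versions: convert b to a set once for O(1) lookups,
# and use dict.fromkeys to drop duplicates while keeping first-seen order.
b_set = set(b)
inter_ordered = [x for x in a if x in b_set]
union_ordered = list(dict.fromkeys(a + b))

print(sorted(inter), sorted(union))   # [2, 3] [1, 2, 3, 4]
print(inter_ordered, union_ordered)   # [2, 3] [1, 2, 3, 4]
```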

8. Using map, reduce, filter, and sorted

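map() takes a function and a sequence and applies the function to each element in turn. In Python 2 it returns the results as a new list. In Python 3 it returns a lazy iterator, so wrap it in list() to get a list.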
```
>>> def f(x):
...     return x * x
...
>>> list(map(f, [1, 2, 3, 4, 5, 6, 7, 8, 9]))
[1, 4, 9, 16, 25, 36, 49, 64, 81]

>>> list(map(str, [1, 2, 3, 4, 5, 6, 7, 8, 9]))
['1', '2', '3', '4', '5', '6', '7', '8', '9']
```
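reduce applies a function to a sequence [x1, x2, x3, ...]. The function must take two arguments. reduce combines the accumulated result with the next element of the sequence at each step, so reduce(f, [x1, x2, x3, x4]) = f(f(f(x1, x2), x3), x4). In Python 3, reduce lives in the functools module.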
```
>>> from functools import reduce
>>> def add(x, y):
...     return x + y
...
>>> reduce(add, [1, 3, 5, 7, 9])
25

>>> def fn(x, y):
...     return x * 10 + y
...
>>> reduce(fn, [1, 3, 5, 7, 9])
13579
```
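filter() also takes a function and a sequence. Unlike map(), filter() applies the function to each element and then keeps or discards the element depending on whether the return value is True or False. As with map(), Python 3 returns an iterator.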
```
def is_odd(n):
    return n % 2 == 1

list(filter(is_odd, [1, 2, 4, 5, 6, 9, 10, 15]))
# result: [1, 5, 9, 15]
```
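sorted() sorts a list, and the ordering can be customized. Python 2 accepted a comparison function directly. Python 3 takes a key function instead; an existing comparison function can be adapted with functools.cmp_to_key. The list's built-in sort() method modifies the list in place, whereas sorted() returns a new list.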
```
>>> sorted([36, 5, 12, 9, 21])
[5, 9, 12, 21, 36]

>>> from functools import cmp_to_key
>>> def reversed_cmp(x, y):
...     if x > y:
...         return -1
...     if x < y:
...         return 1
...     return 0
...
>>> sorted([36, 5, 12, 9, 21], key=cmp_to_key(reversed_cmp))
[36, 21, 12, 9, 5]
```
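In Python 3 the idiomatic replacement for a comparison function is usually a `key` function, plus `reverse=True` where needed; `cmp_to_key` is mainly for porting old code. A sketch; the `users` records are made up:

```python
users = [
    {"name": "bob", "age": 25},
    {"name": "alice", "age": 30},
    {"name": "carol", "age": 25},
]

# Same result as reversed_cmp, without a comparison function.
desc = sorted([36, 5, 12, 9, 21], reverse=True)

# Sort by age descending, then name ascending: negate the numeric field.
by_age_then_name = sorted(users, key=lambda u: (-u["age"], u["name"]))

# list.sort() sorts in place and returns None; sorted() returns a new list.
nums = [3, 1, 2]
ret = nums.sort()

print(desc)                                    # [36, 21, 12, 9, 5]
print([u["name"] for u in by_age_then_name])   # ['alice', 'bob', 'carol']
print(nums, ret)                               # [1, 2, 3] None
```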
9. Which standard-library queues are thread-safe, and which are not? Is logging thread-safe?

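The queues in the queue module (Queue, LifoQueue, PriorityQueue) are all thread-safe. Ordinary mutable containers such as list, dict, and set give no thread-safety guarantee for compound operations such as check-then-insert, so those need explicit locking (tuple is immutable). logging is thread-safe.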
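Below is a minimal producer-consumer sketch with `queue.Queue`. `put` and `get` do their own locking, so no explicit `Lock` is needed around the queue. Stopping the consumer with a `None` sentinel is this example's assumption, and the function names are hypothetical:

```python
import queue
import threading


def producer(q, items):
    for item in items:
        q.put(item)          # thread-safe; blocks only while a bounded queue is full
    q.put(None)              # sentinel: tells the consumer to stop


def consumer(q, out):
    while True:
        item = q.get()       # blocks until an item is available
        if item is None:
            break
        out.append(item * 2)


def run(items):
    q = queue.Queue(maxsize=4)
    out = []
    t_prod = threading.Thread(target=producer, args=(q, items))
    t_cons = threading.Thread(target=consumer, args=(q, out))
    t_prod.start()
    t_cons.start()
    t_prod.join()
    t_cons.join()
    return out


print(run(range(10)))   # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```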

10. What is an iterator?

An iterator is a stateful object. It remembers the current position of the iteration, so the next call returns the correct element.
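In Python terms, an iterator implements `__iter__` (returning itself) and `__next__`, which raises `StopIteration` when exhausted. The position it stores is the "state" mentioned above. A sketch; the `Countdown` class is a hypothetical example:

```python
class Countdown:
    """Iterator that yields start, start-1, ..., 1."""

    def __init__(self, start):
        self.current = start        # the iterator's state: where it is now

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


it = Countdown(3)
print(next(it))      # 3
print(list(it))      # [2, 1]  (continues from the saved position)
print(list(it))      # []      (already exhausted)

# Built-in iterables hand out iterators via iter():
letters = iter("ab")
print(next(letters), next(letters))   # a b
```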

11. What is a generator?
1. Generator functions
```
def scq(N):
    for i in range(N):
        yield i * 2

for i in scq(5):
    print(i)
```
2. Generator expressions

A list comprehension produces all of its results at once:
```
s = [x*2 for x in range(5)]
print(s) # [0,2,4,6,8]
```
Replacing [] with () gives a generator expression instead:
```
s = (x*2 for x in range(5))
print(s) # <generator object <genexpr> at 0x...>
print(next(s)) # 0
print(next(s)) # 2
```
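The benefit of a generator is lazy evaluation: it returns one result at a time instead of producing everything up front, which is very useful when processing large amounts of data. Note that a generator can only be traversed once. For example: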
```
def scq(N):
    for i in range(N):
        yield i * 2

x = scq(10)
for i in x:
    print(i)

print("second pass starts")
for i in x:
    print(i)  # prints nothing: the generator is already exhausted
print("second pass ends")
```
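If you need to traverse the same data more than once, one common pattern is a class whose `__iter__` is a generator function. Every `for` loop then gets a fresh generator. A sketch; `Doubled` is a hypothetical name:

```python
class Doubled:
    """Iterable (not an iterator): each iter() call starts a new generator."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        for i in range(self.n):
            yield i * 2


d = Doubled(5)
print(list(d))   # [0, 2, 4, 6, 8]
print(list(d))   # [0, 2, 4, 6, 8]  (the second pass works)
```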
12. Discuss multiprocessing and multithreading in Python.

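The biggest difference between multithreading and multiprocessing is how variables are shared. With multiple processes, each process has its own copy of every variable, and the copies don't affect each other. With multiple threads, all variables are shared by all threads, so any variable can be modified by any thread. The greatest danger of sharing data between threads is that several threads modify the same variable at the same time and corrupt it.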
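The "several threads modifying one variable" problem is usually avoided by holding a lock around the read-modify-write. Below is a sketch with a shared counter. How often the unlocked version actually loses updates depends on the interpreter version and on timing. Only the locked result is predictable. The function names are hypothetical:

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(n, use_lock):
    global counter
    for _ in range(n):
        if use_lock:
            with counter_lock:
                counter += 1      # read, add, and write happen while holding the lock
        else:
            counter += 1          # not atomic: another thread may interleave


def run(n_threads=4, n=100_000, use_lock=True):
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(n, use_lock))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(run(use_lock=True))    # always 400000
print(run(use_lock=False))   # may be less than 400000
```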

13. What is the GIL?

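GIL stands for Global Interpreter Lock. Any Python thread must acquire the GIL before it can run. The interpreter releases it periodically so other threads get a chance to run. Python 2 did this every 100 bytecode instructions; since Python 3.2 it is time-based, every 5 ms by default (see sys.getswitchinterval()). Blocking I/O also releases the GIL. This global lock effectively serializes all threads, so Python threads can only take turns. Even 100 threads on a 100-core CPU will use only one core for pure-Python, CPU-bound work.

The GIL is a historical design decision in the interpreter. The interpreter we usually use is the official CPython, and truly using multiple cores would require an interpreter without a GIL. So you can use threads in Python, but don't expect them to use multiple cores effectively. If you must use multiple cores from threads, the only way is through C extensions (which can release the GIL), at the cost of Python's simplicity. Don't worry too much, though. Python cannot run multi-core work with threads, but it can with multiple processes. Each Python process has its own independent GIL, so the processes don't affect one another.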
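The sketch below makes this visible. It runs the same CPU-bound function in a thread pool and in a process pool. Timings depend on the machine, so the code only prints them. On a multi-core machine the process pool is typically faster for this kind of work. `cpu_bound` and `timed` are hypothetical helpers. The `__main__` guard is required because `ProcessPoolExecutor` may start its workers by re-importing the module (the spawn start method):

```python
import sys
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound(n):
    """Pure-Python loop: holds the GIL the whole time it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(executor_cls, n=2_000_000, tasks=4):
    """Run `tasks` copies of cpu_bound(n) in the given pool; return results and seconds."""
    start = time.perf_counter()
    with executor_cls(max_workers=tasks) as pool:
        results = list(pool.map(cpu_bound, [n] * tasks))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # In Python 3 the GIL is handed over on a time interval, not every N bytecodes.
    print("switch interval:", sys.getswitchinterval())
    _, t_threads = timed(ThreadPoolExecutor)
    _, t_procs = timed(ProcessPoolExecutor)
    print(f"threads: {t_threads:.2f}s  processes: {t_procs:.2f}s")
```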
uWSGI listen queue overflow

2018-02-06

uwsgi

nginx correspondingly reports upstream timeout errors, and uWSGI logs: `*** uWSGI listen queue of socket "127.0.0.1:9001" (fd: 3) full !!! (101/100) ***` (127.0.0.1:9001 is just the fixed port the socket is bound to).

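Raising processes and threads in the configuration file is enough to fix it. The default listen queue length is 100, so the maximum concurrency is roughly 100 × the number of processes. The tuning commonly suggested online, shown below, is awkward to apply inside Docker: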

Edit /etc/sysctl.conf and add or change these parameters:
```
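# Upper bound on the listen() backlog of any socket
net.core.somaxconn = 262144
# Length of the SYN queue (default 1024); raising it to 8192 holds more connections that are still being established
net.ipv4.tcp_max_syn_backlog = 8192
# Length of the queue where the network device places incoming packets before the kernel processes them
net.core.netdev_max_backlog = 65536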
```
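After editing, remember to run sysctl -p to reload the parameters. In addition, raising the listen value in the uWSGI configuration (for example --listen=1024) is the most effective way to increase concurrency. The value cannot exceed net.core.somaxconn.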
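On Linux, listen(2) silently caps any backlog at `net.core.somaxconn`, so it can be worth checking the current value before raising `--listen`. Below is a small Python sketch that reads the value from procfs. The function names and the idea of checking before startup are this example's own, not part of uWSGI:

```python
from pathlib import Path

SOMAXCONN_PATH = "/proc/sys/net/core/somaxconn"   # Linux only


def read_somaxconn(path=SOMAXCONN_PATH):
    """Return the kernel's maximum listen() backlog as an int."""
    return int(Path(path).read_text().strip())


def effective_backlog(requested_listen, somaxconn):
    """A larger requested backlog is truncated to somaxconn by the kernel."""
    return min(requested_listen, somaxconn)


if __name__ == "__main__":
    wanted = 1024   # the --listen value you plan to pass to uWSGI
    if Path(SOMAXCONN_PATH).exists():
        limit = read_somaxconn()
        if wanted > limit:
            print(f"--listen={wanted} exceeds net.core.somaxconn={limit}; raise it first")
        else:
            print(f"OK: --listen={wanted} <= net.core.somaxconn={limit}")
```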

Update 2018-05-15:
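Error: (2003, "Can't connect to MySQL server on 'localhost' ([Errno 99] Cannot assign requested address)")

Errno 99 here usually means the client has run out of local ports, typically because many closed connections are still sitting in TIME-WAIT. Fix:
```
# Enable SYN cookies when the SYN queue overflows
net.ipv4.tcp_syncookies = 1
# Allow new connections to reuse sockets in TIME-WAIT state
net.ipv4.tcp_tw_reuse = 1
# Enable fast recycling of TIME-WAIT sockets (removed in Linux 4.12; known to break clients behind NAT)
net.ipv4.tcp_tw_recycle = 1
# How long a closed socket stays in FIN-WAIT-2
net.ipv4.tcp_fin_timeout = 30
# For all protocols, raise the maximum send buffer (wmem) and receive buffer (rmem) to 8 MB
net.core.wmem_max = 8388608
net.core.rmem_max = 8388608
```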
Update 2018-05-30: changing max_connections

Ubuntu moved from Upstart to systemd in version 15.04 and no longer respects the limits in /etc/security/limits.conf for system services. These limits now apply only to user sessions.

The limits for the MySQL service are defined in the Systemd configuration file, which you should copy from its default location into /etc/systemd and then edit the copy.

```
sudo cp /lib/systemd/system/mysql.service /etc/systemd/system/
sudo vim /etc/systemd/system/mysql.service # or your editor of choice
```
Add the following lines to the bottom of the file (they belong in the [Service] section):

```
LimitNOFILE=infinity
LimitMEMLOCK=infinity
```
You could also set a numeric limit, e.g. LimitNOFILE=4096.

Now reload the Systemd configuration with:

```
sudo systemctl daemon-reload
```
Restart MySQL (e.g. sudo systemctl restart mysql), and it should now obey the max_connections directive.

Then add the following to /etc/mysql/mysql.conf.d/mysqld.cnf:
```
[mysqld]
max_connections = 5000
max_connect_errors = 10000
```

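One very important point: after restarting MySQL, restart the application server as well. Otherwise its connection pool may still hold broken connections!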

References:
```
https://stackoverflow.com/questions/24884438/2003-cant-connect-to-mysql-server-on-127-0-0-13306-99-cannot-assign-reques
https://www.percona.com/blog/2014/12/08/what-happens-when-your-application-cannot-open-yet-another-connection-to-mysql/
https://serverfault.com/questions/829072/cant-connect-to-mysql-server-mysql-server-ip-99
https://blog.csdn.net/tenfyguo/article/details/8499248
https://www.digitalocean.com/community/questions/max_connections-will-not-change-in-ubuntu
```
A strange problem when using Flask-SQLAlchemy

2017-11-08

flask-sqlalchemy

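While testing the project today, I found an error in the log: "This result object does not return rows. It has been closed automatically". Strangely, the error came from the most ordinary queries. My production environment is Ubuntu + nginx + uWSGI, started with the following command (note that it runs multiple processes and threads):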
```
uwsgi --socket 127.0.0.1:9001 --chdir mysite --wsgi-file flask_app.py --callable app --pidfile /home/labor/pidfile.pid --master --processes 2 --threads 2
```
Sample code:

flask_app.py (the main project file):
```
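import time

from flask import Flask

from base import db
from commondb import add_system_log  # module name assumed from the log message below

app = Flask(__name__)
db.app = app
db.init_app(app)
add_system_log(0, int(time.time()), u'服务器启动成功...', "", "")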
...
if __name__ == '__main__':
	...
```
base.py:
```
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()
```
add_system_log is defined as follows:
```
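import logging

from base import db
# SystemLog is the project's model class (its module is not shown here)

def add_system_log(stype, timestamp, description, ip, user_agent):
    try:
        sys_log = SystemLog(stype, timestamp, description, ip, user_agent)
        db.session.add(sys_log)
        db.session.commit()
    except Exception as e:
        db.session.rollback()  # keep the session usable after a failed commit
        logging.error("commondb.py add_system_log exception:" + str(e))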
```
However, the errors did not come from the code above. They came from simple queries in ordinary requests.

Comparing carefully with the official example, I noticed that it does not contain the line db.app = app. But if I remove that line, add_system_log fails with "application not registered on db instance and no application bound to current context".

After some research, I found two solutions:

Solution 1:
```
1) Remove db.app = app
2) Remove the add_system_log call
Drawback: the database cannot be used during startup, only afterwards inside GET/POST request handlers
```
Solution 2:
```
Change the uWSGI command (adding --enable-threads --lazy-apps):
uwsgi --socket 127.0.0.1:9001 --chdir mysite --wsgi-file flask_app.py --callable app --pidfile /home/labor/pidfile.pid --master --processes 2 --threads 2 --enable-threads --lazy-apps
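Drawback: the first startup raises a "database does not exist" exception, and every later startup writes the '服务器启动成功' ("server started successfully") log entry twice.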
```
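I ended up choosing solution 2. A likely explanation for why it works: without --lazy-apps, uWSGI loads the application once in the master process and then forks the workers. The database connection that add_system_log opens at startup is therefore inherited by every worker. Several processes then share one MySQL socket and can receive each other's results, which produces errors like the one above. With --lazy-apps, each worker loads the app, and opens its own connections, after the fork.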


MySQL: how to check whether a database exists

2017-11-02

mysql

The first method is to run the SQL statement "SHOW DATABASES;" and check whether the database appears in the result. A Python implementation with SQLAlchemy:

```
from sqlalchemy import create_engine, text

# sql_uri and DATABASE are placeholders for your connection URI and database name
engine = create_engine(sql_uri)
with engine.connect() as conn:  # the connection is closed when the block exits
    existing_databases = [row[0] for row in conn.execute(text("SHOW DATABASES"))]
if DATABASE in existing_databases:
    # exists
    pass
else:
    # does NOT exist
    pass
```

I learned the second method from the sqlalchemy-utils source code:

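```
from sqlalchemy import create_engine, text

def database_exists(sql_uri, database):
    engine = create_engine(sql_uri)
    query = text("SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA "
                 "WHERE SCHEMA_NAME = :name")  # bound parameter, not string formatting
    with engine.connect() as conn:
        return bool(conn.execute(query, {"name": database}).scalar())
```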


MySQL thread-safe get_or_create function

2017-11-01

mysql


How to write a thread-safe get_or_create for MySQL

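```
import logging

from sqlalchemy import text

# Uses (Flask-)SQLAlchemy; TableName is the model and table_name its table
def get_or_create(param1):
    try:
        db.session.execute(text('LOCK TABLES table_name WRITE;'))
        try:
            a = TableName.query.filter_by(param1=param1).first()
            if a is None:
                a = TableName(param1)
                db.session.add(a)
                db.session.commit()
        except Exception:
            db.session.rollback()
            raise
        finally:
            # Always release the table lock, even if the query or commit failed
            db.session.execute(text('UNLOCK TABLES;'))
        return a
    except Exception as e:
        logging.error("xxx.py get_or_create exception:" + str(e))
    return None
```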

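The SQL statement LOCK TABLES table_name WRITE takes an exclusive lock. While one session holds it, other sessions that try to lock the table must wait, so the code between LOCK and UNLOCK is executed by only one thread at a time.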

How to write a thread-safe increment of a column in MySQL

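```
import logging

def increase(uid):
    try:
        # Emits a single UPDATE ... SET money = money + 1, so MySQL performs the
        # increment atomically instead of a read-modify-write in Python
        User.query.filter_by(userid=uid).update({"money": User.money + 1})
        db.session.commit()
    except Exception as e:
        db.session.rollback()
        logging.error("xxx.py increase exception:" + str(e))
    return None
```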


Explaining the cookie fields

2017-10-26

The HTTP response headers are:

```
accept-ranges:bytes
cache-control:public, max-age=43200
date:Thu, 26 Oct 2017 05:33:47 GMT
etag:"1508741996.79-8997-4145288068"
expires:Thu, 26 Oct 2017 17:33:47 GMT
server:nginx/1.10.3 (Ubuntu)
set-cookie:session=eyJfcGVybWFuZW50Ijp0cnVlLCJhcHBfbWFuYWdlcl9zZXNzaW9uX2tleSI6eyIgYiI6IlJYVnNRV3RFTUhnelpHNXZOWEZOU0ZwVldqUXdXRGwyVUROQlVXMWtSWGRyT0VrMiJ9fQ.DNMFOw.xywAq7tBglj_NPgLRPZaptNB_N0; Expires=Thu, 02-Nov-2017 05:33:47 GMT; HttpOnly; Path=/
status:304
strict-transport-security:max-age=63072000; includeSubdomains; preload
x-content-type-options:nosniff
x-frame-options:DENY
```

This post mainly explains the content of the set-cookie field:

session=eyJfcGVybWFuZW50Ijp0cnVlLCJhcHBfbWFuYWdlcl9zZXNzaW9uX2tleSI6eyIgYiI6IlJYVnNRV3RFTUhnelpHNXZOWEZOU0ZwVldqUXdXRGwyVUROQlVXMWtSWGRyT0VrMiJ9fQ.DNMFOw.xywAq7tBglj_NPgLRPZaptNB_N0; Expires=Thu, 02-Nov-2017 05:33:47 GMT; HttpOnly; Path=/

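session: the cookie's value. It is often called the session ID. However, with Flask's default session it is not an opaque ID but the signed session data itself (see the three segments below).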

Expires: the expiration time.

HttpOnly: JavaScript (e.g. document.cookie) cannot read the cookie. This reduces the risk of the cookie being stolen through an XSS attack.

Path: the URL path for which the browser sends the cookie (Path=/ covers every path on the site).

Secure (not set in the header above): the cookie is only sent over HTTPS; plain HTTP requests will not carry it.
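The same attributes can be produced with the standard library's `http.cookies` module, which is handy for seeing how they serialize. A sketch; the cookie value `abc123` is a placeholder, and `Secure` is included here even though the header above does not set it:

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "abc123"
cookie["session"]["path"] = "/"
cookie["session"]["expires"] = "Thu, 02-Nov-2017 05:33:47 GMT"
cookie["session"]["httponly"] = True   # hide the cookie from document.cookie
cookie["session"]["secure"] = True     # only send the cookie over HTTPS

header = cookie.output()               # a full "Set-Cookie: ..." header line
print(header)
```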

The session value is split by '.' into three segments.

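The first segment is the session content: the session dict serialized to JSON and base64-encoded. Flask stores the session data in the cookie itself, so this part is signed but not encrypted.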

The second segment is the timestamp:

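```
import time
# base64_decode, bytes_to_int and EPOCH come from the older itsdangerous versions this cookie format uses
from itsdangerous import base64_decode, bytes_to_int, EPOCH

# Timestamp
timestamp = bytes_to_int(base64_decode("DNMFOw")) + EPOCH  # EPOCH is a fixed offset (2011-01-01) that keeps the encoded value short
# Date (%M is minutes; %I would be the 12-hour clock hour)
print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(timestamp)))
```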
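The first two segments can also be decoded with only the standard library, which shows that the session data is signed but not encrypted. This sketch assumes the older itsdangerous format used by the cookie above: URL-safe base64 with the padding stripped, and a big-endian integer timestamp counted from 2011-01-01 UTC. Newer itsdangerous releases may encode timestamps differently. The helper names are hypothetical:

```python
import base64
import json
from datetime import datetime, timezone

COOKIE = ("eyJfcGVybWFuZW50Ijp0cnVlLCJhcHBfbWFuYWdlcl9zZXNzaW9uX2tleSI6eyIgYiI6"
          "IlJYVnNRV3RFTUhnelpHNXZOWEZOU0ZwVldqUXdXRGwyVUROQlVXMWtSWGRyT0VrMiJ9fQ"
          ".DNMFOw.xywAq7tBglj_NPgLRPZaptNB_N0")
OLD_ITSDANGEROUS_EPOCH = 1293840000   # 2011-01-01 00:00:00 UTC


def b64decode_nopad(segment):
    """URL-safe base64 decode, restoring the '=' padding that was stripped."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def decode_session_cookie(value):
    payload_b64, ts_b64, _signature = value.split(".")
    payload = json.loads(b64decode_nopad(payload_b64))
    ts = int.from_bytes(b64decode_nopad(ts_b64), "big") + OLD_ITSDANGEROUS_EPOCH
    return payload, ts


payload, ts = decode_session_cookie(COOKIE)
print(payload)
print(datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S"))
```

The decoded time, 2017-10-26 05:33:47 UTC, matches the `date` header of the response above. In the payload, the `" b"` key appears to be Flask's tag for a bytes value in its JSON serializer.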

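The third segment is a signature computed with HMAC. If you modify the preceding values, the signature no longer matches, and Flask will not use that session. Signing uses SECRET_KEY, so keep SECRET_KEY safe: anyone who knows it can construct a cookie with a forged session value.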
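The principle behind that signature check can be illustrated with the standard `hmac` module. This is a simplified sketch, not Flask's exact scheme: itsdangerous additionally derives the signing key from SECRET_KEY with a salt, and its digest choice differs by version. `SECRET_KEY`, `sign`, and `verify` here are hypothetical:

```python
import base64
import hashlib
import hmac

SECRET_KEY = b"change-me"   # hypothetical; keep the real key out of source control


def _b64(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=")


def sign(value: bytes) -> bytes:
    mac = hmac.new(SECRET_KEY, value, hashlib.sha256).digest()
    return value + b"." + _b64(mac)


def verify(signed: bytes):
    """Return the original value if the signature matches, else None."""
    value, _, sig = signed.rpartition(b".")
    expected = _b64(hmac.new(SECRET_KEY, value, hashlib.sha256).digest())
    # compare_digest avoids leaking timing information about the mismatch
    return value if hmac.compare_digest(sig, expected) else None


token = sign(b'{"user_id":1}')
tampered = token.replace(b'"user_id":1', b'"user_id":2')
print(verify(token))      # b'{"user_id":1}'
print(verify(tampered))   # None
```

Without the key, an attacker can edit the payload but cannot produce a matching signature. That is why leaking SECRET_KEY is enough to forge sessions.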
