nginx reverse proxy, and nginx's upstream and fastcgi modules (Day 31)

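Reverse proxy:

A reverse proxy server works at the application layer, so it has to listen on a port (typically port 80 for HTTP). LVS, by contrast, works on the INPUT chain in netfilter and does not need to listen on port 80: once a packet has come in through the PREROUTING chain and reaches the INPUT chain, LVS intercepts it and schedules it to a back-end real server (RS), so the request never gets a chance to reach the application layer.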

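How a reverse proxy server works:
The reverse proxy is itself an HTTP service listening on port 80. When it receives a request from a client, it acts as a client in its own right and sends that request on to a back-end server. The back-end server processes the request and sends the response back to the reverse proxy, which then returns it to the client. In nginx, these back-end servers are called upstream servers.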

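Because the reverse proxy works at the application layer, it can see the URL of each HTTP request. By inspecting the URL it can tell whether the requested resource is static (http://xxxxx.jpg, .css, .js) or dynamic (http://xxxxx.php, .do, .jsp), which makes static/dynamic separation possible: requests are scheduled to upstream servers that are themselves split into static and dynamic groups. Static resources do not need session persistence; dynamic resources do.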
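The static/dynamic decision described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how nginx implements location matching; the function name classify and the two extension sets are assumptions taken from the examples in the text.

```python
from urllib.parse import urlsplit
import posixpath

# Hypothetical extension sets, taken from the examples in the text.
STATIC_EXTENSIONS = {".jpg", ".css", ".js"}
DYNAMIC_EXTENSIONS = {".php", ".do", ".jsp"}


def classify(url):
    """Return "static", "dynamic", or "unknown" from the URL path's extension."""
    path = urlsplit(url).path                       # drop scheme, host, query string
    extension = posixpath.splitext(path)[1].lower()
    if extension in STATIC_EXTENSIONS:
        return "static"
    if extension in DYNAMIC_EXTENSIONS:
        return "dynamic"
    return "unknown"
```

A proxy would send "static" requests to the static upstream group and "dynamic" requests to the dynamic group; in nginx the same decision is normally expressed with regex location blocks.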

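Session persistence methods:
Session binding: use the sh algorithm or persistent connections so that all requests from the same source IP are scheduled to the same upstream server
Session replication: every upstream server keeps a copy of the sessions held by the other upstream servers
Session server: sessions are stored centrally in memcached or redis (a key-value store)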

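In an era where cache is king, the reverse proxy can cache responses from the upstream servers, for example on a PCI-E SSD used as the cache device. Caches are generally key-value stores: the key is typically a hash of the URL, and the value is the resource that URL refers to.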

Every load balancer provides health checks for its back-end servers.

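One caveat: because nginx's reverse proxy works at the application layer, it is constrained by socket limits. Each proxied request needs its own connection from the proxy to an upstream server, and a single source IP address can hold at most about 65,535 concurrent connections (one per local port) to any one upstream address and port.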

If the nginx reverse proxy becomes the bottleneck in such a system, add a second nginx reverse proxy and put an LVS director in front of the two proxies.

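It may also make sense to run several such systems, split by application type: for example, one for desktop clients and one for wap/mobile clients. Alternatively, split by function: one for group buying, one for search, one for o2o, and so on. In practice the split depends on the actual business.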

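For storage behind the dynamic-content servers, a MySQL database can be used. How is a bottleneck there resolved? Many applications share one characteristic: they read from the database far more often than they write to it, so reads and writes can be split. Several database servers can serve reads only, while a single server handles writes. Every write is sent to that one write server, which then replicates the data to the read-only servers.
What if the write server itself becomes the bottleneck?
Techniques such as database sharding may then be needed.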
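The read/write split above can be sketched as a tiny router. This is a simplified illustration under stated assumptions: the class name ReadWriteRouter and the server names are hypothetical, only statements beginning with SELECT count as reads, and replication itself is left to the database.

```python
import itertools


class ReadWriteRouter:
    """Send reads to the replicas in turn and everything else to the write server."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql):
        # Assumes a non-empty statement; the first word decides the target.
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            return next(self._replicas)   # simple round robin over the read servers
        return self.primary               # INSERT, UPDATE, DELETE, DDL, ...
```

Real middleware also has to deal with transactions and replication lag (a read issued right after a write may need to go to the write server), which this sketch ignores.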

nginx reverse proxy versus haproxy
xxxx

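The five I/O models:
Synchronous versus asynchronous: concerns how the callee delivers its result (whether the caller must wait for it or is notified when it is ready)
Blocking versus non-blocking: concerns whether the caller is suspended while waiting

nginx implements its own event-driven core (it does not depend on the libevent package); on Linux the event notification mechanism it uses is epoll.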
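To make the event-driven idea concrete, here is a minimal Python sketch using the standard selectors module, whose DefaultSelector picks epoll on Linux. It is not nginx code; the function name wait_for_data is an assumption, and a local socket pair stands in for a client connection.

```python
import selectors
import socket


def wait_for_data(timeout=1.0):
    """Register a socket for read events and collect what the selector reports."""
    sel = selectors.DefaultSelector()          # epoll-backed on Linux
    writer, reader = socket.socketpair()       # stand-in for a client connection
    reader.setblocking(False)
    sel.register(reader, selectors.EVENT_READ)
    try:
        idle = sel.select(timeout=0)           # nothing written yet: no events
        writer.send(b"ping")
        ready = sel.select(timeout=timeout)    # now the read event is reported
        data = [key.fileobj.recv(1024) for key, _mask in ready]
        return len(idle), data
    finally:
        sel.unregister(reader)
        sel.close()
        writer.close()
        reader.close()
```

The point of the pattern is that one process can watch many sockets and only touch the ones that are ready, instead of blocking on each connection in turn.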

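nginx architecture diagram:

nginx configuration:
main, events, and http-related sections

nginx is modular. Its modules include:
ngx_core_module, nginx's own core module
many http-related modules, such as ngx_http_core_module, the core http module
The remaining modules below are mostly http-related (for example fastcgi, uwsgi, proxy, gzip, rewrite, and upstream), followed by the mail and stream modules: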
ngx_http_access_module
ngx_http_addition_module
ngx_http_api_module
ngx_http_auth_basic_module
ngx_http_auth_jwt_module
ngx_http_auth_request_module
ngx_http_autoindex_module
ngx_http_browser_module
ngx_http_charset_module
ngx_http_dav_module
ngx_http_empty_gif_module
ngx_http_f4f_module
ngx_http_fastcgi_module
ngx_http_flv_module
ngx_http_geo_module
ngx_http_geoip_module
ngx_http_gunzip_module
ngx_http_gzip_module
ngx_http_gzip_static_module
ngx_http_headers_module
ngx_http_hls_module
ngx_http_image_filter_module
ngx_http_index_module
ngx_http_js_module
ngx_http_keyval_module
ngx_http_limit_conn_module
ngx_http_limit_req_module
ngx_http_log_module
ngx_http_map_module
ngx_http_memcached_module
ngx_http_mirror_module
ngx_http_mp4_module
ngx_http_perl_module
ngx_http_proxy_module
ngx_http_random_index_module
ngx_http_realip_module
ngx_http_referer_module
ngx_http_rewrite_module
ngx_http_scgi_module
ngx_http_secure_link_module
ngx_http_session_log_module
ngx_http_slice_module
ngx_http_spdy_module
ngx_http_split_clients_module
ngx_http_ssi_module
ngx_http_ssl_module
ngx_http_status_module
ngx_http_stub_status_module
ngx_http_sub_module
ngx_http_upstream_module
ngx_http_upstream_conf_module
ngx_http_upstream_hc_module
ngx_http_userid_module
ngx_http_uwsgi_module
ngx_http_v2_module
ngx_http_xslt_module
ngx_mail_core_module
ngx_mail_auth_http_module
ngx_mail_proxy_module
ngx_mail_ssl_module
ngx_mail_imap_module
ngx_mail_pop3_module
ngx_mail_smtp_module
ngx_stream_core_module
ngx_stream_access_module
ngx_stream_geo_module
ngx_stream_geoip_module
ngx_stream_js_module
ngx_stream_keyval_module
ngx_stream_limit_conn_module
ngx_stream_log_module
ngx_stream_map_module
ngx_stream_proxy_module
ngx_stream_realip_module
ngx_stream_return_module
ngx_stream_split_clients_module
ngx_stream_ssl_module
ngx_stream_ssl_preread_module
ngx_stream_upstream_module
ngx_stream_upstream_hc_module
ngx_google_perftools_module

Details of the ngx_http_proxy_module module:
This is nginx's reverse proxy module.

How the module works:

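server {
listen ...;
server_name ...;

location / {
proxy_pass http://192.168.3.7:8000;
# proxy_set_header sets a header on the request that the proxy rebuilds and sends to the
# back-end server that actually serves the content. $host holds the host name from the
# client's request (normally the value of the Host header); it is passed on because the
# back-end server (192.168.3.7) may be running virtual hosts.
proxy_set_header Host $host;
# $remote_addr holds the real client IP address
proxy_set_header X-Real-IP $remote_addr;
}
}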

Example:
node2: 172.16.100.7
Use node2 as the upstream server and start a web service on it.

node1: 172.16.100.6
node1 acts as the reverse proxy server, configured as follows (the location blocks go in the server context):
location / {
#root /usr/share/nginx/html;
proxy_pass http://172.16.100.7/;
index index.html index.htm;
}

location /forum/ {
proxy_pass http://172.16.100.7/bbs/;
proxy_set_header X-Real-IP $remote_addr;
index index.html index.htm;
}

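location ~* \.(jpg|png|gif)$ {
# Requests whose URL ends in jpg, png, or gif are proxied to 172.16.100.7.
# In a regex (pattern-matching) location, proxy_pass must not carry any path after 172.16.100.7.
proxy_pass http://172.16.100.7;
proxy_set_header X-Real-IP $remote_addr;
}
For example, a request for http://www.mageedu.com/images/22.jpg is proxied to http://172.16.100.7/images/22.jpg: the whole /images/22.jpg part is appended after 172.16.100.7.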

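On host 172.16.100.7, the entries in /var/log/httpd/access_log show 172.16.100.6 as the source IP, but 172.16.100.6 is the proxy, not the real client. How can the log record the real client IP address?

On host 172.16.100.7, edit /etc/httpd/conf/httpd.conf, find the LogFormat directive, and replace %h with %{X-Real-IP}i.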

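One more point: the leg from the client IP to the proxy IP (cip to pip) can be served over HTTPS. Can the leg from the proxy's local IP to the upstream IP (lip to uip) use HTTPS as well?
Yes. Point proxy_pass at an https:// URL; the directives below tune that connection and can be added to the location context in the example above: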
proxy_ssl_certificate
proxy_ssl_certificate_key
proxy_ssl_ciphers
proxy_ssl_crl
proxy_ssl_name
proxy_ssl_password_file
proxy_ssl_server_name
proxy_ssl_session_reuse
proxy_ssl_protocols
proxy_ssl_trusted_certificate
proxy_ssl_verify
proxy_ssl_verify_depth

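nginx's reverse proxy supports caching. The relevant directives are described below.
The cache keys are held in memory. The key is a hash of the URL (more precisely, of the value set by proxy_cache_key), and the value is the resource for that URL, stored on disk.
For example:
proxy_cache_path /data/nginx/cache levels=1:2 keys_zone=one:10m;
Here levels=1:2 means the first-level subdirectory name is 1 character and the second-level subdirectory name is 2 characters, giving file names like:
/data/nginx/cache/c/29/b7f54b2df7773722d382f4809d65029c
keys_zone=one:10m sets the name of the shared memory zone and its size.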
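The mapping from a cache key to the file name shown above can be reproduced with a short Python sketch. It assumes what the nginx documentation states, namely that the file name is the MD5 of the cache key, and it takes the directory names from the end of the hex digest, which matches the example path. The function names are hypothetical.

```python
import hashlib


def cache_file_path(root, digest, levels=(1, 2)):
    """Build the on-disk path for a hex digest, given the levels= widths."""
    parts = []
    end = len(digest)
    for width in levels:
        # Each level takes the next `width` characters, working back from the end.
        parts.append(digest[end - width:end])
        end -= width
    return "/".join([root.rstrip("/")] + parts + [digest])


def cache_file_path_for_key(root, key, levels=(1, 2)):
    """Hash a cache key with MD5 and return the path of its cache file."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return cache_file_path(root, digest, levels)
```

With the digest from the example, cache_file_path("/data/nginx/cache", "b7f54b2df7773722d382f4809d65029c") gives /data/nginx/cache/c/29/b7f54b2df7773722d382f4809d65029c: "c" is the last character and "29" the two characters before it.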

proxy_cache_path can only be used in the http context. After that, use proxy_cache zone | off in the http, server, or location context to enable a previously defined cache.

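proxy_cache_methods: which HTTP request methods have their responses cached
proxy_cache_min_uses: the number of times a request must be made before its response is cached

proxy_cache_purge string ...; defines the conditions under which a request is treated as a cache purge request
For example: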
proxy_cache_path /data/nginx/cache keys_zone=cache_zone:10m;
map $request_method $purge_method {
PURGE 1;
default 0;
}
server {
...
location / {
proxy_pass http://backend;
proxy_cache cache_zone;
proxy_cache_key $uri;
proxy_cache_purge $purge_method;
}
}

proxy_cache_revalidate on | off; controls cache revalidation
Enables revalidation of expired cache items using conditional requests with the “If-Modified-Since” and “If-None-Match” header fields.

proxy_cache_use_stale error | timeout | invalid_header | updating | http_500 | http_502 | http_503 | http_504 | http_403 | http_404 | http_429 | off ...;
Determines in which cases a stale cached response may be used when an error occurs while communicating with the proxied server.

proxy_cache_valid [code ...] time;
Sets caching time for different response codes.
For example:
proxy_cache_valid 200 302 10m;
proxy_cache_valid 404 1m;

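proxy_connect_timeout time;
Timeout for establishing a connection with the proxied server

proxy_hide_header field;
Response header fields from the proxied server that nginx does not pass on to the client

proxy_read_timeout time;
Timeout for reading a response from the back-end server (measured between two successive read operations)

proxy_pass_request_body on | off;
Whether the proxy passes the original request body to the back-end server

proxy_pass_request_headers on | off;
Whether the proxy passes the original request header fields to the back-end server

proxy_buffers number size;
Number and size of the buffers used for reading a response from the back-end server, per connection

proxy_cache_bypass string ...;
Defines the conditions under which nginx will not take the response from the cache
For example:
proxy_cache_bypass $cookie_nocache $arg_nocache$arg_comment;
proxy_cache_bypass $http_pragma $http_authorization;

proxy_set_header field value;
Sets or redefines a header field in the request passed to the back-end server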

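Details of the ngx_http_upstream_module module: this module is mainly used for load balancing

It defines groups of servers that can be referenced by the proxy_pass, fastcgi_pass, uwsgi_pass, scgi_pass, and memcached_pass directives.
For example:
# the upstream directive can only be used in the http context
upstream backend {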
server backend1.example.com weight=5;
server backend2.example.com:8080;
server unix:/tmp/backend3;

server backup1.example.com:8080 backup;
server backup2.example.com:8080 backup;
}

server {
location / {
# proxy_pass is followed by the name defined with the upstream directive
proxy_pass http://backend;
}
}

ip_hash: similar to the sh algorithm in LVS; requests from the same source IP are always scheduled to the same back-end server
For example:
upstream backend {
ip_hash;

server backend1.example.com;
server backend2.example.com;
server backend3.example.com down;
server backend4.example.com;
}
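The effect of ip_hash can be sketched as follows. This is a simplified illustration, not nginx's implementation: nginx documents that the first three octets of an IPv4 address are used as the hashing key, but it uses its own hash function and also handles weights and servers marked down. Here zlib.crc32 is a stand-in, and the function name pick_by_ip_hash is hypothetical.

```python
import zlib


def pick_by_ip_hash(client_ip, servers):
    """Map an IPv4 client address to one server, keyed on its first three octets."""
    octets = client_ip.split(".")
    key = ".".join(octets[:3])                  # e.g. "192.168.1" for 192.168.1.10
    index = zlib.crc32(key.encode("ascii")) % len(servers)
    return servers[index]
```

Because only the first three octets are hashed, every client in the same /24 network lands on the same server, which is exactly why many users behind one NAT address end up on a single back end.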

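What is the drawback of ip_hash?
Large numbers of clients behind SNAT share the handful of public IP addresses their carrier provides, so the source IP cannot identify an individual user. Servers therefore generally rely on cookies and sessions: the server sets the client's cookie through the Set-Cookie response header, and the cookie identifies a unique session, which in turn identifies the user. LVS cannot do cookie-based scheduling, because it works in netfilter and cannot see the cookie in the HTTP headers at all.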

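A server entry in an upstream block accepts several parameters:
weight=number: the weight of the server
max_conns=number: the maximum number of simultaneous connections to the server
max_fails=number: the number of failed attempts, within fail_timeout, after which the server is considered unavailable
fail_timeout=time: the window in which max_fails failures must occur, and also how long the server is then considered unavailable
A server can also be marked with backup (a standby server), down, resolve, and so on.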
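How max_fails and fail_timeout interact can be illustrated with a small model. This is a simplified sketch of the documented behaviour, not nginx's code; the class name PassiveHealth is hypothetical, and the defaults used (1 failure, 10 seconds) are the documented nginx defaults.

```python
class PassiveHealth:
    """Track failures for one upstream server, max_fails / fail_timeout style."""

    def __init__(self, max_fails=1, fail_timeout=10.0):
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.failures = []        # timestamps of failures inside the current window
        self.down_until = 0.0     # the server is skipped until this time

    def record_failure(self, now):
        # Forget failures that fell out of the fail_timeout window.
        self.failures = [t for t in self.failures if now - t < self.fail_timeout]
        self.failures.append(now)
        if len(self.failures) >= self.max_fails:
            # Too many failures in the window: mark unavailable for fail_timeout.
            self.down_until = now + self.fail_timeout
            self.failures = []

    def available(self, now):
        return now >= self.down_until
```

Times are passed in explicitly (in seconds) so the behaviour is easy to follow: with max_fails=3 and fail_timeout=5, three failures at t=0, 1, and 2 take the server out of rotation until t=7.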

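Using the sticky directive in an upstream block:
sticky provides session binding: requests from the same client are sent to the same back-end server. Three methods are available: cookie, route, and learn.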
sticky cookie name [expires=time] [domain=domain] [httponly] [secure] [path=path];
sticky route $variable ...;
sticky learn create=$variable lookup=$variable zone=name:size [timeout=time] [header];
Example:
upstream backend {
server backend1.example.com;
server backend2.example.com;

sticky cookie srv_id expires=1h domain=.example.com path=/;
}

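least_conn in an upstream block:
A scheduling method that picks the upstream server with the fewest active connections (taking server weights into account) to handle the client's request.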
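The selection rule can be sketched in Python as picking the server with the lowest ratio of active connections to weight. This is an illustration only: the function name pick_least_conn and the dictionary inputs are assumptions, and ties are resolved here by dictionary order rather than by the round-robin step nginx applies among equally loaded servers.

```python
def pick_least_conn(active_connections, weights=None):
    """Return the server with the fewest active connections relative to its weight."""
    weights = weights or {}
    return min(
        active_connections,
        key=lambda server: active_connections[server] / weights.get(server, 1),
    )
```

For example, with 5, 2, and 7 active connections on servers a, b, and c, b is chosen; if a has weight=5, its 5 connections count as a load of 1 and a is chosen instead.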

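keepalive in an upstream block:
keepalive connections; enables persistent connections between the proxy and the back-end servers (connections is the maximum number of idle keepalive connections kept in each worker process's cache)
For example: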
upstream memcached_backend {
server 127.0.0.1:11211;
server 10.0.0.2:11211;

keepalive 32;
}
server {
...
location /memcached/ {
set $memcached_key $uri;
memcached_pass memcached_backend;
}
}

health_check: health monitoring; this directive is used in the location context
For example:
upstream dynamic {
zone upstream_dynamic 64k;

server backend1.example.com weight=5;
server backend2.example.com:8080 fail_timeout=5s slow_start=30s;
server 192.0.2.1 max_fails=3;
server backend3.example.com resolve;
server backend4.example.com service=http resolve;

server backup1.example.com:8080 backup;
server backup2.example.com:8080 backup;
}
server {
location / {
proxy_pass http://dynamic;
health_check;
}
}

Another example:
http {
...
match server_ok {
status 200-399;
body !~ "maintenance mode";
}

server {
...
location / {
proxy_pass http://backend;
health_check match=server_ok;
}
}
}

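Some variables of the ngx_http_upstream_module module:
$upstream_addr the IP address and port (or UNIX-domain socket path) of the upstream server
$upstream_cache_status the status of accessing the response cache: “MISS”, “BYPASS”, “EXPIRED”, “STALE”, “UPDATING”, “REVALIDATED”, or “HIT”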
$upstream_cookie_name cookie with the specified name sent by the upstream server in the “Set-Cookie” response header field (1.7.1). Only the cookies from the response of the last server are saved.

$upstream_status keeps status code of the response obtained from the upstream server.

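Note:
The add_header directive of the ngx_http_headers_module module adds response headers.
The example below adds X-Via and X-Cache to the HTTP response headers (the names X-Via and X-Cache are arbitrary). $server_addr is the address of the server that accepted the request, and $upstream_cache_status is the cache status of the response.
For example: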
add_header X-Via $server_addr;
add_header X-Cache $upstream_cache_status;

Figures: the second request and the first request, respectively.

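Details of the ngx_http_fastcgi_module (fastcgi) module
This module passes requests to a FastCGI server for processing.
For example:
location ~ \.php$ {
root /usr/share/nginx/html;
# pass the request to the FastCGI server at localhost:9000
fastcgi_pass localhost:9000;
fastcgi_index index.php;

fastcgi_param SCRIPT_FILENAME /scripts$fastcgi_script_name;

# The fastcgi_params file contains entries such as the ones below;
# these parameters are passed to the fpm server.
include fastcgi_params;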

fastcgi_param QUERY_STRING $query_string;
fastcgi_param REQUEST_METHOD $request_method;
fastcgi_param CONTENT_TYPE $content_type;
fastcgi_param CONTENT_LENGTH $content_length;
}

To use fastcgi, first install php-fpm with yum install -y php-fpm; the service listens on port 9000 by default.
Several fastcgi servers can be defined as one upstream group.

fastcgi_cache_path provides caching; it can only be used in the http context
fastcgi_cache_path path [levels=levels] [use_temp_path=on|off] keys_zone=name:size [inactive=time] [max_size=size] [manager_files=number] [manager_sleep=time] [manager_threshold=time] [loader_files=number] [loader_sleep=time] [loader_threshold=time] [purger=on|off] [purger_files=number] [purger_sleep=time] [purger_threshold=time];

Once a fastcgi cache has been defined, enable it with the fastcgi_cache directive
fastcgi_cache zone | off; off disables caching

fastcgi_cache_valid [code ...] time; sets caching time for different response codes
For example:
fastcgi_cache_valid 200 302 10m;
fastcgi_cache_valid 301 1h;
fastcgi_cache_valid any 1m;

fastcgi_cache_methods GET | HEAD | POST;

fastcgi_cache_min_uses number;

fastcgi_cache_use_stale error | timeout | invalid_header | updating | http_500 | http_503 | http_403 | http_404 | http_429 | off;

fastcgi_limit_rate rate; limits the rate at which the response is read from the FastCGI server

fastcgi_param parameter value [if_not_empty];

fastcgi_store on | off | string; enables saving the FastCGI server's responses to disk

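A question to think about:
The two locations below use different root paths. How can static/dynamic separation be achieved, and should the wordpress program be placed in /web/app/wp or in /web/htdocs for it to work?
location ~ \.php$ {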
root /web/app/wp;
}
location / {
root /web/htdocs;
}

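Another question to think about:
The nginx reverse proxy serves static resources itself, and dynamic resources are handed over the fastcgi protocol to a separate fpm server for processing:
location ~ \.php$ {
fastcgi_pass 172.168.100.9:9000;
}
location / {
root /web/htdocs;
}
In this setup, where should the wordpress program be placed? Requests for .php files are forwarded over the fastcgi protocol to the back-end server 172.168.100, while requests for static content such as css and image files are served from /web/htdocs on the local machine. How must wordpress be laid out to achieve static/dynamic separation?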

Summary:
There are two kinds of cache:
proxy_cache
fastcgi_cache (some dynamic resources must not be cached; it depends on the business)

A complete nginx configuration example (as used in production)
user nobody nobody;
worker_processes 4;
worker_rlimit_nofile 51200;

error_log log/error.log notice;

pid /var/run/nginx.pid;

events {
use epoll; # event-driven model
worker_connections 51200;
}

http {
server_tokens off; # when on, the nginx version is shown on error pages (such as 404) and in the Server response header
include mime.types;

proxy_redirect off;
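proxy_set_header Host $host; # headers the reverse proxy sets on requests sent to the back-end servers
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; # nginx proxies can be chained; with 3 levels of proxying the header becomes X-Forwarded-For: S1IP_addr,S2IP_addr,S3IP_addr

client_max_body_size 20m; # maximum size of the client request body
client_body_buffer_size 256k; # size of the in-memory buffer for the client request body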

proxy_connect_timeout 90;
proxy_send_timeout 90;
proxy_read_timeout 90;
proxy_buffer_size 128k;
proxy_buffers 4 64k;
proxy_busy_buffers_size 128k;
proxy_temp_file_write_size 128k;

default_type application/octet-stream;
charset utf-8;

client_body_temp_path /var/tmp/client_body_temp 1 2;
proxy_temp_path /var/tmp/proxy_temp 1 2;
fastcgi_temp_path /var/tmp/fastcgi_temp 1 2;
uwsgi_temp_path /var/tmp/uwsgi_temp 1 2;
scgi_temp_path /var/tmp/scgi_temp 1 2;

ignore_invalid_headers on; # ignore header fields with invalid names
server_names_hash_max_size 256;
server_names_hash_bucket_size 64;
client_header_buffer_size 8k;
large_client_header_buffers 4 32k;
connection_pool_size 256;
request_pool_size 64k;

output_buffers 2 128k;
postpone_output 1460;

client_header_timeout 1m;
client_body_timeout 3m;
send_timeout 3m;

log_format main '$server_addr $remote_addr [$time_local] $msec+$connection'
'"$request" $status $connection $request_time $body_bytes_sent "$http_referer"'
'"$http_user_agent" "$http_x_forwarded_for"';

open_log_file_cache max=1000 inactive=20s min_uses=1 valid=1m;

access_log logs/access.log main;
log_not_found on;

sendfile on;
tcp_nodelay on;
tcp_nopush off;

reset_timeout_connection on;
keepalive_timeout 10 5;
keepalive_requests 100;

gzip on;
gzip_http_version 1.1;
gzip_vary on;
gzip_proxied any;
gzip_min_length 1024;
gzip_comp_level 6;
gzip_buffers 16 8k;
gzip_proxied expired no-cache no-store private auth no_last_modified no_etag;
gzip_types text/plain application/x-javascript text/css application/xml application/json;
gzip_disable "MSIE [1-6]\.(?!.*SV1)";

upstream tomcat8080 { # defines the upstream server group
ip_hash;

server 172.16.100.103:8080 weight=1 max_fails=2;
server 172.16.100.104:8080 weight=1 max_fails=2;
server 172.16.100.105:8080 weight=1 max_fails=2;
}

server {
listen 80;
server_name www.magedu.com;
# config_app_begin
root /data/webapps/htdocs;
access_log /var/logs/webapp.accesss.log main;
error_log /var/logs/webapp.error.log notice;

location / {
location ~* ^.*/favicon.ico$ {
root /data/webapps;
expires 180d;
break;
}

if ( !-f $request_filename ) {
proxy_pass http://tomcat8080;
break;
}
}

error_page 500 502 503 504 /50x.html;
location = /50x.html {
root html;
}
}

server {
listen 8088;
server_name nginx_status;

location / {
access_log off;
deny all;
return 503;
}

location /status {
stub_status on;
access_log off;
allow 127.0.0.1;
allow 172.16.100.71;
deny all;
}

}

}
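The X-Forwarded-For line in the configuration above relies on $proxy_add_x_forwarded_for, which nginx documents as the incoming X-Forwarded-For header with $remote_addr appended after a comma, or just $remote_addr when the header is absent. A minimal Python sketch of that rule shows how the chain builds up across proxy levels; the function names and the sample addresses are hypothetical.

```python
def proxy_add_x_forwarded_for(incoming_header, remote_addr):
    """Value one proxy sends upstream: the existing header plus the peer's address."""
    if incoming_header:
        return incoming_header + ", " + remote_addr
    return remote_addr


def forwarded_chain(peer_addresses):
    """Apply the rule at each proxy level; peer_addresses[i] is the address
    that proxy i sees as its directly connected peer."""
    header = None
    for remote_addr in peer_addresses:
        header = proxy_add_x_forwarded_for(header, remote_addr)
    return header
```

With a client at 203.0.113.5 and two further proxy hops at 10.0.0.1 and 10.0.0.2, forwarded_chain(["203.0.113.5", "10.0.0.1", "10.0.0.2"]) yields "203.0.113.5, 10.0.0.1, 10.0.0.2": the original client comes first and each proxy adds the address it received the request from.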

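Task:
Install nginx on one host as the reverse proxy server. Provide 2 back-end servers with lamp, wordpress, and phpmyadmin installed. For requests for static resources, keep the cache directory on the reverse proxy server.
(1) Manually upgrade pma on all nodes to the new version
(2) Write a script that carries out the process above
(3)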

Tengine
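Features:
Inherits all features of Nginx-1.8.1 and is compatible with Nginx configuration;
Dynamic module loading (DSO) support: adding a module no longer requires recompiling the whole of Tengine;
Supports the HTTP/2 protocol; the HTTP/2 module replaces the SPDY module;
Streams uploads to HTTP back-end servers or FastCGI servers, greatly reducing I/O pressure on the machine;
More powerful load balancing, including a consistent hashing module and a session persistence module; it can also run active health checks on back-end servers, bring servers online or offline automatically according to their state, and dynamically resolve domain names that appear in upstream;
Input filter mechanism support, which makes web application firewalls easier to write;
Supports setting the number of retries when a proxy, memcached, fastcgi, scgi, or uwsgi back end fails;
Support for the Lua dynamic scripting language, which makes extending functionality efficient and simple;
Supports collecting Tengine runtime statistics by specified keys (domain name, url, and so on);
Combines requests for multiple CSS and JavaScript files into a single request;
Automatically strips whitespace and comments to reduce page size;
Automatically sets the number of processes and binds CPU affinity according to the number of CPUs;
Monitors system load and resource usage in order to protect the system;
Shows error messages that are friendlier to operations staff, making it easier to locate the failing machine;
A more powerful anti-attack (request rate limiting) module;
More convenient command-line options, such as listing the compiled-in modules and the supported directives;
Can set expiry times according to the type of file requested;
...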

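Consistent hashing: hash the request URI to a number and take it modulo 2^32 to get a remainder a (necessarily less than 2^32). Hash the name of every cache server in the same way and take each result modulo 2^32; these remainders are scattered around a closed ring (0 to 2^32 - 1). Starting from a, walk clockwise to the nearest server remainder; that is the cache server to use.
The drawback of consistent hashing is skew: the cache servers may cluster too closely together on the ring, so the load is distributed unevenly.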
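The ring lookup described above can be sketched in Python. This is a minimal illustration, not Tengine's module: MD5 is an assumed choice of hash function, and the class and function names are hypothetical. The virtual_nodes parameter is the usual remedy for skew, placing each server on the ring several times so that its points spread out more evenly.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32


def ring_position(value):
    """Hash a string and reduce it modulo 2^32 to get a point on the ring."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class ConsistentHashRing:
    def __init__(self, servers, virtual_nodes=1):
        points = []
        for server in servers:
            for i in range(virtual_nodes):
                # Each virtual node gets its own point on the ring.
                points.append((ring_position(f"{server}#{i}"), server))
        points.sort()
        self._positions = [position for position, _ in points]
        self._servers = [server for _, server in points]

    def lookup(self, request_uri):
        """Walk clockwise from the URI's point to the nearest server point."""
        a = ring_position(request_uri)
        index = bisect.bisect_left(self._positions, a)
        if index == len(self._positions):
            index = 0                      # past the last point: wrap around
        return self._servers[index]
```

The useful property is that removing one cache server only remaps the URIs that were stored on it; every other URI keeps its server, which a plain "hash modulo number of servers" scheme cannot offer.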
