Web Caching with Varnish (Day 36)

March 01, 2019

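Web caching

Programs exhibit locality:
Temporal locality: data accessed recently is likely to be accessed again soon
Spatial locality: data near recently accessed data is likely to be accessed soon

Locality is what makes caching effective.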

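A cache is generally a key-value store:
key: the access path or URL, usually hashed
value: the web content

Caches usually hold hot data, which gives the best return for the resources spent.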

Cache hit ratio: hit / (hit + miss)
Document hit ratio: measured by the number of documents;
Byte hit ratio: measured by the size of the content;
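
The two ratios can differ a lot on the same traffic. The sketch below is only an illustration of the formulas above; the class name CacheStats and the sample sizes are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Hypothetical counter object; not part of varnish."""
    hits: int = 0
    misses: int = 0
    hit_bytes: int = 0
    miss_bytes: int = 0

    def record(self, hit: bool, size: int) -> None:
        # Count the request once by document and once by size.
        if hit:
            self.hits += 1
            self.hit_bytes += size
        else:
            self.misses += 1
            self.miss_bytes += size

    def document_hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def byte_hit_ratio(self) -> float:
        total = self.hit_bytes + self.miss_bytes
        return self.hit_bytes / total if total else 0.0


# Three small objects served from cache, one large object fetched from the origin.
stats = CacheStats()
for _ in range(3):
    stats.record(hit=True, size=100)
stats.record(hit=False, size=700)
```

Here three of four documents are hits, but only 300 of 1000 bytes came from the cache, so a cache that looks good by document count can still save little bandwidth.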

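Notes:
Cached objects have a lifetime, and expired ones are purged periodically;
When cache space runs out: LRU (evict the least recently used objects to free space);
Content is either cacheable or non-cacheable (for example, user-private data).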
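
A minimal sketch of LRU eviction, assuming a fixed capacity counted in objects rather than bytes. The class LRUCache is hypothetical and only illustrates the idea; it is not how varnish implements eviction.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache: the least recently used key is evicted first."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        # A hit makes the entry the most recently used one.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # Drop the entry at the front, i.e. the least recently used.
            self._items.popitem(last=False)


cache = LRUCache(capacity=2)
cache.put("/a.html", "A")
cache.put("/b.css", "B")
cache.get("/a.html")       # /a.html is now more recent than /b.css
cache.put("/c.js", "C")    # capacity exceeded, so /b.css is evicted
```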

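Steps in cache processing:
Receive the request --> parse the request (extract the URL and the headers) --> look up the cache --> check freshness --> build the response --> send the response --> write the log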

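Cache freshness is checked with the following mechanisms:
1: Expiration date:
HTTP/1.0 Expires
Expires: Thu, 04 Jun 2015 23:38:18 GMT
HTTP/1.1 Cache-Control: max-age, the maximum lifetime of the cached page in seconds (600 here)
Cache-Control: max-age=600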
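
The expiration check can be sketched as follows. This is a simplified illustration, assuming the cache remembers when it received the response (response_time); a real cache also takes the Date and Age headers into account. The function names are made up for the example.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, response_time):
    """Seconds the response may be served without revalidation."""
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    if match:
        # max-age (HTTP/1.1) takes precedence over Expires (HTTP/1.0).
        return int(match.group(1))
    if "Expires" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        return max(0.0, (expires - response_time).total_seconds())
    return 0.0  # no explicit lifetime: treat as already stale in this sketch


def is_fresh(headers, response_time, now):
    age = (now - response_time).total_seconds()
    return age < freshness_lifetime(headers, response_time)


# The response was received ten minutes before the Expires time shown above.
received = datetime(2015, 6, 4, 23, 28, 18, tzinfo=timezone.utc)
by_expires = {"Expires": "Thu, 04 Jun 2015 23:38:18 GMT"}
by_max_age = {"Cache-Control": "max-age=600"}
```

Both header sets describe the same 600-second lifetime; when both headers are present, the sketch follows the HTTP/1.1 rule and uses max-age.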

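2: Revalidating the cached copy against the origin server: revalidate
If the original content has not changed, the server responds with headers only (no body) and status 304 (Not Modified);
If the original content has changed, the server responds normally with status 200;
If the original content is gone, the server responds 404, and the cache object should be deleted from the cache as well;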

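3: Conditional request headers:
If-Modified-Since: validates using the timestamp of the requested content;
If-Unmodified-Since: whether the content is unmodified since the given time;
If-Match: whether the ETag matches; each page gets an ETag, and a changed ETag means the content was modified;
If-None-Match: whether no ETag matches;
ETag: "faiy89345"
The most commonly used conditional request headers are If-Modified-Since and If-None-Match.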
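
How an origin server might answer a conditional request can be sketched like this. It is a simplified illustration: the resource is a plain dict invented for the example, ETags are compared as exact strings (weak validators are ignored), and only If-None-Match and If-Modified-Since are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def revalidate(resource, request_headers):
    """Return (status, body) for a conditional GET."""
    if resource is None:
        return 404, b""  # content is gone; the cache should drop its copy

    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        tags = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in tags or resource["etag"] in tags:
            return 304, b""  # unchanged: headers only, no body
        return 200, resource["body"]

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        if resource["last_modified"] <= parsedate_to_datetime(if_modified_since):
            return 304, b""

    return 200, resource["body"]


page = {
    "etag": '"faiy89345"',
    "last_modified": datetime(2019, 3, 1, tzinfo=timezone.utc),
    "body": b"<html>hello</html>",
}
```

A cache that receives 304 keeps serving its stored body; on 200 it replaces the stored object, and on 404 it removes it.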

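Age header: when a proxy answers a request from its own cache, this header states how much time has passed since the entity was generated.
Cache-Control = "Cache-Control" ":" 1#cache-directive
cache-directive = cache-request-directive
| cache-response-directive
cache-request-directive = (request directives)
"no-cache" do not answer from the cache without first checking with the origin server
| "no-store" do not store any part of this request or its response
| "max-age" "=" delta-seconds only accept objects whose Age is less than max-age and that have not expired
| "max-stale" [ "=" delta-seconds ] tells the cache that a stale object is acceptable, as long as it has been stale for less than max-stale seconds
| "min-fresh" "=" delta-seconds
| "no-transform"
| "only-if-cached"
| cache-extension
cache-response-directive = (response directives)
"public" may be served to any user
| "private" [ "=" <"> 1#field-name <"> ] may only be used to answer the user who originally requested the content
| "no-cache" [ "=" <"> 1#field-name <"> ] the response may be cached, but it must be revalidated with the server before it is returned to a client
| "no-store" tells the client not to cache
| "no-transform"
| "must-revalidate"
| "proxy-revalidate"
| "max-age" "=" delta-seconds
| "s-maxage" "=" delta-seconds
| cache-extension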
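
To make the response directives concrete, here is a small sketch of how a shared cache could read a Cache-Control value. It is a simplification: it splits on commas, so quoted field-name lists such as private="a, b" are not handled, and both function names are invented for the example.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def shared_cache_ttl(value):
    """TTL in seconds for a shared cache, 0 if it must not serve from cache
    without asking the origin, or None if no explicit lifetime is given."""
    directives = parse_cache_control(value)
    if "no-store" in directives or "private" in directives or "no-cache" in directives:
        return 0
    # s-maxage applies to shared caches and overrides max-age there.
    for name in ("s-maxage", "max-age"):
        arg = directives.get(name)
        if arg is not None and arg.isdigit():
            return int(arg)
    return None


example = parse_cache_control("public, max-age=600, s-maxage=3600")
```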

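Common open-source caching solutions:
varnish, squid (varnish is to squid roughly what nginx is to apache)
nginx and apache also have caching features

Both varnish and squid can act as reverse proxies, but caching is their main job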

Site: https://www.varnish-cache.org

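varnish is configured through a DSL, VCL (Varnish Configuration Language), which has a C-like style.

varnish architecture:

[Figure: varnish architecture]
The varnish management process: compiles VCL and applies the new configuration; monitors varnish; initializes varnish; provides the CLI interface;
The varnish Child/cache process provides the following:
Acceptor: accepts new connection requests;
worker threads: handle user requests;
Expiry: removes expired objects from the cache;

varnish logging: Shared Memory Log. The shared memory log is usually about 90MB by default and is split into two parts: the first holds counters, the second holds request-related data;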

vcl: Varnish Configuration Language
The interface for configuring cache policy;
A simple, domain-specific programming language;

Installing varnish:
yum install -y varnish

Memory allocation and reclamation:
malloc() and free() do not perform well enough, so varnish uses jemalloc
yum info jemalloc

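How varnish stores cache objects:
file: a single file; no persistence, so the cache is lost on restart
malloc: memory (4K, 4M); cache objects are kept in memory
persistent: file-based persistent storage;
For hardware, go straight to PCI-e solid-state drives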

varnish listens on ports 6081 and 6082: 6081 serves traffic (it can be changed to 80), and 6082 is the management interface

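Three ways to configure varnish:
1. Command-line options of the varnishd program;
The listening socket, the storage type, and so on; plus extra configuration parameters;
-p param=value : set a parameter to a value
-r param,param,... : set the list of read-only parameters;

/etc/varnish/varnish.params

2. Parameters given with -p, that is, run-time parameters, which configure the Child/Cache process:
They can also be changed while the program is running through its CLI, after logging in with varnishadm

3. vcl: configures the caching behavior of the cache system;
Configured through the vcl file /etc/varnish/default.vcl
It is compiled first and then applied, which requires a C compiler;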

varnish command-line tools:
varnishadm -S /etc/varnish/secret -T 127.0.0.1:6082    (the management command, varnishadm)

Log: there are two logging tools
varnishlog
varnishncsa

Statistics: view varnish status
varnishstat

Top: ranked log entries
varnishtop

vcl:
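vcl has several state engines internally, and they are related to some extent; when an engine can be followed by more than one downstream engine, it must use return to name the downstream engine to move to;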
vcl_recv
vcl_hash
vcl_hit
vcl_miss
vcl_fetch
vcl_deliver
vcl_pipe
vcl_pass
vcl_error

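vcl language syntax:
(1) //, #, /* */ are comments and are ignored by the compiler;
(2) sub $name: defines a subroutine;
sub vcl_recv {

}
(3) Loops are not supported;
(4) There are many built-in variables; where a variable can be used depends closely on the state engine;
(5) Terminating statements are supported: return(action) transfers control to action and carries no return value;
(6) Domain-specific;
(7) Operators: =, ==, ~, !, &&, ||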

Conditional statements:
if (CONDITION) {

} else {

}

Variable assignment:
set name = value;    sets a variable
unset name;    removes a variable

req.http.HEADER: refers to the given HEADER of the request message; for example:
req.http.X-Forwarded-For
req.http.Authorization
req.http.cookie

req.request: the request method, i.e. GET, POST, HEAD, PUT, DELETE and so on (in Varnish 4 this is req.method)

client.ip: the client IP;

How the vcl_recv engine works: [figure]

How vcl_hash works: [figure]

How the vcl_fetch engine works: [figure]

state engine workflow (v3) covers the following cases:
vcl_recv --> vcl_hash --> vcl_hit --> vcl_deliver
vcl_recv --> vcl_hash --> vcl_miss --> vcl_fetch --> vcl_deliver
vcl_recv --> vcl_pass --> vcl_fetch --> vcl_deliver
vcl_recv --> vcl_pipe
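
The four v3 paths above depend on two things: the action returned by vcl_recv and whether the lookup finds the object. The following sketch only restates that mapping as a table in code; the function engine_path is hypothetical and does not model what each engine does internally.

```python
def engine_path(recv_action, cache_hit=False):
    """Return the ordered list of v3 state engines a request passes through."""
    if recv_action == "pipe":
        # Bytes are shuffled between client and backend; no caching logic runs.
        return ["vcl_recv", "vcl_pipe"]
    if recv_action == "pass":
        # The cache is bypassed, but the response still goes through fetch/deliver.
        return ["vcl_recv", "vcl_pass", "vcl_fetch", "vcl_deliver"]
    if recv_action == "lookup":
        if cache_hit:
            return ["vcl_recv", "vcl_hash", "vcl_hit", "vcl_deliver"]
        return ["vcl_recv", "vcl_hash", "vcl_miss", "vcl_fetch", "vcl_deliver"]
    raise ValueError(f"unknown action: {recv_action}")


hit_path = engine_path("lookup", cache_hit=True)
miss_path = engine_path("lookup", cache_hit=False)
```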

state engine workflow(v4)
vcl_recv
vcl_pass
vcl_pipe
vcl_hash
vcl_hit
vcl_miss

vcl_backend_fetch
vcl_backend_response
vcl_backend_error

vcl_purge
vcl_synth

sub vcl_recv {
    if (req.method == "PRI") {
        /* We do not support SPDY or HTTP/2.0 */
        return (synth(405));
    }

    if (req.method != "GET" &&
        req.method != "HEAD" &&
        req.method != "PUT" &&
        req.method != "POST" &&
        req.method != "TRACE" &&
        req.method != "OPTIONS" &&
        req.method != "DELETE") {
        /* Non-RFC2616 or CONNECT which is weird. */
        return (pipe);
    }

    if (req.method != "GET" && req.method != "HEAD") {
        /* We only deal with GET and HEAD by default */
        return (pass);
    }
    if (req.http.Authorization || req.http.Cookie) {
        /* Not cacheable by default */
        return (pass);
    }
    return (hash);
}
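
The decision order in this vcl_recv can be traced with a small stand-in function. This is only a sketch that mirrors the VCL above for experimenting with inputs; recv_action and KNOWN_METHODS are names made up for the example, and the returned strings are just labels for the VCL actions.

```python
KNOWN_METHODS = {"GET", "HEAD", "PUT", "POST", "TRACE", "OPTIONS", "DELETE"}


def recv_action(method, headers):
    """Return the action the vcl_recv shown above would take."""
    header_names = {name.lower() for name in headers}  # header names are case-insensitive
    if method == "PRI":
        return "synth(405)"
    if method not in KNOWN_METHODS:
        return "pipe"   # unknown method or CONNECT
    if method not in ("GET", "HEAD"):
        return "pass"   # only GET and HEAD are looked up in the cache
    if "authorization" in header_names or "cookie" in header_names:
        return "pass"   # per-user content is not cacheable by default
    return "hash"


plain_get = recv_action("GET", {"Host": "www.magedu.com"})
logged_in_get = recv_action("GET", {"Cookie": "session=abc"})
```

The cookie check explains why a site that sets cookies on every page gets a poor hit ratio until the VCL strips them for static resources.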

Placed inside sub vcl_deliver in the .vcl file, this adds a custom X-Cache header to the response sent to the client;
if (obj.hits > 0) {
    set resp.http.X-Cache = "HIT";
} else {
    set resp.http.X-Cache = "MISS";
}

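Built-in variables in varnish:
Variable families:
client    the client
server    varnish itself
req    the client request
resp    the response sent to the client
bereq    the request varnish sends to the backend
beresp    the response from the backend
obj    the cached object
storage    the storage backend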

bereq
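bereq.http.HEADER: the given header of the request varnish sends to the backend server;
bereq.request: the request method (bereq.method in Varnish 4);
bereq.url: the request URL;
bereq.proto: the protocol version;
bereq.backend: the backend host to use;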

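beresp (backend server response)
beresp.proto
beresp.status: the status code of the backend server's response
beresp.reason: the reason phrase;
beresp.backend.ip
beresp.backend.name
beresp.http.HEADER: the given header of the backend server's response;
beresp.ttl: the remaining time to live of the content returned by the backend server;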

obj
obj.ttl: the object's ttl;
obj.hits: the number of times this object has been hit in the cache;

server
server.ip
server.hostname

Official documentation: https://www.varnish-cache.org/docs/

Making varnish support virtual hosts:
if (req.http.host == "www.magedu.com") {

}

Force requests for certain resources to skip the cache:
/admin
/login

if (req.url ~ "(?i)^/login" || req.url ~ "(?i)^/admin") {
    return (pass);
} // requests for resources under /admin and /login are not cached

For specific resource types, strip their private cookie and force how long varnish may cache them:
sub vcl_backend_response {
    if (beresp.http.cache-control !~ "s-maxage") {
        if (bereq.url ~ "(?i)\.jpg$") { // no shared-cache s-maxage is set, and the resource is a jpg
            set beresp.ttl = 3600s; // set the shared cache lifetime
            unset beresp.http.Set-Cookie;
        }
        if (bereq.url ~ "(?i)\.css$") {
            set beresp.ttl = 600s;
            unset beresp.http.Set-Cookie;
        }
    }
}
Official configuration examples: https://www.varnish-cache.org/trac/wiki/VCLExamples

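Defining a backend server:
backend name {
    .attribute = "value";
}

.host: IP of the BE host (backend host);
.port: port the BE host listens on;

.probe: health check for the BE;
.max_connections: maximum number of concurrent connections;

Health checks for backend hosts:
probe name {
    .attribute = "value";
}

.url: the url to request when deciding whether the BE is healthy;
.expected_response: the expected response status code; defaults to 200;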

Example 1:
backend websrv1 {
    .host = "172.16.100.68";
    .port = "80";
    .probe = {
        .url = "/test1.html";
    }
}

backend websrv2 {
    .host = "172.16.100.69";
    .port = "80";
    .probe = {
        .url = "/test1.html";
    }
}

sub vcl_recv {
    if (req.url ~ "(?i)\.(jpg|png|gif)$") {
        set req.backend_hint = websrv1;
    } else {
        set req.backend_hint = websrv2;
    }
}

Example 2: load balancing
import directors;

sub vcl_init {
    new mycluster = directors.round_robin();
    mycluster.add_backend(websrv1);
    mycluster.add_backend(websrv2);
}

sub vcl_recv {
    set req.backend_hint = mycluster.backend();
}

Load-balancing algorithms:
fallback, random, round_robin, hash
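
The difference between round_robin and fallback can be sketched outside varnish. The code below is an illustration of the two selection ideas only, assuming a known set of healthy backends; the class RoundRobin and the function fallback are invented for the example and are not the directors vmod's implementation.

```python
class RoundRobin:
    """Hand out backends in turn, skipping any that are not healthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = 0

    def backend(self, healthy):
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if candidate in healthy:
                return candidate
        return None  # no healthy backend at all


def fallback(backends, healthy):
    """Always pick the first healthy backend in the configured order."""
    for candidate in backends:
        if candidate in healthy:
            return candidate
    return None


servers = ["websrv1", "websrv2"]
rr = RoundRobin(servers)
all_up = {"websrv1", "websrv2"}
picks = [rr.backend(all_up) for _ in range(4)]
```

With both servers up, round robin alternates between them, while fallback would keep sending every request to websrv1 and only use websrv2 when websrv1 fails its health check.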

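Be fluent with: varnishlog, varnishncsa, varnishtop, varnishstat

Homework:

vcl: backend definitions, using backends, defining health checks, implementing load balancing, dispatching by resource type, and controlling which content does or does not go through the cache;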
