Generally, a query result looks something like this (converted to JSON here):
```json
{
  "took": 993,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 10000, "relation": "gte" },
    "max_score": 1.0,
    "hits": [
      { "_index": "bank", "_type": "account", "_id": "1", "_score": 1.0, "_source": { ...
```
The response carries several metadata fields: `took` (query time in milliseconds), `timed_out`, `_shards` (a summary of shard execution), and `hits` (the matching documents along with the total hit count in `hits.total`).
The official sample dataset can be downloaded from GitHub. The accounts.json file looks like this:
```json
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
{"index":{"_id":"6"}}
{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
{"index":{"_id":"13"}}
{"account_number":13,"balance":32838,"firstname":"Nanette","lastname":"Bates","age":28,"gender":"F","address":"789 Madison Street","employer":"Quility","email":"nanettebates@quility.com","city":"Nogal","state":"VA"}
{"index":{"_id":"18"}}
{"account_number":18,"balance":4180,"firstname":"Dale","lastname":"Adams","age":33,"gender":"M","address":"467 Hutchinson Court","employer":"Boink","email":"daleadams@boink.com","city":"Orick","state":"MD"}
...
```
However, that dataset contains fewer than 10,000 documents, so it cannot exercise the over-limit case. We can write a quick script to inflate accounts.json (it is rough, please bear with it): it simply derives new ids from the existing documents to create extra data, and writes the result to test.json. Any approach works here, as long as the volume is large enough.
```ruby
require 'json'

dat = File.open("accounts.json").readlines
myFile = File.new("test.json", "a+")

# Clone the dataset 30 times, deriving a new id by multiplying the original
# id by the iteration number. Note the products collide (1 * 2 == 2 * 1),
# so many of these writes overwrite each other on import -- good enough here.
(1..30).each do |i|
  data = Array.new(dat)
  data.each do |line|
    line = JSON.parse(line)
    if line.key? "index"
      line["index"]["_id"] = (line["index"]["_id"].to_i * i).to_s
    end
    if line.key? "account_number"
      line["account_number"] *= i
    end
    myFile.puts line.to_json
  end
end
myFile.close
```
Then bulk-import the generated data into Elasticsearch with the following command:
```shell
[looking@master ruby_learning]$ curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' -H 'Content-Type: application/json' --data-binary "@test.json"
```
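Before relying on the import, it is worth checking the `errors` flag in the `_bulk` response, since the bulk API reports per-item failures rather than failing the whole request. A minimal sketch over a trimmed-down, hypothetical response body (the real response was not shown above):

```ruby
require 'json'

# A trimmed-down _bulk response; the values here are hypothetical.
# A real response lists one item per action in the request.
bulk_response = JSON.parse(<<~JSON)
  { "took": 120, "errors": false,
    "items": [ { "index": { "_id": "1", "status": 201 } },
               { "index": { "_id": "6", "status": 201 } } ] }
JSON

# Any item with a status >= 300 failed to index.
failed = bulk_response['items'].select { |item| item.dig('index', 'status') >= 300 }
if bulk_response['errors']
  puts "#{failed.size} item(s) failed"
else
  puts "all #{bulk_response['items'].size} item(s) indexed"
end
```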
As the index stats below show, the bank index now holds more than 10,000 documents, so the over-limit test should work fine:
```shell
[looking@master ruby_learning]$ curl localhost:9200/_cat/indices?v
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   megacorp Ed5gg9hoRM24dE3AAD1DEQ   1   1          3            0     11.5kb         11.5kb
yellow open   website  svdjlzy2TfCQFLHcKWhjRw   1   1          2            0      8.9kb          8.9kb
yellow open   bank     cSxDPBsyRVyjsHGe9b2RZA   1   1      13022        16978      5.6mb          5.6mb
yellow open   blogs    qfnun_91RI2O1lgTjnBmCQ   3   1          0            0       849b           849b
```
Start with a simple query script:
```ruby
require 'elasticsearch'
require 'json'

host = '127.0.0.1'
port = 9200
client = Elasticsearch::Client.new url: "http://#{host}:#{port}"

size = 10
query = {
  query: { match_all: {} },
  size: size
}
result = client.search index: 'bank', body: query
puts JSON.pretty_generate(result)
```
It prints 10 documents, nicely formatted:
```json
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 10000, "relation": "gte" },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "bank",
        "_type": "account",
        "_id": "1",
        "_score": 1.0,
        "_source": { ... }
      },
      ...
      {
        "_index": "bank",
        "_type": "account",
        "_id": "347",
        "_score": 1.0,
        "_source": {
          "account_number": 347,
          "balance": 36038,
          "firstname": "Gould",
          "lastname": "Carson",
          "age": 24,
          "gender": "F",
          "address": "784 Pulaski Street",
          "employer": "Mobildata",
          "email": "gouldcarson@mobildata.com",
          "city": "Goochland",
          "state": "MI"
        }
      }
    ]
  }
}
```
Changing size = 10000 still appears to work. But with size = 10001, something goes wrong:
```shell
[looking@master ruby_learning]$ ruby test2.rb
Traceback (most recent call last):
        5: from test2.rb:14:in `<main>'
        4: from /usr/local/ruby-2.7.1/lib/ruby/gems/2.7.0/gems/elasticsearch-api-7.9.0/lib/elasticsearch/api/actions/search.rb:103:in `search'
        3: from /usr/local/ruby-2.7.1/lib/ruby/gems/2.7.0/gems/elasticsearch-transport-7.9.0/lib/elasticsearch/transport/client.rb:176:in `perform_request'
        2: from /usr/local/ruby-2.7.1/lib/ruby/gems/2.7.0/gems/elasticsearch-transport-7.9.0/lib/elasticsearch/transport/transport/http/faraday.rb:37:in `perform_request'
        1: from /usr/local/ruby-2.7.1/lib/ruby/gems/2.7.0/gems/elasticsearch-transport-7.9.0/lib/elasticsearch/transport/transport/base.rb:347:in `perform_request'
/usr/local/ruby-2.7.1/lib/ruby/gems/2.7.0/gems/elasticsearch-transport-7.9.0/lib/elasticsearch/transport/transport/base.rb:218:in `__raise_transport_error': [400] {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [10001]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"bank","node":"hmeiFSEDRZK4hY0jQ1eV7Q","reason":{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [10001]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}}],"caused_by":{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [10001]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}},"status":400} (Elasticsearch::Transport::Transport::Errors::BadRequest)
```
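The error message itself recommends the scroll API as the more efficient way to pull large result sets. The core of that approach is a loop that drains one page at a time; here is a minimal sketch, with the pagination logic factored out so it is independent of the client (the commented-out wiring against the local cluster used above is an untested assumption):

```ruby
# Drains every page from a scroll-style search. fetch_first returns the
# initial response; fetch_next takes a scroll id and returns the next page.
# Each response is a hash shaped like Elasticsearch's JSON: an empty
# hits.hits array signals that the scroll is exhausted.
def collect_all_hits(fetch_first, fetch_next)
  docs = []
  page = fetch_first.call
  while page['hits']['hits'].any?
    docs.concat(page['hits']['hits'])
    page = fetch_next.call(page['_scroll_id'])
  end
  docs
end

# Wired to a real cluster it would look roughly like this:
# client = Elasticsearch::Client.new url: 'http://127.0.0.1:9200'
# docs = collect_all_hits(
#   -> { client.search index: 'bank', scroll: '1m',
#        body: { size: 1000, query: { match_all: {} } } },
#   ->(sid) { client.scroll scroll_id: sid, scroll: '1m' }
# )
```

Because each batch is at most `size` documents, no single request ever trips the result-window limit, regardless of how many documents the index holds.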
Here, instead, we change the settings of the bank index and raise the value of max_result_window:
```shell
[looking@master ruby_learning]$ curl -XPUT "localhost:9200/bank/_settings?pretty" -H "Content-Type: application/json" -d '
> {
>   "index" : { "max_result_window" : 100000000 }
> }
> '
{
  "acknowledged" : true
}
```
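The same settings change can also be issued from the Ruby client rather than curl. A sketch, assuming the same local cluster (the request itself is left commented out since it needs a running node):

```ruby
require 'json'

# The settings body mirrors the curl request above.
settings = { index: { max_result_window: 100_000_000 } }

# Against a running cluster:
# client = Elasticsearch::Client.new url: 'http://127.0.0.1:9200'
# client.indices.put_settings index: 'bank', body: settings

puts settings.to_json
```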
Running the query script again no longer raises an error, but the reported total is still only 10,000:
```json
{
  "took": 993,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 10000, "relation": "gte" },
    "max_score": 1.0,
    "hits": [
      { "_index": "bank", "_type": "account", "_id": "1", "_score": 1.0, "_source": { ...
```
That is because by default Elasticsearch stops counting total hits at 10,000 (hence the `gte` relation above). Add `track_total_hits: true` to the query body to get an exact total:
```ruby
require 'elasticsearch'
require 'json'

host = '127.0.0.1'
port = 9200
client = Elasticsearch::Client.new url: "http://#{host}:#{port}"

size = 10001
query = {
  track_total_hits: true,
  query: { match_all: {} },
  size: size
}
result = client.search index: 'bank', body: query
puts JSON.pretty_generate(result)
```
This time, the document count reported by the index stats:
```shell
[looking@master ruby_learning]$ curl -X GET "localhost:9200/_cat/indices?v"
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank     cSxDPBsyRVyjsHGe9b2RZA   1   1      13022        16978      5.6mb          5.6mb
yellow open   website  svdjlzy2TfCQFLHcKWhjRw   1   1          2            0      8.9kb          8.9kb
yellow open   megacorp Ed5gg9hoRM24dE3AAD1DEQ   1   1          3            0     11.5kb         11.5kb
yellow open   blogs    qfnun_91RI2O1lgTjnBmCQ   3   1          0            0       849b           849b
```
agrees with the statistics in the query result below:
```json
# result.json
{
  "took": 215,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 13022, "relation": "eq" },
    "max_score": 1.0,
    "hits": [
      { "_index": "bank", "_type": "account", "_id": "1", "_score": 1.0, "_source": { ...
```
Both report 13022. hits.total.value is the total hit count, and because hits.total.relation is eq, it is an exact count rather than a lower bound:
```json
"total": { "value": 13022, "relation": "eq" }
```
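A small sketch of how a consumer might read this field (the fragment is the one from the result above):

```ruby
require 'json'

# hits.total from the response above: relation "eq" means the value is an
# exact count; "gte" would mean it is only a lower bound, because counting
# stopped at the track_total_hits threshold (10,000 by default).
total = JSON.parse('{ "value": 13022, "relation": "eq" }')

exact = total['relation'] == 'eq'
puts "total hits: #{total['value']} (#{exact ? 'exact count' : 'lower bound'})"
```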
The number of documents actually returned is still the number set by size (whenever size < total):
```ruby
# test.rb
require 'json'

aa = JSON.load(File.open('result.json'))
puts aa['hits']['hits'].size
```

```shell
[looking@master ruby_learning]$ ruby test.rb
10001
```
If you set size larger than the total document count, the entire result set comes back (for example, with size = 20000):
```ruby
# test.rb
require 'json'

aa = JSON.load(File.open('result.json'))
puts aa['hits']['hits'].size
```

```shell
[looking@master ruby_learning]$ ruby test.rb
13022
```