TL;DR - given 1 million documents, each holding one reading from one of 20 sensors, count the readings and compute the average reading per sensor. The aggregation framework does it in 4.1 seconds, map/reduce in 67.2 seconds, map/reduce with jsMode=true in 45.5 seconds, and reading all the documents into PHP and doing it there takes 3.3 seconds. Conclusion - MongoDB's map/reduce has horrible performance, for reasons that aren't obvious.
Load the initial data with
```shell
time mongo load_initial.js
```
which on my laptop took 1m32.271s.
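A quick note on the generator the load script uses: each `sensor_id` comes from `Math.floor(Math.random() * 20 + 1)`, and since `Math.random()` returns a float in [0, 1), this yields integers 1 through 20. A plain-JavaScript check of that range, runnable under Node with no server (the sample size is arbitrary):

```javascript
// Sample the sensor_id generator from load_initial.js many times
// and record the smallest and largest values seen.
var min = Infinity, max = -Infinity;
for (var i = 0; i < 100000; i++) {
    var sensor_id = Math.floor(Math.random() * 20 + 1);
    if (sensor_id < min) min = sensor_id;
    if (sensor_id > max) max = sensor_id;
}
console.log(min, max); // prints the observed range, always within 1..20
```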
```javascript
// load_initial.js
//
// Create a collection of documents where each document is
// { "sensor_id": 4, "ts": 123456.78, "reading": 2.3 }
for(var i = 0; i < 1000000; i++) {
    sensor_id = Math.floor(Math.random() * 20 + 1); // Random int 1-20.
    ts = Math.random() * 3000000;                   // Random float from 0-2999999.9999
    reading = Math.random() * 100;                  // Random float 0.0-99.9999
    obj = {"sensor_id": sensor_id, "ts": ts, "reading": reading};
    db.sensors.save(obj);
}
```
Run
```shell
mongo --eval 'db.sensors.count()'
```
as a sanity check, which on my laptop printed
```
MongoDB shell version: 2.2.0
connecting to: localhost:27017/test
1000000
```
For each sensor, let's count the number of readings and find the average reading. First, let's do it using the aggregation framework:
```javascript
// count_avg_by_sensor_aggregate.js
//
// For each sensor, find the number of readings, and their average value.
res = db.sensors.aggregate(
    { $group: {
        _id: "$sensor_id",
        "readings": {$sum: 1},
        "average": {$avg: "$reading"}
    } }
).result.sort(
    function(a, b) { return a._id - b._id; }
);
printjson(res);
```
Run it with
```shell
time mongo count_avg_by_sensor_aggregate.js
```
I omitted the output, but this took 0m4.111s. Now let's do this with a map/reduce:
```javascript
// count_avg_by_sensor_mr.js
//
// For each sensor, find the number of readings, and their average value.
if(typeof jsMode === "undefined") { jsMode = false; }
res = db.sensors.mapReduce(
    function() {
        emit(this.sensor_id, { "readings": 1, "total": this.reading });
    },
    function(key, values) {
        var result = { "readings": 0, "total": 0 };
        for(var i = 0; i < values.length; i++) {
            result.readings += values[i].readings;
            result.total += values[i].total;
        }
        return result;
    },
    {
        "out": { "inline": 1 },
        "finalize": function(key, value) {
            return { "readings": value.readings, "average": value.total / value.readings };
        },
        "jsMode": jsMode
    }
).results;
newres = [];
for(var i = 0; i < res.length; i++) {
    newres.push({
        // When jsMode is true, _id becomes a string for some reason.
        "_id": parseInt(res[i]._id),
        "readings": res[i].value.readings,
        "average": res[i].value.average
    });
}
newres = newres.sort(
    function(a, b) { return a._id - b._id; }
);
printjson(newres);
```
Run it with
```shell
time mongo count_avg_by_sensor_mr.js
```
This took 1m7.171s. Now do the same thing, but with `jsMode=true`:
```shell
time mongo --eval jsMode=true count_avg_by_sensor_mr.js
```
This took 0m45.459s. And finally, let's do the same calculation in PHP:
```php
<?php
// count_avg_by_sensor.php
//
// For each sensor, find the number of readings, and their average value.
$mongo = new Mongo('mongodb://localhost:27017', array('db' => 'test'));
$db = $mongo->selectDB('test');
$sensors = $db->selectCollection('sensors');
$cursor = $sensors->find();
$result = array();
foreach($cursor as $doc) {
    $_id = (int)$doc['sensor_id'];
    if(!array_key_exists($_id, $result)) {
        $result[$_id] = array("readings" => 0, "total" => 0);
    }
    $result[$_id]["readings"] += 1;
    $result[$_id]["total"] += $doc['reading'];
}
ksort($result);
$newresult = array();
foreach($result as $_id => $value) {
    $newresult[] = array(
        "_id" => $_id,
        "readings" => $value["readings"],
        "average" => $value["total"] / $value["readings"]
    );
}
echo json_encode($newresult, JSON_PRETTY_PRINT);
```
Run it:
```shell
time php count_avg_by_sensor.php
```
This took 0m3.313s.
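One closing note on the map/reduce version: MongoDB is free to call the reduce function repeatedly, feeding previously reduced values back in as inputs, so reduce must accept its own output. The `{readings, total}` intermediate shape in count_avg_by_sensor_mr.js exists precisely for this (averaging in reduce directly would break). A plain-JavaScript check of that reduce logic, runnable under Node with no server; the sample values are made up:

```javascript
// Same reduce logic as in count_avg_by_sensor_mr.js.
function reduce(key, values) {
    var result = { "readings": 0, "total": 0 };
    for (var i = 0; i < values.length; i++) {
        result.readings += values[i].readings;
        result.total += values[i].total;
    }
    return result;
}

// Three mapped values for one hypothetical sensor.
var mapped = [
    { "readings": 1, "total": 10 },
    { "readings": 1, "total": 20 },
    { "readings": 1, "total": 30 }
];

// Reducing everything at once...
var allAtOnce = reduce(1, mapped);

// ...must match reducing a partial result, then re-reducing.
var partial = reduce(1, mapped.slice(0, 2));
var reReduced = reduce(1, [partial, mapped[2]]);

console.log(allAtOnce.readings, allAtOnce.total); // 3 60
console.log(reReduced.readings, reReduced.total); // 3 60
```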