Querying a large dataset in an Oracle database from NodeJS

Problem description:

I'm currently working on a project for work where I have an Oracle 10 database table with about 310K rows, give or take 10-30K.

The goal is to display those rows in an Angular frontend; however, returning all of them through NodeJS takes a lot of time.

Given that I'm using both NodeJS and oracledb for the first time, I assume I'm missing something?

var oracledb = require('oracledb');
var config = require(__dirname+'/../db.js');

function get(req,res,next)
{
var table = req.query.table;
var meta;

oracledb.getConnection(config.oracle)
.then( function(connection)
{
    var stream = connection.queryStream('SELECT * FROM '+table);

    stream.on('error', function (error) 
    {
        console.error(error);
        return next(error);
    });

    stream.on('metadata', function (metadata) {
        console.log(metadata);
    });

    stream.on('data', function (data) {
        console.log(data);
    });

    stream.on('end', function () 
    {
      connection.release(
        function(err) {
          if (err) {
            console.error(err.message);
            return next(err);
          }
        });
    });
})
.catch(function(err){
    // getConnection() rejected, so there is no connection to close here
    // (and `connection` is not in scope in this callback anyway).
    console.error(err.message);
    return next(err);
})
}

module.exports.get = get;

Answer:

30 MB is a lot of data to load into the front end. It can work in some cases, such as desktop web apps where the benefits of "caching" the data offset the time needed to load it (and increased stale data is okay). But it will not work well in other cases, such as mobile.

Keep in mind that the 30 MB must be moved from the DB to Node.js and then from Node.js to the client. The network connections between these will greatly impact performance.
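
As a rough illustration of why both hops matter, the sketch below estimates raw transfer time from payload size and bandwidth. The bandwidth figures are hypothetical, and the estimate ignores latency, protocol overhead, and JSON serialization, so treat it as a lower bound rather than a prediction.

```javascript
// Lower-bound estimate of raw transfer time; ignores latency, protocol
// overhead, and serialization cost.
function transferSeconds(megabytes, megabitsPerSecond) {
  return (megabytes * 8) / megabitsPerSecond; // 1 byte = 8 bits
}

// If Node.js buffers the whole result before responding, the DB -> Node.js
// hop and the Node.js -> client hop happen one after the other, so they add up.
function bufferedTotalSeconds(megabytes, dbToNodeMbps, nodeToClientMbps) {
  return transferSeconds(megabytes, dbToNodeMbps) +
    transferSeconds(megabytes, nodeToClientMbps);
}

// Hypothetical bandwidths: 100 Mbit/s to the database, 10 Mbit/s to a mobile client.
console.log(transferSeconds(30, 100));          // 2.4
console.log(bufferedTotalSeconds(30, 100, 10)); // about 26.4
```

With these hypothetical figures, the slower client hop accounts for most of the total.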

I'll point out a few things that can help performance, though not all are exactly related to this question.

First, if you're using a web server, you should be using a connection pool, not dedicated/one-off connections. Generally, you'd create the connection pool in your index/main/app.js and start the web server after that's done and ready.

Here's an example:

const oracledb = require('oracledb');
const express = require('express');
const config = require('./db-config.js');
const thingController = require('./things-controller.js');

// Node.js uses 4 background threads by default; increase this to handle the max DB pool.
// This must be done before any other calls that will use the libuv threadpool.
process.env.UV_THREADPOOL_SIZE = config.poolMax + 4;

// This setting can be used to reduce the number of round trips between Node.js
// and the database.
oracledb.prefetchRows = 10000;

function initDBConnectionPool() {
  console.log('Initializing database connection pool');

  return oracledb.createPool(config);
}

function initWebServer() {
  console.log('Initializing webserver');

  const app = express();

  const router = express.Router();

  router.route('/things')
    .get(thingController.get);

  app.use('/api', router);

  app.listen(3000, () => {
    console.log('Webserver listening on localhost:3000');
  });
}

initDBConnectionPool()
  .then(() => {
    initWebServer();
  })
  .catch(err => {
    console.log(err);
  });

That will create a pool which is added to the internal pool cache in the driver. This allows you to easily access it from other modules (example later).

Note that when using connection pools, it's generally a good idea to increase the thread pool available to Node.js to allow each connection in the pool to work concurrently. An example of this is included above.
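
The same sizing rule can be wrapped in a small helper. This is a minimal sketch, assuming (as in the example above) that the pool's maximum size is available as a number such as config.poolMax. The helper name configureThreadPool and the choice to keep any value already set in the environment (for example, from the shell) are illustrative assumptions, not part of node-oracledb.

```javascript
// Hypothetical helper: sizes the libuv threadpool from the maximum number of
// pooled connections, plus extra threads for other threadpool work.
// Call it before anything uses the threadpool.
function configureThreadPool(env, poolMax, extraThreads = 4) {
  if (env.UV_THREADPOOL_SIZE === undefined) {
    // Environment variables are strings; assign one explicitly so the helper
    // behaves the same for process.env and for a plain object.
    env.UV_THREADPOOL_SIZE = String(poolMax + extraThreads);
  }
  return Number(env.UV_THREADPOOL_SIZE);
}

console.log(configureThreadPool({}, 10)); // 14
// In app.js: configureThreadPool(process.env, config.poolMax);
```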

In addition, I'm increasing the value of oracledb.prefetchRows. This setting is directly related to your question. Network round trips are used to move the data between the DB and Node.js, and this setting lets you adjust the number of rows fetched with each round trip. So as prefetchRows goes higher, fewer round trips are needed and performance improves. Just be careful not to set it too high for the amount of memory available on your Node.js server.
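
To make the effect concrete, here is the approximate round-trip arithmetic for the roughly 310K rows from the question, assuming each round trip returns a full batch of prefetchRows rows. The helper is hypothetical and ignores the initial execute round trip and any other driver overhead, so real counts may differ slightly.

```javascript
// Approximate number of fetch round trips needed to move totalRows when each
// trip returns up to rowsPerTrip rows. Ignores the execute round trip and
// any other driver overhead.
function fetchRoundTrips(totalRows, rowsPerTrip) {
  if (!Number.isInteger(rowsPerTrip) || rowsPerTrip < 1) {
    throw new RangeError('rowsPerTrip must be a positive integer');
  }
  return Math.ceil(totalRows / rowsPerTrip);
}

const approxRows = 310000; // approximate table size from the question
console.log(fetchRoundTrips(approxRows, 100));   // 3100
console.log(fetchRoundTrips(approxRows, 10000)); // 31
```

Going from 100 to 10,000 rows per trip cuts the number of fetch round trips by a factor of 100. Each trip still carries more data, so the actual time saved depends on how much of the total was spent on per-trip latency rather than raw bandwidth.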

I ran a generic test that mocked the 30 MB dataset size. When oracledb.prefetchRows was left at the default of 100, the test finished in 1 minute 6 seconds. When I bumped this up to 10,000, it finished in 27 seconds.
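
The test itself isn't shown, but a similar measurement could be made with a small harness like the one below. It is a sketch under stated assumptions: mockRows is a hypothetical stand-in for connection.queryStream(), so as written it measures only Node.js stream overhead. To measure the real effect of prefetchRows, you would pass the stream returned by queryStream() to timeStream instead.

```javascript
const { Readable } = require('stream');

// Drains an object-mode stream and reports how many rows arrived and the elapsed time.
function timeStream(stream) {
  return new Promise((resolve, reject) => {
    const start = process.hrtime.bigint();
    let count = 0;
    stream.on('data', () => {
      count += 1;
    });
    stream.on('error', reject);
    stream.on('end', () => {
      const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
      resolve({ rows: count, elapsedMs });
    });
  });
}

// Hypothetical stand-in for connection.queryStream(): yields n fake rows.
function* mockRows(n) {
  for (let i = 0; i < n; i += 1) {
    yield { ID: i, NAME: 'row ' + i };
  }
}

timeStream(Readable.from(mockRows(50000))).then(({ rows, elapsedMs }) => {
  console.log(`${rows} rows in ${elapsedMs.toFixed(1)} ms`);
});
```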

Okay, moving on to "things-controller.js" which is based on your code. I've updated the code to do the following:

  • Assert that table is a valid table name. Your current code is vulnerable to SQL injection.
  • Use a promise chain that emulates a try/catch/finally block to close the connection just once and return the first error encountered (if needed).
  • Make the code work so I could run the test.
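
On a Node.js version with async/await, the same try/catch/finally behavior can be written directly. This is a sketch, not the answer's code: makeMockConnection and its query() method are hypothetical stand-ins so the example runs without a database. With node-oracledb you would obtain the connection from oracledb.getConnection() and wrap queryStream() in a promise, as the controller below does.

```javascript
// Hypothetical stand-in for a pooled connection so the example runs without
// a database. A real connection would come from oracledb.getConnection().
function makeMockConnection({ failQuery = false } = {}) {
  return {
    closed: 0,
    async query() {
      if (failQuery) {
        throw new Error('query failed');
      }
      return [{ ID: 1 }, { ID: 2 }];
    },
    async close() {
      this.closed += 1;
    },
  };
}

// Same behavior as the promise chain: run the work, release the connection
// exactly once if it was obtained, and surface only the first error.
async function getRows(getConnection) {
  let conn;
  try {
    conn = await getConnection();
    return await conn.query(); // await so the query finishes before finally runs
  } finally {
    if (conn) {
      try {
        await conn.close();
      } catch (closeErr) {
        console.log(closeErr); // log only; a release error must not mask the first error
      }
    }
  }
}

const conn = makeMockConnection();
getRows(async () => conn).then((rows) => console.log(rows.length, conn.closed)); // 2 1
```

The "return await" matters: without the await, the finally block would release the connection before the query finished. In an Express handler, the call could look like getRows(...).then(rows => res.status(200).json(rows), next).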

Here's the result:

const oracledb = require('oracledb');

function get(req, res, next) {
    const table = req.query.table;
    const rows = [];
    let conn;
    let err; // Will store the first error encountered

    // You need something like this to prevent SQL injection. The current code
    // is wide open.
    if (!isSimpleSqlName(table)) {
        next(new Error('Not simple SQL name'));
        return;
    }

    // If you don't pass a config, the connection is pulled from the 'default'
    // pool in the cache.
    oracledb.getConnection() 
        .then(c => {
            return new Promise((resolve, reject) => {
                conn = c;

                const stream = conn.queryStream('SELECT * FROM ' + table);

                stream.on('error', err => {
                    reject(err);
                });

                stream.on('data', data => {
                    rows.push(data); 
                });

                stream.on('end', function () {
                    resolve();
                });
            });
        })
        .catch(e => {
            err = err || e;
        })
        .then(() => {
            if (conn) { // conn assignment worked, need to close/release conn
                return conn.close();
            }
        })
        .catch(e => {
            console.log(e); // Just log, error during release doesn't affect other work
        })
        .then(() => {
            if (err) {
                next(err);
                return;
            }

            res.status(200).json(rows);
        });
}

module.exports.get = get;

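function isSimpleSqlName(name) {
  // req.query.table can be missing (undefined) or an array, so check the type first.
  if (typeof name !== 'string' || name.length > 30) {
    return false;
  }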

  // Fairly generic, but effective. Would need to be adjusted to accommodate quoted identifiers,
  // schemas, etc.
  if (!/^[a-zA-Z0-9#_$]+$/.test(name)) {
    return false;
  }

  return true;
}
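
The controller above collects every row in the rows array before calling res.json(), so the full result sits in Node.js memory and the client receives nothing until the last row is fetched. One alternative, not part of the original answer, is to serialize rows as they arrive and stream the JSON text to the response. The sketch below shows only the serialization step; Readable.from() stands in for connection.queryStream(), and jsonArrayStringifier and collectJson are hypothetical helper names.

```javascript
const { Readable, Writable, Transform, pipeline } = require('stream');

// Converts a stream of row objects into the text of a single JSON array,
// emitting each row as soon as it arrives instead of collecting them first.
function jsonArrayStringifier() {
  let first = true;
  return new Transform({
    writableObjectMode: true, // input: row objects; output: text
    transform(row, _encoding, callback) {
      const prefix = first ? '[' : ',';
      first = false;
      callback(null, prefix + JSON.stringify(row));
    },
    flush(callback) {
      // Close the array; if no rows arrived, emit an empty array.
      callback(null, first ? '[]' : ']');
    },
  });
}

// Demo only: serialize mock rows into an in-memory string.
function collectJson(rows) {
  return new Promise((resolve, reject) => {
    let out = '';
    const sink = new Writable({
      write(chunk, _encoding, callback) {
        out += chunk.toString();
        callback();
      },
    });
    pipeline(Readable.from(rows), jsonArrayStringifier(), sink, (err) =>
      (err ? reject(err) : resolve(out)));
  });
}

collectJson([{ ID: 1, NAME: 'a' }, { ID: 2, NAME: 'b' }]).then(console.log);
// [{"ID":1,"NAME":"a"},{"ID":2,"NAME":"b"}]
```

In a handler, this could look roughly like res.type('application/json') followed by pipeline(stream, jsonArrayStringifier(), res, callback). The trade-off is that once the first bytes are sent, the status code can no longer change, so an error partway through can only be logged and the response ended early.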

I hope that helps. Let me know if you have questions.