20 replies  ·  2324 views
henbf (Intermediate) 2025-2-7 21:58:18
Before bashing Node.js, maybe ask yourself whether you should first get the basic concepts of I/O and streams straight.
zhouyin (OP, Intermediate) 2025-2-7 22:24:51
@henbf I'm no Node.js expert. I updated a.js to use an output stream, but now it crashes with a heap out-of-memory error:

```bash
-bash-4.2# node a.js
(node:17974) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 drain listeners added to [WriteStream]. Use emitter.setMaxListeners() to increase limit
(Use `node --trace-warnings ...` to show where the warning was created)
(node:17974) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 drain listeners added to [WriteStream]. Use emitter.setMaxListeners() to increase limit
(node:17974) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 drain listeners added to [WriteStream]. Use emitter.setMaxListeners() to increase limit

<--- Last few GCs --->

[17974:0x1c3dbf0] 40306 ms: Scavenge (reduce) 2046.8 (2082.1) -> 2046.5 (2082.6) MB, 44.4 / 0.0 ms (average mu = 0.342, current mu = 0.316) allocation failure
[17974:0x1c3dbf0] 40396 ms: Scavenge (reduce) 2047.2 (2082.6) -> 2046.8 (2082.8) MB, 31.1 / 0.0 ms (average mu = 0.342, current mu = 0.316) allocation failure

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x7fcfb6136908 node::Abort() [/lib64/libnode.so.93]
 2: 0x7fcfb6024451 [/lib64/libnode.so.93]
 3: 0x7fcfb732a552 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/lib64/libnode.so.93]
 4: 0x7fcfb732a8e7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/lib64/libnode.so.93]
 5: 0x7fcfb74ea305 [/lib64/libnode.so.93]
 6: 0x7fcfb74ea3e5 [/lib64/libnode.so.93]
 7: 0x7fcfb74fe77c v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/lib64/libnode.so.93]
 8: 0x7fcfb74ff0a1 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/lib64/libnode.so.93]
 9: 0x7fcfb7502269 v8::internal::Heap::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/lib64/libnode.so.93]
10: 0x7fcfb75022f7 v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/lib64/libnode.so.93]
11: 0x7fcfb74c27d0 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [/lib64/libnode.so.93]
12: 0x7fcfb74badb4 v8::internal::FactoryBase::AllocateRawWithImmortalMap(int, v8::internal::AllocationType, v8::internal::Map, v8::internal::AllocationAlignment) [/lib64/libnode.so.93]
13: 0x7fcfb74bcbdf v8::internal::FactoryBase::NewRawOneByteString(int, v8::internal::AllocationType) [/lib64/libnode.so.93]
14: 0x7fcfb74c4d5d v8::internal::Factory::NewStringFromUtf8(v8::base::Vector const&, v8::internal::AllocationType) [/lib64/libnode.so.93]
15: 0x7fcfb733d59d v8::String::NewFromUtf8(v8::Isolate*, char const*, v8::NewStringType, int) [/lib64/libnode.so.93]
16: 0x7fcfb6215390 node::StringBytes::Encode(v8::Isolate*, char const*, unsigned long, node::encoding, v8::Local*) [/lib64/libnode.so.93]
17: 0x7fcfb6123ef3 [/lib64/libnode.so.93]
18: 0x7fcfb71ba3cc [/lib64/libnode.so.93]
Aborted
```
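The warning about 11 `drain` listeners on the `WriteStream` suggests (a.js itself is not shown, so this is an assumption) that the updated script attached a new `drain` listener whenever `write()` returned `false` but kept looping and writing anyway, so buffered chunks piled up in memory until the heap ran out. A minimal sketch of the usual backpressure pattern: when `write()` returns `false`, pause and wait for one `drain` event before writing more. `writeLines` is a hypothetical helper name; `events.once` and `fs.createWriteStream` are standard Node.js APIs.

```javascript
const { once } = require("events");
const { createWriteStream } = require("fs");

// Hypothetical helper: write every string in `lines` (any iterable) as one
// line of the output file, respecting backpressure.
async function writeLines(path, lines) {
  const out = createWriteStream(path);
  for (const line of lines) {
    // write() returns false once the internal buffer exceeds highWaterMark.
    if (!out.write(line + "\n")) {
      // once() registers a single listener and removes it after it fires,
      // so listeners never accumulate and the loop really pauses here.
      await once(out, "drain");
    }
  }
  out.end();
  await once(out, "finish"); // resolves after all data has been flushed
}
```

For example, `await writeLines("./test.txt", rows)` where `rows` is an array or generator of strings. `once()` rejects if the stream emits `error` while the loop is waiting; an error emitted at any other moment would still need its own `error` handler.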
henbf (Intermediate) 2025-2-7 22:35:38
@zhouyin What you wrote isn't right:

```javascript
const { createReadStream, createWriteStream } = require("fs");
const { parse } = require("csv-parse");

const inputPath = "../outpy.csv";
const outputPath = "./test.txt";

const readStream = createReadStream(inputPath);
const writeStream = createWriteStream(outputPath, { flags: "a" }); // append mode
// Split fields on commas and skip the header (start at line 2).
const parser = parse({ delimiter: ",", from_line: 2 });

readStream.pipe(parser);

// Each parsed row arrives as an array of fields; join it back into a line.
// Note: write()'s return value (the backpressure signal) is ignored here.
parser.on("data", (row) => {
  writeStream.write(row.join(",") + "\n");
});

parser.on("end", () => {
  console.log("finished");
  writeStream.end();
});

parser.on("error", (error) => {
  console.error("CSV Parsing Error:", error);
});
```
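The same read → parse → reformat → write chain can be expressed with `stream/promises`' `pipeline()`, which propagates backpressure from the file writer back to the parser (the `data` handler above never pauses) and rejects if any stage fails. A minimal sketch: `rowsToLines` and `reformatCsv` are hypothetical names, and the csv-parse parser is passed in rather than constructed, so the sketch itself only depends on Node.js built-ins.

```javascript
const { createReadStream, createWriteStream } = require("fs");
const { Transform } = require("stream");
const { pipeline } = require("stream/promises");

// Turn parsed rows (arrays of fields) back into text lines.
// writableObjectMode: the input side accepts arrays; the output side emits text.
// As in the code above, fields are joined as-is, without re-quoting.
function rowsToLines() {
  return new Transform({
    writableObjectMode: true,
    transform(row, _encoding, callback) {
      callback(null, row.join(",") + "\n");
    },
  });
}

// pipeline() pauses upstream stages while the write buffer is full and
// rejects if any stage emits an error. `parser` is assumed to be an
// object-mode stream such as parse({ delimiter: ",", from_line: 2 }).
async function reformatCsv(inputPath, outputPath, parser) {
  await pipeline(
    createReadStream(inputPath),
    parser,
    rowsToLines(),
    createWriteStream(outputPath, { flags: "a" })
  );
}
```

Usage would look like `await reformatCsv("../outpy.csv", "./test.txt", parse({ delimiter: ",", from_line: 2 }))`. This addresses memory use and error handling rather than raw speed.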
zhouyin (OP, Intermediate) 2025-2-7 22:45:18
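At first I wrote it more or less the way you did, but it was no faster, so I changed it to that other version, thinking `write` had a buffer. I ran your code without changing a single character, and it took over a minute. It can't hold a candle to Python:

```bash
-bash-4.2# time node a.js
finished

real    1m3.579s
user    1m4.103s
sys     0m2.478s
```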
henbf (Intermediate) 2025-2-7 22:59:02
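@zhouyin It also depends on what you do with each CSV row. In Python you just read and write with no extra processing, which is effectively a copy. In Node.js you convert every row into an array and then convert the array back into a string when writing, so of course it's slower.

```javascript
const { createReadStream, createWriteStream } = require("fs");

const inputPath = "../outpy.csv";
const outputPath = "./test.txt";

// Larger read chunks (256 KiB instead of the 64 KiB default) mean fewer reads.
const readStream = createReadStream(inputPath, { highWaterMark: 256 * 1024 });
const writeStream = createWriteStream(outputPath, { flags: "a" });

// pipe() handles backpressure and ends writeStream when readStream ends,
// so no explicit writeStream.end() is needed.
readStream.pipe(writeStream);

// 'finish' fires once all data has been flushed to the output file.
writeStream.on("finish", () => {
  console.log("finished");
});

readStream.on("error", (err) => {
  console.error("Error reading file:", err);
});

writeStream.on("error", (err) => {
  console.error("Error writing file:", err);
});
```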
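The plain copy above also keeps the header line that the csv-parse version dropped with `from_line: 2`. Assuming the header should still be skipped, it can be removed at the byte level while every other byte passes through untouched, so no row is ever turned into an array. A minimal sketch: `skipFirstLine` and `copyWithoutHeader` are hypothetical names, and it assumes the header ends with `\n`.

```javascript
const { createReadStream, createWriteStream } = require("fs");
const { Transform } = require("stream");
const { pipeline } = require("stream/promises");

// Drop everything up to and including the first "\n" (the header line),
// then pass the remaining chunks through unchanged.
function skipFirstLine() {
  let skipping = true;
  return new Transform({
    transform(chunk, _encoding, callback) {
      if (!skipping) return callback(null, chunk);
      const nl = chunk.indexOf(0x0a); // byte value of "\n"
      if (nl === -1) return callback(); // header continues in the next chunk
      skipping = false;
      callback(null, chunk.subarray(nl + 1)); // rest of this chunk, no copy
    },
  });
}

// Copy inputPath to outputPath minus its first line, with backpressure
// and error propagation handled by pipeline().
async function copyWithoutHeader(inputPath, outputPath) {
  await pipeline(
    createReadStream(inputPath, { highWaterMark: 256 * 1024 }),
    skipFirstLine(),
    createWriteStream(outputPath)
  );
}
```

For example, `await copyWithoutHeader("../outpy.csv", "./test.txt")`. Unlike the snippet above, this opens the output in the default overwrite mode rather than append mode.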
zhouyin (OP, Intermediate) 2025-2-7 23:03:36
@henbf The Python version also gets each row back as an array, and it writes arrays as well.
zhouyin (OP, Intermediate) 2025-2-7 23:17:41
@henbf I also tried another library, csvwriter, and it's incredibly slow. Python's libraries are just well designed; you have to hand it to them.
zhouyin (OP, Intermediate) 2025-2-7 23:21:39
@zhouyin With csvwriter it takes over 3 minutes:

```bash
-bash-4.2# time node a.js
finished

real    3m45.028s
user    4m12.751s
sys     2m59.847s
```
henbf (Intermediate) 2025-2-7 23:49:19
@zhouyin ✅✅✅ Right, Node.js isn't suited to parsing CSV. Python rules.
stabc (Beginner) 2025-2-8 03:17:35
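1. Parsing CSV means splitting and joining character by character. Low-level languages have an absolute advantage here because they can use the data directly by position, whereas Node creates a new string object every time.
2. Python's standard library has a csv module, so that also runs as native code. If it is still that much slower than Go, it must not be very well written.
3. I just ran a quick test: if you optimize the parsing in Node and cut down on string concatenation, parsing a 400 MB CSV file can be brought down to under 5 seconds in total.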
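Points 1 and 3 can be illustrated with a minimal sketch that keeps the data as raw bytes, finds separators with `Buffer#indexOf`, and decodes each field exactly once by position with `Buffer#toString(encoding, start, end)`, instead of concatenating characters one at a time. It is not a full CSV parser: it assumes LF line endings and no quoted fields (no embedded commas, quotes or newlines). `splitFields` and `readRows` are hypothetical names, and no timing is claimed for it. Splitting raw UTF-8 bytes on `,` and `\n` is safe because those byte values never occur inside a multi-byte character.

```javascript
const { createReadStream } = require("fs");

const COMMA = 0x2c; // byte value of ","
const NEWLINE = 0x0a; // byte value of "\n"

// Split one line (a Buffer without its trailing "\n") into field strings.
// Each field is decoded once from its byte range; nothing is built up with +=.
// ASSUMPTION: fields are never quoted.
function splitFields(line) {
  const fields = [];
  let pos = 0;
  for (;;) {
    const comma = line.indexOf(COMMA, pos);
    if (comma === -1) {
      fields.push(line.toString("utf8", pos)); // last field runs to end of line
      return fields;
    }
    fields.push(line.toString("utf8", pos, comma));
    pos = comma + 1;
  }
}

// Async generator yielding each row of the file as an array of strings.
// ASSUMPTION: LF line endings; a "\r" from CRLF files would stay in the last field.
async function* readRows(path, streamOptions = {}) {
  let carry = Buffer.alloc(0); // incomplete line left over from the previous chunk
  for await (const chunk of createReadStream(path, streamOptions)) {
    const buf = carry.length > 0 ? Buffer.concat([carry, chunk]) : chunk;
    let start = 0;
    let nl;
    while ((nl = buf.indexOf(NEWLINE, start)) !== -1) {
      yield splitFields(buf.subarray(start, nl)); // subarray is a view, not a copy
      start = nl + 1;
    }
    carry = buf.subarray(start);
  }
  if (carry.length > 0) yield splitFields(carry); // last line had no trailing "\n"
}
```

For example, `for await (const row of readRows("../outpy.csv")) { /* use row */ }`. This still allocates one string per field; it only removes the per-character concatenation described in point 1.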