
Flume Data Collection: Replication and Multiplexing Case in Practice

news 2026/5/7 20:56:37

Preface

In this section we work through a Flume data collection case on the replication and multiplexing theme, using three servers. One server collects local log data and, through the Replicating ChannelSelector, copies every collected event to the other two servers: one stores the data in HDFS, and the other stores it on its local disk. Avro carries the data between the Flume agents.

The overall architecture is as follows. On hadoop101, an exec source feeds two memory channels (c1 and c2) through the replicating selector, and two Avro sinks forward the events to hadoop102:4141 and hadoop103:4142. On hadoop102, an Avro source feeds an HDFS sink. On hadoop103, an Avro source feeds a file_roll sink.

Main text

① On hadoop101, in the /opt/module/apache-flume-1.9.0/job directory, create the job-file-flume-avro.conf configuration file. It monitors the Hive log and forwards it to the Avro sinks.

- job-file-flume-avro.conf configuration file

    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1 k2
    a1.channels = c1 c2

    # Replicate the data flow to all channels
    a1.sources.r1.selector.type = replicating

    # Describe/configure the source
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /tmp/hadoop/hive.log
    a1.sources.r1.shell = /bin/bash -c

    # Describe the sink
    # An avro sink is a data sender
    a1.sinks.k1.type = avro
    a1.sinks.k1.hostname = hadoop102
    a1.sinks.k1.port = 4141
    a1.sinks.k2.type = avro
    a1.sinks.k2.hostname = hadoop103
    a1.sinks.k2.port = 4142

    # Describe the channel
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    a1.channels.c2.type = memory
    a1.channels.c2.capacity = 1000
    a1.channels.c2.transactionCapacity = 100

    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1 c2
    a1.sinks.k1.channel = c1
    a1.sinks.k2.channel = c2

② On hadoop102, in the /opt/module/apache-flume-1.9.0/job directory, create the job-avro-flume-hdfs.conf configuration file. It writes the monitored data to HDFS.

- job-avro-flume-hdfs.conf configuration file

    # Name the components on this agent
    a2.sources = r1
    a2.sinks = k1
    a2.channels = c1

    # Describe/configure the source
    # An avro source is a data-receiving service
    a2.sources.r1.type = avro
    a2.sources.r1.bind = hadoop102
    a2.sources.r1.port = 4141

    # Describe the sink
    a2.sinks.k1.type = hdfs
    a2.sinks.k1.hdfs.path = hdfs://hadoop101:8020/flume2/%Y%m%d/%H
    # Prefix of the uploaded files
    a2.sinks.k1.hdfs.filePrefix = flume2-
    # Whether to roll folders by time (round the timestamp down)
    a2.sinks.k1.hdfs.round = true
    # How many time units pass before a new folder is created
    a2.sinks.k1.hdfs.roundValue = 1
    # The time unit used for rounding
    a2.sinks.k1.hdfs.roundUnit = hour
    # Whether to use the local timestamp
    a2.sinks.k1.hdfs.useLocalTimeStamp = true
    # How many events to accumulate before flushing to HDFS
    a2.sinks.k1.hdfs.batchSize = 100
    # File type (DataStream is uncompressed; CompressedStream enables compression)
    a2.sinks.k1.hdfs.fileType = DataStream
    # How often, in seconds, a new file is generated
    a2.sinks.k1.hdfs.rollInterval = 30
    # Roll size of each file in bytes, roughly 128 MB
    a2.sinks.k1.hdfs.rollSize = 134217700
    # 0 means file rolling does not depend on the number of events
    a2.sinks.k1.hdfs.rollCount = 0

    # Describe the channel
    a2.channels.c1.type = memory
    a2.channels.c1.capacity = 1000
    a2.channels.c1.transactionCapacity = 100

    # Bind the source and sink to the channel
    a2.sources.r1.channels = c1
    a2.sinks.k1.channel = c1

③ On hadoop103, in the /opt/module/apache-flume-1.9.0/job directory, create the job-avro-flume-dir.conf configuration file. It writes the monitored data to the /opt/module/apache-flume-1.9.0/flume3 directory.

- job-avro-flume-dir.conf configuration file

    # Name the components on this agent
    a3.sources = r1
    a3.sinks = k1
    a3.channels = c2

    # Describe/configure the source
    a3.sources.r1.type = avro
    a3.sources.r1.bind = hadoop103
    a3.sources.r1.port = 4142

    # Describe the sink
    a3.sinks.k1.type = file_roll
    a3.sinks.k1.sink.directory = /opt/module/apache-flume-1.9.0/flume3

    # Describe the channel
    a3.channels.c2.type = memory
    a3.channels.c2.capacity = 1000
    a3.channels.c2.transactionCapacity = 100

    # Bind the source and sink to the channel
    a3.sources.r1.channels = c2
    a3.sinks.k1.channel = c2

- Create the data storage directory /opt/module/apache-flume-1.9.0/flume3 before starting the agent.

④ Start the Hadoop cluster.

⑤ Start the Flume job job-avro-flume-hdfs.conf on hadoop102. The two receiving agents (steps ⑤ and ⑥) are started first so that their Avro sources are already listening when the Avro sinks on hadoop101 connect.

- Command

    bin/flume-ng agent -c conf/ -n a2 -f job/job-avro-flume-hdfs.conf -Dflume.root.logger=INFO,console

⑥ Start the Flume job job-avro-flume-dir.conf on hadoop103.

- Command

    bin/flume-ng agent -c conf/ -n a3 -f job/job-avro-flume-dir.conf -Dflume.root.logger=INFO,console

⑦ Start the Flume job job-file-flume-avro.conf on hadoop101.

- Command

    bin/flume-ng agent -c conf/ -n a1 -f job/job-file-flume-avro.conf -Dflume.root.logger=INFO,console

⑧ Start Hive, so that new entries are written to the monitored Hive log.

⑨ Check the monitoring results.

- Check HDFS
- Check the files under the storage directory /opt/module/apache-flume-1.9.0/flume3

Conclusion

That wraps up this hands-on replication and multiplexing case for Flume data collection. See you next time.
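
Step ⑨ names the two places to check but gives no commands, so the following is a minimal sketch of a smoke test for the replicated flow, not part of the original walkthrough. The function names (emit_test_events, count_tagged_lines, count_tagged_lines_hdfs) and the tag format are hypothetical. The paths /tmp/hadoop/hive.log, /flume2 and /opt/module/apache-flume-1.9.0/flume3 come from the configuration files above. The HDFS helper assumes the hdfs command is on the PATH and that output files sit two directory levels below /flume2, as the %Y%m%d/%H path pattern suggests.

```shell
#!/usr/bin/env bash
# Smoke-test helpers for the replicating flow (a sketch; all names are hypothetical).

# Append COUNT uniquely tagged lines to the log file that the exec source tails.
emit_test_events() {
  local log_file="$1"
  local count="$2"
  local tag="$3"
  local i=1
  while [ "$i" -le "$count" ]; do
    printf '%s event %d\n' "$tag" "$i" >> "$log_file"
    i=$((i + 1))
  done
}

# Count lines containing TAG in all files under DIR,
# for example the file_roll output directory on hadoop103.
count_tagged_lines() {
  local dir="$1"
  local tag="$2"
  # -r reads every file under DIR, -h drops file names, -F treats TAG literally.
  grep -rhF -- "$tag" "$dir" 2>/dev/null | wc -l | tr -d '[:space:]'
}

# Count lines containing TAG in the HDFS sink output.
# Assumes the hdfs CLI is available and files live under /flume2/<date>/<hour>/.
count_tagged_lines_hdfs() {
  local tag="$1"
  hdfs dfs -cat '/flume2/*/*/*' 2>/dev/null | grep -cF -- "$tag"
}
```

One way to use it: on hadoop101, run emit_test_events /tmp/hadoop/hive.log 5 smoke-001 while all three agents are running. After waiting for the sinks to flush, run count_tagged_lines /opt/module/apache-flume-1.9.0/flume3 smoke-001 on hadoop103 and count_tagged_lines_hdfs smoke-001 on a node with an HDFS client. Because the replicating selector copies each event to both channels, both counts would be expected to reach 5, although memory channels give no delivery guarantee if an agent is restarted in between.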


http://www.hkea.cn/news/14572943/