Which Methods Do You Override in Hadoop?


This article walks through the question "which methods do you override in Hadoop?". Since the question trips up a lot of people in day-to-day work, the notes below pull together the relevant pieces of HDFS and MapReduce, ending in a worked WordCount example that shows the overridden methods in context. Follow along!

1. Download (omitted)

2. Build (omitted)

3. Configuration (pseudo-distributed and fully distributed setup omitted)

4. HDFS
   1. Web interface: http://namenode-name:50070/ (shows the list of DataNodes and cluster statistics)
   2. Shell commands & the dfsadmin command
   3. Checkpoint node & backup node
      1. How the fsimage and edits files are merged
      2. Manually recovering a downed cluster with "import checkpoint" (presumably a feature of early versions)
      3. Backup node: the Backup Node keeps an in-memory copy of the fsimage synchronized from the NameNode, and it also receives the stream of edits from the NameNode and persists it to disk. It merges those edits with the fsimage held in memory, producing an up-to-date backup of the metadata. This is the secret of the Backup Node's efficiency: it never needs to download fsimage and edits from the NameNode; it only has to persist its in-memory metadata to disk and run the merge.
   4. Balancer: rebalances data that is unevenly distributed across racks and DataNodes
   5. Rack awareness
   6. Safemode: when data files are incomplete, or when safemode is entered manually, HDFS is read-only; once the cluster check reaches the configured threshold, or safemode is left manually, the cluster becomes writable again (a probe sketch follows this list)
   7. Fsck: file/block checking command
   8. Fetchdt: fetches a delegation token (security)
   9. Recovery mode
   10. Upgrade and rollback
   11. File permissions and security
   12. Scalability
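
As referenced in item 6, safemode can be probed or toggled from client code as well as from the hdfs dfsadmin -safemode command. Below is a minimal sketch, assuming a Hadoop 2.x client on the classpath and fs.defaultFS pointing at the NameNode; the class name SafeModeProbe is illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

    public class SafeModeProbe {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // setSafeMode lives on DistributedFileSystem, not on the generic FileSystem
            if (fs instanceof DistributedFileSystem) {
                DistributedFileSystem dfs = (DistributedFileSystem) fs;
                // SAFEMODE_GET only queries the state;
                // SAFEMODE_ENTER / SAFEMODE_LEAVE switch it
                boolean inSafeMode = dfs.setSafeMode(SafeModeAction.SAFEMODE_GET);
                System.out.println("NameNode in safe mode: " + inSafeMode);
            }
        }
    }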

5. MapReduce
   1. A minimal WordCount. The methods you actually override are Mapper.map() and Reducer.reduce(). The code below is reconstructed with its generics and imports restored (each public class goes in its own .java file; the shared imports are shown once):

      import java.io.IOException;
      import java.util.StringTokenizer;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
          private Text word = new Text();
          private IntWritable one = new IntWritable(1);

          // Override the map method
          @Override
          public void map(Object key, Text value, Context context)
                  throws IOException, InterruptedException {
              StringTokenizer stringTokenizer = new StringTokenizer(value.toString());
              while (stringTokenizer.hasMoreTokens()) {
                  word.set(stringTokenizer.nextToken());
                  // emit the (word, 1) pair
                  context.write(word, one);
              }
          }
      }

      public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
          private IntWritable result = new IntWritable(0);

          // Override the reduce method
          @Override
          protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                  throws IOException, InterruptedException {
              int sum = 0;
              for (IntWritable i : values) {
                  sum += i.get();
              }
              result.set(sum);
              // the value reduce writes out
              context.write(key, result);
          }
      }

      public class WordCountDemo {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              Job job = Job.getInstance(conf, "word count");
              job.setJarByClass(WordCountDemo.class);
              // set the map and reduce classes
              job.setMapperClass(MyMapper.class);
              job.setReducerClass(MyReducer.class);
              job.setCombinerClass(MyReducer.class);
              // set the final output key/value types
              job.setOutputKeyClass(Text.class);
              job.setOutputValueClass(IntWritable.class);
              // set the input and output paths
              FileInputFormat.addInputPath(job, new Path(args[0]));
              FileOutputFormat.setOutputPath(job, new Path(args[1]));
              System.exit(job.waitForCompletion(true) ? 0 : 1);
          }
      }

   2. Job.setGroupingComparatorClass(Class): controls which map output keys end up in the same reduce() call (see the grouping-comparator sketch after this list).
   3. Job.setCombinerClass(Class): runs a local "mini-reduce" over the map output before the shuffle.
   4. CompressionCodec
   5. Number of maps: Configuration.set(MRJobConfig.NUM_MAPS, int); the default works out to dataSize / blockSize
   6. Number of reducers: Job.setNumReduceTasks(int). With 0.95, all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes will finish their first round of reduces and launch a second wave of reduces, doing a much better job of load balancing. (See the tuning sketch after this list.)
   7. Reduce -> shuffle: input to the Reducer is the sorted output of the mappers. In this phase the framework fetches the relevant partition of the output of all the mappers, via HTTP.
   8. Reduce -> sort: the framework groups Reducer inputs by key (since different mappers may have output the same key) in this stage. The shuffle and sort phases occur simultaneously: while map outputs are being fetched, they are merged.
   9. Reduce -> reduce: reduce(WritableComparable, Iterable<Writable>, Context) is called once for each <key, (list of values)> pair in the grouped inputs.
   10. Secondary sort
   11. Partitioner
   12. Counter: Mapper and Reducer implementations can use the Counter to report statistics (see the counter sketch after this list).
   13. Job conf: configuration -> speculative execution (setMapSpeculativeExecution(boolean) / setReduceSpeculativeExecution(boolean)), maximum number of attempts per task (setMaxMapAttempts(int) / setMaxReduceAttempts(int)), etc., or Configuration.set(String, String) / Configuration.get(String)
   14. Task executor & environment -> the user can specify additional options to the child JVM via mapreduce.{map|reduce}.java.opts, and configuration parameters in the Job such as non-standard paths for the run-time linker to search shared libraries via -Djava.library.path=, etc. If the mapreduce.{map|reduce}.java.opts parameter contains the symbol @taskid@, it is interpolated with the value of the taskid of the MapReduce task.
   15. Memory management -> users/admins can also specify the maximum virtual memory of the launched child task, and any sub-process it launches recursively, using mapreduce.{map|reduce}.memory.mb. Note that the value set here is a per-process limit. The value for mapreduce.{map|reduce}.memory.mb should be specified in megabytes (MB), and it must be greater than or equal to the -Xmx passed to the JVM, else the VM might not start.
   16. Map parameters …… (http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#MapReduce_Tutorial)
   17. Parameters ()
   18. Job submission and monitoring:
       1. Job provides facilities to submit jobs, track their progress, access component tasks' reports and logs, get the MapReduce cluster's status information, and so on.
       2. The job submission process involves:
          1. Checking the input and output specifications of the job.
          2. Computing the InputSplit values for the job.
          3. Setting up the requisite accounting information for the DistributedCache of the job, if necessary.
          4. Copying the job's jar and configuration to the MapReduce system directory on the FileSystem.
          5. Submitting the job to the ResourceManager and optionally monitoring its status.
       3. Job history
   19. Job control
       1. Job.submit() || Job.waitForCompletion(boolean)
       2. Chaining multiple MapReduce jobs:
          1. Iterative MapReduce (the previous MR's output feeds the next MR's input; drawbacks: the cost of creating Job objects, plus heavy local-disk and network I/O)
          2. MapReduce JobControl: a ControlledJob wraps each job together with its dependencies, and a JobControl thread manages the states of the jobs (see the JobControl sketch after this list).
          3. MapReduce ChainMapper/ChainReducer: ChainMapper.addMapper() links several mapper tasks inside one job; it cannot be used for jobs with multiple reduces.
   20. Job input & output
       1. InputFormat, TextInputFormat, FileInputFormat
       2. InputSplit, FileSplit
       3. RecordReader
       4. OutputFormat, OutputCommitter
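
For items 2 and 10: a grouping comparator decides which map output keys land in the same reduce() call, which is the core of a secondary sort. Here is a minimal sketch, assuming composite Text keys of the (illustrative) form "word#sequence" that should be grouped by the word part alone:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;

    public class PrefixGroupingComparator extends WritableComparator {
        public PrefixGroupingComparator() {
            // true asks WritableComparator to create key instances for deserialization
            super(Text.class, true);
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            // Compare only the part before '#', so "word#1" and "word#2"
            // fall into one reduce() call
            String left = a.toString().split("#", 2)[0];
            String right = b.toString().split("#", 2)[0];
            return left.compareTo(right);
        }
    }

It is wired up in the driver with job.setGroupingComparatorClass(PrefixGroupingComparator.class); the sort comparator still orders the full composite key, so within each group the values arrive sorted by sequence.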
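
Items 6 and 13 are all plain setters on Job, or string pairs on Configuration. A sketch of the common knobs follows; the node and slot counts fed into the 0.95 rule of thumb are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class TuningDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // The generic escape hatch: any property as a string pair
            conf.set("mapreduce.map.java.opts", "-Xmx512m");

            Job job = Job.getInstance(conf, "tuning demo");
            // Speculative execution, settable per phase
            job.setMapSpeculativeExecution(true);
            job.setReduceSpeculativeExecution(false);
            // Maximum attempts before a task is declared failed
            job.setMaxMapAttempts(4);
            job.setMaxReduceAttempts(4);
            // 0.95 * (nodes * reduce slots per node): every reduce launches at once
            int nodes = 10, reduceSlotsPerNode = 2; // illustrative cluster size
            job.setNumReduceTasks((int) (0.95 * nodes * reduceSlotsPerNode));
        }
    }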
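
For item 12, a user-defined enum is all it takes to declare counters; the framework aggregates the per-task increments and the driver can read the totals. A sketch built on the WordCount mapper above (the enum and class names are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CountingMapper extends Mapper<Object, Text, Text, IntWritable> {
        // Each enum constant becomes one named counter
        enum WcCounters { EMPTY_LINES }

        private final Text word = new Text();
        private final IntWritable one = new IntWritable(1);

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            if (value.toString().trim().isEmpty()) {
                // Report the statistic instead of emitting output
                context.getCounter(WcCounters.EMPTY_LINES).increment(1);
                return;
            }
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                context.write(word, one);
            }
        }
    }

After job.waitForCompletion(true), the driver can read the total back with job.getCounters().findCounter(CountingMapper.WcCounters.EMPTY_LINES).getValue().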
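
And for item 19.2.2, a sketch of wiring two dependent jobs through JobControl; the job names and the one-second poll are illustrative, and each Job would be configured with mappers, reducers, and paths as in WordCountDemo:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

    public class TwoStageDriver {
        public static void main(String[] args) throws Exception {
            Job first = Job.getInstance(new Configuration(), "stage-1");
            Job second = Job.getInstance(new Configuration(), "stage-2");
            // ... set mapper/reducer/input/output for both jobs here ...

            ControlledJob stage1 = new ControlledJob(first.getConfiguration());
            stage1.setJob(first);
            ControlledJob stage2 = new ControlledJob(second.getConfiguration());
            stage2.setJob(second);
            // stage-2 is held back until stage-1 succeeds
            stage2.addDependingJob(stage1);

            JobControl control = new JobControl("two-stage");
            control.addJob(stage1);
            control.addJob(stage2);

            // JobControl is a Runnable that submits jobs as they become ready
            Thread runner = new Thread(control);
            runner.start();
            while (!control.allFinished()) {
                Thread.sleep(1000);
            }
            control.stop();
        }
    }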
