Hadoop: error in shuffle in fetcher#1

I am trying to run the Hadoop Pi example. It ran without problems on a single node, but now that I am running it on a multi-node cluster, the following error occurs. Any advice would be appreciated.

mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- In: conf/mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
<property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m</value>
</property>
<property>
    <name>mapred.shuffle.input.buffer.percent</name>
    <value>0.2</value>
</property>
</configuration>

Console output:

Number of Maps  = 3
Samples per Map = 10
14/10/11 20:34:20 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
14/10/11 20:34:54 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Starting Job
14/10/11 20:34:54 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/10/11 20:34:55 INFO input.FileInputFormat: Total input paths to process : 3
14/10/11 20:34:55 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/10/11 20:34:55 INFO mapreduce.JobSubmitter: number of splits:3
14/10/11 20:34:55 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null
14/10/11 20:34:55 INFO mapreduce.Job: Running job: job_201410112034_0001
14/10/11 20:34:56 INFO mapreduce.Job:  map 0% reduce 0%
14/10/11 20:35:05 INFO mapreduce.Job:  map 33% reduce 0%
14/10/11 20:35:08 INFO mapreduce.Job:  map 100% reduce 0%
14/10/11 20:35:14 INFO mapreduce.Job:  map 100% reduce 11%
14/10/11 20:35:31 INFO mapreduce.Job: Task Id : attempt_201410112034_0001_r_000000_0, Status : FAILED
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
    at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
    at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:234)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:149)

14/10/11 20:35:32 INFO mapreduce.Job:  map 100% reduce 0%
14/10/11 20:35:41 INFO mapreduce.Job:  map 100% reduce 11%
14/10/11 20:35:49 INFO mapreduce.Job: Task Id : attempt_201410112034_0001_m_000000_0, Status : FAILED
Too many fetch-failures
14/10/11 20:35:49 WARN mapreduce.Job: Error reading task outputhttp://userA:50060/tasklog?plaintext=true&attemptid=attempt_201410112034_0001_m_000000_0&filter=stdout
14/10/11 20:35:49 WARN mapreduce.Job: Error reading task outputhttp://userA:50060/tasklog?plaintext=true&attemptid=attempt_201410112034_0001_m_000000_0&filter=stderr
14/10/11 20:36:13 INFO mapreduce.Job: Task Id : attempt_201410112034_0001_r_000000_1, Status : FAILED
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#2
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
    at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
    at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:234)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:149)

14/10/11 20:36:14 INFO mapreduce.Job:  map 100% reduce 0%
14/10/11 20:36:22 INFO mapreduce.Job: Task Id : attempt_201410112034_0001_m_000001_0, Status : FAILED
Too many fetch-failures
14/10/11 20:36:22 WARN mapreduce.Job: Error reading task outputhttp://userA:50060/tasklog?plaintext=true&attemptid=attempt_201410112034_0001_m_000001_0&filter=stdout
14/10/11 20:36:22 WARN mapreduce.Job: Error reading task outputhttp://userA:50060/tasklog?plaintext=true&attemptid=attempt_201410112034_0001_m_000001_0&filter=stderr
14/10/11 20:36:23 INFO mapreduce.Job:  map 100% reduce 11%
14/10/11 20:36:32 INFO mapreduce.Job:  map 100% reduce 100%
14/10/11 20:36:34 INFO mapreduce.Job: Job complete: job_201410112034_0001
14/10/11 20:36:34 INFO mapreduce.Job: Counters: 33
    FileInputFormatCounters
        BYTES_READ=354
    FileSystemCounters
        FILE_BYTES_READ=72
        FILE_BYTES_WRITTEN=252
        HDFS_BYTES_READ=765
        HDFS_BYTES_WRITTEN=215
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=1
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    Job Counters 
        Data-local map tasks=5
        Total time spent by all maps waiting after reserving slots (ms)=0
        Total time spent by all reduces waiting after reserving slots (ms)=0
        SLOTS_MILLIS_MAPS=11950
        SLOTS_MILLIS_REDUCES=80809
        Launched map tasks=5
        Launched reduce tasks=3
    Map-Reduce Framework
        Combine input records=0
        Combine output records=0
        Failed Shuffles=1
        GC time elapsed (ms)=6
        Map input records=3
        Map output bytes=54
        Map output records=6
        Merged Map outputs=3
        Reduce input groups=2
        Reduce input records=6
        Reduce output records=0
        Reduce shuffle bytes=84
        Shuffled Maps =3
        Spilled Records=12
        SPLIT_RAW_BYTES=411
Job Finished in 100.067 seconds
Estimated value of Pi is 3.60000000000000000000

Answer 1

One possible cause of this error is that the machines in your Hadoop cluster cannot communicate with each other properly. Every machine should be able to ping every other one (master to slaves, but also slave to slave). Depending on your setup, you may need to edit the /etc/hosts file on each machine so that they can reach each other by hostname.

For example, /etc/hosts could look like this:

127.0.0.1       localhost
<ipslave1>  slave1
<ipmaster> master
<ipslave2> slave2
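
To see quickly whether a node can resolve the other hostnames, something like the following Python sketch might help. It is not part of Hadoop; the function names `check_hosts` and `problems` are made up for illustration, and it only tests name resolution, not whether the TaskTracker port is actually reachable. It also flags names that resolve to a loopback address, which on some setups can cause reducers to fetch map output from the wrong machine.

```python
import socket


def check_hosts(hostnames, resolve=socket.gethostbyname):
    """Map each hostname to its resolved IPv4 address, or None if lookup fails.

    `resolve` is injectable so the logic can be exercised without a network.
    """
    results = {}
    for name in hostnames:
        try:
            results[name] = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results


def problems(results):
    """Return readable messages for names that fail or resolve to loopback."""
    issues = []
    for name, ip in results.items():
        if ip is None:
            issues.append(f"{name}: does not resolve")
        elif ip.startswith("127.") and name != "localhost":
            issues.append(f"{name}: resolves to loopback address {ip}")
    return issues
```

Running `problems(check_hosts(["master", "slave1", "slave2", "userA"]))` on each node returns a list of issues, and an empty list means every name resolved to a non-loopback address. `userA` is included because it is the hostname in the "Error reading task output" URLs in the log above; if that name does not resolve from the other nodes, that would be consistent with the fetch failures.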
