database - DBInputFormat multiple records processing
When we connect to an RDBMS such as MySQL from Hadoop, each record from the DB is read into a user-defined class that implements DBWritable and Writable. If our SQL query generates N records as output, the act of reading a record into the user-defined class is done N times. Is there a way to get more than one record into the mapper at the same time, instead of one record each time?
If I understand correctly, you think that Hadoop causes N SELECT statements under the hood. That is not true. As you can see in DBInputFormat's source, it creates chunks of rows based on what Hadoop deems fit.
Obviously, each mapper will have to execute a query to fetch some data to process, and it might do so repeatedly, but that's still nowhere near the number of rows in the table.
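To make the chunking concrete, here is a minimal standalone sketch of the arithmetic, assuming the row count is divided evenly across the configured number of map tasks, with the last chunk absorbing the remainder, and that each mapper pages through its range with MySQL's LIMIT/OFFSET. It does not use any Hadoop classes; the names SplitSketch, computeSplits and pagedQuery and the employees query are hypothetical, and the real DBInputFormat may differ in detail between versions.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone illustration only: mimics how a row count could be cut into
// per-mapper ranges. No Hadoop classes are involved.
final class SplitSketch {

    // Divide rowCount rows into `chunks` contiguous [start, end) ranges.
    // The last range absorbs any remainder left by integer division.
    static List<long[]> computeSplits(long rowCount, int chunks) {
        List<long[]> splits = new ArrayList<>();
        long chunkSize = rowCount / chunks;
        for (int i = 0; i < chunks; i++) {
            long start = i * chunkSize;
            long end = (i + 1 == chunks) ? rowCount : start + chunkSize;
            splits.add(new long[] {start, end});
        }
        return splits;
    }

    // Build the single paged query one mapper would run for its range
    // (MySQL LIMIT/OFFSET syntax).
    static String pagedQuery(String baseQuery, long[] split) {
        long length = split[1] - split[0];
        return baseQuery + " LIMIT " + length + " OFFSET " + split[0];
    }

    public static void main(String[] args) {
        // Hypothetical query; 10 rows spread over 3 mappers.
        String baseQuery = "SELECT id, name FROM employees ORDER BY id";
        for (long[] split : computeSplits(10, 3)) {
            System.out.println(pagedQuery(baseQuery, split));
        }
    }
}
```

With 10 rows and 3 mappers this yields three queries in total rather than ten. Each mapper still receives its rows one at a time in map(), but they all come from that one result set. The ORDER BY in the base query matters, because LIMIT/OFFSET paging is only stable when the row order is deterministic.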
However, if performance degrades, you might be better off dumping the data to HDFS / Hive and processing it there.