database - DBInputFormat: processing multiple records at a time


When we connect to an RDBMS such as MySQL using Hadoop, we usually read each record from the DB into a user-defined class that extends DBWritable and Writable. If our SQL query generates N records as output, then the act of reading a record into the user-defined class is done N times. Is there a way I can get more records into the mapper at the same time, instead of one record each time?

If I understand you correctly, you think that Hadoop causes N SELECT statements under the hood. That is not true. As you can see in DBInputFormat's source, it creates chunks of rows based on what Hadoop deems fit.

Obviously, each mapper will have to execute a query to fetch the data it needs to process, and it might do so repeatedly, but that's still nowhere near the number of rows in the table.
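To make the chunking concrete, here is a minimal plain-Java sketch of the idea, loosely modeled on how DBInputFormat appears to divide the row count among the configured number of map tasks and have each mapper read its range with a LIMIT/OFFSET query. It does not use the Hadoop API: the class `ChunkSketch`, its methods, and the `employees` query are hypothetical names made up for illustration, and the exact SQL Hadoop generates may differ by version and database.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: mimics the chunking idea, not the real Hadoop classes.
class ChunkSketch {

    /** Row range [start, end) that a single mapper would read. */
    static final class Chunk {
        final long start;
        final long end;

        Chunk(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long length() {
            return end - start;
        }
    }

    /** Divide rowCount rows into numMappers chunks; the last chunk takes the remainder. */
    static List<Chunk> split(long rowCount, int numMappers) {
        long chunkSize = rowCount / numMappers;
        List<Chunk> chunks = new ArrayList<>();
        for (int i = 0; i < numMappers; i++) {
            long start = i * chunkSize;
            long end = (i + 1 == numMappers) ? rowCount : start + chunkSize;
            chunks.add(new Chunk(start, end));
        }
        return chunks;
    }

    /** One query per chunk (MySQL-style paging), not one query per row. */
    static String chunkQuery(String baseQuery, Chunk chunk) {
        return baseQuery + " LIMIT " + chunk.length() + " OFFSET " + chunk.start;
    }

    public static void main(String[] args) {
        // Hypothetical table and mapper count.
        String baseQuery = "SELECT id, name FROM employees ORDER BY id";
        List<Chunk> chunks = split(1_000_000L, 4);
        for (Chunk chunk : chunks) {
            System.out.println(chunkQuery(baseQuery, chunk));
        }
        System.out.println(chunks.size() + " chunk queries for 1000000 rows");
    }
}
```

Under these assumptions the database sees one paged query per mapper, and the N calls that fill your DBWritable class are just iteration over that query's result set, not N separate round trips.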

However, if performance degrades, you might be better off just dumping the data to HDFS / Hive and processing it from there.

