hadoop - How to filter by _id in MongoDB using Pig
I have Mongo documents like this:
db.activity_days.findOne()
{ "_id" : ObjectId("54b4ee617acf9ce0440a3185"), "aca" : 0, "ca" : 0, "cbdw" : true, "day" : ISODate("2014-12-10T00:00:00Z"), "dm" : 0, "fbc" : 0, "go" : 2500, "gs" : [ ], "its" : [ { "_id" : ObjectId("551ac8d44f9f322e2b055d3a"), "at" : 2000, "atn" : "running", "cas" : 386.514909469507, "dis" : 2.788989730832084, "du" : 1472, "ibr" : false, "ide" : false, "lcs" : false, "pt" : 0, "rpt" : 0, "src" : 1001, "stp" : 0, "tcs" : [ ], "ts" : 1418257729, "u_at" : ISODate("2015-01-13T00:32:10.954Z") } ], "po" : 0, "se" : 0, "st" : 0, "tap3c" : [ ], "tzo" : -21600, "u_at" : ISODate("2015-01-13T00:32:10.952Z"), "uid" : ObjectId("545eb753ae9237b1df115649") }
I want to use Pig to filter on a specific _id range. I can write the Mongo query like this:
db.activity_day.find({_id: {$gt: ObjectId("54a48e000000000000000000"), $lt: ObjectId("54cd6c800000000000000000")}})
But I don't know how to write it in Pig. Does anyone know?
You can try the mongo-hadoop connector for Pig; see mongo-hadoop: Usage with Pig.
Once you have registered the JARs (core, pig, and the Java driver), e.g., register /path-to/mongo-hadoop-pig-<version>.jar; you can run the following via grunt:
set mongo.input.query '{"_id":{"\$gt":{"\$oid":"54a48e000000000000000000"},"\$lt":{"\$oid":"54cd6c800000000000000000"}}}'
rangeactivityday = load 'mongodb://localhost:27017/database.collection' using com.mongodb.hadoop.pig.MongoLoader();
dump rangeactivityday;
You may want to use LIMIT before dumping the data as well.
The above was tested using: mongo-java-driver-3.0.0-rc1.jar, mongo-hadoop-pig-1.4.0.jar, mongo-hadoop-core-1.4.0.jar, and MongoDB v3.0.9.
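As a side note on where the two boundary values in the query come from: the first 4 bytes of an ObjectId are a Unix timestamp in seconds, so an _id range is effectively a creation-time range. The sketch below is plain JavaScript (runnable in Node) and is not part of mongo-hadoop; the helper names objectIdHexFromDate and idRangeQuery are made up for illustration, and the two dates are an assumption chosen because they reproduce the ObjectIds used in the question.

```javascript
// Hypothetical helper (not a mongo-hadoop or driver API): returns the smallest
// ObjectId hex string whose embedded timestamp equals the given date.
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  // 4-byte big-endian timestamp as 8 hex characters,
  // followed by 8 zero bytes (16 hex characters).
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

// Hypothetical helper: builds the JSON text for mongo.input.query using the
// {"$oid": "..."} extended-JSON form shown in the answer above.
function idRangeQuery(from, to) {
  return JSON.stringify({
    _id: {
      $gt: { $oid: objectIdHexFromDate(from) },
      $lt: { $oid: objectIdHexFromDate(to) },
    },
  });
}

// Assumed dates: the start of January and February 2015 (UTC).
const query = idRangeQuery(
  new Date("2015-01-01T00:00:00Z"),
  new Date("2015-02-01T00:00:00Z")
);
console.log(query);
// prints {"_id":{"$gt":{"$oid":"54a48e000000000000000000"},"$lt":{"$oid":"54cd6c800000000000000000"}}}
```

The printed string has no backslashes. The grunt command above writes each $ as \$, so the same escaping would presumably need to be applied before pasting the generated string into a set mongo.input.query line.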