Linux - Regex SerDe reading log files in Hive
I'm trying to create a regex SerDe in Hive to read log files, but I'm having trouble getting it to work.
The log file looks like this:
14.196.202.16:9123 11329 2016-01-27 17:50:26.965 -5 thread-14960 ccs 6104 1 audit.rds.ccs reportdataservice failure <messages><message><messagestring>rds-err-1047 unable process xml output stream. xml invalid.</messagestring></message> <trace>clientabortexception: java.net.socketexception: broken pipe @ org.apache.catalina.connector.outputbuffer.realwritebytes(outputbuffer.java:369) @ org.apache.tomcat.util.buf.bytechunk.append(bytechunk.java:339) @ org.apache.catalina.connector.outputbuffer.writebytes(outputbuffer.java:392) @ org.apache.catalina.connector.outputbuffer.write(outputbuffer.java:381) @ org.apache.catalina.connector.coyoteoutputstream.write(coyoteoutputstream.java:89) @ java.io.bufferedoutputstream.write(unknown source) @ java.io.bufferedoutputstream.write(unknown source) @ sun.nio.cs.streamencoder.writebytes(unknown source) @ sun.nio.cs.streamencoder.implwrite(unknown source) @ sun.nio.cs.streamencoder.write(unknown source) @ java.io.outputstreamwriter.write(unknown source) @ java.io.bufferedwriter.flushbuffer(unknown source) @ java.io.bufferedwriter.write(unknown source) @ java.io.writer.write(unknown source) @ com.cognos.ccs.fsm.ldxhandler.write(unknown source) @ com.cognos.ccs.fsm.ldxhandler.writeattribute(unknown source) @ com.cognos.ccs.fsm.ldxhandler.writeattribute(unknown source) @ com.cognos.ccs.formats.html.ahtmlelement.writeinlinestyles(unknown source) @ com.cognos.ccs.formats.html.ahtmlelement.writestyles(unknown source) @ com.cognos.ccs.formats.html.ahtmltableelement.closestarttag(unknown source) @ com.cognos.ccs.formats.html.htmllayouttable.processevent(unknown source) @ com.cognos.ccs.fsm.ldxhandler.startelement(unknown source) @ com.cognos.ccs.formats.ccsformatter.startelement(unknown source) @ com.sun.org.apache.xerces.internal.parsers.abstractsaxparser.startelement(unknown source) @ com.sun.org.apache.xerces.internal.impl.xmlnsdocumentscannerimpl.scanstartelement(unknown source) @ 
com.sun.org.apache.xerces.internal.impl.xmldocumentfragmentscannerimpl$fragmentcontentdriver.next(unknown source) @ com.sun.org.apache.xerces.internal.impl.xmldocumentscannerimpl.next(unknown source) @ com.sun.org.apache.xerces.internal.impl.xmlnsdocumentscannerimpl.next(unknown source) @ com.sun.org.apache.xerces.internal.impl.xmldocumentfragmentscannerimpl.scandocument(unknown source) @ com.sun.org.apache.xerces.internal.parsers.xml11configuration.parse(unknown source) @ com.sun.org.apache.xerces.internal.parsers.xml11configuration.parse(unknown source) @ com.sun.org.apache.xerces.internal.parsers.xmlparser.parse(unknown source) @ com.sun.org.apache.xerces.internal.parsers.abstractsaxparser.parse(unknown source) @ com.sun.org.apache.xerces.internal.jaxp.saxparserimpl$jaxpsaxparser.parse(unknown source) @ com.sun.org.apache.xerces.internal.jaxp.saxparserimpl.parse(unknown source) @ com.cognos.ccs.service.ccsdataresult$processingthread.run(unknown source) caused by: java.net.socketexception: broken pipe @ java.net.socketoutputstream.socketwrite0(native method) @ java.net.socketoutputstream.socketwrite(unknown source) @ java.net.socketoutputstream.write(unknown source) @ org.apache.coyote.http11.internaloutputbuffer.realwritebytes(internaloutputbuffer.java:761) @ org.apache.tomcat.util.buf.bytechunk.flushbuffer(bytechunk.java:448) @ org.apache.tomcat.util.buf.bytechunk.append(bytechunk.java:363) @ org.apache.coyote.http11.internaloutputbuffer$outputstreamoutputbuffer.dowrite(internaloutputbuffer.java:785) @ org.apache.coyote.http11.filters.chunkedoutputfilter.dowrite(chunkedoutputfilter.java:124) @ org.apache.coyote.http11.internaloutputbuffer.dowrite(internaloutputbuffer.java:598) @ org.apache.coyote.response.dowrite(response.java:533) @ org.apache.catalina.connector.outputbuffer.realwritebytes(outputbuffer.java:364) ... 35 more </trace>
I got this far:
([^ ]*)\t(-|[0-9]*)\t
and got back:
Match 1
1. 14.196.202.16:9123
2. 11329
which gives me the first two fields correctly. But when I add the date, like this:
([^ ]*)\t(-|[0-9]*)\t([^ ]*)\t
I get back:
Match 1
1. 17:50:26.965 -5 thread-14960 ccs 6104 1 audit.rds.ccs reportdataservice
2.
3. failure
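A likely reason for that second result: `[^ ]` means "any character except a space", and a tab is not a space. On a tab-delimited line `([^ ]*)` can therefore run across several columns, while the date column (which contains a space) can never be matched by it, so the regex engine skips ahead until it finds some spot where the whole pattern fits. The `([^ ]*)` and `(-|[0-9]*)` pieces look like they were adapted from the Apache weblog example in the Hive documentation, which is space-delimited. Below is a small Java sketch (Hive's RegexSerDe uses `java.util.regex`) comparing the original pattern with one where each free-text field is `[^\t]*`. It assumes exactly one tab between fields, and `LINE` is a shortened, hypothetical version of the log line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpaceVsTabDemo {
    // Hypothetical, shortened sample line; assumes every field is separated by one tab.
    static final String LINE =
        "14.196.202.16:9123\t11329\t2016-01-27 17:50:26.965\t-5\tccs\t\tfailure\tdone";

    // Returns groups 1..3 of the first match found anywhere in the line, or null.
    static String[] firstMatch(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    static void show(String label, String regex) {
        System.out.println(label + ": " + regex);
        String[] groups = firstMatch(regex, LINE);
        if (groups == null) {
            System.out.println("  no match");
            return;
        }
        for (int i = 0; i < groups.length; i++) {
            // Make tabs visible so it is obvious when a group spans several columns.
            System.out.println("  " + (i + 1) + ". [" + groups[i].replace("\t", "<TAB>") + "]");
        }
    }

    public static void main(String[] args) {
        // Pattern from the question: [^ ] excludes spaces but NOT tabs.
        show("space-based", "([^ ]*)\\t(-|[0-9]*)\\t([^ ]*)\\t");
        // Same shape, but each free-text field is "anything except a tab".
        show("tab-based", "([^\\t]*)\\t(-|[0-9]*)\\t([^\\t]*)\\t");
    }
}
```

With the space-based pattern the first match starts at `17:50:26.965`: group 1 is `17:50:26.965<TAB>-5<TAB>ccs`, group 2 is empty and group 3 is `failure`, the same shape as the result above. The tab-based pattern returns the address, `11329` and the full `2016-01-27 17:50:26.965`.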
I'm new to regex and having trouble figuring this out. I'm trying to use this site:
Essentially, I'm trying to get this:
1. 14.196.202.16:9123
2. 11329
3. 2016-01-27 17:50:26.965 -5
4.
5.
6.
7.
8. thread-14960
9. ccs
10. 6104
11. 1
12. audit.rds.ccs
13.
14. reportdataservice
15.
16. failure
17. <messages><message><messagestring>rds-err-1047 unable process xml output stream. xml invalid.</messagestring></message>
19. <trace>clientabortexception: java.net.socketexception: broken pipe @ ... 35 more </trace> (the entire stack trace from the log line above)
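Because the target list has several empty positions (4 to 7, 13 and 15), it may help to count the tab-separated fields in a real line before writing one group per column. A minimal Java sketch, assuming the file is tab-delimited; the sample string is a hypothetical excerpt of items 8 to 16. Note that `String.split` drops trailing empty fields unless the limit argument is negative.

```java
class CountTabFields {
    // Splits on every tab; limit -1 keeps empty fields, including trailing ones.
    static String[] fields(String line) {
        return line.split("\t", -1);
    }

    public static void main(String[] args) {
        // Hypothetical excerpt with empty columns between consecutive tabs.
        String line = "thread-14960\tccs\t6104\t1\taudit.rds.ccs\t\treportdataservice\t\tfailure";
        String[] f = fields(line);
        System.out.println(f.length + " fields");
        for (int i = 0; i < f.length; i++) {
            System.out.println((i + 1) + ". [" + f[i] + "]");
        }
        // Without the -1 limit, trailing empty fields disappear: prints "2 vs 4".
        System.out.println("a\tb\t\t".split("\t").length + " vs " + "a\tb\t\t".split("\t", -1).length);
    }
}
```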
Edit:
So I think I'm on the right track. Here's what I have now:
([\d+]\s+[\d+])\t(\d+)\t([\d+]\s+[\d+] [\d+]\s+[\d+])\t(-[\d+])\t(\w+|\s+|\s+)\t(\w+|.)\t(\w+|\s+|\s+|-)\t(\w+|\s+|\s+|-)\t(\w+|\s+|\s+|-)\t(\w+|\s+|\s+|-)\t(\w+|\s+|\s+|-)\t(\w+|\s+|\s+|-)(\w+|\s+|\s+|-)\t(\w+|\s+|\s+|-)(\w+|\s+|\s+|-)(\w+|\s+|\s+|-)\t
But I still can't get the <message> and <trace> groups.
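Two things seem worth checking in patterns like the one above. First, `[\d+]` is a character class: it matches exactly one character that is either a digit or a literal `+`, not "one or more digits" (that is `\d+`). Second, the <message> and <trace> fields mix `<`, `/`, `.`, `-`, `:` and spaces, so a group limited to word characters, or to a single run of spaces or non-spaces, cannot take the whole field, whereas `[^\t]*` takes everything up to the next tab. A quick Java check (Hive's RegexSerDe uses `java.util.regex`); `MSG` is the <messages> field copied from the sample line, and the letters/digits/whitespace class is only an illustrative example.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // The <messages> field from the sample log line.
    static final String MSG = "<messages><message><messagestring>rds-err-1047 unable process xml"
        + " output stream. xml invalid.</messagestring></message>";

    // Pattern.matches requires the WHOLE input to match, not just a prefix.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // [\d+] is ONE character: a digit or a literal '+'.
        System.out.println(fullMatch("[\\d+]", "+"));            // true
        System.out.println(fullMatch("[\\d+]", "11329"));        // false
        // \d+ is one or more digits.
        System.out.println(fullMatch("\\d+", "11329"));          // true
        // A letters/digits/whitespace class stops at '<', '/', '.', '-'.
        System.out.println(fullMatch("[a-zA-Z0-9_\\s]*", MSG));  // false
        // "Anything but a tab" takes the whole field.
        System.out.println(fullMatch("[^\\t]*", MSG));           // true
    }
}
```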
I got the regex to work. Here's what I ended up going with:
([\d+]\S+[\d+])\t(\d+)\t([\d+]\S+[\d+] [\d+]\S+[\d+])\t(-[\d+])\t([a-zA-Z0-9_\s]*)\t([a-zA-Z0-9_\s]*)\t([a-zA-Z0-9_\s]*)\t([a-zA-Z0-9_\s]*)\t([a-zA-Z0-9_\s]*)\t([a-zA-Z_\s]*)\t([0-9]*)\t([0-9]*)\t([a-zA-Z_\s]*)\t([a-zA-Z_\s]*)\t([a-zA-Z_\s ]*)\t([a-zA-Z_\s ]*)\t([a-zA-Z_\s ]*)\t([a-zA-Z_\s ]*)\t([a-zA-Z_\s ]*)
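If the file really is one record per line with plain tab-separated fields (an assumption; it is worth confirming that the trace text contains no embedded tabs or newlines), the per-column character classes could probably be replaced by one generic `([^\t]*)` group per column. My understanding is that RegexSerDe applies the pattern to the whole line (like `Matcher.matches()`), needs one capturing group per table column, and returns NULL columns for lines that don't match. The sketch below mimics that behaviour; `buildRegex` and `parse` are hypothetical helpers, and the column count should come from the real data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TabColumnsRegex {
    // Builds ^([^\t]*)\t([^\t]*)...\t([^\t]*)$ with one capturing group per column.
    static String buildRegex(int columns) {
        StringBuilder sb = new StringBuilder("^");
        for (int i = 0; i < columns; i++) {
            if (i > 0) {
                sb.append("\\t");
            }
            sb.append("([^\\t]*)");
        }
        return sb.append("$").toString();
    }

    // Whole-line match, as assumed for RegexSerDe; returns null when the line does not fit.
    static String[] parse(String line, int columns) {
        Matcher m = Pattern.compile(buildRegex(columns)).matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[columns];
        for (int i = 0; i < columns; i++) {
            out[i] = m.group(i + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(buildRegex(3)); // ^([^\t]*)\t([^\t]*)\t([^\t]*)$
        // Hypothetical six-field line, including one empty field.
        String line = "14.196.202.16:9123\t11329\t2016-01-27 17:50:26.965\t-5\t\tfailure";
        String[] cols = parse(line, 6);
        for (int i = 0; i < cols.length; i++) {
            System.out.println((i + 1) + ". [" + cols[i] + "]");
        }
        // A wrong column count means no match at all.
        System.out.println(parse(line, 5) == null);
    }
}
```

In the table DDL the generated string would go into `WITH SERDEPROPERTIES ("input.regex" = "...")`, and backslashes may need doubling inside the HiveQL string literal. If every column really is plain tab-separated text, `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'` might avoid the regex altogether.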