Linux - Regex SerDe for reading log files in Hive


I'm trying to create a Regex SerDe in Hive to read log files, but I'm having trouble getting it to work...

The log file looks like this...

14.196.202.16:9123  11329   2016-01-27 17:50:26.965 -5                  thread-14960    ccs 6104    1   audit.rds.ccs       reportdataservice       failure <messages><message><messagestring>rds-err-1047 unable process xml output stream. xml invalid.</messagestring></message>   <trace>clientabortexception:  java.net.socketexception: broken pipe     @ org.apache.catalina.connector.outputbuffer.realwritebytes(outputbuffer.java:369)     @ org.apache.tomcat.util.buf.bytechunk.append(bytechunk.java:339)  @ org.apache.catalina.connector.outputbuffer.writebytes(outputbuffer.java:392)     @ org.apache.catalina.connector.outputbuffer.write(outputbuffer.java:381)  @ org.apache.catalina.connector.coyoteoutputstream.write(coyoteoutputstream.java:89)   @ java.io.bufferedoutputstream.write(unknown source)   @ java.io.bufferedoutputstream.write(unknown source)   @ sun.nio.cs.streamencoder.writebytes(unknown source)  @ sun.nio.cs.streamencoder.implwrite(unknown source)   @ sun.nio.cs.streamencoder.write(unknown source)   @ java.io.outputstreamwriter.write(unknown source)     @ java.io.bufferedwriter.flushbuffer(unknown source)   @ java.io.bufferedwriter.write(unknown source)     @ java.io.writer.write(unknown source)     @ com.cognos.ccs.fsm.ldxhandler.write(unknown source)  @ com.cognos.ccs.fsm.ldxhandler.writeattribute(unknown source)     @ com.cognos.ccs.fsm.ldxhandler.writeattribute(unknown source)     @ com.cognos.ccs.formats.html.ahtmlelement.writeinlinestyles(unknown source)   @ com.cognos.ccs.formats.html.ahtmlelement.writestyles(unknown source)     @ com.cognos.ccs.formats.html.ahtmltableelement.closestarttag(unknown source)  @ com.cognos.ccs.formats.html.htmllayouttable.processevent(unknown source)     @ com.cognos.ccs.fsm.ldxhandler.startelement(unknown source)   @ com.cognos.ccs.formats.ccsformatter.startelement(unknown source)     @ com.sun.org.apache.xerces.internal.parsers.abstractsaxparser.startelement(unknown source)    @ 
com.sun.org.apache.xerces.internal.impl.xmlnsdocumentscannerimpl.scanstartelement(unknown source)    @ com.sun.org.apache.xerces.internal.impl.xmldocumentfragmentscannerimpl$fragmentcontentdriver.next(unknown source)    @ com.sun.org.apache.xerces.internal.impl.xmldocumentscannerimpl.next(unknown source)  @ com.sun.org.apache.xerces.internal.impl.xmlnsdocumentscannerimpl.next(unknown source)    @ com.sun.org.apache.xerces.internal.impl.xmldocumentfragmentscannerimpl.scandocument(unknown source)  @ com.sun.org.apache.xerces.internal.parsers.xml11configuration.parse(unknown source)  @ com.sun.org.apache.xerces.internal.parsers.xml11configuration.parse(unknown source)  @ com.sun.org.apache.xerces.internal.parsers.xmlparser.parse(unknown source)   @ com.sun.org.apache.xerces.internal.parsers.abstractsaxparser.parse(unknown source)   @ com.sun.org.apache.xerces.internal.jaxp.saxparserimpl$jaxpsaxparser.parse(unknown source)    @ com.sun.org.apache.xerces.internal.jaxp.saxparserimpl.parse(unknown source)  @ com.cognos.ccs.service.ccsdataresult$processingthread.run(unknown source) caused by: java.net.socketexception: broken pipe   @ java.net.socketoutputstream.socketwrite0(native method)  @ java.net.socketoutputstream.socketwrite(unknown source)  @ java.net.socketoutputstream.write(unknown source)    @ org.apache.coyote.http11.internaloutputbuffer.realwritebytes(internaloutputbuffer.java:761)  @ org.apache.tomcat.util.buf.bytechunk.flushbuffer(bytechunk.java:448)     @ org.apache.tomcat.util.buf.bytechunk.append(bytechunk.java:363)  @ org.apache.coyote.http11.internaloutputbuffer$outputstreamoutputbuffer.dowrite(internaloutputbuffer.java:785)    @ org.apache.coyote.http11.filters.chunkedoutputfilter.dowrite(chunkedoutputfilter.java:124)   @ org.apache.coyote.http11.internaloutputbuffer.dowrite(internaloutputbuffer.java:598)     @ org.apache.coyote.response.dowrite(response.java:533)    @ org.apache.catalina.connector.outputbuffer.realwritebytes(outputbuffer.java:364)     
... 35 more </trace> 
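The pasted sample shows runs of spaces in places where the regexes below expect tabs, so it may be worth confirming the real separators before writing the pattern. A minimal Java sketch (Java because Hive's RegexSerDe uses `java.util.regex`) that splits one line on tabs and numbers the fields; `SAMPLE` is a hypothetical, shortened line that assumes each run of whitespace in the paste was a tab, with four empty fields after the date.

```java
// Sketch: list the tab-separated fields of one log line to check the real layout.
class FieldInspector {
    // Hypothetical, shortened sample (assumption: fields are separated by tabs).
    static final String SAMPLE =
        "14.196.202.16:9123\t11329\t2016-01-27 17:50:26.965 -5\t\t\t\t\tthread-14960\tccs\t6104\t1";

    // Split on tabs; the limit of -1 keeps empty fields instead of discarding trailing ones.
    static String[] fields(String line) {
        return line.split("\t", -1);
    }

    public static void main(String[] args) {
        String[] f = fields(SAMPLE);
        for (int i = 0; i < f.length; i++) {
            // Brackets make empty fields visible.
            System.out.println((i + 1) + ": [" + f[i] + "]");
        }
    }
}
```

Passing `-1` as the limit keeps trailing empty fields, which `split` would otherwise drop. Hive's default text input also hands the SerDe one physical line at a time, so if a trace really spans several lines in the file, those lines will not be matched as one record.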

I got this far:

([^ ]*)\t(-|[0-9]*)\t 

and got back:

Match 1
1. 14.196.202.16:9123
2. 11329

which gives me the first two fields correctly... but when I add the date like this:

([^ ]*)\t(-|[0-9]*)\t([^ ]*)\t 

I get back:

Match 1
1. 17:50:26.965    -5                    thread-14960    ccs    6104    1    audit.rds.ccs        reportdataservice
2.
3. failure
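A likely cause: `[^ ]*` means "anything except a space", which includes tab characters, so it does not stop at a field boundary in tab-separated data. The first regex probably returned the right two groups only because the engine backtracked, and since the date itself contains spaces, `[^ ]*` can never capture the whole date. A minimal Java sketch, assuming the fields really are tab-separated; `SAMPLE` is a hypothetical, shortened line. It shows what `[^ ]*` consumes on its own, then uses `[^\t]*` ("anything except a tab") as the field pattern instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldClassDemo {
    // Hypothetical, shortened sample line; assumes the real separators are tabs.
    static final String SAMPLE =
        "14.196.202.16:9123\t11329\t2016-01-27 17:50:26.965 -5\tthread-14960";

    // What [^ ]* consumes from the start of the line: it stops at the first
    // space but runs straight across tab characters.
    static String spaceClassPrefix(String line) {
        Matcher m = Pattern.compile("[^ ]*").matcher(line);
        return m.lookingAt() ? m.group() : null;
    }

    // Field pattern that stops only at a tab, so a value containing spaces
    // (such as the date) stays in one group.
    static final Pattern FIRST_THREE =
        Pattern.compile("^([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)\\t");

    static String[] firstThree(String line) {
        Matcher m = FIRST_THREE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Show tabs explicitly so it is visible that [^ ]* crossed two field boundaries.
        System.out.println(spaceClassPrefix(SAMPLE).replace("\t", "<TAB>"));
        for (String g : firstThree(SAMPLE)) {
            System.out.println("[" + g + "]");
        }
    }
}
```

The first line printed is `14.196.202.16:9123<TAB>11329<TAB>2016-01-27`, i.e. `[^ ]*` swallowed two fields and half of the date before hitting a space.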

I'm new to regex and am having trouble figuring this out... I'm using this site to test:

http://rubular.com/

Essentially, I'm trying to get this:

1. 14.196.202.16:9123    2. 11329     3. 2016-01-27 17:50:26.965 -5 4.  5.  6.  7.  8. thread-14960  9. ccs   10. 6104     11. 1    12. audit.rds.ccs    13.  14. reportdataservice    15.  16. failure  17. <messages><message><messagestring>rds-err-1047 unable process xml output stream. xml invalid.</messagestring></message>    19. <trace>clientabortexception:  java.net.socketexception: broken pipe     @ org.apache.catalina.connector.outputbuffer.realwritebytes(outputbuffer.java:369)     @ org.apache.tomcat.util.buf.bytechunk.append(bytechunk.java:339)  @ org.apache.catalina.connector.outputbuffer.writebytes(outputbuffer.java:392)     @ org.apache.catalina.connector.outputbuffer.write(outputbuffer.java:381)  @ org.apache.catalina.connector.coyoteoutputstream.write(coyoteoutputstream.java:89)   @ java.io.bufferedoutputstream.write(unknown source)   @ java.io.bufferedoutputstream.write(unknown source)   @ sun.nio.cs.streamencoder.writebytes(unknown source)  @ sun.nio.cs.streamencoder.implwrite(unknown source)   @ sun.nio.cs.streamencoder.write(unknown source)   @ java.io.outputstreamwriter.write(unknown source)     @ java.io.bufferedwriter.flushbuffer(unknown source)   @ java.io.bufferedwriter.write(unknown source)     @ java.io.writer.write(unknown source)     @ com.cognos.ccs.fsm.ldxhandler.write(unknown source)  @ com.cognos.ccs.fsm.ldxhandler.writeattribute(unknown source)     @ com.cognos.ccs.fsm.ldxhandler.writeattribute(unknown source)     @ com.cognos.ccs.formats.html.ahtmlelement.writeinlinestyles(unknown source)   @ com.cognos.ccs.formats.html.ahtmlelement.writestyles(unknown source)     @ com.cognos.ccs.formats.html.ahtmltableelement.closestarttag(unknown source)  @ com.cognos.ccs.formats.html.htmllayouttable.processevent(unknown source)     @ com.cognos.ccs.fsm.ldxhandler.startelement(unknown source)   @ com.cognos.ccs.formats.ccsformatter.startelement(unknown source)     @ com.sun.org.apache.xerces.internal.parsers.abstractsaxparser.startelement(unknown 
source)    @ com.sun.org.apache.xerces.internal.impl.xmlnsdocumentscannerimpl.scanstartelement(unknown source)    @ com.sun.org.apache.xerces.internal.impl.xmldocumentfragmentscannerimpl$fragmentcontentdriver.next(unknown source)    @ com.sun.org.apache.xerces.internal.impl.xmldocumentscannerimpl.next(unknown source)  @ com.sun.org.apache.xerces.internal.impl.xmlnsdocumentscannerimpl.next(unknown source)    @ com.sun.org.apache.xerces.internal.impl.xmldocumentfragmentscannerimpl.scandocument(unknown source)  @ com.sun.org.apache.xerces.internal.parsers.xml11configuration.parse(unknown source)  @ com.sun.org.apache.xerces.internal.parsers.xml11configuration.parse(unknown source)  @ com.sun.org.apache.xerces.internal.parsers.xmlparser.parse(unknown source)   @ com.sun.org.apache.xerces.internal.parsers.abstractsaxparser.parse(unknown source)   @ com.sun.org.apache.xerces.internal.jaxp.saxparserimpl$jaxpsaxparser.parse(unknown source)    @ com.sun.org.apache.xerces.internal.jaxp.saxparserimpl.parse(unknown source)  @ com.cognos.ccs.service.ccsdataresult$processingthread.run(unknown source) caused by: java.net.socketexception: broken pipe   @ java.net.socketoutputstream.socketwrite0(native method)  @ java.net.socketoutputstream.socketwrite(unknown source)  @ java.net.socketoutputstream.write(unknown source)    @ org.apache.coyote.http11.internaloutputbuffer.realwritebytes(internaloutputbuffer.java:761)  @ org.apache.tomcat.util.buf.bytechunk.flushbuffer(bytechunk.java:448)     @ org.apache.tomcat.util.buf.bytechunk.append(bytechunk.java:363)  @ org.apache.coyote.http11.internaloutputbuffer$outputstreamoutputbuffer.dowrite(internaloutputbuffer.java:785)    @ org.apache.coyote.http11.filters.chunkedoutputfilter.dowrite(chunkedoutputfilter.java:124)   @ org.apache.coyote.http11.internaloutputbuffer.dowrite(internaloutputbuffer.java:598)     @ org.apache.coyote.response.dowrite(response.java:533)    @ 
org.apache.catalina.connector.outputbuffer.realwritebytes(outputbuffer.java:364)     ... 35 more </trace> 
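One way to get every field into its own group, sketched under the assumption that a line has 18 tab-separated fields (16 fixed fields, the `<messages>` part and the `<trace>` part; the number of empty fields is a guess based on the sample and the numbered list above): repeat `([^\t]*)\t` for every field except the last and let `(.*)` take the rest. As far as I know, Hive's RegexSerDe applies `input.regex` to the whole line (a full match, like `Matcher.matches()`) and expects exactly one capturing group per table column. `build` is a hypothetical helper and `SAMPLE` is a shortened, made-up line with the trace truncated.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FullLineRegex {
    // Hypothetical helper: (n - 1) tab-terminated "([^\t]*)" groups plus a final "(.*)".
    static String build(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n - 1; i++) {
            sb.append("([^\\t]*)\\t");
        }
        return sb.append("(.*)").toString();
    }

    // Hypothetical, shortened sample line with 18 tab-separated fields (trace truncated).
    static final String SAMPLE = String.join("\t",
        "14.196.202.16:9123", "11329", "2016-01-27 17:50:26.965 -5", "", "", "", "",
        "thread-14960", "ccs", "6104", "1", "audit.rds.ccs", "", "reportdataservice", "",
        "failure",
        "<messages><message><messagestring>rds-err-1047 unable process xml output stream. xml invalid.</messagestring></message>",
        "<trace>clientabortexception:  java.net.socketexception: broken pipe</trace>");

    public static void main(String[] args) {
        Matcher m = Pattern.compile(build(18)).matcher(SAMPLE);
        // matches() requires the entire line to match, not just a substring.
        if (!m.matches()) {
            System.out.println("no match");
            return;
        }
        for (int i = 1; i <= m.groupCount(); i++) {
            System.out.println(i + ": [" + m.group(i) + "]");
        }
    }
}
```

With this shape, the table would need 18 string columns; if the real count differs, only the argument to `build` changes.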

Edit:

So I think I'm on the right track here.

I now have:

([\d+]\s+[\d+])\t(\d+)\t([\d+]\s+[\d+] [\d+]\s+[\d+])\t(-[\d+])\t(\w+|\s+|\S+)\t(\w+|.)\t(\w+|\s+|\S+|-)\t(\w+|\s+|\S+|-)\t(\w+|\s+|\S+|-)\t(\w+|\s+|\S+|-)\t(\w+|\s+|\S+|-)\t(\w+|\s+|\S+|-)(\w+|\s+|\S+|-)\t(\w+|\s+|\S+|-)(\w+|\s+|\S+|-)(\w+|\s+|\S+|-)\t

but I still can't get the <messages> and <trace> groups.
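A probable reason: `\w` only matches letters, digits and underscore, and a group such as `(\w+|\s+|...)` picks a single alternative, so it matches one uniform run of characters and cannot span text that mixes words, spaces and punctuation like `<messagestring>rds-err-1047 unable ...`. A minimal Java sketch that anchors on the literal tags instead; `TAIL` is a hypothetical, shortened tail of the sample line, and the tab before `<trace>` is an assumption, which is why `\s*` is used there rather than `\t`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TailGroups {
    // Hypothetical tail of one log line: status field, <messages> part, truncated <trace> part.
    static final String TAIL = "failure\t<messages><message><messagestring>rds-err-1047 unable process xml output stream. xml invalid.</messagestring></message>\t<trace>clientabortexception:  java.net.socketexception: broken pipe</trace>";

    // \w covers only [a-zA-Z_0-9], so \w+ cannot consume '<', '>', '/', '.', ':' or '-'.
    static boolean wordOnly(String s) {
        return Pattern.matches("\\w+", s);
    }

    // Status field up to the tab, then the shortest text starting with <messages> that is
    // followed by optional whitespace and <trace>, then everything from <trace> to the end.
    static final Pattern TAIL_PATTERN =
        Pattern.compile("([^\\t]*)\\t(<messages>.*?)\\s*(<trace>.*)");

    static String[] split(String tail) {
        Matcher m = TAIL_PATTERN.matcher(tail);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(wordOnly("failure"));
        System.out.println(wordOnly("<messages><message>"));
        for (String g : split(TAIL)) {
            System.out.println("[" + g + "]");
        }
    }
}
```

The reluctant `.*?` keeps the messages group from running into the trace; the final greedy `.*` is safe because `<trace>` is the last part of the line.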

I got the regex to work... here is what I ended up going with:

([\d+]\s+[\d+])\t(\d+)\t([\d+]\s+[\d+] [\d+]\s+[\d+])\t(-[\d+])\t([a-zA-Z0-9_\s]*)\t([a-zA-Z0-9_\s]*)\t([a-zA-Z0-9_\s]*)\t([a-zA-Z0-9_\s]*)\t([a-zA-Z0-9_\s]*)\t([a-zA-Z_\s]*)\t([0-9]*)\t([0-9]*)\t([a-zA-Z_\s]*)\t([a-zA-Z_\s]*)\t([a-zA-Z_\s ]*)\t([a-zA-Z_\s ]*)\t([a-zA-Z_\s ]*)\t([a-zA-Z_\s ]*)\t([a-zA-Z_\s ]*)
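Two quick checks on the final regex, sketched in Java since that is the regex engine RegexSerDe uses. The string below assumes the scraped `a-za-z` ranges were `a-zA-Z` in the original. `groupCount()` on a compiled pattern gives the number of capturing groups, which should equal the number of columns in the Hive table (19 here). Separately, `[\d+]` is a character class matching a single digit or a literal `+`, not "one or more digits" (that would be `\d+`), so it may be worth checking whether the first, third and fourth groups capture what you expect.

```java
import java.util.regex.Pattern;

class FinalRegexCheck {
    // Final regex from the post (assumption: "a-za-z" was "a-zA-Z" before the text was lowercased).
    static final String ASKER_REGEX =
        "([\\d+]\\s+[\\d+])\\t(\\d+)\\t([\\d+]\\s+[\\d+] [\\d+]\\s+[\\d+])\\t(-[\\d+])\\t"
      + "([a-zA-Z0-9_\\s]*)\\t([a-zA-Z0-9_\\s]*)\\t([a-zA-Z0-9_\\s]*)\\t([a-zA-Z0-9_\\s]*)\\t([a-zA-Z0-9_\\s]*)\\t"
      + "([a-zA-Z_\\s]*)\\t([0-9]*)\\t([0-9]*)\\t([a-zA-Z_\\s]*)\\t([a-zA-Z_\\s]*)\\t"
      + "([a-zA-Z_\\s ]*)\\t([a-zA-Z_\\s ]*)\\t([a-zA-Z_\\s ]*)\\t([a-zA-Z_\\s ]*)\\t([a-zA-Z_\\s ]*)";

    // Number of capturing groups, i.e. the number of columns the table must declare.
    static int groups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // [\d+] matches exactly one character: a digit or '+'.
    static boolean digitPlusClass(String s) {
        return Pattern.matches("[\\d+]", s);
    }

    public static void main(String[] args) {
        System.out.println(groups(ASKER_REGEX));
        System.out.println(digitPlusClass("1") + " " + digitPlusClass("+") + " " + digitPlusClass("14"));
    }
}
```

This prints `19` and then `true true false`.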
