Overriding Java classes is a technique that I use a lot, but after talking to other people it seems not to be widely known. So what do I mean by overriding Java classes? I'm referring to how the JVM deals with two classes that share the same package and class name: it only uses the one that appears first on the classpath.
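To see the "first entry wins" behaviour in isolation, here is a minimal sketch. It assumes a plain `URLClassLoader` with no parent, and it uses placeholder files rather than real bytecode, so it only shows the lookup order (the same search order is used when the loader actually loads a class). The class name `ClasspathOrderDemo` and the resource `com/example/Foo.class` are made up for illustration.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClasspathOrderDemo {

    // Creates a temp directory containing a placeholder file at the given
    // resource path. The file is not real bytecode; it is only looked up,
    // never loaded as a class.
    public static Path fakeEntry(String prefix, String resource) throws IOException {
        Path dir = Files.createTempDirectory(prefix);
        Path file = dir.resolve(resource);
        Files.createDirectories(file.getParent());
        Files.write(file, prefix.getBytes());
        return dir;
    }

    // Returns the URL the loader resolves for the resource when the given
    // entries are searched in order.
    public static URL resolve(String resource, Path... entries) throws IOException {
        URL[] urls = new URL[entries.length];
        for (int i = 0; i < entries.length; i++) {
            // toUri() on an existing directory ends with '/', which
            // URLClassLoader treats as a directory entry.
            urls[i] = entries[i].toUri().toURL();
        }
        // Parent is null so that only our entries are searched.
        try (URLClassLoader loader = new URLClassLoader(urls, null)) {
            return loader.getResource(resource);
        }
    }

    public static void main(String[] args) throws IOException {
        String resource = "com/example/Foo.class"; // hypothetical class
        Path patches = fakeEntry("patches", resource);
        Path original = fakeEntry("original", resource);

        // Same two entries, different order: the first one listed is used.
        System.out.println(resolve(resource, patches, original));
        System.out.println(resolve(resource, original, patches));
    }
}
```

Running it prints one URL under the `patches` temp dir and one under the `original` temp dir, depending only on which entry was listed first.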
To put this to use, one could apply patches without touching the original package, or use it during patch development for quick compiling and testing.
For Hadoop development or maintenance, one method that I've used a lot is to add a patch dir to Hadoop, e.g. share/hadoop/patches, and make sure that $HADOOP_PREFIX/share/hadoop/patches/* is added first to the classpath. This lets you add jars with modified classes that override the original ones just by putting them in the patch dir.
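When setting up a patch dir like this, it helps to confirm that the patched copy is really the one being picked up. The following is a rough sketch of a small helper for that, not part of Hadoop; the class name `ClassOrigin` and its methods are my own. It reports where a class was loaded from and lists every copy visible on the classpath, in search order.

```java
import java.io.IOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

public class ClassOrigin {

    // Converts a class to its resource path,
    // e.g. java.lang.String -> java/lang/String.class
    public static String resourceName(Class<?> cls) {
        return cls.getName().replace('.', '/') + ".class";
    }

    // Jar or directory the class was loaded from, or null when the JVM
    // does not expose one (typically for JDK core classes).
    public static URL loadedFrom(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return src == null ? null : src.getLocation();
    }

    // Every copy of the class file visible to the system class loader,
    // in the order the loader searches them.
    public static List<URL> allCopies(Class<?> cls) throws IOException {
        return Collections.list(ClassLoader.getSystemResources(resourceName(cls)));
    }

    public static void main(String[] args) throws Exception {
        String name = args.length > 0 ? args[0] : "ClassOrigin";
        // initialize=false: look the class up without running its static initializers.
        Class<?> cls = Class.forName(name, false, ClassLoader.getSystemClassLoader());

        System.out.println("loaded from: " + loadedFrom(cls));
        for (URL copy : allCopies(cls)) {
            System.out.println("visible copy: " + copy);
        }
    }
}
```

Run it with the same classpath as the service and pass the fully qualified name of the class you patched. If the patch dir is first, "loaded from" should point at the jar in the patch dir, with the original jar showing up later in the list of visible copies.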
As an example, let's say we are working on a patch for the YARN ResourceManager, and we want to test our changes or dump a new log. Instead of spending 10-20 minutes rebuilding the entire Hadoop package every time, we can generate a new jar containing only the modified classes and add it first on the classpath. This allows for sub-second compiles, which greatly decreases the cost of testing changes.
Notes and posts related to distributed systems development. Keywords: Hadoop, Storm, Kafka, Elasticsearch, Cassandra, Java, Python, Ruby
Sunday, August 23, 2015
Friday, August 7, 2015
Joining Treasure Data
After spending 2.5 years working with distributed analytics systems at DeNA, I joined Treasure Data this week with the goal of challenging my weaknesses and improving my skills.
During my time at DeNA I mostly worked in the Hadoop Infra dept., where I took the lead on the Hadoop upgrade from CDH3 to HDP2 and took part in the introduction of Storm, ElasticSearch, Kafka and Consul. However, due to the lack of a clear stance on open sourcing and sharing information outside the company, nothing ended up being shared.
With me joining TD this month, one of the major changes is the clear company stance when it comes to sharing with the community. Since the majority of systems at TD are built on open source, we also need to give back to the community. One part of that will be this blog, which I will be using as a notepad for new discoveries and things I learned.
So while I'm thankful for the chances I've gotten and everything I've learned during my time at DeNA, I look forward to getting used to working in a more open way.