
Clean Code: Java

Summarizing clean code practices for Java

Naming Conventions: Use meaningful names conveying the intent of the object.

Constants: Use constants to manage static values (constants can help memory usage, as constant values are cached by the JVM). For values that are reused across multiple places, create a constants file that holds them, and use enums to group related constants.
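A minimal sketch of both approaches (the class, field, and value names here are illustrative):

public final class AppConstants {
    private AppConstants() {} // utility holder, not meant to be instantiated
    public static final int MAX_RETRIES = 3;
    public static final String DATE_FORMAT = "yyyy-MM-dd";
}

// Related constants grouped as a type-safe enum instead of loose Strings or ints
public enum OrderStatus {
    CREATED, PAID, SHIPPED, DELIVERED
}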

Clean Code: Remove console print statements and unnecessary comments.

Deprecate Methods: Use @Deprecated on methods and fields that aren’t meant for future use.

Strings: If you need to perform a lot of operations on a String, use StringBuilder or StringBuffer.
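For example, concatenating inside a loop creates a new String object on every iteration, while StringBuilder appends into a single buffer (illustrative snippet):

StringBuilder sb = new StringBuilder();
for (int i = 0; i < 1000; i++) {
    sb.append(i).append(','); // mutates one buffer instead of copying a new String each time
}
String csv = sb.toString();

Prefer StringBuilder unless the builder is shared across threads; StringBuffer is synchronized and therefore slower.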

Switch statement: Rather than using multiple if-else conditions, use the cleaner and more readable switch-case.
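For example (values are illustrative):

String dayType;
switch (day) {
    case "SATURDAY":
    case "SUNDAY":
        dayType = "Weekend";
        break;
    default:
        dayType = "Weekday";
}

On Java 14+, the switch expression form (dayType = switch (day) { case "SATURDAY", "SUNDAY" -> "Weekend"; default -> "Weekday"; };) is even more compact.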

Exception Handling: https://kamalmeet.com/java/exception-handling-basic-principles/

Code Structure: Follow the Separation of Concerns strategy – controller, service, model, utility

Memory Leaks: Unclosed resources (e.g. unclosed URL connections) can cause memory leaks. https://rollbar.com/blog/how-to-detect-memory-leaks-in-java-causes-types-tools/
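A common fix is try-with-resources, which guarantees the resource is closed even if an exception is thrown (the file name here is a placeholder):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

try (BufferedReader reader = new BufferedReader(new FileReader("data.txt"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // process the line
    }
} catch (IOException e) {
    // handle or rethrow as appropriate
}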

Concurrent code: Avoid unnecessary synchronization, and at the same time identify areas to be synchronized where multiple threads can cause problems.
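For example, a plain count++ is not atomic (it is a read-modify-write), so concurrent threads can lose updates; the java.util.concurrent utilities give thread safety without explicit locking (illustrative sketch):

import java.util.concurrent.atomic.AtomicInteger;

class RequestCounter {
    private final AtomicInteger count = new AtomicInteger();

    int increment() {
        return count.incrementAndGet(); // atomic, no synchronized block needed
    }
}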

Lambdas and Streams: If you’re using Java 8+, replacing loops and extremely verbose methods with streams and lambdas makes the code look cleaner. Lambdas and streams allow you to write functional code in Java. The following snippet filters odd numbers in the traditional imperative way:

List<Integer> oddNumbers = new ArrayList<>();
for (Integer number : Arrays.asList(1, 2, 3, 4, 5, 6)) {
    if (number % 2 != 0) {
        oddNumbers.add(number);
    }
}

This is the functional way of filtering odd numbers:

List<Integer> oddNumbers = Stream.of(1, 2, 3, 4, 5, 6)
  .filter(number -> number % 2 != 0)
  .collect(Collectors.toList());

NullPointerException: When writing new methods, try to avoid returning null where possible; null returns force every caller to add checks and can lead to NullPointerExceptions.
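One common approach is to return Optional (or an empty collection) so the caller is forced to handle the missing case; the User type and repository below are hypothetical:

Optional<User> findUser(String id) {
    return Optional.ofNullable(repository.findById(id)); // findById may return null
}

// The caller handles absence explicitly instead of risking a NullPointerException:
String name = findUser("42").map(User::getName).orElse("unknown");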

Use final: Mark a method final when it should not be overridden, and a class final when it should not be extended.

Avoid static: static state is shared at the class level and can cause subtle issues, especially in multithreaded code, if not used properly.

Data Structures: Java collections provide ArrayList, LinkedList, Vector, Stack, HashSet, HashMap, Hashtable, and more. It’s important to understand the pros and cons of each to use them in the correct context.

Least visibility: Use the most restrictive access modifier that works, preferring private, then protected, then public.

Stay SOLID: https://kamalmeet.com/design/solid-principles-for-object-oriented-design/

DRY: Don’t Repeat Yourself, common code should be part of utilities and libraries

YAGNI: You Aren’t Gonna Need It; code only what is needed.

Static Code Analysis: Run a tool like SonarQube (e.g. via its Eclipse plugin).

Size of Class and Functions: Keep classes and functions small; as a rule of thumb, around 400 lines for a class and 40 for a function.

Input checks: Inputs into methods should be checked for valid data size and range

Database Access: Use best practices like Connection Pool, JPA, Prepared statements, etc.

Cloud Native Application Design – Pillars of Cloud Architecture

Operational Excellence

Efficiently deploy, operate, monitor, and manage your cloud workload

  • automate deployments
  • monitoring, alerting, and logging
  • manage capacity and quota
  • plan for scale – peak traffic
  • automate whenever possible

Reliability

Design for a resilient and highly available system

  • automatically recover from failure
  • test recovery procedures
  • scale horizontally to manage the workload

Security

Secure your data and workload, and align with regulatory requirements

  • data security
  • data encryption – at rest and in transit
  • apply security at all layers
  • enable traceability to investigate and take actions automatically
  • minimize direct human interaction with data

Performance Efficiency

Design and tune your resources for the best performance

  • monitor and analyze the performance
  • compute performance optimization
  • optimize storage performance
  • anticipate load and scale
  • apply best practices such as caching, CQRS, sharding, and throttling

Cost optimization

Maximize business value from the infrastructure used

  • monitor and control cost
  • optimize cost across compute, database, storage, etc.
  • identify and free unused and underused resources

Tech Trends to Watch- 2023

2023 Gartner Emerging Technologies and Trends Impact Radar
https://www.gartner.com/en/articles/4-emerging-technologies-you-need-to-know-about

My personal favorites

AI and ML: Calling this a trend to watch is not quite accurate, as AI is already happening, from recommending the next product you buy to self-driving cars.
IoT: The world is more connected, with home appliances and vehicles publishing data that gets analyzed in real time.
Edge Computing: Computation is done near the source of the data, enabling quick real-time decisions.
Quantum Computing: Will become more practical and accessible, aiding research and performance in different fields.
Digital Twins: Digital replicas of physical systems that help analyze and predict the impact of different parameters.
CyberSecurity: With tech becoming part of day-to-day life, security is a major concern.
Blockchain: The decentralized ledger is going beyond cryptocurrencies to supply chain management and digital identity.
Robotics and Drones: Automation of work, along with uses in delivery and surveillance, will increase.
Virtual and Augmented Reality (VR/AR): Providing new ways for people to interact with and experience the world.
Metaverse: A virtual-world experience enabled by VR, AR, IoT, NFTs, and blockchain.
Web 3.0: Decentralization of the Internet, empowered by blockchain and AI.

Kafka Basics

Apache Kafka is an open-source, distributed, publish-subscribe messaging system designed to handle large amounts of data.

Important terms

Topic: Messages or data are published to and read from topics.

Partition: Topics can be split into multiple partitions, allowing for parallel processing of data streams. Each partition is an ordered, immutable sequence of records. Partitions provide a way to horizontally scale data processing within a Kafka cluster.

Broker: A server in the Kafka cluster that stores published data and serves producers and consumers; a cluster is made up of multiple brokers.

Producer: Publishes data to a topic.

Consumer: Subscribes to a topic and reads data from it.

Offset: Unique identifiers for messages. Each record in a partition is assigned a unique, sequential offset, and the order of the records within a partition is maintained. This means that data is guaranteed to be processed in the order it was written to the partition.

ZooKeeper: Apache ZooKeeper is a distributed coordination service for managing distributed systems. If a node fails, another node takes over its responsibilities, ensuring high availability. ZooKeeper uses a consensus algorithm to ensure that all nodes in the system have a consistent view of the data. It helps Kafka to manage coordination between brokers and to maintain configuration information.
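As a concrete illustration, here is a minimal producer sketch using the official kafka-clients library (the broker address, topic, key, and value are placeholders):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

try (Producer<String, String> producer = new KafkaProducer<>(props)) {
    // Records with the same key land on the same partition, preserving their order
    producer.send(new ProducerRecord<>("my-topic", "user-42", "hello kafka"));
}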

BigData Ecosystem

Apache Hadoop: An open-source software framework for storing and processing large volumes of data in a distributed computing environment.

Major components: HDFS (Hadoop Distributed File System) for storing data, and MapReduce for processing data.

Apache Pig: Runs on top of Hadoop and provides a higher-level scripting language for writing data analysis programs more easily.

Apache Hive: SQL-like query language for Hadoop.

Apache Storm: A distributed, real-time processing system for big data. Its ability to process data as it arrives makes it well-suited for use cases such as real-time stock analytics, fraud detection, and event-driven applications.

Apache Spark: It provides an in-memory data processing engine, which makes it faster and more flexible than Hadoop’s MapReduce for many use cases.
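To give a flavor of the programming model, here is a minimal word-count sketch using Spark’s Java API (the app name and input path are placeholders):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import java.util.Arrays;

SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
try (JavaSparkContext sc = new JavaSparkContext(conf)) {
    JavaRDD<String> lines = sc.textFile("input.txt");
    JavaPairRDD<String, Integer> counts = lines
        .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
        .mapToPair(word -> new Tuple2<>(word, 1))
        .reduceByKey(Integer::sum); // aggregation happens in memory across the cluster
    counts.collect().forEach(t -> System.out.println(t._1() + ": " + t._2()));
}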

4 Vs of Big Data:

  • Volume: Data at a scale that normal RDBMS databases cannot store or are not meant to process.
  • Velocity: The speed at which data is being added.
  • Variety: Structured, semi-structured, and unstructured. Traditional systems are meant only for structured data; reviews, comments, images, etc. are unstructured data.
  • Veracity: Unverified data, which may or may not be useful; inconsistent data that cannot be used straight away.

Java Updates since JDK-8

Found this interesting article on the updates that have happened in Java since version 8: https://ondro.inginea.eu/index.php/new-features-in-java-versions-since-java-8/

Java 8 is still the most popular version of Java, though Java 17 is a recent long-term support (LTS) release. Java 8 was an instant hit with features like functional interfaces, lambda expressions, streams, and the Optional class. The post linked above walks through the important updates that have landed since then.

ChatGPT by ChatGPT

ChatGPT: The Advancements in Natural Language Processing

Artificial intelligence (AI) has been revolutionizing various fields, and one of the areas where it has made the most impact is natural language processing (NLP). NLP is the field of computer science and AI that focuses on developing algorithms that can understand and process human language. With the development of powerful language models such as ChatGPT, NLP has taken a significant step forward in recent years.

What is ChatGPT?

ChatGPT is a language model developed by OpenAI, which is one of the largest AI research organizations in the world. ChatGPT is a transformer-based language model that uses deep learning to generate human-like text. It is trained on a massive amount of text data, which allows it to generate coherent and contextually appropriate responses to questions and prompts.

The name ChatGPT is a combination of “chat” and “GPT,” which stands for “Generative Pre-trained Transformer.” The “GPT” part of the name refers to the transformer architecture used in the model, which is a type of neural network that has been very successful in NLP tasks such as language generation and translation.

How Does ChatGPT Work?

ChatGPT is a pre-trained language model, which means that it is trained on a massive amount of text data before it is released to the public. During training, the model is presented with pairs of prompts and text, and it learns to generate a continuation of the text given the prompt. The model uses this training data to learn patterns and relationships in the data, which allows it to generate coherent and contextually appropriate responses.

Once the model is trained, it can be fine-tuned for specific tasks or used as is. For example, it can be fine-tuned for tasks such as question-answering, conversation generation, and summarization. The pre-training allows the model to learn a large amount of general information about the world, which makes it well-suited for a wide range of NLP tasks.

Applications of ChatGPT

ChatGPT has a wide range of applications, from customer service and chatbots to content generation and text summarization. One of the most popular applications of ChatGPT is in the field of customer service, where it can be used to provide fast and accurate answers to customer questions. ChatGPT can also be used in chatbots, where it can generate coherent and contextually appropriate responses to user queries.

Another application of ChatGPT is in the field of content generation, where it can be used to generate articles, summaries, and other types of text. For example, it can be used to generate summaries of long articles, which can save users time and effort.

Finally, ChatGPT can also be used in the field of machine translation, where it can be used to translate text from one language to another. This can be useful for organizations that need to translate large amounts of text quickly and accurately.

Conclusion

ChatGPT is a powerful language model developed by OpenAI, which has taken NLP to new heights. With its pre-training and fine-tuning capabilities, it is well-suited for a wide range of NLP tasks, from customer service and chatbots to content generation and machine translation. With its ability to generate coherent and contextually appropriate responses, it has the potential to change the way we interact with computers and the way we process information.

(The above article is generated by ChatGPT)

Choosing the right database

Choosing the right database is never easy. I have already discussed types of NoSQL databases and choosing between NoSQL and SQL.

I will try to cover some common use cases here

Use Case → Database Choice

  • Temporary fast access as key-value → Redis cache
  • Data stored in a time-series fashion → OpenTSDB
  • Object/file data → Blob storage
  • Text search → Elasticsearch
  • Structured data with relations between objects, needing transactional properties and ACID compliance → RDBMS
  • Semi-structured data (XML/JSON documents without a fixed structure), flexible queries → Document database (MongoDB)
  • Data that grows with time, with a limited set of queries → Columnar database (Cassandra)
  • Graph relations between objects → Graph database (Neo4j)

Some useful resources from the Internet.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Which-Database_v07-10-21_1.max-2000x2000.jpeg
https://cloud.google.com/blog/topics/developers-practitioners/your-google-cloud-database-options-explained
https://aws.amazon.com/startups/start-building/how-to-choose-a-database/

Microservice Best Practices

Development 

  • Single responsibility: one task per microservice
  • Strangler Fig Pattern: https://martinfowler.com/bliki/StranglerFigApplication.html 
  • API Gateway: API Gateway should provide Routing, Aggregating, and SSL Offloading 
  • Offload non-core responsibilities: Non-core responsibilities, including security, logging, and tracking, should be offloaded to a sidecar or shared libraries.

Design for Failure 

  • Fail fast: Patterns like circuit breaker, timeout, and rate limiting help applications fail fast (see the sketch after this list).
  • Isolate Failure: A failure should not propagate and impact other services. Bulkhead Pattern helps maintain such a configuration. 
  • Self-healing system: Health checkpoints and scalability settings help ensure the system can tolerate a server or pod failure.
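A minimal fail-fast sketch with a timeout and fallback (callDownstreamService is a hypothetical remote call; production systems typically use a library such as Resilience4j for full circuit breakers):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

CompletableFuture<String> response = CompletableFuture
    .supplyAsync(() -> callDownstreamService()) // hypothetical remote call
    .orTimeout(500, TimeUnit.MILLISECONDS)      // fail fast instead of waiting indefinitely (Java 9+)
    .exceptionally(ex -> "fallback-response");  // degrade gracefully on timeout or failure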

Monitoring  

  • Health Monitoring: Applications should expose a health endpoint (e.g. the Spring Boot actuator in Java), which load balancers use to check instance health.
  • Golden Signals: Every application should monitor the four golden signals (latency, traffic, errors, and saturation) https://sre.google/sre-book/monitoring-distributed-systems/
  • Distributed Tracing: Distributed tracing to check downstream and upstream dependencies.
  • Infrastructure Monitoring: Monitor CPU and Memory usage.

Performance 

  • Stateless: Keep APIs stateless
  • Asynchronous: Asynchronous communication wherever possible.
  • Caching: Cache data for better performance wherever possible (a minimal sketch follows this list).
  • Connection Pool: Database and HTTP connection pools should be enabled wherever possible.  
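A minimal in-process caching sketch (the Product type and loadProductFromDb are hypothetical; production setups usually reach for Caffeine, Ehcache, or Redis, which add the eviction and TTL this sketch omits):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

Map<String, Product> cache = new ConcurrentHashMap<>();

Product getProduct(String id) {
    return cache.computeIfAbsent(id, key -> loadProductFromDb(key)); // loads once, then serves from memory
}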

Good to have  

  • Separate Datastores: A double-edged sword; per-service datastores improve autonomy but make cross-service queries and transactions harder, so separate them deliberately.
  • SAGA for Transaction Management: Commonly used pattern for transaction management in microservices. 
  • 12 Factor App: Generic best practices for developing a web application https://12factor.net/ 

Edge Computing

Edge Computing takes distributed computing close to the information source, rather than relying on centralized data centers. This approach is popular for systems that involve IoT devices.

https://innovationatwork.ieee.org/real-life-edge-computing-use-cases/

The idea is to keep computation near the source to reduce latency. Decisions can be made faster because data does not need to travel long distances. This also reduces the amount of data sent to central servers, as some filtering and analysis is done at the edge locations.

An Edge Computing architecture usually contains the following components:

  • Edge devices: Devices that collect data from sensors, cameras, and other sources. Examples include IoT devices, cameras, and industrial equipment.
  • Edge gateway: An edge gateway acts as a bridge between the edge devices and the back-end systems.
  • Edge server: This is a server located at the edge of the network that is responsible for processing and analyzing data. It can run applications and services that are optimized for low-latency and high-performance requirements.
  • Fog nodes: These are intermediate devices that sit between the edge devices and the cloud or data center. They are responsible for processing and analyzing data, similar to edge servers, but they are typically more powerful and capable of running more complex applications and services.
  • Cloud/Data center: The data that is processed at the edge is then sent to a cloud or data center for further analysis, storage and sharing.
  • Management and orchestration platform: This is a platform that manages and monitors the edge devices, gateways and servers, and allows for the deployment, configuration, and management of edge applications and services.