Postgres-Logstash-Elasticsearch

Use Case: You have a Postgres database and need to move its data to Elasticsearch for exploration.

Set Up Elasticsearch on Ubuntu

  1. sudo apt update
  2. sudo apt install default-jdk
  3. Import the Elasticsearch signing key: wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
  4. Add the Elasticsearch repository to the package manager: echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list
  5. Update the package manager: sudo apt update
  6. Install Elasticsearch: sudo apt install elasticsearch
  7. Configure Elasticsearch:
    • sudo vi /etc/elasticsearch/elasticsearch.yml
    • Inside the file, find the network.host setting and set it to the IP address of your server, or use 0.0.0.0 to listen on all network interfaces.
    • If network.host is 0.0.0.0, also set discovery.seed_hosts: ["127.0.0.1", "[::1]"] (for a single-node setup, discovery.type: single-node is a common alternative).
    • Enable security:
      • xpack.security.enabled: true
      • xpack.security.transport.ssl.enabled: true (transport TLS also needs a certificate or keystore configured, for example one generated with elasticsearch-certutil, or the node will not start)
  8. Start and enable Elasticsearch:
    • sudo systemctl start elasticsearch
    • sudo systemctl enable elasticsearch
  9. Set passwords for the built-in users, including elastic: sudo /usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive
  10. Verify the Elasticsearch installation (with security enabled, credentials are required): curl -u elastic -X GET http://localhost:9200

Set Up Logstash

  1. Install Logstash: https://www.elastic.co/downloads/logstash
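  2. Create a Logstash configuration file, for example postgresql.conf (a sample is shown below). The jdbc input also needs the PostgreSQL JDBC driver jar, which jdbc_driver_library must point to.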
  3. Run Logstash: bin/logstash -f /path/to/postgresql.conf

Sample postgresql.conf

input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/postgres"
    jdbc_user => "username"
    jdbc_password => "password"
    jdbc_driver_library => "/path/postgresql.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    statement => "select id, name, date_of_birth from employee;"
    jdbc_default_timezone => "UTC"
    jdbc_fetch_size => 1000
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "employee_index"
    # use the Postgres primary key as the document _id so re-runs update documents instead of duplicating them
    document_id => "%{id}"
    user => "elastic"
    password => "password"
  }
}

Cloud Native Application Design – Compute

Cloud-native application design rests on three core decisions: compute, database, and storage. Most cloud providers offer several options for each.

When starting with application design, the first decision the team needs to make is how and where the application will be deployed.

Virtual Machine: The age-old classic option is a virtual machine, where the development team takes complete control of setting up servers, managing the virtual machines, monitoring their health, scaling, etc. Examples are AWS EC2, Azure Virtual Machines, and Google Compute Engine.

  • Pros: Easiest to get started; gives the most control over the setup.
  • Cons: The dev team is responsible for managing the machines; usually the least cost-effective option.

Containers: Lightweight containers are a natural fit for a microservices-based design. Container orchestrators like Kubernetes provide off-the-shelf mechanisms for managing the health and scaling of containerized workloads. Most service providers offer managed Kubernetes services, such as Azure Kubernetes Service or Amazon Elastic Kubernetes Service.

  • Pros: Best suited for microservice-based solutions; lightweight, hence cost-effective; and easy to manage.
  • Cons: There is a learning curve to using tools like Docker and Kubernetes correctly.

Functions: The next option is Functions as a Service (FaaS). Again a good fit for microservices, it runs your code on demand. Service providers offer options like Azure Functions or AWS Lambda for building code as functions.

  • Pros: Can be cost-effective, since with the pay-per-execution model you only pay for execution time.
  • Cons: Does not fit all use cases, since not all scenarios are supported (most vendors limit how long a single execution can run), and there is vendor lock-in.

Specialized options: Apart from the options above, most service providers offer specialized platforms, such as Azure App Service and AWS Elastic Beanstalk, that let you deploy applications built on popular technologies like Java or Python directly.

  • Pros: Being managed services, they free the dev team from managing the underlying infrastructure so it can focus on development. They also provide features like monitoring and scaling out of the box.
  • Cons: There is a learning curve to understand the platform, and vendor lock-in, as the deployable is specific to the vendor.

Cloud Native Application Design – Load Balancing

Load balancing is an important technique in cloud-native application design for achieving scalability, reliability, and availability. Load can be distributed among nodes (physical machines or containers) using rules such as round robin, weighted, performance-based, or geographic distribution.
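To make the round robin and weighted rules concrete, here is a minimal Java sketch. The class names, backend names, and weights are hypothetical. Real load balancers add health checks and connection tracking on top of this core selection logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Round robin: each call hands out the next backend in order, wrapping around.
final class RoundRobinBalancer {
    private final List<String> backends;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<String> backends) {
        if (backends.isEmpty()) throw new IllegalArgumentException("no backends");
        this.backends = List.copyOf(backends);
    }

    String pick() {
        // floorMod keeps the index valid even after the counter overflows
        return backends.get(Math.floorMod(counter.getAndIncrement(), backends.size()));
    }
}

// Weighted round robin: a backend with weight w appears w times in the
// schedule, so it receives w / (sum of all weights) of the requests.
final class WeightedRoundRobinBalancer {
    private final List<String> schedule = new ArrayList<>();
    private final AtomicInteger counter = new AtomicInteger();

    WeightedRoundRobinBalancer(Map<String, Integer> weights) {
        weights.forEach((backend, weight) -> {
            for (int i = 0; i < weight; i++) schedule.add(backend);
        });
        if (schedule.isEmpty()) throw new IllegalArgumentException("no positive weights");
    }

    String pick() {
        return schedule.get(Math.floorMod(counter.getAndIncrement(), schedule.size()));
    }
}
```

This naive weighted schedule sends each backend's share as a consecutive run (for example a, a, a, b). Some production balancers interleave picks instead so a heavy backend does not receive bursts.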

Load balancing can be applied at the following levels:

DNS Level: DNS-level load balancing distributes incoming network traffic across multiple servers or IP addresses by having DNS (Domain Name System) servers resolve a domain name to different IP addresses. You can choose distribution rules based on need. For example, you might want traffic originating in Europe to hit European servers, while traffic from North America hits North American servers. When resolving the DNS query, the traffic manager picks the backend endpoint according to the configured rules.
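At its core, a geographic DNS rule is a lookup from the client's region to an endpoint. The sketch below is a hypothetical policy class, not a DNS server. The region names and the documentation-range IP addresses are placeholders, and it assumes the client's region is worked out elsewhere.

```java
import java.util.Map;

// Geo-DNS sketch: answer the query with the endpoint configured for the
// client's region, or a fallback endpoint when no rule matches.
final class GeoDnsPolicy {
    private final Map<String, String> endpointByRegion;
    private final String fallbackEndpoint;

    GeoDnsPolicy(Map<String, String> endpointByRegion, String fallbackEndpoint) {
        this.endpointByRegion = Map.copyOf(endpointByRegion);
        this.fallbackEndpoint = fallbackEndpoint;
    }

    // clientRegion is assumed to be derived elsewhere, e.g. from the resolver's IP
    String resolve(String clientRegion) {
        return endpointByRegion.getOrDefault(clientRegion, fallbackEndpoint);
    }
}
```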

Layer 7 or Application Layer: In Layer 7 load balancing, the load balancer analyzes the content of incoming requests, including HTTP headers, URLs, and other application-specific data, to decide how to distribute the traffic. For example, we can set rules so that requests matching /images are routed to one backend, while requests matching /videos go to another. Layer 7 load balancers can additionally provide features like SSL termination and a WAF (Web Application Firewall, which protects against threats like SQL injection and cross-site scripting (XSS) attacks).
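The /images versus /videos rule can be sketched as a longest-prefix router. This hypothetical class shows only the decision logic. In practice such rules live in the load balancer's configuration, for example path-based rules on Azure Application Gateway or AWS Application Load Balancer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Layer 7 sketch: route by URL path prefix, preferring the longest match.
final class PathRouter {
    private final Map<String, String> poolByPrefix = new LinkedHashMap<>();
    private final String defaultPool;

    PathRouter(String defaultPool) {
        this.defaultPool = defaultPool;
    }

    PathRouter route(String prefix, String pool) {
        poolByPrefix.put(prefix, pool);
        return this;
    }

    String poolFor(String path) {
        String best = null;
        for (String prefix : poolByPrefix.keySet()) {
            if (matches(path, prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best == null ? defaultPool : poolByPrefix.get(best);
    }

    // "/images" matches "/images" and "/images/cat.png" but not "/imagesx"
    private static boolean matches(String path, String prefix) {
        return path.equals(prefix) || path.startsWith(prefix.endsWith("/") ? prefix : prefix + "/");
    }
}
```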

Layer 4 or Transport Layer: Layer 4 load balancers route traffic based on basic criteria such as source IP address, destination IP address, source port, destination port, and protocol type. At the transport layer, the load balancer does not inspect the request content, so decisions can only be made at the IP or port level. At the same time, because no application data is parsed, overall performance is better.
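Because a Layer 4 balancer only sees connection metadata, a common approach is to hash that tuple. The minimal sketch below assumes a hypothetical backend list and uses a plain modulo hash. It shows that the same connection tuple always lands on the same backend, which keeps a TCP flow on one server.

```java
import java.util.List;
import java.util.Objects;

// Layer 4 sketch: hash the (source IP, source port, destination IP,
// destination port, protocol) tuple; no request content is inspected.
final class FlowHashBalancer {
    private final List<String> backends;

    FlowHashBalancer(List<String> backends) {
        if (backends.isEmpty()) throw new IllegalArgumentException("no backends");
        this.backends = List.copyOf(backends);
    }

    String pick(String srcIp, int srcPort, String dstIp, int dstPort, String protocol) {
        int hash = Objects.hash(srcIp, srcPort, dstIp, dstPort, protocol);
        return backends.get(Math.floorMod(hash, backends.size()));
    }
}
```

Because the hash is taken modulo the backend count, adding or removing a backend remaps most flows. Consistent hashing is the usual remedy when that matters.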

Related: https://kamalmeet.com/cloud-computing/azure-load-balancing-options/

Cloud Native Application Design – Data Security

Security in the cloud can be broadly categorized into the following three levels:

  • Infrastructure Security
  • Application Security
  • Data Security

Data Security

  • Encrypt data at rest and in transit: Cloud service providers offer mechanisms to secure your data at rest and in transit. Encryption is a big tool in your arsenal. For example, the simple step of using HTTPS instead of HTTP ensures your data is encrypted while in transit. Similarly, most cloud service providers offer encryption for disks and databases to secure stored data.
  • Data type-specific security: Certain requirements depend on the type of data you store. For example, if you store healthcare-related data, you will need to understand HIPAA (Health Insurance Portability and Accountability Act) requirements, and for payment card data you should check the PCI DSS (Payment Card Industry Data Security Standard). There may also be region-specific requirements, such as GDPR (General Data Protection Regulation) for personal data in Europe.
  • Avoid Weak Encryption: Though most cloud service providers let us encrypt our data, file systems, and disks, it is the architect's responsibility to make sure strong encryption is used. Key management services such as Azure Key Vault or AWS KMS can store encryption keys so they are not handled manually. Also, all APIs and pages that deal with sensitive data should use HTTPS.
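As a concrete example of strong encryption for data at rest, here is a hedged sketch using AES-256 in GCM mode from the standard Java Cryptography Architecture. GCM both encrypts and authenticates, so tampering is detected on decryption. The key is generated in memory for brevity. In a real deployment it would come from a key management service, which this sketch does not cover.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// AES-256-GCM sketch. Output layout: 12-byte IV followed by ciphertext+tag.
final class AesGcm {
    private static final int IV_BYTES = 12;   // 96-bit nonce, the recommended size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        return generator.generateKey();
    }

    static byte[] encrypt(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // a fresh IV per message is mandatory for GCM
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        byte[] out = new byte[IV_BYTES + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES);
        System.arraycopy(ciphertext, 0, out, IV_BYTES, ciphertext.length);
        return out;
    }

    static byte[] decrypt(SecretKey key, byte[] message) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
        // throws AEADBadTagException if the ciphertext or tag was modified
        return cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
    }
}
```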

Cloud Native Application Design – Application Security

Application Security: When deploying your application in the public cloud, you need to take every reasonable precaution to safeguard it from unauthorized access and attacks.

  • Infrastructure as Code: Avoid accessing and configuring resources manually. Use tools like Terraform, Ansible, or cloud-specific options to build infrastructure as code.
  • No direct access: If a resource does not need to be available externally, make sure all external access is blocked. For example, if a database is to be accessed only by a microservice, grant access only to that microservice and block everything else.
  • Automated Deployment: Deployments should not be done by manually placing deliverables on target machines. Automate the process with continuous delivery pipelines.
  • Layered Security Approach: Most cloud service providers encourage a layered approach to security, that is, implementing security rules at different layers such as the load balancer, application server, application code, and database. That way, even if one layer is compromised, your core application and data remain secure.
  • API Security (Authentication / Authorization): All APIs should be behind proper authentication and authorization. Note that a service exposed to the internet needs different security controls than a service that can only be accessed internally.
  • Common Application Threats: Attacks like code injection, SQL injection, and cross-site scripting (XSS) can target the application. It is the responsibility of the architect and development team to follow secure coding best practices that counter these attacks.
  • Perimeter Layer Attacks: DDoS (Distributed Denial of Service) is a common attack used to bring an application down. Most cloud service providers offer out-of-the-box solutions that help mitigate these threats.
  • Known Security Holes (OWASP): Make sure to understand and address common threats like broken access control, insufficient logging, and the use of old, insecure libraries. https://kamalmeet.com/uncategorized/owasp-top-10-security-threats/
  • Best Practices (API Gateways / Patterns): Use patterns like rate limiting, circuit breaker, and bulkhead to protect your application from attacks and cascading failures. Placing an API gateway in front of services ensures there is no direct access to a service. It also lets cross-cutting responsibilities like security, HTTPS offloading, and audit logging be offloaded from the main service.
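A circuit breaker stops calling a failing dependency for a while instead of piling more load on it. Below is a minimal single-class sketch. It is not the API of any particular library such as Resilience4j. The threshold and cool-down values are assumptions, and the clock is injected so the behavior is testable.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Minimal circuit breaker sketch (hypothetical class, not a library API).
//   CLOSED:    calls go through; consecutive failures are counted.
//   OPEN:      calls fail fast with the fallback until the cool-down elapses.
//   HALF_OPEN: a trial call is let through; success closes, failure re-opens.
final class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long coolDownMillis;
    private final LongSupplier clockMillis;
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    CircuitBreaker(int failureThreshold, long coolDownMillis, LongSupplier clockMillis) {
        this.failureThreshold = failureThreshold;
        this.coolDownMillis = coolDownMillis;
        this.clockMillis = clockMillis;
    }

    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clockMillis.getAsLong() - openedAt < coolDownMillis) {
                return fallback.get(); // fail fast, give the dependency time to recover
            }
            state = State.HALF_OPEN;
        }
        try {
            T result = action.get();
            state = State.CLOSED;
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clockMillis.getAsLong();
            }
            return fallback.get();
        }
    }

    synchronized State state() {
        return state;
    }
}
```

Holding the lock during the call keeps the sketch simple but serializes requests. A production implementation would update the state atomically without blocking callers.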

Cloud Native Application Design – Infrastructure Security

When talking about security in the cloud, we can broadly categorize it into the following three areas.

  • Infrastructure Security
  • Application Security
  • Data Security

Infrastructure Security

Infrastructure security is about making sure that the infrastructure we use is accessed only by authorized personnel, covering both physical and virtual access. An important aspect of cloud security is understanding that it is a responsibility shared between the public cloud provider and the development team.

  • Physical security: At the lowest level, one needs to consider that physical machines can be accessed and tampered with. This risk is higher with on-premises hardware than in the cloud. But even when choosing a cloud platform, it makes sense to understand and question the level of physical security the cloud service provider implements to prevent unauthorized access. This aspect is handled by the cloud service provider as its part of the shared responsibility.
  • Virtual access to infrastructure / Role-based access (RBAC): The next level is someone gaining virtual access to the machines manually, programmatically, or through malware. To mitigate this threat, use role-based access control so that only authorized personnel or code can access data, put security groups and firewalls in place, and keep security patches and antivirus definitions up to date.
  • Use Virtual Networks: Create Virtual Networks to group together resources needed by an application. For example, if a service API can only be accessed by an API gateway or a database should only be accessed by a particular microservice, we can make sure these components are in a virtual network and cannot be accessed from the outside world.
  • Manual Errors / Infrastructure as Code: A misconfiguration that exposes a VM through unwanted open ports is another risk. Implementing infrastructure as code, where automated scripts create and maintain the infrastructure, helps avoid such manual errors.
  • Storage / Data Access: Who can access a service, a file system, or a database? What kind of access is required? How can the resources be accessed? These questions should be answered before application development starts. Role-based access is an important tool that helps architects ensure proper security. For example, if a user or an application only needs read access to a file system or database, the rules should not allow any write or update access.
  • Audit Tracing: Most cloud service providers let you see all changes made to the infrastructure, including which resources were updated, by whom, and when. This is an important tool for teams to keep track of changes.
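The read-only example in the Storage / Data Access point can be sketched as a small RBAC check. The role names, principals, and the Operation enum are hypothetical. The point is that permissions attach to roles, principals get roles, and anything not explicitly granted is denied.

```java
import java.util.Map;
import java.util.Set;

// Operations a principal may perform on a resource (illustrative only).
enum Operation { READ, WRITE, DELETE }

// RBAC sketch: access changes by editing role assignments, not individual grants.
final class RoleBasedAccess {
    private final Map<String, Set<Operation>> operationsByRole;
    private final Map<String, Set<String>> rolesByPrincipal;

    RoleBasedAccess(Map<String, Set<Operation>> operationsByRole,
                    Map<String, Set<String>> rolesByPrincipal) {
        this.operationsByRole = Map.copyOf(operationsByRole);
        this.rolesByPrincipal = Map.copyOf(rolesByPrincipal);
    }

    boolean isAllowed(String principal, Operation operation) {
        for (String role : rolesByPrincipal.getOrDefault(principal, Set.of())) {
            if (operationsByRole.getOrDefault(role, Set.of()).contains(operation)) return true;
        }
        return false; // deny by default: anything not granted is refused
    }
}
```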

OWASP Top 10 Security Threats

Here are the OWASP (Open Web Application Security Project) Top 10 security threats: https://owasp.org/www-project-top-ten/

Broken Access Control: Proper access control checks are not implemented at each layer of the application. For example, a user can modify the API call /employee/{Id}, supply an {Id} manually, and fetch data they do not have access to. Similarly, users may be able to POST, PUT, or DELETE data they should not have access to, because there is no check at the API level. Other examples are a user manipulating JWT tokens to elevate privileges, or a CORS misconfiguration that allows access from untrusted origins.
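The /employee/{Id} case is fixed by checking the requested id against the authenticated caller on the server, on every request. The sketch below assumes a hypothetical policy where a caller may read their own record or a direct report's record. The in-memory manager map stands in for a database lookup.

```java
import java.util.Map;

// Access check for GET /employee/{id}: the id arrives from the client, so it
// must be compared with the authenticated caller before any data is returned.
final class EmployeeAccessPolicy {
    // Hypothetical lookup: employeeId -> managerId (would come from the database)
    private final Map<String, String> managerByEmployee;

    EmployeeAccessPolicy(Map<String, String> managerByEmployee) {
        this.managerByEmployee = Map.copyOf(managerByEmployee);
    }

    // callerId must come from the verified session or token, never from the request path
    boolean canRead(String callerId, String requestedEmployeeId) {
        if (callerId.equals(requestedEmployeeId)) return true;              // own record
        return callerId.equals(managerByEmployee.get(requestedEmployeeId)); // direct manager
    }
}
```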

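Cryptographic Failures: Data in transit is not encrypted with TLS (HTTPS). Sensitive data like passwords is not properly protected (passwords should be stored with a strong, salted, one-way hash rather than reversible encryption). Data at rest is not encrypted. Sensitive information is not masked. The encryption algorithms used are not strong enough.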
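For the password point, the usual approach is a salted, deliberately slow hash. Here is a sketch using PBKDF2WithHmacSHA256, which ships with the JDK. The 600,000-iteration count follows current OWASP guidance for this algorithm but should be tuned to your hardware. The "iterations:salt:hash" storage format is an assumption of this sketch.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Password storage sketch: PBKDF2 with a random per-password salt.
// Stored format (assumed): iterations:saltBase64:hashBase64
final class PasswordHasher {
    private static final int ITERATIONS = 600_000;
    private static final int SALT_BYTES = 16;
    private static final int KEY_BITS = 256;
    private static final SecureRandom RANDOM = new SecureRandom();

    static String hash(char[] password) throws GeneralSecurityException {
        byte[] salt = new byte[SALT_BYTES];
        RANDOM.nextBytes(salt); // unique salt per password defeats precomputed tables
        byte[] hash = derive(password, salt, ITERATIONS);
        Base64.Encoder b64 = Base64.getEncoder();
        return ITERATIONS + ":" + b64.encodeToString(salt) + ":" + b64.encodeToString(hash);
    }

    static boolean verify(char[] password, String stored) throws GeneralSecurityException {
        String[] parts = stored.split(":");
        int iterations = Integer.parseInt(parts[0]);
        byte[] salt = Base64.getDecoder().decode(parts[1]);
        byte[] expected = Base64.getDecoder().decode(parts[2]);
        // constant-time comparison avoids leaking how many bytes matched
        return MessageDigest.isEqual(expected, derive(password, salt, iterations));
    }

    private static byte[] derive(char[] password, byte[] salt, int iterations)
            throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } finally {
            spec.clearPassword();
        }
    }
}
```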

Injection: Data received from users is not validated or sanitized. Queries are built without proper escaping or parameterization. SQL queries are not reviewed for injection risks.

Security Logging and Monitoring Failures: Ensure log data is encoded correctly. Ensure high-value transactions have an audit trail. Ensure all login, access control, and server-side input validation failures are logged.
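"Encoded correctly" matters because untrusted input containing a newline can forge an extra log line. The hypothetical helper below escapes CR, LF, tab, and other control characters before a value is logged. It is a sketch of the idea. Structured (for example JSON) logging achieves the same by construction.

```java
// Encodes untrusted values before they reach a log line so an attacker cannot
// inject newlines and forge entries (log injection).
final class LogSanitizer {
    static String sanitize(String untrusted) {
        if (untrusted == null) return "null";
        StringBuilder sb = new StringBuilder(untrusted.length());
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            switch (c) {
                case '\r': sb.append("\\r"); break;
                case '\n': sb.append("\\n"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // any other control character becomes a visible \\uXXXX escape
                    if (Character.isISOControl(c)) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Usage: log `LogSanitizer.sanitize(username)` rather than the raw value, so an input like "bob\nINFO admin login succeeded" stays on one line.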

Vulnerable and outdated components: If underlying components or libraries are not kept up to date, this will increase the risk of vulnerabilities in the system.

Identification and Authentication Failures: Automated or scripted attacks (such as credential stuffing and brute force) are not handled. Weak passwords are allowed. Multi-factor authentication is not used. Old sessions and tokens are not invalidated.

Security Misconfiguration: Unnecessary ports are left open, default accounts are not disabled, and security patches are not applied.

Software and Data Integrity Failures: Software and data are trusted without verifying them. Use digital signatures to confirm that they come from the expected source and have not been altered.
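Signature checking can be sketched with the JDK's java.security.Signature API. The publisher signs an artifact with its private key, and consumers verify it with the publisher's public key before trusting it. SHA256withRSA is one standard choice. The ArtifactSigner class is a hypothetical wrapper for illustration.

```java
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Integrity sketch: sign with the publisher's private key, verify with its public key.
final class ArtifactSigner {
    private static final String ALGORITHM = "SHA256withRSA";

    static byte[] sign(PrivateKey key, byte[] artifact) throws GeneralSecurityException {
        Signature signer = Signature.getInstance(ALGORITHM);
        signer.initSign(key);
        signer.update(artifact);
        return signer.sign();
    }

    // false means the artifact was altered after signing (or signed by someone else)
    static boolean verify(PublicKey key, byte[] artifact, byte[] signature) throws GeneralSecurityException {
        Signature verifier = Signature.getInstance(ALGORITHM);
        verifier.initVerify(key);
        verifier.update(artifact);
        return verifier.verify(signature);
    }
}
```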

Insecure Design: Best practices like threat modeling are not being followed.

Server-Side Request Forgery (SSRF): The application fetches a remote resource without validating the user-supplied URL.
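A common SSRF defense is to parse the user-supplied URL and accept it only if it matches an explicit allowlist. The sketch below assumes https-only and a hypothetical host allowlist. It rejects other schemes, embedded credentials, and any host not on the list, including cloud metadata addresses like 169.254.169.254.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// SSRF guard sketch: deny by default, allow only https URLs to listed hosts.
final class OutboundUrlValidator {
    private final Set<String> allowedHosts;

    OutboundUrlValidator(Set<String> allowedHosts) {
        this.allowedHosts = allowedHosts.stream()
                .map(h -> h.toLowerCase(Locale.ROOT))
                .collect(Collectors.toUnmodifiableSet());
    }

    boolean isAllowed(String userSuppliedUrl) {
        URI uri;
        try {
            uri = new URI(userSuppliedUrl);
        } catch (URISyntaxException e) {
            return false; // unparseable input is rejected, not "fixed up"
        }
        if (!"https".equalsIgnoreCase(uri.getScheme())) return false; // blocks file:, http:, etc.
        if (uri.getRawUserInfo() != null) return false;               // blocks https://trusted@evil.host
        String host = uri.getHost();
        return host != null && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
    }
}
```

A hostname allowlist does not stop an allowed name from resolving to an internal address. For defense in depth, also check the resolved IP before connecting.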

Case Study: How Razorpay’s Notification Service Handles Increasing Load

An interesting read on how event prioritization and a data stream for handling data asynchronously helped the team improve the system's performance and handle corner cases.

  1. They prioritized events so that important events do not suffer.
  2. They introduced a layer (stream) instead of writing directly to the database.
  3. They reduced consume priority when the time taken went beyond a limit.
  4. They used rate limiting to filter out probable DoS events.

https://engineering.razorpay.com/how-razorpays-notification-service-handles-increasing-load-f787623a490f
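Point 4 can be illustrated with a token bucket, a common rate-limiting algorithm. This sketch is not taken from the Razorpay post and is not their implementation. The capacity and refill rate are assumptions, and the nanosecond clock is injected so it can be tested. In production, System::nanoTime would be passed in.

```java
import java.util.function.LongSupplier;

// Token bucket: holds at most `capacity` tokens, refilled continuously at
// `refillPerSecond`. Each request spends one token; an empty bucket means reject.
final class TokenBucket {
    private final double capacity;
    private final double refillPerSecond;
    private final LongSupplier nanoClock;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(int capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full, so an initial burst is allowed
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        double elapsedSeconds = (now - lastRefillNanos) / 1e9;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillNanos = now;
        if (tokens < 1) return false;
        tokens -= 1;
        return true;
    }
}
```

Usage: `new TokenBucket(100, 50, System::nanoTime)` allows bursts of up to 100 requests and a sustained rate of 50 per second. A separate bucket per caller or merchant would keep one noisy source from starving the others.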

Rule Engine vs Recommendation Engine

In a recent team discussion, I heard the terms Rule Engine and Recommendation Engine used interchangeably. This was confusing but understandable, since the two overlap when we try to make a decision based on given inputs. Strictly speaking, they are complementary technologies that together can help reach a final decision in a complex situation.

What is a Rule Engine?

A rule engine is mostly built around a specific business scenario, where a set of rules determines the outcome.

Use cases:

  • An insurer determines whether a candidate meets eligibility requirements.
  • A retailer decides which customers get free shipping and a discount.

https://www.toptal.com/java/rules-engines-power-to-the-smeople

Implementation: A lot has been written about implementing rule engines, so rather than reinventing it here, let me borrow an example that uses the Easy Rules library:

Source: https://www.baeldung.com/java-rule-engines

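import org.jeasy.rules.annotation.Action;
import org.jeasy.rules.annotation.Condition;
import org.jeasy.rules.annotation.Rule;

// Easy Rules: the @Condition method decides whether the rule fires, and the
// @Action method runs when it does. This rule's condition is always true.
@Rule(name = "Hello World rule", description = "Always say hello world")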
public class HelloWorldRule {

    @Condition
    public boolean when() {
        return true;
    }

    @Action
    public void then() throws Exception {
        System.out.println("hello world");
    }
}
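To show what the engine does under the hood without depending on Easy Rules, here is a plain-Java sketch of the retailer use case. Each rule is a condition plus an action, the same shape as the @Condition/@Action pair above. The Order fields, thresholds, and class names are hypothetical, and this is not Easy Rules' API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical facts for the retailer use case; rules read and update them.
final class Order {
    final double total;
    final boolean loyaltyMember;
    boolean freeShipping;
    int discountPercent;

    Order(double total, boolean loyaltyMember) {
        this.total = total;
        this.loyaltyMember = loyaltyMember;
    }
}

// A rule is a named condition plus an action.
final class SimpleRule<T> {
    final String name;
    final Predicate<T> condition;
    final Consumer<T> action;

    SimpleRule(String name, Predicate<T> condition, Consumer<T> action) {
        this.name = name;
        this.condition = condition;
        this.action = action;
    }
}

final class SimpleRuleEngine<T> {
    private final List<SimpleRule<T>> rules = new ArrayList<>();

    SimpleRuleEngine<T> register(SimpleRule<T> rule) {
        rules.add(rule);
        return this;
    }

    // Checks every rule in registration order and runs the action of each match;
    // returns the names of the rules that fired.
    List<String> fire(T facts) {
        List<String> fired = new ArrayList<>();
        for (SimpleRule<T> rule : rules) {
            if (rule.condition.test(facts)) {
                rule.action.accept(facts);
                fired.add(rule.name);
            }
        }
        return fired;
    }
}
```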

What is a Recommendation Engine?

A recommendation engine is a common feature on almost every website these days, be it an e-commerce site recommending products based on your purchase history or an OTT platform recommending videos to watch next. It takes into account a user's historical data plus data from other users with similar histories and tries to predict what the user will like. For example, users who liked movies A, B, and C ended up watching movie D.
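The "A, B, and C, then D" reasoning can be sketched as a bare-bones form of user-based collaborative filtering. Other users are weighted by how many items they share with the target user. Unseen items are then scored by the summed weights of the users who watched them. The viewing history is hypothetical, and real systems use proper similarity measures and far more data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal user-based collaborative filtering by history overlap.
final class CoOccurrenceRecommender {
    private final Map<String, Set<String>> historyByUser; // hypothetical: user -> items watched

    CoOccurrenceRecommender(Map<String, Set<String>> historyByUser) {
        this.historyByUser = historyByUser;
    }

    List<String> recommend(String user, int limit) {
        Set<String> seen = historyByUser.getOrDefault(user, Set.of());
        Map<String, Integer> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> other : historyByUser.entrySet()) {
            if (other.getKey().equals(user)) continue;
            int overlap = (int) other.getValue().stream().filter(seen::contains).count();
            if (overlap == 0) continue; // no shared taste, no vote
            for (String item : other.getValue()) {
                if (!seen.contains(item)) scores.merge(item, overlap, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        // highest score first; ties broken alphabetically for a stable order
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ranked.subList(0, Math.min(limit, ranked.size()));
    }
}
```

With histories u1 = {A, B, C}, u2 = {A, B, C, D}, and u3 = {A, E}, the top recommendation for u1 is D. It is backed by u2, who shares three items with u1, while E is backed only by u3 with one shared item.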

Use cases

  • Recommend a video to be watched next
  • Recommend the next movie to be watched

Implementation Example:

https://analyticsindiamag.com/top-open-source-recommender-systems-in-python-for-your-ml-project/

Relationship between Rule Engine and Recommendation Engine?

Rules can be applied on top of predictions produced by machine learning models. Reference read:

https://www.capitalone.com/tech/machine-learning/rules-vs-machine-learning/
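A small sketch of the two working together: a recommendation model supplies a predicted score per item, and deterministic business rules decide which items may be shown. The scores, the in-stock and age-restriction rules, and the class name are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Rules on top of predictions: model scores rank items, rules filter them.
final class RuleFilteredRecommender {
    private final List<Predicate<String>> rules;

    RuleFilteredRecommender(List<Predicate<String>> rules) {
        this.rules = List.copyOf(rules);
    }

    // predictedScores: item -> score produced by a recommendation model (assumed input)
    List<String> recommend(Map<String, Double> predictedScores, int limit) {
        List<String> eligible = new ArrayList<>();
        for (String item : predictedScores.keySet()) {
            // an item survives only if every business rule accepts it
            if (rules.stream().allMatch(rule -> rule.test(item))) eligible.add(item);
        }
        eligible.sort((a, b) -> {
            int byScore = Double.compare(predictedScores.get(b), predictedScores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return eligible.subList(0, Math.min(limit, eligible.size()));
    }
}
```

Keeping the rules outside the model means a business constraint can change without retraining. The model still decides the ordering among the items the rules allow.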