Saturday, November 5, 2016

Composite versus Decorator versus Proxy

From GOF

Composite and Decorator have similar structure diagrams, reflecting the fact that both rely on recursive composition to organize an open-ended number of objects. This commonality might tempt you to think of a decorator object as a degenerate composite, but that misses the point of the Decorator pattern. The similarity ends at recursive composition, again because of differing intents.

Decorator is designed to let you add responsibilities to objects without sub-classing. It avoids the explosion of sub-classes that can arise from trying to cover every combination of responsibilities statically. Composite has a different intent. It focuses on structuring classes so that many related objects can be treated uniformly, and multiple objects can be treated as one. Its focus is not on embellishment but on representation.

These intents are distinct but complementary. Consequently, the Composite and Decorator patterns are often used in concert. Both lead to the kind of design in which you can build applications just by plugging objects together without defining any new classes. There will be an abstract class with some sub-classes that are composites, some that are decorators, and some that implement the fundamental building blocks of the system. In this case, both composites and decorators will have a common interface. From the point of view of the Decorator pattern, a composite is a Concrete Component. From the point of view of the Composite pattern, a decorator is a Leaf. Of course, they don't have to be used together and, as we have seen, their intents are quite different.

Another pattern with a structure similar to Decorator's is Proxy. Both patterns describe how to provide a level of indirection to an object, and the implementations of both the proxy and decorator object keep a reference to another object to which they forward requests. Once again, however, they are intended for different purposes.

Like Decorator, the Proxy pattern composes an object and provides an identical interface to clients. Unlike Decorator, the Proxy pattern is not concerned with attaching or detaching properties dynamically, and it's not designed for recursive composition. Its intent is to provide a stand-in for a subject when it's inconvenient or undesirable to access the subject directly because, for example, it lives on a remote machine, has restricted access, or is persistent.

In the Proxy pattern, the subject defines the key functionality, and the proxy provides (or refuses) access to it. In Decorator, the component provides only part of the functionality, and one or more decorators furnish the rest. Decorator addresses the situation where an object's total functionality can't be determined at compile time, at least not conveniently. That open-endedness makes recursive composition an essential part of Decorator. That isn't the case in Proxy, because Proxy focuses on one relationship—between the proxy and its subject—and that relationship can be expressed statically.

These differences are significant because they capture solutions to specific recurring problems in object-oriented design. But that doesn't mean these patterns can't be combined. You might envision a proxy-decorator that adds functionality to a proxy, or a decorator-proxy that embellishes a remote object. Although such hybrids might be useful (we don't have real examples handy), they are divisible into patterns that are useful.


Code Examples

Composite
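A minimal Java sketch of Composite (the names Shape, Circle, and CompositeShape are illustrative, not from any library): the composite holds children behind the same interface as the leaves, so one object and a whole tree of objects can be treated uniformly.

import java.util.ArrayList;
import java.util.List;

// Common component interface: clients treat leaves and composites uniformly.
interface Shape {
    void draw();
}

// Leaf: a fundamental building block.
class Circle implements Shape {
    public void draw() { System.out.println("Drawing a circle"); }
}

// Composite: holds children behind the same interface and forwards to them.
class CompositeShape implements Shape {
    private final List<Shape> children = new ArrayList<>();
    public void add(Shape child) { children.add(child); }
    public void draw() {
        for (Shape child : children) {
            child.draw();  // recursive composition: a child may itself be a composite
        }
    }
}

public class CompositeDemo {
    public static void main(String[] args) {
        CompositeShape drawing = new CompositeShape();
        drawing.add(new Circle());
        CompositeShape group = new CompositeShape();
        group.add(new Circle());
        drawing.add(group);   // composites nest to any depth
        drawing.draw();       // one call renders the whole tree
    }
}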


Decorator
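A minimal Java sketch of Decorator (Message, PlainMessage, and the decorator names are illustrative): responsibilities are added by wrapping, not by sub-classing, and decorators stack in any combination at run time.

// Component interface shared by the plain object and its decorators.
interface Message {
    String getText();
}

// Concrete component: the fundamental building block.
class PlainMessage implements Message {
    public String getText() { return "hello"; }
}

// Decorator base: wraps another Message and forwards to it.
abstract class MessageDecorator implements Message {
    protected final Message inner;
    MessageDecorator(Message inner) { this.inner = inner; }
}

// Concrete decorators add responsibilities without sub-classing PlainMessage.
class UpperCaseDecorator extends MessageDecorator {
    UpperCaseDecorator(Message inner) { super(inner); }
    public String getText() { return inner.getText().toUpperCase(); }
}

class ExclaimDecorator extends MessageDecorator {
    ExclaimDecorator(Message inner) { super(inner); }
    public String getText() { return inner.getText() + "!"; }
}

public class DecoratorDemo {
    public static void main(String[] args) {
        // Decorators compose recursively, in any order, at run time.
        Message m = new ExclaimDecorator(new UpperCaseDecorator(new PlainMessage()));
        System.out.println(m.getText());  // prints: HELLO!
    }
}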



Proxy
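A minimal Java sketch of a virtual proxy (Image, RealImage, and ImageProxy are illustrative names): the proxy stands in for the subject and controls access to it, here by deferring an expensive creation.

// Subject interface shared by the real object and its proxy.
interface Image {
    void display();
}

// Real subject: expensive to create (e.g., loads a large file).
class RealImage implements Image {
    private final String fileName;
    RealImage(String fileName) {
        this.fileName = fileName;
        System.out.println("Loading " + fileName);  // costly work
    }
    public void display() { System.out.println("Displaying " + fileName); }
}

// Virtual proxy: keeps a reference to the subject and forwards requests to it.
class ImageProxy implements Image {
    private final String fileName;
    private RealImage real;  // created lazily, only when actually needed
    ImageProxy(String fileName) { this.fileName = fileName; }
    public void display() {
        if (real == null) {
            real = new RealImage(fileName);
        }
        real.display();  // forward the request to the real subject
    }
}

public class ProxyDemo {
    public static void main(String[] args) {
        Image image = new ImageProxy("photo.png");
        image.display();  // first call loads, then displays
        image.display();  // subsequent calls reuse the loaded image
    }
}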



Adapter versus Bridge design pattern


From GOF

The Adapter and Bridge patterns have some common attributes. Both promote flexibility by providing a level of indirection to another object. Both involve forwarding requests to this object from an interface other than its own.

The key difference between these patterns lies in their intents. Adapter focuses on resolving incompatibilities between two existing interfaces. It doesn't focus on how those interfaces are implemented, nor does it consider how they might evolve independently. It's a way of making two independently designed classes work together without re-implementing one or the other. Bridge, on the other hand, bridges an abstraction and its (potentially numerous) implementations. It provides a stable interface to clients even as it lets you vary the classes that implement it. It also accommodates new implementations as the system evolves.

As a result of these differences, Adapter and Bridge are often used at different points in the software life-cycle. An adapter often becomes necessary when you discover that two incompatible classes should work together, generally to avoid replicating code. The coupling is unforeseen. In contrast, the user of a bridge understands up-front that an abstraction must have several implementations, and both may evolve independently. The Adapter pattern makes things work after they're designed; Bridge makes them work before they are. That doesn't mean Adapter is somehow inferior to Bridge; each pattern merely addresses a different problem.

You might think of a facade as an adapter to a set of other objects. But that interpretation overlooks the fact that a facade defines a new interface, whereas an adapter reuses an old interface. Remember that an adapter makes two existing interfaces work together as opposed to defining an entirely new one.

Please refer to the examples below.

Adapter
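A minimal Java sketch of Adapter (Temperature and FahrenheitSensor are made-up names for illustration): an existing, incompatible class is wrapped so it satisfies the interface the client expects.

// Target interface the client expects.
interface Temperature {
    double getCelsius();
}

// Adaptee: an existing, incompatible class we cannot change.
class FahrenheitSensor {
    double readFahrenheit() { return 98.6; }
}

// Adapter: converts the adaptee's interface into the target interface.
class FahrenheitAdapter implements Temperature {
    private final FahrenheitSensor sensor;
    FahrenheitAdapter(FahrenheitSensor sensor) { this.sensor = sensor; }
    public double getCelsius() {
        return (sensor.readFahrenheit() - 32) * 5.0 / 9.0;
    }
}

public class AdapterDemo {
    public static void main(String[] args) {
        Temperature t = new FahrenheitAdapter(new FahrenheitSensor());
        System.out.println(t.getCelsius());  // prints 37.0
    }
}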


Bridge
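A minimal Java sketch of Bridge (Notification and MessageSender are made-up names for illustration): the abstraction holds a reference to the implementor, so the two hierarchies vary independently and the implementation is picked at run time.

// Implementor hierarchy: how a message gets sent.
interface MessageSender {
    void send(String text);
}

class EmailSender implements MessageSender {
    public void send(String text) { System.out.println("Email: " + text); }
}

class SmsSender implements MessageSender {
    public void send(String text) { System.out.println("SMS: " + text); }
}

// Abstraction hierarchy: what kind of notification it is.
// It holds a reference to the implementor; both sides evolve independently.
abstract class Notification {
    protected final MessageSender sender;
    Notification(MessageSender sender) { this.sender = sender; }
    abstract void notifyUser(String text);
}

class UrgentNotification extends Notification {
    UrgentNotification(MessageSender sender) { super(sender); }
    void notifyUser(String text) { sender.send("[URGENT] " + text); }
}

public class BridgeDemo {
    public static void main(String[] args) {
        // Swap the implementation without touching the abstraction.
        new UrgentNotification(new SmsSender()).notifyUser("server down");
        new UrgentNotification(new EmailSender()).notifyUser("server down");
    }
}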


Friday, October 28, 2016

Information Retrieval - Notes



Information Retrieval:

The process of finding documents of an unstructured nature (usually converted to text) that satisfy an information need, from within a large collection.

Example :
  • Email search 
  • Desktop search 
  • Legal Search 
  • Search knowledge base
  • Web search - Google/Bing etc.
Unstructured data is growing day by day compared to structured data.

Information retrieval can operate on a static or a dynamic collection.

Normal Flow for static collection

1. Feed the collection of documents to the search engine
2. Frame a query for the information need
3. Analyze the results
4. Refine the query (add or remove terms, add the correct operators) if the results are not as expected
5. Search with the refined query, and continue until you get what you want.

How do you compare two information retrieval engines?

Precision: The fraction of retrieved documents that are relevant to the user's information need.
If we get 10 documents as results and only 2 are relevant to the user, the precision is 2/10.

Recall: The fraction of relevant documents in the collection that are retrieved.
If the collection actually contains 20 relevant documents but the query retrieves 10 documents, of which only 2 are relevant, then the recall is 2/20.


The following can affect precision and recall:

1. Query formulation
2. The underlying search-engine logic for indexing and searching.


Techniques 


Linear Scan / Exhaustive Search

Search all documents line by line for the query, for example using grep on Linux.
This approach is time consuming and might need multiple scans.
It cannot process complex queries such as proximity queries or ranked search results.

It is time and memory inefficient.

The core technique to fix this is to build an inverted index.

Document-Term incidence matrix

Example :

Say we have 4 documents that together contain 4 distinct terms.

----------------------------------------------------------
 term        doc1     doc2     doc3     doc4
----------------------------------------------------------
 antony       0        1        0        1
 brutus       1        1        1        0
 caesar       1        1        1        1
----------------------------------------------------------

1 means the term is present in the document; 0 means it is not.

The rows are the distinct terms across all documents.



It is not feasible to build this matrix for even, say, 1 million documents.

For a large collection this matrix is sparse.
Instead of storing document ids (postings) for unmatched documents, we should store only the document ids (postings) of matched documents.


Query Processing 

Logical Operators like AND

Example

-------------------------------------------------------
| term       | postings (already sorted) |
-------------------------------------------------------
| brutus     | 1, 4, 6, 8, 10            |
| caesar     | 3, 6, 8, 11, 12           |
-------------------------------------------------------
                       

Say your query is "brutus AND caesar".

Then you need to intersect the postings lists of the two terms.
The result will be 6, 8.

You can keep each postings list sorted and intersect them with a merge-sort-style subroutine, as in the sketch below.
For multi-term queries you can do an N-way merge in memory, loading batches of postings one by one as needed.
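A minimal Java sketch of the two-list AND merge (the postings arrays and the intersect method are illustrative):

import java.util.ArrayList;
import java.util.List;

public class PostingsMerge {
    // Intersect two sorted postings lists in O(m + n), like the merge step of merge sort.
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {          // document matches both terms
                result.add(a[i]);
                i++; j++;
            } else if (a[i] < b[j]) {
                i++;                     // advance the pointer with the smaller doc id
            } else {
                j++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] brutus = {1, 4, 6, 8, 10};
        int[] caesar = {3, 6, 8, 11, 12};
        System.out.println(intersect(brutus, caesar));  // prints [6, 8]
    }
}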

Phrase search 

Say you want to search for "San Francisco"; then we need to find entries where "San" and "Francisco" are placed next to each other, in that order.

One technique is the biword index: index every two consecutive words as one term, e.g., "San Francisco". But what will you do for three-word and four-word phrases?

A biword index is not the standard solution for phrase search.

We need to store positions for each posting in the postings list.

Say we have

-------------------------------------------------------
| term       | postings (already sorted) |
-------------------------------------------------------
| San        | 1, 4, 6, 8, 10            |
| Francisco  | 4, 8, 22                  |
-------------------------------------------------------

Now we will have a positional index for each posting.

For the word "San" we have

-------------------------------------------------------
| posting    | offsets         |
-------------------------------------------------------
| 1          | 7, 8, 10, 100   |
| 4          | 1               |
| 6          | 4, 8            |
| 8          | 7, 9, 10        |
| 10         | 11, 12          |
-------------------------------------------------------

For the word "Francisco" we have

-------------------------------------------------------
| posting    | offsets         |
-------------------------------------------------------
| 4          | 2, 6, 9         |
| 8          | 15, 60          |
| 22         | 1, 2, 3         |
-------------------------------------------------------

For the phrase search "San Francisco" we get result posting 4, because in document 4 "San" is at offset 1 and "Francisco" is at offset 2, so the two terms are next to one another.

Now, while merging postings, you also have to consider the offsets.
Because of this the merge becomes a complex and non-trivial problem; a simplified sketch of the adjacency check follows.
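A simplified Java sketch of the adjacency check for one matching document (it assumes the two sorted offset arrays for that document are already in hand; a full phrase query would first intersect the postings lists as above):

public class PhraseCheck {
    // True if some offset of the second term is exactly one past an offset of the first,
    // i.e., the two terms are adjacent in this document.
    static boolean adjacent(int[] firstOffsets, int[] secondOffsets) {
        int i = 0, j = 0;
        while (i < firstOffsets.length && j < secondOffsets.length) {
            int wanted = firstOffsets[i] + 1;
            if (secondOffsets[j] == wanted) {
                return true;
            } else if (secondOffsets[j] < wanted) {
                j++;   // second-term offset too small: advance it
            } else {
                i++;   // second-term offset too large: advance the first term
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Document 4 from the tables above: "San" at offset 1, "Francisco" at offset 2.
        System.out.println(adjacent(new int[]{1}, new int[]{2, 6, 9}));  // true
    }
}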

You can also do proximity search, i.e., allow some number of words between the query words.

Positional indexes are costly: they increase the index size by approximately 30% and waste about 25% of search time.
So we have to use mixed techniques: for common phrases use biwords, e.g., index "tale of two cities" both as one term and as biwords.
For rare words we can use the positional index.
So a mix of strategies can help. This can also be driven by search statistics: if some phrase is searched again and again, index the whole phrase as a single term instead of relying on the positional index.

Boolean search vs ranked search

All of the discussion above relates to Boolean search, which can answer yes/no questions. But what about relevancy? We want the most relevant documents first, then the less relevant ones.

With Boolean search we might get too few results with ANDs and too many with ORs, which is feast or famine.

So the art of forming the perfect query really lies with the user; the user needs to be an expert at it, and we do not want to increase the user's learning curve.
Ranked search comes to the rescue, so that the user always gets the top relevant/scored documents first.
Ranked retrieval is essentially search results sorted by some kind of relevancy score, generally between 0 and 1.

Ranked retrieval is generally used for free-text search, but for well-defined fields with a definite set of values Boolean search is better.

Ranked Search - Scoring Models

Jaccard Coefficient

Commonly used to measure the extent of overlap of two sets A and B.

Jaccard(A,B) = |A ∩ B| / |A ∪ B|
Jaccard(A,A) = |A ∩ A| / |A ∪ A| = |A|/|A| = 1

If A ∩ B = ∅ then
Jaccard(A,B) = 0

A and B may not have the same size, which skews the score against the larger set.

Example

Query Q = "ides of march"
Document D1 = "caesar died in march"
Document D2 = "the long march"
Document D3 = "the ides march"


Jaccard(Q,D1) = |Q ∩ D1| / |Q ∪ D1| = 1/6 ≈ 0.17

Jaccard(Q,D2) = |Q ∩ D2| / |Q ∪ D2| = 1/5 = 0.2

Jaccard(Q,D3) = |Q ∩ D3| / |Q ∪ D3| = 2/4 = 0.5


Observe: the document with the smaller number of words wins.
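A minimal Java sketch of Jaccard over word sets, reproducing the numbers above:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class JaccardDemo {
    // Jaccard(A, B) = |A ∩ B| / |A ∪ B| over the sets of words.
    static double jaccard(String a, String b) {
        Set<String> setA = new HashSet<>(Arrays.asList(a.toLowerCase().split("\\s+")));
        Set<String> setB = new HashSet<>(Arrays.asList(b.toLowerCase().split("\\s+")));
        Set<String> intersection = new HashSet<>(setA);
        intersection.retainAll(setB);
        Set<String> union = new HashSet<>(setA);
        union.addAll(setB);
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        String q = "ides of march";
        System.out.println(jaccard(q, "caesar died in march"));  // 1/6 ≈ 0.17
        System.out.println(jaccard(q, "the long march"));        // 1/5 = 0.2
        System.out.println(jaccard(q, "the ides march"));        // 2/4 = 0.5
    }
}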

Jaccard's scoring problems

1. It does not consider term frequency; generally a document where the query terms occur more often, and where those terms are rare across the rest of the collection, is more relevant.

2. It needs normalization for document length, because otherwise shorter documents always beat longer ones.


Term frequency weighting

We can build the document-term incidence matrix with actual term frequencies instead of 0/1 values.

----------------------------------------------------------
 term        doc1     doc2     doc3     doc4
----------------------------------------------------------
 antony       0        10       0        12
 brutus       11       100      50       11
 caesar       0        0        1        1000
----------------------------------------------------------

But if two documents contain the same words in a different arrangement, both documents will have the same score.
Example :

Document D1 = Jane is quicker than John
Document D2 = John is quicker than Jane

Documents D1 and D2 have the same frequency vector.

We need positional information for phrase search.


Term Frequency (tf-t,d)

tf-t,d = the number of times term t occurs in document d.

So a document where the query terms are more frequent is more relevant than one where they have low frequency. But that is not always the case: if a word is very common across the whole collection, the importance of that word should be reduced. So we need to dampen globally common words but boost locally frequent ones.

We can dampen the value by taking the log of tf-t,d:

w-t,d = 1 + log(tf-t,d)   if tf-t,d > 0
w-t,d = 0                 if tf-t,d = 0


The score is the sum of w-t,d over all query terms that appear in the document:

Score = ∑ t ∈ q ∩ d  w-t,d

Inverse Document Frequency 

If a document contains query terms that are rare across the collection but frequent within that particular document, its score should be higher, because rare terms are generally more informative than frequent ones.
For example, frequent words like hello, hi, all should be scored low.

df-t = the number of documents that contain term t. Repeated occurrences within the same document are not counted.

df-t <= N

N = total number of documents in the collection

idf = inverse document frequency

idf-t = log(N/df-t)

Here the log function is used to dampen the value.
The idf value is between 0 and log N.

Example

N = 1000000 = 1 million

---------------------------------------------------------------------------------
 term            df-t           idf-t
---------------------------------------------------------------------------------
 calpurnia       1              log(1000000/1) = 6
 animal          100            4
 sunday          1000           3
 fly             10000          2
 the             1000000        0
---------------------------------------------------------------------------------


tf-idf weighting

w-t,d = (1 + log(tf-t,d)) * log(N/df-t)


The tf-idf weight is the (log-dampened) term frequency multiplied by the inverse document frequency.
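A minimal Java sketch of this weighting (log base 10, matching the idf table above; the method name tfIdf is illustrative):

public class TfIdfDemo {
    // w(t,d) = (1 + log10(tf)) * log10(N / df); 0 when the term is absent.
    static double tfIdf(int tf, int df, int totalDocs) {
        if (tf == 0 || df == 0) return 0.0;
        return (1 + Math.log10(tf)) * Math.log10((double) totalDocs / df);
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // "calpurnia" occurring 3 times in a doc, present in only 1 doc overall:
        System.out.println(tfIdf(3, 1, n));    // (1 + log10 3) * 6 ≈ 8.86
        // "the" occurring 100 times, but present in every doc: idf = 0, so weight = 0.
        System.out.println(tfIdf(100, n, n));  // 0.0
    }
}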


Vector Space Model

Terms are the axes, and documents are points/vectors with tf-idf weights in this N-dimensional space.

Queries and documents are vectors in the same space; we need to find the document vectors closest to the query vector, i.e., the documents most similar to the query.








Say we have three documents over the terms gossip and jealous:
d1 is mostly about gossip
d3 is mostly about jealous
d2 is about both

If the query gives the two words equal weight, then d2 is the most relevant document.

Measuring Euclidean distance is not a good idea here; we need to measure the angle between the vectors.

The smaller the angle, the more similar the document is to the query, so we take the cosine of the angle between the unit vectors.


Cosine similarity is the dot product of the unit vectors:

cos(q,d) = q·d / (|q| |d|) = (q/|q|) · (d/|d|)

q-i is the tf-idf weight of term i in the query q.
d-i is the tf-idf weight of term i in document d.
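A minimal Java sketch of cosine similarity over weight vectors (the gossip/jealous numbers are made up to mirror the discussion above):

public class CosineDemo {
    // cos(q, d) = (q · d) / (|q| * |d|) over weight vectors in the same term space.
    static double cosine(double[] q, double[] d) {
        double dot = 0, normQ = 0, normD = 0;
        for (int i = 0; i < q.length; i++) {
            dot += q[i] * d[i];
            normQ += q[i] * q[i];
            normD += d[i] * d[i];
        }
        if (normQ == 0 || normD == 0) return 0.0;
        return dot / (Math.sqrt(normQ) * Math.sqrt(normD));
    }

    public static void main(String[] args) {
        // Axes: [gossip, jealous]; d2 weights both terms, like the query.
        double[] query = {1, 1};
        double[] d1 = {3, 0}, d2 = {2, 2}, d3 = {0, 3};
        System.out.println(cosine(query, d1));  // ≈ 0.707
        System.out.println(cosine(query, d2));  // 1.0: smallest angle wins
        System.out.println(cosine(query, d3));  // ≈ 0.707
    }
}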

Ranked retrieval in IR

We have term frequency and inverse document frequency; we combine them into tf-idf weights, which form the components of each vector. We then normalize the query and document vectors and compute the angle between them, whose cosine is the score for ranked retrieval. The IR system gives you documents sorted by this score.

For term frequency, inverse document frequency, and normalization, people may use formulas other than the ones discussed here.





Saturday, October 15, 2016

LinkedList top interview questions

Here are selected interview questions for linked lists.

Scope


  • Singly linked list 
  • Doubly linked list 
  • Unrolled linked list 
  • Skip List 
  • Linked Array List 
  • Memory efficient linked lists



Lateral Thinking points -

  1. Compare linked lists with arrays in terms of time and space. Which one should you use in which situation?
  2. Recursive or iterative solution
  3. Linear/binary searching
  4. Indexing using a hash table/map
  5. Two-runner technique (see the sketch after this list)
  6. Brute force using nested loops
  7. Time/space trade-offs
  8. Number of scans of the linked list
  9. Cost of storing next and/or prev pointers
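A minimal Java sketch of the two-runner (slow/fast pointer) technique from point 5, here used to find the middle of a singly linked list in one scan:

public class TwoRunnerDemo {
    static class Node {
        int data;
        Node next;
        Node(int data, Node next) { this.data = data; this.next = next; }
    }

    // The fast pointer moves two steps per iteration, the slow pointer one;
    // when fast reaches the end, slow is at the middle. One scan, O(1) space.
    static Node middle(Node head) {
        Node slow = head, fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
        }
        return slow;
    }

    public static void main(String[] args) {
        Node head = new Node(1, new Node(2, new Node(3, new Node(4, new Node(5, null)))));
        System.out.println(middle(head).data);  // prints 3
    }
}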


Easy


  1. Implement a linked list with operations like add, delete, and put at an index position; also implement addFirst, addLast, deleteFirst, deleteLast.
  2. Implement a stack using a linked list.
  3. Implement a queue using a linked list.
  4. Reverse a linked list.
  5. Traverse a circular linked list.
  6. Remove duplicates from a linked list.



Medium


  1. Find the kth node from the end.
  2. Find the middle of a linked list.
  3. Find cycles in a linked list and find the entry point of the loop.
  4. Insert a node into a sorted linked list.
  5. Find the merge point of two linked lists that have different lengths.
  6. Merge two sorted linked lists.
  7. Implement the Josephus circle using a linked list.
  8. Delete the middle node of a linked list when you have access to the middle element only.


Advanced


  1. Create a linked list with O(1) access time.
  2. Reverse a linked list in blocks of k nodes.
  3. Reverse each pair of a linked list.
  4. Split a circular linked list.
  5. Sort a linked list using insertion sort.
  6. Sort a linked list using merge sort.
  7. Segregate a linked list (whose data are integers) into two lists of even and odd numbers.
  8. Add/subtract/multiply/divide two linked lists that represent numbers.
  9. Partition a linked list around a pivot value.
  10. Check whether a linked list is a palindrome.




Tough

  1. Implement CRUD on skiplist
  2. Implement CRUD on unrolled linked list 
  3. Implement CRUD on Linked-Array List 
  4. Implement LRU/MRU cache


Sunday, September 25, 2016

Inheritance versus Composition

This is a fairly basic topic in OOAD, but I have seen that most programmers do not understand the subtleties.
Mostly they do not want to think hard, and they make inheritance the default choice of implementation for reuse.

They finally end up with a BIG FAT parent class extended by many classes, with lots of if-else conditions, and only god knows what mess is in that class.

In Java, inheritance is achieved by extending a class or implementing interface(s).
Composition is achieved by building a strong HAS-A relationship.

Inheritance (Compile Time Structure)

1. Uses sub-classing, i.e., extending the base class (interface or class).
2. It is called white-box reuse because the subclass knows the internals of the parent class.
3. New functionality is achieved by partially or completely redefining parent functionality.
4. Defined statically at compile time and easy to understand.
5. Inheritance breaks encapsulation, because the subclass has access to the internal state of the parent class. If something changes in the parent class, we need to re-test all the extended classes too; any change in the parent class can ripple through all sub-classes.
6. To use inheritance you should know the internal state and documentation of the parent class, so there is a learning curve.
7. Relationships cannot be redefined at run time.
8. You deal with a small number of classes.

Composition (Run Time Structure)

1. Uses composition of small objects that have well-defined interfaces.
2. It is called black-box reuse because the composing object does not know the internal details of the objects it uses.
3. New functionality is developed by combining the small pieces of functionality of a number of objects.
4. Composition works through well-defined interfaces to the small objects, so as long as an interface stays the same, the composing object does not care about the details behind it.
5. Composition preserves encapsulation, allowing the composed objects and their users to change and evolve independently.
6. To use composition you do not need to know the internal state of the objects; once you know the interfaces there is a very small learning curve, so the code is easy to understand.
7. As needed, you can compose an object out of a large number of small objects to produce the required behavior.
8. You deal with a large number of small classes.


So, if you have N small objects with specific, focused behaviors, you should ideally be able to build 2^N possible compositions.

So remember: favor object composition over class inheritance, for more testable code, a smaller learning curve, more encapsulation, and hence better reuse. A minimal sketch of the idea follows.
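A minimal Java sketch (Engine and Car are illustrative names): the behavior is plugged in through an interface rather than inherited, and can even be swapped at run time.

// Small, focused behaviors behind an interface.
interface Engine {
    String start();
}

class PetrolEngine implements Engine {
    public String start() { return "petrol engine started"; }
}

class ElectricEngine implements Engine {
    public String start() { return "electric engine started"; }
}

// Car HAS-A Engine: the behavior is composed, not inherited,
// and can be swapped at run time.
class Car {
    private Engine engine;
    Car(Engine engine) { this.engine = engine; }
    void setEngine(Engine engine) { this.engine = engine; }
    String drive() { return "driving: " + engine.start(); }
}

public class CompositionDemo {
    public static void main(String[] args) {
        Car car = new Car(new PetrolEngine());
        System.out.println(car.drive());
        car.setEngine(new ElectricEngine());  // no new Car subclass needed
        System.out.println(car.drive());
    }
}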



Saturday, September 24, 2016

Angular JS interview questions


Design patterns in one page


The main OOP concepts are classes, objects, inheritance, polymorphism, abstraction, and encapsulation.
Classes have fixed state and fixed behavior, while objects can have dynamic state and behavior.

You can use either object composition or class inheritance (class inheritance or interface inheritance) while designing classes. Inheritance is tied to a class at compile time, so such behavior is static at run time. This is the main theme of class-scoped design patterns.


You can create a large number of granular objects and use dynamic composition through polymorphism to change behavior at run time as per your need. This is the major theme of object-scoped design patterns.

While designing, always program to an interface and not to an implementation, so that the code is reusable for the client and the owner can plug in any implementation as needed, which the client is unaware of. This is also a major theme in most design patterns.

Main Concepts :

Inheritance
(extending an abstract class, implementing one or many interfaces / abstraction / generalization / realization / IS-A)

Inheritance is fixed at compile time, so it has static behavior.

Composition (association / aggregation / composition / HAS-A)
Composition can be decided at run time, so it has dynamic behavior.


Creational Patterns

Abstract Factory (Kit)
Provide an interface for creating families of related or dependent objects without specifying their
concrete classes.

Builder
Separate the construction of a complex object from its representation so that the same construction
process can create different representations.

Factory Method
Define an interface for creating an object, but let sub-classes decide which class to instantiate. Factory
Method lets a class defer instantiation to sub-classes.

Prototype
Specify the kinds of objects to create using a prototypical instance, and create new objects by
copying this prototype.

Singleton
Ensure a class only has one instance, and provide a global point of access to it.


Structural Patterns 

Adapter
Convert the interface of a class into another interface clients expect. Adapter lets classes work
together that couldn't otherwise because of incompatible interfaces.

Bridge
Decouple an abstraction from its implementation so that the two can vary independently.

Composite
Compose objects into tree structures to represent part-whole hierarchies. Composite lets clients treat
individual objects and compositions of objects uniformly.

Decorator
Attach additional responsibilities to an object dynamically. Decorators provide a flexible alternative
to sub-classing for extending functionality.

Facade
Provide a unified interface to a set of interfaces in a subsystem. Facade defines a higher-level
interface that makes the subsystem easier to use.

Flyweight
Use sharing to support large numbers of fine-grained objects efficiently.

Proxy
Provide a surrogate or placeholder for another object to control access to it.


Behavioral Patterns

Chain of responsibility
Avoid coupling the sender of a request to its receiver by giving more than one object a chance to
handle the request. Chain the receiving objects and pass the request along the chain until an object
handles it.

Command
Encapsulate a request as an object, thereby letting you parameterize clients with different requests,
queue or log requests, and support undo-able operations.

Interpreter
Given a language, define a representation for its grammar along with an interpreter that uses the
representation to interpret sentences in the language.

Iterator
Provide a way to access the elements of an aggregate object sequentially without exposing its
underlying representation.

Mediator
Define an object that encapsulates how a set of objects interact. Mediator promotes loose coupling by
keeping objects from referring to each other explicitly, and it lets you vary their interaction
independently.

Memento
Without violating encapsulation, capture and externalize an object's internal state so that the object
can be restored to this state later.

Observer
Define a one-to-many dependency between objects so that when one object changes state, all its
dependents are notified and updated automatically.

State
Allow an object to alter its behavior when its internal state changes. The object will appear to change
its class.

Strategy
Define a family of algorithms, encapsulate each one, and make them interchangeable. Strategy lets
the algorithm vary independently from clients that use it.

Template Method
Define the skeleton of an algorithm in an operation, deferring some steps to sub-classes. Template
Method lets sub-classes redefine certain steps of an algorithm without changing the algorithm's
structure.

Visitor
Represent an operation to be performed on the elements of an object structure. Visitor lets you define
a new operation without changing the classes of the elements on which it operates.


Non GOF patterns 

Object Pool
MVC pattern
Dependency Injection


Practical Hybrid Patterns :

Factory Method + Prototype +Singleton

If you have to create a very large number of object types via a factory method, you need to pass some type identifier as a parameter to the factory method, and inside the factory method you end up writing a long chain of if-else clauses or a long switch to implement the logic.
To avoid this, you can keep a map or registry from type identifier to object. The sub-classes should be cloneable, have a copy constructor, or be singletons. When a request comes to the factory method with a type identifier, look up the object of that type in the registry, then return either a clone of the object or the singleton instance.
Instead of keeping a registry of objects, you can keep a mapping from type identifier to class; then, using reflection, you can create objects on the fly for the specified types.
The registry could be a map/server or simply a property/JSON/XML file. A sketch of the registry-of-prototypes variant follows.
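A minimal Java sketch of the registry-of-prototypes variant (DocumentFactory and the type names are illustrative):

import java.util.HashMap;
import java.util.Map;

// Product type with a prototype hook.
interface Document {
    Document copy();
    String describe();
}

class Invoice implements Document {
    public Document copy() { return new Invoice(); }
    public String describe() { return "invoice"; }
}

class Report implements Document {
    public Document copy() { return new Report(); }
    public String describe() { return "report"; }
}

// Factory method backed by a registry of prototypes instead of a long if-else/switch chain.
class DocumentFactory {
    private static final Map<String, Document> registry = new HashMap<>();
    static {
        registry.put("invoice", new Invoice());
        registry.put("report", new Report());
    }
    static Document create(String type) {
        Document prototype = registry.get(type);
        if (prototype == null) {
            throw new IllegalArgumentException("unknown type: " + type);
        }
        return prototype.copy();  // clone the prototype; a singleton could be returned as-is
    }
}

public class RegistryFactoryDemo {
    public static void main(String[] args) {
        System.out.println(DocumentFactory.create("invoice").describe());
        System.out.println(DocumentFactory.create("report").describe());
    }
}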



Web application security interview questions


Web Services Interview questions


Distributed Systems interview questions

Cassandra interview questions




Distributed Systems Concepts :

1. What happens when a node goes down and does not respond to the coordinator node?
2. What synchronization is used between Cassandra nodes? NTP or vector clocks?
3. How do you decide the number of nodes in a cluster, given the read and write load requirements?
4. What is the CAP theorem? Is it possible to achieve all three? If not, why?
5. What is eventual consistency?
6. How do you tune read and write consistency in Cassandra? How do you get stronger consistency? If we use the strongest consistency level, how do we sacrifice availability?
7. What kind of hashing is used in Cassandra?
8. What is consistent hashing?
9. Why are writes faster in Cassandra? Describe the write path.
10. Why are reads slower in Cassandra? Describe the read path.
11. Compare the read/write paths in Cassandra and in MySQL.
12. When should you choose Cassandra over MySQL?
13. Which consistency model does Cassandra use: CA, AP, or CP? And why?
14. When some nodes are down, how does Cassandra do replication?
15. What is hinted handoff in Cassandra?
16. What are consistency levels? Are consistency levels per query or global?
17. What are the differences between ALL, QUORUM, and ONE, and what is the effect on performance?
18. If my replication factor is 3, then with consistency level ONE will Cassandra still replicate to 3 nodes?
19. How do you handle OLTP and OLAP load? What could be the strategy?

NOSQL interview questions


Elastic Search interview questions


Friday, September 9, 2016

Object Oriented Analysis and Design : OOAD Interview Questions (Scale , Design Patterns , Algorithms )


Nowadays nobody will ask you to design a system with a limited number of users; the expectation is always that you design a highly available, partition-tolerant, horizontally scaling, secure cloud application.

Gone are the days when the interviewer asked you to design a chat server that works in one JVM process for two users.

Now people need to start thinking laterally about highly scalable, distributed, parallel algorithms, and about scaling just by adding more hardware as the load grows.
This is all the more essential nowadays because the architecture remains the same for a single user or for millions of users; the only difference is the number of machines handling them in the background.


Pre-requisite : 

- Design patterns
- Distributed systems
- Cloud platforms - AWS/OpenStack/Azure
- SQL/NoSQL databases for big data, tuning consistency in NoSQL
- Map-Reduce - Hadoop/Apache Spark/ZooKeeper
- Micro-services architecture
- Rule/workflow engines
- Information retrieval - (Elastic Search/Solr)
- Shared-nothing architecture
- Asynchronous systems (queue-based producer/consumer) - P2P
- Pub-sub models
- Caching (write-through, write-back, write-around, eviction policies) - Memcached, Redis



Design Goals - 


  • Reusable, extensible design
  • High performance (acceptable SLAs, storage)
  • High availability
  • Partition tolerance
  • Handles big data
  • Security
  • Scales linearly (vertically and horizontally) with more hardware.



The general aim is to throw ambiguous questions at the candidate and see how he navigates open-ended scenarios: how he asks questions, how he creates use cases, and how he considers performance, scale, security, and SOLID principles in his design.

Such questions are suitable for face-to-face interviews or for offline tasks with sufficient time. They are not suitable for telephonic interviews.

Approach 

Top to bottom approach 

Macro Level -

  1. Build basic use cases and invariants. Understand success/warning/failure cases. Write down the use cases one by one. Ask questions about scenarios using the Ws (What, Who, When, Where, Why) and also How.
  2. Understand the broad, independent logical modules/packages/micro-services.
  3. Identify the main entities/interfaces/classes in the system. Prefer segregated interfaces; do not create one fat interface.
  4. Understand the basic actors in the system.
  5. Draw component/class/object/package diagrams.
  6. Anticipate changes in future requirements: things that are not dynamic today could become dynamic, similar types of specialized functionality could be added, etc.


Micro-Level 

  1. First do interface-based design for all major classes (DAO classes, coordinator classes).
  2. Consider design patterns where applicable; do not overuse them just to show your knowledge.
  3. Think about a multi-threaded approach for compute-intensive / IO-intensive activities.
  4. Understand the basic tasks and their separable boundaries so that you can process items in a distributed way. If a task is fully independent, distributed computing (Map-Reduce) will be helpful.
  5. Externalize configurable parts, like rules engines and thread-pool sizes, in JSON, property files, XML, etc.




General Questions 

1. Design a coffee machine
2. Design a lift system for a multi-storied building
3. Design a chat server (a typical pub-sub problem)
4. Design a distributed cache
5. Design a traffic signal system
6. Design a bus/train scheduling system
7. Schedule meetings given a fixed number of meeting rooms
8. Design an online shopping site
9. Design a tiny-URL system
10. Do some kind of analytics on text across a large number of files (a typical Map-Reduce problem)
11. Design a restaurant reservation system
12. Design a restaurant system with rule-based pricing and dynamic configurations for menu items
13. Find connected components in large data (say you have a spreadsheet with some kind of graph structure)
14. Design a search engine
15. Design Twitter
16. Design a parking lot system
17. Design a custom re-entrant lock
18. Design a custom thread pool
19. Design an object pool or a JDBC connection pool
20. Design a distributed file system









Monday, August 8, 2016

Competencies of Software Architect


Basic Algorithms and Data Structures 


  • Big O notations 
  • Recursion and backtracking 
  • Linear data structures (linked list , dynamic list ,stack , queue , priority queue )
  • Sorting 
  • Searching (linear , binary , interpolation )
  • Hashing (Hash function , collision resolution  - chaining , open addressing , double hashing/universal hashing )
  • String Algorithms (KMP, Rabin-Karp, Boyer-Moore)
  • Trees (Binary Tree , Binary Search Tree, Binary Heaps )
  • Tree Balancing algorithms (B-trees , AVL , 2-3 , red-black trees)
  • Graphs (Minimum Spanning Tree , Shortest Paths , Cycle detection , dynamic connectivity )


Advanced Algorithms and Data Structures 
  • Greedy Algorithms 
  • Dynamic Programming 
  • Linear Programming 
  • Probabilistic Data Structures 
  • Complexity classes (P,NP etc .)



Distributed Algorithm Concepts
  • 2 phase commit 
  • Paxos - solving consensus 
  • Leader election 
  • Consistent Hashing /Routing 
  • Synchronous/Asynchronous communication 
  • Fault tolerance
  • Eventual Consistency 
  • Replication 
  • pub-sub using distributed queues
  • Bloom Filters 


Distributed/Cloud Computing technologies 

  • Basics of AWS or OpenStack  or Azure or IBM bluemix 
  • No-SQL 
    • Key value store  - Redis , Memcached
    • Column Family - Cassandra 
    • Document - MongoDB , Elastic Search 
    • Graph Databases - Neo4j 
  • Distributed Queues (SQS , RabbitMQ , Kafka)
  • Distributed Locking /co-ordination  - Apache ZooKeeper 
  • Map-Reduce frameworks 
    • Hadoop eco-systems
    • Apache Spark 
  • REST web services (Jersey , apache CXF , Swagger )


REST/SOAP web services 
  1.  Micro-service architecture 
    • what is micro-service 
    • How to design micro-service which persists data.
  2. SOAP web services in general 
  3. Differences between SOAP and REST , why REST ?
  4.  REST Design principles 
    • Idempotent methods (GET, HEAD, PUT) vs non-idempotent ones (POST, PATCH)
    • HATEOAS
    • Security (stateless; authentication and authorization; security frameworks (OAuth, Basic over SSL, HTTPS, Digest); error messages)
    • Caching (cache-control , etag , validation , expiration )
    • HTTP headers used for rest (Media-Type , Cache Control , Auth , Accept , Content-Type, If-Non- Match )
    • Commonly used HTTP codes (1XX, 2XX, 3XX, 4XX, 5XX)
    • Searching , sorting , filtering and paging  (Link headers , rel , href , query parameters )
    • Documentation - Swagger 
    • HTTP 1.1 vs HTTP 2.0
    • Handling Asynchronous request (Notification URL , Push /Pull models ) 
    • URL designing conventions 
    • Versioning of APIs
    • Error Messaging (user message , developer message , http error codes 4XX, 5XX) 
    • JSON (snake_case , vs camelCase , DATE format)
  5. REST frameworks - Jersey , Apache CXF , Spring Framework  , Play framework , REST Easy 

Basic References :
http://www.vinaysahni.com/best-practices-for-a-pragmatic-restful-api
RESTful Web Services
http://www.amazon.in/dp/9352130693
http://www.amazon.in/dp/059680168


Search Technologies 
  • Elastic Search 
  • Apache Solr 

Object Oriented Design 

  • Design Patterns (at least design patterns from GOF)
  • SOLID design principles 
  • UML 


Java Technology Stack 

Java Language :
  • Java Language basics (grammar , collections , exception handling )
  • Concurrency 
    •   Java Memory Model 
    •   Pessimistic Concurrency control ( exclusive locks , read/write locks )
    •   Optimistic Concurrency control (CAS based operations )
    •   Java Concurrent Package (Locks , Concurrent collections , Semaphore , Latch , Cyclic Barriers ,  ThreadPools , Executors  etc.)
    • Thread dump analysis 
  • Lambda Expressions 
  • Custom Annotations 
  • Java Serialization Specifications 
  • JVM architecture 
    • Class loaders 
    • Garbage Collection algorithms 
    • Java Security 
  • Regex 



Source Code Management

  •  GIT
  •  Perforce
  •  SVN 


Testing Frameworks 

  • TDD
    • Junit 
    • TestNG
  • BDD
    • Cucumber

Cryptography 

Cryptographic hash functions






Tuesday, July 12, 2016

Binary Tree Traversing algorithms

There can be surprises when people ask you to print a binary tree in some special way.


The following techniques help:

1. Recursion
2. Using a stack/queue for the non-recursive version
3. Keeping track of the (X,Y) coordinates of each node (start the root at (0,0)): when you go left decrease X, when you go right increase X

The following is a possible set of questions about traversing a binary tree (it may not be a BST).

Depth first (with or without recursion; see the iterative sketch after this list):

1. Pre-order
2. In-order
3. Post-order
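A minimal Java sketch of in-order traversal without recursion, using an explicit stack in place of the call stack:

import java.util.ArrayDeque;
import java.util.Deque;

public class InOrderIterative {
    static class Node {
        int data;
        Node left, right;
        Node(int data) { this.data = data; }
    }

    // In-order traversal without recursion: an explicit stack replaces the call stack.
    static void inOrder(Node root) {
        Deque<Node> stack = new ArrayDeque<>();
        Node current = root;
        while (current != null || !stack.isEmpty()) {
            while (current != null) {       // go as far left as possible
                stack.push(current);
                current = current.left;
            }
            current = stack.pop();          // visit the node
            System.out.print(current.data + " ");
            current = current.right;        // then traverse its right subtree
        }
    }

    public static void main(String[] args) {
        Node root = new Node(2);
        root.left = new Node(1);
        root.right = new Node(3);
        inOrder(root);  // prints: 1 2 3
    }
}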

Breadth first:

1. Breadth first - level order
2. Breadth first - zig-zag, i.e., alternating direction per level (also called spiral-order traversal); see the sketch after this list
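A minimal Java sketch of zig-zag (spiral) level-order traversal using a queue, reversing alternate levels:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedList;

public class ZigZagTraversal {
    static class Node {
        int data;
        Node left, right;
        Node(int data) { this.data = data; }
    }

    // Level order with a queue; reverse the direction on alternate levels.
    static void zigZag(Node root) {
        if (root == null) return;
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(root);
        boolean leftToRight = true;
        while (!queue.isEmpty()) {
            int levelSize = queue.size();
            LinkedList<Integer> level = new LinkedList<>();
            for (int i = 0; i < levelSize; i++) {
                Node node = queue.poll();
                if (leftToRight) level.addLast(node.data);
                else level.addFirst(node.data);
                if (node.left != null) queue.add(node.left);
                if (node.right != null) queue.add(node.right);
            }
            System.out.println(level);
            leftToRight = !leftToRight;
        }
    }

    public static void main(String[] args) {
        Node root = new Node(1);
        root.left = new Node(2);
        root.right = new Node(3);
        root.left.left = new Node(4);
        zigZag(root);  // prints [1], then [3, 2], then [4]
    }
}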

Special orders

1. Print the nodes, or the sum of the nodes, in each vertical column of the binary tree.
2. Print the sum of the nodes on each diagonal of the binary tree.
3. Print all nodes on the circumference of the tree, i.e., all nodes on the left border, the bottom (leaves), and the right border.
4. Print the bottom view of the binary tree.
5. Print the top view of the binary tree.
6. Print the inner nodes of the binary tree.
7. Print the outer layer of the binary tree.
8. Print only the leaf nodes.
9. Print the periphery of the binary tree.



Other oddities could come up, like:

If you can write pre-order, in-order, and post-order using recursion, always think about how you can do them with a stack, i.e., without recursion.

The same logic, where possible, applies to n-ary trees.



We will solve all of the above one by one in the next blogs.