Friday, 11 September 2015

50 [UPDATED] ETL TESTING Interview Questions and Answers pdf

50 TOP ETL TESTING Interview Questions with Answers pdf :-

 
1.What is ETL?


2.What are the operations for ETL testing?


3. What is OLAP?


4. What is OLTP?


5. What are Dimensions?


6. What are Confirmed Dimensions?


7. What are Fact Tables?


8.  What are the types of Facts?


9. What are the types of Fact Tables?


10. What is Grain of Fact?
ETL TESTING Interview Questions and Answers pdf
ETL TESTING Interview Questions

11. What is Factless Fact table?


12. What are Measures?


13. What are Cubes?


14. What are Virtual Cubes?


15. What is a Star schema design?


16. What is Snow Flake schema Design?


17. What is Operational Data Store [ODS] ?


18. What is Denormalization?


19. What is Surrogate Key?


20. What are the client components of Informatica 7.1.1?


21. What are the server components of Informatica 7.1.1?


22. What is Metadata?


23. What is a Repository?


24. What is Data Acquisition Process?


25. What are the GUI based ETL tools?


26. What are programmatic based ETL tools?


27.  What is a Transformation?


28. What are the various tools? Name a few?


29. What are features of Informatica Repository Server?


30. What is a Work Flow?


31. What is the uses of Lookup Transformation?


32. What are the different sources of Source systems of Data Warehouse?


33. Types of Slowly Changing Dimensions:


34. What are Update Strategy’s target table options?


35. What does a Mapping document contains?


36. What does the Top Down Approach says?


37. What does the Bottom Up Approach or Ralph Kimball Approach says?


38. Who is the first person in the organization to start the Data Warehouse project?


39. What is a Dimension Modeling?


40. What are the types of OLAPs ?


41. What is worklet?


42. Why we use lookup transformation?


43. What is a Power Center Repository?


44. What are snapshots? What are materialized views & where do we use them? What is a materialized view?


45. Where do we use connected and un connected lookups?


46. What is a Data Warehouse?


47. What are the Advantages of Data warehousing?


48. What is a Data mart?


49.In what way ETL testing is beneficial?  


50.What Tools are mostly used in ETL and report testing?

30 [UPDATED] ETL TESTING Multiple Choice Questions and Answers pdf free download

 ETL TESTING Multiple Choice Questions and Answers :-


****50 [MY EXPERIENCE] ETL TESTING Interview Questions and Answers pdf****CLICK HERE


1. All data in flat file is in this format.
A.    Sort
B.      ETL
C.     Format
D.      String
Ans:  D

2. It is used to push data into a relation database table. This control will be the destination for most fact table data flows.
A.   Web Scraping
B.     Data inspection
C.     OLE DB Source
D.    OLE DB Destination
Ans:  D

3. Logical Data Maps
A.   These are used to identify which fields from which sources are going to with destinations. It allows the ETL developer to identify if there is a need to do a data type change or aggregation prior to beginning coding of an ETL process.
B.   These can be used to flag an entire file-set that is ready for processing by the ETL process. It contains no meaningful data bu the fact it exists is the key to the process.
C.  Data is pulled from multiple sources to be merged into one or more destinations.
D.  It is used to massage data in transit between the source and destination.
Ans:  A

4. Data access methods.
A.  Pull Method
B.   Push and Pull
C. Load in Parallel
D.  Union all
Ans:  B

5. OLTP
A.   Process to move data from a source to destination.
B.   Transactional database that is typically attached to an application. This source provides the benefit of known data types and standardized access methods. This system enforces data integrity.
C.  All data in flat file is in this format.
D. This control can be used to add columns to the stream or make modifications to data within the stream. Should be used for simple modifications.
Ans:  B

6. COBOL
A.  Process to move data from a source to destination.
B.   The easiest to consume from the ETL standpoint.
C.  Two methods to ensure data integrity.
D.   Many routines of the Mainframe system are written in this.
Ans:  D

7. What ETL Stands for
A.   Data inspection
B.    Transformation
C.   Extract, Transform, Load
D.   Data Flow
Ans:  C

8. The source system initiates the data transfer for the ETL process. This method is uncommon in practice, as each system would have to move the data to the ETL process individually.
A.  Custom
B.   Automation
C.  Pull Method
D.  Push Method
Ans:  D

9. Sentinel Files
A.  These are used to identify which fields from which sources are going to with destinations. It allows the ETL developer to identify if there is a need to do a data type change or aggregation prior to beginning coding of an ETL process.
B.   These can be used to flag an entire file-set that is ready for processing by the ETL process. It contains no meaningful data bu the fact it exists is the key to the process.
C.    ETL can be used to automate the movement of data between two locations. This standardizes the process so that the load is done the same way every run.
D.   This is used to create multiple streams within a data flow from a single stream. All records in the stream are sent down all paths. Typically uses a merge-join to recombine the streams later in the data flow.
Ans:  B

10. Checkpoints
A.  Similar to "break up processes", checkpoints provide markers for what data has been processed in case an error occurs during the ETL process.
B.   Similar to XML's structured text file.
C.  Many routines of the Mainframe system are written in this.
D.  It is used to import text files for ETL processing.
Ans:  A
ETL TESTING Multiple Choice Questions and Answers pdf free download
11    Mainframe systems use this. This requires a conversion to the more common ASCII format.
A.     ETL
B.      XML
C.     Sort
D.     EBCDIC
Ans:  D

12      Ultimate flexibility, unit testing is available, usually poor documentation.
A.     ETL
B.     Custom
C.     OLTP
D.     Sort
Ans:  B

13      Conditional Split
A.  Many routines of the Mainframe system are written in this.
B.    Data is pulled from multiple sources to be merged into one or more destinations.
C.    It allows multiple streams to be created from a single stream. Only rows that match the criteria for a given path are sent down that path.
D.    This is used to create multiple streams within a data flow from a single stream. All records in the stream are sent down all paths. Typically uses a merge-join to recombine the streams later in the data flow.
Ans:  C

14     Flat files
A.  The easiest to consume from the ETL standpoint.
B.    Three components of data flow.
C.   Three common usages of ETL.
D.    Two methods to ensure data integrity.
Ans:  A

15     This is used to create multiple streams within a data flow from a single stream. All records in the stream are sent down all paths. Typically uses a merge-join to recombine the streams later in the data flow.
A.  OLTP
B.    Mainframe
C.   EBCDIC
D.   Multicast
Ans:  D

16       There are little to no benefits to the ETL developer when accessing these types of systems and many detriments. The ability to access these systems is very limited and typically FTP of text files is used to facilitate access.
A.  Mainframe
B.    Union all
C.   File Name
D.  Multicast
Ans:  A

17    Shows the path to the file to be imported.
A.  File Name
B.   Mainframe
C.  Format
D.  Union all
Ans:  A

18    Wheel is already invented, documented, good support.
A. Format
B.   COBOL
C. Tool Suite
D.  Flat files
Ans:  C

19     Similar to XML's structured text file.
A.  Data Scrubbing
B.     EBCDIC
C.    String
D.   Web Scraping
Ans:  D

20    Flat file control
A.  Three components of data flow.
B.    It is used to import text files for ETL processing.
C.   The easiest to consume from the ETL standpoint.
D.   Shows the path to the file to be imported.
Ans:  B

21    Two methods to ensure data integrity.
A.    Sources, Transformation, Destination
B.    Data inspection
C.   Row Count Inspection, Data Inspection
D.   Row Count Inspection
Ans:  C

22    Transformation
A.    Data is pulled from multiple sources to be merged into one or more destinations.
B.      It is used to import text files for ETL processing.
C.     Process to move data from a source to destination.
D.     It is used to massage data in transit between the source and destination.
Ans:  D

23    Three common usages of ETL.
A.  Data Scrubbing
B.    Sources, Transformation, Destination
C.     Merging Data
D.    Merging Data, Data Scrubbing, Automation
Ans:  D

24     Load in Parallel
A.  A value of delimited shou;d be selected for delimited files.
B.    Data is pulled from multiple sources to be merged into one or more destinations.
C.  This will reduce the run time of ETL process and reduce the window for hardware failure to affect the process.
D.  this should be check if column name have been included in the first row of the file.
Ans:  C

25   This can be computationally expensive excluding SSD.
A. Hard Drive I/O
B.   Mainframe
C.  Tool Suite
D. Data Scrubbing
Ans:  A

26   A value of delimited shou;d be selected for delimited files.
A.  Sort
B.   Format
C.  String
D.  OLTP
Ans:  B

27   this should be check if column name have been included in the first row of the file.
A.   Row Count Inspection, Data Inspection
B.    Format of the Date
C.   Column names in the first data row checkbox
D.     Do most work in transformation phase
Ans:  C

What are the Advantages of Data warehousing?

1.Maintaining a data warehouse have plenty of benefits which include:
2.Integration of data gathered from various sources
3.Beneficial for short term as well as long term businesses
4.Improves customer satisfaction
5.Easy access to the end users
6.Offers considerably accurate and comprehensive data
7.Integrates internal and external data as well
8.Timely access to the corporate data

In what way ETL testing is beneficial?

Provides data security
Offers accuracy in data
Reporting efficiency
Loss of data is minimal

What Tools are mostly used in ETL and report testing?

For ETL:
Informatica
Data stage
Oracle ware house builder
Abinito
Hyperion
Python

For Reporting:

Microstrategy
Business objects
Siebel analytics
Cognos

Where do we use connected and un connected lookups?

If return port only one then we can go for unconnected. More than one return port is not possible with Unconnected. If more than one return port then go for Connected.
If you require dynamic cache i.e where your data will change dynamically then you can go for connected lookup. If your data is static where your data won't change when the session loads you can go for unconnected lookups

What are snapshots? What are materialized views & where do we use them? What is a materialized view?

Materialized view is a view in which data is also stored in some temp table.i.e if we will go with the View concept in DB in that we only store query and once we call View it extract data from DB. But In materialized View data is stored in some temp tables.

What is a Power Center Repository?

The Power Center Repository allows you to share metadata across repositories to create a data mart domain. In a data mart domain, you can create a single global repository to store metadata used across an enterprise and a number of local repositories to share the global metadata as needed.

Why we use lookup transformation?

Look up Transformations can access data from relational tables that are not sources in mapping. With Lookup transformation, we can accomplish the following tasks.

What is worklet?

The worklet is a group of sessions. To execute the worklet we have to create the workflow.

What are the types of OLAPs ?

DOLAP: The OLAP tool which words with desktop databases are called as DOLAP. Example: Cognos EP 7 Series and Business Objects, Micro strategy.

ROLAP:
The OLAP which works with Relational databases are called as ROLAP. Example: Business Object, Micro strategy, Cognos ReportNet and BRIO.

MOLAP: The OLAP which is responsible for creating multidimensional structures called cubes are called as MOLAP. Example: Cognos ReportNet.

HOLAP: The OLAP which uses the combined features of ROLAP and MOLAP are called as HOLAP. Example Cognos ReportNet.

What is a Dimension Modeling?

A Dimensional Modeling is a high level methodology used to implement the start schema structure which is done by the Data Modeler.

Who is the first person in the organization to start the Data Warehouse project?

The first person to start the Data Warehouse project in a organization is Business Analyst.

What does the Bottom Up Approach or Ralph Kimball Approach says?

The Bottom Down Approach is coined by Ralph Kimball. According to his approach he says “First we need to develop subject oriented database called as “Data Marts” then integrate all the Data Marts to develop the Enterprise data warehouse.

What does the Top Down Approach says?

The Top Down Approach is coined by Bill Immon. According to his approach he says “First we need to implement the Enterprisedata warehouse by extracting the data from individual departments and from the Enterprise data warehouse develop subject oriented databases called as “Data Marts”.

What does a Mapping document contains?

The Mapping document contains the following information :
Source Definition – from where the database has to be loaded
Target Definition – to where the database has to be loaded
Business Logic – what logic has to be implemented in staging area.

What are Update Strategy’s target table options?

Update as Update: Updates each row flagged for update if it exists in the table.
Update as Insert: Inserts a new row for each update.
Update else Insert: Updates if row exists, else inserts.

Types of Slowly Changing Dimensions

Type – 1 (Recent updates)
Type – 11 (Full historical information)
Type – 111 (Partial historical information)

What are the different sources of Source systems of Data Warehouse?

1.RDBMS
2.Flat Files
3.XML Files
4.SAP R/3
5.PeopleSoft
6.SAP BW
7.Web Methods
8.Web Services
9.Seibel
10.Cobol Files
11.Legacy Systems.

What is the uses of Lookup Transformation?

The Lookup Transformation is useful for:

1.     Getting a related value form a table using a key column value
2.     Update slowly changing dimension table
3.     To check whether records already exists in the table.

What is a Work Flow?

A Work Flow is a set of instructions on how to execute tasks such as sessions, emails and shell commands. A WorkFlow is created from Workflow Manager.

What are features of Informatica Repository Server?

Features of Informatica Repository Server.

1.    Informatica client application and Informatica server access the repository database tables through the Repository Server.
2.     Informatica client connects to the repository server through the host name/ IP address and its port number.
3.     The Repository Server can manager multiple repository on different machines on the network.
4.     For each repository database registered with the Repository Server it configures and manages a Repository Agent process.
5.     The Repository Agent is a multi-threaded process that performs the action needed to retrieve, insert and updated metadata in the repository database tables.

What are the various tools? Name a few?

A few more
- Cognos Decision Stream
- Oracle Warehouse Builder
- Business Objects XI (Extreme Insight)
- SAP Business Warehouse
- SAS Enterprise ETL Server
Along with the above, need to include the below tools --Informatica --Abintio --DataStage

What is a Transformation?

A transformation is a repository object that generates, modifies, or passes data. Transformations in a mapping represent the operations the PowerCenter Server performs on the data. Data passes into and out of transformations through ports that you link in a mapping or mapplet. Transformations can be active or passive. An active transformation can change the number of rows that pass through it. A passive transformation does not change the number of rows that pass through it.

What are programmatic based ETL tools?

            1. Pl/Sql
            2. SAS BASE
            3. SAS ACCESS
            4. Tera Data Utilities
                        a. BTEQ
                        b. Fast Load
                        c. Multi Load
                        d. Fast Export
                        e. T (Trickle) Pump

What are the GUI based ETL tools?

The following are the GUI based ETL tools:

1.     Informatica
2.     DataStage
3.     Data Junction
4.     Oracle Warehouse Builder
5.     Abinitio
6.     Business Object Data Integrator
7.     Cognos Decision Stream.

What is Data Acquisition Process?

The process of extracting the data from different source (operational databases) systems, integrating the data and transforming the data into a homogenous format and loading into the target warehouse database. Simple called as ETL (Extraction, Transformation and Loading). The Data Acquisition process designs are called in different manners by different ETL vendors.

            Informatica   —->  Mapping
            Data Stage   —->  Job
            Abinitio        —->  Graph

What is a Repository?

Repository is a centrally stored container which stores the metadata, which is used by the Informatica Power center server andPower Center client tools. The Informatica stores Repository in relational database format.

            Informatica 7.1.1 Repository has 247 database objects
            Informatica 6.1.1 Repository has 172 database objects
            Informatica 5.1.1 Repository has 145 database objects
            Informatica 4.1.1 Repository has 111 database objects

What is Metadata?

Data about data is called as Metadata. The Metadata contains the definition of a data.

What are the server components of Informatica 7.1.1?

Informatica 7.1.1 Server Components:

1.     Informatica Server
2.     Informatica Repository Server.

What are the client components of Informatica 7.1.1?

Informatica 7.1.1 Client Components:

1.     Informatica Designer
2.     Informatica Work Flow  Manager
3.     Informatica Work Flow Monitor
4.     Informatica Repository Manager
5.     Informatica Repository Server Administration Console.

What is Surrogate Key?

A Surrogate Key is a sequence generated key which is assigned to be a primary key in the system (table).

What is Denormalization?

Denormalization means a table with multi duplicate key. The dimension table follows Denormalization method with the technique of surrogate key.

What is Operational Data Store [ODS] ?

It is a collection of integrated databases designed to support operational monitoring. Unlike the OLTP databases, the data in the ODS are integrated, subject oriented and enterprise wide data.

What is Snow Flake schema Design?

In a Snow Flake design the dimension table (de-normalized table) will be further divided into one or more dimensions (normalized tables) to organize the information in a better structural format. To design snow flake we should first design star schema design.

What is a Star schema design?

A Star schema is defined as a logical database design in which there will be a centrally located fact table which is surrounded by at least one or more dimension tables. This design is best suited for Data Warehouse or Data Mart.

What are Virtual Cubes?

These are combination of one or more real cubes and require no disk space to store them. They store only definition and not the data.

What are Cubes?

Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provided multidimensional analysis.

What are Measures?

Measures are numeric data based on columns in a fact table.

What is Factless Fact table?

The Fact Table which does not contains facts is called as Fact Table. Generally when we need to combine two data marts, then one data mart will have a fact less fact table and other one with common fact table.

What is Grain of Fact?

The Grain of Fact is defined as the level at which the fact information is stored in a fact table. This is also called as Fact Granularity or Fact Event Level.

What are the types of Fact Tables?

The types of Fact Tables are:

1.  Cumulative Fact Table: This type of fact tables generally describes what was happened over the period of time. They contain additive facts.
2.  Snapshot Fact Table: This type of fact table deals with the particular period of time. They contain non-additive and semi-additive facts.

What are the types of Facts?

The types of Facts are as follows.

1.     Additive Facts: A Fact which can be summed up for any of the dimension available in the fact table.
2.     Semi-Additive Facts: A Fact which can be summed up to a few dimensions and not for all dimensions available in the fact table.
3.     Non-Additive Fact: A Fact which cannot be summed up for any of the dimensions available in the fact table.

What are Fact Tables?

A Fact Table is a table that contains summarized numerical (facts) and historical data. This Fact Table has a foreign key-primary key relation with a dimension table. The Fact Table maintains the information in 3rd normal form.
            A star schema is defined is defined as a logical database design in which there will be a centrally located fact table which is surrounded by at least one or more dimension tables. This design is best suited for Data Warehouse or Data Mart.

What are Confirmed Dimensions?

The Dimensions which are reusable and fixed in nature Example customer, time, geography dimensions.

What are Dimensions?

Dimensions are categories by which summarized data can be viewed. For example a profit Fact table can be viewed by a time dimension.

What is OLTP?

OLTP stands for Online Transaction Processing Except data warehouse databases the other databases are OLTPs. These OLTP uses normalized schema structure. These OLTP databases are designed for recording the daily operations and transactions of a business.

What is OLAP?

OLAP stands for Online Analytical Processing. It uses database tables (Fact and Dimension tables) to enable multidimensional viewing, analysis and querying of large amount of data.

What is a Data mart?

A Data Mart is a subset of a data warehouse that can provide data for reporting and analysis on a section, unit or a department like Sales Dept, HR Dept, etc. The Data Mart are sometimes also called as HPQS (Higher Performance Query Structure).

What is a Data Warehouse?

Data warehouse is a store which is subject oriented, time variant, non volatile and integrated collection of various data which supports the businesses decision ability.

What are the operations for ETL testing?

Data verification for the transforming process as per the business rules and requirements Ensure that the verified data is loaded in the data warehouse without losing any data Make sure that this application rejects and replaces the inappropriate data and reports the invalid data
Load the data in the data warehouse prior to the prescribed time to improve the efficiency and scalability of the business.

What is ETL?

ETL stands for extraction, transformation and loading. ETL process is used to consolidate the data that is scattered data and can merge it into one system. The aim of ETL is to manage the data coming from multiple departments.