Sunday, July 22, 2007

Informatica Interview Questions with Answers

1. Why do we use the Stored Procedure transformation?
To call database stored procedures from within a mapping, for example to populate and maintain databases.
2. Why do we use session partitioning in Informatica?
Partitioning improves session performance by reducing the time taken to read the source and load the data into the target.
3. Why do we use Lookup transformations?
Lookup transformations can access data from relational tables that are not sources in the mapping. With a Lookup transformation, we can accomplish the following tasks:
Get a related value - get the employee name from the Employee table based on the employee ID.
Perform a calculation.
Update slowly changing dimension tables - we can use an unconnected Lookup transformation to determine whether a record already exists in the target or not.
4. Why use the Lookup transformation?
To perform the following tasks:
Get a related value. For example, your source table includes the employee ID, but you want to include the employee name in your target table to make your summary data easier to read.
Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).
Update slowly changing dimension tables. You can use a Lookup transformation to determine whether records already exist in the target.
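As an illustration only (Informatica does this work inside the Lookup transformation, not in user code), the following Python sketch shows the two ideas above in plain terms: fetching a related value by key, and checking whether a record already exists in the target. The table data and column names are invented for the example.

```python
# Hypothetical, in-memory illustration of what a Lookup transformation does:
# fetch a related value by key, and test whether a row already exists in the target.

# Pretend "EMPLOYEE" lookup table, keyed by employee ID (made-up data).
employee_lookup = {
    101: "Alice Smith",
    102: "Bob Jones",
}

# Keys already present in the target dimension table (made-up data).
target_keys = {101}

def get_employee_name(employee_id):
    """Return the related value (employee name) for a source row's employee ID."""
    return employee_lookup.get(employee_id)      # None if no match, like a lookup miss

def exists_in_target(employee_id):
    """Return True if the record already exists in the target (slowly changing dimension check)."""
    return employee_id in target_keys

if __name__ == "__main__":
    for emp_id in (101, 102, 103):
        print(emp_id, get_employee_name(emp_id), "exists:", exists_in_target(emp_id))
```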
5. Why do we use repository connectivity?
Each time you edit or schedule a session, the Informatica Server communicates directly with the repository to check whether the session and users are valid. All the metadata of sessions and mappings is stored in the repository.
6. While importing a relational source definition from a database, what metadata of the source do you import?
Source name
Database location
Column names
Datatypes
Key constraints
7. Which transformation should we use to normalize COBOL and relational sources?
The Normalizer transformation.
When you drag a COBOL source into the Mapping Designer workspace, the Normalizer transformation automatically appears, creating input and output ports for every column in the source.
8. Which transformation do you need when using COBOL sources as source definitions?
The Normalizer transformation, which is used to normalize the data, since COBOL sources often consist of denormalized data.
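Purely as an illustration of what "normalizing" means here (not of how the Normalizer transformation is implemented), the sketch below turns a denormalized record with a repeating group, the kind of layout a COBOL OCCURS clause produces, into one output row per occurrence. The field names are invented for the example.

```python
# Hypothetical sketch: flatten a denormalized record with a repeating group
# (similar in spirit to a COBOL OCCURS clause) into one normalized row per occurrence.

denormalized_rows = [
    # One source record carries sales for four quarters (made-up layout).
    {"store_id": 1, "quarterly_sales": [100, 120, 90, 150]},
    {"store_id": 2, "quarterly_sales": [80, 95, 110, 70]},
]

def normalize(rows):
    """Yield one output row per repeated value, with a generated occurrence index."""
    for row in rows:
        for quarter, amount in enumerate(row["quarterly_sales"], start=1):
            yield {"store_id": row["store_id"], "quarter": quarter, "sales": amount}

if __name__ == "__main__":
    for out_row in normalize(denormalized_rows):
        print(out_row)
```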
9. Which tool do you use to create and manage sessions and batches, and to monitor and stop the Informatica Server?
The Informatica Server Manager.
10. Where should you place a flat file in order to import its definition into the Designer?
Place it in a local folder.
11. When does the Informatica Server mark a batch as failed?
If one of its sessions is configured to "run if previous completes" and that previous session fails.
12. What are the two types of processes the Informatica Server uses to run a session?
Load Manager process: starts the session, creates the DTM process, and sends post-session email when the session completes.
DTM process: creates threads to initialize the session, read, write, and transform data, and handle pre- and post-session operations.
13. What are the unsupported repository objects for a mapplet?
COBOL source definitions
Joiner transformations
Normalizer transformations
Non-reusable Sequence Generator transformations
Pre- or post-session stored procedures
Target definitions
PowerMart 3.5-style LOOKUP functions
XML source definitions
IBM MQ source definitions
14. What are the types of metadata stored in the repository?
The following types of metadata are stored in the repository:
Database connections
Global objects
Mappings
Mapplets
Multidimensional metadata
Reusable transformations
Sessions and batches
Shortcuts
Source definitions
Target definitions
Transformations
15. What are the types of mapping wizards provided in Informatica?
The Designer provides two mapping wizards to help you create mappings
quickly and easily. Both wizards are designed to create mappings for
loading and maintaining star schemas, a series of dimensions related to
a central fact table.
Getting Started Wizard. Creates mappings to load static fact and
dimension tables, as well as slowly growing dimension tables.
Slowly Changing Dimensions Wizard. Creates mappings to load slowly
changing dimension tables based on the amount of historical dimension
data you want to keep and the method you choose to handle historical
dimension data.
16. What are the types of mapping in the Getting Started Wizard?
Simple pass-through mapping:
Loads a static fact or dimension table by inserting all rows. Use this mapping when you want to drop all existing data from your table before loading new data.
Slowly growing target:
Loads a slowly growing fact or dimension table by inserting new rows. Use this mapping to load new data when existing data does not require updates.
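The idea behind a slowly growing target can be described outside Informatica as "insert only the source rows whose keys are not yet in the target". The following Python fragment is a minimal, hypothetical illustration of that rule, not the wizard-generated mapping itself.

```python
# Minimal sketch of a "slowly growing target" load: insert new rows only,
# never update existing ones. Data and key names are invented for the example.

existing_target_keys = {"C001", "C002"}          # keys already loaded in the target

source_rows = [
    {"customer_id": "C001", "name": "Acme"},
    {"customer_id": "C003", "name": "Globex"},   # new row -> should be inserted
]

rows_to_insert = [row for row in source_rows
                  if row["customer_id"] not in existing_target_keys]

print(rows_to_insert)   # only the rows with keys the target has not seen yet
```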
17. What are the types of lookup?
Connected and unconnected
18. What are the types of lookup caches?
Persistent cache: you can save the lookup cache files and reuse them the next time the Informatica Server processes a Lookup transformation configured to use the cache.
Recache from database: if the persistent cache is not synchronized with the lookup table, you can configure the Lookup transformation to rebuild the lookup cache.
Static cache: you can configure a static, or read-only, cache for any lookup table. By default the Informatica Server creates a static cache. It caches the lookup table and the lookup values for each row that comes into the transformation. When the lookup condition is true, the Informatica Server does not update the cache while it processes the Lookup transformation.
Dynamic cache: if you want to cache the target table and insert new rows into both the cache and the target, you can create a Lookup transformation that uses a dynamic cache. The Informatica Server dynamically inserts data into the target table (sketched after this list).
Shared cache: you can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping.
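To make the difference between a static and a dynamic cache concrete, here is a small, hypothetical Python sketch: with a dynamic cache, a lookup miss inserts the new row into both the cache and the pending target inserts, so later rows in the same run see it. This only illustrates the behaviour, not Informatica's implementation.

```python
# Hypothetical sketch of dynamic lookup cache behaviour: on a cache miss the row is
# added to the cache and queued for insertion into the target, so subsequent rows
# in the same session run find it in the cache.

cache = {"C001": {"customer_id": "C001", "name": "Acme"}}   # rows already cached from the target
target_inserts = []                                          # rows the session will insert

def process_row(row):
    key = row["customer_id"]
    if key in cache:
        return "found in cache"                 # static behaviour: just a hit
    cache[key] = row                            # dynamic behaviour: update the cache...
    target_inserts.append(row)                  # ...and mark the row for insertion
    return "inserted into cache and target"

print(process_row({"customer_id": "C002", "name": "Globex"}))   # miss -> insert
print(process_row({"customer_id": "C002", "name": "Globex"}))   # now a hit
```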
19. What are the types of groups in the Router transformation?
Input group and output groups.
The Designer copies property information from the input ports of the input group to create a set of output ports for each output group.
There are two types of output groups:
User-defined groups
Default group
You cannot modify or delete the default group.
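The routing idea, rows tested against user-defined group conditions with everything that matches no condition falling into the default group, can be sketched in a few lines of Python. The group names and conditions below are invented for the example.

```python
# Hypothetical sketch of Router-style behaviour: each row is evaluated against
# user-defined group conditions; rows that satisfy none of them go to the default group.

user_defined_groups = {
    "HIGH_VALUE": lambda row: row["amount"] >= 1000,
    "WEST_REGION": lambda row: row["region"] == "WEST",
}

routed = {name: [] for name in user_defined_groups}
routed["DEFAULT"] = []

rows = [
    {"amount": 1500, "region": "EAST"},
    {"amount": 200,  "region": "WEST"},
    {"amount": 50,   "region": "NORTH"},
]

for row in rows:
    matched = False
    for group_name, condition in user_defined_groups.items():
        if condition(row):                     # a row can satisfy more than one group
            routed[group_name].append(row)
            matched = True
    if not matched:
        routed["DEFAULT"].append(row)          # unmatched rows fall through to the default group

print(routed)
```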
20. What are the types of data that pass between the Informatica Server and a stored procedure?
Three types of data:
Input/output parameters
Return values
Status codes
21. What are the transformations that restrict the partitioning of sessions?
Advanced External Procedure transformation and External Procedure transformation: these transformations contain a check box on the Properties tab to allow partitioning.
Aggregator transformation: if you use sorted ports, you cannot partition the associated source.
Joiner transformation: you cannot partition the master source for a Joiner transformation.
Normalizer transformation
XML targets
22. What are the tasks that the Source Qualifier performs?
Join data originating from the same source database.
Filter records when the Informatica Server reads source data.
Specify an outer join rather than the default inner join.
Specify sorted ports.
Select only distinct values from the source.
Create a custom query to issue a special SELECT statement for the Informatica Server to read source data.
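Conceptually, most of these options end up shaping the SELECT statement the Source Qualifier issues against the source database. The snippet below is a rough, hypothetical illustration of how such a statement might be assembled from a filter, a sort, and a distinct flag; it is not the SQL Informatica actually generates, and the table and column names are invented.

```python
# Rough, hypothetical sketch of how Source Qualifier options translate into a SELECT.
# Table, column, and filter values are invented; this is not Informatica-generated SQL.

def build_select(table, columns, source_filter=None, sort_columns=None, distinct=False):
    """Assemble a simple SELECT reflecting filter / sorted ports / select-distinct options."""
    select = "SELECT DISTINCT" if distinct else "SELECT"
    sql = f"{select} {', '.join(columns)} FROM {table}"
    if source_filter:
        sql += f" WHERE {source_filter}"                    # filter rows as the server reads source data
    if sort_columns:
        sql += f" ORDER BY {', '.join(sort_columns)}"       # sorted ports
    return sql

print(build_select("ORDERS", ["ORDER_ID", "CUSTOMER_ID", "AMOUNT"],
                   source_filter="AMOUNT > 100",
                   sort_columns=["ORDER_ID"],
                   distinct=True))
```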
23. What are the tasks that the Load Manager process performs?
Manages session and batch scheduling: when you start the Informatica Server, the Load Manager launches and queries the repository for a list of sessions configured to run on the Informatica Server. When you configure a session, the Load Manager maintains a list of sessions and session start times. When you start a session, the Load Manager fetches the session information from the repository to perform validations and verifications prior to starting the DTM process.
Locking and reading the session: when the Informatica Server starts a session, the Load Manager locks the session in the repository. Locking prevents you from starting the same session again while it is already running.
Reading the parameter file: if the session uses a parameter file, the Load Manager reads the parameter file and verifies that the session-level parameters are declared in the file.
Verifying permissions and privileges: when the session starts, the Load Manager checks whether or not the user has the privileges to run the session.
Creating log files: the Load Manager creates a log file that contains the status of the session.
24. What are the settings used to configure the Joiner transformation?
Master and detail source
Type of join
Condition of the join
25. What are the session parameters?
Session parameters, like mapping parameters, represent values you might want to change between sessions, such as database connections or source files.
The Server Manager also allows you to create user-defined session parameters. The following are user-defined session parameters:
Database connections
Source file name: use this parameter when you want to change the name or location of the session source file between session runs.
Target file name: use this parameter when you want to change the name or location of the session target file between session runs.
Reject file name: use this parameter when you want to change the name or location of the session reject file between session runs.
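Session parameters are resolved from a parameter file at run time. As a loose analogy only (the real Informatica parameter file uses its own section headers and parameter naming conventions), the sketch below shows the general idea of reading name/value pairs before a run so file names can change without editing the session. The file contents and parameter names here are invented.

```python
# Loose, hypothetical analogy of a parameter file: name=value pairs read before the
# session runs, so source/target/reject file names can change between runs without
# editing the session. The file contents and parameter names are invented.

SAMPLE_PARAMETER_FILE = """\
source_file=/data/in/orders_20070722.csv
target_file=/data/out/orders_load.csv
reject_file=/data/bad/orders_rejects.csv
"""

def parse_parameters(text):
    """Return a dict of parameter name -> value, ignoring blank lines."""
    params = {}
    for line in text.splitlines():
        if line.strip():
            name, _, value = line.partition("=")
            params[name.strip()] = value.strip()
    return params

params = parse_parameters(SAMPLE_PARAMETER_FILE)
print("Reading from:", params["source_file"])
print("Writing to:  ", params["target_file"])
```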
26. What are the scheduling options to run a session?
You can schedule a session to run at a given time or interval, or you can run the session manually.
The different scheduling options are:
Run only on demand: the Informatica Server runs the session only when the user starts the session explicitly.
Run once: the Informatica Server runs the session only once, at a specified date and time.
Run every: the Informatica Server runs the session at regular intervals, as configured.
Customized repeat: the Informatica Server runs the session at the dates and times specified in the Repeat dialog box.
27. What are reusable transformations?
Reusable transformations can be used in multiple mappings. When you need to incorporate such a transformation into a mapping, you add an instance of it to the mapping. Later, if you change the definition of the transformation, all instances of it inherit the changes. Since an instance of a reusable transformation is a pointer to that transformation, you can change the transformation in the Transformation Developer and its instances automatically reflect those changes. This feature can save you a great deal of work.
28. What are the rank caches?
During the session, the Informatica Server compares an input row with the rows in the data cache. If the input row out-ranks a stored row, the Informatica Server replaces the stored row with the input row. The Informatica Server stores group information in an index cache and row data in a data cache.
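The comparison this answer describes, keep only the rows that out-rank what is already cached, amounts to maintaining a top-N set per group. The Python sketch below illustrates that logic with made-up data; it says nothing about how the Informatica Server actually lays out its index and data caches.

```python
# Hypothetical sketch of rank-cache behaviour: for each group, keep only the top N rows;
# an incoming row that out-ranks a cached row replaces it. Data is invented for the example.

import heapq
from collections import defaultdict

TOP_N = 2
rank_cache = defaultdict(list)     # group -> min-heap of (rank value, row), capped at TOP_N entries

def process_row(group, value, row):
    heap = rank_cache[group]
    if len(heap) < TOP_N:
        heapq.heappush(heap, (value, row))        # cache not full yet: just store the row
    elif value > heap[0][0]:
        heapq.heapreplace(heap, (value, row))     # input row out-ranks the lowest cached row

rows = [("EAST", 500, "order-1"), ("EAST", 900, "order-2"),
        ("EAST", 700, "order-3"), ("WEST", 300, "order-4")]

for group, value, row in rows:
    process_row(group, value, row)

print({group: sorted(heap, reverse=True) for group, heap in rank_cache.items()})
```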
29. What are the output files that the Informatica Server creates during a session run?
Informatica Server log: the Informatica Server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory.
Session log file: the Informatica Server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, the creation of SQL commands for the reader and writer threads, errors encountered, and the load summary. The amount of detail in the session log file depends on the tracing level that you set.
Session detail file: this file contains load statistics for each target in the mapping. Session details include information such as the table name and the number of rows written or rejected. You can view this file by double-clicking the session in the Monitor window.
Performance detail file: this file contains information known as session performance details, which helps you identify where performance can be improved. To generate this file, select the performance detail option in the session property sheet.
Reject file: this file contains the rows of data that the writer does not write to the targets.
Control file: the Informatica Server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader.
Post-session email: post-session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages: one if the session completes successfully, the other if the session fails.
Indicator file: if you use a flat file as a target, you can configure the Informatica Server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject.
Output file: if a session writes to a target file, the Informatica Server creates the target file based on the file properties entered in the session property sheet.
Cache files: when the Informatica Server creates a memory cache, it also creates cache files. The Informatica Server creates index and data cache files for the following transformations:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
30. What are the options in the session properties for the target when using an Update Strategy transformation?
Insert
Delete
Update
Update as update
Update as insert
Update else insert (illustrated in the sketch below)
Truncate table
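Of these, "update else insert" is the one people most often have to reason about: try to update a matching target row, and insert the row if no match exists. The following Python fragment is a minimal, hypothetical illustration of that rule against an in-memory "target"; it is not how Informatica applies the option.

```python
# Minimal, hypothetical sketch of "update else insert" against an in-memory target keyed by ID.

target = {"P1": {"product_id": "P1", "price": 10.0}}   # made-up existing target rows

def update_else_insert(row):
    key = row["product_id"]
    if key in target:
        target[key].update(row)      # matching row found: treat the incoming row as an update
        return "updated"
    target[key] = row                # no match: treat it as an insert instead
    return "inserted"

print(update_else_insert({"product_id": "P1", "price": 12.5}))   # existing row -> updated
print(update_else_insert({"product_id": "P2", "price": 7.0}))    # new row -> inserted
print(target)
```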

Thursday, July 12, 2007

Data Integration: How Times Have Changed

Enterprise data integration has clearly "arrived." The road had many twists and turns, yet data integration has not just survived, it has grown in strength and stature. How do we apply our collective learning from market developments to position ourselves better for 2007 and beyond?

Enterprise data integration, just a few years ago, meant no more than executing bucketfuls of spaghetti-like extract-transform-load (ETL) processes to load bulky and often unwieldy data marts and data warehouses. That was then. Spurred by product development and refocusing, artful solution convergence, and a flurry of mergers and acquisitions, the data integration landscape is now dramatically different. The primary goal remains to bring data from its source(s) to its destination(s) in a timely manner and useful form, but that is now a very loaded statement. You still have ETL, but in addition, you get access to a wide variety of data sources, services and applications in real-time, near-real time and batch modes. There's also data profiling, cleansing and standardization, query federation and virtual data models as well as master data management or "data verticalization" through hubs. These product hubs and customer hubs are glued together with integrated metadata management and service-oriented architectures, ready for consumption in your applications. Driven more by vendor innovation and "big picture" thinking than by customer demand, data integration moves ever closer to being a much-respected fixture in IT shops.

If you haven't looked at data integration solutions lately, do so today. In particular, customers who need data provisioning through enterprise application integration (EAI) and service-oriented and enterprise service bus architectures (SOA/ESB) would do well to take a close look at data integration technologies as well.

So where is enterprise data integration headed? For many vendors and customers, the primary purpose of integrating data across the enterprise is business intelligence (BI) or its latest avatar, corporate performance management (CPM). There's a lesson in this: if you are looking to maximize your return on BI/CPM investments, consider strengthening the "back end" data integration.

BI or CPM need not be the raison d'etre for data integration efforts. As SOA and collaborative solutions flourish in your organization, data integration becomes an integral component of the enterprise architecture and, thus, a key enabler of the business architecture. Data visualization and reporting solutions will remain important beneficiaries of data integration, but let your vision go beyond BI and CPM.

Data Warehouse Buzzwords

Data Warehouse (from TechWeb)
A database designed to support decision making in an organization. Data from the production databases are copied to the data warehouse so that queries can be performed without disturbing the performance or the stability of the production systems.

Data Marts
Data warehouses can become enormous with hundreds of gigabytes of transactions. As a result, subsets, known as "data marts," are often created for just one department or product line.

Updated at the End of a Period
Data warehouses are generally batch updated at the end of the day, week or some period. Its contents are typically historical and static and may also contain numerous summaries.

Operational Data Stores
The data warehouse is structured to support a variety of analyses, including elaborate queries on large amounts of data that can require extensive searching. When databases are set up for queries on daily transactions, they are often called "operational data stores" rather than data warehouses (see ODS). See OLAP, DSS, EIS and BI software.

What is a data stewardship program?

Data are important assets of an organization. An organization should proactively manage, protect, and grow its data assets. A data stewardship program establishes an enterprise data environment. It promotes data usage and integration across the enterprise. It defines the processes and policies that ensure the data are correctly used and shared without putting the enterprise at risk. It provides the oversight, tools, and training to support the individual organizations within an enterprise.

What are the key areas covered by a data stewardship program?

A data stewardship program should cover the following areas:

  • Data Management
    • Data Integration
    • Data Quality
    • Metadata Standard
    • Data Flow & Model standard
  • Data Policy
    • Data Security
      • Access
      • Usage
    • Data Privacy
    • Auditing
    • Reporting

Who should define the data stewardship program?

A data stewardship program can be defined by a data stewardship committee or council, which consists of members from IT, the data source owners, and the data user community. However, its execution should belong to a group that has several data stewards.


What is Master Data Management?

Master Data Management is a combination of business processes, software applications, and technologies that helps you manage your master data, such as "Customer", "Supplier", "Employee", and "Product".

Master data management ensures data quality so you can rely on the data to run your business. Master data may be sourced and maintained in multiple systems. By deploying master data management, you can enable cooperation among these diverse systems and bring consistency across them.

Here are the four key components of an MDM solution:

Master Data: a central data model that stores the master data.

The data model should be best-practice based, comprehensive, flexible, configurable, and extensible. It should adopt open standards as much as possible and be easy to integrate with different sources.

Integration Services: provide different ways to access, update, and synchronize the data.

The integration services should include public APIs, web services, bulk load, a user interface, UI widgets, event publishing, and so on. The integration should serve the needs of both the initial load and ongoing maintenance.

Data Quality Services: processes and utilities that help ensure data quality.

The following data services should be provided: data cleansing, duplicate identification, duplicate avoidance, data match and merge, data enrichment, and data certification.
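As a toy illustration of duplicate identification and match-and-merge (any real MDM product uses far more sophisticated, configurable matching), the sketch below groups records on a normalized name-plus-postal-code key and merges each group into a single surviving record. All field names and matching rules here are invented.

```python
# Toy, hypothetical sketch of duplicate identification and match-and-merge:
# group records on a naive match key, then merge each group into one survivor.
from collections import defaultdict

records = [
    {"name": "ACME Corp.",  "postal_code": "10001", "phone": None},
    {"name": "Acme Corp",   "postal_code": "10001", "phone": "555-0100"},
    {"name": "Globex Inc",  "postal_code": "94105", "phone": "555-0200"},
]

def match_key(record):
    """Very naive match rule: lowercased name without punctuation, plus postal code."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    return (name, record["postal_code"])

groups = defaultdict(list)
for record in records:
    groups[match_key(record)].append(record)

def merge(duplicates):
    """Survivorship rule for the sketch: the first non-null value wins for each field."""
    merged = {}
    for record in duplicates:
        for field, value in record.items():
            if merged.get(field) is None:
                merged[field] = value
    return merged

golden_records = [merge(duplicates) for duplicates in groups.values()]
print(golden_records)
```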

Privacy and Compliance Services: this is increasingly important. By deploying an MDM process and solution, you can ensure that your organization follows its privacy policy and governs data collection and usage. You can designate the owner of the data and audit the data's creation, update, reference, and viewing.

Wednesday, July 11, 2007

Data Integration comes in Three Flavors

Those shopping for data integration solutions will find that they come in three flavors that sometimes seem similar, but keep these distinctions in mind.

Stand-alone Tools: Niche data-integration tools from vendors such as MetaMatrix, Group 1 Software, Pervasive and Tableau enable you to provide "spot" solutions for specific problems, such as pulling together data from diverse sources into a portal (MetaMatrix), or mixing and matching technologies to create your own "data integration stack" (Pervasive or Group 1 together with Tableau). Enterprise and solution architects will relish the opportunity to exercise their creativity to create heterogeneous, best-of-breed solutions at reasonable cost.

Focused Solutions: Solution vendors such as Business Objects, Cognos, Informatica, Initiate Systems and SAS offer high-class capabilities in data integration as well, but usually with a specific purpose, such as business intelligence or customer data integration. If they have the integrated solution that you need, look no further. But if you pick just one component from the solution, the proposition will look less attractive.

One-Stop Shops: Large enterprises stand to benefit from emerging one-stop shops like IBM, Microsoft and Oracle (and to a lesser extent, Informatica and SAS). These large vendors have multiple capabilities that, these vendors will tell you, are
"seamlessly integrated." This will be largely true, but it would be wise to dig deeper into this promise. The key is that these integrated capabilities should be flexible as well as seamless. How well are the components integrated at the metadata level, i.e. contribute, share and leverage metadata? To what extent is the integrated solution standards-driven and/or loosely-coupled? Are these mega-solutions compliant with your enterprise architecture, or do they set off in a new direction? Ironically, even as you stand to gain from integrated vendor solutions, you may risk loss of choice, potential vendor lock-in and the sheer unaffordability of large solutions.

The data integration marketplace will continue to churn (until, that is, IBM, Microsoft and Oracle divvy up and buy all the other remaining vendors), and there is plenty of innovation out there. The entrepreneurial spirit is alive and well, and customers will continue to have plenty of options at various levels of sophistication and magnitude. These are good days for data integration vendors and customers indeed. The future of data integration, to use an aphorism, isn't what it used to be. It's better.
