Posts Tagged ‘tips and tricks’

How much a word can cost

In the previous article we promised to tell you a few things about Apple, words and names.
Steve Jobs, brilliant genius though he was, seemed to have a problem with names.
The first story to be told is the one about the very name of the company he co-founded: Apple.
A declared fan of the Beatles (the British group that turned the world upside down during the 60’s), Steve Jobs was very unlikely to be unaware that the Beatles’ record label was named “Apple”.
The Beatles founded it in 1968 and their vinyl records depicted apples in various forms.
But young Jobs probably thought that “Apple Records” in the UK would be a totally different thing from “Apple Computer” in the USA and saw no serious risk in using such a common word as a name.
And probably, if the “Apple Computer” company hadn’t been so successful, it wouldn’t have been noticed by the British in 1978, just 2 years after it was founded.
A long series of lawsuits started then, opening and then settling various disputes from 1978 until 5 February 2007, when a final agreement was announced by the companies.
“So what”, you might say, “disputes over trademarks are so common, why is this one special enough to mention?”
Well, there are 2 reasons why it’s worth mentioning.
The first reason is the unbelievable price Apple Computer had to pay for its name: the 2007 settlement alone involved 500 million USD, according to media reports of the time. And this out-of-court arrangement was the last one, but not the only one.

The other reason is a rather funny one: Apple Computer seemed to have learned its lesson from the “Apple vs. Apple” case.
So they did their best to avoid any possible trademark infringements in the future.
For example, when releasing the “Macintosh”, not only did they change the spelling of Jef Raskin’s favourite type of apple from McIntosh to Macintosh, but they also paid McIntosh Lab, a company producing Hi-Fi equipment, a certain amount to license the name (early Mac literature says “Licensed from McIntosh Laboratories, Inc.”) and later bought all rights to the name outright.
This exceedingly prudent approach paid off, and Apple never had a trademark issue regarding the name “Macintosh”.

But hey, you can never be too prudent: in 2005 Apple released a multi-button USB mouse named “Mighty Mouse” and again did its research before the release, purchasing the license to use the Mighty Mouse name from VIACOM, owner of CBS, owner of the famous Mighty Mouse cartoon series.
Unfortunately, the CBS cartoon series trademark covered all kinds of merchandise, like T-shirts or even multi-vitamins, but not computer peripherals such as mice.
Instead, that name was actually registered for computer devices by Man & Machine Inc, a supplier of water- and chemical-resistant keyboards and mice, who sued Apple over the use of the name.
A series of disputes started between CBS and Man & Machine, but Apple was already so sick and tired of all that, it stopped using the Mighty Mouse name, replacing it with Apple Mouse in 2009, and, at least until today, has never again used other people’s names for its products.

After all, back in 1994, when they internally code-named the Power Macintosh 7100 “Carl Sagan”, they got a cease-and-desist letter from Carl Sagan himself, even though it was only an internal, non-public code name and not a commercial name following a public release.
Carl Sagan was a famous scientist who rose to fame after publishing popular science books and, most of all, after co-writing and narrating the TV series Cosmos: A Personal Voyage.
Following his letter, Apple stopped using his name, and the engineers replaced it with “BHA” (standing for “Butt-Head Astronomer”).
Strangely enough, although this was still an internal code name and also an abbreviation that could mean anything, Sagan learned about it and sued Apple for defamation.
He lost, then sued Apple again (this time for the initial use of his name) and lost again. Finally, an out-of-court agreement was reached, with Apple issuing an official statement that it had never intended to cause the scientist embarrassment or concern.

So, you see, that’s why ORPALIS is not a fruit, not a character’s name and not even an acronym: it’s just a small mystery instead!

See you next week!

Bogdan

Big Browser on June 21

  • Banned! Google Glass Prohibited at Google Shareholder Meeting
  • How Apple's new Mac Pro revolutionizes the desktop workstation
  • Visual Literacy in an Age of Data
  • Improving Photo Search: A Step Across the Semantic Gap
  • From surrogate storyteller to high-def streaming infotainment, TV has come a long way

Optical Character Recognition: some advice

Hi folks,

This week we continue the Optical Character Recognition subject by providing our general public with some advice for the pre-scan and post-scan stages of document archiving.
A properly done OCR task is not simply about text extraction: it also involves a set of operations meant to optimize the OCR process and increase efficiency in overall document-management practice.
In other words, operations commonly considered “adjacent” can actually greatly improve or totally destroy text recognition, making your later life either comfortable or a living hell.
Here are just a few things to keep in mind:

(1) Before scanning

  • when placing the paper in the scanner, make sure the pages have the correct text orientation, so you won’t waste time later either waiting for the OCR software to determine the orientation automatically or, even worse, fixing it manually by checking file by file;
  • choose scan settings that ensure the best quality for OCR (for example, 250 or 300 dpi resolutions are considered optimal for most documents);
  • test the OCR output on a few pages before starting a batch scanning operation to make sure your settings are fine-tuned;
  • select a lossless file format (such as TIFF) and do not be afraid of big file sizes if the documents are important to you: storage space is not an issue these days and you can later convert the files to any other format for handling (or sharing) purposes.

 

Actually, for important document archives, maybe the best idea would be to store the “original” files in TIFF format, then move them to an external storage device or medium (external hard disk, DVD, etc.) and use for everyday work a duplicate archive containing files converted into a format that you consider optimal for your needs (JBIG2, PDF, etc.).
To a certain extent, this approach is similar to how the camera RAW format works for professionals in digital photography.

 

(2) After scanning

  • use relevant filenames for the resulting files and do not mind if the filenames tend to become lengthy: this isn’t hard to do using automated file naming tools and, even if it might take a bit more of your time at the file creation stage, it can be a real life saver later. And make sure that the filename contains important data, such as the language of the text, to name just one important detail for OCR;
  • do not hesitate to use image enhancement techniques: you cannot control the quality of the paper documents, nor the particular hardware (i.e. scanner) details which might influence output quality (just one example among dozens: tiny scratches on the scanner’s glass).

 

To overcome such issues, professional document imaging software vendors provide their users with a wide range of image correction features.
In this blog you will find some explanations on brightness/contrast/gamma, median filtering and auto-deskew.
But more explanations are yet to come.
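
To make the file-naming advice above a bit more concrete, here is a minimal Python sketch of an automated naming helper. Everything in it is our own assumption for the sake of illustration: the function name `build_scan_filename`, the order of the fields and the three-letter language code are hypothetical choices, not a convention imposed by any OCR tool.

```python
import re
from datetime import date


def build_scan_filename(title, scan_date, language, page, extension="tif"):
    """Build a descriptive name such as '2012-04-27_invoice-acme_eng_p0003.tif'.

    All field choices are illustrative assumptions, not a standard.
    """
    # Lowercase the title and replace every run of characters that are not
    # letters or digits with a hyphen, so the name stays safe on common
    # file systems.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # ISO dates (YYYY-MM-DD) sort chronologically when sorted alphabetically.
    return f"{scan_date.isoformat()}_{slug}_{language}_p{page:04d}.{extension}"


print(build_scan_filename("Invoice ACME", date(2012, 4, 27), "eng", 3))
# prints: 2012-04-27_invoice-acme_eng_p0003.tif
```

Keeping the language of the text in the name means a later batch OCR job can pick the right dictionary without anyone having to open the file first.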

Cheers!

Bogdan

 

Big Browser on April 27

  • Google's secret weapon to fight Redmond and Cupertino
  • Eugene Kaspersky: "In terms of security, Apple is 10 years behind Microsoft"
  • Repetitive tasks: geeks vs. non-geeks
  • Read this before naming your startup
  • Why the iPad Has to be Made in China

Casual Friday on April 27

A beautiful day to play outside.


Camera RAW file formats explained

Hi folks,

This week we will provide our general public with explanations on camera RAW file formats, because this subject is often ignored or misunderstood and because our software supports more than 40 such formats.

Let’s start by specifying that RAW is not an acronym for anything: in this rare case, “raw” literally means “raw” (“unprocessed”, that is) and the explanation for this term lies in the way digital cameras work.
Each time you take a picture, you are actually exposing the digital camera’s photo-sensitive chip to light.
The chip has millions of sensor units (i.e. pixels), each one translating the amount of light that hit it into a voltage level, which is then converted to a digital value.
Usually, this resulting digital value is recorded with 12 or 14 bits, meaning that each pixel can distinguish 4096 brightness levels (= 2^12) or 16384 brightness levels (= 2^14).
Sensor units do not record colors by themselves: imaging chips record grey levels through a mosaic of color filters, such as the Bayer matrix, and color is reconstructed from that pattern.
Finally, when saving a raw file, the camera software adds various metadata (information on camera type, camera settings, etc.), but this information has no influence on the stored raw image: it is simply added as tags.
In other words, the raw image data is unprocessed and, in most cases, either uncompressed or losslessly compressed, and the various settings associated with it are not applied: they are stored as metadata for later use.
To conclude the description of this stage of image generation in digital cameras, we should add that raw files are large, their format is proprietary to the camera manufacturer (sometimes even specific to a certain camera model) and they are often compared to the “film negatives” of classic photography.
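
For those who like to see the numbers, the two ideas above (bit depth and color filter mosaic) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and we assume the common “RGGB” Bayer layout, while the actual layout depends on the camera.

```python
def brightness_levels(bit_depth):
    """Number of distinct values a pixel can take at a given bit depth."""
    return 2 ** bit_depth


def bayer_filter_color(row, col):
    """Color of the filter covering the sensor unit at (row, col).

    Assumes an RGGB mosaic: even rows alternate red/green,
    odd rows alternate green/blue.
    """
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


for bits in (8, 12, 14):
    print(bits, "bits:", brightness_levels(bits), "levels")
# prints 256, 4096 and 16384 levels respectively

# Top-left 2x2 tile of the assumed mosaic:
print([[bayer_filter_color(r, c) for c in range(2)] for r in range(2)])
# prints: [['R', 'G'], ['G', 'B']]
```

Notice that half of the sensor units in this layout sit behind a green filter, which is why a full-color pixel always has to be reconstructed from its neighbours.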

Let’s keep this good and widespread analogy to describe the next stage of digital photo image generation: “developing” the “negative film” (inside the “dark room”) to obtain the actual photo.
Raw files have to be converted to standard formats such as TIFF or JPEG, much like negative films need to be developed to get the prints.
This is usually done by the camera’s built-in software immediately after the image is captured and consists of applying various color corrections and file compressions considered optimal by the manufacturer and satisfactory by most users, but this gives the user only little control over the “development” process.
For professionals, however, such an approach might simply be insufficient, as they might require full control over processing to determine the final appearance of the image.
Therefore, they would instead use more powerful software and hardware to achieve this.
For example, they can control brightness, contrast, gamma, sharpening, color temperature (white balance), noise reduction, tint, etc., not to mention file-saving formats and compression options.
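
To give a rough feeling of what “development” means for a single pixel, here is a deliberately simplified Python sketch. It is an assumption-laden toy, not how real raw converters (or our software) work: real development also involves demosaicing, white balance, noise reduction and more, and the function name and default values below are purely illustrative.

```python
def develop_pixel(raw_value, bit_depth=12, brightness=0.0, contrast=1.0, gamma=2.2):
    """Turn one raw sensor value into an 8-bit value (0..255).

    Simplified illustration: brightness is an offset, contrast a gain
    around mid-grey, gamma a power curve applied last.
    """
    max_raw = 2 ** bit_depth - 1
    value = raw_value / max_raw                          # normalize to 0..1
    value = (value - 0.5) * contrast + 0.5 + brightness  # contrast, then brightness
    value = min(1.0, max(0.0, value))                    # clip out-of-range results
    value = value ** (1.0 / gamma)                       # gamma correction
    return round(value * 255)


# The same raw value "developed" in two different ways:
print(develop_pixel(1024, gamma=1.0))   # linear rendering, rather dark
print(develop_pixel(1024, gamma=2.2))   # gamma-corrected, noticeably brighter
```

The raw value itself never changes: only the settings applied to it do, which is exactly why one raw file can give many different final pictures.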

To summarize: raw format files contain all the image data and information needed for later processing (“development”) up to the highest levels of image quality or customization.
One can store a photo as a raw file and then, based on it, create any number of versions of that picture using “dark room software”, either existing or yet to come!
Alternatively, camera software has limited processing capabilities compared to dedicated third-party software: it outputs lossy or lossless images in formats such as JPEG or TIFF, but everything is based on a range of settings, only some of which are controllable by the user.
This option suits amateur users as it is fast, painless and the quality is within, if not even beyond, their expectations.

We should not finish this article without mentioning Adobe’s efforts to introduce a standardization model for raw formats: they’ve created an openly documented file format named “DNG” (which stands for Digital Negative), not very widely adopted, at least not yet.
But of course, our software supports the DNG format as well.

Cheers!

Bogdan

Big Browser on April 13

  • Jack Tramiel, the founder of Commodore computers, passed away
  • The history of super computers
  • Technical books are broken
  • Open source software in C#
  • Poll: Does it matter if Microsoft open sources .NET technologies?

Casual Friday on April 13

Wireless Technology


Image enhancement: median filtering

Hi folks,

We continue the series of explanations on image enhancement techniques meant for our general public and this week we are going to give you some additional info about median filtering.

Images quite often contain artifacts known as “noise”.
“Noise” originally means, of course, unwanted sounds in an audio context, but the term quickly expanded to other domains, designating the presence of unwanted, randomly scattered artifacts within any given context.
In the imaging domain, for instance, one of the frequently occurring noise types is called “salt and pepper noise”.
Quite an intuitive name, as images affected by this type of noise look as if salt and pepper particles were poured over “the clear” image (bright pixels on darker areas and dark pixels on brighter areas of the image).
The usual causes for this issue are hardware related (analog-to-digital conversion, bit errors in transmissions, etc.).

Which brings us to median filtering: one of the most effective methods to remove such noise from images is to apply the median filter, which replaces each pixel with the median value of its neighbourhood.
This is not the place to go deeper into technical details, but for those of you wishing to find out more about this subject, you can read the Wikipedia article or even study this academic material.

Median filtering is yet another must-have feature because not only does it make image/text documents more legible, but it also improves OCR results if applied before OCR (because it removes noise but preserves edges).
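
For the curious, the core idea fits in a few lines. The following Python sketch is a plain, unoptimized illustration on a greyscale image stored as a list of rows. It is not the implementation found in our products, and the way borders are handled (using a smaller window near the edges) is just one possible assumption.

```python
def median_filter(image, size=3):
    """Return a copy of `image` where each pixel is replaced by the median
    of the size x size window around it (window is cropped at the borders)."""
    height, width = len(image), len(image[0])
    radius = size // 2
    result = []
    for y in range(height):
        new_row = []
        for x in range(width):
            # Collect the neighbouring pixel values that fall inside the image.
            window = sorted(
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            )
            # Middle element of the sorted window (upper one if the count is even).
            new_row.append(window[len(window) // 2])
        result.append(new_row)
    return result


# A flat grey image with one "salt" pixel in the middle.
noisy = [[100] * 5 for _ in range(5)]
noisy[2][2] = 255
print(median_filter(noisy)[2][2])  # prints: 100
```

An isolated bright or dark pixel can never be the median of its neighbourhood, so it disappears, while a sharp border between two large areas is left where it was, which is what OCR engines need.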

All our products, both SDKs (GdPicture.NET) and general public products (PaperScan and PaperLight BETA), provide the median filtering feature.

Cheers,

Bogdan

Document without PaperScan median effect


Document with PaperScan median effect


Big Browser on March 23

  • Google Gives Search a Refresh
  • Features NO ONE NOTICED in Visual Studio 11 Express Beta for Web
  • Is Firefox close to its death?
  • Why is the DOS path character "\"?
  • Ten Inventions inspired by Science-Fiction

Casual Friday on March 23

Be careful with brands!


Deskew/Autodeskew: what’s that?

Hi folks,

This week we thought we would offer our general public some explanations about deskew/autodeskew, mainly to answer two questions: what is it and why is it important to have?

Skew is an artifact that might appear during the document scanning process: the document’s text/images end up rotated at a slight angle.
It can have various causes, but the most common is paper getting misplaced during the scan.
Deskew, therefore, is the process of detecting and fixing this issue on scanned files (i.e. bitmaps), so that deskewed images have their text/images correctly and horizontally aligned.
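
To give an idea of how the detection part could work, here is a minimal Python sketch of one classic approach, often called the projection profile method. It is an illustration under our own assumptions and not the algorithm used in our products: the image is a list of rows of 0/1 values (1 meaning ink), only angles between -10 and +10 degrees are tried in steps of 0.5, and the actual rotation that fixes the image is left out.

```python
import math


def estimate_skew_angle(image, max_angle=10.0, step=0.5):
    """Estimate the skew of a binary image (rows of 0/1, 1 = ink) in degrees.

    A positive result means text lines slope downwards to the right
    (the y axis points down, as usual for bitmaps).
    """
    ink = [(x, y) for y, row in enumerate(image)
           for x, value in enumerate(row) if value]
    if not ink:
        return 0.0  # blank page: nothing to measure

    best_angle, best_score = 0.0, -1
    steps = int(round(max_angle / step))
    for i in range(-steps, steps + 1):
        angle = i * step
        a = math.radians(angle)
        sin_a, cos_a = math.sin(a), math.cos(a)
        # Count how many ink pixels land on each row once the page is
        # rotated back by the candidate angle.
        profile = {}
        for x, y in ink:
            target_row = round(y * cos_a - x * sin_a)
            profile[target_row] = profile.get(target_row, 0) + 1
        # Well aligned text gives a few very full rows, so the sum of
        # squared counts is highest near the right angle.
        score = sum(count * count for count in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# Demo: three synthetic "text lines" drawn with a 3 degree slope.
width, height = 300, 80
page = [[0] * width for _ in range(height)]
for baseline in (10, 30, 50):
    for x in range(width):
        page[baseline + round(x * math.tan(math.radians(3.0)))][x] = 1
print(estimate_skew_angle(page))
```

On this synthetic page the estimate should come out close to 3 degrees. Rotating the bitmap by the opposite angle would then complete the deskew.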

And why is this important?
Well, a first benefit is that you don’t have to rescan the skewed documents.
Instead of the mechanical and time-consuming actions that a rescan involves, everything is done automatically and efficiently by software providing the deskew feature.

But there is yet another important benefit of deskewing: for those who need to OCR the scanned documents, deskew is an important correction to apply before submitting them to the OCR process.
Deskew increases character recognition accuracy because aligned text is much closer to what the OCR software expects to encounter when performing image analysis.

All our products, both SDKs (GdPicture.NET) and general public products (PaperScan and PaperLight BETA), provide the autodeskew feature, as it is a must-have for any professional document imaging software.

Cheers,

Bogdan

Document without PaperScan autodeskew


Document with PaperScan autodeskew


Big Browser on March 16

  • Ten Myths about Patents
  • Daniel Moth: The Way I Think About Diagnostic Tools
  • The Fear of rebooting a Server
  • Most popular JavaScript keywords
  • Must Have tools on Windows - Part 1
  • Must Have tools on Windows - Part 2

Casual Friday on March 16


Daddy's Boy - Source : http://uberhumor.com/daddys-boy